2021-01-31 12:00:00
Llama Rust SDK preview 0.1.3
The last time I released a preview of Llama's Rust SDK (around 8 months ago) the blog entry was filled with caveats about its limitations. Most of those still apply, but there has been significant progress. The most notable thing is that I now have partial support for the Rust std library.
(Background and terminology: Llama is my project focused on using "other languages" with .NET. Its core component is a compiler that translates LLVM bitcode to .NET CIL. SourceGear.Rust.NET.Sdk (aka the Llama Rust SDK) is an MSBuild project SDK that integrates Rust into the regular .NET build process.)
SourceGear.Rust.NET.Sdk version 0.1.3 is available on nuget.org now. This blog entry is a walkthrough of using it to build an actual Rust program.
Reminder: Llama is at the "proof of concept" stage, and is not production ready.
Prerequisites
Llama currently requires .NET 5.0 on one of the following platforms:
Windows (x64)
macOS (x64)
Ubuntu 18.04 (x64)
And you'll need Rust nightly, plus its `rust-src` component. For this walkthrough, I am using a specific nightly that I know works.
rustup toolchain install nightly-2020-10-13
rustup component add rust-src --toolchain nightly-2020-10-13
ruplacer
For this walkthrough, the demo project is "ruplacer", a utility to find and replace text in multiple files. I found it on GitHub:
https://github.com/TankerHQ/ruplacer
This program has been a handy test case for Llama. It has some dependencies, but not too many. As Llama becomes more capable, I look for slightly-more-difficult projects to build.
If you have Rust installed, you should find it simple to clone the repo and build it with the regular Rust tools:
git clone https://github.com/TankerHQ/ruplacer
cd ruplacer
cargo build
Both bin and lib
The first challenge with ruplacer is that it builds both a library crate and a binary crate, because `src/lib.rs` and `src/main.rs` are both present. This is a common practice with Rust, and is perfectly acceptable to Cargo, but it's not really a great fit for how MSBuild works. For example, when working with C#, one csproj file results in building one .NET assembly. Building two assemblies from one project file is not really a thing.
So before we get involved with .NET, let's separate things.
The ruplacer Cargo.toml file looks like this:
[package]
name = "ruplacer"
version = "0.4.3"
authors = ["Dimitri Merejkowsky"]
description = "Find and replace text in source files"
license = "BSD-3-Clause"
readme = "README.md"
edition = "2018"
keywords = ["ruplacer", "find", "grep", "command", "line"]
categories = ["command-line-utilities"]
repository = "https://github.com/TankerHQ/ruplacer"

[package.metadata.deb]
extended-description = "Find and replace text in source files"

[dependencies]
difference = "2.0"
ignore = "0.4"
structopt = "0.2"
colored = "1.6"
regex = "1"
isatty = "0.1"
Inflector = "0.11"
anyhow = "1.0.32"

[dev-dependencies]
tempdir = "0.3"
What I'm going to do is create a separate "ruexe" crate as a peer of ruplacer.
mkdir ../ruexe
mkdir ../ruexe/src
cp ./src/main.rs ../ruexe/src
cp ./Cargo.toml ../ruexe
Inside `ruexe/Cargo.toml`, we need to change the package name from ruplacer to ruexe, and add a dependency for the library:
[dependencies.ruplacer]
path = "../ruplacer"
So now the ruplacer executable has its own build directory, but it references the original ruplacer crate, unchanged, as a library.
You should be able to `cargo build` this and run it. For example, here I try to replace "pointer" with "terrier" in my copy of the Mono.Cecil tree:
cargo run pointer terrier ../cecil
(blah blah blah)
Would perform 160 replacements on 5 matching files
Re-run ruplacer with --go to write these changes to the filesystem
It's nice that ruplacer's default is to preview what changes would be made instead of making any actual changes to the files.
Now with Llama
To use Rust with .NET we want it to be like any other .NET language. Instead of a Cargo.toml file, we want an .rsproj, which would be an SDK-style project file. Like modern .csproj, except for Rust. Providing this experience is basically what the Llama Rust SDK does.
The content of `ruexe.rsproj` looks like this:
<Project Sdk="SourceGear.Rust.NET.Sdk/0.1.3">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
    <RustToolChain>+nightly-2020-10-13</RustToolChain>
  </PropertyGroup>
  <ItemGroup>
    <RustReference Include="..\ruplacer" Name="ruplacer" />
    <RustCrateReference Include="difference" Version="2.0" />
    <RustCrateReference Include="ignore" Version="0.4" />
    <RustCrateReference Include="structopt" Version="0.2" />
    <RustCrateReference Include="colored" Version="1.6" />
    <RustCrateReference Include="regex" Version="1" />
    <RustCrateReference Include="isatty" Version="0.1" />
    <RustCrateReference Include="Inflector" Version="0.11" />
    <RustCrateReference Include="anyhow" Version="1.0.32" />
  </ItemGroup>
</Project>
This is basically a translation of the Cargo.toml to MSBuild.
The `Project` element at the top specifies that we want to use SourceGear.Rust.NET.Sdk as the SDK for this project. The dotnet build system will retrieve the SDK package from nuget.org.
The `PropertyGroup` should look familiar. It's mostly the same as it would be for C#. The `OutputType` specifies that we are building an Exe, and the `TargetFramework` says we are targeting .NET 5. The one new piece here is `RustToolChain`, which I am using to set the specific nightly for this walkthrough.
The `ItemGroup` below that is to specify the references, corresponding to the `dependencies` section from `Cargo.toml`. In this case, the `RustReference` is like a `ProjectReference`. It references a Rust crate in source form with a path.
The `RustCrateReference` element is like `PackageReference`. It specifies a reference that will be obtained from crates.io, which is the Rust world's equivalent of nuget.org.
Before we try to build this, we need to make one change to the code itself. Currently, the Llama compiler can't find Rust `main()` because its name is mangled. At some point I will figure out a better solution for this problem, but for now, let's tweak the signature of `main()` just a bit.
In `src/main.rs`, change:
fn main() -> Result<()> {
to:
#[no_mangle]
pub extern "C" fn main() -> Result<()> {
So now we should be able to build this .rsproj just like any other .NET project:
dotnet build
And running our .NET build of ruplacer should give the same results as the one built the regular way.
dotnet run --no-build pointer terrier ../cecil
(blah blah blah)
Would perform 160 replacements on 5 matching files
Re-run ruplacer with --go to write these changes to the filesystem
Voila.
How does this work?
In the big picture, the Llama Rust SDK does two things:
Use the regular Rust tools to build, except generate an LLVM bitcode file instead of the usual platform-specific outputs
Run that bitcode file through the Llama compiler to create a .NET assembly
At a more detailed level, there are quite a few steps here. Each step is done inside the `obj` directory, and you can look in there to see what has happened.
To get the regular Rust tooling to generate a bitcode file, we need a custom target. The Rust compiler supports custom targets in JSON, and you can see the one Llama uses at `obj/aarch64-sourcegear-windows.json`. It looks like this:
{
  "arch": "aarch64",
  "data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
  "dynamic-linking": true,
  "dll-prefix" : "",
  "dll-suffix" : ".bc",
  "only-cdylib": true,
  "no-builtins": true,
  "allow_asm": false,
  "requires-lto": true,
  "executables": false,
  "is-builtin": false,
  "linker": "dotnet",
  "pre-link-args": {
    "gcc" : [
      "C:/Users/eric/.nuget/packages/sourcegear.rust.net.sdk/0.1.3/build/../tools/rsfakelink/rsfakelink.dll"
    ]
  },
  "linker-flavor": "gcc",
  "linker-is-gnu": false,
  "llvm-target": "aarch64-pc-windows-msvc",
  "max-atomic-width": 64,
  "obj-is-bitcode": true,
  "os": "windows",
  "panic-strategy" : "abort",
  "target-c-int-width": "32",
  "target-endian": "little",
  "target-family": "windows",
  "target-pointer-width": "64",
  "vendor": "sourcegear"
}
Things to note here:
The `arch` and `llvm-target` settings are telling LLVM to generate bitcode for aarch64 (aka arm64). But in practice this doesn't matter very much, because we're stopping the LLVM build process at the bitcode step instead of going all the way to native CPU-specific output. Very often I find that Llama works fine whether LLVM is targeting aarch64 or x86_64 or even riscv64.
With the `llvm-target` and `os` and `target-family` settings, we are claiming that we are building for Windows. Note that this does not result in a Windows-specific build. This is just the technique I use to avoid porting std. By telling Rust that the target is Windows, std will use Win32 functions for things like file IO. But Llama doesn't actually connect those calls to the Windows-specific KERNEL32.DLL. Rather, Llama includes a library called sgwin32, which is an implementation of the Win32 API on top of the .NET 5.0 BCL.
The `requires-lto` and `obj-is-bitcode` settings are what specify that the target should generate bitcode instead of CPU-specific code. The `dll-suffix` value is part of this as well.
The `linker` and `pre-link-args` settings specify a custom linker to be used. The custom linker is called "rsfakelink", and it is "fake" because it doesn't really do any linking.
We use this custom target to build a sysroot, and then we use that sysroot to build the project itself, including all of its dependencies.
The build directory for the sysroot is in `obj/sr`, and the `Cargo.toml` file we generated there looks like this:
[package]
authors = ["The Rust Project Developers"]
name = "sysroot"
version = "0.0.0"

[dependencies.std]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/std"

[patch.crates-io.rustc-std-workspace-alloc]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/rustc-std-workspace-alloc"

[patch.crates-io.rustc-std-workspace-core]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/rustc-std-workspace-core"

[patch.crates-io.rustc-std-workspace-std]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/rustc-std-workspace-std"

[dependencies.compiler_builtins]
features = ['mem']
The paths are obtained from `rustc --print sysroot` and will be different on your system. They reference the source code for std, which is why the rust-src component is required as a prerequisite.
The command line to build std looks like this (with some line wrapping because it's so wide):
cargo +nightly-2020-10-13 build --release
    --manifest-path sr/Cargo.toml
    --target ./aarch64-sourcegear-windows.json
    --verbose -p std
After the build is done, we copy the results from `obj/sr/target` into `obj/sysroot` with the layout that rustc expects. Now we can use that sysroot to build the project itself.
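To make "the layout that rustc expects" a bit more concrete: when rustc is given `--sysroot <dir>`, it looks for a target's libraries under `<dir>/lib/rustlib/<target>/lib`. The following is only a rough sketch of that copy step, not the SDK's actual code. The function names `sysroot_lib_dir` and `copy_sysroot_libs` are made up for illustration, and the choice to copy only `.rlib` files out of a `deps` directory is an assumption modeled on how xargo-style tools work.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory where rustc looks for a target's libraries when it is
/// invoked with `--sysroot <sysroot>`.
fn sysroot_lib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

/// Copy every `.rlib` found in `deps_dir` into the sysroot layout.
/// Returns the number of files copied.
/// (Hypothetical helper; copying only `.rlib` files is an assumption.)
fn copy_sysroot_libs(deps_dir: &Path, sysroot: &Path, target: &str) -> io::Result<usize> {
    let dest = sysroot_lib_dir(sysroot, target);
    fs::create_dir_all(&dest)?;

    let mut copied = 0;
    for entry in fs::read_dir(deps_dir)? {
        let path = entry?.path();
        let is_rlib = path.extension().map_or(false, |ext| ext == "rlib");
        if is_rlib {
            if let Some(name) = path.file_name() {
                fs::copy(&path, dest.join(name))?;
                copied += 1;
            }
        }
    }
    Ok(copied)
}
```

With the names used in this walkthrough, `sysroot_lib_dir(Path::new("obj/sysroot"), "aarch64-sourcegear-windows")` would give `obj/sysroot/lib/rustlib/aarch64-sourcegear-windows/lib`.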
That build directory is in `obj/rs`. We generated that `Cargo.toml` file as well:
[package]
name = "ruexe"
version = "1.0.0"
edition = "2018"
autobins = false
autoexamples = false
autotests = false
autobenches = false

[lib]
crate-type = ["cdylib"]
path = "../../src/main.rs"

[dependencies.ruplacer]
path = "C:/Users/eric/dev/ruplacer"

[dependencies.difference]
version = "2.0"

[dependencies.ignore]
version = "0.4"

[dependencies.structopt]
version = "0.2"

[dependencies.colored]
version = "1.6"

[dependencies.regex]
version = "1"

[dependencies.isatty]
version = "0.1"

[dependencies.Inflector]
version = "0.11"

[dependencies.anyhow]
version = "1.0.32"
This is the content of `ruexe.rsproj` rewritten into `Cargo.toml` form. Yes, that's right, this walkthrough started with a Cargo file, and we translated it to an MSBuild rsproj, and then the SDK converted it back to Cargo.
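The mapping from rsproj references back to Cargo dependency sections is mechanical enough to sketch in a few lines of Rust. This is an illustration only: the real SDK does this inside MSBuild, and the `RustRef` type and `cargo_dependency_sections` function below are hypothetical names, not part of Llama. The sketch also does no TOML escaping, so it assumes names, paths, and versions contain no quotes or backslashes.

```rust
/// A reference as it might appear in an .rsproj ItemGroup
/// (hypothetical model, for illustration only).
enum RustRef {
    /// Like `<RustReference Include="..." Name="..." />`: a crate in source form.
    Path { name: String, path: String },
    /// Like `<RustCrateReference Include="..." Version="..." />`: a crate from crates.io.
    Crate { name: String, version: String },
}

/// Render each reference as a `[dependencies.<name>]` section,
/// in the same shape as the generated Cargo.toml shown above.
fn cargo_dependency_sections(refs: &[RustRef]) -> String {
    let mut out = String::new();
    for r in refs {
        match r {
            RustRef::Path { name, path } => {
                out.push_str(&format!("[dependencies.{}]\npath = \"{}\"\n\n", name, path));
            }
            RustRef::Crate { name, version } => {
                out.push_str(&format!("[dependencies.{}]\nversion = \"{}\"\n\n", name, version));
            }
        }
    }
    out
}
```

Feeding it a `RustRef::Path` for ruplacer and a `RustRef::Crate` for each crates.io dependency would produce the dependency portion of the generated file; the `[package]` and `[lib]` sections would still need to be written separately.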
The command line to build this is (again with some line wrapping):
cargo +nightly-2020-10-13 -vv build --verbose --release
    --manifest-path C:\Users\eric\dev\ruexe\obj\rs\Cargo.toml
    --target ./aarch64-sourcegear-windows.json
But where is the sysroot specified? Well that part is a bit dorky, as it requires specifying the `--sysroot` argument in an environment variable called `RUSTFLAGS`.
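For readers who want to see how the `RUSTFLAGS` piece fits with the command line, here is a minimal sketch using `std::process::Command`. It is not how the SDK launches cargo (that happens from MSBuild); `cargo_build_command` is a hypothetical helper, and the argument list is trimmed to the essentials from the command shown above.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a cargo invocation that uses a custom target
/// and passes `--sysroot` to rustc through the RUSTFLAGS environment variable.
/// (Hypothetical helper, for illustration only.)
fn cargo_build_command(
    toolchain: &str,
    manifest: &Path,
    target_json: &Path,
    sysroot: &Path,
) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg(format!("+{}", toolchain))
        .arg("build")
        .arg("--release")
        .arg("--manifest-path")
        .arg(manifest)
        .arg("--target")
        .arg(target_json)
        // Cargo has no --sysroot option of its own, so the flag rides along
        // in RUSTFLAGS and is forwarded to every rustc invocation.
        .env("RUSTFLAGS", format!("--sysroot {}", sysroot.display()));
    cmd
}
```

Calling `.status()` on the returned `Command` would actually run the build. One caveat with this approach: cargo splits `RUSTFLAGS` on whitespace, so a sysroot path containing spaces would not survive as written.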
BTW, much of what the Llama Rust SDK does is similar to xargo, and I learned a lot about how to do such things by studying the xargo code:
https://github.com/japaric/xargo
The result of all this is the bitcode file: obj/rs/target/aarch64-sourcegear-windows/release/ruexe.bc
So we run that through the Llama compiler and put the resulting assembly in `bin/Debug/net5.0`.
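The location of that bitcode file follows from pieces already shown: the `obj/rs` build directory, the custom target name, the `--release` profile, and the empty `dll-prefix` plus `.bc` `dll-suffix` from the target JSON. A small sketch of how the path could be assembled, with `bitcode_path` being a made-up helper name rather than anything in the SDK:

```rust
use std::path::{Path, PathBuf};

/// Where cargo leaves the cdylib output for the custom target:
/// <obj>/rs/target/<target>/release/<crate>.bc
/// (Hypothetical helper; the ".bc" suffix comes from "dll-suffix" in the target JSON.)
fn bitcode_path(obj_dir: &Path, target: &str, crate_name: &str) -> PathBuf {
    obj_dir
        .join("rs")
        .join("target")
        .join(target)
        .join("release")
        .join(format!("{}.bc", crate_name))
}
```

For this walkthrough, `bitcode_path(Path::new("obj"), "aarch64-sourcegear-windows", "ruexe")` corresponds to the file named above.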
$ ls -l bin/Debug/net5.0/*.dll
-rwxr-xr-x 1 eric 197609 13735936 Jan 31 08:22 bin/Debug/net5.0/ruexe.dll*
-rwxr-xr-x 1 eric 197609    33280 Jan 30 23:27 bin/Debug/net5.0/sgwin32.dll*
Note that the SDK has automatically added a reference to `sgwin32.dll`. That's the "implementation of Win32 on .NET" library that I mentioned earlier.
(Tangent: I should probably do a separate blog entry on sgwin32. The library is far from complete, but has gradually become quite capable. It implements enough of Win32 to support a Llama-compiled SQLite that can pass the Entity Framework Core test suite on both Windows and Linux. OTOH, it's not a panacea. I also have sgposix, and in some cases that alternative works better.)
Bottom line, there is a lot going on "under the hood", and there are a lot of improvements yet to be made, but I am generally happy with how the Llama Rust SDK can provide an experience that is so similar to csproj.