
2021-01-31 12:00:00

Llama Rust SDK preview 0.1.3

The last time I released a preview of Llama's Rust SDK (around 8 months ago), the blog entry was filled with caveats about its limitations. Most of those still apply, but there has been significant progress. The most notable change is that I now have partial support for the Rust std library.

(Background and terminology: Llama is my project focused on using "other languages" with .NET. Its core component is a compiler that translates LLVM bitcode to .NET CIL. SourceGear.Rust.NET.Sdk (aka the Llama Rust SDK) is an MSBuild project SDK that integrates Rust into the regular .NET build process.)

SourceGear.Rust.NET.Sdk version 0.1.3 is available on nuget.org now. This blog entry is a walkthrough of using it to build an actual Rust program.

Reminder: Llama is at the "proof of concept" stage, and is not production ready.

Prerequisites

Llama currently requires .NET 5.0 on one of the following platforms:

And you'll need Rust nightly, plus its `rust-src` component. For this walkthrough, I am using a specific nightly that I know works.

rustup toolchain install nightly-2020-10-13
rustup component add rust-src --toolchain nightly-2020-10-13

ruplacer

For this walkthrough, the demo project is "ruplacer", a utility to find and replace text in multiple files. I found it on GitHub:

https://github.com/TankerHQ/ruplacer

This program has been a handy test case for Llama. It has some dependencies, but not too many. As Llama becomes more capable, I look for slightly-more-difficult projects to build.

If you have Rust installed, you should find it simple to clone the repo and build it with the regular Rust tools:

git clone https://github.com/TankerHQ/ruplacer
cd ruplacer
cargo build

Both bin and lib

The first challenge with ruplacer is that it builds both a library crate and a binary crate, because src/lib.rs and src/main.rs are both present. This is a common practice with Rust, and is perfectly acceptable to Cargo, but it's not really a great fit for how MSBuild works. For example, when working with C#, one csproj file results in building one .NET assembly. Building two assemblies from one project file is not really a thing.

So before we get involved with .NET, let's separate things.

The ruplacer Cargo.toml file looks like this:

[package]
name = "ruplacer"
version = "0.4.3"
authors = ["Dimitri Merejkowsky "]
description = "Find and replace text in source files"
license = "BSD-3-Clause"
readme = "README.md"
edition = "2018"
keywords = ["ruplacer", "find", "grep", "command", "line"]
categories = ["command-line-utilities"]
repository = "https://github.com/TankerHQ/ruplacer"

[package.metadata.deb]
extended-description = "Find and replace text in source files"

[dependencies]
difference = "2.0"
ignore = "0.4"
structopt = "0.2"
colored = "1.6"
regex = "1"
isatty = "0.1"
Inflector = "0.11"
anyhow = "1.0.32"


[dev-dependencies]
tempdir = "0.3"

What I'm going to do is create a separate "ruexe" crate as a peer of ruplacer.

mkdir ../ruexe
mkdir ../ruexe/src
cp ./src/main.rs ../ruexe/src
cp ./Cargo.toml ../ruexe

Inside ruexe/Cargo.toml, we need to change the package name from ruplacer to ruexe, and add a dependency on the library:

[dependencies.ruplacer]
path = "../ruplacer"

So now the ruplacer executable has its own build directory, but it references the original ruplacer crate, unchanged, as a library.

You should be able to cargo build this and run it. For example, here I try to replace "pointer" with "terrier" in my copy of the Mono.Cecil tree:

cargo run pointer terrier ../cecil

(blah blah blah)

Would perform 160 replacements on 5 matching files
Re-run ruplacer with --go to write these changes to the filesystem

It's nice that ruplacer's default is to preview what changes would be made instead of making any actual changes to the files.

Now with Llama

To use Rust with .NET, we want it to be like any other .NET language. Instead of a Cargo.toml file, we want an .rsproj: an SDK-style project file, like a modern .csproj, except for Rust. Providing this experience is basically what the Llama Rust SDK does.

The content of ruexe.rsproj looks like this:

<Project Sdk="SourceGear.Rust.NET.Sdk/0.1.3">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
    <RustToolChain>+nightly-2020-10-13</RustToolChain>
  </PropertyGroup>

  <ItemGroup>
    <RustReference Include="..\ruplacer" Name="ruplacer" />

    <RustCrateReference Include="difference" Version="2.0" />
    <RustCrateReference Include="ignore" Version="0.4" />
    <RustCrateReference Include="structopt" Version="0.2" />
    <RustCrateReference Include="colored" Version="1.6" />
    <RustCrateReference Include="regex" Version="1" />
    <RustCrateReference Include="isatty" Version="0.1" />
    <RustCrateReference Include="Inflector" Version="0.11" />
    <RustCrateReference Include="anyhow" Version="1.0.32" />
  </ItemGroup>

</Project>

This is basically a translation of the Cargo.toml to MSBuild.
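To make that mapping concrete, here is a minimal sketch in Rust of how the simple `name = "version"` lines from a [dependencies] section could be turned into RustCrateReference items. The function names to_crate_reference and to_item_group are hypothetical and not part of the SDK. The sketch assumes only the plain string form of a dependency, so tables, paths, and features are simply skipped.

```rust
/// Convert one simple `name = "version"` line into an rsproj item.
/// Returns None for lines that do not have that shape (headers, blanks, tables).
/// Hypothetical helper for illustration, not part of the Llama Rust SDK.
fn to_crate_reference(line: &str) -> Option<String> {
    let (name, version) = line.split_once('=')?;
    let name = name.trim();
    // Only accept a value that is a plain double-quoted string.
    let version = version.trim().strip_prefix('"')?.strip_suffix('"')?;
    if name.is_empty() {
        return None;
    }
    Some(format!(
        "<RustCrateReference Include=\"{}\" Version=\"{}\" />",
        name, version
    ))
}

/// Wrap every recognized dependency line in an ItemGroup element.
fn to_item_group(dependencies: &str) -> String {
    let mut out = String::from("<ItemGroup>\n");
    for item in dependencies.lines().filter_map(to_crate_reference) {
        out.push_str("  ");
        out.push_str(&item);
        out.push('\n');
    }
    out.push_str("</ItemGroup>\n");
    out
}
```

Feeding it the dependency lines from the ruplacer Cargo.toml should produce the eight RustCrateReference items shown above. The RustReference item for the library itself still has to be added by hand.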

Before we try to build this, we need to make one change to the code itself. Currently, the Llama compiler can't find the Rust main() because its name is mangled. At some point I will figure out a better solution for this problem, but for now, let's tweak the signature of main() just a bit.

In src/main.rs, change:

fn main() -> Result<()> {

to:

#[no_mangle]
pub extern "C" fn main() -> Result<()> {

So now we should be able to build this .rsproj just like any other .NET project:

dotnet build

And running our .NET build of ruplacer should give the same results as the one built the regular way.

dotnet run --no-build pointer terrier ../cecil

(blah blah blah)

Would perform 160 replacements on 5 matching files
Re-run ruplacer with --go to write these changes to the filesystem

Voila.

How does this work?

In the big picture, the Llama Rust SDK does two things:

  1. Use the regular Rust tools to build, except generate an LLVM bitcode file instead of the usual platform-specific outputs

  2. Run that bitcode file through the Llama compiler to create a .NET assembly

At a more detailed level, there are quite a few steps here. Each step is done inside the obj directory, and you can look in there to see what has happened.

To get the regular Rust tooling to generate a bitcode file, we need a custom target. The Rust compiler supports custom targets in JSON, and you can see the one Llama uses at obj/aarch64-sourcegear-windows.json. It looks like this:

{
"arch": "aarch64",
"data-layout": "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
"dynamic-linking": true,
"dll-prefix" : "",
"dll-suffix" : ".bc",
"only-cdylib": true,
"no-builtins": true,
"allow_asm": false,
"requires-lto": true,
"executables": false,
"is-builtin": false,
"linker": "dotnet",
"pre-link-args": { "gcc" : [ "C:/Users/eric/.nuget/packages/sourcegear.rust.net.sdk/0.1.3/build/../tools/rsfakelink/rsfakelink.dll" ] },
"linker-flavor": "gcc",
"linker-is-gnu": false,
"llvm-target": "aarch64-pc-windows-msvc",
"max-atomic-width": 64,
"obj-is-bitcode": true,
"os": "windows",
"panic-strategy" : "abort",
"target-c-int-width": "32",
"target-endian": "little",
"target-family": "windows",
"target-pointer-width": "64",
"vendor": "sourcegear"
}

Things to note here:

We use this custom target to build a sysroot, and then we use that sysroot to build the project itself, including all of its dependencies.

The build directory for the sysroot is in obj/sr, and the Cargo.toml file we generated there looks like this:

[package]
authors = ["The Rust Project Developers"]
name = "sysroot"
version = "0.0.0"
[dependencies.std]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/std"
[patch.crates-io.rustc-std-workspace-alloc]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/rustc-std-workspace-alloc"
[patch.crates-io.rustc-std-workspace-core]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/rustc-std-workspace-core"
[patch.crates-io.rustc-std-workspace-std]
path = "C:/Users/eric/.rustup/toolchains/nightly-x86_64-pc-windows-msvc/lib/rustlib/src/rust/library/rustc-std-workspace-std"
[dependencies.compiler_builtins]
features = ['mem']

The paths are obtained from rustc --print sysroot and will be different on your system. They reference the source code for std, which is why the rust-src component is required as a prerequisite.
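As a rough illustration of that lookup, the sketch below asks rustc for its sysroot and then builds the path to a library crate's source underneath it. The directory layout is taken from the paths in the generated Cargo.toml above. The function names are hypothetical, and the real SDK may well do this differently (it also needs to pick the right toolchain, which this sketch ignores).

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Ask whichever rustc is first on PATH where its sysroot lives.
/// Hypothetical helper; toolchain selection is left out of this sketch.
fn query_sysroot() -> io::Result<PathBuf> {
    let output = Command::new("rustc").arg("--print").arg("sysroot").output()?;
    let text = String::from_utf8_lossy(&output.stdout);
    Ok(PathBuf::from(text.trim()))
}

/// Path to the source of a library crate (such as "std") inside a toolchain
/// that has the rust-src component installed.
fn library_source_dir(sysroot: &Path, crate_name: &str) -> PathBuf {
    sysroot
        .join("lib")
        .join("rustlib")
        .join("src")
        .join("rust")
        .join("library")
        .join(crate_name)
}
```

Calling library_source_dir(&query_sysroot()?, "std") would give the path used for [dependencies.std]. If rust-src is not installed, that directory will not exist.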

The command line to build std looks like this (with some line wrapping because it's so wide):

cargo +nightly-2020-10-13 build --release 
    --manifest-path sr/Cargo.toml 
    --target ./aarch64-sourcegear-windows.json 
    --verbose -p std

After the build is done, we copy the results from obj/sr/target into obj/sysroot with the layout that rustc expects. Now we can use that sysroot to build the project itself. That build directory is in obj/rs. We generated that Cargo.toml file as well:

[package]
name = "ruexe"
version = "1.0.0"
edition = "2018"
autobins = false
autoexamples = false
autotests = false
autobenches = false

[lib]
crate-type = ["cdylib"]
path = "../../src/main.rs"

[dependencies.ruplacer]
path = "C:/Users/eric/dev/ruplacer"

[dependencies.difference]
version = "2.0"

[dependencies.ignore]
version = "0.4"

[dependencies.structopt]
version = "0.2"

[dependencies.colored]
version = "1.6"

[dependencies.regex]
version = "1"

[dependencies.isatty]
version = "0.1"

[dependencies.Inflector]
version = "0.11"

[dependencies.anyhow]
version = "1.0.32"

This is the content of ruexe.rsproj rewritten into Cargo.toml form. Yes, that's right, this walkthrough started with a Cargo file, and we translated it to an MSBuild rsproj, and then the SDK converted it back to Cargo.
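The conversion back to Cargo form is mostly string formatting. Here is a small sketch of how one reference could be rendered as a [dependencies.NAME] section like the ones above. The Dep type and the dependency_section function are names I made up for illustration, and this is not the SDK's actual code.

```rust
/// The two kinds of reference that appear in ruexe.rsproj.
/// Hypothetical type for illustration only.
enum Dep<'a> {
    /// A crates.io dependency (RustCrateReference) with a version requirement.
    Version(&'a str),
    /// A local crate (RustReference) identified by its directory.
    Path(&'a str),
}

/// Render one dependency as a `[dependencies.NAME]` section.
/// Assumes the name, version, and path contain no characters that need
/// TOML escaping, which holds for paths written with forward slashes.
fn dependency_section(name: &str, dep: &Dep<'_>) -> String {
    match dep {
        Dep::Version(v) => format!("[dependencies.{}]\nversion = \"{}\"\n", name, v),
        Dep::Path(p) => format!("[dependencies.{}]\npath = \"{}\"\n", name, p),
    }
}
```

A Windows path with backslashes would need escaping inside a double-quoted TOML string, so this sketch assumes forward slashes, as the generated file above uses.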

The command line to build this is (again with some line wrapping):

cargo +nightly-2020-10-13 -vv build --verbose --release 
    --manifest-path C:\Users\eric\dev\ruexe\obj\rs\Cargo.toml 
    --target ./aarch64-sourcegear-windows.json

But where is the sysroot specified? Well, that part is a bit dorky: it requires passing the --sysroot argument through an environment variable called RUSTFLAGS.
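For anyone who wants to reproduce the invocation by hand, here is a sketch of setting that up with std::process::Command. The function names and the example paths are assumptions for illustration. One known limitation is that RUSTFLAGS is split on spaces, so this simple form will not work if the sysroot path contains a space.

```rust
use std::process::Command;

/// Build the value for the RUSTFLAGS environment variable.
/// Assumes the sysroot path contains no spaces.
fn sysroot_rustflags(sysroot: &str) -> String {
    format!("--sysroot {}", sysroot)
}

/// Prepare (but do not run) a cargo build that uses a custom target file
/// and a custom sysroot. Hypothetical helper, not the SDK's own code.
fn cargo_build_command(
    toolchain: &str,
    manifest: &str,
    target_json: &str,
    sysroot: &str,
) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg(format!("+{}", toolchain))
        .arg("build")
        .arg("--release")
        .arg("--manifest-path")
        .arg(manifest)
        .arg("--target")
        .arg(target_json)
        // cargo has no --sysroot option of its own, so the flag is handed
        // to rustc through the environment.
        .env("RUSTFLAGS", sysroot_rustflags(sysroot));
    cmd
}
```

Calling .status() on the returned Command would run the build. Returning the Command unexecuted makes it easy to inspect or log first.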

BTW, much of what the Llama Rust SDK does is similar to xargo, and I learned a lot about how to do such things by studying the xargo code:

https://github.com/japaric/xargo

The result of all this is the bitcode file: obj/rs/target/aarch64-sourcegear-windows/release/ruexe.bc
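That path follows from the pieces above: Cargo names the target directory after the file stem of the custom target JSON, and the target sets an empty dll-prefix and a ".bc" dll-suffix. A small sketch of the calculation, with a hypothetical function name, and assuming a crate name without hyphens (Cargo converts those to underscores in library file names):

```rust
use std::path::{Path, PathBuf};

/// Where the release-mode bitcode file should end up, given the obj
/// directory, the custom target file, and the crate name.
/// Returns None if the target path has no file name.
/// Hypothetical helper for illustration only.
fn bitcode_path(obj_dir: &Path, target_json: &Path, crate_name: &str) -> Option<PathBuf> {
    // "aarch64-sourcegear-windows.json" -> "aarch64-sourcegear-windows"
    let target_name = target_json.file_stem()?;
    Some(
        obj_dir
            .join("rs")
            .join("target")
            .join(target_name)
            .join("release")
            .join(format!("{}.bc", crate_name)),
    )
}
```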

So we run that through the Llama compiler and put the resulting assembly in bin/Debug/net5.0.

$ ls -l bin/Debug/net5.0/*.dll
-rwxr-xr-x 1 eric 197609 13735936 Jan 31 08:22 bin/Debug/net5.0/ruexe.dll*
-rwxr-xr-x 1 eric 197609    33280 Jan 30 23:27 bin/Debug/net5.0/sgwin32.dll*

Note that the SDK has automatically added a reference to sgwin32.dll. That's the "implementation of Win32 on .NET" library that I mentioned earlier.

(Tangent: I should probably do a separate blog entry on sgwin32. The library is far from complete, but has gradually become quite capable. It implements enough of Win32 to support a Llama-compiled SQLite that can pass the Entity Framework Core test suite on both Windows and Linux. OTOH, it's not a panacea. I also have sgposix, and in some cases that alternative works better.)

Bottom line, there is a lot going on "under the hood", and there are a lot of improvements yet to be made, but I am generally happy with how the Llama Rust SDK can provide an experience that is so similar to csproj.