2020-03-10 12:00:00

My exploration of Rust and .NET

If this were a twelve-step meeting

ME: "Hi, my name is Eric, and I compulsively write code that hardly anybody else wants."

YOU (all together): "Hi Eric"

TLDR

I have been working on stuff that facilitates .NET development using Rust. My progress has reached a point where it looks like this could maybe become a real thing, but I'm not sure what to do with it. So I have decided to let you listen in while I ramble a bit.

Sometimes I write blog posts in a format I might describe as "pretending that the reader is asking me questions". This is one of those.

What is it?

The project I am discussing here, which does not yet have a name, consists of two main parts:

A "compiler" that takes LLVM bitcode from rustc and converts it into a .NET assembly
A tool to generate Rust bindings for other .NET assemblies so that Rust code can call existing .NET libraries

So the result is that Rust code can call .NET code, and vice versa.

Here's a Rust function which takes a string literal of digits, converts it to a .NET string, and then calls System.Int32.TryParse() on it:

fn int_tryparse_out_parm() -> bool {
    let s = "30579";
    let s_clr = System::Text::Encoding::UTF8().GetString_1(s.as_bytes());
    let mut result = 0;
    let b = System::Int32::TryParse_2(&s_clr, &mut result);
    return b && (result == 30579);
}

Is this open source? Can I try it?

Sorry, not at this time. This project is currently taking place in a private repo until I decide if or how to take it forward.

What are all these terms?

Yeah, since this blog entry might be read by people who know .NET or Rust but not both, let's define some things.

In .NET, an "assembly" is code in its compiled form. A shared library. The file suffix is .DLL, regardless of whether the OS platform is Windows or Linux or Mac or whatever.
In Rust, the build tool is called "cargo". It manages dependencies on both local projects and external packages.
In Rust, a package is a "crate". The main package registry is https://crates.io
In .NET, a package is a "NuGet package". The main package registry is https://nuget.org
The .NET system is built around a core runtime called the Common Language Runtime (CLR). The low level language is called Common Intermediate Language (CIL), or MSIL, or just IL.
Rust is compiled to native code through LLVM, a huge and popular set of compiler tools and libraries. (LLVM was originally created right here in the town where I live, at the University of Illinois at Urbana-Champaign.)
In Rust, referencing a part of a string or array is called a "slice". In .NET, this is called a "Span". Each side has memory safety rules about how such things can be used.

If .NET developers want interop with Rust, can't they just use P/Invoke?

Yes, well, that is the approach a sane person would use, isn't it?

What I'm exploring here is an alternative which could offer deeper integration between the Rust and .NET worlds.

How deep could that Rust/.NET integration go?

A lot deeper than P/Invoke, but probably not as deep as F#.

For those of you who use F#, you know how it constantly feels like you're straddling two worlds? The F# and C# worlds are both visible, and rather different.

Seq is an alias for IEnumerable.
For async, you can use the C# flavor or the F# flavor.
There are the F# collections, but you still have access to the stuff in System.Collections.Generic.
F# types like records and discriminated unions are not exposed to C#.
Naming conventions for identifiers are different.

Rust in this project feels kinda like that, but the dissonance between Rust and .NET is even greater. For example, Rust has no garbage collector. Rust types like struct and enum are not exposed for use by other .NET stuff.

So even though P/Invoke is not involved, there is still a boundary between the Rust world and the world that C# sees.

How does the boundary between Rust and CLR work?

From the Rust perspective, the boundary is FFI ("extern"). Any Rust function which needs to be called by non-Rust CLR code should be "extern". The generated bindings for Rust to call other .NET stuff are in the form of extern functions. Just as with normal Rust, the only things that can cross that FFI boundary are simple types.

From the CLR perspective, interop with Rust/CLR code requires some of the same things as interop with native code through P/Invoke. Even when compiled for the CLR, Rust memory is "unmanaged" memory. The need to Marshal still applies. The binding generator handles a lot of this.
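To make the Rust side of that boundary concrete, here is a minimal sketch of an "extern" function in ordinary Rust. The function name sum_bytes is made up for illustration, and the post does not say which ABI string or attributes the project actually uses, so the standard extern "C" form is an assumption. The point is only that the signature is limited to simple types: a raw pointer, a length, and an integer.

```rust
/// A function that could sit on the FFI boundary. Only simple types
/// (a raw pointer, a length, an integer) appear in the signature.
/// Exporting it by name from a native library would also need a
/// no_mangle attribute, which is left out of this sketch.
pub extern "C" fn sum_bytes(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // SAFETY: the caller promises that `ptr` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```

A richer type such as &str or Vec<u8> would have to be broken down into pieces like these before it could cross.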

What currently works? What doesn't?

Nothing in this project is exhaustively tested. When I say something "works", I am speaking of a quality level such as "proof of concept". I have a small but growing test suite.

Converting LLVM bitcode to CIL generally works. I still find a bug every so often, and I expect to find more.
The binding generator can handle most of the surface area of the .NET class libraries.
Reference objects are wrapped in a struct and the methods for that class are placed in an impl block for that struct.
From the Rust point of view, things depending on the libcore and liballoc layers are working.
But Rust libstd is not implemented. The .NET bindings will become the basis for a .NET implementation of libstd. In other words, Rust file I/O (for example) would be implemented by calling .NET file I/O instead of libc.
Using Rust crates seems to work if they support no_std. For example, serde works.
Operator overloading in a C# type gets implemented on the Rust side using the corresponding traits.
The binding generator does not currently have a way to present a generic type as a Rust generic type.
The binding generator currently skips generic methods that have their own type parameters. I haven't figured out what to do about these yet.
.NET Task objects are wrapped on the Rust side like any other. I am hoping it will be possible to integrate Task with Rust futures and await, but I haven't looked closely at that yet.
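Two of the items above, wrapping a .NET type in a struct with an impl block and mapping C# operators onto Rust traits, can be sketched in plain Rust. This is a mock under stated assumptions, not generated output: the type is loosely modeled on System.TimeSpan, the ticks field stands in for whatever the real wrapper holds, and op_addition is a hypothetical name for the binding to the C# operator method.

```rust
use std::ops::Add;

/// Hypothetical wrapper for a .NET type that overloads `+`,
/// loosely modeled on System.TimeSpan.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeSpan {
    // Stand-in for the real CLR value or handle.
    ticks: i64,
}

impl TimeSpan {
    pub fn from_ticks(ticks: i64) -> Self {
        TimeSpan { ticks }
    }

    pub fn ticks(&self) -> i64 {
        self.ticks
    }

    /// Stand-in for a binding to the C# operator method.
    /// Here it is simulated in Rust instead of calling into the CLR.
    pub fn op_addition(a: TimeSpan, b: TimeSpan) -> TimeSpan {
        TimeSpan { ticks: a.ticks + b.ticks }
    }
}

/// The C# `+` operator surfaces on the Rust side as the `Add` trait.
impl Add for TimeSpan {
    type Output = TimeSpan;

    fn add(self, rhs: TimeSpan) -> TimeSpan {
        TimeSpan::op_addition(self, rhs)
    }
}
```

With this in place, TimeSpan::from_ticks(3) + TimeSpan::from_ticks(4) reads the same way it would in C#.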

Just how cool is this project?

I was hoping you would ask that, even though it's like asking a Mom how cute her baby is...

Here's something pretty cool: The binding generator has stuff to convert a Rust closure into a C# delegate.

fn mul_delegate() -> bool {
    let d = System::Func_i32_i32_i32_::create(|a,b| a * b);
    let x = d.Invoke_2(3,4);
    return x == 3 * 4;
}

Now THAT is one seriously cute baby.

(Yeah, I know, you're distracted by the ugliness of System.Func<int,int,int> becoming System::Func_i32_i32_i32_. Just nod and be polite, okay?)
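For readers who want to picture the shape of such a wrapper, here is a pure-Rust mock that accepts the same calls as the snippet above. It is only a sketch: the real binding presumably hands the closure to the CLR as a delegate, while this stand-in just stores it in a Box. The callback field and everything inside the impl are assumptions.

```rust
/// Hypothetical stand-in for the generated wrapper around
/// System.Func<int,int,int>. It keeps the closure in a Box
/// instead of creating a real CLR delegate.
#[allow(non_camel_case_types)]
pub struct Func_i32_i32_i32_ {
    callback: Box<dyn Fn(i32, i32) -> i32>,
}

impl Func_i32_i32_i32_ {
    /// Wraps any Rust closure with a matching signature.
    pub fn create<F>(f: F) -> Self
    where
        F: Fn(i32, i32) -> i32 + 'static,
    {
        Func_i32_i32_i32_ { callback: Box::new(f) }
    }

    /// Mirrors the delegate's Invoke method, with the _2 arity suffix.
    #[allow(non_snake_case)]
    pub fn Invoke_2(&self, a: i32, b: i32) -> i32 {
        (self.callback)(a, b)
    }
}
```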

Is this project related to that Wasm stuff you did?

Kinda. Not really.

My wasm2cil work is on GitHub here:

https://github.com/ericsink/wasm2cil

There have been some copy-and-paste episodes between wasm2cil and this project, but the two are generally quite different, because WebAssembly and LLVM bitcode are quite different.

What does the implementation of this project look like?

The compiler and the binding generator are both written in F#, but the resulting assemblies have no F# dependencies.

Assembly generation is currently done using Mono.Cecil. I keep looking at System.Reflection.Metadata and thinking about it.

If this project has a future, it should eventually be re-written in Rust. Any decent compiler should be able to self-compile.

How does the CIL generation work?

First, generate LLVM bitcode using "cargo xbuild" with a custom target and a special linker.

Then, use LLVMSharp to read that .bc file, and Mono.Cecil to write it out as a .NET assembly.

The conversion of bitcode to CIL is where most of the magic happens.

Wait, I thought LLVM bitcode was non-portable?

It is. The way I am using bitcode is "a pathway to many abilities, some considered to be unnatural".

I do really wish that LLVM bitcode was available in a well-specified portable subset of some kind. A little googling has revealed that others have wanted this too, but the notion has apparently never gained any real traction.

Wasm would be a great alternative if it supported (a) 64 bits, and (b) optional use of non-sandboxed memory. For (a), I assume that Wasm64 will eventually happen, but I see no reason why they would do (b). Wasm's reason for existence is safety, not to provide the compiler world a portable assembly language.

So I find myself using LLVM bitcode and trying to deal with the places where arch-specific stuff creeps in.

Strangely, even though everybody says that LLVM bitcode is not portable, it's actually kinda close. For example, my CIL code generator doesn't actually care whether the Rust custom target arch is x64 or ARM64 -- it works fine either way.

I'm confused -- how can you compile a low-level language for the CLR?

Recall that CLR stands for COMMON Language Runtime. The CLR actually has good support for lower-level languages, even though none of the mainstream CLR languages actually make much use of those features.

Would this bc-to-CIL approach work for C/C++ too?

In theory, yes.

In more practical terms, confining my efforts to Rust is making things much simpler. C/C++ has several decades more baggage.

Isn't the Rust memory model really different from C#?

Yep. But here again, the CLR offers garbage collection, but it doesn't force you to use it. Nothing prevents us from using manual memory management.

Rust code compiled for the CLR still has the same memory model as any other Rust code.

How does Rust liballoc work?

Currently the allocator I give to Rust is implemented by simply calling Marshal.AllocHGlobal and Marshal.FreeHGlobal. I threw this in as a temporary solution, figuring I would need to replace it with a real malloc of some kind. Strangely, it isn't as slow as I expected.
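In Rust terms, "giving Rust an allocator" means implementing the GlobalAlloc trait. The sketch below shows that shape. The helpers alloc_hglobal and free_hglobal are hypothetical stand-ins for bindings to Marshal.AllocHGlobal and Marshal.FreeHGlobal; here they forward to the system allocator so that the example runs on its own.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Allocator that forwards every request to a pair of external calls.
pub struct HGlobalAllocator;

/// Stand-in for a binding to Marshal.AllocHGlobal (hypothetical helper).
/// The .NET call takes only a size, so a real version would need to
/// handle alignment itself.
unsafe fn alloc_hglobal(layout: Layout) -> *mut u8 {
    unsafe { System.alloc(layout) }
}

/// Stand-in for a binding to Marshal.FreeHGlobal (hypothetical helper).
unsafe fn free_hglobal(ptr: *mut u8, layout: Layout) {
    unsafe { System.dealloc(ptr, layout) }
}

unsafe impl GlobalAlloc for HGlobalAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { alloc_hglobal(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { free_hglobal(ptr, layout) }
    }
}
```

In a real program the allocator would be installed by declaring a static of this type with the global_allocator attribute, after which Box, Vec, and String all go through it.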

Overall, how is the performance of this project?

I've done no real benchmarks, but roughly speaking, it appears the performance of "Rust CLR" is in the same ballpark as "Rust native".

I've seen some cases where "Rust CLR" actually seems to be faster, but I haven't investigated thoroughly. It makes sense that the JIT would win at least some of the time.

How do you deal with Rust's lack of function overloading?

That issue has been quite a problem. Naming things is hard.

You may have already noticed in the Rust code snippets above that names are a bit weird. For example, how do I generate a binding for System.Console.WriteLine(), which has multiple overloads?

fn write_string(s : &String) {
    let s_clr = System::Text::Encoding::UTF8().GetString_1(s.as_bytes());
    System::Console::WriteLine_1(&s_clr);
}

Under the hood, the binding generator gives every method an ugly-but-unique name like "void__WriteLine__1__String". At a higher layer, Rust traits are used to provide friendlier names to the developer.
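As an illustration of that kind of flat naming, here is a small sketch. The layout is a guess inferred from the single example above (return type, method name, parameter count, then parameter types, joined by double underscores), and the function unique_method_name is hypothetical. The real generator may well differ.

```rust
/// Builds an ugly-but-unique flat name for one overload.
/// Assumed layout: return type, method name, parameter count,
/// then each parameter type, all joined by "__".
pub fn unique_method_name(return_type: &str, method: &str, param_types: &[&str]) -> String {
    let mut parts = vec![
        return_type.to_string(),
        method.to_string(),
        param_types.len().to_string(),
    ];
    parts.extend(param_types.iter().map(|p| p.to_string()));
    parts.join("__")
}
```

Because the parameter types are part of the name, two overloads with the same arity still get distinct names.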

But how to deal with those names? There are several approaches to this kind of problem, and all of them have tradeoffs. My current path is to implement all of them, and then experiment with actual usage to see how they work out.

For example, Rust can simulate function overloading using traits if the number of parameters does not vary. So one of the techniques I'm taking for a test drive is to add a suffix _N for the number of params. This is the approach shown in the snippet above, where WriteLine has taken on the name WriteLine_1, because it has one parameter.
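Here is a minimal sketch of that trait trick in plain Rust. The Console struct and the WriteLine1 trait are stand-ins invented for this example, and the methods return the text instead of printing it so the result is easy to check. The idea is one trait per method name and arity, generic over the argument type, with one impl per overload.

```rust
/// Hypothetical stand-in for the generated Console wrapper.
pub struct Console;

/// One trait per (method name, arity). The type parameter is the
/// argument type, so each overload becomes a separate impl.
#[allow(non_snake_case)]
pub trait WriteLine1<T> {
    fn WriteLine_1(arg: T) -> String;
}

impl<'a> WriteLine1<&'a str> for Console {
    fn WriteLine_1(arg: &'a str) -> String {
        arg.to_string()
    }
}

impl WriteLine1<i32> for Console {
    fn WriteLine_1(arg: i32) -> String {
        arg.to_string()
    }
}

impl WriteLine1<bool> for Console {
    fn WriteLine_1(arg: bool) -> String {
        // .NET formats booleans as "True" and "False".
        if arg { "True".to_string() } else { "False".to_string() }
    }
}
```

The compiler picks the impl from the argument type, so Console::WriteLine_1("hello") and Console::WriteLine_1(42i32) both resolve without any extra annotation. This only works because every impl takes exactly one argument, which is why the arity ends up in the name.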

Another approach is to use tuples, but the extra set of parentheses is ugly.
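A sketch of the tuple idea, again with invented names (WriteLineArgs, write_line) and with the text returned instead of printed: every overload becomes an impl of one trait for a different tuple type, so a single function name can cover different arities.

```rust
/// Implemented once per overload, with the parameter list as a tuple.
pub trait WriteLineArgs {
    fn render(self) -> String;
}

/// WriteLine()
impl WriteLineArgs for () {
    fn render(self) -> String {
        String::new()
    }
}

/// WriteLine(string)
impl<'a> WriteLineArgs for (&'a str,) {
    fn render(self) -> String {
        self.0.to_string()
    }
}

/// WriteLine(string format, int arg0), simulated with a plain replace
/// of the "{0}" placeholder.
impl<'a> WriteLineArgs for (&'a str, i32) {
    fn render(self) -> String {
        self.0.replace("{0}", &self.1.to_string())
    }
}

/// A single name for all overloads, at the cost of extra parentheses.
pub fn write_line<A: WriteLineArgs>(args: A) -> String {
    args.render()
}
```

The call sites show the cost: write_line(("hi",)) needs a trailing comma for the one-element tuple, and write_line(("n = {0}", 7i32)) carries a doubled set of parentheses.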

Of course, for all the methods that have only one overload, I could just keep the name. But if an overload gets added later, it will break.

There are lots of things here that still need to be figured out.

Wait, isn't .NET just about C#?

It is true that here in 2020, C# is by far the dominant language for .NET.

But the architecture was originally designed to support multiple languages. After all, the core component is called the COMMON Language Runtime.

What other .NET languages are there?

In the beginning, there was C# and VB.NET.

In hindsight, I suspect the multi-language architecture of .NET was primarily driven by Microsoft's desire to avoid leaving their Visual Basic developers behind. VB.NET is still around, and it still has many developers using it. But I assume (without evidence) that the VB segment is not growing, and probably shrinking.

The other .NET language often mentioned is F#, which might be described as a .NET variant of OCaml. FWIW, I describe myself as an F# fan, although I also like and use C# a great deal.

When counting developers, The .NET Language Strategy (Mads Torgersen, 2017) speaks of "millions" of people using C#, "hundreds of thousands" using VB.NET, and "tens of thousands" using F#:

https://devblogs.microsoft.com/dotnet/the-net-language-strategy/

I don't know which CLR language is in fourth place, but it's probably a DISTANT fourth place.

It is worth noting that Microsoft has one more .NET language that is rarely discussed, and wasn't even mentioned in that Torgersen blog entry: C++/CLI. I'll stay with tradition and not talk about it.

Bottom line, if you assume that .NET and C# are synonymous, you will annoy F# and VB.NET fans, but sadly, you will be correct enough for most contexts.

Have there been any others?

Lots of them. Wikipedia has a whole page:

https://en.wikipedia.org/wiki/List_of_CLI_languages

My comments on selected cases:

I have never used Nemerle, but I always thought it looked interesting. Apparently the developers have been hired by JetBrains.
RemObjects has Oxygene. This is a Pascal-ish language for .NET, somewhat like the spiritual descendant of Delphi. It seems to be actively developed, but I cannot recall ever hearing this company or their products mentioned out in the wild.
Years ago, the LLVM project had a backend which could generate CIL. It apparently died of neglect and was removed.
PeachPie is an interesting case. It's a PHP compiler for .NET.

So there's some interesting stuff here, but the bottom line is: Every CLR language not named C# has lived a constant struggle for viability.

So then why in the world are you doing this?

Recall that the first thing I did in this blog entry was admit to a compulsion.

If I attempt to explain my actions beyond that, a couple of things come to mind.

First of all, when I look at the kind of leadership coming from Microsoft right now, I see an emphasis on building bridges between .NET and other ecosystems. Because of .NET Core, this is a new era for .NET, and it seems reasonable to wonder how things in the future might go very differently than they have in the past.

I also wonder about the possibility that Rust might be a special case.

How might Rust be "a special case"?

Rust is simply a masterpiece. It is an amazing achievement of programming language design and implementation. There is nothing else like it.

I believe that the popularity of Rust will continue to grow rapidly. And let me immediately say that I might be wrong. The best technology doesn't always win.

But Rust is showing very strong signs of momentum. For example, as of January 2020, Rust has been Stack Overflow's "most loved language" four years in a row:

https://stackoverflow.blog/2020/01/20/what-is-rust-and-why-is-it-so-popular/

So my claim that Rust could become hugely popular may not be correct, but it's not the craziest thing I've ever said.

What if Rust explodes in popularity over the next 5-10 years? It seems likely to me that if Rust becomes really big, there could be a significant number of .NET people interested in using it.

What is the goal of this blog entry?

One great thing about the Internet is its ability to connect people with niche interests. Recently I was in my kitchen making coffee and found myself wondering if Captain America could beat a lion barehanded. Sure enough, people on the Internet have been discussing that important topic for many years.

With over 7 billion people on the planet, I'm probably not the only one interested in the possibilities of developing for .NET using Rust.

I mean, there's gotta be at least two more of you. If that darn Covid-19 scare weren't cancelling all our conferences, the three of us could get together for coffee and chat.

As we are all currently trapped in our homes hoarding hand sanitizer, we'll need to figure something else out. But it would still be nice to know who you are.

How will you take this project forward?

I honestly don't know yet. There are lots of variables in the equation. I could make this open source. I could create a niche proprietary commercial product. And there are many shades of gray in between.

Regardless of how it goes forward, if this is to become more than my fun side project, there are questions about sustainability.