2019-07-02 15:00:00

Using Span for high performance interop with unmanaged libraries

There's so much great performance-related stuff happening in the .NET world right now.

A major character in this story is Span, which gives us a way to reduce heap allocations and copying. Before we had Span, a common operation like Substring() resulted in an allocation and a copy.

In this blog post I want to talk about using Span in the specific case of interop with unmanaged libraries that deal in zero-terminated strings (aka null-terminated strings).

Layer 0: The unmanaged function

Let's talk about this in layers. The bottom layer is the unmanaged function itself, written in C or something that pretends to be C. The kind of functions I'm talking about are ones that have a return value or a parameter which is a zero-terminated string. Such things are quite common. Here are a few familiar examples:

int puts(const char* p);

FILE *fopen(const char *path, const char *mode);

int sqlite3_open(const char* path, sqlite3** ppdb);

char *asctime(const struct tm *tm);

With a zero-terminated string, all we have is a pointer. To find the length, something is going to start at that pointer and walk through memory until it finds a zero byte.

(I am aware of the various (security | correctness | etc) problems associated with zero-terminated strings. Nonetheless, there are many libraries that use them. For the purpose of this blog post, we assume the unmanaged function is something we can't change. It is what it is, and we have to deal with it.)

And before we go any further, let's mention the challenges arising from the differing encodings of the string. In .NET, strings are encoded in UTF-16, where each character is 2 bytes, or 4 bytes for characters outside the Basic Multilingual Plane. Any C library that expects a string terminated with a zero byte is not using UTF-16, because UTF-16 text is full of zero bytes (every ASCII character carries one). If it's an old-school library, it might be expecting ASCII. The more modern alternative is UTF-8.

In any case, to use a .NET string with a zero-terminated string, we will have to do a conversion. That conversion is another form of a copy, and for performance reasons, we want to keep those to a minimum.
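To make the size difference concrete, here is a minimal sketch comparing the number of UTF-16 code units in a .NET string with the number of bytes the same text needs as UTF-8, before any zero terminator is added. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Illustrative helper; EncodingSizes is a hypothetical name.
static class EncodingSizes
{
    // Number of UTF-16 code units in the .NET string.
    public static int Utf16Units(string s) => s.Length;

    // Number of bytes needed for the same text as UTF-8 (no terminator).
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);
}
```

For "hello" both numbers are 5. For "h\u00e9llo" (an e with an acute accent) the Length is still 5, but the UTF-8 form needs 6 bytes, so the conversion is never just a cast.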

Layer 1: The P/Invoke definition

The next layer up is the P/Invoke definition. This is what allows us to access the C function from .NET.

A typical P/Invoke looks something like this:

// int abs(int x);
[DllImport("msvcrt")]
public static extern int abs(int x);

This definition says, "in a dynamic library named 'msvcrt', look for a function named abs, with a return type of int and one parameter of type int."

Of course, a pointer to a zero-terminated string is a bit more complicated than an int. How should we describe that pointer in our P/Invoke definition? We actually have some choices here, so let's discuss the tradeoffs.

Alternative 1a: string

One thing we could do is define the P/Invoke to accept a regular .NET string:

[DllImport("msvcrt")]
public static extern int puts(string s);

This approach offers convenience. And it accomplishes what we probably need to do at some layer anyway, which is start with a .NET string, convert it to UTF-8, get a pointer for it, and pass it to the C code.

Well actually: The default marshaling for a string is zero-terminated, but with ANSI encoding. To get UTF-8, we would have to do this:

[DllImport("msvcrt")]
public static extern int puts([MarshalAs(UnmanagedType.LPUTF8Str)] string s);

Convenient or not, from the perspective of performance, I am not fond of this at all. The problem is that the UTF-16 to UTF-8 conversion is done every time we call this function.

What if this function is called from a loop with the same string every time?

foreach (var item in collection)
{
    puts("Item: ");
    puts(item.ToString());
    puts("\n");
}

In this (admittedly-contrived) example, we are writing a collection of items to stdout, one per line. And each line starts with a prefix and ends with a newline. And the two puts() calls with constant strings are sad, because the conversion of the same strings to UTF-8 is going to be done every time. That's a piece of work we would rather do outside the loop, so it can be done just once. And in order to do that, we need a P/Invoke definition that somehow accepts the already-converted UTF-8 data.

Alternative 1b: byte[]

So at the C# layer, a string converted to UTF-8 is just an array of bytes, right? Maybe the P/Invoke definition should look like this:

[DllImport("msvcrt")]
public static extern int puts(byte[] s);

Compared to the previous idea, this is a big improvement, because we have the flexibility to do the encoding conversion separately.

Which reminds me, we need a utility function for that conversion. Here's the one from SQLitePCLRaw:

using System.Text;

public static byte[] to_utf8_with_z(this string s)
{
    if (s == null)
    {
        return null;
    }

    int len = Encoding.UTF8.GetByteCount(s);
    var ba = new byte[len + 1];
    var wrote = Encoding.UTF8.GetBytes(s, 0, s.Length, ba, 0);
    ba[wrote] = 0;

    return ba;
}

BTW, my apologies for the snake case naming. For various reasons, SQLitePCLRaw mostly follows the naming conventions of SQLite, which I admit isn't very friendly to .NET folks.

Anyway, using byte[] as the parameter type in the P/Invoke definition means we could write that loop like this:

var ba_front = "Item: ".to_utf8_with_z();
var ba_back = "\n".to_utf8_with_z();
foreach (var item in collection)
{
    puts(ba_front);
    puts(item.ToString().to_utf8_with_z());
    puts(ba_back);
}

But we still have problems. What if we have UTF-8 data that is not in a byte array? What if our UTF-8 data is in a byte array, but it's not alone in there? We can't do a slice without an allocation and a copy.

Putting byte[] in the P/Invoke definition means that it carries an assumption about using the heap. We don't want that assumption, because it may not always be true, and when it's not, we would have to do an allocation + copy to make it true.

Alternative 1c: IntPtr or byte*

Another option is to make our P/Invoke look like this:

[DllImport("msvcrt")]
public static extern int puts(IntPtr s);

or like this:

[DllImport("msvcrt")]
unsafe public static extern int puts(byte* s);

These two are essentially equivalent. I personally prefer to use byte*, because I think it's more clear. Using byte* does require the unsafe keyword, but I see no reason to avoid that when calling unmanaged code, which is inherently "unsafe" anyway.

But either way, this version of the P/Invoke definition is taking a pointer, just like the underlying C function. This gives us more flexibility than byte[], because the P/Invoke no longer assumes the use of managed memory.

However, the cost of that flexibility, in the case where we are using managed memory, is that the caller now has more work to do. If we're going to pass a pointer to managed memory down to unmanaged code, we need to "pin" it, to prevent the garbage collector from moving it around. When the P/Invoke definition had the parameter type as byte[], the marshaling code was doing the pin/unpin for us. But with a pointer, we have to do it ourselves.

So now our example loop (shown with IntPtr) has gotten worse:

var ba_front = "Item: ".to_utf8_with_z();
GCHandle pinned_front = GCHandle.Alloc(ba_front, GCHandleType.Pinned);
IntPtr ptr_front = pinned_front.AddrOfPinnedObject();

var ba_back = "\n".to_utf8_with_z();
GCHandle pinned_back = GCHandle.Alloc(ba_back, GCHandleType.Pinned);
IntPtr ptr_back = pinned_back.AddrOfPinnedObject();

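foreach (var item in collection)
{
    puts(ptr_front);
    // The per-item string needs its own convert + pin + free.
    var ba_item = item.ToString().to_utf8_with_z();
    GCHandle pinned_item = GCHandle.Alloc(ba_item, GCHandleType.Pinned);
    puts(pinned_item.AddrOfPinnedObject());
    pinned_item.Free();
    puts(ptr_back);
}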

pinned_front.Free();
pinned_back.Free();

Nonetheless, using a pointer is my preferred form of this P/Invoke definition. Just a simple equivalent to the C function, nothing more and nothing less. I don't want marshaling to do anything for me. I want to shift all the problems of pinning and allocation and encoding up to a higher layer.

So I finish this section and move forward with the following implementation of Layer 1:

static class Layer1
{
    // int puts(const char* p);
    [DllImport("msvcrt")]
    unsafe public static extern int puts(byte* s);
}

Let's talk about that next layer up.

Layer 2: The one just above the P/Invoke

What is the goal for this layer? Well that kinda depends on what I chose for the type of the parameter in the P/Invoke definition back in Layer 1.

If I had left the P/Invoke definition doing the string marshaling, I probably wouldn't need another layer at all.

If I had chosen the P/Invoke definition with parameter type of byte[], maybe I would now want a layer that takes a string and does the conversion.

But since I chose byte*, I forced myself into a situation where I need another layer just to get unsafe out of my public API. The unsafe doesn't belong in a public C# API. I may be crazy, but I'm not that crazy.

So is Layer 2 where I go ahead and express that string parameter in terms of an actual .NET string? Or byte[]?

Yeah maybe. But I'd like to put that off as long as possible. Instead, I'm asking myself, "What is the thinnest useful layer of abstraction I can put here?"

What about a Span?

static class Layer2
{
    public static int puts(ReadOnlySpan<byte> s)
    {
        unsafe
        {
            fixed (byte* p = s)
            {
                return Layer1.puts(p);
            }
        }
    }
}

The nice thing about a Span is that it brings the advantages we saw with byte[] except without those pesky assumptions about whether the memory is managed or not. Span can represent unmanaged memory, or part of a managed array, or memory on the stack.

That said, for the case where a Span represents managed memory, we still have to pin it. The fixed block shown above is what does that. Span supports the fixed pattern, aka GetPinnableReference().
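As a rough illustration of that flexibility, the sketch below feeds one span-accepting method from a whole managed array, from part of a larger array, and from stack memory. SpanSources and BytesBeforeZero are hypothetical names standing in for a Layer 2 method. Unmanaged memory would work the same way through the pointer-based Span constructor, which needs an unsafe block and is left out here.

```csharp
using System;

// Illustrative only: SpanSources and its members are hypothetical names.
static class SpanSources
{
    // Stand-in for any method that accepts UTF-8 bytes as a span.
    // Returns the number of bytes before the first zero (or Length if none).
    public static int BytesBeforeZero(ReadOnlySpan<byte> s)
    {
        int n = s.IndexOf((byte)0);
        return n < 0 ? s.Length : n;
    }

    // A whole managed array converts implicitly to a span.
    public static int FromWholeArray()
    {
        byte[] a = { (byte)'h', (byte)'i', 0 };
        return BytesBeforeZero(a);
    }

    // Part of a larger managed array: a view, with no allocation and no copy.
    public static int FromArraySlice()
    {
        byte[] big = { 1, 2, 3, (byte)'h', (byte)'i', 0, 9, 9 };
        return BytesBeforeZero(new ReadOnlySpan<byte>(big, 3, 3));
    }

    // Memory on the stack; stackalloc into a Span needs no unsafe block.
    public static int FromStack()
    {
        Span<byte> buf = stackalloc byte[3];
        buf[0] = (byte)'h';
        buf[1] = (byte)'i';
        buf[2] = 0;
        return BytesBeforeZero(buf);
    }
}
```

The callee is identical in all three cases, which is the property byte[] could not give us.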

Span is looking like the best option so far, but it still has some problems.

Problems with Span

For our case where the underlying C function needs a string with a zero terminator, Span is not a perfect solution, and all of its problems are related to the Length property.

A Span is an encapsulation of a pointer and a length.

But Layer 0 and Layer 1 only care about the pointer. Anything else (string, byte[], Span) is for the benefit of the layers above, and makes no difference to the underlying C function.

So the fact that the Length is part of Layer 2's public API but will be unused should be a clue that we are headed for trouble.

And here is the troubling question: In Layer2.puts() as shown above, for the ReadOnlySpan<byte> parameter, is the zero terminator included in the Length?

Suppose the string is "hello". Because this is a string with all ASCII characters, it's the same as UTF-8, so that's 5 bytes of actual string data plus one byte for the zero terminator. We could pass a Span of length 5 or 6.

Again, the layers below won't care. Either 5 or 6 will still work. Heck, we could use a completely wrong length like 1, and Layer 0 will still end up getting the same pointer.

And if the Length doesn't matter, why is it there?
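One way to see this without touching unmanaged code is to compare where two spans start. The following is only a sketch, with a hypothetical helper name, using MemoryMarshal.GetReference and Unsafe.AreSame to check whether two spans begin at the same byte regardless of their Length.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Illustrative only: SameStart is a hypothetical helper.
static class SameStart
{
    // True when both spans begin at the same address, whatever their Length.
    public static bool StartsAtSameByte(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
        => Unsafe.AreSame(
            ref MemoryMarshal.GetReference(a),
            ref MemoryMarshal.GetReference(b));
}
```

Given a byte array holding "hello" plus a zero, spans of length 1, 5, and 6 over the start of that array all compare as the same starting byte, so pinning any of them hands the C function the same pointer.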

Let's look more closely at the possibilities. What exactly does that ReadOnlySpan<byte> parameter mean? I see 3 alternatives.

Alternative 2a: zero terminator unknown

In this alternative, the Span.Length contains just the string data, and there may or may not be a zero terminator after it.

I mention this one because it is the way other .NET Span-based APIs work. When you see a ReadOnlySpan<byte> as a parameter of a method and that parameter refers to string data, it is customary to assume that there is no zero terminator inside the Span, and that there may not be one immediately after the Span. Simply put, in the world of .NET, zero-terminated strings are just not really a thing.

And yet, even though this interpretation is the common .NET practice, in our case, this might be the worst possible option.

The contract of our C function in Layer 0 requires the zero terminator. So the C# layer has to make sure there is one. And if it is not going to pass that expectation up to its caller, then it will need to allocate a new buffer with one extra byte, copy the data, and set the zero terminator at the end.

Which would defeat the whole point of using Span.

Alternative 2b: zero terminator unofficial

In this alternative, the Span.Length contains just the string data, but (wink wink) we all know there is actually a zero byte just beyond the end of the Span.

In my opinion, this is dreadful. When I see a Span, I assume the code using it will not go beyond the end.

The contract of our function requires the ending zero. What if I want to insert a check on the C# side to make sure it's there? Such as:

static class Layer2
{
    public static int puts(ReadOnlySpan<byte> s)
    {
        // Look at the byte immediately after the Span to
        // make sure the zero terminator is there.
        if (s[s.Length] != 0) // BZZZT! WRONG!
        {
            throw new Exception("Zero terminator required");
        }

        unsafe
        {
            fixed (byte* p = s)
            {
                return Layer1.puts(p);
            }
        }
    }
}

But I can't, because the Span indexer (rightfully) won't let me go beyond the end of the Span. That line throws an IndexOutOfRangeException at run time.

Worse, eventually somebody is going to pass a Span that doesn't have the unofficial zero byte after it, because that's what would be the common practice for Span<byte>, and the compiler has no way to prevent them from doing it. And then we will end up with memory corruption.

Alternative 2c: zero terminator included

In this alternative, the Span.Length includes the zero terminator.

In some ways, this seems less bad than the other two. We won't need an extra allocation + copy, since we know the zero terminator is already there. And now we're being honest about that zero byte's presence by including it in the Length.

And if we ignore enough context, doing it this way kinda makes sense. The zero byte is a documented and required part of the block of bytes the function expects. The length of the zero-terminated block of memory that represents "hello" is 6.

But including the zero byte in the length seems wrong because, well, it just is. Everywhere I go, whenever zero-terminated strings are used, the zero byte is never considered part of the length. The zero byte is not part of the string data -- it's just a memory marker. If we called strlen() on the pointer, it would return a length which did not include the zero. If we passed this pointer to System.Text.Encoding.UTF8.GetString(), the length we provide should not include the zero byte. The length of the string "hello" is 5.

Another way of illustrating this impedance mismatch is to think about slicing. Spans support Slice(), but slicing doesn't work on a zero-terminated string. Or rather, if you slice a zero-terminated string, you may end up with another string that is zero-terminated, or you may end up with one that is not, depending on whether your slice includes the end of the range or not. So slicing is not a particularly useful operation for a zero-terminated string, because it cannot reliably return the same type.
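A small sketch of that slicing problem, under the "terminator included" interpretation. The helper names are invented for this example.

```csharp
using System;

// Illustrative only: SliceTerminator is a hypothetical helper.
static class SliceTerminator
{
    // True when the last byte inside the span is a zero terminator.
    public static bool EndsWithZero(ReadOnlySpan<byte> s)
        => s.Length > 0 && s[s.Length - 1] == 0;

    // The bytes of "hello" followed by a zero terminator (6 bytes total).
    public static byte[] HelloZ()
        => new byte[] { 0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x00 };
}
```

Slicing the six bytes with Slice(2) keeps the tail, so the result is still zero-terminated. Slicing with Slice(0, 3) drops the tail, so it is not. Both results have the same type, and nothing in that type tells the next caller which kind they are holding.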

More trouble in the other direction

So far, we've mostly been talking about calling a C function with a zero-terminated string as a parameter. But what about zero-terminated strings being returned from a C function up to .NET?

Again, we just get a pointer. We need to look for a zero byte to find the length. One or two layers up, should we use a Span here too?

And if so, when we construct a ReadOnlySpan<byte> for that piece of memory, again, do we include the zero byte in the Length or not?

In this case, a big difference is that .NET stuff doesn't need or want that zero terminator. Initially, the zero byte is important, because that's how the unmanaged code is providing us a way to get the length. But once we know the length, the zero byte is not likely to be important to the .NET layer, which will typically not use it.

In fact, if we did include the zero terminator in these cases, then almost every piece of code using that Span would need to subtract one from the Length. And should it check to make sure the last byte actually is a zero before doing so?
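Here is a sketch of the two choices. The buffer parameter stands in for memory handed back by C code; in real interop we would have only a pointer, and the scan would have no upper bound. All names are hypothetical.

```csharp
using System;
using System.Text;

// Illustrative only: ReturnedString is a hypothetical helper.
static class ReturnedString
{
    // Span over just the string data, terminator excluded.
    public static ReadOnlySpan<byte> DataOnly(ReadOnlySpan<byte> buffer)
    {
        int n = buffer.IndexOf((byte)0);
        if (n < 0) throw new ArgumentException("no zero terminator found");
        return buffer.Slice(0, n);
    }

    // Span over the string data plus its terminator.
    public static ReadOnlySpan<byte> DataPlusTerminator(ReadOnlySpan<byte> buffer)
    {
        int n = buffer.IndexOf((byte)0);
        if (n < 0) throw new ArgumentException("no zero terminator found");
        return buffer.Slice(0, n + 1);
    }

    // Decodes every byte in the span, including a trailing zero if present.
    public static string Decode(ReadOnlySpan<byte> utf8)
        => Encoding.UTF8.GetString(utf8);
}
```

Decoding the DataOnly span for "hello" gives "hello". Decoding the DataPlusTerminator span gives a 6-character string ending in '\0', so the caller has to remember to slice one byte off first.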

So maybe for these strings flowing in the other direction we should just not do zero terminators? But what if we need to receive a value from unmanaged code and pass it back down to a C function that requires a zero terminator? So then are we back to needing an allocation + copy?

Maybe Span is the wrong answer for this case?

So right now the situation seems bleak. I appear to be convincing myself that Span should not be used with zero-terminated strings. I envision myself staying bitter and spiteful for the rest of my career as I watch all the other .NET kids get to use Span but those of us working with unmanaged libraries cannot. I want to use Span. I really do. But it seems like all the possible approaches are bad.

Probably the least bad option was Alternative 2c, which was "ReadOnlySpan<byte> where the Length includes the zero terminator". But that approach is just so inconsistent with common practices that it will cause confusion (and therefore, bugs). I find myself wishing I could somehow use this approach without anybody finding out about it.

As it turns out, that's the solution I have settled on: A Span that is hidden away where nobody can see it.

Like a `Span`, but not

In SQLitePCLRaw I have to deal with this issue a lot, as quite a few SQLite functions follow the pattern of accepting or returning a zero-terminated string.

During the development of my 2.0 release, as I started Span-ifying the ISQLite3Provider interface, I kept bumping into the kind of problems described here in this blog post.

But (with help and feedback from others) the solution I ended up with seems quite elegant. It's a new type I call utf8z.

public readonly ref struct utf8z
{
    // this Span will contain a zero terminator byte
    // if sp.Length is 0, it represents a null string
    // if sp.Length is 1, the only byte must be zero, and it is an empty string
    readonly ReadOnlySpan<byte> sp;

Simply, I have encapsulated a ReadOnlySpan<byte> inside another type.

At first I wasn't sure if this would be possible. A Span is a C# 7.2 ref struct, a value type which can only be used in certain ways. A ref struct can only be on the stack, which basically means it can only be a parameter or a local. It can't be on the heap, which means it also can't be a parameter or a local that might end up on the heap. For example, an async method cannot have a Span parameter, because an async method actually gets compiled into a state machine which is implemented using the heap.

But C# will happily let you put a ref struct inside another ref struct.

So in my case, I can use my shameful and confusing "Span with a zero terminator inside" and keep it hidden away from the world by making it private. Then I add stuff to the utf8z public API to expose only the things that are valid and needed.

The first thing I need is a constructor:

    utf8z(ReadOnlySpan<byte> a)
    {
        sp = a;
    }

I kept this simple. Just take a Span as a parameter and store it away in the member field. No error checking. And for that reason, this constructor is private. It is trusting its caller to make sure that the last byte of the Span is a zero terminator, so we don't want just any rando calling it.

The utf8z public API will expose only things that are not capable of violating the rules. For example, there is a static method to get a utf8z from a .NET string:

    public static utf8z FromString(string s)
    {
        if (s == null)
        {
            return new utf8z(ReadOnlySpan<byte>.Empty);
        }
        else
        {
            return new utf8z(s.to_utf8_with_z());
        }
    }

If we need a public method to get a utf8z from a Span, that method should include a check for the zero terminator:

    public static utf8z FromSpan(ReadOnlySpan<byte> a)
    {
        if (
            (a.Length > 0)
            && (a[a.Length - 1] != 0)
            )
        {
            throw new ArgumentException("zero terminator required");
        }
        return new utf8z(a);
    }

We also need a way to convert a utf8z into a pointer which can be passed to P/Invoke. In other words, we need utf8z to implement "the fixed pattern". Fortunately, we can just pass this responsibility on to the encapsulated Span:

    public ref readonly byte GetPinnableReference()
    {
        return ref sp.GetPinnableReference();
    }

Finally, we need a convenient way to convert a utf8z to a .NET string:

    public string utf8_to_string()
    {
        if (sp.Length == 0)
        {
            return null;
        }

        unsafe
        {
            fixed (byte* q = sp)
            {
                return Encoding.UTF8.GetString(q, sp.Length - 1);
            }
        }
    }

Note that I did not use an override of ToString() for this, as Microsoft guidelines say that ToString() should never return null. And I want utf8z to have a clearly-defined ability to represent a null. Span is a value type, so it is not nullable. The implementation of Span.GetPinnableReference() returns null for an empty Span, but that's not precise enough for interacting with unmanaged code, where an empty string is different from a null. (Once again, in the world of C, sometimes pointers are NULL. It is what it is, and sometimes we just have to deal with it.)
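To see the null / empty / non-empty distinction in one place, here is a compact approximation of the pieces above, restricted to safe code so it compiles without the unsafe switch. Utf8zSketch, IsNull, and ToStringOrNull are names invented for this sketch; it is not the SQLitePCLRaw implementation.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for utf8z, for illustration only.
public readonly ref struct Utf8zSketch
{
    // Length 0 means a null string; otherwise the last byte is the terminator.
    private readonly ReadOnlySpan<byte> sp;

    // Private: trusts the caller to have checked the terminator.
    private Utf8zSketch(ReadOnlySpan<byte> a) { sp = a; }

    public static Utf8zSketch FromString(string s)
    {
        if (s == null) return new Utf8zSketch(ReadOnlySpan<byte>.Empty);
        // One extra byte; new arrays are zero-filled, so it is the terminator.
        var ba = new byte[Encoding.UTF8.GetByteCount(s) + 1];
        Encoding.UTF8.GetBytes(s, 0, s.Length, ba, 0);
        return new Utf8zSketch(ba);
    }

    public static Utf8zSketch FromSpan(ReadOnlySpan<byte> a)
    {
        if (a.Length > 0 && a[a.Length - 1] != 0)
            throw new ArgumentException("zero terminator required");
        return new Utf8zSketch(a);
    }

    public bool IsNull => sp.Length == 0;

    // Returns null for the null string; never decodes the terminator.
    public string ToStringOrNull()
        => sp.Length == 0
            ? null
            : Encoding.UTF8.GetString(sp.Slice(0, sp.Length - 1));
}
```

With this sketch, FromString(null) round-trips to null, FromString("") round-trips to an empty string backed by a single zero byte, and FromSpan rejects any non-empty span whose last byte is not zero.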

And that's it. My implementation of Layer 2 looks like this:

static class Layer2
{
    public static int puts(utf8z s)
    {
        unsafe
        {
            fixed (byte* p = s)
            {
                return Layer1.puts(p);
            }
        }
    }
}

Use of utf8z in SQLitePCLRaw 2.0

In the ISQLite3Provider interface, I now use an actual Span only in cases where the usage is consistent with the associated expectations for same.

And everywhere a zero-terminated string appears as a parameter or a return value, utf8z is used.

In other words, a zero-terminated UTF-8 string is a different type than a length-specified UTF-8 string. So the error of using the wrong one can be caught at compile time, which I think is a huge win.

It isn't always feasible to get this kind of type safety for code down at this level, because adding a memory allocation would kill performance. But in this case, ref struct FTW.

On the horizon

We should allow ourselves to think of a future where .NET programming is done with UTF-8 instead of UTF-16. Discussion and progress toward a new Utf8String type appears to be underway:

https://github.com/dotnet/corefxlab/issues/2368

I am glad that the link above includes the words "like ensuring a null terminator (important for p/invoke scenarios)". Even if we all agree to dislike zero-terminated strings, our need to deal with them is not going away anytime soon.