2019-07-02 15:00:00
Using Span for high performance interop with unmanaged libraries
There's so much great performance-related stuff happening in the .NET world right now.
A major character in this story is Span, which gives us a way to reduce heap allocations and copying. Before we had Span, a common operation like Substring() resulted in an allocation and a copy.
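To make that difference concrete, here is a minimal sketch. The class and method names are made up for illustration; AsSpan() is the standard extension method on string.

```csharp
using System;

static class SubstringVsSpan
{
    // Hypothetical helper: returns a view of part of the string without allocating.
    public static ReadOnlySpan<char> Word(string s, int start, int length)
        => s.AsSpan(start, length);

    static void Main()
    {
        string s = "hello, world";

        // Substring allocates a new string on the heap and copies 5 chars into it.
        string copy = s.Substring(7, 5);

        // AsSpan returns a view over the original string's memory: no allocation, no copy.
        ReadOnlySpan<char> view = Word(s, 7, 5);

        Console.WriteLine(copy);            // world
        Console.WriteLine(view.ToString()); // world (ToString allocates; used here only to print)
    }
}
```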
In this blog post I want to talk about using Span in the specific case of interop with unmanaged libraries that deal in zero-terminated strings (aka null-terminated strings).
Layer 0: The unmanaged function
Let's talk about this in layers. The bottom layer is the unmanaged function itself, written in C or something that pretends to be C. The kind of functions I'm talking about are ones that have a return value or a parameter which is a zero-terminated string. Such things are quite common. Here are a few familiar examples:
    int puts(const char* p);
    FILE *fopen(const char *path, const char *mode);
    int sqlite3_open(const char* path, sqlite3** ppdb);
    char *asctime(const struct tm *tm);
With a zero-terminated string, all we have is a pointer. To find the length, something is going to start at that pointer and walk through memory until it finds a zero byte.
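That walk is easy to sketch on the C# side. The helper names below are hypothetical, and the code assumes the project is compiled with unsafe blocks enabled.

```csharp
using System;

static class ZeroTerminated
{
    // Hypothetical helper: counts bytes up to (not including) the first zero byte,
    // the same walk that C's strlen() performs.
    public static unsafe int StrLen(byte* p)
    {
        int n = 0;
        while (p[n] != 0) { n++; }
        return n;
    }

    // Pins a managed buffer just long enough to measure the string inside it.
    public static unsafe int StrLenOf(byte[] buffer)
    {
        fixed (byte* p = buffer) { return StrLen(p); }
    }

    static void Main()
    {
        byte[] hello = { (byte)'h', (byte)'e', (byte)'l', (byte)'l', (byte)'o', 0 };
        Console.WriteLine(StrLenOf(hello)); // 5
    }
}
```

Note that nothing here knows how big the buffer is. If the zero byte is missing, the loop just keeps going.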
(I am aware of the various (security | correctness | etc) problems associated with zero-terminated strings. Nonetheless, there are many libraries that use them. For the purpose of this blog post, we assume the unmanaged function is something we can't change. It is what it is, and we have to deal with it.)
And before we go any further, let's mention the challenges arising from the differing encodings of the string. In .NET, strings are encoded in UTF-16, where each character is 2 bytes, or 4 for characters outside the Basic Multilingual Plane. Any C library that expects a string terminated with a zero byte is not using UTF-16, because UTF-16 text is full of zero bytes (every ASCII character contains one). If it's an old-school library, it might be expecting ASCII. The more modern alternative is UTF-8.
In any case, to use a .NET string with a zero-terminated string, we will have to do a conversion. That conversion is another form of a copy, and for performance reasons, we want to keep those to a minimum.
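A small sketch of what that conversion involves, using only System.Text.Encoding. The class and method names are invented for the example.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    public static int Utf16Bytes(string s) => Encoding.Unicode.GetByteCount(s);
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);

    static void Main()
    {
        string s = "h\u00E9llo"; // "hello" with an e-acute: 5 UTF-16 code units

        Console.WriteLine(Utf16Bytes(s)); // 10
        Console.WriteLine(Utf8Bytes(s));  // 6 (the accented letter takes two bytes in UTF-8)

        // The conversion itself produces a new buffer: an allocation plus a copy.
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        Console.WriteLine(utf8.Length);   // 6
    }
}
```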
Layer 1: The P/Invoke definition
The next layer up is the P/Invoke definition. This is what allows us to access the C function from .NET.
A typical P/Invoke looks something like this:
    // int abs(int x);
    [DllImport("msvcrt")]
    public static extern int abs(int x);
This definition says, "in a dynamic library named 'msvcrt', look for a function named abs, with a return type of int and one parameter of type int."
Of course, a pointer to a zero-terminated string is a bit more complicated than an int. How should we describe that pointer in our P/Invoke definition? We actually have some choices here, so let's discuss the tradeoffs.
Alternative 1a: string
One thing we could do is define the P/Invoke to accept a regular .NET string:
    [DllImport("msvcrt")]
    public static extern int puts(string s);
This approach offers convenience. And it accomplishes what we probably need to do at some layer anyway, which is start with a .NET string, convert it to UTF-8, get a pointer for it, and pass it to the C code.
Well actually: The default marshaling for a string is zero-terminated, but with ANSI encoding. To get UTF-8, we would have to do this:
    [DllImport("msvcrt")]
    public static extern int puts([MarshalAs(UnmanagedType.LPUTF8Str)] string s);
Convenient or not, from the perspective of performance, I am not fond of this at all. The problem is that the UTF-16 to UTF-8 conversion is done every time we call this function.
What if this function is called from a loop with the same string every time?
    foreach (var item in collection)
    {
        puts("Item: ");
        puts(item.ToString());
        puts("\n");
    }
In this (admittedly-contrived) example, we are writing a collection of items to stdout, one per line. And each line starts with a prefix and ends with a newline. And those two puts() calls are sad, because the conversion of the same strings to UTF-8 is going to be done every time. That's a piece of work we would rather do outside the loop, so it can be done just once. And in order to do that, we need a P/Invoke definition that somehow accepts the already-converted UTF-8 data.
Alternative 1b: byte[]
So at the C# layer, a string converted to UTF-8 is just an array of bytes, right? Maybe the P/Invoke definition should look like this:
    [DllImport("msvcrt")]
    public static extern int puts(byte[] s);
Compared to the previous idea, this is a big improvement, because we have the flexibility to do the encoding conversion separately.
Which reminds me, we need a utility function for that conversion. Here's the one from SQLitePCLRaw:
    using System.Text;
    // Converts a .NET string to UTF-8 bytes followed by a zero terminator.
    public static byte[] to_utf8_with_z(this string s)
    {
        if (s == null)
        {
            return null;
        }
        int len = Encoding.UTF8.GetByteCount(s);
        var ba = new byte[len + 1];
        var wrote = Encoding.UTF8.GetBytes(s, 0, s.Length, ba, 0);
        ba[wrote] = 0;
        return ba;
    }
BTW, my apologies for the snake case naming. For various reasons, SQLitePCLRaw mostly follows the naming conventions of SQLite, which I admit isn't very friendly to .NET folks.
Anyway, using byte[] as the parameter type in the P/Invoke definition means we could write that loop like this:
    var ba_front = "Item: ".to_utf8_with_z();
    var ba_back = "\n".to_utf8_with_z();
    foreach (var item in collection)
    {
        puts(ba_front);
        // the per-item text still has to be converted each time
        puts(item.ToString().to_utf8_with_z());
        puts(ba_back);
    }
But we still have problems. What if we have UTF-8 data that is not in a byte array? What if our UTF-8 data is in a byte array, but it's not alone in there? We can't do a slice without an allocation and a copy.
Putting byte[] in the P/Invoke definition means that it carries an assumption about using the heap. We don't want that assumption, because it may not always be true, and when it's not, we would have to do an allocation + copy to make it true.
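Here is a sketch of the "not alone in there" case. CopyOut is a hypothetical helper, not something from SQLitePCLRaw. It shows the allocation + copy that a byte[]-typed parameter forces on us when the string we want sits in the middle of a larger array.

```csharp
using System;

static class SliceByCopy
{
    // Copies the zero-terminated string starting at `offset` into its own array.
    public static byte[] CopyOut(byte[] buffer, int offset)
    {
        int end = Array.IndexOf(buffer, (byte)0, offset); // find the terminator
        if (end < 0) throw new ArgumentException("zero terminator required");
        int len = end - offset + 1;                        // include the terminator
        var ba = new byte[len];                            // the allocation
        Array.Copy(buffer, offset, ba, 0, len);            // the copy
        return ba;
    }

    static void Main()
    {
        // Two zero-terminated strings packed into one array: "abc\0" and "de\0".
        byte[] packed = { 97, 98, 99, 0, 100, 101, 0 };
        byte[] second = CopyOut(packed, 4);
        Console.WriteLine(second.Length); // 3
    }
}
```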
Alternative 1c: IntPtr or byte*
Another option is to make our P/Invoke look like this:
    [DllImport("msvcrt")]
    public static extern int puts(IntPtr s);
or like this:
    [DllImport("msvcrt")]
    unsafe public static extern int puts(byte* s);
These two are essentially equivalent.
I personally prefer to use byte*, because I think it's more clear. Using byte* does require the unsafe keyword, but I see no reason to avoid that when calling unmanaged code, which is inherently "unsafe" anyway.
But either way, this version of the P/Invoke definition is taking a pointer, just like the underlying C function. This gives us more flexibility than byte[], because the P/Invoke no longer assumes the use of managed memory.
However, the cost of that flexibility, in the case where we are using managed memory, is that the caller now has more work to do. If we're going to pass a pointer to managed memory down to unmanaged code, we need to "pin" it, to prevent the garbage collector from moving it around. When the P/Invoke definition had the parameter type as byte[], the marshaling code was doing the pin/unpin for us. But with a pointer, we have to do it ourselves.
So now our example loop (shown with IntPtr) has gotten worse:
    var ba_front = "Item: ".to_utf8_with_z();
    GCHandle pinned_front = GCHandle.Alloc(ba_front, GCHandleType.Pinned);
    IntPtr ptr_front = pinned_front.AddrOfPinnedObject();
    var ba_back = "\n".to_utf8_with_z();
    GCHandle pinned_back = GCHandle.Alloc(ba_back, GCHandleType.Pinned);
    IntPtr ptr_back = pinned_back.AddrOfPinnedObject();
    foreach (var item in collection)
    {
        puts(ptr_front);
        // the item text would need the same convert-and-pin treatment;
        // a string overload of puts is assumed here to keep the example short
        puts(item.ToString());
        puts(ptr_back);
    }
    pinned_front.Free();
    pinned_back.Free();
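For comparison, here is a sketch of the same loop using the byte* form and the fixed statement instead of GCHandle. To keep it runnable anywhere, FakePuts is a hypothetical managed stand-in for the P/Invoke (it only walks to the terminator and returns the byte count), and the code assumes unsafe blocks are enabled.

```csharp
using System;
using System.Text;

static class FixedPinning
{
    // Hypothetical stand-in for the P/Invoke: walks to the zero terminator
    // like a C function would, and returns the number of bytes before it.
    static unsafe int FakePuts(byte* s)
    {
        int n = 0;
        while (s[n] != 0) { n++; }
        return n;
    }

    public static unsafe int Run(string[] items)
    {
        byte[] front = Encoding.UTF8.GetBytes("Item: \0");
        byte[] back = Encoding.UTF8.GetBytes("\n\0");
        int total = 0;

        // Both arrays stay pinned for the duration of the block and are
        // unpinned automatically when it exits, even on an exception.
        fixed (byte* pFront = front, pBack = back)
        {
            foreach (string item in items)
            {
                byte[] text = Encoding.UTF8.GetBytes(item + "\0");
                fixed (byte* pText = text)
                {
                    total += FakePuts(pFront) + FakePuts(pText) + FakePuts(pBack);
                }
            }
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(Run(new[] { "a", "bc" })); // 17
    }
}
```

The pinning work is still the caller's problem, but fixed at least scopes it to a block so there is no Free() to forget.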
Nonetheless, using a pointer is my preferred form of this P/Invoke definition. Just a simple equivalent to the C function, nothing more and nothing less. I don't want marshaling to do anything for me. I want to shift all the problems of pinning and allocation and encoding up to a higher layer.
So I finish this section and move forward with the following implementation of Layer 1:
    static class Layer1
    {
        // int puts(const char* p);
        [DllImport("msvcrt")]
        unsafe public static extern int puts(byte* s);
    }
Let's talk about that next layer up.
Layer 2: The one just above the P/Invoke
What is the goal for this layer? Well that kinda depends on what I chose for the type of the parameter in the P/Invoke definition back in Layer 1.
If I had left the P/Invoke definition doing the string marshaling, I probably wouldn't need another layer at all.
If I had chosen the P/Invoke definition with parameter type of byte[], maybe I would now want a layer that takes a string and does the conversion.
But since I chose byte*, I forced myself into a situation where I need another layer just to get unsafe out of my public API. The unsafe doesn't belong in a public C# API. I may be crazy, but I'm not that crazy.
So is Layer 2 where I go ahead and express that string parameter in terms of an actual .NET string? Or byte[]?
Yeah maybe. But I'd like to put that off as long as possible. Instead, I'm asking myself, "What is the thinnest useful layer of abstraction I can put here?"
What about a Span?
    static class Layer2
    {
        public static int puts(ReadOnlySpan<byte> s)
        {
            unsafe
            {
                // the pointer type must be explicit; var is not allowed in a fixed statement
                fixed (byte* p = s)
                {
                    return Layer1.puts(p);
                }
            }
        }
    }
The nice thing about a Span is that it brings the advantages we saw with byte[] except without those pesky assumptions about whether the memory is managed or not. Span can represent unmanaged memory, or part of a managed array, or memory on the stack.
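A quick sketch of those three cases. Sum is a made-up method standing in for any Span-based API, and the unmanaged case assumes unsafe blocks are enabled.

```csharp
using System;
using System.Runtime.InteropServices;

static class SpanSources
{
    // One method serves every kind of memory; it never learns where the bytes live.
    public static int Sum(ReadOnlySpan<byte> s)
    {
        int total = 0;
        foreach (byte b in s) { total += b; }
        return total;
    }

    static unsafe void Main()
    {
        // Part of a managed array: no copy, just a view of elements 1..3.
        byte[] array = { 1, 2, 3, 4, 5 };
        Console.WriteLine(Sum(array.AsSpan(1, 3))); // 9

        // Memory on the stack.
        Span<byte> stack = stackalloc byte[3];
        stack.Fill(2);
        Console.WriteLine(Sum(stack)); // 6

        // Unmanaged memory.
        IntPtr mem = Marshal.AllocHGlobal(3);
        try
        {
            var native = new Span<byte>((void*)mem, 3);
            native.Fill(7);
            Console.WriteLine(Sum(native)); // 21
        }
        finally
        {
            Marshal.FreeHGlobal(mem);
        }
    }
}
```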
That said, for the case where a Span represents managed memory, we still have to pin it. The fixed block shown above is what does that. Span supports the fixed pattern, aka GetPinnableReference().
Span is looking like the best option so far, but it still has some problems.
Problems with Span
For our case where the underlying C function needs a string with a zero terminator, Span is not a perfect solution, and all of its problems are related to the Length property.
A Span is an encapsulation of a pointer and a length. But Layer 0 and Layer 1 only care about the pointer. Anything else (string, byte[], Span) is for the benefit of the layers above, and makes no difference to the underlying C function.
So the fact that the Length is part of Layer 2's public API but will be unused should be a clue that we are headed for trouble.
And here is the troubling question: In Layer2.puts() as shown above, for the ReadOnlySpan<byte> parameter, is the zero terminator included in the Length?
Suppose the string is "hello". Because this is a string with all ASCII characters, it's the same as UTF-8, so that's 5 bytes of actual string data plus one byte for the zero terminator. We could pass a Span of length 5 or 6.
Again, the layers below won't care. Either 5 or 6 will still work. Heck, we could use a completely wrong length like 1, and Layer 0 will still end up getting the same pointer.
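A sketch that demonstrates this. StrLen is a hypothetical stand-in for the C function (it sees only the pointer), and the code assumes unsafe blocks are enabled.

```csharp
using System;

static class LengthIgnored
{
    // Hypothetical stand-in for the C function: sees only the pointer
    // and walks to the zero terminator.
    static unsafe int StrLen(byte* p)
    {
        int n = 0;
        while (p[n] != 0) { n++; }
        return n;
    }

    // Same shape as Layer2.puts(): pin the Span, pass the pointer down.
    public static unsafe int SeenByC(ReadOnlySpan<byte> s)
    {
        fixed (byte* p = s) { return StrLen(p); }
    }

    static void Main()
    {
        byte[] hello = { (byte)'h', (byte)'e', (byte)'l', (byte)'l', (byte)'o', 0 };

        Console.WriteLine(SeenByC(hello.AsSpan(0, 6))); // 5
        Console.WriteLine(SeenByC(hello.AsSpan(0, 5))); // 5
        Console.WriteLine(SeenByC(hello.AsSpan(0, 1))); // 5
    }
}
```

Three different Length values, one pointer, one answer.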
And if the Length doesn't matter, why is it there?
Let's look more closely at the possibilities. What exactly does that ReadOnlySpan<byte> parameter mean? I see 3 alternatives.
Alternative 2a: zero terminator unknown
In this alternative, the Span.Length covers just the string data, and there may or may not be a zero terminator after it.
I mention this one because it is the way other .NET Span-based APIs work. When you see a ReadOnlySpan<byte> as a parameter of a method and that parameter refers to string data, it is customary to assume that there is no zero terminator inside the Span, and that there may not be one immediately after the Span. Simply put, in the world of .NET, zero-terminated strings are just not really a thing.
And yet, even though this interpretation is the common .NET practice, in our case, this might be the worst possible option.
The contract of our C function in Layer 0 requires the zero terminator. So the C# layer has to make sure there is one. And if it is not going to pass that expectation up to its caller, then it will need to allocate a new buffer with one extra byte, copy the data, and set the zero terminator at the end.
Which would defeat the whole point of using Span.
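A sketch of the work Layer 2 would be stuck doing under this interpretation. WithTerminator is a hypothetical helper written for this example.

```csharp
using System;

static class Alternative2a
{
    // What Layer 2 must do if the Span carries no promise of a terminator.
    public static byte[] WithTerminator(ReadOnlySpan<byte> s)
    {
        var ba = new byte[s.Length + 1]; // allocation; new arrays are zero-filled
        s.CopyTo(ba);                    // copy; the last byte stays 0
        return ba;
    }

    static void Main()
    {
        byte[] data = { (byte)'h', (byte)'i', (byte)'!' }; // no terminator anywhere
        byte[] z = WithTerminator(data.AsSpan(0, 2));      // just "hi"

        Console.WriteLine(z.Length); // 3
        Console.WriteLine(z[2]);     // 0
    }
}
```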
Alternative 2b: zero terminator unofficial
In this alternative, the Span.Length covers just the string data, but (wink wink) we all know there is actually a zero byte just beyond the end of the Span.
In my opinion, this is dreadful.
When I see a Span, I assume the code using it will not go beyond the end.
The contract of our function requires the ending zero. What if I want to insert a check on the C# side to make sure it's there? Such as:
    static class Layer2
    {
        public static int puts(ReadOnlySpan<byte> s)
        {
            // Look at the byte immediately after the Span to
            // make sure the zero terminator is there.
            if (s[s.Length] != 0) // BZZZT! WRONG!
            {
                throw new Exception("Zero terminator required");
            }
            unsafe
            {
                fixed (byte* p = s)
                {
                    return Layer1.puts(p);
                }
            }
        }
    }
But I can't, because C# (rightfully) won't let me go beyond the end of the Span.
Worse, eventually somebody is going to pass a Span that doesn't have the unofficial zero byte after it, because that's what would be the common practice for Span<byte>, and the compiler has no way to prevent them from doing it. And then we will end up with memory corruption.
Alternative 2c: zero terminator included
In this alternative, the Span.Length includes the zero terminator.
In some ways, this seems less bad than the other two. We won't need an extra allocation + copy, since we know the zero terminator is already there. And now we're being honest about that zero byte's presence by including it in the Length.
And if we ignore enough context, doing it this way kinda makes sense. The zero byte is a documented and required part of the block of bytes the function expects. The length of the zero-terminated block of memory that represents "hello" is 6.
But including the zero byte in the length seems wrong because, well, it just is. Everywhere I go, whenever zero-terminated strings are used, the zero byte is never considered part of the length. The zero byte is not part of the string data -- it's just a memory marker. If we called strlen() on the pointer, it would return a length which did not include the zero. If we passed this pointer to System.Text.Encoding.UTF8.GetString(), the length we provide should not include the zero byte. The length of the string "hello" is 5.
Another way of illustrating this impedance mismatch is to think about slicing. Spans support Slice(), but slicing doesn't work on a zero-terminated string. Or rather, if you slice a zero-terminated string, you may end up with another string that is zero-terminated, or you may end up with one that is not, depending on whether your slice includes the end of the range or not. So slicing is not a particularly useful operation for a zero-terminated string, because it cannot reliably return the same type.
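The slicing problem takes only a few lines to show. EndsWithZero is a hypothetical helper for this sketch.

```csharp
using System;

static class SlicingTerminated
{
    public static bool EndsWithZero(ReadOnlySpan<byte> s)
        => s.Length > 0 && s[s.Length - 1] == 0;

    static void Main()
    {
        // "hello" plus its terminator, with the terminator counted in the Length.
        ReadOnlySpan<byte> hello = new byte[] { 104, 101, 108, 108, 111, 0 };

        Console.WriteLine(EndsWithZero(hello.Slice(2)));    // True:  "llo" + terminator
        Console.WriteLine(EndsWithZero(hello.Slice(0, 3))); // False: "hel", no terminator
    }
}
```

Both results have the same static type, so the compiler cannot tell the valid one from the invalid one.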
More trouble in the other direction
So far, we've mostly been talking about calling a C function with a zero-terminated string as a parameter. But what about zero-terminated strings being returned from a C function up to .NET?
Again, we just get a pointer. We need to look for a zero byte to find the length. One or two layers up, should we use a Span here too? And if so, when we construct a ReadOnlySpan<byte> for that piece of memory, again, do we include the zero byte in the Length or not?
In this case, a big difference is that .NET stuff doesn't need or want that zero terminator. Initially, the zero byte is important, because that's how the unmanaged code is providing us a way to get the length. But once we know the length, the zero byte is not likely to be important to the .NET layer, which will typically not use it.
In fact, if we did include the zero terminator in these cases, then almost every piece of code using that Span would need to subtract one from the Length. And should it check to make sure the last byte actually is a zero before doing so?
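A sketch of what that looks like in practice. FromPointer and Decode are hypothetical names, the byte array stands in for memory handed back by a C function, and the code assumes unsafe blocks are enabled.

```csharp
using System;
using System.Text;

static class ReturnedString
{
    // Wraps a zero-terminated buffer in a Span. The caller chooses whether
    // the terminator is counted in the Length.
    public static unsafe ReadOnlySpan<byte> FromPointer(byte* p, bool includeTerminator)
    {
        int n = 0;
        while (p[n] != 0) { n++; }
        return new ReadOnlySpan<byte>(p, includeTerminator ? n + 1 : n);
    }

    public static unsafe string Decode(byte[] returned)
    {
        // `returned` stands in for memory handed back by a C function.
        fixed (byte* p = returned)
        {
            ReadOnlySpan<byte> s = FromPointer(p, includeTerminator: true);
            // Every consumer has to remember the "- 1" before using the data.
            return Encoding.UTF8.GetString(p, s.Length - 1);
        }
    }

    static void Main()
    {
        Console.WriteLine(Decode(new byte[] { 104, 105, 0 })); // hi
    }
}
```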
So maybe for these strings flowing in the other direction we should just not do zero terminators? But what if we need to receive a value from unmanaged code and pass it back down to a C function that requires a zero terminator? So then are we back to needing an allocation + copy?
Maybe Span is the wrong answer for this case?
So right now the situation seems bleak. I appear to be convincing myself that Span should not be used with zero-terminated strings.
I envision myself staying bitter and spiteful for the rest of my career as I watch all the other .NET kids get to use Span but those of us working with unmanaged libraries cannot.
I want to use Span. I really do. But it seems like all the possible approaches are bad.
Probably the least bad option was Alternative 2c, which was "ReadOnlySpan<byte> where the Length includes the zero terminator". But that approach is just so inconsistent with common practices that it will cause confusion (and therefore, bugs).
I find myself wishing I could somehow use this approach without anybody finding out about it.
As it turns out, that's the solution I have settled on: A Span that is hidden away where nobody can see it.
Like a Span, but not
In SQLitePCLRaw I have to deal with this issue a lot, as quite a few SQLite functions follow the pattern of accepting or returning a zero-terminated string.
During the development of my 2.0 release, as I started Span-ifying the ISQLite3Provider interface, I kept bumping into the kind of problems described here in this blog post. But (with help and feedback from others) the solution I ended up with seems quite elegant. It's a new type I call utf8z.
    public readonly ref struct utf8z
    {
        // this Span will contain a zero terminator byte
        // if sp.Length is 0, it represents a null string
        // if sp.Length is 1, the only byte must be zero, and it is an empty string
        readonly ReadOnlySpan<byte> sp;
Simply, I have encapsulated a ReadOnlySpan<byte> inside another type.
At first I wasn't sure if this would be possible. A Span is a C# 7.2 ref struct, a value type which can only be used in certain ways. A ref struct can only be on the stack, which basically means it can only be a parameter or a local. It can't be on the heap, which means it also can't be a parameter or a local that might end up on the heap.
For example, an async method cannot have a Span parameter, because an async method actually gets compiled into a state machine which is implemented using the heap.
But C# will happily let you put a ref struct inside another ref struct.
So in my case, I can use my shameful and confusing "Span with a zero terminator inside" and keep it hidden away from the world by making it private. Then I add stuff to the utf8z public API to expose only the things that are valid and needed.
The first thing I need is a constructor:
    utf8z(ReadOnlySpan<byte> a)
    {
        sp = a;
    }
I kept this simple. Just take a Span as a parameter and store it away in the member field. No error checking.
And for that reason, this constructor is private. It is trusting its caller to make sure that the last byte of the Span is a zero terminator, so we don't want just any rando calling it.
The utf8z public API will expose only things that are not capable of violating the rules. For example, there is a static method to get a utf8z from a .NET string:
    public static utf8z FromString(string s)
    {
        if (s == null)
        {
            return new utf8z(ReadOnlySpan<byte>.Empty);
        }
        else
        {
            return new utf8z(s.to_utf8_with_z());
        }
    }
If we need a public method to get a utf8z from a Span, that method should include a check for the zero terminator:
    public static utf8z FromSpan(ReadOnlySpan<byte> a)
    {
        if ((a.Length > 0) && (a[a.Length - 1] != 0))
        {
            throw new ArgumentException("zero terminator required");
        }
        return new utf8z(a);
    }
We also need a way to convert a utf8z into a pointer which can be passed to P/Invoke. In other words, we need utf8z to implement "the fixed pattern". Fortunately, we can just pass this responsibility on to the encapsulated Span:
    public ref readonly byte GetPinnableReference()
    {
        return ref sp.GetPinnableReference();
    }
Finally, we need a convenient way to convert a utf8z to a .NET string:
    public string utf8_to_string()
    {
        if (sp.Length == 0)
        {
            return null;
        }
        unsafe
        {
            fixed (byte* q = sp)
            {
                // the Length includes the zero terminator, so leave it out
                return Encoding.UTF8.GetString(q, sp.Length - 1);
            }
        }
    }
Note that I did not use an override of ToString() for this, as Microsoft guidelines say that ToString() should never return null. And I want utf8z to have a clearly-defined ability to represent a null. Span is a value type, so it is not nullable. The implementation of Span.GetPinnableReference() returns a null reference for an empty Span, but that's not precise enough for interacting with unmanaged code, where an empty string is different from a null.
(Once again, in the world of C, sometimes pointers are NULL. It is what it is, and sometimes we just have to deal with it.)
And that's it. My implementation of Layer 2 looks like this:
    static class Layer2
    {
        public static int puts(utf8z s)
        {
            unsafe
            {
                // the pointer type must be explicit; var is not allowed in a fixed statement
                fixed (byte* p = s)
                {
                    return Layer1.puts(p);
                }
            }
        }
    }
Use of utf8z in SQLitePCLRaw 2.0
In the ISQLite3Provider interface, I now use an actual Span only in cases where the usage is consistent with the associated expectations for same.
And everywhere a zero-terminated string appears as a parameter or a return value, utf8z is used.
In other words, a zero-terminated UTF-8 string is a different type than a length-specified UTF-8 string. So the error of using the wrong one can be caught at compile time, which I think is a huge win.
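Putting the pieces together, here is a condensed, self-contained sketch of the idea. It is not the actual SQLitePCLRaw source: the conversion is inlined, utf8_to_string() is left out, and FakePuts is a hypothetical managed stand-in for Layer1.puts so the example runs without a native library. It assumes unsafe blocks are enabled.

```csharp
using System;
using System.Text;

public readonly ref struct utf8z
{
    // Always ends with a zero byte, except Length 0, which represents null.
    readonly ReadOnlySpan<byte> sp;

    utf8z(ReadOnlySpan<byte> a) { sp = a; }

    public static utf8z FromString(string s)
    {
        if (s == null) { return new utf8z(ReadOnlySpan<byte>.Empty); }
        var ba = new byte[Encoding.UTF8.GetByteCount(s) + 1]; // last byte stays 0
        Encoding.UTF8.GetBytes(s, 0, s.Length, ba, 0);
        return new utf8z(ba);
    }

    public static utf8z FromSpan(ReadOnlySpan<byte> a)
    {
        if (a.Length > 0 && a[a.Length - 1] != 0)
        {
            throw new ArgumentException("zero terminator required");
        }
        return new utf8z(a);
    }

    public ref readonly byte GetPinnableReference() => ref sp.GetPinnableReference();
}

static class Layer2
{
    // Hypothetical stand-in for Layer1.puts: walks to the terminator, returns the count.
    static unsafe int FakePuts(byte* s)
    {
        int n = 0;
        while (s[n] != 0) { n++; }
        return n;
    }

    public static int puts(utf8z s)
    {
        unsafe { fixed (byte* p = s) { return FakePuts(p); } }
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Layer2.puts(utf8z.FromString("hello"))); // 5

        ReadOnlySpan<byte> raw = new byte[] { 104, 105 }; // no terminator
        // Layer2.puts(raw); // does not compile: a ReadOnlySpan<byte> is not a utf8z
        try { utf8z.FromSpan(raw); }
        catch (ArgumentException e) { Console.WriteLine(e.Message); } // zero terminator required
    }
}
```

The commented-out line is the compile-time check, and the FromSpan() call is the runtime check for the one place where a plain Span is allowed in.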
It isn't always feasible to get this kind of type safety for code down at this level, because adding a memory allocation would kill performance. But in this case, ref struct FTW.
On the horizon
We should allow ourselves to think of a future where .NET programming is done with UTF-8 instead of UTF-16. Discussion and progress toward a new Utf8String type appear to be underway:
https://github.com/dotnet/corefxlab/issues/2368
I am glad that the link above includes the words "like ensuring a null terminator (important for p/invoke scenarios)". Even if we all agree to dislike zero-terminated strings, our need to deal with them is not going away anytime soon.