In this entry, I want to walk through storing byte arrays in the User Strings heap.
For this example, I'll use the simple HelloWorld console application from the presentation. Here is the C# code for completeness:
using System;

namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            // The literal below ends up in the User Strings (#US) heap.
            Console.WriteLine("Hello Code Camp7!");
        }
    }
}
If you open the exe in ILDasm, you will see the following IL for the Main method (the string loaded by ldstr is the one I'll focus on in this entry):
.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 13 (0xd)
  .maxstack 8
  IL_0000: nop
  IL_0001: ldstr "Hello Code Camp7!"
  IL_0006: call void [mscorlib]System.Console::WriteLine(string)
  IL_000b: nop
  IL_000c: ret
} // end of method Program::Main
When you look at the contents of the User Strings heap (View -> MetaInfo -> Raw:Heaps, View -> MetaInfo -> Show!) you see the following entry:
70000001 : (17) L"Hello Code Camp7!"
This shows the token value for this string: the high byte (70) identifies the User Strings heap, and the remaining three bytes (000001) are the starting offset of the string within that heap. (17) is the length of the string in characters.
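To make the token layout concrete, here is a minimal sketch that splits a token into its two parts. The class and method names (TokenDemo, TokenType, TokenOffset) are made up for illustration and are not part of any framework API.

```csharp
using System;

static class TokenDemo
{
    // Upper byte of the token: which table or heap it refers to (0x70 = User Strings).
    public static int TokenType(uint token) => (int)(token >> 24);

    // Lower three bytes: for a 0x70 token, the byte offset into the #US heap.
    public static int TokenOffset(uint token) => (int)(token & 0x00FFFFFF);

    static void Main()
    {
        uint token = 0x70000001;
        // prints "type = 0x70, offset = 0x000001"
        Console.WriteLine($"type = 0x{TokenType(token):X2}, offset = 0x{TokenOffset(token):X6}");
    }
}
```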
If you look at the User String heap in a hex editor the string will look something like (bytes and text shown):
48 00 65 00 6C 00 6C 00 6F 00 20 00 43 00 6F 00 64 00 65 00 20 00 43 00 61 00 6D 00 70 00 37 00 21 00
H.e.l.l.o. .C.o.d.e. .C.a.m.p.7.!.
This was the text that we hacked at the beginning of the session, first with the hex editor and then with ILDasm/ILAsm. If you run the console application, you simply get "Hello Code Camp7!" written out to the console.
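If you want to reproduce the byte dump above without a hex editor, a small sketch like the following should do it. It assumes only that the heap stores the characters as UTF-16 little-endian, which is what Encoding.Unicode produces; the class name UserStringBytes and the ToHex helper are hypothetical.

```csharp
using System;
using System.Text;

static class UserStringBytes
{
    // Encodes the text as UTF-16LE and formats it as space-separated hex pairs.
    public static string ToHex(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        // Each ASCII character shows up as its code followed by a 00 byte.
        Console.WriteLine(ToHex("Hello Code Camp7!"));
    }
}
```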
#US heap
What it shows is that the User Strings heap is Unicode (2 bytes for each letter). The interesting thing I want to point out is that the User Strings heap will also store a byte array and supposedly any type of binary object. To show this, I am still going to work with a string (it makes it easier to demo), but stored as a bytearray.
I have a little C# application that adds a fixed value to each byte of a Unicode string and returns the resulting bytes (code that I'm not going to talk about here). For a string such as "Hello Code Camp7!" it generates a string of bytes such as "f3 ab 10 ab 17 ab 17 ab 1a ab cb ab ee ab 1a ab 0f ab 10 ab cb ab ee ab 0c ab 18 ab 1b ab e2 ab cc ab".
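For anyone who wants to generate a similar byte string, here is a rough sketch of what such a helper could look like. This is not the application from the presentation: the names ScrambleSketch, Scramble, and ToBytearrayLiteral are invented here, and the only assumption is that 0xab is added to every byte of the UTF-16 encoding, wrapping around at 256.

```csharp
using System;
using System.Linq;
using System.Text;

static class ScrambleSketch
{
    // Hypothetical stand-in for the unpublished helper: adds 0xAB to every UTF-16LE byte.
    public static byte[] Scramble(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = unchecked((byte)(bytes[i] + 0xAB)); // wraps modulo 256
        }
        return bytes;
    }

    // Formats the bytes as space-separated lowercase hex inside bytearray( ... ).
    public static string ToBytearrayLiteral(byte[] bytes)
    {
        return "bytearray( " + string.Join(" ", bytes.Select(b => b.ToString("x2"))) + " )";
    }

    static void Main()
    {
        Console.WriteLine(ToBytearrayLiteral(Scramble("Hello Code Camp7!")));
    }
}
```

For a single character such as "H" (bytes 48 00), this yields bytearray( f3 ab ), which matches the first two bytes quoted above.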
In order to get that string of bytes stored in the #US heap, we need to edit the IL to look like this:
.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  .maxstack 8
  ldstr bytearray( f3 ab 10 ab 17 ab 17 ab 1a ab cb ab ee ab 1a ab 0f ab 10 ab cb ab ee ab 0c ab 18 ab 1b ab e2 ab cc ab )
  call string HelloWorld.Program::Unscramble(string)
  call void [mscorlib]System.Console::WriteLine(string)
  ret
}
By using ldstr bytearray( ... bytes ... ) we get the bytes added to the #US heap for us. The call to the Unscramble method after the string load just reverses the byte addition I did to get the byte string to begin with (Scramble adds 0xab to each byte and Unscramble subtracts 0xab from each byte; again, this really isn't what I want to show here).
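Since the IL calls Unscramble with the loaded string, here is one way such a method could be written. Treat it as a sketch under the same assumption as before (0xab subtracted from every UTF-16 byte, wrapping at 256), not as the actual method from the demo; the class name UnscrambleSketch is made up.

```csharp
using System;
using System.Text;

static class UnscrambleSketch
{
    // Hypothetical counterpart to Scramble: subtracts 0xAB from every UTF-16LE byte.
    public static string Unscramble(string scrambled)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(scrambled);
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = unchecked((byte)(bytes[i] - 0xAB)); // wraps modulo 256
        }
        return Encoding.Unicode.GetString(bytes);
    }

    static void Main()
    {
        // The scrambled bytes f3 ab 10 ab ... read as UTF-16 characters abf3 ab10 ...
        string loaded = "\uabf3\uab10\uab17\uab17\uab1a\uabcb\uabee\uab1a\uab0f"
                      + "\uab10\uabcb\uabee\uab0c\uab18\uab1b\uabe2\uabcc";
        Console.WriteLine(Unscramble(loaded)); // prints "Hello Code Camp7!"
    }
}
```

Note that the runtime hands the bytearray to the method as an ordinary string, so the sketch has to go back to bytes before it can undo the addition.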
Now if you compile the new IL file and open the exe in ILDasm you will see a User String heap that looks like the following:
70000001 : (17) L"................."
User string has unprintables, hex format below:
abf3 ab10 ab17 ab17 ab1a abcb abee ab1a ab0f ab10 abcb abee ab0c ab18 ab1b abe2
abcc
And now if you open it up in a hex editor you will see basically the same bytes that we put in the bytearray:
f3 ab 10 ab 17 ab 17 ab 1a ab cb ab ee ab 1a ab 0f ab 10 ab cb ab ee ab 0c ab 18 ab 1b ab e2 ab cc ab
What this shows is that the #US heap can actually hold values other than just strings.
To sum things up, I'll quote Serge Lidin from Expert .NET 2.0 IL Assembler, page 77:
#US: A blob heap containing user-defined strings. This stream contains string constants defined in the user code. The strings are kept in Unicode (UTF-16) encoding, with an additional trailing byte set to 1 or 0, indicating whether there are any characters with codes greater than 0x007F in the string. This trailing byte was added to streamline the encoding conversion operations on string objects produced from user-defined string constants. This stream's most interesting characteristic is that the user strings are never referenced from any metadata table but can be explicitly addressed by the IL code (with the Ldstr instruction). In addition, being actually a blob heap, the #US heap can store not only Unicode strings but any binary object, which opens some intriguing possibilities.
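To picture the trailing byte Lidin mentions, here is a small sketch that lays out a single #US entry by hand. It makes two assumptions that are mine rather than the quote's: the entry starts with a one-byte length prefix (which is how blob heaps store entries shorter than 128 bytes), and the flag follows the simplified "any character above 0x007F" rule from the quote, whereas the ECMA-335 specification spells out a slightly more detailed condition. The names UserStringEntry and Build are hypothetical.

```csharp
using System;
using System.Text;

static class UserStringEntry
{
    // Builds: [length byte] [UTF-16LE characters] [trailing flag byte].
    public static byte[] Build(string text)
    {
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        int payloadLength = utf16.Length + 1; // character bytes plus the trailing flag
        if (payloadLength >= 0x80)
        {
            throw new ArgumentException("Sketch only handles the one-byte length prefix.");
        }

        // Simplified rule from the quote: 1 if any character code exceeds 0x007F.
        byte flag = 0;
        foreach (char c in text)
        {
            if (c > 0x007F) { flag = 1; break; }
        }

        byte[] entry = new byte[1 + payloadLength];
        entry[0] = (byte)payloadLength;
        Array.Copy(utf16, 0, entry, 1, utf16.Length);
        entry[entry.Length - 1] = flag;
        return entry;
    }

    static void Main()
    {
        byte[] entry = Build("Hello Code Camp7!");
        Console.WriteLine(BitConverter.ToString(entry).Replace("-", " "));
    }
}
```

For "Hello Code Camp7!" the first byte comes out as 23 (35 bytes: 34 for the characters plus the flag) and the last byte as 00. For the scrambled string, every character is above 0x007F, so the flag would be 01.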