Disassembling .NET - Appendix B

In this entry, I want to walk through storing byte arrays in the User Strings heap.

For this example, I'll use the simple HelloWorld console application from the presentation. Here is the C# code for completeness:

using System;

namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello Code Camp7!");
        }
    }
}

If you open the exe in ILDasm, you will see the following IL for the Main method (the string loaded by ldstr is the one I'll focus on in this entry):

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 13 (0xd)
  .maxstack 8
  IL_0000: nop
  IL_0001: ldstr "Hello Code Camp7!"
  IL_0006: call void [mscorlib]System.Console::WriteLine(string)
  IL_000b: nop
  IL_000c: ret
} // end of method Program::Main

When you look at the contents of the User Strings heap (View -> MetaInfo -> Raw:Heaps, View -> MetaInfo -> Show!) you see the following entry:

70000001 : (17) L"Hello Code Camp7!"

This shows the token value for this string: the high byte (70) identifies the User Strings heap, and the remaining three bytes (000001) are the starting offset of the string within that heap. (17) is the length of the string in characters.
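To make the token layout concrete, here is a minimal sketch that splits a token into its top byte and its low three bytes. The class and method names (UserStringToken, Split) are made up for illustration and are not part of any .NET API.

```csharp
using System;

static class UserStringToken
{
    // Splits a metadata token into its tag (top byte) and the
    // offset carried in the low three bytes.
    public static (byte Tag, int Offset) Split(uint token)
    {
        byte tag = (byte)(token >> 24);
        int offset = (int)(token & 0x00FFFFFF);
        return (tag, offset);
    }

    static void Main()
    {
        var (tag, offset) = Split(0x70000001);
        // Prints: tag=0x70 offset=0x000001
        Console.WriteLine($"tag=0x{tag:X2} offset=0x{offset:X6}");
    }
}
```

A tag of 0x70 is what marks the token as pointing into the User Strings heap rather than at a row in a metadata table.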

If you look at the User String heap in a hex editor the string will look something like (bytes and text shown):

48 00 65 00 6C 00 6C 00 6F 00 20 00 43 00 6F 00 64 00 65 00 20 00 43 00 61 00 6D 00 70 00 37 00 21 00

H.e.l.l.o. .C.o.d.e. .C.a.m.p.7.!.

This was the text that we hacked at the beginning of the session with the hex editor and then with ILDasm/ILAsm. If you run the console application, you simply get "Hello Code Camp7!" written out to the console.

#US heap

What this shows is that the User Strings heap is Unicode (2 bytes for each character). The interesting thing I want to point out is that the User Strings heap will store a byte array and, supposedly, any type of binary object. To show this, I am still going to work with a string (it makes the demo easier), but as a bytearray.

I have a little C# application that adds a fixed value to each byte of a Unicode string and returns the bytes (code that I'm not going to talk about here). For a string such as "Hello Code Camp7!" it generates a string of bytes such as "f3 ab 10 ab 17 ab 17 ab 1a ab cb ab ee ab 1a ab 0f ab 10 ab cb ab ee ab 0c ab 18 ab 1b ab e2 ab cc ab".
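For readers who want to reproduce those bytes, here is a rough sketch of what such a scrambler might look like. This is not the original tool: the class name ScrambleSketch is invented, and the constant 0xab is taken from the description of Scramble later in this entry.

```csharp
using System;
using System.Text;

static class ScrambleSketch
{
    // Hypothetical stand-in for the scrambler: encode the text as
    // UTF-16 little-endian, then add 0xab to every byte, wrapping at 0xff.
    public static byte[] Scramble(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = unchecked((byte)(bytes[i] + 0xab));
        }
        return bytes;
    }

    static void Main()
    {
        byte[] scrambled = Scramble("Hello Code Camp7!");
        // Print the bytes as lowercase hex pairs separated by spaces.
        Console.WriteLine(
            BitConverter.ToString(scrambled).Replace('-', ' ').ToLowerInvariant());
    }
}
```

The high byte of every ASCII character is 00, so after the addition every second byte comes out as ab, which is the pattern visible in the byte string above.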

In order to get that string of bytes stored in the #US heap, we need to edit the IL to look like this:

.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  .maxstack 8
  ldstr bytearray( f3 ab 10 ab 17 ab 17 ab 1a ab cb ab ee ab 1a ab 0f ab 10 ab cb ab ee ab 0c ab 18 ab 1b ab e2 ab cc ab )
  call string HelloWorld.Program::Unscramble(string)
  call void [mscorlib]System.Console::WriteLine(string)
  ret
}

By using ldstr bytearray( ... bytes ... ) we get the bytes added to the #US heap for us. The call to the Unscramble method after the string load just reverses the byte addition I did to get the byte string to begin with (Scramble adds 0xab to each byte and Unscramble subtracts 0xab from each byte; again, this really isn't what I want to show here).
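The real Unscramble method isn't shown in this entry, but a minimal sketch matching the string-in, string-out signature used in the IL could look like the following. The class name UnscrambleSketch is an assumption, and so is the detail of splitting each loaded character into its low and high byte.

```csharp
using System;
using System.Text;

static class UnscrambleSketch
{
    // Hypothetical counterpart to Scramble: treat each UTF-16 code unit
    // of the loaded string as two raw bytes and subtract 0xab from each.
    public static string Unscramble(string scrambled)
    {
        byte[] bytes = new byte[scrambled.Length * 2];
        for (int i = 0; i < scrambled.Length; i++)
        {
            char unit = scrambled[i];
            bytes[2 * i] = unchecked((byte)((unit & 0xff) - 0xab));     // low byte
            bytes[2 * i + 1] = unchecked((byte)((unit >> 8) - 0xab));   // high byte
        }
        return Encoding.Unicode.GetString(bytes);
    }

    static void Main()
    {
        // "\uabf3\uab10" corresponds to the scrambled bytes f3 ab 10 ab.
        Console.WriteLine(Unscramble("\uabf3\uab10")); // prints "He"
    }
}
```

The subtraction wraps around in the same way the addition did, so a scrambled byte such as 10 comes back as 65 ('e').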

Now if you assemble the new IL file with ILAsm and open the exe in ILDasm, you will see a User Strings heap that looks like the following:

70000001 : (17) L"................."
User string has unprintables, hex format below:
abf3 ab10 ab17 ab17 ab1a abcb abee ab1a ab0f ab10 abcb abee ab0c ab18 ab1b abe2
abcc

And now if you open it up in a hex editor you will see basically the same bytes that we put in the bytearray:

f3 ab 10 ab 17 ab 17 ab 1a ab cb ab ee ab 1a ab 0f ab 10 ab cb ab ee ab 0c ab 18 ab 1b ab e2 ab cc ab
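The ILDasm dump shows abf3 where the hex editor shows f3 ab because ILDasm prints each pair of bytes as one 16-bit little-endian value. A small sketch of that regrouping, with the invented names CodeUnitView and ToCodeUnits:

```csharp
using System;

static class CodeUnitView
{
    // Combines each little-endian byte pair into one 16-bit code unit.
    public static ushort[] ToCodeUnits(byte[] bytes)
    {
        ushort[] units = new ushort[bytes.Length / 2];
        for (int i = 0; i < units.Length; i++)
        {
            units[i] = (ushort)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        }
        return units;
    }

    static void Main()
    {
        // First three byte pairs as they appear in the hex editor.
        byte[] heapBytes = { 0xf3, 0xab, 0x10, 0xab, 0x17, 0xab };
        foreach (ushort unit in ToCodeUnits(heapBytes))
        {
            Console.Write($"{unit:x4} "); // abf3 ab10 ab17
        }
        Console.WriteLine();
    }
}
```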

What this shows is that the #US heap can actually hold values other than just strings. 

To sum things up, I'll quote Serge Lidin from Expert .NET 2.0 IL Assembler, page 77:

#US: A blob heap containing user-defined strings.  This stream contains string constants defined in the user code.  The strings are kept in Unicode (UTF-16) encoding, with an additional trailing byte set to 1 or 0, indicating whether there are any characters with codes greater than 0x007F in the string.  This trailing byte was added to streamline the encoding conversion operations on string objects produced from user-defined string constants.  This stream's most interesting characteristic is that the user strings are never referenced from any metadata table but can be explicitly addressed by the IL code (with the Ldstr instruction).  In addition, being actually a blob heap, the #US heap can store not only Unicode strings but any binary object, which opens some intriguing possibilities.
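As a rough illustration of the layout described in the quote, the sketch below builds the bytes of a single #US entry. Some of this goes beyond what the entry covers and should be treated as assumptions: the one-byte length prefix is my understanding of how short blobs are stored, the trailing flag follows the simplified "greater than 0x007F" rule from the quote, and the names UserStringEntry and Build are made up.

```csharp
using System;
using System.Text;

static class UserStringEntry
{
    // Builds [length prefix][UTF-16LE bytes][trailing flag byte].
    // Assumption: only short strings whose length fits a one-byte prefix.
    public static byte[] Build(string text)
    {
        byte[] utf16 = Encoding.Unicode.GetBytes(text);
        int blobLength = utf16.Length + 1; // payload plus the trailing flag byte
        if (blobLength >= 0x80)
        {
            throw new ArgumentException("Sketch only handles one-byte length prefixes.");
        }

        // Simplified flag rule taken from the quote above.
        byte flag = 0;
        foreach (char c in text)
        {
            if (c > 0x007F) { flag = 1; break; }
        }

        byte[] entry = new byte[1 + blobLength];
        entry[0] = (byte)blobLength;
        Array.Copy(utf16, 0, entry, 1, utf16.Length);
        entry[entry.Length - 1] = flag;
        return entry;
    }

    static void Main()
    {
        byte[] entry = Build("Hello Code Camp7!");
        Console.WriteLine(BitConverter.ToString(entry));
    }
}
```

For the 17-character string used here, the payload is 34 bytes plus the flag byte, so under these assumptions the entry starts with a length byte of 0x23 and ends with a flag of 00.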

posted on Friday, April 06, 2007 10:27 PM

Feedback

# New and Notable 156

A light day out there for the Holiday weekend. LINQ Bart has another series going with The IQueryable
4/7/2007 1:34 PM | Sam Gentile

# Link Listing - April 8, 2007

Sync Services: How to partition data for your offline clients? [Via: Rafik ] First Service Factory v3...
4/8/2007 9:48 PM | Christopher Steen

