Obfuscation: Renaming and Simple Code Removal

by Jason Haley 4. June 2006 12:44

Do you think obfuscation is something that can be done whenever the code is complete? If you do, you are not alone. Most developers seem to treat the obfuscation step of the life cycle much the same way they treat building the installation package (i.e., the MSI): it isn't their problem. This idea of obfuscation being someone else's problem, or just something that will be done once the code is complete, reminds me of how the integration step used to be treated... that is, until Continuous Integration became widespread. It is true that obfuscation can be put off until the code is complete, but chances are it will be a disappointing and painful step.

If you are planning on obfuscating your code, you are most likely looking for some protection against people easily reverse engineering your product. I say “easily” because obfuscation will not protect your code 100%, but it will raise the bar on what it takes to reverse engineer your assemblies. To get the most out of obfuscation, you will need to take some things into account when designing and implementing your code. One question you might be asking as you read this is: why should I (as a developer) care?

To answer why you should care, I am going to start by giving you (hopefully) a better understanding of how to get the most out of obfuscation. This will probably take two or three entries. Once you have an idea of how hard you can make your assemblies to reverse engineer, I’ll get to how making them that hard will affect your code. The first obfuscation technique everyone learns is renaming, so let’s start with it.

Renaming is the process of changing the strings stored in the #Strings heap that belong to the assemblies you are obfuscating. The #Strings heap holds the names used by an assembly (class names, method names, field names, etc.). This covers both items defined in the assembly and items referenced from other assemblies.

You don’t want the obfuscator to edit the strings that belong to referenced assemblies. The loader uses these strings to build the full names of types, and it needs those names to know which assemblies to load and how to find the referenced types.

Some obfuscators have an option that lets you obfuscate an assembly together with its dependent assemblies, as long as the dlls are in the same directory as the exe. This allows you to obfuscate everything, even the public interface of your application and dlls (minus any 3rd party dlls). Now let’s look at the differences between three levels of renaming: No obfuscation, Obfuscate private only, and Obfuscate everything.
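To make the “defined here vs. referenced” rule concrete, here is a toy model in C#. It is only a sketch under my own assumptions:

- `MetadataName`, `ToyRenamer`, and the a, b, c naming scheme are hypothetical.
- A real obfuscator works on the metadata tables, not on a flat list of names.

Locally defined names get short replacements. Referenced names pass through untouched so the loader can still resolve them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model: a name from the #Strings heap, tagged with whether it
// is defined in the assembly being obfuscated or referenced from another
// assembly (for example, System.Object from mscorlib).
public sealed record MetadataName(string Name, bool DefinedLocally);

public static class ToyRenamer
{
    // Maps 0, 1, 2, ... to "a", "b", ..., "z", "aa", "ab", ...
    public static string ShortName(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));
        string result = "";
        index++;
        while (index > 0)
        {
            index--;
            result = (char)('a' + index % 26) + result;
            index /= 26;
        }
        return result;
    }

    // Builds an old-name -> new-name map. Locally defined names get the next
    // short name; referenced names map to themselves so the loader can still
    // find the types and members they point to in other assemblies.
    public static Dictionary<string, string> BuildMap(IEnumerable<MetadataName> names)
    {
        var map = new Dictionary<string, string>();
        int next = 0;
        foreach (var n in names)
        {
            if (map.ContainsKey(n.Name)) continue; // first mapping for a name wins
            map[n.Name] = n.DefinedLocally ? ShortName(next++) : n.Name;
        }
        return map;
    }
}
```

Suppose you feed it `Customer` and `_firstName` (both defined locally) plus `Object` (referenced from mscorlib). It maps them to `a`, `b`, and `Object`.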

The example I am using is an enum and a very simple Customer class that only contains fields and properties. Here is the C# code:

    namespace SouthRain
    {
        public enum Phone {
            Home, Business, Fax, Mobile
        };

        public class Customer
        {
            private string _firstName;
            private string _lastName;
            private string _company;
            private string _address1;
            private string _address2;
            private string _city;
            private string _state;
            private string _zipCode;
            private string _phoneNumber;
            private Phone _phone;

            public string FirstName {
                get { return _firstName; }
                set { _firstName = value; }
            }

            public string LastName {
                get { return _lastName; }
                set { _lastName = value; }
            }

            public string Company {
                get { return _company; }
                set { _company = value; }
            }

            public string Address1 {
                get { return _address1; }
                set { _address1 = value; }
            }

            public string Address2 {
                get { return _address2; }
                set { _address2 = value; }
            }

            public string City {
                get { return _city; }
                set { _city = value; }
            }

            public string State {
                get { return _state; }
                set { _state = value; }
            }

            public string ZipCode {
                get { return _zipCode; }
                set { _zipCode = value; }
            }

            public string PhoneNumber {
                get { return _phoneNumber; }
                set { _phoneNumber = value; }
            }

            public Phone Phone {
                get { return _phone; }
                set { _phone = value; }
            }
        }
    }

No Obfuscation
If you compile the source code and open the dll in Reflector, you'll see something like the image below. Next to the Reflector image is a list of the string heap contents. You can view these in ILDasm by choosing View->MetaInfo->Raw:Heaps and then View->MetaInfo->Show!. The code and strings are all easy to read and understand.

String Heap:  1031(0x407) bytes
00000001: <Module>
0000000a: SouthRain.dll
00000018: Phone
0000001e: SouthRain
00000028: Customer
00000031: mscorlib
0000003a: System
00000041: Enum
00000046: Object
0000004d: value__
00000055: Home
0000005a: Business
00000063: Fax
00000067: Mobile
0000006e: _firstName
00000079: _lastName
00000083: _company
0000008c: _address1
00000096: _address2
000000a0: _city
000000a6: _state
000000ad: _zipCode
000000b6: _phoneNumber
000000c3: _phone
000000ca: get_FirstName
000000d8: set_FirstName
000000e6: get_LastName
000000f3: set_LastName
00000100: get_Company
0000010c: set_Company
00000118: get_Address1
00000125: set_Address1
00000132: get_Address2
0000013f: set_Address2
0000014c: get_City
00000155: set_City
0000015e: get_State
00000168: set_State
00000172: get_ZipCode
0000017e: set_ZipCode
0000018a: get_PhoneNumber
0000019a: set_PhoneNumber
000001aa: get_Phone
000001b4: set_Phone
000001be: .ctor
000001c4: FirstName
000001ce: LastName
000001d7: Company
000001df: Address1
000001e8: Address2
000001f1: City
000001f6: State
000001fc: ZipCode
00000204: PhoneNumber
00000210: System.Reflection
00000222: AssemblyFileVersionAttribute
0000023f: AssemblyVersionAttribute
00000258: System.Runtime.InteropServices
00000277: GuidAttribute
00000285: ComVisibleAttribute
00000299: AssemblyCultureAttribute
000002b2: AssemblyTrademarkAttribute
000002cd: AssemblyCopyrightAttribute
000002e8: AssemblyProductAttribute
00000301: AssemblyCompanyAttribute
0000031a: AssemblyConfigurationAttribute
00000339: AssemblyDescriptionAttribute
00000356: AssemblyTitleAttribute
0000036d: System.Diagnostics
00000380: DebuggableAttribute
00000394: DebuggingModes
000003a3: System.Runtime.CompilerServices
000003c3: CompilationRelaxationsAttribute
000003e3: RuntimeCompatibilityAttribute
00000401: value
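If you would rather script this than click through ILDasm, the following C# sketch walks the #Strings heap with `System.Reflection.Metadata`. It makes these assumptions:

- It targets .NET Core 3.0 or later, where that library ships with the runtime. On the .NET Framework it comes from the System.Reflection.Metadata NuGet package.
- `StringsHeapDumper` is a hypothetical helper name.

The output format only approximates ILDasm's.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

public static class StringsHeapDumper
{
    // Returns every (offset, string) entry in the #Strings heap of the
    // assembly at `path`, in heap order.
    public static List<(int Offset, string Value)> Dump(string path)
    {
        var entries = new List<(int Offset, string Value)>();
        using var stream = File.OpenRead(path);
        using var peReader = new PEReader(stream);
        MetadataReader reader = peReader.GetMetadataReader();

        // Offset 0 holds the empty string; walk forward from there until
        // GetNextHandle reports that there are no more entries.
        StringHandle handle = MetadataTokens.StringHandle(0);
        do
        {
            entries.Add((MetadataTokens.GetHeapOffset(handle), reader.GetString(handle)));
            handle = reader.GetNextHandle(handle);
        } while (!handle.IsNil);

        return entries;
    }

    // Prints the entries in an "offset: string" layout similar to ILDasm's.
    public static void Print(string path)
    {
        foreach (var (offset, value) in Dump(path))
            Console.WriteLine($"{offset:x8}: {value}");
    }
}
```

Calling `StringsHeapDumper.Print("SouthRain.dll")` should print one `offset: string` line per heap entry. That makes it easy to diff the heap before and after an obfuscation run.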

Obfuscate private only
When you obfuscate only the non-public members, you'll notice the code is a little harder to follow, but not much harder for this simple example. Some things to notice:

  • namespace is the same
  • types have the same names
  • field names have been renamed
  • some of the strings in the string heap have been changed
  • the string heap is smaller (973 bytes instead of 1031)

String Heap:  973(0x3cd) bytes
00000001: <Module>
0000000a: System.Reflection
0000001c: AssemblyFileVersionAttribute
00000039: .ctor
0000003f: System.Runtime.InteropServices
0000005e: GuidAttribute
0000006c: ComVisibleAttribute
00000080: AssemblyTrademarkAttribute
0000009b: AssemblyCopyrightAttribute
000000b6: AssemblyProductAttribute
000000cf: AssemblyCompanyAttribute
000000e8: AssemblyConfigurationAttribute
00000107: AssemblyDescriptionAttribute
00000124: AssemblyTitleAttribute
0000013b: System.Runtime.CompilerServices
0000015b: CompilationRelaxationsAttribute
0000017b: RuntimeCompatibilityAttribute
00000199: System
000001a0: Attribute
000001aa: Enum
000001af: Object
000001b6: AttributeUsageAttribute
000001ce: AttributeTargets
000001df: SouthRain.dll
000001ed: mscorlib
000001f6: DotfuscatorAttribute
0000020b: Customer
00000214: SouthRain
0000021e: Phone
00000224: a
00000226: b
00000228: eval_0
0000022f: eval_1
00000236: eval_2
0000023d: eval_3
00000244: eval_4
0000024b: 5
0000024d: eval_6
00000254: eval_7
0000025b: eval_8
00000262: get_FirstName
00000270: set_FirstName
0000027e: value
00000284: get_LastName
00000291: set_LastName
0000029e: get_Company
000002aa: set_Company
000002b6: get_Address1
000002c3: set_Address1
000002d0: get_Address2
000002dd: set_Address2
000002ea: get_City
000002f3: set_City
000002fc: get_State
00000306: set_State
00000310: get_ZipCode
0000031c: set_ZipCode
00000328: get_PhoneNumber
00000338: set_PhoneNumber
00000348: get_Phone
00000352: set_Phone
0000035c: value__
00000364: Home
00000369: Business
00000372: Fax
00000376: Mobile
0000037d: A
0000037f: B
00000381: FirstName
0000038b: LastName
00000394: Company
0000039c: Address1
000003a5: Address2
000003ae: City
000003b3: State
000003b9: ZipCode
000003c1: PhoneNumber
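Why does the heap shrink? Each #Strings entry is stored as UTF-8 followed by a zero terminator byte, so every entry costs its length plus one. The sketch below does that arithmetic for the ten private fields of Customer.

The one-letter replacement names are hypothetical. The listing above shows the names this run actually produced, such as `eval_0`, `a`, and `b`. So this illustrates the mechanism rather than reproducing the 1031 to 973 byte change.

```csharp
using System;
using System.Linq;
using System.Text;

public static class StringsHeapSize
{
    // The ten private fields of the Customer class above.
    public static readonly string[] OriginalFields =
    {
        "_firstName", "_lastName", "_company", "_address1", "_address2",
        "_city", "_state", "_zipCode", "_phoneNumber", "_phone"
    };

    // Hypothetical one-letter replacements a renamer might pick.
    public static readonly string[] RenamedFields =
    {
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j"
    };

    // Bytes the given distinct names occupy in the #Strings heap:
    // UTF-8 bytes plus one zero terminator per entry.
    public static int EntryBytes(string[] names) =>
        names.Sum(n => Encoding.UTF8.GetByteCount(n) + 1);
}
```

For the original names the sum is 92 bytes. That matches the span from offset 0x6e (`_firstName`) to 0xca (`get_FirstName`) in the No Obfuscation listing. The one-letter names need only 20 bytes.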

Obfuscate everything
When you obfuscate everything, you'll notice it is quite a bit harder to follow just by reading. Some things to notice:

  • namespace is gone
  • type names have been renamed
  • field names have been renamed
  • properties are now gone
  • a lot of the strings in the string heap have been changed
  • several strings have been removed from the string heap
  • the string heap is much smaller (622 bytes)

String Heap:  622(0x26e) bytes
00000001: <Module>
0000000a: System.Reflection
0000001c: AssemblyFileVersionAttribute
00000039: .ctor
0000003f: System.Runtime.InteropServices
0000005e: GuidAttribute
0000006c: ComVisibleAttribute
00000080: AssemblyTrademarkAttribute
0000009b: AssemblyCopyrightAttribute
000000b6: AssemblyProductAttribute
000000cf: AssemblyCompanyAttribute
000000e8: AssemblyConfigurationAttribute
00000107: AssemblyDescriptionAttribute
00000124: AssemblyTitleAttribute
0000013b: System.Runtime.CompilerServices
0000015b: CompilationRelaxationsAttribute
0000017b: RuntimeCompatibilityAttribute
00000199: System
000001a0: Enum
000001a5: Object
000001ac: Attribute
000001b6: AttributeUsageAttribute
000001ce: AttributeTargets
000001df: SouthRain.dll
000001ed: mscorlib
000001f6: DotfuscatorAttribute
0000020b: eval_0
00000212: eval_1
00000219: a
0000021b: b
0000021d: eval_2
00000224: eval_3
0000022b: eval_4
00000232: 5
00000234: eval_6
0000023b: eval_7
00000242: eval_8
00000249: A_0
0000024d: eval_5
00000254: value__
0000025c: 0
0000025e: 2
00000260: A
00000262: B
00000264: SouthRain
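The missing namespace and properties are what bite you when code looks things up by name at run time. In this hypothetical sketch (`NameLookups` is my name, and `Customer` is trimmed down):

- `Type.GetType` resolves a namespace-qualified name.
- `Type.GetProperties` lists the properties reflection can see.

Built without obfuscation, `FindType("SouthRain.Customer")` finds the class and `PropertyNames` returns `FirstName` and `LastName`. A run like the one above removes the namespace, renames the type, and removes the properties. After that, the same lookup would return `null` and there would be no properties to find.

```csharp
using System;
using System.Linq;
using System.Reflection;

namespace SouthRain
{
    // Trimmed-down version of the Customer class above.
    public class Customer
    {
        public string FirstName { get; set; } = "";
        public string LastName { get; set; } = "";
    }

    public static class NameLookups
    {
        // Resolves a type from its namespace-qualified name. Without an
        // assembly name, Type.GetType searches the calling assembly and the
        // core library; with throwOnError: false it returns null when no
        // type matches instead of throwing.
        public static Type? FindType(string fullName) =>
            Type.GetType(fullName, throwOnError: false);

        // Lists the public instance property names that reflection can see.
        public static string[] PropertyNames(Type type) =>
            type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Select(p => p.Name)
                .ToArray();
    }
}
```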

What do the examples show you about obfuscation?

  1. Renaming can make the interface difficult to understand
  2. Obfuscating only the private members makes it a little more difficult to reverse, but obfuscating everything is better
  3. Namespaces can be removed
  4. Properties can be removed

Those four items are what I want you to remember from this entry. You should also think about items 3 and 4 and how they could break some of your code. Renaming can be very useful, but maximizing the obfuscation has side effects (think binary serialization, XML serialization, reflection, etc.).
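XML serialization is a good example of those side effects. By default, `XmlSerializer` does two things:

- It builds element names from public property names.
- It writes enum values using their member names.

In this sketch (trimmed-down types; `XmlDemo` is a hypothetical helper), the XML is made of exactly the strings that the "obfuscate everything" run renamed or removed, such as `FirstName`, `Phone`, and `Mobile`. If the format has to stay stable, those members would need to be excluded from renaming.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace SouthRain
{
    public enum Phone { Home, Business, Fax, Mobile }

    // Trimmed-down version of the Customer class above.
    public class Customer
    {
        public string FirstName { get; set; } = "";
        public Phone Phone { get; set; }
    }

    public static class XmlDemo
    {
        // Serializes a Customer; element names come from the property names
        // and the enum value is written as its member name.
        public static string ToXml(Customer customer)
        {
            var serializer = new XmlSerializer(typeof(Customer));
            using var writer = new StringWriter();
            serializer.Serialize(writer, customer);
            return writer.ToString();
        }

        // Reads the XML back; this relies on the names in the XML still
        // matching the property and enum member names of the types.
        public static Customer FromXml(string xml)
        {
            var serializer = new XmlSerializer(typeof(Customer));
            using var reader = new StringReader(xml);
            return (Customer)serializer.Deserialize(reader)!;
        }
    }
}
```

For a customer named Jane with a mobile phone, the output contains `<FirstName>Jane</FirstName>` and `<Phone>Mobile</Phone>`. `FromXml` can read it back only as long as those names still match the types.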

Next time I'll cover string encryption and the #US heap.
