Disassemble vs. Decompile

Being a non-CS major, I always found disassemble and decompile close to the same thing... sort of like burglary and robbery. Tonight I finally decided to see what the fine print said the difference was, and it seems to come down to this: if you are taking a raw file and converting it to a form of assembly language, it is called “disassemble”; if you are taking a raw file and converting it to a higher level language representation, it is “decompile”. I guess this would make Lutz's Reflector tool both a decompiler (VB/C# or any other language you want to add) and a disassembler (MSIL)... Does this sound right to you?

posted on Wednesday, August 31, 2005 5:45 PM

Feedback

# re: Disassemble vs. Decompile

Jason,

While it is common to confuse these two terms, it is also common to use them interchangeably, particularly as you mention "outside of CS" or in environments of less rigor (I'd argue that your blog entitles you to choose the level of rigor in which to discuss these matters).

The traditional forms of these terms are as follows:

Decompile - To convert assembly language to a higher level language.

Disassemble - To convert machine language to assembly language.

Machine Language - Code that is "ready to run", i.e. the native processor of the computer can execute the code without modification; opcodes with operands. This is in binary or hex and is typically not human readable.

Assembly Language - Code that represents machine language. This too is difficult to read, but it typically is mnemonics (for the opcodes) with appropriate operands.
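To make the "disassemble" definition concrete, here is a minimal sketch in Python. The instruction set is made up purely for illustration (the opcodes, the mnemonics and the `disassemble` function are all assumptions, not any real processor or tool). It only shows the idea of mapping opcode bytes back to mnemonics with their operands.

```python
# Hypothetical 8-bit instruction set, invented purely for illustration.
# Each entry maps an opcode byte to (mnemonic, number of operand bytes).
OPCODES = {
    0x01: ("LOAD", 1),
    0x02: ("ADD", 1),
    0x03: ("STORE", 1),
    0xFF: ("HALT", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Translate raw bytes into one line of assembly text per instruction."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode not in OPCODES:
            # Unknown bytes are emitted as raw data rather than guessed at.
            lines.append(f"DB 0x{opcode:02X}")
            i += 1
            continue
        mnemonic, n_operands = OPCODES[opcode]
        operands = code[i + 1 : i + 1 + n_operands]
        if len(operands) < n_operands:
            raise ValueError(f"truncated instruction at offset {i}")
        text = mnemonic
        if operands:
            text += " " + ", ".join(f"0x{b:02X}" for b in operands)
        lines.append(text)
        i += 1 + n_operands
    return lines


# "Machine language": opcodes with operands, as raw bytes.
machine_code = bytes([0x01, 0x0A, 0x02, 0x05, 0x03, 0x10, 0xFF])
for line in disassemble(machine_code):
    print(line)
```

A decompiler would have to go a step further and recover something like a variable assignment from those mnemonics, which is a much harder job than this table lookup.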

In .NET in particular, IL is neither machine language nor assembly language. It is an intermediate language (similar to the high level languages). The JIT compiler is responsible for compiling IL into native (machine language) code (based on the processor/platform). .NET moves from IL to native code by compilation and not through assembly.
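As a loose analogy only (this is Python, not .NET, and the `add` function is just an assumed example): CPython also compiles source code to an intermediate bytecode rather than to native code, and the standard `dis` module can list that bytecode in mnemonic form. The exact opcode names vary between Python versions, so the sketch does not assume any particular listing.

```python
import dis


def add(a, b):
    return a + b


# Collect the mnemonic names of the intermediate bytecode instructions.
# The exact names differ between CPython versions, so none are hard-coded.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)

# The raw bytes behind those mnemonics: the form that is not human readable.
print(add.__code__.co_code.hex())
```

Note that the module calls itself a disassembler even though its input is an intermediate form, which is one more case of the two terms being used loosely.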

Start here for some formal definitions:

http://en.wikipedia.org/wiki/Decompile

If speaking rigorously, many consider Roeder's Reflector to be a decompiler.

((begin rambling))

There are a lot of books from which to choose on CS topics, but here are a few that I think are useful when considering how things are built and consequently how they may be "taken down or taken apart".

CS Theory Book List:

(*)Introduction to the Theory of Computation - Sipser
(*)Introduction to Automata Theory, Languages and Computation - Hopcroft, Ullman
(*)Automata and Computability - Kozen
(-)Elements of the Theory of Computation - Papadimitriou, Lewis

Kozen is the best for beginners (least rigorous, but an absolute pleasure to read). Sipser is best for overall coverage and utility. Hopcroft and Ullman is for the criminally insane (especially the First Edition) and can best be described as corrugated cardboard in the desert during summer, only drier (but I still like it).

CS Compiler book list:

(*)Compilers (The Dragon Book) - Aho, Sethi, Ullman
(-)Compiler Construction: Principles and Practice - Louden
(-)The Art of Compiler Design: Theory and Practice - Pittman, Peters
(-)Optimizing Compilers for Modern Architectures: A Dependence-based Approach - Allen, Kennedy
(*)Engineering a Compiler - Cooper, Torczon
(*)lex & yacc - Levine, Mason, Brown
(*)Advanced Compiler Design and Implementation - Muchnick
(-)Modern Compiler Design - Bal, Grune, Jacobs, Langendoen
(-)Writing Compilers and Interpreters - Mak

Every CS person in the practice of compiler construction has The Dragon Book, although it is getting dated.

(*)A Theory of Objects - Abadi, Cardelli

Dr. Luca Cardelli is a Microsoft Researcher in the Programming Principles and Tools Group. He "hangs" with Don Syme, Andy Gordon, Andrew Kennedy and other monsters of computation that are responsible for the research behind .NET (that monster term is a good thing). This book is useful when debating with "pretenders" that "know" OO.

(*) - I own
(-) - In my amazon cart or to-buy list (don't ask why these are on my list; I have 227 books in my "save for later" area).

If you have to pick one of the above books (non-CS, but with above average aptitude in computation and programming), I'd choose either Kozen (Theory) or The Dragon Book (Practical).

It is important to note that the disassembly and decompilation of programs is computationally difficult. Dr. Charles Simonyi, the Hungarian in Hungarian Notation, formerly of Microsoft fame and now with Intentional Software, has some thoughts here comparing programming with the one-way hard problems of cryptography. I think they are a little convoluted, but interesting nonetheless:

http://blog.intentionalsoftware.com/intentional_software/2005/04/dummy_post_1.html

Eilam's Reversing covers decompilation in Chapter 13. Graph Theory, First Order Logic and other topics are useful in this context, but I've already committed robbery of this post's space.

((end rambling))

Hope this opinion is useful.

---O
8/31/2005 8:33 PM | optionsScalper

# re: Disassemble vs. Decompile

---O:

Thanks for the huge comment, it has tons of good stuff in it! I'll have to put some of those books on my list to get. I like your definitions; they make more sense. For some reason I didn't think of going to Wikipedia. I started reading the Reversing book last night (I'm reading the .NET and decompiling chapters to get ideas for the presentation).

Thanks again for all the info!
9/1/2005 3:42 AM | Jason Haley
