Certain languages like Java and .NET can be easily decompiled into readable source code. Code obfuscation is a process that makes your application binaries harder to read with a decompiler. It’s an important tool for protecting your business’s intellectual property.
Why Obfuscate Code?
Compiled languages like C++ get converted directly to bytecode. The only way to reverse engineer how they work is with a disassembler, which is an arduous and complicated process. It’s not impossible, but trying to infer high level application logic from a stream of assembly language is hard.
On the other hand, languages like C# and Java aren’t compiled for any particular operating system. Rather, they’re compiled to an intermediary language, like .NET’s MSIL. The intermediary language is similar to assembly, but it can be easily converted back into the source code. This means that if you have a public DLL or executable that your business is distributing, anyone with a copy of your executable can open it up in a .NET decompiler like dotPeek, and directly read (and copy) your source code.
Code obfuscation can’t prevent this process—any .NET DLL can be plugged into a decompiler. What obfuscation does do is use a number of tricks to make the source code annoying as hell to read and debug.
The simplest form of this is entity renaming. It’s common practice to properly name variables, methods, classes, and parameters according to what they do. But you don’t have to, and technically there’s nothing stopping you from naming them with a series of lowercase L’s and I’s, or random combinations of Chinese unicode characters. To the computer, there’s no issue, but it’s completely illegible to a human:
An basic obfuscator will handle this process automatically, taking the output from the build, and converting it to something that’s a lot harder to read. There’s no performance hit compared to non-obfuscated code
More advanced obfuscators can go further, and actually change the structure of your source code. This includes replacing control structures with more complicated but semantically identical syntax. They can also insert dummy code that doesn’t do anything except confuse the decompiler. The effect of this is that it makes your source look like spaghetti code, making it more annoying to read.
Another common focus is hiding strings from decompilers. In managed executables, you can search for strings like error messages to locate sections of code. String obfuscation replaces strings with encoded messages, which are decrypted at runtime, making it impossible to search for them from a decompiler. This usually comes with a performance penalty.
There are plenty of options for obfuscators, though it will depend on which language you are obfuscating. For .NET, there’s Obfuscar. For Java, there’s ProGuard. For JavaScript, there’s javascript-obfuscator.
Other Options: Convert to a Compiled Language
Converting one programming language to another isn’t an entirely crazy idea—Unity uses IL2CPP, a converter that transforms .NET code into compiled C++ bytecode. It’s a lot more performant, but it also helps secure games against easy cracking, which is crucial for an environment plagued by piracy and cheaters.
Microsoft has CoreRT, an experimental .NET Core runtime using Ahead-Of-Time compilation, though it isn’t ready for production use.
Should You Obfuscate?
If you’re deploying code in untrusted environments where you want to protect your source code, you should almost always use at least a basic obfuscator to rename functions, methods, and properties to make decompiling take a bit more effort.
If you really need nobody to be able to decompile your app, you can use a more intrusive obfuscator, but really you should consider if the problem would be better solved by switching to a language that doesn’t have this issue, such as C++ or Rust.