Reverse Engineering
Reverse engineering is the process of analyzing a program or system to understand its structure, function, and behavior.
It often means recovering the original code/system using disassembly, decompilation, or similar methods.
It can be used to understand how malware works 🛡️ or to find vulnerabilities in a program/system in a black-box assessment 💥.
Practice
- crackmes (binaries to crack, 👻)
For simple programs, we might be able to get the information we need using basic tools such as:
- strace: see every system call
- strings: extract every readable string, may not be installed
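If strings is not installed, a rough equivalent can be sketched in Python. This is a minimal approximation (printable ASCII runs only, default minimum length of 4), not a reimplementation of the real tool; the function name extract_strings is made up for this example.

```python
import re

def extract_strings(data, min_len=4):
    """Return every run of printable ASCII characters of at least min_len bytes."""
    # 0x20-0x7e is the printable ASCII range (space to tilde)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Demo on an in-memory buffer; for a real file, pass open(path, "rb").read()
sample = b"\x00\x01hello\x00ab\x00world!\xff"
for s in extract_strings(sample):
    print(s)
```

Short runs such as "ab" are dropped because they are below the minimum length, which is also how noise is usually filtered out.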
For a preliminary analysis of your executable:
- Using the file command on Linux
$ file some_executable
- Using Detect-It-Easy on Windows
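Both tools rely largely on magic bytes at the start of the file. As a rough sketch of the idea, the snippet below checks a handful of well-known signatures. The table is deliberately tiny and the names MAGIC_SIGNATURES and guess_file_type are made up for this example; real tools know far more formats.

```python
# A few well-known magic numbers (not exhaustive)
MAGIC_SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"MZ": "PE/DOS executable",
    b"PK\x03\x04": "ZIP archive (JAR, APK, ...)",
    b"\xca\xfe\xba\xbe": "Java class file (or Mach-O fat binary)",
    b"dex\n": "Android DEX file",
}

def guess_file_type(header):
    """Guess a file type from its first bytes; return 'unknown' if nothing matches."""
    for magic, description in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return description
    return "unknown"

# For a real file: guess_file_type(open(path, "rb").read(8))
print(guess_file_type(b"\x7fELF\x02\x01\x01\x00"))
```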
Java Reverse Engineering
JAR application
You can extract a JAR archive using archive tools or:
$ jar xf xxx.jar
You can also create a JAR archive using archive tools or:
$ jar -cvf ../xxx.war *
$ jar -cmf ./META-INF/MANIFEST.MF ../xxx.jar *
If you plan to edit the JAR, you may have to remove every checksum from the MANIFEST.MF along with the .RSA/.SF files to bypass integrity checks. ⚠️ Note that MANIFEST.MF must end with a blank line.
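The checksum removal can be sketched with the standard zipfile module. This is a simplified approach under stated assumptions: it keeps only the main section of the manifest (everything before the first blank line) and drops every per-entry section, which also discards any non-digest attribute stored there. The names strip_manifest_digests and unsign_jar are made up for this example.

```python
import zipfile

# Signature-related files found in META-INF/ of a signed JAR
SIGNATURE_SUFFIXES = (".SF", ".RSA", ".DSA", ".EC")

def strip_manifest_digests(manifest):
    """Keep only the main manifest section and end it with a blank line."""
    normalized = manifest.replace("\r\n", "\n")
    main_section = normalized.split("\n\n", 1)[0].strip("\n")
    return main_section + "\n\n"

def unsign_jar(src, dst):
    """Copy src to dst (paths or file-like objects) without signature data."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            upper = item.filename.upper()
            if upper.startswith("META-INF/") and upper.endswith(SIGNATURE_SUFFIXES):
                continue  # skip the signature files
            data = zin.read(item.filename)
            if upper == "META-INF/MANIFEST.MF":
                data = strip_manifest_digests(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item.filename, data)
```

For instance, unsign_jar("xxx.jar", "xxx-unsigned.jar") would write a copy that no longer carries the signature files.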
Before you modify a file, you need to create a raw copy in which you will inject .class before bundling them back to a JAR:
$ mkdir raw && cd raw
$ jar xvf ../xxx.jar && cd ..
Now, you can edit a file and transfer its compiled .class to "raw":
$ javac -cp xxx.jar source_code/path/to/file.java
$ cp -r source_code/path/to/*.class raw/path/to/
Once you are done, pack "raw" back to a JAR:
$ jar -cmf raw/META-INF/MANIFEST.MF xxx.jar -C raw .
Java jd - Decompiler
You can use jd-gui (13.4k ⭐) to reverse a Java application. Run jd-gui, and load the JAR in it. You can then either:
- Explore the reversed sources from jd-gui
- Use File > Save all sources and read/modify them in your editor
Java - Other Decompilers
Other well-known decompilers:
- Recaf (5.5k ⭐)
- JetBrains IntelliJ (16.4k ⭐)
.NET Reverse Engineering
You can work on reversing .NET executable and DLL files using the following tools...
De4Dot - .NET Decompiler
You can use de4dot (6.7k ⭐, 2020 🪦) to reverse your binary. On Windows, drag and drop your binary onto the de4dot executable.
dnSpy - .NET Debug/Editor
You can use dnSpy (25.2k ⭐, 2020 🪦) to explore .NET source code.
dotPeek - .NET Decompiler/Editor
You can use JetBrains dotPeek (free 🐲) on Windows to reverse your binary and explore the source code. Opening the file will automatically load the .NET solutions contained in the executable.
ILSpy - .NET Decompiler/Editor
ILSpy (19.9k ⭐) is the most popular open-source .NET decompiler. It can be used as a standalone application or integrated in editors such as VSCode.
On Linux, you can use the AvaloniaILSpy (1.4k ⭐) port.
$ cd /tmp
$ # download a release at https://github.com/icsharpcode/AvaloniaILSpy/releases
$ unzip Linux.x64.Release.zip && unzip ILSpy-linux-x64-Release.zip
$ mkdir -p $HOME/tools/ && mv artifacts/linux-x64/ $HOME/tools/AvaloniaILSpy
$ rm -rf Linux.x64.Release.zip ILSpy-linux-x64-Release.zip artifacts # cleanup
$ ln -s $HOME/tools/AvaloniaILSpy/ILSpy $HOME/.local/bin/ILSpy
$ ILSpy # run
Opening the executable will automatically load its .NET solutions. You can save an entire reversed solution or specific files by right-clicking on the target you want to save and selecting 'Save Code.'
Python Reverse Engineering
Python bytecode files (.pyc, or .pyo for optimized bytecode) contain compiled, cross-platform code that the Python interpreter can execute.
For Python <= 3.11, you can use pycdc (2.6k ⭐).
$ git clone https://github.com/zrax/pycdc.git && cd pycdc
$ mkdir build && cd build && cmake .. && make -j $(nproc)
$ ./pycdc ./xxx.pyc
For Python <= 3.8, you can use uncompyle6 (3.5k ⭐):
$ pipx install git+https://github.com/rocky/python-uncompyle6
$ uncompyle6 xxx.pyc
If your Python version and the target Python version are the same, you can use the builtin dis module:
import dis
import marshal
with open('xxx.pyc', 'rb') as f:
    f.seek(16)  # skip the .pyc header (16 bytes since Python 3.7)
    dis.dis(marshal.load(f))
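To check whether the versions match before disassembling, we can look at the .pyc header. The sketch below assumes the CPython 3.7+ layout (4-byte magic number, 4-byte flags, then 8 more bytes) and compares the magic number with the one of the running interpreter. The function name read_pyc_header is made up for this example.

```python
import importlib.util
import struct

def read_pyc_header(data):
    """Parse the first bytes of a .pyc file (CPython 3.7+ layout assumed)."""
    magic = data[:4]
    flags = struct.unpack("<I", data[4:8])[0]
    return {
        "magic": magic,
        # True if the file was compiled by the same bytecode version as ours
        "matches_running_interpreter": magic == importlib.util.MAGIC_NUMBER,
        # Bit 0 of the flags is set for hash-based .pyc files
        "hash_based": bool(flags & 0x1),
    }

# For a real file: read_pyc_header(open("xxx.pyc", "rb").read(16))
```

If matches_running_interpreter is False, marshal.load will most likely fail or return garbage, and a decompiler such as pycdc is the better option.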
Alternatively, Python code can be bundled using PyInstaller into an executable that doesn't require a Python installation.
You can extract its contents using pyinstxtractor (2.3k ⭐).
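Before running the extractor, we can roughly check whether an executable looks like a PyInstaller bundle by searching for the archive cookie. The cookie value below is, to my knowledge, the one pyinstxtractor searches for; treat it as an assumption, and note that the helper name find_pyinstaller_cookie is made up for this example.

```python
# Magic cookie expected near the end of a PyInstaller executable (assumed value)
PYINSTALLER_COOKIE = b"MEI\x0c\x0b\x0a\x0b\x0e"

def find_pyinstaller_cookie(data):
    """Return the offset of the last cookie occurrence, or -1 if absent."""
    return data.rfind(PYINSTALLER_COOKIE)

# For a real file: find_pyinstaller_cookie(open("xxx.exe", "rb").read())
```

A result of -1 suggests the binary was not built with PyInstaller, or that the cookie was altered.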
Android Reverse Engineering
APK files contain multiple files such as classes.dex and other .dex files, resources, the manifest and certificate files, etc.
It's possible for code to exist within a DEX file but not be detected or reversed by tools such as JADX.
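Since an APK is a ZIP archive, we can list its DEX files without any Android tooling. A minimal sketch using the standard zipfile module; the function name list_dex_entries is made up for this example.

```python
import zipfile

def list_dex_entries(apk):
    """Return (name, uncompressed size) for every .dex entry in the APK."""
    with zipfile.ZipFile(apk) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.endswith(".dex")
        ]

# For a real file: list_dex_entries("example.apk")
```

Each entry can then be extracted with ZipFile.extract and passed to a DEX-aware tool.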
JADX — APK+Dex Decompiler/Disassembler
You can use jadx (38.5k ⭐) to decompile APK and DEX files to Java.
$ sudo apt install -y jadx
You can use either the CLI or the GUI:
$ jadx $(pwd)/basic_rev.apk -d $(pwd)/out
$ jadx-gui # and open your file
Android Studio For Reversed Code
You may open decompiled files in Android Studio. Create a new project, put your files inside (in java/ and res/), apply fixes if prompted, remove the automatically generated R.java, and run the app.
🐲 Android Studio has a DexViewer which you can use to see if your Dex file contains hidden methods.
Android Boot Image Unpacker
You can use mkbootimg developed by Google (unpack_bootimg.py):
$ sudo apt install -y mkbootimg
$ unpack_bootimg --boot_img boot.img
$ cd out && gunzip -c ramdisk | cpio -idmv
Additional tools: unpackbootimg (0.2k ⭐), mkbootimg_tools (0.5k ⭐, 2016 🪦) or abootimg (0.1k ⭐, 2012 🪦).
Additional Notes
- The dexdump tool lists the methods/classes in a DEX file
- The 010editor hex editor is a paid tool to analyze a DEX file
- bytecode-viewer (14.3k ⭐, 👻)
- Apktool (18.6k ⭐, 👻)
- dex2jar (11.8k ⭐, 👻)
- intro-to-mobile-pentesting
androguard — APK+Dex Explorer/Disassembler
androguard (4.9k ⭐) is a powerful Python tool to explore APK/Dex files, but there is almost no documentation.
$ sudo apt install -y androguard
$ androguard analyze ./example.apk # or directly classes.dex
$ androguard analyze
prompt> from androguard.misc import AnalyzeAPK
prompt> a, d, dx = AnalyzeAPK("./example.apk")
prompt> d = DalvikVMFormat(a) # If 'd' is "empty"...
List every class in the DEX:
classes = [c for c in dx.get_classes() if not c.external] ; classes
List every method (look for methods not found by JADX):
for i, m in enumerate(d.methods.methods):
    print(m.get_class_name()+m.get_name(), 'has method idx=', i, '; hex=', hex(i))
You can detect code not associated with a method using:
known_offsets = [m.get_code().offset for m in d.get_methods() if m.get_code()]
for c in d.get_codes_item().code:
    if c.offset not in known_offsets:
        print("No method associated with code offset:", c.offset, "; hex=", hex(c.offset))
# View The ByteCode Given A Suspicious Offset
d.get_codes_item().get_code(0xffff).show()
We can declare the hidden method as a virtual method, rebuild the DEX, and decompile it again using another tool.
- Increase the number of virtual methods by 1
- Write the method IDX, 0x1 for public, and the code offset to the file. All values are in hexadecimal and uleb128-encoded.
- Compute the SHA-1 of the last file_size - 32 bytes (file[32:]) and write it at index 12
- Compute the adler32 checksum of file[12:] and write it at index 8
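The last two steps can be double-checked after patching. The sketch below recomputes both values from the offsets listed above (checksum at index 8, SHA-1 at index 12) and compares them with what is stored in the header. The function name check_dex_header is made up for this example.

```python
import hashlib
import struct
import zlib

def check_dex_header(data):
    """Compare the stored adler32 checksum and SHA-1 with recomputed values."""
    stored_checksum = struct.unpack_from("<I", data, 8)[0]  # little-endian uint32
    stored_signature = data[12:32]                          # 20-byte SHA-1
    return {
        "checksum_ok": stored_checksum == (zlib.adler32(data[12:]) & 0xFFFFFFFF),
        "signature_ok": stored_signature == hashlib.sha1(data[32:]).digest(),
    }

# For a real file: check_dex_header(open("classes.dex", "rb").read())
```

The SHA-1 must be written before the checksum, since the checksum covers the bytes where the SHA-1 is stored.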
Missing Additional Notes FixMePlease
A few more parameters are required to declare a virtual method. Assuming you have access to a class object (cf. classes), you need to find where the number of virtual methods is stored.
virtual_methods_index = hex(aClass.get_class().class_data_item.get_off() + 3)
The class_data_item contains four values. The last one (notice the +3) is the number of virtual methods.
The second step is to find where we can add virtual method reference, e.g. virtual_methods_block_index. I don't know how to do that programmatically.
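The four values at the start of the class_data_item are uleb128-encoded, so the +3 shortcut only holds when each of the first three values fits in a single byte (below 128). A more careful approach is to decode them one by one. This is a minimal sketch; the names decode_uleb128 and read_class_data_sizes are made up for this example, and offset is assumed to be the class_data_item offset.

```python
def decode_uleb128(data, offset=0):
    """Decode one uleb128 value; return (value, offset of the next byte)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # the low 7 bits carry the payload
        if byte & 0x80 == 0:             # the high bit marks a continuation
            return value, offset
        shift += 7

def read_class_data_sizes(data, offset):
    """Read the four counts stored at the start of a class_data_item."""
    names = ("static_fields", "instance_fields", "direct_methods", "virtual_methods")
    sizes = {}
    for name in names:
        sizes[name], offset = decode_uleb128(data, offset)
    return sizes
```

The offset returned after the third value is where the number of virtual methods actually starts, whatever the size of the previous values.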
Python Code Samples
def update_sha1(input_file):
    import hashlib
    with open(input_file, 'rb+') as f:
        f.seek(0, 2)
        file_size = f.tell()
        f.seek(32)
        sha1 = hashlib.sha1(f.read(file_size - 32)).hexdigest()
        f.seek(12)
        f.write(bytes.fromhex(sha1))
        print("SHA1:", sha1)
def update_checksum(input_file):
    import zlib
    with open(input_file, 'rb+') as f:
        f.seek(12)
        checksum = '{:08x}'.format(zlib.adler32(f.read()) & 0xFFFFFFFF)
        f.seek(8)
        f.write(int(checksum, 16).to_bytes(4, byteorder='little'))
        print("Checksum: ", int(checksum, 16))
def encode_uleb128(value):
    encoded_bytes = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value != 0:
            byte |= 0x80
        encoded_bytes.append(byte)
        if value == 0:
            break
    return int.from_bytes(encoded_bytes, byteorder='little')
def modify_virtual_methods_size(input_file):
    virtual_methods_index = 0x3988
    virtual_methods_block_index = 0x39b1
    method_idx = 0x22
    method_access = 0x1
    method_offset = 0x1fcc
    with open(input_file, 'rb+') as f:
        # Determine The Current Value
        f.seek(virtual_methods_index)
        virtual_methods_size = int.from_bytes(f.read(1), 'little')
        # Add One
        f.seek(virtual_methods_index)
        virtual_methods_size += 1
        f.write(virtual_methods_size.to_bytes(1, 'little'))
        print("Virtual Methods Size:", virtual_methods_size)
        # Add The Given Method
        f.seek(virtual_methods_block_index)
        f.write(encode_uleb128(method_idx).to_bytes(1, 'little'))
        f.write(encode_uleb128(method_access).to_bytes(1, 'little'))
        f.write(encode_uleb128(method_offset).to_bytes(2, 'little'))
Reversing Binaries On Linux
Linux ObjDump Disassembly
The most basic disassembler:
$ objdump -D xxx.bin -M intel
Linux GDB Disassembly
$ git clone https://github.com/longld/peda.git ~/peda
$ echo "source ~/peda/peda.py" >> ~/.gdbinit
$ echo "set disassembly-flavor intel" >> ~/.gdbinit
$ gdb -q xxx.bin
(gdb) disas main
(gdb) # refer to GDB documentation
(gdb) run
Linux Tracers
The Linux commands strace and ltrace identify the system calls and library calls made by a program, which helps when reversing it.
$ strace xxx.bin # -f | -e open/... | -s 1000 | -y
syscall(args) = return_code
...
$ ltrace xxx.bin
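When the trace is long, it can help to parse the syscall(args) = return_code lines and filter them in a script. The sketch below is a rough parser for the default output format only; lines with a PID prefix (strace -f) or unfinished calls are not handled and return None. The names STRACE_LINE and parse_strace_line are made up for this example.

```python
import re

# name(args) = return value (hex address, decimal number, or '?')
STRACE_LINE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>0x[0-9a-f]+|-?\d+|\?)"
)

def parse_strace_line(line):
    """Return a dict with name, args, and ret, or None if the line does not match."""
    match = STRACE_LINE.match(line.strip())
    return match.groupdict() if match else None

sample = 'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3'
print(parse_strace_line(sample))
```

For instance, keeping only the entries whose name is openat quickly shows which files the program tries to open.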
Boomerang
boomerang (0.4k ⭐) is somewhat able to reverse x86 binaries into a barely readable, uncompilable C file.
Docker Installation
Save the code below in a Dockerfile and run docker build -t boomrangcli:latest . to build the Docker image.
FROM ubuntu:22.04
# From https://github.com/BoomerangDecompiler/boomerang#building-on-linux
# [CHANGE] qt5-default => libqt5core5a libqt5gui5 libqt5widgets5 qtbase5-dev
RUN apt-get update && \
apt-get install -y git build-essential cmake \
qtbase5-dev libqt5core5a libqt5gui5 libqt5widgets5 \
libcapstone-dev flex bison
# [CHANGE] Used /opt
WORKDIR /opt
RUN git clone https://github.com/BoomerangDecompiler/boomerang.git
WORKDIR /opt/boomerang/build
RUN cmake .. && make -j$(nproc) && make install
# Remove the build folder
RUN rm -rf /opt/boomerang
# Don't run the tool as root
RUN useradd -ms /usr/sbin/nologin boomerang
WORKDIR /builds/
RUN chown -R boomerang:boomerang /builds/
USER boomerang
ENTRYPOINT ["/usr/local/bin/boomerang-cli"]
For instance, to decompile ch1.bin:
$ docker run -it -v $(pwd):/builds boomrangcli:latest ch1.bin
Linux Radare Disassembly
Radare (19.4k ⭐) is similar to GDB, but it is somewhat easier to use if we only need to disassemble the code.
$ rabin2 -I xxx.bin # get information
$ rabin2 -z xxx.bin # list strings
$ r2 -d xxx.bin # Open in debug mode (if applicable)
$ r2 -A xxx.bin # Open and analyze (aaa)
$ r2 -qcizz xxx.bin
(r2) a? # help for analyze
(r2) aaa # analyze
(r2) vv # view disassembly, symbols, etc.
(r2) VV # view the program flow
(r2) afl # list functions, can grep
(r2) pdf @main # disassemble 'main' (@sym.main)
(r2) oo # reload executable
(r2) db 0xAABBCCDD # breakpoint
(r2) dc # continue execution until the next breakpoint
(r2) ds # run one instruction
(r2) px @ address # display the memory at address
(r2) dr # display register values
📚 When using pdf such as pdf @main, we can see a list of variables and their addresses. We can pass these addresses to px.
Reversing Binaries On Windows
While still in progress, look here for PE notes.
Windows x64dbg debugger
You can use x64dbg (42.9k ⭐) to debug binaries.
- You can navigate to options to define the breakpoints. For instance, uncheck everything except Exit Breakpoint.
- The memory map tab can be used to find stuff like memory-mapped files (a file mapped to a memory region like a buffer).
- Double-click on an entry to see its bytes
- You may recognize a file from the magic code bytes
- Right-click on an address to dump its contents to a file
- Use 'Search for > Current Module > String references' to see strings and their address. You can double-click on an address to navigate to it (see also: Right-click > 'Toggle Breakpoint.')
- Place the cursor on an instruction, and press 'Spacebar' to edit it.
- Use CTRL+P to save the patched instructions.
➡️ See also: x64dbg unpack malware and OllyDbg.
Additional Tools
- WinDBG ( 👻)
Reversing Binaries On Any Platform
dogbolt Online Decompiler Explorer
dogbolt (1.8k ⭐) lets you quickly test your binary against many decompilers. It's quite handy during CTFs, but it has an implicit binary sharing policy and legal restrictions on private instances.
IDA Decompiler & Disassembler
IDA Pro is the most widely used and best-known disassembler and decompiler, but it is paid. You can use the limited free version:
$ wget https://out7.hex-rays.com/files/idafree84_linux.run
$ chmod +x idafree84_linux.run
$ ./idafree84_linux.run
$ # assuming you installed it in $HOME/tools/
$ ln -s $HOME/tools/idafree-8.4/ida64 $HOME/.local/bin/ida
$ ida xxx.bin
Additional Notes
- Press F5 to use the third-party x64 free cloud decompiler.
- → denotes a jump while ---→ denotes a conditional instruction
Binary Ninja
Binary Ninja is a paid decompiler and disassembler. You can use the limited free version:
$ mkdir -p $HOME/tools/ && cd $HOME/tools/
$ wget https://cdn.binary.ninja/installers/BinaryNinja-free.zip
$ unzip BinaryNinja-free.zip && rm BinaryNinja-free.zip
$ cd && ln -s $HOME/tools/binaryninja/binaryninja $HOME/.local/bin/binaryninja
Additional Tools
Firmware Reversing And Analysis
Encrypted Firmware
Using binwalk, you might be able to know if the firmware was encrypted, for instance, using gpg.
$ binwalk -E -N firmware.bin
📚 Previous versions may contain exposed encryption data.
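The -E option of binwalk performs an entropy analysis: encrypted or compressed data looks random, so its entropy is close to 8 bits per byte. The same measure can be sketched in a few lines of Python. The function name shannon_entropy is made up for this example, and any threshold you pick for "probably encrypted" is a judgment call.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum of -p * log2(p) over the frequency p of each byte value
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

# For a real file: shannon_entropy(open("firmware.bin", "rb").read())
```

Computing the value per block (for instance every few kilobytes) instead of over the whole file shows which regions are likely encrypted and which contain plain headers or strings.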
Firmware Extraction
- Firmware ModKit (FMK) (0.8k ⭐): it uses binwalk to extract the filesystem. Can repack the modified firmware.
$ sudo apt install -y firmware-mod-kit
$ /opt/firmware-mod-kit/trunk/extract-firmware.sh firmware.bin
$ cd fmk/rootfs/gpg/ # find keys, and crack the passphrase
- firmwalker (1.0k ⭐): search for juicy files in the firmware
Additional Notes
Code Obfuscation
Moved to code obfuscation.
C Language Notes
Random notes for CTFs:
- scanf("XXX%s", &s): must input XXX<input>; only input is stored.
- strcmp("abc", "zyx"): returns 0, 1, or -1. But, strcmp(var, "xxx") returns the ordinal difference of the first different char.
👻 To-do 👻
Stuff that I found, but never read/used yet.
Where to learn?
Tools
Note
- A program creating a file then deleting it => Disable inheritance, Convert to explicit, Select user > Edit > Show advanced permissions, uncheck both 'delete' to prevent them from removing the file and allow us to read it.
- Ghidra
sudo apt install ghidra -y
ghidra
> new project
> import file
> analyze
Windows
> Decompile: Main
> Functions
> Defined Strings
StackView
> Double-click on a variable to open