Reverse Engineering
Reverse engineering is the process of analyzing a program or system to understand its structure, function, and behavior.
It often means recovering the original code/system using disassembly, decompilation, or similar methods.
It can be used to understand how malware works 🛡️ or to find vulnerabilities in a program/system during a black-box assessment 💥.
Practice
- crackmes (binaries to crack, 👻)
For simple programs, we might be able to get the information we need using basic tools such as:
- strace: see every system call
- strings: extract every readable string, may not be installed
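When strings is not installed, its core behavior can be approximated in a few lines of Python. This is a minimal sketch, assuming we only care about printable ASCII runs of at least four characters; the function name `extract_strings` is hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least min_len printable ASCII characters."""
    # 0x20-0x7e is the printable ASCII range (space to tilde)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Demo on an in-memory buffer; for a real binary, pass open(path, "rb").read()
print(extract_strings(b"\x00hello\x01ab\x00world!\xff"))
```

Short runs such as "ab" are dropped, which keeps most of the noise out of the output.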
For a preliminary analysis of your executable:
- Using the file command on Linux
$ file some_executable
- Using Detect-It-Easy on Windows
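Both tools rely largely on magic bytes at the start of the file. The sketch below illustrates the idea for a handful of well-known signatures; the table and the `identify` helper are illustrative assumptions, far less complete than what file or Detect-It-Easy actually check.

```python
# A few well-known magic numbers (non-exhaustive)
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable"),
    (b"MZ", "PE/DOS executable"),
    (b"PK\x03\x04", "ZIP archive (JAR, APK, ...)"),
    (b"dex\n", "Android DEX file"),
    (b"\xca\xfe\xba\xbe", "Java class file (or Mach-O fat binary)"),
]


def identify(header: bytes) -> str:
    """Guess a file type from the first bytes of a file."""
    for magic, description in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


# For a real file: identify(open(path, "rb").read(16))
print(identify(b"\x7fELF\x02\x01\x01"))
```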
Java Reverse Engineering
JAR application
You can extract a JAR archive using archive tools or:
$ jar xf xxx.jar
You can also create a JAR archive using archive tools or:
$ jar -cvf ../xxx.war *
$ jar -cmf ./META-INF/MANIFEST.MF ../xxx.jar *
If you plan to edit the JAR, you may have to remove every checksum from the MANIFEST.MF, along with the .RSA/.SF files, to bypass integrity checks. ⚠️ Note that MANIFEST.MF must end with a blank line.
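The cleanup can be scripted. The following is a rough sketch, assuming it is acceptable to keep only the main section of MANIFEST.MF (the per-entry sections are where the digests live) and to drop every .SF/.RSA/.DSA/.EC file under META-INF. The function names are hypothetical.

```python
import zipfile

SIGNATURE_SUFFIXES = (".SF", ".RSA", ".DSA", ".EC")


def strip_manifest_digests(manifest: bytes) -> bytes:
    """Keep only the main manifest section and end it with a blank line."""
    text = manifest.decode("utf-8").replace("\r\n", "\n")
    main_section = text.split("\n\n", 1)[0].rstrip("\n")
    return (main_section + "\n\n").encode("utf-8")


def strip_signatures(src, dst) -> None:
    """Copy a JAR without its signature files and manifest digests.

    src and dst may be paths or binary file objects.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            name = info.filename
            upper = name.upper()
            if upper.startswith("META-INF/") and upper.endswith(SIGNATURE_SUFFIXES):
                continue  # drop the signature files
            data = zin.read(name)
            if upper == "META-INF/MANIFEST.MF":
                data = strip_manifest_digests(data)
            zout.writestr(name, data)
```

For instance, `strip_signatures("xxx.jar", "xxx-unsigned.jar")` would write a copy that no longer carries the signature entries.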
Before you modify a file, create a raw copy of the JAR contents into which you will inject the recompiled .class files before bundling everything back into a JAR:
$ mkdir raw && cd raw
$ jar xvf ../xxx.jar && cd ..
Now, you can edit a file and transfer its compiled .class to "raw":
$ javac -cp xxx.jar source_code/path/to/file.java
$ cp -r source_code/path/to/*.class raw/path/to/file
Once you are done, pack "raw" back to a JAR:
$ jar -cmf raw/META-INF/MANIFEST.MF xxx.jar -C raw .
Java jd - Decompiler
You can use jd-gui (13.4k ⭐) to reverse a Java application. Run jd-gui, and load the JAR in it. You can then either:
- Explore the reversed sources from jd-gui
- Use File > Save all sources and read/modify them in your editor
Java - Other Decompilers
Other well-known decompilers:
- Recaf (5.5k ⭐)
- JetBrains IntelliJ (16.4k ⭐)
.NET Reverse Engineering
You can work on reversing .NET executable and DLL files using the following tools...
De4Dot - .NET Decompiler
You can use de4dot (6.7k ⭐, 2020 🪦), a .NET deobfuscator and unpacker, to clean up your binary before reversing it. On Windows, drag and drop your binary onto the de4dot executable.
dnSpy - .NET Debug/Editor
You can use dnSpy (25.2k ⭐, 2020 🪦) to explore .NET source code.
dotPeek - .NET Decompiler/Editor
You can use JetBrains dotPeek (free 🐲) on Windows to reverse your binary and explore the source code. Opening the file will automatically load the .NET solutions contained in the executable.
ILSpy - .NET Decompiler/Editor
ILSpy (19.9k ⭐) is the most popular open-source .NET decompiler. It can be used from editors such as VSCode or as a standalone application.
On Linux, you can use the AvaloniaILSpy (1.4k ⭐) port.
$ cd /tmp
$ # download a release at https://github.com/icsharpcode/AvaloniaILSpy/releases
$ unzip Linux.x64.Release.zip && unzip ILSpy-linux-x64-Release.zip
$ mkdir -p $HOME/tools/ && mv artifacts/linux-x64/ $HOME/tools/AvaloniaILSpy
$ rm -rf Linux.x64.Release.zip ILSpy-linux-x64-Release.zip artifacts # cleanup
$ ln -s $HOME/tools/AvaloniaILSpy/ILSpy $HOME/.local/bin/ILSpy
$ ILSpy # run
Opening the executable will automatically load its .NET solutions. You can save an entire reversed solution or specific files by right-clicking on the target you want to save and selecting 'Save Code.'
Python Reverse Engineering
Python bytecode files (.pyc, or .pyo for optimized bytecode) contain compiled, cross-platform code that python can execute.
For Python <= 3.11, you can use pycdc (2.6k ⭐):
$ git clone https://github.com/zrax/pycdc.git && cd pycdc
$ mkdir build && cd build && cmake .. && make -j $(nproc)
$ ./pycdc ./xxx.pyc
For Python <= 3.8, you can use uncompyle6 (3.5k ⭐):
$ pipx install git+https://github.com/rocky/python-uncompyle6
$ uncompyle6 xxx.pyc
If your python version and the target python version are the same, you can use the builtin dis module:
import dis
import marshal
with open('xxx.pyc', 'rb') as f:
    f.seek(16)  # skip the 16-byte .pyc header (Python >= 3.7)
    dis.dis(marshal.load(f))
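Whether the two versions match can be checked from the first four bytes of the file, which hold the interpreter's magic number. Below is a small sketch, assuming the Python >= 3.7 layout (a 16-byte header followed by the marshalled code object); `load_pyc_code` is a hypothetical helper.

```python
import importlib.util
import marshal
import types

PYC_HEADER_SIZE = 16  # magic (4) + flags (4) + mtime/size or hash (8)


def load_pyc_code(data: bytes) -> types.CodeType:
    """Return the code object from .pyc bytes built by this interpreter."""
    magic = data[:4]
    if magic != importlib.util.MAGIC_NUMBER:
        raise ValueError(
            f"magic {magic.hex()} does not match this interpreter "
            f"({importlib.util.MAGIC_NUMBER.hex()})"
        )
    return marshal.loads(data[PYC_HEADER_SIZE:])
```

The returned code object can be passed to `dis.dis`. A `ValueError` means another interpreter version (or a decompiler such as pycdc) is needed.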
Alternatively, python code can be bundled with PyInstaller into an executable that doesn't require the python engine.
You can extract its contents using pyinstxtractor (2.3k ⭐).
Android Reverse Engineering
APK files contain multiple files such as classes.dex and other .dex files, resources, the manifest and certificate files, etc.
It's possible for code to exist within a DEX file without being detected or reversed by tools such as JADX.
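Since an APK is a ZIP archive, its DEX files can be listed without any Android tooling. This is a minimal sketch using only the standard library; `list_dex_entries` is a hypothetical name.

```python
import zipfile


def list_dex_entries(apk) -> list[tuple[str, int]]:
    """Return (name, uncompressed size) for every .dex entry in an APK.

    apk may be a path or a binary file object.
    """
    with zipfile.ZipFile(apk) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.endswith(".dex")
        ]
```

Calling `list_dex_entries("basic_rev.apk")` gives a quick view of how many DEX files need to be inspected.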
JADX — APK+Dex Decompiler/Disassembler
You can use jadx (38.5k ⭐) to decompile APK and DEX files to Java.
$ sudo apt install -y jadx
You can use either the CLI or the GUI:
$ jadx $(pwd)/basic_rev.apk -d $(pwd)/out
$ jadx-gui # and open your file
Android Studio For Reversed Code
You may open decompiled files in Android Studio. Create a new project, put your files inside (in java/ and res/), apply fixes if prompted, remove the automatically generated R.java, and run the app.
🐲 Android Studio has a DexViewer which you can use to see if your Dex file contains hidden methods.
Android Boot Image Unpacker
You can use mkbootimg developed by Google (unpack_bootimg.py):
$ sudo apt install -y mkbootimg
$ unpack_bootimg --boot_img boot.img
$ cd out && gunzip -c ramdisk | cpio -idmv
Additional tools: unpackbootimg (0.2k ⭐), mkbootimg_tools (0.5k ⭐, 2016 🪦) or abootimg (0.1k ⭐, 2012 🪦).
Additional Notes
- dexdump lists the methods/classes in a DEX file
- The 010editor hex editor is a paid tool to analyze a DEX file
- bytecode-viewer (14.3k ⭐, 👻)
- Apktool (18.6k ⭐, 👻)
- dex2jar (11.8k ⭐, 👻)
- intro-to-mobile-pentesting
androguard — APK+Dex Explorer/Disassembler
androguard (4.9k ⭐) is a powerful Python tool to explore APK/Dex files, but there is almost no documentation.
$ sudo apt install -y androguard
$ androguard analyze ./example.apk # or directly classes.dex
$ androguard analyze
prompt> from androguard.misc import AnalyzeAPK
prompt> a, d, dx = AnalyzeAPK("./example.apk")
prompt> d = DalvikVMFormat(a) # If 'd' is "empty"...
List every class in the DEX:
classes = [c for c in dx.get_classes() if not c.external] ; classes
List every method (look for methods not found by JADX):
for i, m in enumerate(d.methods.methods):
    print(m.get_class_name()+m.get_name(), 'has method idx=', i, '; hex=', hex(i))
You can detect code not associated with a method using:
known_offsets = [m.get_code().offset for m in d.get_methods() if m.get_code()]
for c in d.get_codes_item().code:
    if c.offset not in known_offsets:
        print("No method associated with code offset:", c.offset, "; hex=", hex(c.offset))
# View The Bytecode Given A Suspicious Offset
d.get_codes_item().get_code(0xffff).show()
We can declare the hidden method as a virtual method, rebuild the DEX, and decompile it again using another tool.
- Increase the number of virtual methods by 1
- Write the method IDX, 0x1 for public, and the code offset to the file. All values are in hexadecimal and uleb128 formatted.
- Compute the sha1 of file[32:] (the last file_size - 32 bytes) and write it at index 12
- Compute the adler32 checksum of file[12:] and write it at index 8
Missing Additional Notes FixMePlease
A few more parameters are required to declare a virtual method. Assuming you have access to a class object (cf. classes), you need to find where the number of virtual methods is stored.
virtual_methods_index = hex(aClass.get_class().class_data_item.get_off() + 3)
The class_data_item starts with four uleb128 values. The last one is the number of virtual methods (notice the +3, which assumes each of the first three values fits in a single byte).
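The +3 shortcut breaks as soon as one of the first three values needs more than one byte. The sketch below decodes the four size fields properly and reports the offset of each. Field names follow the DEX format specification; the helper names are hypothetical.

```python
def read_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear: last byte of the value
            return value, offset
        shift += 7


def read_class_data_sizes(data: bytes, class_data_off: int):
    """Return (field name, file offset, value) for the four size fields."""
    names = ("static_fields_size", "instance_fields_size",
             "direct_methods_size", "virtual_methods_size")
    fields = []
    offset = class_data_off
    for name in names:
        value, next_offset = read_uleb128(data, offset)
        fields.append((name, offset, value))
        offset = next_offset
    return fields
```

With the whole DEX file read into `data`, the offset reported for virtual_methods_size is the one to patch.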
The second step is to find where we can add the virtual method reference, e.g. virtual_methods_block_index. I don't know how to do that programmatically.
Python Code Samples
def update_sha1(input_file):
    import hashlib
    with open(input_file, 'rb+') as f:
        f.seek(0, 2)
        file_size = f.tell()
        f.seek(32)
        sha1 = hashlib.sha1(f.read(file_size - 32)).hexdigest()
        f.seek(12)
        f.write(bytes.fromhex(sha1))
        print("SHA1:", sha1)
def update_checksum(input_file):
    import zlib
    with open(input_file, 'rb+') as f:
        f.seek(12)
        checksum = '{:08x}'.format(zlib.adler32(f.read()) & 0xFFFFFFFF)
        f.seek(8)
        f.write(int(checksum, 16).to_bytes(4, byteorder='little'))
        print("Checksum: ", int(checksum, 16))
def encode_uleb128(value):
    # Return the unsigned LEB128 encoding of value as bytes
    encoded_bytes = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value != 0:
            byte |= 0x80
        encoded_bytes.append(byte)
        if value == 0:
            break
    return bytes(encoded_bytes)
def modify_virtual_methods_size(input_file):
    # Offsets and values below are specific to the analyzed DEX file
    virtual_methods_index = 0x3988
    virtual_methods_block_index = 0x39b1
    method_idx = 0x22
    method_access = 0x1
    method_offset = 0x1fcc
    with open(input_file, 'rb+') as f:
        # Determine The Current Value (assumed to fit in one uleb128 byte)
        f.seek(virtual_methods_index)
        virtual_methods_size = int.from_bytes(f.read(1), 'little')
        # Add One
        f.seek(virtual_methods_index)
        virtual_methods_size += 1
        f.write(virtual_methods_size.to_bytes(1, 'little'))
        print("Virtual Methods Size:", virtual_methods_size)
        # Add The Given Method
        f.seek(virtual_methods_block_index)
        f.write(encode_uleb128(method_idx))
        f.write(encode_uleb128(method_access))
        f.write(encode_uleb128(method_offset))
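After patching, it is worth confirming that both header fields are consistent before feeding the file to another tool. Here is a minimal sketch that recomputes the adler32 checksum (stored at offset 8, covering file[12:]) and the SHA-1 signature (stored at offset 12, covering file[32:]); `verify_dex_header` is a hypothetical helper.

```python
import hashlib
import zlib


def verify_dex_header(data: bytes) -> tuple[bool, bool]:
    """Return (checksum_ok, signature_ok) for the bytes of a DEX file."""
    stored_checksum = int.from_bytes(data[8:12], "little")
    stored_signature = data[12:32]
    checksum_ok = stored_checksum == (zlib.adler32(data[12:]) & 0xFFFFFFFF)
    signature_ok = stored_signature == hashlib.sha1(data[32:]).digest()
    return checksum_ok, signature_ok
```

Because the checksum covers the signature bytes, the SHA-1 has to be written first and the adler32 second.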
Reversing Binaries On Linux
Linux ObjDump Disassembly
The most basic disassembler:
$ objdump -D xxx.bin -M intel
Linux GDB Disassembly
$ git clone https://github.com/longld/peda.git ~/peda
$ echo "source ~/peda/peda.py" >> ~/.gdbinit
$ echo "set disassembly-flavor intel" >> ~/.gdbinit
$ gdb -q xxx.bin
(gdb) disas main
(gdb) # refer to GDB documentation
(gdb) run
Linux Tracers
The Linux commands strace and ltrace identify the system and library calls made by a program, which helps when reversing it.
$ strace xxx.bin # -f | -e open/... | -s 1000 | -y
syscall(args) = return_code
...
$ ltrace xxx.bin
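On a noisy program, the strace output can be summarized with a short script. The sketch below parses lines of the form syscall(args) = return_code and counts calls per syscall. It assumes the default output format without the -f prefix ([pid N]) and skips lines it cannot parse; the function names are hypothetical.

```python
import re
from collections import Counter

# name(args) = return value, e.g. openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>.+)$")


def parse_strace_line(line: str):
    """Return (syscall, args, return value) or None if the line does not match."""
    match = STRACE_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("args"), match.group("ret")


def count_syscalls(lines) -> Counter:
    """Count how many times each syscall appears in strace output lines."""
    parsed = (parse_strace_line(line) for line in lines)
    return Counter(entry[0] for entry in parsed if entry is not None)
```

For example, save the trace with `strace -o trace.txt xxx.bin`, then call `count_syscalls(open("trace.txt"))`.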
Boomerang
boomerang (0.4k ⭐) is somewhat able to reverse x86 binaries into a C file, though the output is often unreadable and does not compile.
Docker Installation
Save the code below in a Dockerfile and run docker build -t boomrangcli:latest . to build the docker image.
FROM ubuntu:22.04
# From https://github.com/BoomerangDecompiler/boomerang#building-on-linux
# [CHANGE] qt5-default => libqt5core5a libqt5gui5 libqt5widgets5 qtbase5-dev
RUN apt-get update && \
    apt-get install -y git build-essential cmake \
    qtbase5-dev libqt5core5a libqt5gui5 libqt5widgets5 \
    libcapstone-dev flex bison
# [CHANGE] Used /opt
WORKDIR /opt
RUN git clone https://github.com/BoomerangDecompiler/boomerang.git
WORKDIR /opt/boomerang/build
RUN cmake .. && make -j$(nproc) && make install
# Remove the build folder
RUN rm -rf /opt/boomerang
# Don't run the tool as root
RUN useradd -ms /usr/sbin/nologin boomerang
WORKDIR /builds/
RUN chown -R boomerang:boomerang /builds/
USER boomerang
ENTRYPOINT ["/usr/local/bin/boomerang-cli"]
For instance, to decompile ch1.bin:
$ docker run -it -v $(pwd):/builds boomrangcli:latest ch1.bin
Linux Radare Disassembly
Radare (19.4k ⭐) is similar to GDB, but it is somewhat easier to use if we only need to disassemble the code.
$ rabin2 -I xxx.bin # get information
$ rabin2 -z xxx.bin # list strings
$ r2 -d xxx.bin # Open in debug mode (if applicable)
$ r2 -A xxx.bin # Open and analyze (aaa)
$ r2 -qcizz xxx.bin
(r2) a? # help for analyze
(r2) aaa # analyze
(r2) vv # view disassembly, symbols, etc.
(r2) VV # view the program flow
(r2) afl # list functions, can grep
(r2) pdf @main # disassemble 'main' (@sym.main)
(r2) oo # reload executable
(r2) db 0xAABBCCDD # breakpoint
(r2) dc # run the program, stop before breakpoint
(r2) ds # run one instruction
(r2) px @ address # display the memory at address
(r2) dr # display register values
📚 When using pdf, such as pdf @main, we can see a list of variables and their addresses. We can pass these addresses to px.
Reversing Binaries On Windows
While still in progress, look here for PE notes.
Windows x64dbg debugger
You can use x64dbg (42.9k ⭐) to debug binaries.
- You can navigate to options to define the breakpoints. For instance, uncheck everything except Exit Breakpoint.
- The memory map tab can be used to find stuff like memory-mapped files (a file mapped to a memory region like a buffer).
- Double-click on an entry to see its bytes
- You may recognize a file from its magic bytes
- Right-click on an address to dump its contents to a file
- Use 'Search for > Current Module > String references' to see strings and their address. You can double-click on an address to navigate to it (see also: Right-click > 'Toggle Breakpoint.')
- Place the cursor on an instruction, and press 'Spacebar' to edit it.
- Use CTRL+P to save the patched instructions.
➡️ See also: x64dbg unpack malware and OllyDbg.
Additional Tools
- WinDBG ( 👻)
Reversing Binaries On Any Platform
dogbolt Online Decompiler Explorer
dogbolt (1.8k ⭐) lets you quickly test your binary against many decompilers. It's quite handy during CTFs, but it has an implicit binary sharing policy and legal restrictions on private instances.
IDA Decompiler & Disassembler
IDA Pro is the most used and best-known disassembler and decompiler, although it is paid. You can use the limited free version:
$ wget https://out7.hex-rays.com/files/idafree84_linux.run
$ chmod +x idafree84_linux.run
$ ./idafree84_linux.run
$ # assuming you installed it in $HOME/tools/
$ ln -s $HOME/tools/idafree-8.4/ida64 $HOME/.local/bin/ida
$ ida xxx.bin
Additional Notes
- Press F5 to use the third-party x64 free cloud decompiler.
- → denotes a jump while ---→ denotes a conditional instruction
Binary Ninja
Binary Ninja is a paid decompiler and disassembler. You can use the limited free version:
$ mkdir -p $HOME/tools/ && cd $HOME/tools/
$ wget https://cdn.binary.ninja/installers/BinaryNinja-free.zip
$ unzip BinaryNinja-free.zip && rm BinaryNinja-free.zip
$ cd && ln -s $HOME/tools/binaryninja/binaryninja $HOME/.local/bin/binaryninja
Additional Tools
Firmware Reversing And Analysis
Encrypted Firmware
Using binwalk's entropy analysis, you may be able to tell whether the firmware is encrypted, for instance, using gpg.
$ binwalk -E -N firmware.bin
📚 Previous versions may contain exposed encryption data.
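The idea behind the entropy check can be reproduced in a few lines: encrypted or compressed data sits close to 8 bits of entropy per byte, while code and text score noticeably lower. This is a minimal sketch of the Shannon entropy computation over fixed-size blocks. It does not replicate binwalk itself, and the 1024-byte block size is an arbitrary assumption.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of block in bits per byte (between 0.0 and 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(block).values()
    )


def block_entropies(data: bytes, block_size: int = 1024) -> list[float]:
    """Entropy of each consecutive block_size-byte block of data."""
    return [
        shannon_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
```

A firmware image whose blocks are almost all near 8.0, with no lower-entropy regions, is likely encrypted or compressed as a whole.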
Firmware Extraction
- Firmware ModKit (FMK) (0.8k ⭐): it uses binwalk to extract the filesystem. Can repack the modified firmware.
$ sudo apt install -y firmware-mod-kit
$ /opt/firmware-mod-kit/trunk/extract-firmware.sh firmware.bin
$ cd fmk/rootfs/gpg/ # find keys, and crack the passphrase
- firmwalker (1.0k ⭐): search for juicy files in the firmware
Additional Notes
Code Obfuscation
Moved to code obfuscation.
C Language Notes
Random notes for CTFs:
- scanf("XXX%s", &s): the input must be XXX<input>; only <input> is stored.
- strcmp("abc", "zyx"): returns 0, 1, or -1. But, strcmp(var, "xxx") returns the ordinal difference of the first different char.
👻 To-do 👻
Stuff that I found, but never read/used yet.
Where to learn?
Tools
Note
- A program creating a file then deleting it => Disable inheritance, Convert to explicit, Select user > Edit > Show advanced permissions, uncheck both 'delete' to prevent it from removing the file and allow us to read it.
- Ghidra
sudo apt install ghidra -y
ghidra
> new project
> import file
> analyze
Windows
> Decompile: Main
> Functions
> Defined Strings
StackView
> Double-click on a variable to open