Reverse Engineering

Reverse engineering is the process of analyzing a program or system to understand its structure, function, and behavior.

It often means recovering an approximation of the original code or design using disassembly, decompilation, or similar methods.

It can be used to understand how malware works 🛡️ or to find vulnerabilities in a program or system during a black-box assessment 💥.

Practice

For simple programs, we might be able to get the information we need using basic tools such as:

  • strace: see every system call
  • strings: extract every readable string, may not be installed
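
When strings is not installed, its core behavior can be approximated in a few lines of Python. This is a minimal sketch, assuming only printable ASCII runs matter; the function name extract_strings and the default minimum length of 4 are choices made for this illustration.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII characters at least min_length long."""
    # 0x20-0x7e is the printable ASCII range (space to tilde)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

For instance, extract_strings(open("some_executable", "rb").read()) returns the readable strings found in the binary.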

For a preliminary analysis of your executable:

  • Using the file command on Linux
$ file some_executable
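
The file command largely relies on magic bytes at the start of the file. The sketch below shows the idea for a handful of formats that appear in these notes; the table is deliberately tiny and the names MAGIC_SIGNATURES and guess_file_type are made up for this example.

```python
# A few well-known magic numbers (far from exhaustive)
MAGIC_SIGNATURES = [
    (b"\x7fELF", "ELF executable"),
    (b"MZ", "PE/DOS executable"),
    (b"\xca\xfe\xba\xbe", "Java class file"),  # also used by Mach-O universal binaries
    (b"PK\x03\x04", "ZIP archive (JAR, APK, ...)"),
    (b"dex\n", "Android DEX file"),
]


def guess_file_type(header: bytes) -> str:
    """Guess a file type from its first bytes, or return 'unknown'."""
    for magic, description in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"
```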

Java Reverse Engineering

JAR application

attacking_common_applications

You can extract a JAR archive using archive tools or:

$ jar xf xxx.jar

You can also create a JAR (or WAR) archive using archive tools or:

$ jar -cvf ../xxx.war *
$ jar -cmf ./META-INF/MANIFEST.MF ../xxx.jar *

If you plan to edit the JAR, you may have to remove every digest entry from MANIFEST.MF and delete the .RSA/.SF signature files in META-INF to bypass integrity checks. ⚠️ Note that MANIFEST.MF must end with a blank line (a trailing newline).
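
The cleanup can be sketched in Python, since a JAR is a ZIP archive. This is a simplified illustration, not a complete tool: it keeps only the main section of the manifest (dropping every per-entry section, which may remove more than the digests) and skips signature files. The function names and the extra .DSA/.EC suffixes are additions made for this example.

```python
import zipfile

# Signature-related files stored under META-INF/
SIGNATURE_SUFFIXES = (".RSA", ".DSA", ".EC", ".SF")


def strip_manifest_digests(manifest: str) -> str:
    """Keep only the main manifest section; per-entry sections are dropped."""
    normalized = manifest.replace("\r\n", "\n")
    main_section = normalized.split("\n\n", 1)[0].strip("\n")
    return main_section + "\n"  # the manifest must end with a newline


def unsign_jar(src, dst) -> None:
    """Copy a JAR without its signature files and per-entry digests."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            upper = info.filename.upper()
            if upper.startswith("META-INF/") and upper.endswith(SIGNATURE_SUFFIXES):
                continue  # drop the signature files
            data = zin.read(info)
            if upper == "META-INF/MANIFEST.MF":
                data = strip_manifest_digests(data.decode("utf-8")).encode("utf-8")
            zout.writestr(info.filename, data)
```

For instance, unsign_jar("xxx.jar", "xxx-unsigned.jar") writes a copy that no longer carries signature data.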

Before you modify a file, extract the JAR into a raw copy. You will place the recompiled .class files in it before bundling everything back into a JAR:

$ mkdir raw && cd raw
$ jar xvf ../xxx.jar && cd ..

Now, you can edit a file and transfer its compiled .class to "raw":

$ javac -cp xxx.jar source_code/path/to/file.java
$ cp source_code/path/to/*.class raw/path/to/

Once you are done, pack "raw" back to a JAR:

$ jar -cmf raw/META-INF/MANIFEST.MF xxx.jar -C raw .

Java jd - Decompiler

attacking_common_applications blocky

You can use jd-gui (13.4k ⭐) to reverse a Java application. Run jd-gui, and load the JAR in it. You can then either:

  • Explore the reversed sources from jd-gui
  • Use File > Save All Sources and read/modify them in your editor

Java - Other Decompilers

Other well-known decompilers:


.NET Reverse Engineering

adventofcyber2 chrome ctfcollectionvol1 pe_dotnet_0_protection pe_dotnet_basic_anti_debug godot_0_protection

You can work on reversing .NET executable and DLL files using the following tools...

De4Dot - .NET Deobfuscator

attacking_common_applications

You can use de4dot (6.7k ⭐, 2020 🪦) to deobfuscate and unpack your binary before decompiling it. On Windows, drag and drop your binary onto the de4dot executable.


dnSpy - .NET Debug/Editor

attacking_common_applications

You can use dnSpy (25.2k ⭐, 2020 🪦) to explore .NET source code.


dotPeek - .NET Decompiler/Editor

adventofcyber2

You can use JetBrains dotPeek (free 🐲) on Windows to reverse your binary and explore the source code. Opening the file will automatically load the .NET solutions contained in the executable.

ILSpy - .NET Decompiler/Editor

adventofcyber2

ILSpy (19.9k ⭐) is the most popular open-source .NET decompiler. It can be used as a standalone application or integrated into editors such as VSCode.

On Linux, you can use the AvaloniaILSpy (1.4k ⭐) port.

$ cd /tmp
$ # download a release at https://github.com/icsharpcode/AvaloniaILSpy/releases
$ unzip Linux.x64.Release.zip && unzip ILSpy-linux-x64-Release.zip
$ mkdir -p $HOME/tools/ && mv artifacts/linux-x64/ $HOME/tools/AvaloniaILSpy
$ rm -rf Linux.x64.Release.zip ILSpy-linux-x64-Release.zip artifacts # cleanup
$ ln -s $HOME/tools/AvaloniaILSpy/ILSpy $HOME/.local/bin/ILSpy
$ ILSpy # run

Opening the executable will automatically load its .NET solutions. You can save an entire reversed solution or specific files by right-clicking on the target you want to save and selecting 'Save Code.'


Python Reverse Engineering

pyc_bytecode

Python bytecode files (.pyc, or .pyo for optimized bytecode in older Python versions) contain compiled, platform-independent code that the Python interpreter can execute.

For Python <= 3.11, you can use pycdc (2.6k ⭐).

$ git clone https://github.com/zrax/pycdc.git && cd pycdc
$ mkdir build && cd build && cmake .. && make -j $(nproc)
$ ./pycdc ./xxx.pyc

For Python <= 3.8, you can use uncompyle6 (3.5k ⭐):

$ pipx install git+https://github.com/rocky/python-uncompyle6
$ uncompyle6 xxx.pyc

If your Python version and the target Python version are the same, you can use the builtin dis module:

import dis
import marshal

with open('xxx.pyc', 'rb') as f:
    f.seek(16)  # skip the .pyc header (16 bytes since Python 3.7)
    dis.dis(marshal.load(f))
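
A .pyc file starts with a header before the marshalled code object: since Python 3.7 it is 16 bytes long and begins with a 4-byte magic number that identifies the bytecode version. The sketch below checks that magic number against the running interpreter before unmarshalling; load_pyc_code and PYC_HEADER_SIZE are names chosen for this example, and older Python versions use a shorter header.

```python
import importlib.util
import marshal

# Python 3.7+: magic (4 bytes) + flags (4 bytes) + mtime/size or hash (8 bytes)
PYC_HEADER_SIZE = 16


def load_pyc_code(data: bytes):
    """Return the code object stored in the raw bytes of a .pyc file."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("this .pyc was compiled by a different Python version")
    return marshal.loads(data[PYC_HEADER_SIZE:])
```

The result can be passed to dis.dis, for instance dis.dis(load_pyc_code(open('xxx.pyc', 'rb').read())).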

Alternatively, Python code can be bundled with PyInstaller into an executable that doesn't require a Python installation.

You can extract its contents using pyinstxtractor (2.3k ⭐).


Android Reverse Engineering

apk_introduction

APK files contain multiple files such as classes.dex and other .dex files, resources, the manifest, certificate files, etc.

It's possible for code to exist within a DEX file without being detected or decompiled by tools such as JADX.
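
Since an APK is a ZIP archive, its DEX files can be listed with the standard zipfile module. A minimal sketch; the function name list_dex_entries is made up for this example.

```python
import zipfile


def list_dex_entries(apk) -> list[str]:
    """Return the names of the .dex files stored in an APK (path or file object)."""
    with zipfile.ZipFile(apk) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".dex"))
```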

JADX — APK+Dex Decompiler/Disassembler

You can use jadx (38.5k ⭐) to decompile APK and DEX files to Java.

$ sudo apt install -y jadx

You can use either the CLI or the GUI:

$ jadx $(pwd)/basic_rev.apk -d $(pwd)/out
$ jadx-gui # and open your file

Android Studio For Reversed Code

You may open decompiled files in Android Studio. Create a new project, put your files inside (in java/ and res/), apply fixes if prompted, remove the automatically generated R.java, and run the app.

🐲 Android Studio has a DexViewer which you can use to see if your Dex file contains hidden methods.

Android Boot Image Unpacker

You can use mkbootimg developed by Google (unpack_bootimg.py):

$ sudo apt install -y mkbootimg
$ unpack_bootimg --boot_img boot.img
$ cd out && gunzip -c ramdisk | cpio -idmv

Additional tools: unpackbootimg (0.2k ⭐), mkbootimg_tools (0.5k ⭐, 2016 🪦) or abootimg (0.1k ⭐, 2012 🪦).

Additional Notes

androguard — APK+Dex Explorer/Disassembler

androguard (4.9k ⭐) is a powerful Python tool to explore APK/Dex files, but there is almost no documentation.

$ sudo apt install -y androguard
$ androguard analyze ./example.apk # or directly classes.dex
$ androguard analyze
prompt> from androguard.misc import AnalyzeAPK
prompt> a, d, dx = AnalyzeAPK("./example.apk")
prompt> d = DalvikVMFormat(a) # If 'd' is "empty"...

List every class in the DEX:

classes = [c for c in dx.get_classes() if not c.external] ; classes

List every method (look for methods not found by JADX):

for i, m in enumerate(d.methods.methods):
  print(m.get_class_name()+m.get_name(), 'has method idx=', i, '; hex=', hex(i))

You can detect code not associated with a method using:

known_offsets = [m.get_code().offset for m in d.get_methods() if m.get_code()]
for c in d.get_codes_item().code:
    if c.offset not in known_offsets:
        print("No method associated with code offset:", c.offset, "; hex=", hex(c.offset))

# View The ByteCode Given A Suspicious Offset
d.get_codes_item().get_code(0xffff).show()

We can declare the hidden method as a virtual method, rebuild the DEX, and decompile it again using another tool.

  • Increase the number of virtual methods by 1
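  • Write the method index (stored as a difference from the previous entry's index, or directly for the first entry), 0x1 for public, and the code offset to the file. All values are ULEB128-encoded.
  • Compute the SHA-1 of the file contents from offset 32 to the end and write it at offset 12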
  • Compute the adler32 checksum of file[12:] and write it at index 8
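
Before and after patching, it can help to check whether the two header fields match the file contents. The sketch below only reads the data; it assumes the layout described above (checksum at offset 8, SHA-1 signature at offset 12), and the function name verify_dex_header is made up for this example.

```python
import hashlib
import zlib


def verify_dex_header(data: bytes) -> tuple[bool, bool]:
    """Return (checksum_ok, signature_ok) for the raw bytes of a DEX file."""
    stored_checksum = int.from_bytes(data[8:12], "little")
    stored_signature = data[12:32]
    # Adler-32 covers everything after the checksum field
    checksum_ok = (zlib.adler32(data[12:]) & 0xFFFFFFFF) == stored_checksum
    # SHA-1 covers everything after the signature field
    signature_ok = hashlib.sha1(data[32:]).digest() == stored_signature
    return checksum_ok, signature_ok
```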

A few more parameters are required to declare a virtual method. Assuming you have access to a class object (cf. classes), you need to find where the number of virtual methods is stored.

virtual_methods_index = hex(aClass.get_class().class_data_item.get_off() + 3) 

The class_data_item starts with four ULEB128 values. The last one is the number of virtual methods; the +3 assumes that each of the first three values fits in a single byte.
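
When one of the first three values is 128 or more, it takes several bytes and the +3 shortcut no longer holds. A hedged sketch of how the four sizes could be decoded to find the real offset; decode_uleb128 and read_class_data_header are hypothetical helper names, and the field names follow the DEX format documentation.

```python
def decode_uleb128(data: bytes, offset: int) -> tuple[int, int]:
    """Decode one ULEB128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def read_class_data_header(data: bytes, offset: int):
    """Return ({field: (value, field_offset)}, offset after the header)."""
    names = ("static_fields_size", "instance_fields_size",
             "direct_methods_size", "virtual_methods_size")
    header = {}
    for name in names:
        start = offset
        value, offset = decode_uleb128(data, offset)
        header[name] = (value, start)
    return header, offset
```

Given the raw file bytes and the class_data_item offset, header["virtual_methods_size"] holds both the current count and the offset to patch.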

The second step is to find where we can add the virtual method reference, e.g. virtual_methods_block_index. I don't know how to do that programmatically.

Python Code Samples
import hashlib
import zlib


def update_sha1(input_file):
    """Recompute the SHA-1 signature (offset 12) over bytes 32..EOF."""
    with open(input_file, 'rb+') as f:
        f.seek(32)
        sha1 = hashlib.sha1(f.read()).digest()
        f.seek(12)
        f.write(sha1)
        print("SHA1:", sha1.hex())


def update_checksum(input_file):
    """Recompute the Adler-32 checksum (offset 8) over bytes 12..EOF.

    Call this after update_sha1, since the checksum covers the signature.
    """
    with open(input_file, 'rb+') as f:
        f.seek(12)
        checksum = zlib.adler32(f.read()) & 0xFFFFFFFF
        f.seek(8)
        f.write(checksum.to_bytes(4, byteorder='little'))
        print("Checksum:", checksum)


def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    encoded_bytes = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value != 0:
            byte |= 0x80  # more bytes follow
        encoded_bytes.append(byte)
        if value == 0:
            break
    return bytes(encoded_bytes)


def modify_virtual_methods_size(input_file):
    # Offsets and values specific to one DEX file; adapt them to your target
    virtual_methods_index = 0x3988
    virtual_methods_block_index = 0x39b1
    method_idx = 0x22
    method_access = 0x1
    method_offset = 0x1fcc

    with open(input_file, 'rb+') as f:
        # Determine the current value (assumes it fits in one ULEB128 byte)
        f.seek(virtual_methods_index)
        virtual_methods_size = int.from_bytes(f.read(1), 'little')

        # Add one
        f.seek(virtual_methods_index)
        virtual_methods_size += 1
        f.write(virtual_methods_size.to_bytes(1, 'little'))
        print("Virtual Methods Size:", virtual_methods_size)

        # Add the given method (overwrites the bytes at that offset)
        f.seek(virtual_methods_block_index)
        f.write(encode_uleb128(method_idx))
        f.write(encode_uleb128(method_access))
        f.write(encode_uleb128(method_offset))

Reversing Binaries On Linux

compiled 0x41haz catpictures getting_started questionnaire reg elf_x86_0_protection elf_x86_basic

Linux ObjDump Disassembly

The most basic disassembler:

$ objdump -D xxx.bin -M intel

Linux GDB Disassembly

stack_based_buffer_overflows_linux_x86 attacking_common_applications

You can use GDB with the peda extension.

$ git clone https://github.com/longld/peda.git ~/peda
$ echo "source ~/peda/peda.py" >> ~/.gdbinit
$ echo "set disassembly-flavor intel" >> ~/.gdbinit
$ gdb -q xxx.bin
(gdb) disas main
(gdb) # refer to GDB documentation
(gdb) run

Linux Tracers

mustacchio

The Linux commands strace and ltrace trace the system calls and library calls made by a program, which helps when reversing it.

$ strace xxx.bin # -f | -e open/... | -s 1000 | -y
syscall(args) = return_code
...
$ ltrace xxx.bin
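
On long traces, a quick summary of which system calls were made can be easier to read than the raw output. The sketch below parses lines of the form syscall(args) = return_code; the regular expression and the function name count_syscalls are assumptions made for this example, and lines with a prefix (such as the [pid N] added by -f) are simply ignored.

```python
import re
from collections import Counter

# Matches lines such as: openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>\S+)")


def count_syscalls(trace: str) -> Counter:
    """Count how many times each system call appears in strace output."""
    counts = Counter()
    for line in trace.splitlines():
        match = STRACE_LINE.match(line)
        if match:
            counts[match.group("name")] += 1
    return counts
```

The input can come from a file written with strace -o trace.txt xxx.bin.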

Boomerang

boomerang (0.4k ⭐) is a decompiler that is somewhat able to reverse x86 binaries into a C file, although the output is often unreadable and does not compile.

Docker Installation

Save the code below in a Dockerfile and run docker build -t boomrangcli:latest . to build the Docker image.

FROM ubuntu:22.04

# From https://github.com/BoomerangDecompiler/boomerang#building-on-linux
# [CHANGE] qt5-default => libqt5core5a libqt5gui5 libqt5widgets5 qtbase5-dev
RUN apt-get update && \
    apt-get install -y git build-essential cmake \
    qtbase5-dev libqt5core5a libqt5gui5 libqt5widgets5 \
    libcapstone-dev flex bison

# [CHANGE] Used /opt
WORKDIR /opt
RUN git clone https://github.com/BoomerangDecompiler/boomerang.git
WORKDIR /opt/boomerang/build

RUN cmake .. && make -j$(nproc) && make install

# Remove the build folder
RUN rm -rf /opt/boomerang

# Don't run the tool as root
RUN useradd -ms /usr/sbin/nologin boomerang
WORKDIR /builds/
RUN chown -R boomerang:boomerang /builds/
USER boomerang

ENTRYPOINT ["/usr/local/bin/boomerang-cli"]

For instance, to decompile ch1.bin:

$ docker run -it -v $(pwd):/builds boomrangcli:latest ch1.bin

Linux Radare Disassembly

adventofcyber2

Radare (19.4k ⭐) is similar to GDB, but it is somewhat easier to use if we only need to disassemble the code.

$ rabin2 -I xxx.bin # get information
$ rabin2 -z xxx.bin # list strings
$ r2 -d xxx.bin     # Open in debug mode (if applicable)
$ r2 -A xxx.bin     # Open and analyze (aaa)
$ r2 -qcizz xxx.bin
(r2) a?             # help for analyze
(r2) aaa            # analyze
(r2) vv             # view disassembly, symbols, etc.
(r2) VV             # view the program flow
(r2) afl            # list function, can grep
(r2) pdf @main      # disassemble 'main' (@sym.main)
(r2) oo             # reload executable
(r2) db 0xAABBCCDD  # breakpoint
(r2) dc             # run the program, stop before breakpoint
(r2) ds             # run one instruction
(r2) px @ address   # display the memory at address
(r2) dr             # display register values

📚 When using pdf such as pdf @main, we can see a list of variables and their addresses. We can pass these addresses to px.


Reversing Binaries On Windows

While still in progress, look here for PE notes.

Windows x64dbg debugger

attacking_common_applications introduction_to_malware_analysis

You can use x64dbg (42.9k ⭐) to debug binaries.

  • You can navigate to options to define the breakpoints. For instance, uncheck everything except Exit Breakpoint.
  • The memory map tab can be used to find stuff like memory-mapped files (a file mapped to a memory region like a buffer).
    • Double-click on an entry to see its bytes
    • You may recognize a file from its magic bytes
    • Right-click on an address to dump its contents to a file
  • Use 'Search for > Current Module > String references' to see strings and their address. You can double-click on an address to navigate to it (see also: Right-click > 'Toggle Breakpoint.')
  • Place the cursor on an instruction, and press 'Spacebar' to edit it.
  • Use CTRL+P to save the patched instructions.

➡️ See also: x64dbg unpack malware and OllyDbg.

Additional Tools


Reversing Binaries On Any Platform

dogbolt Online Decompiler Explorer

dogbolt (1.8k ⭐) lets you quickly test a binary against many decompilers. It's quite handy during CTFs, but it has an implicit binary-sharing policy and legal restrictions on private instances.

IDA Decompiler & Disassembler

IDA Pro is the most used and best-known disassembler and decompiler, but it is paid. You can use the limited free version:

$ wget https://out7.hex-rays.com/files/idafree84_linux.run
$ chmod +x idafree84_linux.run
$ ./idafree84_linux.run
$ # assuming you installed it in $HOME/tools/ 
$ ln -s $HOME/tools/idafree-8.4/ida64 $HOME/.local/bin/ida
$ ida xxx.bin

Additional Notes

  • Press F5 to use the third-party x64 free cloud decompiler.
  • A solid arrow (→) denotes an unconditional jump while a dashed arrow (---→) denotes a conditional jump

Binary Ninja

Binary Ninja is a paid decompiler and disassembler. You can use the limited free version:

$ mkdir -p $HOME/tools/ && cd $HOME/tools/
$ wget https://cdn.binary.ninja/installers/BinaryNinja-free.zip
$ unzip BinaryNinja-free.zip && rm BinaryNinja-free.zip
$ cd && ln -s $HOME/tools/binaryninja/binaryninja $HOME/.local/bin/binaryninja

Additional Tools


Firmware Reversing And Analysis

Encrypted Firmware

adventofcyber4

Using binwalk's entropy analysis, you might be able to tell whether the firmware is encrypted, for instance, using gpg.

$ binwalk -E -N firmware.bin
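
The idea behind the entropy scan is that encrypted (or compressed) data looks random, so its Shannon entropy is close to 8 bits per byte, while plain code and text score lower. A minimal sketch of that measure; the function name shannon_entropy is chosen for this example, and it scores a whole buffer rather than sliding over blocks as a real scan would.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
```

Applying it to fixed-size chunks of firmware.bin gives a rough picture of which regions are likely encrypted or compressed.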

📚 Previous versions may contain exposed encryption data.

Firmware Extraction

adventofcyber4

  • Firmware ModKit (FMK) (0.8k ⭐): it uses binwalk to extract the filesystem. Can repack the modified firmware.
$ sudo apt install -y firmware-mod-kit
$ /opt/firmware-mod-kit/trunk/extract-firmware.sh firmware.bin
$ cd fmk/rootfs/gpg/ # find keys, and crack the passphrase
  • firmwalker (1.0k ⭐): search for juicy files in the firmware
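
Once the filesystem is extracted, looking for interesting files can also be sketched in a few lines of Python. This is only an illustration of the idea: the pattern list and the function name find_interesting_files are assumptions made for this example and are not taken from firmwalker.

```python
import fnmatch
import os

# Hypothetical starting list; extend it for your target
DEFAULT_PATTERNS = ("*.key", "*.pem", "*.gpg", "*.asc", "passwd", "shadow")


def find_interesting_files(root, patterns=DEFAULT_PATTERNS):
    """Return paths (relative to root) whose file name matches a pattern."""
    matches = []
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            if any(fnmatch.fnmatch(filename, pattern) for pattern in patterns):
                full_path = os.path.join(directory, filename)
                matches.append(os.path.relpath(full_path, root))
    return sorted(matches)
```

For instance, find_interesting_files("fmk/rootfs") lists candidate key and credential files in the extracted root filesystem.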

Additional Notes

Code Obfuscation

Moved to code obfuscation.

C Language Notes

Random notes for CTFs:

  • scanf("XXX%s", &s): must input XXX<input> ; only input is stored.
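  • strcmp("abc", "zyx"): with constant arguments, the call is often folded by the compiler and yields 0, 1, or -1. But strcmp(var, "xxx") evaluated at runtime may return the ordinal difference of the first differing characters. The C standard only guarantees the sign of the result.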

👻 To-do 👻

Stuff that I found, but never read/used yet.

Where to learn?

Tools

Note

  • A program creating a file then deleting it => Disable inheritance, Convert to explicit, Select user > Edit > Show advanced permissions, uncheck both 'delete' to prevent them from removing the file and allow us to read it.
  • Ghidra
$ sudo apt install -y ghidra
$ ghidra
> new project
> import file
> analyze
Windows
> Decompile: Main
> Functions
> Defined Strings
StackView
> Double-click on a variable to open