The wcc compiler takes binaries (ELF, PE, ...) as an input and creates valid ELF binaries as an output. It can be used to create relocatable object files from executables or shared libraries.
jonathan@blackbox:~$ wcc
Witchcraft Compiler Collection (WCC) version:0.0.6 (18:10:50 May 10 2024)
Usage: wcc [options] file
options:
-o, --output <output file>
-m, --march <architecture>
-e, --entrypoint <0xaddress>
-i, --interpreter <interpreter>
-p, --poison <poison>
-s, --shared
-c, --compile
-S, --static
-x, --strip
-X, --sstrip
-E, --exec
-C, --core
-O, --original
-D, --disasm
-d, --debug
-h, --help
-v, --verbose
-V, --version
jonathan@blackbox:~$
-o, --output <output file>
Specify the desired output file name. Default: a.out
-m, --march <architecture>
Specify the desired output architecture. This option is currently ignored: run the 64-bit or the 32-bit version of wcc to produce 64-bit or 32-bit binaries respectively.
-e, --entrypoint <0xaddress>
Specify the address of the entry point as found in the ELF header manually.
-i, --interpreter <interpreter>
Specify a new program interpreter to be written to the interpreter segment of the output program.
-p, --poison <poison>
Specify a poison byte to be written in the unused bytes of the output file.
-s, --shared
Produce a shared library.
-c, --compile
Produce relocatable object files.
-S, --static
Produce a static binary.
-x, --strip
Do not use the dynamic symbol table to unstrip the binary. Default: off.
-X, --sstrip
Strip more.
-E, --exec
Set binary type to ET_EXEC in the ELF header.
-C, --core
Set binary type to a Core file in the ELF header.
-O, --original
Copy original section headers from input file (which must be an ELF) instead of guessing them from bfd sections. Default: off.
-D, --disasm
Display application disassembly.
-d, --debug
Enable debug mode (very verbose).
-h, --help
Display help.
-v, --verbose
Be verbose.
-V, --version
Display version number.
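Several of the options above (-c, -s, -E, -C) determine the object type recorded in the ELF header of the output file. As a rough illustration, the Python sketch below reads that `e_type` field from the first bytes of an ELF file. The numeric constants come from the ELF specification; the helper name `elf_type` is hypothetical, and pairing -s with `ET_DYN` is an assumption based on the option description rather than something confirmed by wcc itself.

```python
import struct

# e_type values from the ELF specification.
# Assumed pairing with wcc options: -c -> ET_REL, -E -> ET_EXEC,
# -s -> ET_DYN (assumption), -C -> ET_CORE.
ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the symbolic e_type of an ELF file, given its first 18+ bytes."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little endian, 2 = big endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ELF_TYPES.get(e_type, "unknown ({})".format(e_type))
```

Passing the first 18 bytes of a file produced with `wcc -c` to `elf_type` would be expected to return `ET_REL`, which is what readelf reports as "REL (Relocatable file)".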
The primary use of wcc is to "unlink" (undo the work of a linker) ELF binaries, either executables or shared libraries, back into relocatable object files. The following command line attempts to unlink the binary /bin/ls (from GNU coreutils) into a relocatable file named /tmp/ls.o
jonathan@blackbox:~$ wcc -c /bin/ls -o /tmp/ls.o
jonathan@blackbox:~$
This relocatable file can then be used as if it had been directly produced by a compiler. The following command would use the gcc compiler to link /tmp/ls.o into a shared library /tmp/ls.so
jonathan@blackbox:~$ gcc /tmp/ls.o -o /tmp/ls.so -shared
jonathan@blackbox:~$
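The two commands above can also be driven from a script. The following is a minimal sketch, assuming `wcc` and `gcc` are both on the PATH; the function names `unlink_commands` and `unlink_and_relink` are hypothetical, and only the flags already shown above (`-c`, `-o`, `-shared`) are used.

```python
import subprocess
from typing import List


def unlink_commands(binary: str, obj: str, shared_lib: str) -> List[List[str]]:
    """Build the two command lines shown above: wcc -c, then gcc -shared."""
    return [
        ["wcc", "-c", binary, "-o", obj],           # unlink into a relocatable object
        ["gcc", obj, "-o", shared_lib, "-shared"],  # relink as a shared library
    ]


def unlink_and_relink(binary: str, obj: str, shared_lib: str) -> None:
    """Run both steps in order, stopping at the first command that fails."""
    for cmd in unlink_commands(binary, obj, shared_lib):
        subprocess.run(cmd, check=True)
```

Calling `unlink_and_relink("/bin/ls", "/tmp/ls.o", "/tmp/ls.so")` would reproduce the session above. `check=True` makes a non-zero exit status raise `subprocess.CalledProcessError`, so the link step is never attempted on a missing object file.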
wcc will process any file supported by libbfd and produce ELF files that contain the same mapping when relinked and executed. This includes PE or OSX COFF files in 32 or 64 bits. However, rebuilding relocations is currently supported only for Intel x86_64 ELF binaries. It is, for instance, possible to transform a PE into an ELF and invoke its pure functions.
wcc uses libbfd to parse the sections of the input binary, and generates an ELF file with the corresponding sections and segments. wcc also handles symbols and symbol tables, and attempts to unstrip stripped binaries by parsing their dynamic symbol tables. Relocations are recreated as needed for Intel x86_64 ELF input files. Help on extending this to other CPUs and relocation types is very welcome :)
To observe the output of wcc more closely, let's take a look at /tmp/ls.o as parsed by readelf (GNU binutils package), edited for brevity:
jonathan@blackbox:~$ readelf -a /tmp/ls.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 2348624 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 9
Section header string table index: 8
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 0001ae00
00000000002191ec 0000000000000000 WAX 0 0 16
[ 2] .rodata PROGBITS 0000000000000000 00011f20
00000000000050fc 0000000000000000 A 0 0 32
[ 3] .data PROGBITS 0000000000000000 0001a3a0
0000000000000254 0000000000000000 WA 0 0 32
[ 4] .bss NOBITS 0000000000000000 0001a5f4
0000000000000d60 0000000000000000 WA 0 0 32
[ 5] .rela.all RELA 0000000000000000 00233fe0
0000000000007158 0000000000000018 A 7 1 8
[ 6] .strtab STRTAB 0000000000000000 0023b138
0000000000000dee 0000000000000000 0 0 1
[ 7] .symtab SYMTAB 0000000000000000 0023bf26
00000000000016f8 0000000000000018 6 5 8
[ 8] .shstrtab STRTAB 0000000000000000 0023d890
000000000000003e 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rela.all' at offset 0x233fe0 contains 1209 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000217eb0 000600000001 R_X86_64_64 0000000000000000 __ctype_toupper_loc + 0
000000217eb8 000700000001 R_X86_64_64 0000000000000000 __uflow + 0
000000217ec0 000800000001 R_X86_64_64 0000000000000000 getenv + 0
000000217ec8 000900000001 R_X86_64_64 0000000000000000 sigprocmask + 0
000000217ed0 000a00000001 R_X86_64_64 0000000000000000 raise + 0
000000217ed8 007b00000001 R_X86_64_64 00000000004021f0 free + 0
000000217ee0 000b00000001 R_X86_64_64 0000000000000000 localtime + 0
000000217ee8 000c00000001 R_X86_64_64 0000000000000000 __mempcpy_chk + 0
000000217ef0 000d00000001 R_X86_64_64 0000000000000000 abort + 0
000000217ef8 000e00000001 R_X86_64_64 0000000000000000 __errno_location + 0
000000217f00 000f00000001 R_X86_64_64 0000000000000000 strncmp + 0
...
00000000091f 000400000002 R_X86_64_PC32 0000000000000000 .bss + abd
000000000971 000400000002 R_X86_64_PC32 0000000000000000 .bss + ac1
000000000976 00020000000a R_X86_64_32 0000000000000000 .rodata + 1924
000000000988 000400000002 R_X86_64_PC32 0000000000000000 .bss + acd
0000000009b6 000400000002 R_X86_64_PC32 0000000000000000 .bss + ad1
0000000009ce 00020000000a R_X86_64_32 0000000000000000 .rodata + 1160
0000000009d3 00020000000a R_X86_64_32 0000000000000000 .rodata + 3ca8
000000000a0b 000400000002 R_X86_64_PC32 0000000000000000 .bss + b3e
000000000a12 000400000002 R_X86_64_PC32 0000000000000000 .bss + b46
000000000a26 000400000002 R_X86_64_PC32 0000000000000000 .bss + b0d
000000000a2f 000400000002 R_X86_64_PC32 0000000000000000 .bss + b36
000000000a39 000400000002 R_X86_64_PC32 0000000000000000 .bss + b2a
...
000000000b25 008500000002 R_X86_64_PC32 0000000000000000 optarg - 4
000000000b45 000400000002 R_X86_64_PC32 0000000000000000 .bss + ad1
000000000b50 000400000002 R_X86_64_PC32 0000000000000000 .bss + b3e
00000000240f 008200000002 R_X86_64_PC32 0000000000000000 stderr - 4
...
The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.
Symbol table '.symtab' contains 245 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2 .rodata
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .data
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .bss
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .unknown
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __ctype_toupper_loc
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __uflow
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getenv
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigprocmask
10: 0000000000000000 0 FUNC GLOBAL DEFAULT UND raise
11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND localtime
12: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __mempcpy_chk
...
132: 0000000000411efc 0 NOTYPE WEAK DEFAULT UND old__fini
133: 0000000000000000 8 OBJECT GLOBAL DEFAULT UND optarg
134: 0000000000000000 100 FUNC GLOBAL DEFAULT 1 old_plt
135: 0000000000000738 100 FUNC GLOBAL DEFAULT 1 old_text
136: 00000000000104d5 100 FUNC GLOBAL DEFAULT 1 old_text_end
137: 000000000000b538 100 FUNC GLOBAL DEFAULT 1 internal_0040d6a0
138: 000000000000fd78 100 FUNC GLOBAL DEFAULT 1 internal_00411ee0
139: 000000000000c4d8 100 FUNC GLOBAL DEFAULT 1 internal_0040e640
140: 0000000000007ce8 100 FUNC GLOBAL DEFAULT 1 internal_00409e50
141: 000000000000ed28 100 FUNC GLOBAL DEFAULT 1 internal_00410e90
142: 000000000000ead8 100 FUNC GLOBAL DEFAULT 1 internal_00410c40
143: 00000000000075e8 100 FUNC GLOBAL DEFAULT 1 internal_00409750
144: 000000000000e9c8 100 FUNC GLOBAL DEFAULT 1 internal_00410b30
145: 0000000000007fb8 100 FUNC GLOBAL DEFAULT 1 internal_0040a120
146: 000000000000a6a8 100 FUNC GLOBAL DEFAULT 1 internal_0040c810
147: 000000000000c7c8 100 FUNC GLOBAL DEFAULT 1 internal_0040e930
148: 000000000000c498 100 FUNC GLOBAL DEFAULT 1 internal_0040e600
149: 000000000000c4c8 100 FUNC GLOBAL DEFAULT 1 internal_0040e630
150: 000000000000c4e8 100 FUNC GLOBAL DEFAULT 1 internal_0040e650
151: 0000000000002c68 100 FUNC GLOBAL DEFAULT 1 internal_00404dd0
...
241: 000000000000e958 100 FUNC GLOBAL DEFAULT 1 internal_00410ac0
242: 000000000000fbc8 100 FUNC GLOBAL DEFAULT 1 internal_00411d30
243: 000000000000fc48 100 FUNC GLOBAL DEFAULT 1 internal_00411db0
244: 000000000000fc88 100 FUNC GLOBAL DEFAULT 1 internal_00411df0
No version information found in this file.
jonathan@blackbox:~$
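The Info column of the relocation listing above packs two values into one 64-bit number: the index of the target symbol and the relocation type. The sketch below splits a few Info values copied from that listing, following the `ELF64_R_SYM` and `ELF64_R_TYPE` definitions of the ELF specification. The type numbers are those of the x86-64 psABI; the function name `decode_r_info` is hypothetical.

```python
# Relocation type numbers from the x86-64 psABI (only those seen in the listing).
R_X86_64_TYPES = {1: "R_X86_64_64", 2: "R_X86_64_PC32", 10: "R_X86_64_32"}


def decode_r_info(r_info: int):
    """Split an ELF64 r_info value into (symbol index, relocation type name)."""
    sym_index = r_info >> 32        # ELF64_R_SYM: upper 32 bits
    rel_type = r_info & 0xFFFFFFFF  # ELF64_R_TYPE: lower 32 bits
    return sym_index, R_X86_64_TYPES.get(rel_type, str(rel_type))


# Info values copied from the '.rela.all' listing.
for info in (0x000600000001, 0x000400000002, 0x00020000000A, 0x008500000002):
    print(decode_r_info(info))
# (6, 'R_X86_64_64')
# (4, 'R_X86_64_PC32')
# (2, 'R_X86_64_32')
# (133, 'R_X86_64_PC32')
```

The decoded indices agree with the symbol table: entry 6 is `__ctype_toupper_loc`, entries 4 and 2 are the `.bss` and `.rodata` section symbols, and entry 133 is `optarg`.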
Note in particular that wcc rebuilt several types of relocations under the new .rela.all section. It also stripped from the input binary the sections that are not essential to a relocatable object file, and rebuilt a symbol table. On this last topic, note that wcc created new symbols named internal_00XXXXXX, where 0xXXXXXX is the address of a static function within the binary, which is not normally exported. Finally, wcc also makes use of additional symbol tables, when available, to find the addresses of additional functions (it parses both symbol tables and dynamic symbol tables).
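To make the naming scheme concrete, the sketch below rebuilds an `internal_` symbol name from an address and recovers the address from a name. The zero-padded, eight-digit hexadecimal format is inferred from the readelf output above rather than from the wcc sources, and the helper names are hypothetical. The sample pairs are copied from the symbol table listing.

```python
def internal_symbol_name(address: int) -> str:
    """Format an address the way the listing shows it (inferred format)."""
    return "internal_{:08x}".format(address)


def original_address(name: str) -> int:
    """Recover the address encoded in an internal_XXXXXXXX symbol name."""
    return int(name.split("_", 1)[1], 16)


# (Value column, symbol name) pairs copied from the readelf symbol table.
SAMPLES = [
    (0xB538, "internal_0040d6a0"),
    (0xFD78, "internal_00411ee0"),
    (0x2C68, "internal_00404dd0"),
    (0xFC88, "internal_00411df0"),
]

# Difference between the address in the name and the section-relative value.
deltas = {original_address(name) - value for value, name in SAMPLES}
print([hex(d) for d in deltas])
```

For these four samples the difference is the same value, 0x402168, which suggests the functions kept their relative positions inside the rebuilt .text section. That reading is an inference from the listing, not a documented guarantee.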