In the previous post, we started analyzing the inner workings of the Lexmark MS811n printer, extracting the Flash image from the NAND Flash by de-soldering it and reading its contents with a universal programmer.
To see exactly what we did, check out our previous blog post, "Lexmark Printers Firmware Extraction – Part A."
At this point in the analysis, we had a 128MB binary blob with a Linux OS to work with. The next thing we wanted to do was extract the kernel and file systems from the Flash image.
To get the kernel and file systems, we ran binwalk on the Flash image. Besides the "noise" that binwalk usually outputs, it found one ELF file and eight different CramFS images (see below; the other binwalk findings were removed because the list was too long).
CramFS is a simple, read-only compressed file system that is commonly used in embedded systems. Files on CramFS are compressed one page at a time with zlib, which allows random read access. The metadata is not compressed, but it is expressed in a terse representation that is more space-efficient than in conventional file systems.
CramFS has a header called "Superblock," which contains:
A magic number: 0x28cd3d45 (big-endian)
The size of the filesystem
A signature: "Compressed ROMFS"
File system identification info (fsid): CRC, edition, and the number of blocks and files
The name of the file system
The i-node of the root directory
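To make the layout concrete, here is a minimal Python sketch that unpacks the 76-byte superblock as defined in the stock cramfs_fs.h header: four 32-bit words (magic, size, flags, future), the 16-byte signature, the four-word fsid, the 16-byte name, and the 12-byte root i-node. Since the byte order of a given dump is not guaranteed, it tries both and keeps whichever yields the magic number. The function and type names are our own, and the sketch does not verify the CRC.

```python
import struct
from typing import NamedTuple, Tuple

CRAMFS_MAGIC = 0x28CD3D45
# magic, size, flags, future, signature[16],
# fsid (crc, edition, blocks, files), name[16], root i-node[12]
SUPERBLOCK_LAYOUT = "IIII16sIIII16s12s"  # 76 bytes, no padding


class Superblock(NamedTuple):
    magic: int
    size: int
    flags: int
    future: int
    signature: bytes
    crc: int
    edition: int
    blocks: int
    files: int
    name: bytes
    root_inode: bytes  # raw 12-byte i-node of the root directory


def parse_superblock(data: bytes, offset: int = 0) -> Tuple[str, Superblock]:
    """Return (byte_order, superblock) for the cramfs image starting at `offset`."""
    for byte_order in "<>":  # little-endian first, then big-endian
        fields = struct.unpack_from(byte_order + SUPERBLOCK_LAYOUT, data, offset)
        if fields[0] == CRAMFS_MAGIC:
            return byte_order, Superblock(*fields)
    raise ValueError(f"no cramfs magic at offset 0x{offset:x}")
```

On an intact superblock, `parse_superblock(flash, 0x1A0000)` would be expected to return a signature of `b"Compressed ROMFS"`; here `flash` is a hypothetical bytes object holding the 128MB dump.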
For each element in the file system, there is an i-node that holds the object's metadata. The main object types are directories, regular files, and symbolic links.
The binwalk results showed that it tried and failed to parse the superblock, which describes the contents of the block. Our own attempt to extract the files with binwalk's -e flag failed as well.
Binwalk relies on the Cramfsck utility, whose source code is available on GitHub: https://github.com/npitre/cramfs-tools. We decided to compare what Cramfsck expects with what we knew about the binary blob. First, we noticed that Cramfsck uses the superblock for informational purposes only; it does not affect how the file system is parsed. Since parsing still failed, the i-node must have been modified or corrupted somehow.
To understand the problem better, let's look at how Cramfsck parses an i-node. It uses a struct that should match the on-disk i-node:
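For reference, the stock cramfs_inode struct in cramfs_fs.h packs six bit-fields into three 32-bit words: mode (16 bits) and uid (16); size (24) and gid (8); namelen (6) and offset (26). The Python sketch below decodes such a 12-byte i-node, assuming GCC's bit-field order on a little-endian host (the first declared field occupies the lowest bits). The function name is ours.

```python
import struct


def decode_stock_inode(raw: bytes) -> dict:
    """Decode a 12-byte stock cramfs i-node (little-endian host, GCC bit-field order)."""
    if len(raw) != 12:
        raise ValueError("a stock cramfs i-node is exactly 12 bytes")
    word0, word1, word2 = struct.unpack("<III", raw)
    return {
        "mode": word0 & 0xFFFF,      # bits 0-15: file type and permissions
        "uid": word0 >> 16,          # bits 16-31
        "size": word1 & 0xFFFFFF,    # bits 0-23: uncompressed size in bytes
        "gid": word1 >> 24,          # bits 24-31
        "namelen": word2 & 0x3F,     # bits 0-5: name length / 4, rounded up
        "offset": word2 >> 6,        # bits 6-31: data offset / 4
    }
```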
After each i-node comes the name of the object (file or directory), whose length is stored in "namelen." From this struct, we saw that each i-node should be 12 bytes (32*3 = 96 bits), with the first 2 bytes holding the mode (the object's type and permissions). This meant we could look for the mode repeating every 12 + namelen*4 bytes to spot sequences of files. Below is the first CramFS block at 0x1A0000:
We were able to see the file names and identify file sequences. After each file name there was the hex value 0x6D81 (bytes 6D 81), which should be the mode of the next i-node; read as little-endian 0x816D, it decodes to a regular file with 0555 permissions. Once we excluded the length of the file name, which we could read from the string itself, the mode repeated every 16 bytes, even though we expected every 12 bytes. This meant the actual i-node struct was larger than what Cramfsck expected.
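The stride search can be automated. This is a heuristic Python sketch, not the script we used: it finds every occurrence of the mode bytes, treats the longest printable-ASCII run between two consecutive hits as the file name, and subtracts the name's NUL-padded length from the gap. What remains is the fixed size of the i-node header. The function name and the 4-character minimum for a name are assumptions.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")  # heuristic: names are printable ASCII


def header_sizes(block: bytes, mode_bytes: bytes = b"\x6d\x81") -> list:
    """Estimate the fixed i-node size from the spacing of repeated mode bytes.

    For each pair of consecutive mode occurrences, subtract the NUL-padded
    length of the file name found between them.
    """
    positions = []
    pos = block.find(mode_bytes)
    while pos != -1:
        positions.append(pos)
        pos = block.find(mode_bytes, pos + 1)

    sizes = []
    for here, nxt in zip(positions, positions[1:]):
        names = PRINTABLE_RUN.findall(block, here, nxt)  # search only this gap
        if not names:
            continue
        name = max(names, key=len)
        padded = (len(name) + 3) // 4 * 4  # names are padded to a multiple of 4 bytes
        sizes.append(nxt - here - padded)
    return sizes
```

Entries with a different mode (directories, symlinks) are not matched by the byte search, so a gap that spans one of them yields an outlier. Taking the most common value (for example with `collections.Counter`) is more robust than trusting any single gap.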
At this point, we knew the size of the struct was 16 bytes (32*4 = 128 bits) and the first 16 bits were the mode. That left us with 112 bits to look at.
Next, we wanted to identify namelen, which is the length of the file name divided by 4 and rounded up (names are padded with NUL bytes to a multiple of 4). To avoid false positives, we picked a file name longer than 4 characters. The image below shows the file name "CCertUPC.bin", which is 12 characters long, so namelen should hold 12/4 = 3.
We found 0x3 at the 7th byte of the i-node, and that position held the correct value for each and every file in the CramFS block. We concluded that namelen is at offset 0x6.
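The same check can be expressed as a set intersection: for each (header, name) pair, collect the byte offsets whose value equals the expected namelen, then keep only the offsets common to all pairs. This is a hypothetical Python sketch assuming namelen fits in a single byte; the function names are ours.

```python
def namelen_offsets(header: bytes, name: bytes) -> set:
    """Offsets in an i-node header whose byte equals the expected namelen."""
    expected = (len(name) + 3) // 4  # name length / 4, rounded up
    return {i for i, value in enumerate(header) if value == expected}


def common_namelen_offset(entries) -> set:
    """Intersect candidate offsets over several (header, name) pairs."""
    candidates = None
    for header, name in entries:
        found = namelen_offsets(header, name)
        candidates = found if candidates is None else candidates & found
    return candidates or set()
```

A single file can match by coincidence (any byte that happens to equal 2 or 3), which is why checking every file in the block, as described above, narrows the candidates down to one offset.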
To find the file size, we looked at the boot log from the serial console (described in the previous blog post) for any clue about file sizes that we could compare against the struct. We found a log line showing bootcmd and its value:
nand:0 bootcmd: setenv cramfsaddr 0x1f700000;nand device 1;nand read 0x1f700000 0x1a0000 0x33d8e0;sha1verify 0x1f700000 0x33d000 1;cramfsload 0x100000 /boot/main.img;source 0x100000;nand read 0x1f700000 0x1a0000 0x33d8e0;sha1verify 0x1f700000 0x33d000 1;cramfsload 0x100000 /boot/main.img;source 0x100000;
This log shows that cramfsload loads a file from the first CramFS, the same CramFS we had been looking at: /boot/main.img is loaded to 0x100000. Knowing the file name, we deduced that the i-node struct of "main.img" is at 0x1A00B8.
Right after the bootcmd command, we saw the following log, which meant "main.img" was 2371 bytes (0x943):
### CRAMFS load complete: 2371 bytes loaded to 0x100000
In the i-node of "main.img", offset 0x8 held the value 0x943 (little-endian):
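Matching a logged size against the i-node bytes is easy to script. The Python sketch below pulls the byte count out of the cramfsload message quoted above and lists every offset where that number appears as a little-endian integer. The 2-byte default width is an assumption (it is enough for 0x943), and the function names are ours.

```python
import re

LOAD_LINE = re.compile(r"CRAMFS load complete: (\d+) bytes loaded to (0x[0-9a-fA-F]+)")


def loaded_size(log_text: str) -> int:
    """Extract the byte count from the cramfsload completion message."""
    match = LOAD_LINE.search(log_text)
    if match is None:
        raise ValueError("no cramfsload completion line found")
    return int(match.group(1))


def find_le_value(header: bytes, value: int, width: int = 2) -> list:
    """Offsets where `value` appears as a little-endian integer of `width` bytes."""
    needle = value.to_bytes(width, "little")
    return [i for i in range(len(header) - width + 1) if header[i:i + width] == needle]
```

For the log line above, `loaded_size` returns 2371, and `find_le_value(inode_bytes, 2371)` on the 16 bytes at 0x1A00B8 (`inode_bytes` being a hypothetical slice of the dump) would be expected to report offset 0x8.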
From the original i-node struct (the one Cramfsck uses), we knew that UID is 16 bits wide and GID is eight bits wide. In every i-node in the CramFS block, offset 0x2 held two bytes of zeros, and offset 0x11 held one byte of zeros. Therefore, we concluded that UID is at offset 0x2 and GID at offset 0xA.
After these deductions, only the file offset was left, which should be 26 bits wide. The unassigned parts of the i-node struct were the last 32 bits at offset 0xC and 16 bits at offset 0x4. Offset 0x4 always held 16 bits of zeros, so we ruled it out and deduced that the last 32 bits most likely contain the file offset. In the example above (main.img), this is the value 0x1F44 (little-endian).
While we expected the i-node struct to be built like this:
Our reverse engineering revealed it is actually built the following way:
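Going by the offsets identified above, the revised 16-byte i-node can be decoded with a small field table. This Python sketch is illustrative: only the offsets were observed, so the widths given for namelen, size and gid are assumptions. If size turns out to be 24 bits wide, as in the stock struct, the table is the only thing that needs to change. The names `LEXMARK_INODE_FIELDS`, `decode_lexmark_inode` and `iter_entries` are hypothetical.

```python
LEXMARK_INODE_SIZE = 16
# name: (offset, width in bytes), all little-endian.
# Offsets come from the analysis above; widths of namelen, size and gid are assumed.
LEXMARK_INODE_FIELDS = {
    "mode": (0x0, 2),
    "uid": (0x2, 2),
    "namelen": (0x6, 1),
    "size": (0x8, 2),
    "gid": (0xA, 1),
    "offset": (0xC, 4),
}


def decode_lexmark_inode(raw: bytes) -> dict:
    """Decode one 16-byte i-node using the field table above."""
    if len(raw) < LEXMARK_INODE_SIZE:
        raise ValueError("truncated i-node")
    return {
        field: int.from_bytes(raw[off:off + width], "little")
        for field, (off, width) in LEXMARK_INODE_FIELDS.items()
    }


def iter_entries(block: bytes, start: int, count: int):
    """Yield (name, inode) for `count` consecutive entries starting at `start`."""
    pos = start
    for _ in range(count):
        inode = decode_lexmark_inode(block[pos:pos + LEXMARK_INODE_SIZE])
        name_bytes = inode["namelen"] * 4  # names are NUL-padded to 4 bytes
        name_start = pos + LEXMARK_INODE_SIZE
        name = block[name_start:name_start + name_bytes].rstrip(b"\x00")
        yield name, inode
        pos = name_start + name_bytes
```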
To validate the i-node struct we had reverse-engineered, we wrote a Python script that performs the same steps as Cramfsck. This also helped us work out how the real offset is computed.
From Cramfsck, the method to calculate the real offset is:
FsOffset = Superblock offset (in the example above this is 0x1A0000)
StartOfBlock is an offset inside the block that depends on the size of the file
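Our reading of Cramfsck's extraction code is that the i-node's offset field stores the position of the file's block-pointer array divided by 4, and that the first compressed block starts right after that array, which holds one 32-bit pointer per 4096-byte block. Assuming the Lexmark offset field keeps the same divide-by-4 convention, the computation and the resulting numbers for main.img would be:

```latex
\begin{aligned}
\text{StartOfBlock} &= 4 \left\lceil \frac{\text{size}}{4096} \right\rceil \\
\text{RealOffset} &= \text{FsOffset} + 4 \cdot \text{offset} + \text{StartOfBlock} \\[4pt]
\text{main.img:}\quad \text{StartOfBlock} &= 4 \left\lceil \frac{2371}{4096} \right\rceil = 4 \\
\text{RealOffset} &= \texttt{0x1A0000} + 4 \cdot \texttt{0x1F44} + 4
  = \texttt{0x1A0000} + \texttt{0x7D10} + \texttt{0x4} = \texttt{0x1A7D14}
\end{aligned}
```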
As mentioned above, the file system is compressed with zlib: each file is split into 4096-byte blocks, and each block is compressed separately. So, the real size that we needed to read to decompress the file was:
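In stock cramfs, each entry p_k of the pointer array marks where compressed block k ends, measured from the start of the file system, and each block begins where the previous one ended. With n blocks and the array starting at A = 4 · offset (all positions relative to the start of the file system), the compressed extents work out as follows, assuming the Lexmark variant keeps this scheme:

```latex
\begin{aligned}
n &= \left\lceil \frac{\text{size}}{4096} \right\rceil, \qquad A = 4 \cdot \text{offset} \\
\text{start}_0 &= A + 4n, \qquad \text{start}_k = p_{k-1} \quad (k \ge 1) \\
\text{len}_k &= p_k - \text{start}_k \\
\text{total compressed bytes} &= p_{n-1} - (A + 4n)
\end{aligned}
```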
To implement this, we used the zlib library to try to read the main.img file:
Good news, it worked!
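For readers who want to reproduce this step, here is a minimal Python sketch of reading one file out of a cramfs image, following the pointer-array scheme described above. It is not our original script. It assumes the superblock is at `image[0]`, that the offset field is stored divided by 4, and that the pointers carry none of the flag bits newer cramfs versions keep in their top bits. An empty block is treated as a hole of zeros.

```python
import struct
import zlib

BLOCK_SIZE = 4096  # uncompressed size of every block except possibly the last


def read_cramfs_file(image: bytes, offset_field: int, size: int) -> bytes:
    """Decompress one file from a cramfs image whose superblock is at image[0]."""
    array_pos = offset_field * 4                      # start of the block-pointer array
    nblocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
    pointers = struct.unpack_from(f"<{nblocks}I", image, array_pos)
    start = array_pos + 4 * nblocks                   # first block follows the array
    out = bytearray()
    for end in pointers:                              # each pointer marks a block's end
        if end == start:                              # empty block: a hole of zeros
            out += bytes(min(BLOCK_SIZE, size - len(out)))
        else:
            out += zlib.decompress(image[start:end])  # blocks are independent zlib streams
        start = end
    if len(out) != size:
        raise ValueError(f"expected {size} bytes, got {len(out)}")
    return bytes(out)
```

With the values found earlier, `read_cramfs_file(first_cramfs, 0x1F44, 2371)` would correspond to main.img, where `first_cramfs` is a hypothetical bytes object holding the first CramFS carved so that its superblock is at offset 0.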
After validating that the only change was in the i-node, we decided to patch Cramfsck to handle the modified i-node of the Lexmark printer. To simplify the i-node structure, we changed the offset width to 32 bits. Because the field is little-endian, its existing 26 bits stay in the same position, so this changed nothing in the values read. Therefore, the new struct is:
We then compiled the patched Cramfsck:
After compiling it, we ran Cramfsck on a binary stream that contained the first CramFS at offset 0x00:
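Producing a binary that starts with a CramFS at offset 0x00 is a matter of carving it out of the Flash image. This Python sketch checks the magic number in either byte order and, unless an explicit length is given, trusts the superblock's size field; the function name is ours.

```python
import struct
from typing import Optional

CRAMFS_MAGIC = 0x28CD3D45


def carve_cramfs(flash: bytes, fs_offset: int, length: Optional[int] = None) -> bytes:
    """Cut one cramfs image out of a Flash dump so that it starts at offset 0x00.

    If `length` is not given, the superblock's size field (second word) is used.
    """
    for byte_order in "<>":
        magic, fs_size = struct.unpack_from(byte_order + "II", flash, fs_offset)
        if magic == CRAMFS_MAGIC:
            break
    else:
        raise ValueError(f"no cramfs magic at 0x{fs_offset:x}")
    if length is None:
        length = fs_size
    return flash[fs_offset:fs_offset + length]
```

For the first CramFS, `carve_cramfs(dump, 0x1A0000)` would be the call, with `dump` a hypothetical bytes object holding the 128MB image. If the size field is unreliable, the bootcmd above reads 0x33D8E0 bytes from that offset, which could be passed as `length`. Written to a file, the result can be handed to Cramfsck, which extracts into a directory with its `-x` option.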
From the extracted files, we saw that this CramFS belongs to the bootloader (U-Boot). Using the same approach, we extracted all eight CramFS images contained in the Flash image (the 128MB we dumped from the NAND Flash) and analyzed them. From the file names, we could infer what each CramFS is used for:
It appears Lexmark made an effort to harden its firmware against reverse engineering. The architecture they chose stores every image and file in a CramFS, except U-Boot, which is stored as an ELF file at 0xA0120. After we opened all of the file systems, we had the kernel, rootFS, and user-mod files. This allowed us to fully reverse-engineer the system and even patch and rewrite any file we wanted.
For information on our encrypted firmware update binary extraction, please read Part C of this blog.
Binwalk, GitHub wiki. https://github.com/ReFirmLabs/binwalk/wiki
CramFS, kernel.org documentation. https://www.kernel.org/doc/Documentation/filesystems/cramfs.txt
Zlib, Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Zlib