McSema: I’m liftin’ it

McSema, our x86 machine code to LLVM bitcode binary translator, just got a fresh coat of paint. Last week we held a successful hackathon that produced substantial improvements to McSema’s usability, documentation, and code quality. It’s now easier than ever to use McSema to analyze and reverse-engineer binaries.

Growth stage

We use McSema on a daily basis. It lets us find and retroactively harden binary programs against security bugs, independently validate vendor source code, and generate application tests with high code coverage. It is part of ongoing research, both in academia and in DARPA programs. We (and others) are constantly extending it to analyze increasingly complex programs.

You could say that McSema has been on a growth spurt since we open-sourced it in 2014. Back then, LLVM 3.5 was new and shiny and that’s what McSema used. And that’s what it used in 2015. And in 2016. McSema stretched and grew, but some things stagnated. Over time an ache developed: a desire to modernize and to polish things off. Last week we massaged those growing pains away during our McSema usability hackathon.

Paying dividends

We made broad improvements to McSema. The code is cleaner than ever. It’s easier to install and is more portable. It runs faster and the code it produces is better.

Performance

McSema builds much faster than before. We simplified the build system by removing dead code and unneeded libraries, and by reorganizing the directory layout to be more descriptive.

McSema is faster at producing bitcode. We improved how McSema traverses the control flow graph, removed dependencies on Boost, and simplified bitcode generation.

McSema generates leaner and faster bitcode. McSema no longer stores and spills register context on entry and exit to functions. Flag operations use faster natural bitwidth operations instead of bit fields. McSema can now optimize the lazily generated bitcode to eliminate unused computations. The optimized bitcode is easier to analyze and truer to the intent of the original program.
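To make the flag change concrete, here is a minimal sketch of the difference between bit-field flags and natural-width flags. It is not McSema's actual register state: the `PackedFlags`, `WideFlags`, and `SetAddFlags` names are hypothetical, and the layout is an assumption chosen only to illustrate the idea. Writing a bit-field member forces a read-modify-write of the shared word, while writing a byte-wide flag is a plain store that an optimizer can more easily reason about or delete when the flag is never read.

```cpp
#include <cstdint>

// Hypothetical bit-field layout: all flags share one 32-bit word, so every
// flag write is a load, mask, or, and store of that word.
struct PackedFlags {
  uint32_t cf : 1;
  uint32_t pf : 1;
  uint32_t zf : 1;
  uint32_t sf : 1;
  uint32_t of : 1;
};

// Hypothetical natural-width layout: each flag has its own byte, so every
// flag write is an independent single store.
struct WideFlags {
  uint8_t cf;
  uint8_t pf;
  uint8_t zf;
  uint8_t sf;
  uint8_t of;
};

// Computes the x86 status flags for a 32-bit ADD. The same code works with
// either layout; only the stores the compiler emits differ.
template <typename Flags>
void SetAddFlags(Flags &flags, uint32_t lhs, uint32_t rhs) {
  const uint32_t res = lhs + rhs;

  flags.cf = res < lhs;        // unsigned wrap-around means carry out
  flags.zf = res == 0;         // result is zero
  flags.sf = (res >> 31) & 1;  // sign bit of the result

  // Signed overflow: operands share a sign that the result does not.
  flags.of = ((~(lhs ^ rhs) & (lhs ^ res)) >> 31) & 1;

  // PF is set when the low byte has an even number of 1 bits.
  uint8_t low = res & 0xFF;
  low ^= low >> 4;
  low ^= low >> 2;
  low ^= low >> 1;
  flags.pf = (~low) & 1;
}
```

Calling `SetAddFlags` on a `WideFlags` and on a `PackedFlags` with the same operands yields the same flag values; comparing the generated code for the two instantiations shows the extra masking work that bit fields require.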

Modernization

McSema now uses a stock distribution of LLVM 3.8. Previously, McSema used a custom modified version of LLVM 3.5. This upgrade brings in faster build times and more modern LLVM features. We have also eliminated McSema’s dependency on Boost, opting to use modern C++11 features instead.
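As an illustration of what replacing Boost with C++11 can look like, here is a small sketch. The post does not say which Boost facilities McSema relied on, so the substitutions shown (standard containers and smart pointers in place of their Boost counterparts, range-based `for` in place of `BOOST_FOREACH`) are assumptions about typical cases, and `Block`, `BlockMap`, `AddBlock`, and `CountEdges` are hypothetical names rather than McSema code.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical basic-block record; not McSema's actual data structure.
struct Block {
  uint64_t address;
  std::vector<uint64_t> successors;
};

// std::unordered_map and std::unique_ptr stand in for the Boost container
// and smart-pointer types that pre-C++11 code would typically use.
using BlockMap = std::unordered_map<uint64_t, std::unique_ptr<Block>>;

// Inserts (or replaces) the block at `address` and returns a non-owning
// pointer to it. The map keeps ownership.
Block *AddBlock(BlockMap &blocks, uint64_t address,
                std::vector<uint64_t> successors) {
  std::unique_ptr<Block> &slot = blocks[address];
  slot.reset(new Block{address, std::move(successors)});
  return slot.get();
}

// Counts control-flow edges with a range-based for loop, which replaces
// the BOOST_FOREACH macro.
std::size_t CountEdges(const BlockMap &blocks) {
  std::size_t edges = 0;
  for (const auto &entry : blocks) {
    edges += entry.second->successors.size();
  }
  return edges;
}
```

For example, adding three blocks with `AddBlock(blocks, 0x1000, {0x1010, 0x1020})`, `AddBlock(blocks, 0x1010, {0x1020})`, and `AddBlock(blocks, 0x1020, {})` and then calling `CountEdges(blocks)` counts three edges, using nothing outside the standard library.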

Simplifications

The new command-line interface is more consistent and easier to use: mcsema-disass disassembles binaries, and mcsema-lift converts the disassembly into LLVM bitcode.

We removed bin_descend, our custom binary disassembler. There is now only one supported decoder, which uses IDA Pro as the disassembly engine.

The new code layout is simpler and more intuitive. The CMake scripts to build McSema are now smaller and simpler.

The old testing framework has been removed in favor of an approach based on integration testing, with no external dependencies.

New Features

McSema supports more instructions. We are always looking for help adding new instruction semantics, and we have updated our instruction addition guide.

McSema will now tell you which instructions are supported and which are not, via the mcsema-lift --list-supported command.

The new integration testing framework allows for easy addition of comprehensive translation tests, and there is a new guide about adding tests to McSema.

Documentation

Our new documentation describes in detail how to install, use, test, extend, and debug McSema’s codebase. We have also documented common errors and how to resolve them. These improvements will make it easier for third parties to hack on McSema.

Runtime

McSema isn’t just for static analysis. The lifted bitcode can be compiled back into a runnable program. We improved McSema’s runtime footprint, making it faster, greatly reducing its memory usage, and making it able to seamlessly interact with native Windows and Linux code in complex ways.

Investing in the future

We will continue to invest in improving McSema. We are always expanding support for larger and more complex software. We hope to move to Binary Ninja for control flow recovery instead of IDA Pro. And we plan to add support for lifting ARM binaries to LLVM bitcode. We want to broaden McSema’s applicability to include analyzing mobile apps and embedded firmware.

We are looking for interns who are excited about the possibilities of McSema. Looking to get started? Try out the walkthrough of translating a real Linux binary. After that, see how McSema can enable tools like libFuzzer to work on binaries. Finally, contact us and tell us where you’d like to take McSema. If we like it and you have a plan, then we will pay you to make it happen.

Source: Trail of Bits Blog
