Friday, May 31, 2019

Comparing intermediate representations (IRs)!

Why did you choose VEX instead of another IR (such as LLVM, REIL, BAP, etc)?

We had two design goals in angr that influenced this choice:
  1. angr needed to be able to analyze binaries from multiple architectures. This mandated the use of an IR to preserve our sanity, and required the IR to support many architectures.
  2. We wanted to implement a binary analysis engine, not a binary lifter. Many projects start and end with the implementation of a lifter, which is a time-consuming process. We needed to take something that already existed and already supported lifting multiple architectures.
Searching around the internet, the major choices were:
  • LLVM is an obvious first candidate, but lifting binary code to LLVM cleanly is a pain. The two solutions are either lifting to LLVM through QEMU, which is hackish (and the only implementation of it seems very tightly integrated into S2E), or McSema, which only supported x86 at the time but has since gone through a rewrite and gotten support for x86-64 and aarch64.
  • TCG is QEMU's IR, but extracting it seems very daunting as well, and documentation is very scarce.
  • REIL seems promising, but there is no standard reference implementation that supports all the architectures that we wanted. It seems like a nice academic work, but to use it, we would have to implement our own lifters, which we wanted to avoid.
  • BAP was another possibility. When we started work on angr, BAP only supported lifting x86 code, and up-to-date versions of BAP were only available to academic collaborators of the BAP authors. These were two deal-breakers. BAP has since become open, but it still only supports x86_64, x86, and ARM.
  • VEX was the only choice that offered an open library and support for many architectures. As a bonus, it is very well documented and designed specifically for program analysis, making it very easy to use in angr.
While angr uses VEX now, there's no fundamental reason that multiple IRs cannot be used. There are two parts of angr, outside of the angr.engines.vex package, that are VEX-specific:
  • the jump labels (i.e., Ijk_Ret for returns, Ijk_Call for calls, and so forth) are VEX enums.
  • VEX treats registers as a memory space, and so does angr. While we provide access to state.regs.rax and friends, on the backend this does state.registers.load(8, 8), where the first 8 is a VEX-defined offset for rax into the register file.
To support multiple IRs, we'll either want to abstract these things or translate their labels to VEX analogues.
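The two VEX-specific pieces above can be pictured with a small, self-contained sketch. Nothing here is angr or VEX code: `RegisterFile`, `RegsView`, `REGISTER_LAYOUT`, and `to_vex_jumpkind` are hypothetical names made up for illustration, the offset for rax simply mirrors the example in the text, and the labels of the "other IR" are invented. Only the jump kind names (`Ijk_Ret`, `Ijk_Call`, `Ijk_Boring`) are real VEX enum names.

```python
# Illustrative sketch only; none of these classes are angr or VEX APIs.

# Hypothetical layout: register name -> (offset into register file, size in bytes).
# The rax entry mirrors the example in the text, not an authoritative VEX layout.
REGISTER_LAYOUT = {"rax": (8, 8)}


class RegisterFile:
    """Registers modeled as a flat byte-addressable memory space."""

    def __init__(self, size=64):
        self._bytes = bytearray(size)

    def load(self, offset, size):
        return int.from_bytes(self._bytes[offset:offset + size], "little")

    def store(self, offset, size, value):
        self._bytes[offset:offset + size] = value.to_bytes(size, "little")


class RegsView:
    """Name-based convenience layer that forwards to offset-based accesses."""

    def __init__(self, regfile):
        # Bypass our own __setattr__ so the backing store is a plain attribute.
        object.__setattr__(self, "_regfile", regfile)

    def __getattr__(self, name):
        if name not in REGISTER_LAYOUT:
            raise AttributeError(name)
        offset, size = REGISTER_LAYOUT[name]
        return self._regfile.load(offset, size)

    def __setattr__(self, name, value):
        if name not in REGISTER_LAYOUT:
            raise AttributeError(name)
        offset, size = REGISTER_LAYOUT[name]
        self._regfile.store(offset, size, value)


# Hypothetical labels from some other IR, translated to VEX jump kind names.
OTHER_IR_TO_VEX_JUMPKIND = {
    "return": "Ijk_Ret",
    "call": "Ijk_Call",
    "jump": "Ijk_Boring",
}


def to_vex_jumpkind(label):
    """Translate a foreign jump label to its VEX analogue, or fail loudly."""
    try:
        return OTHER_IR_TO_VEX_JUMPKIND[label]
    except KeyError:
        raise ValueError(f"no VEX analogue known for label {label!r}") from None


registers = RegisterFile()
regs = RegsView(registers)
regs.rax = 0x1122334455667788  # stored at the offset assigned to rax
```

With this layout, reading `regs.rax` and calling `registers.load(8, 8)` return the same value, which is the relationship the text describes between the name-based and offset-based views. Supporting a second IR would then mean either supplying a different layout and label table, or translating its labels as `to_vex_jumpkind` does.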
