HIMALIA: Recovering compiler optimization levels from binaries by deep learning

Yu Chen, Zhiqiang Shi, Hong Li, Weiwei Zhao, Yiliang Liu, Yuansong Qiao

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    19 Citations (Scopus)

    Abstract

    Compiler optimization levels are important for binary analysis, but they are not available in COTS binaries. In this paper, we present HIMALIA, the first end-to-end system that recovers compiler optimization levels from disassembled binary code without any knowledge of the target instruction set semantics. We achieve this by formulating the problem as a deep learning task and training a two-layer recurrent neural network. In addition to the recurrent neural network, HIMALIA relies on two other techniques: instruction embedding and a new function representation method. We implement HIMALIA and carry out comprehensive experiments on our dataset of 378,695 distinct functions from 5,828 binaries compiled by GCC. The results show that HIMALIA achieves an accuracy of around 89%. Moreover, we find that HIMALIA’s learnt model is explicable: it automatically learns common compiler conventions and idioms that match our prior knowledge.
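To make the pipeline described in the abstract more concrete, the following is a minimal sketch of an embedding lookup feeding a two-layer recurrent network that ends in a softmax over optimization levels. It is an illustration under stated assumptions, not the authors' implementation: the label set `LEVELS`, the Elman-style update, the layer sizes, the toy vocabulary, and the choice to treat each disassembled instruction as one opaque token are all hypothetical. The weights are random and untrained, so the output probabilities carry no meaning until the model is fitted to labelled functions.

```python
import math
import random

# Hypothetical label set; the abstract does not list the levels HIMALIA predicts.
LEVELS = ["O0", "O1", "O2", "O3"]


def init_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, x, h):
    """One Elman-style update: h' = tanh(W_in x + W_rec h)."""
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TwoLayerRNN:
    """Embedding lookup followed by two stacked recurrent layers (assumed sizes)."""

    def __init__(self, vocab, embed_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.hidden = hidden
        self.embed = init_matrix(len(vocab), embed_dim, rng)   # instruction embedding
        self.w_in1 = init_matrix(hidden, embed_dim, rng)       # layer 1 input weights
        self.w_rec1 = init_matrix(hidden, hidden, rng)         # layer 1 recurrence
        self.w_in2 = init_matrix(hidden, hidden, rng)          # layer 2 input weights
        self.w_rec2 = init_matrix(hidden, hidden, rng)         # layer 2 recurrence
        self.w_out = init_matrix(len(LEVELS), hidden, rng)     # classifier head

    def predict_proba(self, instructions):
        """Run one function's instruction sequence through both layers."""
        h1 = [0.0] * self.hidden
        h2 = [0.0] * self.hidden
        for tok in instructions:
            x = self.embed[self.index[tok]]
            h1 = rnn_step(self.w_in1, self.w_rec1, x, h1)
            h2 = rnn_step(self.w_in2, self.w_rec2, h1, h2)
        # Classify from the final hidden state of the second layer.
        return softmax(matvec(self.w_out, h2))


# Toy vocabulary of instruction tokens (illustrative only).
vocab = ["push", "mov", "sub", "call", "ret"]
model = TwoLayerRNN(vocab)
probs = model.predict_proba(["push", "mov", "call", "ret"])
```

Here `probs` holds one probability per entry in `LEVELS`. In a real system the embedding and recurrent weights would be learned jointly from functions labelled with the optimization level they were compiled at.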

    Original language: English
    Title of host publication: Intelligent Systems and Applications - Proceedings of the 2018 Intelligent Systems Conference IntelliSys Volume 1
    Editors: Kohei Arai, Supriya Kapoor, Rahul Bhatia
    Publisher: Springer-Verlag GmbH and Co. KG
    Pages: 35-47
    Number of pages: 13
    ISBN (Print): 9783030010539
    Publication status: Published - 2018
    Event: Intelligent Systems Conference, IntelliSys 2018 - London, United Kingdom
    Duration: 6 Sep 2018 to 7 Sep 2018

    Publication series

    Name: Advances in Intelligent Systems and Computing
    Volume: 868
    ISSN (Print): 2194-5357
    ISSN (Electronic): 2194-5365

    Conference

    Conference: Intelligent Systems Conference, IntelliSys 2018
    Country/Territory: United Kingdom
    City: London
    Period: 6/09/18 to 7/09/18

    Keywords

    • Binary analysis
    • Feature embedding
    • Model explicable
    • RNN
    • Reverse engineering
