Abstract
With the growing popularity of IoT (Internet of Things) devices, the security risks posed by these devices are increasing. However, because IoT devices are multisource and heterogeneous, vulnerability detection for the IoT differs significantly from PC-based vulnerability search methods. Therefore, determining how to accurately search for vulnerabilities in large-scale cross-platform binary executable files is an urgent problem. At present, most solutions to this problem calculate code similarity by generating a CFG (control flow graph) from binary code, but depending on the choice of architecture, OS (operating system), or compilation options, the same source code is compiled into different assembly code. As a result, the performance of existing vulnerability search methods on cross-architecture binaries has been challenged. To alleviate the vast differences in assembly code caused by different compilation scenarios, this paper proposes a cross-platform, large-scale binary vulnerability search method based on two-level feature semantic learning. We define a new structured signature method for functions to mitigate the massive syntactic and structural differences among binary files caused by different compilation environments. Moreover, we integrate Structure2Vec and a GAT (graph attention network) into a hierarchical model and train it on both the internal control flow characteristics of each function and the call relationships between functions to obtain a more accurate semantic representation of functions.
Introduction
Using open source code or third-party libraries is common practice in software development, and a vendor often reuses its own code across products; both practices provide fertile ground for vulnerabilities to arise and persist. If an organization does not fully understand all of the code it uses, or if that code contains bugs, it cannot withstand common attacks against known vulnerabilities in these components and is exposed to risk [25], [30]. It is foreseeable that the same vulnerable function, compiled for different architectures, may appear in a large number of IoT devices. To address this critical issue, researchers are developing automated analysis technologies to meet the needs of IoT product security testing [1]–[3], [26], [27]. Given the wide variety of IoT devices, the ability to search for vulnerabilities efficiently and accurately is becoming increasingly important. Such vulnerability search technology enables security practitioners to find problems quickly, saving time and resources.