Abstract
1. Introduction
2. Related works
3. Methodology
4. Analysis techniques
5. Survey of techniques to detect coding weaknesses in software binaries
6. Evaluation of open-source tools
7. Discussion
8. Limitations and threats to validity
9. Future work
10. Conclusions
CRediT authorship contribution statement
Declaration of competing interest
Acknowledgements
References
Abstract
Software vulnerabilities resulting from coding weaknesses and poor development practices are common. Attackers can exploit these vulnerabilities and compromise the security and privacy of end-users. Most end-user software is distributed as program binaries. Therefore, to increase trust in third-party software, researchers have built techniques and tools to detect and resolve different classes of coding weaknesses in binary software. Our work is motivated by the need to survey the state of the art and to understand the capabilities and challenges of binary-level techniques built to detect the most important coding weaknesses in software binaries. In this paper, we first identify the most critical coding weaknesses for compiled programming languages. We then survey, explore, and compare the static techniques that were developed to detect each of these coding weaknesses in software binaries. Our other goal in this work is to discover and report the state of published open-source implementations of static binary-level security techniques. For the open-source frameworks that work as documented, we independently evaluate their effectiveness in detecting code vulnerabilities on a suite of program binaries. To our knowledge, this is the first work that surveys and independently evaluates the performance of state-of-the-art binary-level techniques for detecting weaknesses in binary software.
Introduction
Technology and software have become integral to our daily lives. Software now runs in more systems than ever, from embedded devices such as refrigerators and microwave ovens to cars and planes. Additionally, new software features continue to be added as the hardware, including the processor, memory, and storage, becomes faster, larger, and more capable. Thus, software programs also continue to grow in size and, perhaps, complexity.
Given this growth in the amount of software in use, it is no surprise that reported code vulnerabilities have been increasing in number and severity for many years [1]. At the same time, software vulnerabilities have been found to cause many disastrous real-world attacks [2], [3].
Software vulnerabilities are caused by weaknesses or flaws in the program code. These weaknesses may then be exploited to compromise the security or integrity of the system. Code in any language can be insecure when it is not developed with due care. However, some programming languages are designed with features that make them immune or more resistant to certain types of weaknesses. Such safer languages typically provide built-in mechanisms for memory management, input validation, type safety, and other security-related features. Languages like Rust and Go belong to this category of safe programming languages.
In contrast, unsafe programming languages, like C and C++, are low-level languages with poor built-in memory, type, and thread safety. Code bugs and missing safety oversight for vulnerable code constructs are widespread in programs written in these languages [4]. Despite these safety concerns, and even though memory-safe alternatives are available, C and C++ remain popular because of the large amount of existing legacy code and because many performance-critical, memory-critical, embedded, and real-time systems need the low-level features these languages provide. Consequently, C and C++ each consistently rank among the top five most popular programming languages according to the TIOBE index.
Conclusions
Our goal in this work was to comprehensively review and compare past research on static-analysis-based approaches for detecting the most important CWE categories in program binaries. Another major goal was to evaluate the accuracy of the open-source tools built to detect each of the studied program weaknesses. We made several significant and novel observations in this work. First, we found that we currently lack tools and techniques to accurately detect many important classes of errors in binary software. Second, we found that much of the research is not available as open source, and even the tools that exist are often unmaintained and lack critical support. Third, many research works evaluate their techniques only on small benchmarks, so their results may not adequately represent performance on real-world applications. Fourth, many CWE detection techniques suffer from a high incidence of false positives and false negatives, underscoring the need to refine and enhance existing techniques and tools. Thus, this work distinguishes itself as the first survey of binary-level CWE detection techniques and the first independent assessment of binary-level open-source tools for identifying software weaknesses, offering valuable insights and setting the stage for further advances in this critical field.