Failure Prediction in Distributed Systems

Persian title: پیش بینی شکست در سیستم های توزیع شده چند لایه (Failure Prediction in Multi-Tier Distributed Systems)
English title: Predicting Failures in Multi-Tier Distributed Systems
Journal/Conference: Journal of Systems and Software
Related fields of study: Computer Engineering
Related specializations: Algorithms and Computation Engineering, Artificial Intelligence, Cloud Computing
Keywords: Failure prediction, Multi-tier distributed systems, Self-healing systems, Data analytics, Machine learning, Cloud computing
Article type: Research Article
DOI: https://doi.org/10.1016/j.jss.2019.110464
University: Università di Milano-Bicocca, Milan, Italy
Pages (English article): 66
Publisher: Elsevier
Publication type: Journal
Indexing: ISI
Publication year: 2020
Impact factor: 4.018 (2019)
H-index: 94 (2020)
SJR: 0.550 (2019)
ISSN: 0164-1212
Quartile: Q2 (2019)
Format of English article: PDF
Translation status: Not translated
Price of English article: Free
Is this a base paper: No
Does this article have a conceptual model: No
Does this article have a questionnaire: No
Does this article have variables: No
Product code: E14505
References: In-text citations and a reference list at the end of the article
Table of contents (English)

Abstract

1. Introduction

2. The PreMiSE approach

3. Offline model training

4. Online failure prediction

5. Evaluation methodology

6. Experimental results

7. Related work

8. Conclusions

Declaration of Competing Interest

Acknowledgments

Appendix A. KPI List

References

Excerpt from the article (English)

Abstract

Many applications are implemented as multi-tier software systems and are executed on distributed infrastructures, such as cloud infrastructures, to benefit from the cost reduction that derives from dynamically allocating resources on demand. In these systems, failures are becoming the norm rather than the exception, and predicting their occurrence, as well as locating the responsible faults, are essential enablers of preventive and corrective actions that can mitigate the impact of failures and significantly improve the dependability of the systems. Current failure prediction approaches suffer from either false positives or limited accuracy, and do not produce enough information to effectively locate the responsible faults. In this paper, we present PreMiSE, a lightweight and precise approach to predict failures and locate the corresponding faults in multi-tier distributed systems. PreMiSE blends anomaly-based and signature-based techniques to identify multi-tier failures that affect performance indicators, with high precision and a low false positive rate. The experimental results that we obtained on a cloud-based IP Multimedia Subsystem indicate that PreMiSE can indeed predict and locate possible failure occurrences with high precision and low overhead.

Introduction

Multi-tier distributed systems are composed of several distributed nodes organized in layered tiers. Each tier implements a set of conceptually homogeneous functionalities that provide services to the tier above in the layered structure, while using services from the tier below. The distributed computing infrastructure and the connections among the vertical and horizontal structures make multi-tier distributed systems extremely complex and difficult to understand, even for those who developed them. Indeed, runtime failures are becoming the norm rather than the exception in many multi-tier distributed systems, such as ultra-large systems [1], systems of systems [2, 3], and cloud systems [4, 5, 6]. In these systems, failures become unavoidable due to both their characteristics and the adoption of commodity hardware. The characteristics that increase the chances of failures are the increasing size of the systems, the growing complexity of the system–environment interactions, the heterogeneity of the requirements, and the evolution of the operating environment. The adoption of low-quality commodity hardware is becoming common practice in many contexts, notably in cloud systems [7, 8], and further reduces the overall system reliability. Limiting the occurrence of runtime failures is extremely important in many common applications, where runtime failures and the consequent reduced dependability negatively affect the expectations and the loyalty of customers. It becomes a necessity in systems with strong dependability requirements, such as the telecommunication systems that telecom companies are migrating to cloud-based solutions [7].