Big Data Analytics based on PANFIS MapReduce

Persian title of the paper: تجزیه و تحلیل کلان داده مبتنی بر نگاشت کاهش PANFIS
English title of the paper: Big Data Analytics based on PANFIS MapReduce
Journal/Conference: Procedia Computer Science
Related fields of study: Computer Engineering, Information Technology Engineering
Related specializations: Cloud Computing, Computer Networks, Computer Programming, Software Engineering, Algorithms and Computation Engineering
Persian keywords: تجزیه و تحلیل جریان کلان داده، الگوریتم Distributed evolving، داده کاوی زمان واقعی اقتضایی، یادگیری موازی، استراتژی قانون ادغام
English keywords: Big data stream analytic, Distributed evolving algorithm, Scalable real-time data mining, Parallel learning, Rule merging strategy
Article type: Research Article
Digital identifier (DOI): https://doi.org/10.1016/j.procs.2018.10.514
University: La Trobe University, Melbourne, Australia
Pages in the English paper: 13
Publisher: Elsevier
Presentation type: Conference
Paper type: ISI
Publication year: 2018
Impact factor: 1.013 (2017)
H-index: 34 (2019)
SJR: 0.258 (2017)
ISSN: 1877-0509
Format of the English paper: PDF
Translation status: Not translated
Price of the English paper: Free
Is this a base paper: No
Product code: E11183
Table of contents (English)

Abstract

1- Introduction

2- Preliminaries

3- The Proposed Approach

4- Experimental setup and results

5- Conclusions

6- Acknowledgement

References

Excerpt from the paper (English)

Abstract

In this paper, a big data analytics framework is introduced for processing high-frequency data streams. The framework combines an advanced evolving learning algorithm, the Parsimonious Network Fuzzy Inference System (PANFIS), with MapReduce parallel computation, where PANFIS is capable of processing large-volume data streams. Big datasets are learned chunk by chunk by processors in the MapReduce environment, and the results are fused by a rule merging method that reduces the complexity of the rules. Performance measurements show that the MapReduce framework combined with the PANFIS evolving system reduces processing time by around 22 percent on average compared with the PANFIS algorithm alone, without reducing accuracy.
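The chunk-by-chunk learning and rule merging described in the abstract can be illustrated with a minimal sketch. This is not PANFIS: the excerpt does not give its rule structure or merging criteria, so the sketch assumes a stand-in where each "rule" is just a cluster center with a support count, and two rules are merged when their centers lie within a hypothetical `radius`. The names `Rule`, `learn_chunk`, `merge_rules`, and `fit` are invented for illustration.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence


@dataclass
class Rule:
    center: List[float]  # cluster center standing in for a fuzzy rule (assumption)
    support: int         # number of samples the rule has absorbed


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def absorb(rules: List[Rule], center: Sequence[float], support: int, radius: float) -> None:
    """Fold (center, support) into the nearest rule within radius, else add a new rule."""
    nearest = min(rules, key=lambda r: distance(r.center, center), default=None)
    if nearest is not None and distance(nearest.center, center) <= radius:
        total = nearest.support + support
        # Support-weighted average keeps the merged center representative of both parts.
        nearest.center = [
            (c * nearest.support + x * support) / total
            for c, x in zip(nearest.center, center)
        ]
        nearest.support = total
    else:
        rules.append(Rule(list(center), support))


def learn_chunk(chunk: Sequence[Sequence[float]], radius: float) -> List[Rule]:
    """Map step: learn a local rule base from one chunk, one sample at a time."""
    rules: List[Rule] = []
    for sample in chunk:
        absorb(rules, sample, 1, radius)
    return rules


def merge_rules(left: List[Rule], right: List[Rule], radius: float) -> List[Rule]:
    """Reduce step: fuse two rule bases, merging rules that overlap."""
    merged = [Rule(list(r.center), r.support) for r in left]
    for r in right:
        absorb(merged, r.center, r.support, radius)
    return merged


def split(data: Sequence[Sequence[float]], n_chunks: int) -> List[Sequence[Sequence[float]]]:
    size = math.ceil(len(data) / n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def fit(data: Sequence[Sequence[float]], n_chunks: int, radius: float) -> List[Rule]:
    # Each learn_chunk call is independent, so it could be dispatched to a separate worker.
    partial = [learn_chunk(chunk, radius) for chunk in split(data, n_chunks)]
    return reduce(lambda a, b: merge_rules(a, b, radius), partial, [])


data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [5.0, 5.1]]
rules = fit(data, n_chunks=2, radius=1.0)
```

In this toy run both chunks see samples from the same two regions, so the reduce step folds the four local rules into two. The map calls run sequentially here for simplicity; in a real MapReduce deployment they would execute on separate processors.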

Introduction

The rapid growth of data generated through the Internet, which gives rise to big data, has attracted great attention from many stakeholders. This phenomenon takes place in many areas of real life, such as business, management, medicine, government, and public administration. Big data is distinguished by its 4Vs characteristics: volume, velocity, variety, and veracity. Volume relates to the amount of data held in storage, which is associated with the scale of the data. Velocity indicates the flow rate of continuous data, which is associated with data streams. Variety is characterized by the number of different data formats, whereas veracity reflects the uncertainty of the data, where the data sources need to be validated (trustworthiness, accuracy, and data quality). Big data provides enormous opportunities for governments and organizations to discover and extract valuable information and knowledge about their systems, which can benefit their decision-making processes. In the business area, for example, Wal-Mart, in collaboration with Hewlett Packard, traces every purchase record belonging to its customers from its point-of-sale terminals, where transactions reach around 267 million per day. This valuable transaction data becomes a key basis for the company to increase its benefits through pricing strategies and advertising campaigns [10, 6]. In this case, decisions can be made by applying techniques such as data mining, which is used extensively for decision-making problems in many real-life applications. However, discovering meaningful insight from big data is challenging because of its 4Vs characteristics, which create difficulties in data capture, data storage, data analysis, and data visualization [37, 6]. Therefore, advanced data mining techniques and technologies are highly necessary to process and analyze big data.
Big data is often stored in the cloud to extend the capacity and scalability of local storage, which relates to one characteristic of big data, namely volume. To extract valuable information from big data efficiently, there is an urgent demand to modify existing data mining techniques so that they scale to large datasets. This creates the need to develop distributed or parallel schemes for processing big data. In addition, big data is also generated by the continuous arrival of new instances, either in batches or one by one, known as a data stream, emerging from real-world applications [9, 33]. Therefore, machine learning algorithms need to adapt to rapidly changing, non-stationary data streams. Note that stream processing/mining in the web news domain has been conducted in [39] using eT2Class [31], which is able to handle streaming data [4]. This phenomenon has triggered the development of evolving learning algorithms, which are able to learn from big data continuously [13] by evolving their models to follow the shift and drift of the big data pattern.
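The idea of a model that keeps adjusting as a non-stationary stream arrives, either in batches or one instance at a time, can be shown with a very small sketch. It is not eT2Class or PANFIS; it assumes the simplest possible "model", a single running estimate updated with a forgetting factor so that recent data outweighs old data. The class name `ForgettingMean` and its parameter `forgetting` are hypothetical.

```python
from typing import Iterable, Optional


class ForgettingMean:
    """Running estimate that discounts old samples so it can follow drift (illustrative only)."""

    def __init__(self, forgetting: float = 0.1) -> None:
        self.forgetting = forgetting          # weight given to each new sample, in (0, 1]
        self.estimate: Optional[float] = None

    def update(self, x: float) -> float:
        """Process one instance from the stream."""
        if self.estimate is None:
            self.estimate = x
        else:
            # Move the estimate a fraction of the way toward the new sample.
            self.estimate += self.forgetting * (x - self.estimate)
        return self.estimate

    def update_batch(self, batch: Iterable[float]) -> Optional[float]:
        """Process a batch by feeding its instances one by one."""
        for x in batch:
            self.update(x)
        return self.estimate


model = ForgettingMean(forgetting=0.5)
model.update(0.0)                       # stream starts around 0
model.update(10.0)                      # the pattern shifts to around 10
drifted = model.update_batch([10.0, 10.0])
```

After the shift, each new instance pulls the estimate closer to the new level. A larger `forgetting` value tracks drift faster at the cost of more sensitivity to noise.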