کاوش مجموعه اقلام پرتکرار مبتنی بر GPU برای داده های در مقیاس بزرگ
ترجمه نشده

کاوش مجموعه اقلام پرتکرار مبتنی بر GPU برای داده های در مقیاس بزرگ

عنوان فارسی مقاله: : یک روش سریع کاوش مجموعه اقلام پرتکرار مبتنی بر GPU برای داده های در مقیاس بزرگ: GMiner
عنوان انگلیسی مقاله: A fast GPU-based frequent itemset mining method for large-scale data: GMiner
مجله/کنفرانس: مجله علوم اطلاعات - Information Sciences
رشته های تحصیلی مرتبط: مهندسی کامپیوتر
گرایش های تحصیلی مرتبط: مهندسی نرم افزار، مهندسی الگوریتم ها و محاسبات
کلمات کلیدی فارسی: کاوش مجموعه اقلام پرتکرار، واحد پردازش گرافیکی، الگوریتم موازی، ریزش کاری
کلمات کلیدی انگلیسی: frequent itemset mining، graphics processing unit، parallel algorithm، workload skewness
نوع نگارش مقاله: مقاله پژوهشی (Research Article)
شناسه دیجیتال (DOI): https://doi.org/10.1016/j.ins.2018.01.046
دانشگاه: DGIST (Daegu Gyeongbuk Institute of Science and Technology), Daegu, Republic of Korea
صفحات مقاله انگلیسی: 45
ناشر: الزویر - Elsevier
نوع ارائه مقاله: ژورنال
نوع مقاله: ISI
سال انتشار مقاله: 2018
ایمپکت فاکتور: 5/080 در سال 2017
شاخص H_index: 142 در سال 2019
شاخص SJR: 1/635 در سال 2017
شناسه ISSN: 0020-0255
شاخص Quartile (چارک): Q1 در سال 2017
فرمت مقاله انگلیسی: PDF
وضعیت ترجمه: ترجمه نشده است
قیمت مقاله انگلیسی: رایگان
آیا این مقاله بیس است: بله
کد محصول: E10893
فهرست مطالب (انگلیسی)

Abstract

1- Introduction

2- Related work

3- TFL strategy

4- HIL Strategy

5- Multiple GPUs and cost model

6- Performance evaluation

7- Summary

References

بخشی از مقاله (انگلیسی)

Abstract

Frequent itemset mining is widely used as a fundamental data mining technique. However, as the data size increases, the relatively slow performances of the existing methods hinder its applicability. Although many sequential frequent itemset mining methods have been proposed, there is a clear limit to the performance that can be achieved using a single thread. To overcome this limitation, various parallel methods using multi-core CPU, multiple machine, or many-core graphic processing unit (GPU) approaches have been proposed. However, these methods still have drawbacks, including relatively slow performance, data size limitations, and poor scalability due to workload skewness. In this paper, we propose a fast GPU-based frequent itemset mining method called GMiner for large-scale data. GMiner achieves very fast performance by fully exploiting the computational power of GPUs and is suitable for large-scale data. The method performs mining tasks in a counterintuitive way: it mines the patterns from the first level of the enumeration tree rather than storing and utilizing the patterns at the intermediate levels of the tree. This approach is quite effective in terms of both performance and memory use in the GPU architecture. In addition, GMiner solves the workload skewness problem from which the existing parallel methods suffer; as a result, its performance increases almost linearly as the number of GPUs increases. Through extensive experiments, we demonstrate that GMiner significantly outperforms other representative sequential and parallel methods in most cases, by orders of magnitude on the tested benchmarks.

Introduction

As a fundamental data mining technique, frequent itemset mining is widely used in a wide range of disciplines such as market basket analysis, web usage mining, social network analysis, intrusion detection, bioinformatics, and recommendation systems. However, the deluge of data generated by automated systems for diagnostic or analysis purposes makes it difficult or even impossible to apply mining techniques in many realworld applications. The existing methods often fail to find frequent itemsets in such big data within a reasonable amount of time. Thus, in terms of computational time, itemset mining is still a challenging problem that has not yet been completely solved. Many sequential frequent itemset mining methods such as Apriori [2], Eclat [35], FP-Growth [14], and LCM [30] use a single CPU thread. However, these singlethreaded applications all have a fundamental mining performance limit because CPU clock speed is generally no longer increasing. To overcome the single-thread performance limit, multiple parallel frequent itemset mining methods have been proposed. These methods can be categorized into three main groups: (1) (CPU-based) multithreaded methods, (2) distributed methods, and (3) graphic processing unit (GPU)- based methods. We omit the term ”multi-thread” from the GPU-based methods because they are obviously multi-threaded. The first group focuses on accelerating the performance of the single-threaded methods by exploiting multi-core CPUs [20, 24, 26, 27, 29], while the second group tries to accelerate the performance by exploiting multiple machines [12, 17, 19]. Details about these methods are available in recent survey studies.