Abstract
1- Introduction
2- Related work
3- Indexed list based incremental high utility pattern mining without candidate generation
4- Performance evaluation
5- Conclusions
References
Abstract
Since traditional frequent pattern mining approaches assume that all the items in binary databases have the same importance regardless of their own features, they have difficulty in satisfying requirements of real world applications such as finding patterns with high profits. High utility pattern mining was proposed to deal with such an issue, and various relevant works have been researched. There have been demands for efficient solutions to find interesting knowledge from specific environments in which data accumulates continuously with the passage of time such as social network service, wireless network sensor data, etc. Although several algorithms have been devised to mine high utility patterns from incremental databases, they still have performance limitations in the process of generating a large number of candidate patterns and identifying actually useful results from the found candidates. In order to solve the problems, we propose a new algorithm for mining high utility patterns from incremental databases. The newly proposed data structures in a list form and mining techniques allow our approach to extract high utility patterns without generating any candidates. In addition, we suggest restructuring and pruning techniques that can process incremental data more efficiently. Experimental results on various real and synthetic datasets demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of runtime, memory, and scalability.
Introduction
As one of various data mining techniques [1,2], pattern mining has played an important role in finding useful knowledge hidden from large amounts of data [3–10]. As a fundamental research topic of pattern mining, frequent pattern mining identifies all of the patterns that appear more frequently than a user-defined threshold from a given binary database [3,11]. A binary database is composed of a number of transactions, each of which has multiple items. Then, each item is denoted as 0 (absence) or 1 (presence). Traditional approaches assume that all the items in such a binary type of data have the same importance regardless of their own characteristics. Hence, they mine patterns simply considering item frequencies with various real-world features ignored. However, each item in real-world databases has its own importance as well as quantity information that may exceed 1. In this regard, frequent pattern mining does not satisfy the need to find patterns with high profits in real-world applications such as market analysis. High utility pattern mining has been researched as an important topic in the pattern mining area in order to deal with such problems [12– 15]. High utility pattern mining ([16,17]; Yun, Ryang, & Ryu, 2016) considers non-binary features and the distinct importance of items to the mining process, which has made it possible to extract more useful patterns, called high utility patterns, satisfying a user-given threshold. In recent years, large-scale data have been generated continually in various real-world applications such as web data, wireless network sensor data, and social network service. As a result, this trend has increased the importance of data analysis.