A Review of Data Mining from Multiple Information Sources

Persian title of the article (translated): A review of data mining from multiple information sources
English title of the article: Review on mining data from multiple data sources
Journal/Conference: Pattern Recognition Letters
Related fields of study: Industrial Engineering
Related specializations: Data Mining
Keywords: Multiple data source mining, Pattern analysis, Data classification, Data clustering, Data fusion
Article format: Short Communication
Digital identifier (DOI): https://doi.org/10.1016/j.patrec.2018.01.013
University: Institute of Natural and Mathematical Sciences – Massey University – New Zealand
Pages in the English article: 9
Publisher: Elsevier
Publication type: Journal
Article type: ISI
Year of publication: 2018
Impact factor: 2.694 (2017)
H-index: 129 (2019)
SJR: 0.662 (2017)
ISSN: 0167-8655
Quartile: Q1 (2017)
Format of the English article: PDF
Translation status: Not translated
Price of the English article: Free
Is this a base paper: No
Product code: E5871
Table of contents (English)

Abstract

1- Introduction

2- Key methods for pattern analysis

3- Key methods for data source classification and clustering

4- Key methods for data source fusion

5- Conclusion and future work

Acknowledgments

References

Excerpt from the article (English)

Abstract

In this paper, we review recent progress in the area of mining data from multiple data sources. The advancement of information and communication technology has generated a large amount of data from different sources, which may be stored in different geographical locations. Mining data from multiple data sources to extract useful information is considered a very challenging task in the field of data mining, especially in the current big data era. The methods of mining multiple data sources can be divided mainly into four groups: (i) pattern analysis, (ii) multiple data source classification, (iii) multiple data source clustering, and (iv) multiple data source fusion. The main purpose of this review is to systematically explore the ideas behind current multiple data source mining methods and to consolidate recent research results in this field.

1- Introduction

The advancement of information and communication technology has generated a large amount of data from different sources, which may be stored in different geographical locations. Each database may have its own structure for storing data. Mining multiple data sources [1–3] distributed across different geographical locations to discover useful patterns is critically important for decision making. In particular, the Internet can be seen as a large, distributed data repository consisting of a variety of data sources and formats, which can provide abundant information and knowledge. Data from different sources may seem irrelevant to each other, but once the information generated from different sources is integrated, new and useful knowledge may emerge. An excellent example of how an organization can mine different data sources to obtain insight that no individual source provides is the Australian Taxation Office (ATO), which mines data from sources such as social media posts, private school records and immigration data to detect tax cheats. Mining data from different data sources has become a sophisticated tool in a crackdown on tax cheats that yielded nearly $10 billion in 2016 [4]. For example, in one ordinary Australian family, the husband ran a business and reported $80,000 of taxable income per year, putting him just inside the second-lowest tax bracket, and his wife reported earning $60,000 per year. But the data collected from different data sources revealed that the family had three children at private schools at an estimated cost of $75,000 per year, while immigration records and social media posts showed that the family had recently taken five business-class flights and a holiday in Whistler, a Canadian ski resort. Their declared incomes did not match their lifestyle, which prompted the ATO to contact them to confirm whether they had unpaid taxes.
From the above example, we can see that developing an effective technique for mining multiple data sources to discover useful information is crucially important for decision making.

∗ Corresponding author. E-mail address: jwt@escience.cn (W. Ji).
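
The reasoning in the ATO example can be sketched in a few lines of Python: records from separate sources are merged per household, and a household is flagged when its known spending is out of proportion to its declared income. This is only an illustration under stated assumptions, not the ATO's or the authors' method. The household id "family-1", the record layouts, the function names and the 40% threshold (max_fee_share) are all hypothetical; only the dollar figures and the flight count come from the example above.

```python
from dataclasses import dataclass


@dataclass
class HouseholdView:
    """Facts about one household, fused from separate (hypothetical) sources."""
    declared_income: float = 0.0     # summed from tax returns
    school_fees: float = 0.0         # summed from private school records
    business_class_flights: int = 0  # summed from immigration records


def fuse(tax_returns, school_records, travel_records):
    """Merge per-source (household_id, value) records into one view per household."""
    households = {}
    for hid, income in tax_returns:
        households.setdefault(hid, HouseholdView()).declared_income += income
    for hid, fees in school_records:
        households.setdefault(hid, HouseholdView()).school_fees += fees
    for hid, flights in travel_records:
        households.setdefault(hid, HouseholdView()).business_class_flights += flights
    return households


def flag_mismatches(households, max_fee_share=0.4):
    """Return ids whose school fees exceed max_fee_share of declared income.

    max_fee_share is an illustrative threshold, not a rule from the paper.
    Flight counts are carried in the fused view for context but are not
    used by this simple rule.
    """
    return [
        hid
        for hid, h in households.items()
        if h.declared_income > 0
        and h.school_fees / h.declared_income > max_fee_share
    ]


# Figures from the example in the text; "family-1" is a made-up identifier.
tax_returns = [("family-1", 80_000), ("family-1", 60_000)]
school_records = [("family-1", 75_000)]
travel_records = [("family-1", 5)]

households = fuse(tax_returns, school_records, travel_records)
flagged = flag_mismatches(households)
```

With these inputs the household declares $140,000 in total and spends $75,000 on school fees, more than half of its declared income, so it is flagged under the assumed threshold. No single source would show this: the tax returns alone look unremarkable, and the school records alone say nothing about income.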