Performance of Clustering Algorithms

Article title: Performance evaluation of clustering algorithms for varying cardinality and dimensionality of data sets
Journal/Conference: Materials Today: Proceedings
Related fields of study: Computer Engineering, Information Technology Engineering
Related specializations: Algorithm Engineering and Computation, Internet and Wide Area Networks
Keywords: Clustering algorithms, Clustering quality, Clustering performance, Social media, Turnaround time
Article category: Research Article
Digital identifier (DOI): https://doi.org/10.1016/j.matpr.2020.01.110
University: Department of Computer Applications, Cochin University of Science and Technology, Kochi, Kerala 682022, India
Pages in the English article: 7
Publisher: Elsevier
Presentation type: Journal
Article type: ISI
Publication year: 2020
Impact factor: 0.967 (2019)
H-index: 18 (2020)
SJR: 0.299 (2019)
ISSN: 2214-7853
English article format: PDF
Translation status: Not translated
Price of the English article: Free
Is this a base article: No
Does this article have a conceptual model: No
Does this article have a questionnaire: No
Does this article have variables: No
Product code: E14657
References: Cited in the text and listed at the end of the article
Table of contents (English)

Abstract

1. Introduction

2. Antecedents

3. Related works

4. Methodology

5. Empirical research

6. Conclusion

CRediT authorship contribution statement

Declaration of Competing Interest

References

Excerpt from the article (English)

Abstract

Clustering is the most widely used unsupervised machine learning technique, with extensive applications in statistical analysis. Multiple clustering algorithms are available in theory, and many more implementations are available in practice. A considerable body of literature focuses on the quality of clustering algorithms using various internal and external evaluation techniques. The motivation behind this work is the scarcity of literature dealing with the performance of clustering algorithms in terms of turnaround time. This paper summarizes the experimental analysis conducted on the performance of multiple clustering algorithms based on cardinality and dimensionality. The analysis is performed in R, a free and open-source programming language mainly used for statistical computing. This work evaluates nine key algorithms from the partitioning, hierarchical, density-based, and model-based clustering approaches using different social media data sets. We captured the performance trends of these algorithms in terms of turnaround time by varying the cardinality and dimensionality of the data sets. Based on our experiments, the CLARA, CLARANS, and k-means algorithms demonstrate the best performance with varying cardinality. It is also observed that changes in dimensionality do not impact hierarchical clustering approaches, whereas there is a positive influence on the execution time for partitioning, density-based, and model-based clustering approaches.

Introduction

Data mining [1] is the process of extracting meaningful information from raw data, through which underlying patterns and relationships are revealed. These revelations form useful knowledge that can be put to use in various scientific, educational, and industrial scenarios. Based on the type of patterns to be processed, we can adopt appropriate data mining strategies, which include, but are not limited to, classification, clustering, association, and regression. Clustering is the machine learning technique used for creating logical groups of similar entities from a data set. The aim of the clustering process is to create distinct groups of elements in such a way that entities from the same group have similar properties, whereas entities from different groups have dissimilar properties. It is an unsupervised learning technique that is widely used for performing statistical analysis of data. Since the volume of data being processed is increasing on a daily basis, clustering is extensively applied in almost all industrial segments. This work covers an empirical analysis of the performance of nine different clustering algorithms [2]. We captured the average processing time for each algorithm against a varying number of records (cardinality) with a constant number of attributes (dimensionality), and against a varying number of attributes with the same number of records. The experiments were conducted using two distinct social media data sets.
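
The measurement described above can be illustrated with a minimal sketch. It is not the authors' code: the study was carried out in R on social media data sets, whereas this example uses Python, a plain Lloyd-style k-means, and synthetic uniform data purely for illustration. The function names `kmeans`, `make_data`, and `average_time`, as well as the chosen sizes and repeat counts, are assumptions introduced here.

```python
import random
import time


def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_data(n_records, n_attributes, seed=0):
    """Synthetic uniform data standing in for a real data set (assumption)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_attributes)] for _ in range(n_records)]


def average_time(n_records, n_attributes, k=3, repeats=3):
    """Average wall-clock seconds of kmeans over `repeats` runs."""
    data = make_data(n_records, n_attributes)
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        kmeans(data, k)
        total += time.perf_counter() - start
    return total / repeats


if __name__ == "__main__":
    # Vary cardinality (records) with dimensionality (attributes) held constant.
    for n in (200, 400, 800):
        print(f"records={n:4d} attributes=5  avg={average_time(n, 5):.4f}s")
    # Vary dimensionality with cardinality held constant.
    for d in (2, 8, 32):
        print(f"records= 400 attributes={d:<2d} avg={average_time(400, d):.4f}s")
```

Averaging over several runs smooths out timing noise from the host machine. The absolute numbers depend entirely on hardware and implementation, so only the trend across sizes is meaningful, and it should not be read as a reproduction of the paper's results.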