Abstract
Introduction
Related Works
Background
Experimental Design
Experimental Results and Discussion
Conclusions and Future Work
References
Abstract
Data clustering algorithms struggle to identify data points that are noise or outliers. This paper therefore proposes an enhanced connectivity measure, based on an outlier detection approach, for multi-objective data clustering problems. The proposed algorithm aims to improve solution quality by combining the local outlier factor (LOF) method with the connectivity validity measure. The modification is applied to the mechanism that selects neighbouring data points, which is adjusted so that such outliers are eliminated. The approach is assessed by applying multi-objective algorithms to eight real-life and seven synthetic two-dimensional datasets. External validity is evaluated using the F-measure, while the quality of the Pareto-optimal sets is assessed with performance assessment metrics such as coverage and overall non-dominated vector generation. The experimental results show that the proposed outlier detection method enhances the performance of multi-objective data clustering algorithms.
Introduction
Data clustering groups data points using similarity functions so that the data can subsequently be understood. A wide range of applications use clustering algorithms to uncover the structure embedded in the data, to obtain a well-defined set of clusters for further investigation, and to characterise each cluster [1, 2]. The quality of the clusters can be assessed with internal validity (similarity) measures such as connectedness, compactness, and isolation. These validity measures play an important part in the development of clustering algorithms, many of which, such as the k-means partitioning algorithm, are built on distance measures. In general, partitioning algorithms aim to identify spherically shaped clusters and are inefficient at recognising arbitrarily shaped clusters, such as the non-convex or interlaced clusters studied in several applications. Moreover, partitioning algorithms have difficulty recognising data points that are outliers or noise [3]. Unlike other validity measures, cluster connectivity is independent of cluster shape [4]: it measures the degree to which the neighbours of a data point are located in the same cluster as that point. However, the robustness of the connectivity measure depends on the associated L nearest neighbours [5, 6]. The neighbours used to quantify connectivity can include outliers, and such unreliable data points can severely distort the measured connectedness [7]. Therefore, the mechanism for choosing neighbouring data points can be adjusted to eliminate such outliers and thereby enhance the performance of the connectivity measure. Data clustering and outlier detection are complementary: a data point is recognised either as a cluster member or as an outlier.
Data clustering algorithms commonly incorporate a mechanism for handling outliers that removes these data points from the clusters. One significant problem in outlier analysis is applicability across different problem domains [7–10]. In addition, the effectiveness of an outlier analysis algorithm is quantified by how well it performs under different thresholds on the outlier score.
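The interaction between the connectivity measure and outlier filtering described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly used form of connectivity, in which a point is penalised by 1/j whenever its j-th nearest neighbour lies in a different cluster, and it assumes that points flagged by LOF are ignored both as neighbours and as source points; this excerpt does not state how the proposed method treats them. The function names, the LOF threshold of 1.5, and the toy data are all hypothetical.

```python
import math

def k_nearest(points, i, k, skip=frozenset()):
    """Return the k nearest neighbours of point i as (distance, index) pairs."""
    ranked = sorted(
        (math.dist(points[i], q), j)
        for j, q in enumerate(points)
        if j != i and j not in skip
    )
    return ranked[:k]

def lof_scores(points, k):
    """Local outlier factor of every point (ties at the k-th distance are cut off)."""
    nbrs = [k_nearest(points, i, k) for i in range(len(points))]
    k_dist = [n[-1][0] for n in nbrs]  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    # Assumes no point has k exact duplicates, otherwise the sum could be zero.
    lrd = [len(n) / sum(max(k_dist[j], d) for d, j in n) for n in nbrs]
    return [
        sum(lrd[j] for _, j in n) / (len(n) * lrd[i])
        for i, n in enumerate(nbrs)
    ]

def connectivity(points, labels, L, skip=frozenset()):
    """Connectivity (lower is better); indices in `skip` are ignored entirely."""
    total = 0.0
    for i in range(len(points)):
        if i in skip:
            continue
        for rank, (_, j) in enumerate(k_nearest(points, i, L, skip), start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank  # penalty shrinks for more distant ranks
    return total

# Toy data (assumed): two 3x3 grids plus one stray point assigned to cluster 1.
grid_a = [(x, y) for x in range(3) for y in range(3)]
grid_b = [(x + 10, y) for x in range(3) for y in range(3)]
points = grid_a + grid_b + [(5, 1)]
labels = [0] * 9 + [1] * 9 + [1]

scores = lof_scores(points, k=3)
flagged = frozenset(i for i, s in enumerate(scores) if s > 1.5)  # assumed threshold
raw = connectivity(points, labels, L=3)
filtered = connectivity(points, labels, L=3, skip=flagged)
```

In this toy case only the stray point exceeds the threshold, and removing it takes the connectivity penalty of an otherwise clean partition to zero. With a larger L the stray point would also enter the neighbour lists of regular cluster members, which is the situation the text describes.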
Conclusions and Future Work
In this paper, an enhanced connectivity measure based on the LOF outlier detection method (Conn_LOF) is proposed to improve the connectivity measure by eliminating outliers. To examine its efficiency, Conn_LOF is employed within the competing algorithms and tested on eight real-life datasets of varying complexity obtained from the UCI Machine Learning Repository. The competing algorithms are also tested on seven synthetic two-dimensional datasets with different cluster shapes and characteristics. The experimental results show that the performance of the modified eNSGA-II and eSPEA-II is enhanced by adopting the Conn_LOF method with respect to the average and standard deviation of the F-measure. In addition, multi-objective performance assessment metrics, including coverage and overall non-dominated vector generation, are used to evaluate the quality of the Pareto-optimal sets. Furthermore, the Conn_LOF outlier detection method proves effective when combined with the clustering algorithms, providing better Pareto-front solutions with efficient clustering measures for datasets of varying characteristics and complexity.
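For readers unfamiliar with the two Pareto-set measures mentioned here, the following Python sketch shows how they are commonly defined. It assumes that all objectives are minimised, that coverage is the usual set-coverage ratio C(A, B) (the fraction of solutions in B that are weakly dominated by some solution in A), and that overall non-dominated vector generation is the number of distinct non-dominated vectors in a set. The exact variants used in the paper are not given in this excerpt, and the function names and sample fronts are hypothetical.

```python
def weakly_dominates(a, b):
    """True if a is no worse than b on every objective (minimisation assumed)."""
    return all(x <= y for x, y in zip(a, b))

def dominates(a, b):
    """True if a weakly dominates b and is strictly better on some objective."""
    return weakly_dominates(a, b) and any(x < y for x, y in zip(a, b))

def non_dominated(front):
    """Keep only the vectors that no other vector in the set dominates."""
    return [p for p in front if not any(dominates(q, p) for q in front)]

def onvg(front):
    """Number of distinct non-dominated vectors in the set."""
    return len({tuple(p) for p in non_dominated(front)})

def coverage(a, b):
    """C(A, B): share of B weakly dominated by at least one member of A."""
    return sum(any(weakly_dominates(p, q) for p in a) for q in b) / len(b)

# Hypothetical two-objective fronts from two competing algorithms.
front_a = [(1, 5), (2, 3), (4, 1)]
front_b = [(2, 6), (3, 3), (5, 2), (0.5, 7)]

c_ab = coverage(front_a, front_b)  # how much of B is covered by A
c_ba = coverage(front_b, front_a)  # how much of A is covered by B
```

Coverage is not symmetric, so both directions are usually reported: a high C(A, B) together with a low C(B, A) suggests that A's front is the stronger of the two.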