Diabetes prediction by integrating PCA and K-means techniques

Persian title of the article (translated): Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques

English title of the article: Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques

Journal/Conference: Informatics In Medicine Unlocked

Related fields of study: Computer Engineering, Mathematics

Related specializations: Artificial Intelligence, Software Engineering, Applied Mathematics

Persian keywords (translated): K-means, diabetes, data mining, logistic regression, PCA

English keywords: PCA, K-means, Diabetes, Data mining, Logistic regression

Article type: Research Article

Indexing: Scopus - DOAJ

Digital object identifier (DOI): https://doi.org/10.1016/j.imu.2019.100179

University: School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China

Pages in the English article: 7

Publisher: Elsevier

Presentation type: Journal

Article category: ISI

Publication year: 2019

Impact factor: 2.108 in 2018

H-index: 9 in 2019

SJR: 0.295 in 2018

ISSN: 2352-9148

Quartile: Q3 in 2018

Format of the English article: PDF

Translation status: Not translated

Price of the English article: Free

Is this a base article: Yes

Does this article have a conceptual model: Yes

Does this article have a questionnaire: No

Does this article have variables: Yes

Product code: E12734

References: Includes in-text citations and a reference list at the end of the article

Table of contents (English)

Abstract

1- Introduction

2- Related study

3- Methodology

4- Experimental result

5- Discussion

6- Conclusion and future work

References

Excerpt from the article (English)

Abstract

Diabetes causes a large number of deaths each year, and many people living with the disease do not realize their health condition early enough. In this study, we propose a data mining based model for early diagnosis and prediction of diabetes using the Pima Indians Diabetes dataset. Although k-means is simple and can be used for a wide variety of data types, it is quite sensitive to the initial positions of the cluster centers, which determine the final clustering result. That result either provides a sufficient and efficiently clustered dataset for the logistic regression model, or yields a smaller amount of data because the original dataset was clustered incorrectly, thereby limiting the performance of the logistic regression model. Our main goal was to determine ways of improving the accuracy of k-means clustering and logistic regression. Our model comprises PCA (principal component analysis), k-means, and a logistic regression algorithm. Experimental results show that PCA improved the accuracy of both the k-means clustering algorithm and the logistic regression classifier compared with the results of other published studies, with k-means correctly classifying 25 more data points and logistic regression accuracy 1.98% higher. As such, the model is shown to be useful for automatically predicting diabetes using patient electronic health records data. A further experiment with a new dataset showed the applicability of our model for the prediction of diabetes.
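The sensitivity of k-means to its initial cluster centers can be illustrated with a small sketch. This is not code from the paper: it assumes scikit-learn, uses synthetic overlapping blobs rather than the Pima Indians Diabetes dataset, and the helper name `inertia_by_seed` is hypothetical. Each run uses a single random initialization, and the final within-cluster sum of squared distances (which scikit-learn exposes as `inertia_`) is recorded per seed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def inertia_by_seed(X, k, seeds):
    """Run k-means once per seed with one random initialization each
    and return the final squared-error value (inertia) of every run."""
    return [
        KMeans(n_clusters=k, init="random", n_init=1, random_state=s).fit(X).inertia_
        for s in seeds
    ]


# Synthetic, overlapping blobs (an assumption for illustration only).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.5, random_state=0)

inertias = inertia_by_seed(X, k=4, seeds=range(10))
print(f"lowest inertia:  {min(inertias):.1f}")
print(f"highest inertia: {max(inertias):.1f}")
```

If the lowest and highest values differ, some starting positions led the algorithm to a worse local optimum than others, which is the behavior the abstract describes. Whether a gap appears depends on the data and the seeds.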

Introduction

Diabetes was among the top 10 causes of death in 2016. It killed 1.6 million people in 2016, up from fewer than 1 million in 2000. With this figure, diabetes replaced HIV/AIDS as the seventh leading cause of death [1]. The number of people with diabetes has risen from 108 million in 1980 to 422 million in 2014, with the global prevalence of diabetes among adults over 18 years of age rising from 4.7% in 1980 to 8.5% in 2014 [2]. By 2040, 642 million adults (1 in 10 adults) are expected to have diabetes. Also, 46.5% of those with diabetes have not been diagnosed [3].

To reduce the number of deaths attributable to diabetes, it is essential to devise methods and techniques that aid early diagnosis, because a large number of deaths in diabetic patients are due to late diagnosis. Achieving cutting-edge techniques for the early diagnosis of diabetes requires advanced information technology, and data mining is a suitable field for this. Data mining offers the ability to extract and discover previously unknown, hidden, but interesting patterns from a large database repository. These patterns can aid medical diagnosis and decision-making. Various techniques and algorithms have been designed for extracting knowledge and information about the diagnosis and treatment of disease from medical databases.

PCA is a simple, nonparametric method for extracting relevant information from confusing data sets [4]. When a large dataset is to be clustered into a user-specified number of clusters (k), each represented by its centroid, k-means clusters the data by minimizing the squared error function [5]. On such data it often misclassifies some points because of outliers, and its time complexity is also higher. To overcome these problems, principal component analysis (PCA) can be used to reduce the dataset to a lower dimension while losing as little information as possible and providing a better centroid point for clustering. K-means clustering partitions a dataset into different groups of similar objects. Clusters that are highly dissimilar from the others are regarded as outliers and discarded. Logistic regression is an efficient predictive regression algorithm that is well suited to datasets whose dependent variable is dichotomous (binary).
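The three stages described above (PCA, then k-means, then logistic regression) could be chained roughly as in the following sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic two-class dataset with 8 features in place of the Pima Indians Diabetes data, 2 principal components, k = 2, and a 70/30 train/test split. The excerpt does not spell out the exact filtering rule, so the helper `pca_kmeans_filter` (a hypothetical name) assumes that a row is kept only when its own label matches the majority label of its cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def pca_kmeans_filter(X, y, n_components=2, random_state=0):
    """Standardize X, project it with PCA, cluster into 2 groups, and keep
    rows whose label equals their cluster's majority label (assumed rule)."""
    Z = PCA(n_components=n_components).fit_transform(
        StandardScaler().fit_transform(X)
    )
    clusters = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit_predict(Z)
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(clusters):
        members = clusters == c
        majority = np.bincount(y[members]).argmax()
        keep |= members & (y == majority)
    return Z[keep], y[keep], keep


# Synthetic stand-in data: two overlapping classes, 8 features (assumption).
X, y = make_blobs(
    n_samples=768,
    centers=[[0.0] * 8, [2.0] * 8],
    cluster_std=2.0,
    random_state=0,
)

Z_kept, y_kept, keep = pca_kmeans_filter(X, y)

# Train and evaluate logistic regression on the retained rows only.
Z_train, Z_test, y_train, y_test = train_test_split(
    Z_kept, y_kept, test_size=0.3, random_state=0, stratify=y_kept
)
model = LogisticRegression().fit(Z_train, y_train)
accuracy = model.score(Z_test, y_test)

print(f"kept {keep.sum()} of {len(y)} rows")
print(f"held-out accuracy on retained rows: {accuracy:.3f}")
```

Because the filtering step uses the class labels, the reported accuracy describes the retained subset only and is not directly comparable to accuracy measured on the unfiltered dataset.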
