Diabetes Prediction by Integrating PCA and K-means Techniques

Article title: Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques
Journal/Conference: Informatics in Medicine Unlocked
Related fields of study: Computer Engineering, Mathematics
Related specializations: Artificial Intelligence, Software Engineering, Applied Mathematics
Keywords: PCA, K-means, Diabetes, Data mining, Logistic regression
Type of writing: Research Article
Indexing: Scopus, DOAJ
Digital identifier (DOI): https://doi.org/10.1016/j.imu.2019.100179
University: School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, China
Publisher: Elsevier
Presentation type: Journal
Article category: ISI
Year of publication: 2019
Impact factor: 2.108 in 2018
H-index: 9 in 2019
SJR: 0.295 in 2018
ISSN: 2352-9148
Quartile: Q3 in 2018
Format of the English article: PDF
Number of pages in the English article: 7
Translation status: Not translated
Price of the English article: Free
Is this a base article: Yes
Does this article have a conceptual model: Yes
Does this article have a questionnaire: No
Does this article have variables: Yes
Product code: E12734
References: In-text citations and a reference list at the end of the article
Table of contents

Abstract
1- Introduction
2- Related study
3- Methodology
4- Experimental result
5- Discussion
6- Conclusion and future work
References

Sample text of the article

Abstract


Diabetes causes a large number of deaths each year, and many people living with the disease do not realize their health condition early enough. In this study, we propose a data mining based model for early diagnosis and prediction of diabetes using the Pima Indians Diabetes dataset. Although K-means is simple and can be used for a wide variety of data types, it is quite sensitive to the initial positions of the cluster centers, which determine the final clustering result. That result either provides a sufficient and efficiently clustered dataset for the logistic regression model, or yields less data because of incorrect clustering of the original dataset, thereby limiting the performance of the logistic regression model. Our main goal was to determine ways of improving the accuracy of k-means clustering and logistic regression. Our model comprises PCA (principal component analysis), k-means, and a logistic regression algorithm. Experimental results show that PCA enhanced the accuracy of both the k-means clustering algorithm and the logistic regression classifier relative to other published studies: k-means correctly classified 25 more data points, and logistic regression accuracy was 1.98% higher. As such, the model is shown to be useful for automatically predicting diabetes from patient electronic health records data. A further experiment with a new dataset showed the applicability of our model to the prediction of diabetes.


Introduction


Diabetes was among the top 10 causes of death in 2016. It killed 1.6 million people that year, up from fewer than 1 million in 2000, and with this figure it replaced HIV/AIDS as the seventh leading cause of death [1]. The number of people with diabetes rose from 108 million in 1980 to 422 million in 2014, and the global prevalence of diabetes among adults over 18 years of age rose from 4.7% to 8.5% over the same period [2]. By 2040, 642 million adults (1 in 10 adults) are expected to have diabetes. Also, 46.5% of those with diabetes have not been diagnosed [3].

Because a large number of deaths in diabetic patients are due to late diagnosis, reducing the number of deaths attributable to diabetes requires methods and techniques that aid early diagnosis. Achieving cutting-edge techniques for early diagnosis calls for advanced information technology, and data mining is a suitable field for this. Data mining offers the ability to extract and discover previously unknown, hidden, but interesting patterns in a large database repository. These patterns can aid medical diagnosis and decision-making. Various techniques and algorithms have been designed to extract knowledge and information for the diagnosis and treatment of disease from medical databases.

PCA is a simple, nonparametric method for extracting relevant information from confusing data sets [4]. When a large dataset is to be clustered into a user-specified number of clusters (k), each represented by its centroid, k-means clusters the data by minimizing the squared error function [5]. It often misclassifies some data because of outliers, and its time complexity is also greater on such a dataset. To overcome these problems, principal component analysis (PCA) can be used to reduce the dataset to a lower dimension while losing as little information as possible, which also provides better centroid points for clustering. K-means clustering partitions a dataset into groups of similar objects. Clusters that are highly dissimilar from the others are regarded as outliers and discarded. Logistic regression is an efficient predictive regression algorithm, and it is well suited to datasets whose dependent variable is dichotomous (binary).
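
The three stages named above (PCA, then k-means, then logistic regression) can be sketched in a few dozen lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the two-feature synthetic data (hypothetical glucose and BMI columns), the use of a single principal component, the min/max initial centers, and the step that drops samples whose cluster disagrees with the known label are all choices made for the example.

```python
import math
import random

def first_component(rows, iters=100):
    """Mean and unit direction of the top principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def kmeans_two(values, iters=50):
    """Two-cluster k-means on 1-D scores, started from the min and max values."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in values]
        for c in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.01, epochs=500):
    """One-feature logistic regression fitted by batch gradient descent.
    The small learning rate is chosen because the scores are not rescaled."""
    w = b = 0.0
    for _ in range(epochs):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / len(xs)
        b -= lr * sum(errs) / len(xs)
    return w, b

# Synthetic stand-in data (NOT the Pima dataset): hypothetical [glucose, BMI] rows.
random.seed(0)
rows = ([[random.gauss(100, 8), random.gauss(25, 2)] for _ in range(40)]
        + [[random.gauss(150, 8), random.gauss(33, 2)] for _ in range(40)])
labels = [0] * 40 + [1] * 40

# Step 1: PCA, reduce each row to its score on the first principal component.
mean, direction = first_component(rows)
scores = [sum((r[j] - mean[j]) * direction[j] for j in range(len(direction)))
          for r in rows]

# Step 2: k-means with k = 2; align cluster ids with the class labels.
clusters = kmeans_two(scores)
if sum(c == y for c, y in zip(clusters, labels)) < len(labels) / 2:
    clusters = [1 - c for c in clusters]

# Step 3 (assumption): keep only samples whose cluster matches the known label.
kept = [(s, y) for s, c, y in zip(scores, clusters, labels) if c == y]

# Step 4: logistic regression on the retained samples.
xs, ys = [s for s, _ in kept], [y for _, y in kept]
w, b = fit_logistic(xs, ys)
accuracy = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in kept) / len(kept)
print(f"kept {len(kept)} of {len(rows)} samples, training accuracy {accuracy:.2f}")
```

The accuracy printed here is measured on the same samples used for fitting, so it says nothing about generalization. A real experiment would standardize the features before PCA, choose the number of components from the explained variance, and evaluate on held-out data or with cross-validation.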
