Abstract
This study investigated the application of machine learning techniques to disease prediction. Three popular machine learning algorithms, Random Forest, Support Vector Machines, and Naive Bayes, were employed and their performance was evaluated. The best-performing model was based on the Random Forest algorithm, with an average accuracy of 87%. This model was then tuned further, which raised its accuracy to 90%. The study highlights the potential of AI in disease prediction and provides insight into the importance of algorithm selection and tuning for optimal performance.
Introduction
From a mathematical perspective, learning can be defined as gaining awareness through study, experience, analysis, or instruction. On closer consideration, however, the process of learning cannot be described by mere words, since it is subject to change and is unique to every individual [1]. In machine learning, pattern recognition serves as the basis for computational learning: data fed in as input is used to extract knowledge or information [2]. The field sits at the crossroads of mathematics, which provides the relevant methods, concepts, and theories; statistics, which specializes in making predictions from the available data; and artificial intelligence, which is nowadays shorthand for any task that a computer can perform on par with or better than a human. Machine learning has become an integral part of our lives [3]. Its algorithms strongly influence us, whether they are used to select a movie or TV series to watch or to guide you through resetting your mixer (chatbots). Many fields are affected by this evolution, including transportation, gaming, environmental protection, security, media, and healthcare, among others [4]. It is estimated that the world will have a shortage of 12.9 million health care workers by 2035, which, if not addressed soon, may have serious implications for the health of billions of people around the globe.
Conclusion
This research focused on building an ML model able to predict a patient's disease from the given symptoms. It consisted of data preparation and augmentation, building three models based on three different classification algorithms, and evaluating their results.
In our case, the model that achieved the best results was the Random Forest model, which was then further tuned and trained to achieve better accuracy. The final model's accuracy was improved by 3 percentage points using the Grid Search CV technique. Further exploration of the dataset and the development of a model based on deep neural networks will be the subject of our future research, in order to determine whether it is possible to achieve even higher accuracy than what was presented in this paper.
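The tuning step described above can be illustrated with a minimal sketch of grid search with k-fold cross-validation. The paper does not give its implementation, dataset, or hyperparameter grid, so everything in the sketch is an assumption made for illustration: the synthetic binary symptom vectors, the class labels "A", "B", and "C", the grid values, and all function names are hypothetical. To keep the example dependency-free, a small k-nearest-neighbours classifier stands in for the Random Forest model; only the search procedure itself is the point.

```python
import itertools
import random


def make_toy_data(n_per_class=30, seed=0):
    """Hypothetical data: noisy binary symptom vectors for three made-up classes."""
    rng = random.Random(seed)
    prototypes = {
        "A": [1, 1, 0, 0, 1, 0],
        "B": [0, 1, 1, 0, 0, 1],
        "C": [0, 0, 0, 1, 1, 1],
    }
    rows = []
    for label, proto in prototypes.items():
        for _ in range(n_per_class):
            # Flip each symptom bit with probability 0.15 to simulate noise.
            rows.append(([bit ^ (rng.random() < 0.15) for bit in proto], label))
    rng.shuffle(rows)
    return [r[0] for r in rows], [r[1] for r in rows]


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_X, train_y, x, n_neighbors, weighted):
    """Stand-in classifier (k-NN with Hamming distance), not a Random Forest."""
    nearest = sorted(
        (sum(a != b for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:n_neighbors]
    votes = {}
    for dist, label in nearest:
        votes[label] = votes.get(label, 0.0) + (1.0 / (1 + dist) if weighted else 1.0)
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(votes), key=votes.get)


def cv_accuracy(X, y, params, k=5):
    """Mean accuracy over k folds for one hyperparameter combination."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], **params) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def grid_search_cv(X, y, param_grid, k=5):
    """Try every combination in param_grid and keep the best mean CV accuracy."""
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


X, y = make_toy_data()
grid = {"n_neighbors": [1, 3, 5], "weighted": [False, True]}  # assumed grid
best_params, best_score = grid_search_cv(X, y, grid)
print("best parameters:", best_params)
print("mean CV accuracy: %.3f" % best_score)
```

Each candidate is scored only on folds held out from its training data, which is why the cross-validated score is a fairer basis for choosing hyperparameters than training accuracy. The numbers printed for this toy data say nothing about the accuracies reported in the paper.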