Excerpt from the article (English)
Early detection is vital for diagnosing most critical illnesses, and the same holds for identifying people suffering from depression. A number of studies have successfully identified depressed individuals from their social media posts. However, an unexpected bias has been observed in these studies, which can arise from various factors such as unequal data distribution. In this paper, the imbalance in participation across age groups and demographics is normalized using the one-shot decision approach. Further, we present an ensemble model combining SVM and KNN with intrinsic explainability, used in conjunction with noisy label correction approaches, which offers an innovative solution to the problem of distinguishing between depression symptoms and suicidal ideation. We achieved a final classification accuracy of 98.05%, with the proposed ensemble model ensuring that the classification is not biased in any manner.
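The abstract does not state how the SVM and KNN outputs are combined. The following is a minimal, hypothetical sketch of one common option, soft voting, in which a squashed linear-SVM margin and the KNN vote fraction are averaged and thresholded at 0.5. The Pegasos-style training loop, the logistic squashing (which does not yield calibrated probabilities), k = 3, the toy 2-D points and every function name are assumptions for illustration, not the authors' implementation; real inputs would be text features extracted from posts.

```python
# Hypothetical sketch: soft-voting ensemble of a linear SVM and KNN (not the paper's code).
import math
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM.

    Labels must be +1/-1. A constant 1.0 feature is appended so the last
    weight acts as a (lightly regularised) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink for L2 regularisation
            if margin < 1.0:  # hinge-loss subgradient step on margin violations
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w

def svm_score(w, query):
    """Signed margin of the query point (bias feature appended)."""
    return sum(wj * xj for wj, xj in zip(w, list(query) + [1.0]))

def sigmoid(z):
    """Numerically stable logistic function used to squash the SVM margin."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def knn_proba(X, y, query, k=3):
    """Fraction of the k nearest training points (Euclidean) labelled +1."""
    nearest = sorted(zip((math.dist(x, query) for x in X), y))[:k]
    return sum(1 for _, label in nearest if label == 1) / k

def ensemble_predict(w, X, y, query, k=3):
    """Average the two scores (soft voting) and threshold at 0.5."""
    p = (sigmoid(svm_score(w, query)) + knn_proba(X, y, query, k)) / 2.0
    return (1 if p >= 0.5 else -1), p

# Toy, made-up 2-D feature vectors: +1 and -1 stand for the two classes.
X = [(2.0, 2.5), (2.5, 1.5), (1.5, 2.0), (3.0, 2.0),
     (-2.0, -2.5), (-2.5, -1.5), (-1.5, -2.0), (-3.0, -2.0)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)

label, prob = ensemble_predict(w, X, y, (2.2, 2.1))
print(label, round(prob, 3))
```

For a library-backed version of the same idea, scikit-learn's `VotingClassifier` with `voting="soft"` can combine `SVC(probability=True)` and `KNeighborsClassifier`; averaging scores rather than hard votes avoids ties, which are otherwise common with only two base models.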
Depression remains one of the most significant challenges facing the world today, and if left untreated, it may lead to suicidal thoughts as well as actual suicide attempts. Accurately diagnosing depression, and determining the point at which it becomes a risk factor for suicidal thoughts or behaviour, is a considerable obstacle at both the individual and the societal level. According to Goldman & Lewis (2008), depression is a hidden mental illness that can affect anyone, including individuals who appear to be doing perfectly well. Moreover, it is one of the most prevalent forms of mental illness in the modern world, a world characterized by rapid and ever-growing reliance on technological innovation. Even at the most basic level, existing approaches fail to meet the general population's collective need for mental health care. It is important to note that no known treatment can completely reverse the effects of this illness. Hence, it is of the utmost importance to identify its causes and address the problem at its root so that it does not worsen significantly over time.
In most instances, such illness is discovered through patients' suicide notes and post-processing of their Electronic Health Records (EHR). However, the amount of information obtainable from these methods is often quite limited. Because social media is widely used and easily accessible today, researchers are analyzing individuals' behaviour on social media to overcome this scarcity of data. An increasing number of studies apply natural language processing (NLP) and machine learning (ML) techniques to understand suicidal ideation expressed on social media (Boettcher, 2021; De Choudhury & De, 2014; Guo, 2011; Kim et al., 2020; Lokala et al., 2022; Low et al., 2020; Slemon et al., 2021; Xu et al., 2020). Reddit is one of the most prominent social media networks: it hosts user-generated content, and users can discuss the difficulties they encounter in their daily lives on the platform. Because Reddit lets users describe their problems anonymously and receive support from within the platform, the chances of obtaining a large amount of reliable data there are also high. Using this data, we can estimate how likely someone currently going through the stages of depression is to have suicidal thoughts or attempt suicide. If early warning information about such individuals is available, a significant number of lives may be saved. However, many studies are less accurate than they could be because their data come mainly from Reddit posts, which may be vulnerable to bias.
Early identification is key to treating most diseases, and the same is true for depression. This paper aimed to identify and evaluate individuals at risk of developing a mental disorder and to look for early warning indicators of suicidal thoughts and behaviours. It presents a reliable classifier that compensates for any inherent bias that may have been introduced during data collection, yielding an objective model. Furthermore, the proposed ensemble model, which combines clustering and classification, produces better results. Label correction with the NMT further tunes this method, and the bias variance is checked and corrected using intrinsic explainability methods. We achieve better classification accuracy, as shown in the results section, and these results can be reported with confidence because the fairness of the identified findings is ensured. As this is not a clinical study, the findings should be used for research purposes only. However, our approach could give medical practitioners an additional tool for diagnosing particular patients. In the future, we plan to improve the categorisation of mental health themes by using multiclass classification.
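The conclusion mentions label correction with "the NMT", but this excerpt neither defines NMT nor describes its procedure. As a generic illustration of what noisy-label correction can look like, the hypothetical sketch below relabels a sample only when a chosen fraction (here all) of its k nearest neighbours in feature space agree on a different label. The function name, k = 3, the agreement threshold and the toy data are assumptions for illustration; this is not the authors' NMT method.

```python
# Hypothetical sketch: kNN-based noisy-label correction (not the paper's NMT method).
import math
from collections import Counter

def correct_labels_knn(X, y, k=3, agreement=1.0):
    """Return a corrected copy of y; the input list is not modified.

    A sample is relabelled only when at least `agreement` (a fraction in (0, 1])
    of its k nearest neighbours share one label that differs from its own.
    Neighbour votes use the original labels, so corrections do not cascade.
    """
    corrected = list(y)
    for i, xi in enumerate(X):
        # k nearest other samples by Euclidean distance (the sample itself is excluded).
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbours)
        top_label, count = votes.most_common(1)[0]
        if top_label != y[i] and count / k >= agreement:
            corrected[i] = top_label
    return corrected

# Made-up toy data: two tight clusters, each containing one deliberately flipped label.
X_toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
         (5.0, 5.0), (5.1, 5.2), (5.2, 4.9), (4.9, 5.1)]
noisy = ["depression", "depression", "suicidal", "depression",
         "suicidal", "suicidal", "suicidal", "depression"]

print(correct_labels_knn(X_toy, noisy))
```

With k = 3 and full agreement required, only the two flipped labels (indices 2 and 7) are changed; lowering `agreement` makes the correction more aggressive at the risk of overwriting genuinely ambiguous samples.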