Abstract
I. INTRODUCTION
II. RELATED WORK
III. METHODOLOGY
IV. EXPERIMENTS AND RESULTS
V. DISCUSSION AND CONCLUSIONS
REFERENCES
Abstract
Bipolar disorder is a mental health disorder that causes mood swings ranging from depression to mania. Clinical diagnosis of bipolar disorder is based on patient interviews and reports obtained from the relatives of the patients. Consequently, the diagnosis depends on the experience of the expert, and there is co-morbidity with other mental disorders. Automated processing in the diagnosis of bipolar disorder can help provide quantitative indicators and allow easier observation of patients over longer periods. In this paper, we create a multimodal decision system for three-level mania classification based on recordings of the patients in the acoustic, linguistic, and visual modalities. The system is evaluated on the Turkish Bipolar Disorder corpus we have recently introduced to the scientific community. A comprehensive analysis of unimodal and multimodal systems, as well as fusion techniques, is performed. Using acoustic, linguistic, and visual features in a multimodal fusion system, we achieve a 64.8% unweighted average recall score, which advances the state-of-the-art performance on this dataset.
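The unweighted average recall (UAR) used as the evaluation metric here is the recall averaged over the three mania classes with equal weight, so the score is not inflated by the majority class and chance level stays at 33.3%. Below is a minimal, illustrative computation (not the paper's evaluation code; the label arrays are invented), assuming scikit-learn is available.

```python
# Illustrative only: unweighted average recall (UAR) over three mania
# levels. The label arrays are invented, not taken from the BD corpus.
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 1, 1, 2, 2]   # 0 = remission, 1 = hypomania, 2 = mania
y_pred = [0, 0, 1, 1, 2, 2, 2]

# average="macro" weights every class equally, which is exactly the UAR.
print(recall_score(y_true, y_pred, average="macro"))  # 0.722...
```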
Introduction
Assessment of mental health disorders from behavioral data using machine learning methods is a recently growing research area, with focused work including depression [1], anxiety disorders [2], and bipolar disorder [3]. Unobtrusive affective assessment makes it possible to observe multimodal responses during structured or semi-structured observation sessions, to derive indicators and deviations from behavior, or to observe subtle changes over time [3], [4]. While fully automated diagnosis requires the integration of a comprehensive set of indicators and a detailed patient history, automatic analysis of behavior can provide clinicians with useful quantitative measurement and monitoring tools [5].
Bipolar disorder (BD) is a mental health condition that causes extreme mood swings from elevated states (mania, hypomania) to a diminished state (depression), as well as mixed episodes, in which depressive and manic symptoms occur together. Its diagnosis is performed through a set of medical examinations administered by a psychiatrist, but may require lengthy observation of the patient, as there is no comprehensive test [6]. There is considerable co-morbidity with other mental disorders, including, but not limited to, anxiety disorders, conduct disorder, and substance use disorder [6]. The disease affects 2% of the population, sub-threshold forms (recurrent hypomanic episodes without major depressive episodes) affect an additional 2%, and together the lifetime prevalence estimate is 4.4% [7]. It is ranked among the top ten diseases by the disability-adjusted life year indicator among young adults [8], and as the 17th leading source of disability among all diseases worldwide [9].
Discussion and Conclusions
In this paper, we investigated mania-level classification (mania, hypomania, remission) of bipolar disorder (BD) patients using the Turkish Audio-Visual BD dataset and proposed a trimodal architecture. We performed a comprehensive analysis of the fusion of modalities for predicting mania levels. The results showed that multimodality improves the classification of bipolar disorder: the acoustic, textual, and visual modalities complement each other, and using all three gives the best performance. A fusion model of just the linguistic and acoustic modalities still performs well while requiring less information, which may be important in cases where a camera is considered too intrusive for the assessment sessions.
The best performing system combines the audio, video, and linguistic modalities using modality-specific weighted score fusion of weighted and unweighted Kernel ELMs, whose decisions are finally fused using majority voting. With this configuration we achieve a 64.8% unweighted average recall (UAR) on the test set, which advances the state-of-the-art on the BD dataset. The unimodal test performance breakdown of the top multimodal systems confirms the robustness of the acoustic eGeMAPS descriptors, which deserves further research in depression studies.
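To make the decision pipeline concrete, the sketch below illustrates the kind of two-stage late fusion described above: within each modality, the class scores of two classifiers are combined with a tuning weight, and the per-modality decisions are then merged by majority voting. The scores, weights, class ordering, and tie-breaking rule are illustrative assumptions, not the paper's trained models or exact procedure.

```python
# Illustrative sketch of the two-stage decision fusion described above:
# per-modality weighted score fusion, then majority voting across modalities.
# All scores and weights are hypothetical placeholders.
import numpy as np

def fuse_within_modality(scores_a, scores_b, alpha=0.5):
    """Weighted combination of two classifiers' class scores for one
    modality; alpha would be tuned on a development set."""
    return alpha * scores_a + (1.0 - alpha) * scores_b

def majority_vote(modality_scores):
    """Each modality votes for its top class (0 = remission, 1 = hypomania,
    2 = mania); ties fall back to the highest summed score."""
    votes = np.array([np.argmax(s) for s in modality_scores])
    counts = np.bincount(votes, minlength=3)
    if np.sum(counts == counts.max()) == 1:
        return int(np.argmax(counts))
    return int(np.argmax(np.sum(modality_scores, axis=0)))

# Hypothetical class-score vectors for one recording.
acoustic   = fuse_within_modality(np.array([0.2, 0.5, 0.3]), np.array([0.1, 0.6, 0.3]))
linguistic = fuse_within_modality(np.array([0.3, 0.4, 0.3]), np.array([0.2, 0.5, 0.3]))
visual     = fuse_within_modality(np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.3, 0.3]))

print(majority_vote([acoustic, linguistic, visual]))  # predicted mania level
```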