Classification of Large DNA Methylation Datasets

Persian title (translated): Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers
English title: Classification of Large DNA Methylation Datasets for Identifying Cancer Drivers
Journal/Conference: Big Data Research
Related fields of study: Medicine, Information Technology Engineering
Related specializations: Medical Genetics, Medical Pathology, Medical Immunology, Information Systems Management
Persian keywords (translated): Classification, Machine learning, DNA methylation, Cancer, Disease diagnostic predictive models, Algorithms and techniques to speed up the analysis of big medical data
English keywords: Classification, Machine learning, DNA methylation, Cancer, Disease diagnostic predictive models, Algorithms and techniques to speed up the analysis of big medical data
Article type: Research Article
DOI: https://doi.org/10.1016/j.bdr.2018.02.005
Affiliation: Institute of Systems Analysis and Computer Science, National Research Council, Via dei Taurini 19, 00185 Rome, Italy
Pages (English article): 8
Publisher: Elsevier
Publication type: Journal
Indexing: ISI
Publication year: 2018
Impact factor: 7.184 (2017)
H-index: 12 (2019)
SJR: 0.757 (2017)
ISSN: 2214-5796
Quartile: Q1 (2017)
English article format: PDF
Translation status: Not translated
English article price: Free
Base article: No
Product code: E11084
Table of contents (English)

Abstract

1- Introduction

2- Methods

3- Results

4- Discussion

5- Conclusion

References

Excerpt from the article (English)

Abstract

DNA methylation is a well-studied epigenetic modification that is crucial for regulating the functioning of the genome. Its alterations play an important role in tumorigenesis and tumor suppression. Thus, studying DNA methylation data may help biomarker discovery in cancer. Public DNA methylation data are becoming abundant, and the genome contains a high number of methylated sites (features). It is therefore important to have a method for efficiently processing such large datasets. Relying on big data technologies, we propose BIGBIOCL, an algorithm that can apply supervised classification methods to datasets with hundreds of thousands of features. It is designed to extract alternative and equivalent classification models through the iterative deletion of selected features. We run experiments on DNA methylation datasets extracted from The Cancer Genome Atlas, focusing on three tumor types: breast, kidney, and thyroid carcinomas. We perform classifications that extract several methylated sites and their associated genes with accurate performance (accuracy >97%). Results suggest that BIGBIOCL can perform hundreds of classification iterations on hundreds of thousands of features in a few hours. Moreover, we compare the performance of our method with other state-of-the-art classifiers and with a widespread DNA methylation analysis method based on network analysis. Finally, we are able to efficiently compute multiple alternative classification models and extract a set of candidate genes from large DNA methylation datasets. These genes can be further investigated to determine their active role in cancer. BIGBIOCL, the results of the experiments, and a guide to carrying out new experiments are freely available on GitHub at https://github.com/fcproj/BIGBIOCL.
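The iterative-deletion idea described in the abstract can be illustrated with a small, self-contained sketch. The loop trains a classifier and records the feature it selects. It then deletes that feature and retrains, so each round yields an alternative model built on different features. The loop stops when held-out accuracy falls below a threshold.

This is not BIGBIOCL's implementation, which runs on big data infrastructure. The decision-stump classifier, the `min_accuracy` stopping rule, the toy data, and every function name below are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical model type: (feature_index, threshold, polarity, training_accuracy).
Stump = Tuple[int, float, int, float]


def predict(row: Sequence[float], feature: int, threshold: float, polarity: int) -> int:
    """Predict class 1 when value > threshold (polarity 1) or value <= threshold (polarity -1)."""
    above = row[feature] > threshold
    return 1 if above == (polarity == 1) else 0


def train_stump(X, y, features) -> Stump:
    """Exhaustively pick the single feature/threshold/polarity with the best training accuracy."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                correct = sum(predict(row, f, t, polarity) == label for row, label in zip(X, y))
                acc = correct / len(y)
                if best is None or acc > best[3]:  # strict '>' keeps the first best on ties
                    best = (f, t, polarity, acc)
    return best


def heldout_accuracy(model: Stump, X, y) -> float:
    """Fraction of held-out samples the stump classifies correctly."""
    f, t, polarity, _ = model
    return sum(predict(row, f, t, polarity) == label for row, label in zip(X, y)) / len(y)


def iterative_deletion(X_train, y_train, X_test, y_test, min_accuracy=0.9, max_iter=10):
    """Train, record the selected feature, delete it, retrain; stop when accuracy drops."""
    remaining = set(range(len(X_train[0])))
    models: List[Tuple[int, float]] = []
    for _ in range(max_iter):
        if not remaining:
            break
        model = train_stump(X_train, y_train, sorted(remaining))
        acc = heldout_accuracy(model, X_test, y_test)
        if acc < min_accuracy:
            break  # no equally accurate alternative model is left
        models.append((model[0], acc))
        remaining.discard(model[0])  # force the next model to use other features
    return models


# Toy data (invented): features 0 and 2 both separate the classes; 1 and 3 are noise.
X_train = [[0.9, 0.5, 0.8, 0.1], [0.8, 0.4, 0.9, 0.7], [0.1, 0.5, 0.2, 0.3], [0.2, 0.6, 0.1, 0.6]]
y_train = [1, 1, 0, 0]
X_test = [[0.85, 0.6, 0.7, 0.2], [0.15, 0.4, 0.15, 0.8]]
y_test = [1, 0]

if __name__ == "__main__":
    print(iterative_deletion(X_train, y_train, X_test, y_test))  # [(0, 1.0), (2, 1.0)]
```

On the toy data the loop finds two equivalent single-feature models, one on feature 0 and one on feature 2. After both are deleted, no remaining feature reaches the accuracy threshold. In a real setting, the stump would be replaced by a full classifier, and each model might use several features. In that case, all of a model's features would be deleted before retraining.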

Introduction

A tumor, or neoplasm, is a mass of tissue originating from the abnormal and uncontrolled division of eukaryotic cells. When tumor cells invade and destroy surrounding tissues, the tumor is malignant and is called cancer. According to the World Health Organization (http://www.who.int/mediacentre/factsheets/fs297/en/), nearly one in six deaths is caused by cancer. Because cancer is one of the leading causes of mortality, research to fully understand its mechanisms and to discover new ways to prevent and treat the disease is of fundamental importance for humanity. The transformation of healthy cells into tumor cells is a complex process. It results from the interaction of genetic factors with external agents such as viruses, chemicals, and physical mutagens. In this context, the importance of DNA methylation in carcinogenesis is widely recognized [5,11,14,15,40].

DNA methylation is one of the most intensely studied epigenetic modifications in mammals and involves reversible covalent alterations of DNA nucleotides [6]. In particular, DNA methyltransferase enzymes catalyze the conversion of cytosine (typically at a CpG site) to 5-methylcytosine by adding a methyl group (CH3) to the cytosine residue. In normal cells, this conversion changes the interaction properties of the DNA, ensuring the proper regulation of gene expression and gene silencing [4]. The haploid human genome contains about 28 million CpG sites, each of which can be in a methylated or unmethylated state [28]. It is well known that tumor-suppressor genes may be inactivated as a consequence of hypermethylation within their gene regions. A wide range of cancer-related genes can be silenced by DNA methylation in different types of tumors.
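The following sketch makes the notion of a CpG site concrete. A CpG site is a cytosine immediately followed by a guanine on the same strand, and it is the usual target of DNA methyltransferases. The hypothetical helpers below scan one strand for CpG sites and label each site as methylated or unmethylated. The toy sequence and the set of methylated positions are invented for illustration. Real methylation data are measured experimentally and are not derived from sequence alone.

```python
from typing import Dict, Iterable, List


def cpg_positions(seq: str) -> List[int]:
    """0-based index of the C in every CpG dinucleotide (C immediately followed by G) on one strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def methylation_states(seq: str, methylated: Iterable[int]) -> Dict[int, str]:
    """Label each CpG site using a hypothetical set of methylated positions; non-CpG positions are ignored."""
    marked = set(methylated)
    return {i: ("methylated" if i in marked else "unmethylated") for i in cpg_positions(seq)}


if __name__ == "__main__":
    seq = "ACGTTCGACG"
    print(cpg_positions(seq))           # [1, 5, 8]
    print(methylation_states(seq, {5}))  # {1: 'unmethylated', 5: 'methylated', 8: 'unmethylated'}
```

In a classification setting like the one this article addresses, each CpG site becomes one feature. This is why genome-scale methylation datasets reach hundreds of thousands of features.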