Selective Re-computation of Big Data Analytics Tasks

Persian article title: Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
English article title: Selective and Recurring Re-computation of Big Data Analytics Tasks: Insights from a Genomics Case Study
Journal/Conference: Big Data Research
Related fields of study: Management, Information Technology Engineering
Related specializations: Information Technology Management, Information Systems Management, Knowledge Management
Persian keywords: Re-computation, Knowledge decay, Big data analysis, Genomics
English keywords: Re-computation, Knowledge decay, Big data analysis, Genomics
Article type: Research Article
Digital Object Identifier (DOI): https://doi.org/10.1016/j.bdr.2018.06.001
University: School of Computing, Newcastle University, Newcastle upon Tyne, UK
Number of pages (English article): 19
Publisher: Elsevier
Publication venue: Journal
Article indexing: ISI
Year of publication: 2018
Impact Factor: 7.184 in 2017
H-index: 12 in 2019
SJR: 0.757 in 2017
ISSN: 2214-5796
Quartile: Q1 in 2017
English article format: PDF
Translation status: Not translated
Price of English article: Free
Is this a base article: Yes
Product code: E11083
Table of Contents (English)

Abstract
1- Introduction
2- A generic meta-process for selective re-computation
3- Related work
4- Experimental setting and blind re-computation baseline
5- Data differences
6- Differential execution
7- Partial re-execution
8- Identifying the scope of change
9- A blueprint for a generic and automated re-computation framework – challenges
10- Conclusions and future work
References

Excerpt from the Article (English)

Abstract

The value of knowledge assets generated by analytics processes using Data Science techniques tends to decay over time, as a consequence of changes in the elements the process depends on: external data sources, libraries, and system dependencies. For large-scale problems, refreshing those outcomes through greedy re-computation is both expensive and inefficient, as some changes have limited impact. In this paper we address the problem of refreshing past process outcomes selectively, that is, by trying to identify the subset of outcomes that will have been affected by a change, and by only re-executing fragments of the original process. We propose a technical approach to address the selective re-computation problem by combining multiple techniques, and present an extensive experimental study in Genomics, namely variant calling and their clinical interpretation, to show its effectiveness. In this case study, we are able to decrease the number of required re-computations on a cohort of individuals from 495 (blind) down to 71, and to reduce runtime by at least 60% relative to the naïve blind approach, and in some cases by 90%. Starting from this experience, we then propose a blueprint for a generic re-computation meta-process that makes use of process history metadata to make informed decisions about selective re-computations in reaction to a variety of changes in the data.
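To make the selective strategy described in the abstract concrete, the following minimal Python sketch (hypothetical; the names and data structures are not taken from the paper's implementation) re-executes only those prior outcomes whose recorded dependencies intersect a changed resource, instead of blindly refreshing the whole cohort.

from dataclasses import dataclass
from typing import Callable, Dict, Set

@dataclass
class Outcome:
    # A previously computed analytics result and the resources it depends on,
    # e.g. {"ref_genome:GRCh37", "clinvar:2017-06"} (identifiers are illustrative).
    outcome_id: str
    dependencies: Set[str]
    recompute: Callable[[], object]  # pipeline fragment that rebuilds this outcome

def select_affected(outcomes: Dict[str, Outcome], changed: Set[str]) -> Set[str]:
    # Keep only the outcomes whose dependency set intersects the changed resources.
    return {oid for oid, o in outcomes.items() if o.dependencies & changed}

def selective_refresh(outcomes: Dict[str, Outcome], changed: Set[str]) -> Dict[str, object]:
    # Re-execute only the affected fragments; untouched outcomes keep their old values.
    return {oid: outcomes[oid].recompute() for oid in select_affected(outcomes, changed)}

# Usage (hypothetical): a reference-database release changes, so only the outcomes
# that recorded it as a dependency are refreshed.
# refreshed = selective_refresh(all_outcomes, changed={"clinvar:2017-06"})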

Introduction

In Data Science applications, the insights generated by resource-intensive data analytics processes may become outdated as a consequence of changes in any of the elements involved in the process. Changes that cause instability include updates to reference data sources and software libraries, changes to system dependencies, and changes to the structure of the process itself. We address the problem of efficiently restoring the currency of analytics outcomes in the presence of instability. This involves a trade-off between the recurring cost of process update and re-execution in the presence of changes on one side, and the diminishing value of its obsolete outcomes on the other. Addressing the problem therefore requires knowledge of the impact of a change, that is, to which extent the change invalidates the analysis, as well as of the cost involved in upgrading the process and running the analysis again. Additionally, it may be possible to optimise the re-analysis given prior outcomes and detailed knowledge of, and control over, the analysis process.
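The trade-off sketched in this paragraph can be illustrated with a simple, hypothetical decision rule (not part of the paper): re-compute an outcome only when the expected value lost to the change exceeds the cost of upgrading and re-running the analysis.

def should_recompute(estimated_impact: float,
                     outcome_value: float,
                     reexecution_cost: float) -> bool:
    # estimated_impact: estimated fraction of the outcome invalidated by the change (0..1)
    # outcome_value:    value of keeping the outcome current, in the same units as cost
    # reexecution_cost: cost of upgrading the process and re-running the analysis
    expected_value_loss = estimated_impact * outcome_value
    return expected_value_loss > reexecution_cost

# Example (illustrative numbers): a reference-data update invalidates roughly 30% of
# an outcome worth 100 cost units, while re-running costs 20 units, so refreshing pays off.
# should_recompute(0.3, 100.0, 20.0)  -> True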