Metadata management for scientific databases

Persian title of the article: مدیریت فراداده برای پایگاه داده های علمی
English title of the article: Metadata management for scientific databases
Journal/Conference: Information Systems
Related fields of study: Information Technology Engineering, Computer Engineering
Related specializations: Information Systems Management
Persian keywords: مدیریت فراداده، بانکهای اطلاعاتی علمی، بهینه سازی پرس و جو
English keywords: Metadata management, Scientific databases, Query optimization
Type of article: Research Article
Indexing: Scopus - Master Journals List - JCR
Digital identifier (DOI): https://doi.org/10.1016/j.is.2018.10.002
University: Politecnico di Milano, Italy
Pages of the English article: 20
Publisher: Elsevier
Presentation type: Journal
Article category: ISI
Year of publication: 2019
Impact factor: 3.176 in 2018
H-index: 76 in 2019
SJR: 0.779 in 2018
ISSN: 0306-4379
Quartile: Q1 in 2018
Format of the English article: PDF
Translation status: Not translated
Price of the English article: Free
Is this a base article: No
Does this article have a conceptual model: No
Does this article have a questionnaire: No
Does this article have variables: No
Product code: E13234
References: Cited in the text and listed at the end of the article
Table of contents (English)

Abstract

1- Introduction and motivation

2- Scientific data model

3- Scientific query language

4- Optimization of ScQL queries

5- Applicability of the approach

6- Related work

7- Conclusions

References

Excerpt from the article (English)

Abstract

Most scientific databases consist of datasets (or sources) which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata, describing the process that leads to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results are recovered after executing the query, and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing.
In this paper, we present ScQL, a new algebraic relational language whose operations apply to objects consisting of data–metadata pairs, preserving this one-to-one correspondence throughout the computation. We formally define each operation and we describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata, so that only those input samples that contribute to the result samples are loaded into the execution environment.
In ScQL, metadata have the same relevance as data, and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of ScQL use across several application domains, and we demonstrate the effectiveness of the meta-first optimization.
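The meta-first idea described in the abstract can be illustrated with a small Python sketch. This is not ScQL syntax and not the authors' implementation, neither of which appears in this excerpt; the names `Sample`, `make_loader`, `select_meta_first`, the metadata attributes, and the `provenance.source_sample` key are all hypothetical. The sketch only assumes that each sample is a data–metadata pair whose data can be read lazily, so that the metadata predicate is evaluated before any data is loaded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]
Row = Dict[str, object]


@dataclass
class Sample:
    """A data-metadata pair; the data part is read only when loader() is called."""
    sample_id: str
    metadata: Metadata
    loader: Callable[[], List[Row]]


loads: List[str] = []  # records which samples actually had their data loaded


def make_loader(sample_id: str, rows: List[Row]) -> Callable[[], List[Row]]:
    """Hypothetical stand-in for reading a sample file from disk."""
    def load() -> List[Row]:
        loads.append(sample_id)
        return rows
    return load


def select_meta_first(
    samples: List[Sample],
    meta_pred: Callable[[Metadata], bool],
    data_pred: Callable[[Row], bool],
) -> List[Tuple[List[Row], Metadata]]:
    """Evaluate the metadata predicate first; load data only for matching samples."""
    results = []
    for s in samples:
        if not meta_pred(s.metadata):
            continue  # the data of this sample is never loaded
        rows = [r for r in s.loader() if data_pred(r)]
        out_meta = dict(s.metadata)
        # Keep the result paired with metadata about its input sample (assumed key name).
        out_meta["provenance.source_sample"] = s.sample_id
        results.append((rows, out_meta))
    return results


samples = [
    Sample("s1", {"cell": "HeLa"},
           make_loader("s1", [{"chrom": "chr1", "score": 0.9},
                              {"chrom": "chr2", "score": 0.2}])),
    Sample("s2", {"cell": "K562"},
           make_loader("s2", [{"chrom": "chr1", "score": 0.8}])),
]

result = select_meta_first(
    samples,
    meta_pred=lambda m: m["cell"] == "HeLa",
    data_pred=lambda r: r["score"] > 0.5,
)
print(loads)
print(result)
```

In this toy run only the sample whose metadata satisfy the predicate is ever loaded, and each result remains a data–metadata pair that carries a reference to its input sample.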

Introduction and motivation

Scientific databases are organized in very different ways. In many scientific fields, such as biology and astronomy, big consortia produce large, well-organized data repositories for public use. In other contexts, such as public administrations, data are open but much less organized and much more dispersed. Other big data players, such as Internet companies or mobile phone operators, produce information mostly for internal use, but often support third parties in research studies (e.g., about consumers’ interests) by providing them with services for data retrieval.
We abstract a scientific data source as a container of several datasets, each of which in turn consists of thousands of samples, one for each experimental condition, often stored as files rather than within a database; typically, samples are described by metadata, i.e., descriptive information about the content and production process of each sample. In meteorology, typical metadata describe “the WDM station, the sources of meteorological data, and the period of record for which the data is available”; the samples then describe millions of records registered at the station. In genomics, typical metadata describe “the technology used for DNA sequencing, the process of DNA preparation, the genotype and phenotype of the donor”; the samples then describe millions of genomic regions collected during the experiment. Metadata support the selection of the relevant experimental data by means of user interfaces (e.g., see genomic repositories such as ENCODE (the Encyclopedia of DNA Elements [1]) or TCGA (The Cancer Genome Atlas [2])). When a source exposes APIs or web interfaces, metadata associated with each sample (such as Twitter’s hashtags or timestamps) support data retrieval.