Metadata management for scientific databases

Persian title of the article: مدیریت فراداده برای پایگاه داده های علمی

English title of the article: Metadata management for scientific databases

Journal/Conference: Information Systems

Related fields of study: Information Technology Engineering, Computer Engineering

Related specializations: Information Systems Management

Persian keywords: مدیریت فراداده، بانکهای اطلاعاتی علمی، بهینه سازی پرس و جو

English keywords: Metadata management, Scientific databases, Query optimization

Manuscript type: Research Article

Indexed in: Scopus - Master Journals List - JCR

Digital Object Identifier (DOI): https://doi.org/10.1016/j.is.2018.10.002

University: Politecnico di Milano, Italy

Pages in the English article: 20

Publisher: Elsevier

Publication type: Journal

Article type: ISI

Publication year: 2019

Impact factor: 3.176 in 2018

H-index: 76 in 2019

SJR: 0.779 in 2018

ISSN: 0306-4379

Quartile: Q1 in 2018

English article format: PDF

Translation status: Not translated

Price of the English article: Free

Is this a base article: No

Does this article have a conceptual model: No

Does this article have a questionnaire: No

Does this article have variables: No

Product code: E13234

References: Includes in-text citations and a reference list at the end of the article

Table of contents (English)

Abstract

1- Introduction and motivation

2- Scientific data model

3- Scientific query language

4- Optimization of ScQL queries

5- Applicability of the approach

6- Related work

7- Conclusions

References

Excerpt from the article (English)

Abstract

Most scientific databases consist of datasets (or sources), which in turn include samples (or files) with an identical structure (or schema). In many cases, samples are associated with rich metadata describing the process that leads to building them (e.g., the experimental conditions used during sample generation). Metadata are typically used in scientific computations just for the initial data selection; at most, metadata about query results are recovered after executing the query and associated with its results by post-processing. In this way, a large body of information that could be relevant for interpreting query results goes unused during query processing.
In this paper, we present ScQL, a new algebraic relational language whose operations apply to objects consisting of data-metadata pairs, preserving this one-to-one correspondence throughout the computation. We formally define each operation and describe an optimization, called meta-first, that may significantly reduce the query processing overhead by anticipating the use of metadata, so that only those input samples that contribute to the result samples are loaded into the execution environment.
In ScQL, metadata have the same relevance as data, and contribute to building query results; in this way, the resulting samples are systematically associated with metadata about either the specific input samples involved or about query processing, thereby yielding a new form of metadata provenance. We present many examples of use of ScQL, relative to several application domains, and we demonstrate the effectiveness of the meta-first optimization.
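The meta-first idea described in the abstract can be illustrated with a minimal Python sketch. This is not ScQL syntax and not the authors' implementation; the `Sample` class, the `meta_first_select` function, the lazy `load_data` callback, and the example attributes (`assay`, `cell`, `score`) are all hypothetical names chosen for illustration. The sketch assumes that metadata are small and always available, while the data of a sample are expensive to load.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Sample:
    """A data-metadata pair: metadata are in memory, data are loaded lazily."""
    sample_id: str
    metadata: Dict[str, str]
    load_data: Callable[[], List[dict]]  # hypothetical loader for the sample's records


def meta_first_select(
    samples: List[Sample],
    meta_pred: Callable[[Dict[str, str]], bool],
    data_pred: Callable[[dict], bool],
) -> List[Tuple[str, Dict[str, str], List[dict]]]:
    """Evaluate the metadata predicate first; load data only for matching samples."""
    results = []
    for s in samples:
        if not meta_pred(s.metadata):
            continue  # the data of this sample are never loaded
        rows = [r for r in s.load_data() if data_pred(r)]
        # Keep the metadata attached to the result to preserve the pairing.
        results.append((s.sample_id, s.metadata, rows))
    return results


# Track which samples actually had their data loaded.
loaded: List[str] = []


def make_loader(sample_id: str, rows: List[dict]) -> Callable[[], List[dict]]:
    def _load() -> List[dict]:
        loaded.append(sample_id)
        return rows
    return _load


samples = [
    Sample("s1", {"assay": "ChIP-seq", "cell": "K562"},
           make_loader("s1", [{"score": 5}, {"score": 12}])),
    Sample("s2", {"assay": "RNA-seq", "cell": "K562"},
           make_loader("s2", [{"score": 20}])),
]

result = meta_first_select(
    samples,
    meta_pred=lambda m: m.get("assay") == "ChIP-seq",
    data_pred=lambda r: r["score"] > 10,
)
print(result)
print(loaded)
```

In this toy run only the sample whose metadata satisfy the predicate has its data loaded, and each result keeps its metadata alongside the filtered records, mirroring the one-to-one data-metadata correspondence the abstract describes.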

Introduction and motivation

Scientific databases are organized in very different ways. In many scientific fields, such as biology and astronomy, big consortia produce large, well-organized data repositories for public use. In other contexts, such as public administrations, data are open but much less organized and much more dispersed. Other big data players, such as Internet companies or mobile phone operators, produce information mostly for internal use, but often support third parties in research studies (e.g., about consumers' interests) by providing them with services for data retrieval.
We abstract a scientific data source as a container of several datasets, each of which in turn consists of thousands of samples, one for each experimental condition, often stored as files and not within a database; typically, samples are described by metadata, i.e., descriptive information about the content and production process of each sample. In meteorology, typical metadata describe "the WDM station, the sources of meteorological data, and the period of record for which the data is available"; the samples then describe millions of records registered at the station. In genomics, typical metadata describe "the technology used for DNA sequencing, the process of DNA preparation, the genotype and phenotype of the donor"; the samples then describe millions of genomic regions collected during the experiment.
Metadata support the selection of the relevant experimental data by means of user interfaces (e.g., see genomic repositories such as ENCODE (the Encyclopedia of DNA Elements, [1]) or TCGA (The Cancer Genome Atlas, [2])). When a source exposes APIs or Web interfaces, metadata associated with each sample (such as Twitter's hashtags or timestamps) support data retrieval.
