Abstract
1. Introduction
2. Background
3. Methodology
4. Implementation
5. Further development
6. Discussion
7. Conclusion
Acknowledgements
References
Abstract
Virtual Research Environments (VREs), also known as science gateways or virtual laboratories, assist researchers in data science by integrating tools for data discovery, data retrieval, workflow management and researcher collaboration, often coupled with a specific computing infrastructure. Recently, the push for better open data science has led to the creation of a variety of dedicated research infrastructures (RIs) that gather data and provide services to different research communities, all of which can be used independently of any specific VRE. There is therefore a need for generic VREs that can be coupled with the resources of many different RIs simultaneously and easily customised to the needs of specific communities. However, the resource metadata produced by these RIs rarely adhere to any one standard or vocabulary, making it difficult to search and discover resources independently of their providers without some translation into a common framework. Cross-RI search can be expedited by using mapping services that harvest RI-published metadata to build unified resource catalogues, but the development and operation of such services pose a number of challenges. In this paper, we discuss some of these challenges and look specifically at the VRE4EIC Metadata Portal, which uses X3ML mappings to build a single catalogue for describing data products and other resources provided by multiple RIs. The Metadata Portal was built in accordance with the e-VRE Reference Architecture, a microservice-based architecture for generic modular VREs, and uses the CERIF standard to structure its catalogued metadata. We consider the extent to which it addresses the challenges of cross-RI search, particularly in the environmental and earth science domain, and how it can be further augmented, for example by taking advantage of linked vocabularies to provide more intelligent semantic search across multiple domains of discourse.
Introduction
Virtual Research Environments (VREs) [1], also known as virtual laboratories or science gateways, provide integrated online environments for researchers engaged in data science, typically including tools for activities such as data discovery, data retrieval, researcher collaboration, process scheduling on remote computing resources (such as high-performance compute clusters or the Cloud), and workflow management. VREs can be considered one of three types of science support environment developed to support researchers in data science [2], the other two being research infrastructures (RIs) and e-infrastructure. Whereas RIs focus on providing particular research communities with access to data and to services based on those data, and e-infrastructure focuses on providing the fundamental compute, storage and networking facilities needed to support data science, VREs focus on supporting researchers in actually using the data, services and facilities made available by the other two kinds of infrastructure. Many VREs are coupled with certain e-infrastructures to facilitate process scheduling and storage of user data, often making use of e-infrastructures provided specifically for the research community (via initiatives such as EGI or EUDAT) or public Cloud platforms. Data are brought into the dedicated infrastructure, and are then explored and manipulated via a particular data processing platform or scientific workflow management system [3].