Data integration in the agronomic domain : national and international data discovery system

Authors
  • Philippe, Florian
  • Venkatesan, Aravind
  • El Hassouni, Nordine
  • Pommier, Cyril
  • Ruiz, Manuel
  • Larmande, Pierre
  • Steinbach, Delphine
  • Quesneville, Hadi
  • Krimmel, Erik
  • Flores, Raphaël-Gauthier
Publication Date
Jul 06, 2015
Source
HAL-UPMC
Language
English
License
Unknown

Abstract

Current research in agronomy has produced a vast amount of genomic, genetic and phenomic data. To deal with the Volume, Variety and Velocity of those data, it is necessary first to narrow them down to candidate datasets through data discovery, then to integrate them through semantic web technologies. Data discovery is an approach that makes it easy to search for datasets based on keywords and metadata. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) gives access to several public information systems hosting domain-specific data. It is composed of five bioinformatics platforms: the South Green platform, the LIPM platform, the Roscoff platform ABiMS, the platform for Agroecosystems Arthropods and the URGI platform. The latter plays a key role in several national and international projects such as the Wheat Initiative. These platforms integrate plant genomic, genetic and phenomic data, which they need to expose in data discovery and integration systems. The distributed data discovery system needs an ETL (Extraction, Transformation and Loading) based integration pipeline implemented on each platform. This ETL can either be developed from scratch or be based on existing technologies such as KarmaWeb, Talend or OpenRefine. The pipeline is being developed at the URGI and will be deployed on all IFB plant nodes. The data discovery system is based on Solr (implemented in the Transplant portal, http://www.transplantdb.eu), which uses the Lucene search framework at its core for full-text indexing. Here, we will present the architecture of the data discovery system and the evaluation and comparison of the ETL solutions. This work is funded by the IFB "investment for the future" infrastructure project, IFB_Plant node.
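
The ETL-then-index flow described in the abstract can be illustrated with a minimal sketch. It is not the pipeline developed at the URGI: the record fields, the shared schema and the sample data are all hypothetical, and a toy in-memory inverted index stands in for the Solr/Lucene full-text index.

```python
from collections import defaultdict

# Hypothetical sample records, shaped as a platform's local database might expose them.
RAW_RECORDS = [
    {"id": "acc-1", "name": "Triticum aestivum accession", "kind": "germplasm", "platform": "URGI"},
    {"id": "mk-7", "name": "Oryza sativa SNP marker", "kind": "marker", "platform": "South Green"},
]

def extract(source):
    """Extraction: yield raw records from a source (here, an in-memory list)."""
    yield from source

def transform(record):
    """Transformation: map a platform-specific record onto a shared, assumed schema."""
    return {
        "id": f'{record["platform"]}:{record["id"]}',
        "description": record["name"].strip(),
        "entry_type": record["kind"].lower(),
        "node": record["platform"],
    }

def load(documents):
    """Loading: build a toy inverted index (token -> document ids) as a stand-in for a full-text index."""
    index = defaultdict(set)
    for doc in documents:
        for token in doc["description"].lower().split():
            index[token].add(doc["id"])
    return index

def search(index, keyword):
    """Keyword search: return the sorted ids of documents whose description contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))

index = load(transform(r) for r in extract(RAW_RECORDS))
```

In this sketch, each platform would run its own extract and transform steps, and only the documents in the shared schema would reach the common index, so a single keyword query can return datasets hosted on different platforms.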
