Data integration in the agronomic domain : national and international data discovery system

  • Philippe, Florian
  • Venkatesan, Aravind
  • El Hassouni, Nordine
  • Pommier, Cyril
  • Ruiz, Manuel
  • Larmande, Pierre
  • Steinbach, Delphine
  • Quesneville, Hadi
  • Krimmel, Erik
  • Flores, Raphaël-Gauthier
Publication Date
Jul 06, 2015

Current research in agronomy has produced a vast amount of genomic, genetic and phenomic data. To deal with the Volume, Variety and Velocity of those data, it is necessary first to narrow them down to candidate datasets through data discovery, then to integrate them through semantic web technologies. Data discovery is an approach that makes it easy to search for datasets based on keywords and metadata. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) gives access to several public information systems hosting domain-specific data. It is composed of five bioinformatics platforms: the South Green platform, the LIPM platform, the Roscoff platform ABiMS, the platform for Agroecosystems Arthropods and the URGI platform. The latter plays a key role in several national and international projects such as the Wheat Initiative. Those platforms integrate a range of plant genomic, genetic and phenomic data, which they need to expose in data discovery and integration systems. The distributed data discovery system needs an ETL (Extraction, Transformation and Loading) based integration pipeline implemented on each platform. This ETL can either be developed from scratch or be based on existing technologies such as KarmaWeb, Talend or Open Refine. The pipeline is being developed at the URGI and will be deployed on all IFB plant nodes. The data discovery system is based on Solr (as implemented in the Transplant portal), which uses the Lucene search framework at its core for full-text indexing. Here, we will present the architecture of the data discovery system and the evaluation and comparison of the ETL solutions. Work funded by the IFB Investment for the Future infrastructure project, IFB_Plant node.
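
To make the three ETL stages concrete, here is a minimal sketch of what a per-platform pipeline written from scratch could look like. It is an illustration only, not the pipeline developed at the URGI: the input fields (`accession`, `label`, `keywords`), the output fields (`id`, `name`, `description`, `source`) and all function names are assumptions, since the abstract does not describe the index schema. The load step only builds the JSON array that a Solr update handler accepts; actually sending it to a server is left out.

```python
import json

# Hypothetical output fields that every indexed document must carry.
REQUIRED_FIELDS = ("id", "name", "source")


def extract(rows):
    """Extract: yield raw records from a platform-specific source.

    Here the source is simply an in-memory list of dicts (an assumption).
    """
    for row in rows:
        yield row


def transform(record, source):
    """Transform: map a raw record onto a flat, searchable document."""
    doc = {
        # Prefix the identifier with the platform name to keep it unique
        # across the distributed nodes.
        "id": f"{source}:{record['accession']}",
        "name": record["label"].strip(),
        "description": " ".join(record.get("keywords", [])),
        "source": source,
    }
    missing = [field for field in REQUIRED_FIELDS if not doc[field]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return doc


def load(docs):
    """Load: serialize the documents as a JSON array ready for indexing."""
    return json.dumps(list(docs), sort_keys=True)


def run_pipeline(rows, source):
    """Chain the three stages for one platform."""
    return load(transform(record, source) for record in extract(rows))
```

Under these assumptions, calling `run_pipeline(rows, "URGI")` on a list of raw records returns a JSON payload in which each document carries a platform-prefixed identifier, so that documents from different nodes can share one index without colliding.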
