Integrated Data Catalogue
The D4Science Data Infrastructure offers services for seamless access to, and management of, a wide spectrum of data from multiple data providers and information systems, including ecological and biological data, geospatial data, statistical data and semi-structured data. These services can be exploited both via web-based graphical user interfaces and via web-based protocols for programmatic access, e.g. OAI-PMH, CSW, SDMX, OGC WFS and OGC WCS.
Together, these services provide a ready-to-use data catalogue that can be exploited for complex, interdisciplinary analyses.
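Of the protocols listed above, OAI-PMH is the simplest to illustrate: every request is an HTTP GET carrying a `verb` plus verb-specific arguments. The following is a minimal sketch of a request-URL builder. The helper `oai_pmh_url` and the endpoint `https://example.org/oai` are hypothetical placeholders, not actual D4Science names. Only the verbs and argument rules are taken from the OAI-PMH 2.0 specification.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}
# Verbs whose result lists may be paged with a resumptionToken.
RESUMABLE = {"ListIdentifiers", "ListRecords", "ListSets"}


def oai_pmh_url(base_url: str, verb: str,
                params: Optional[Mapping[str, str]] = None) -> str:
    """Build an OAI-PMH GET request URL (hypothetical helper).

    A mapping is used instead of keyword arguments because the
    protocol's `from` argument is a reserved word in Python.
    """
    args = dict(params or {})
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" in args:
        # resumptionToken is exclusive: only the verb may accompany it.
        if verb not in RESUMABLE or len(args) > 1:
            raise ValueError("resumptionToken must be the only argument of a List verb")
    elif verb in {"ListRecords", "ListIdentifiers"} and "metadataPrefix" not in args:
        raise ValueError(f"{verb} requires metadataPrefix")
    elif verb == "GetRecord" and not all(k in args for k in ("identifier", "metadataPrefix")):
        raise ValueError("GetRecord requires identifier and metadataPrefix")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


# Selective harvesting of Dublin Core records changed since 1 January 2013.
print(oai_pmh_url("https://example.org/oai", "ListRecords",
                  {"metadataPrefix": "oai_dc", "from": "2013-01-01"}))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2013-01-01
```

Validating arguments before sending keeps malformed requests from reaching the server, which would otherwise answer with a `badArgument` error.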
Biological and Ecological Data
For Biological and Ecological Data, the infrastructure either hosts or provides seamless access to occurrence records and nomenclature data from the following data sources:
- Catalogue of Life: this data source offers an integrated checklist and a taxonomic hierarchy of more than 1.3 million species of animals, plants, fungi and micro-organisms www.catalogueoflife.org;
- FAO List of Species for Fishery Statistics Purposes (ASFIS): this list includes 12,000+ species relevant to fisheries and aquaculture www.fao.org/fishery/collection/asfis/en;
- Global Biodiversity Information Facility (GBIF): this data source offers more than 430 million species occurrence records and more than 14,000 datasets aggregated from 580+ publishers www.gbif.org;
- FishBase: this data source offers access to 32,700 species, 302,900 common names, 53,600 pictures and 49,700 references, compiled through the efforts of thousands of collaborators;
- Interim Register of Marine and Nonmarine Genera (IRMNG): this data source offers access to over 465,000 genus names and 1.6 million species names www.obis.org.au/irmng;
- Integrated Taxonomic Information System (ITIS): this data source offers authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world www.itis.gov;
- National Center for Biotechnology Information (NCBI) Taxonomy: this data source offers a curated classification and nomenclature for all of the organisms in the public sequence databases, currently representing about 10% of the described species of life on the planet www.ncbi.nlm.nih.gov/taxonomy;
- Ocean Biogeographic Information System (OBIS): this data source offers more than 37 million species occurrence records and 1,300+ datasets www.iobis.org;
- SeaLifeBase: this data source offers access to 126,000 species, 27,300 common names, 11,900 pictures and 18,200 references, compiled through the efforts of hundreds of collaborators;
- World Register of Marine Species (WoRMS): this data source offers names for more than 200,000 marine species, with 300,000+ species names including synonyms, and 400,000+ taxa overall www.marinespecies.org;
- World Register of Deep-Sea Species (WoRDSS): this data source offers names of deep-sea species, derived from WoRMS www.marinespecies.org/deepsea.
Spatial Data
For Spatial Data, the infrastructure either hosts or provides seamless access to more than 300 chemical and physical variables in 2D and 3D space, with global spatial coverage and a temporal coverage of several decades. The integrated data providers include the following:
- FAO GeoNetwork: This data source exposes spatial data maintained by FAO and its partners geonetwork.fao.org;
- World Ocean Atlas: This data source gives access to a number of environmental variables. In particular, iMarine focuses on indicators including Apparent Oxygen Utilisation, Dissolved Oxygen, Nitrate, Oxygen Saturation, Phosphate, Sea Water Salinity, Sea Water Temperature and Silicate www.nodc.noaa.gov/OC5/WOA09/pr_woa09.html;
- Marine Regions: This data source gives access to a standard list of georeferenced marine place names and areas, including Exclusive Economic Zones (EEZs) www.marineregions.org;
- MyOcean: This data source gives access to a number of environmental variables. In particular, the integrated variables include ice concentration, ice thickness, ice velocity, mass concentration of chlorophyll in sea water, meridional velocity, mole concentration of dissolved oxygen in sea water, mole concentration of nitrate in sea water, mole concentration of phosphate in sea water, mole concentration of phytoplankton expressed as carbon in sea water, net primary production of carbon, salinity, sea surface height, temperature, zonal velocity, wind speed and wind stress www.myocean.eu.
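The introduction lists OGC WFS and WCS among the programmatic access protocols; spatial layers like those above are commonly requested with the OGC key-value-pair (KVP) encoding. The following is a minimal sketch that builds a WFS 2.0.0 `GetFeature` request for vector features and a WCS 2.0.1 `GetCoverage` request for a gridded variable. The endpoints, the layer name `ns:layer`, the coverage id `sea_water_temperature` and the axis labels `Lat`/`Long` are assumptions for illustration; real axis labels depend on the coverage's coordinate reference system and should be read from its `DescribeCoverage` response.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode


def ogc_kvp_url(base_url: str, service: str, version: str, request: str,
                params: Sequence[Tuple[str, str]]) -> str:
    """Prefix the three keys shared by OGC KVP requests, then encode.

    A list of pairs (not a dict) is used so that a key such as the
    WCS `subset` parameter can be repeated.
    """
    pairs: List[Tuple[str, str]] = [
        ("service", service), ("version", version), ("request", request),
    ]
    pairs.extend(params)
    return f"{base_url}?{urlencode(pairs)}"


def wfs_get_feature(base_url: str, type_names: str,
                    count: Optional[int] = None) -> str:
    # WFS 2.0 uses typeNames/count (WFS 1.x used typeName/maxFeatures).
    params = [("typeNames", type_names)]
    if count is not None:
        params.append(("count", str(count)))
    return ogc_kvp_url(base_url, "WFS", "2.0.0", "GetFeature", params)


def wcs_get_coverage(base_url: str, coverage_id: str,
                     lat: Tuple[float, float], lon: Tuple[float, float],
                     fmt: str = "image/tiff") -> str:
    # WCS 2.0 trims a coverage with one subset=Axis(low,high) per axis.
    # The axis labels "Lat" and "Long" are an assumption.
    params = [
        ("coverageId", coverage_id),
        ("subset", f"Lat({lat[0]},{lat[1]})"),
        ("subset", f"Long({lon[0]},{lon[1]})"),
        ("format", fmt),
    ]
    return ogc_kvp_url(base_url, "WCS", "2.0.1", "GetCoverage", params)


print(wfs_get_feature("https://example.org/wfs", "ns:layer", count=10))
print(wcs_get_coverage("https://example.org/wcs", "sea_water_temperature",
                       lat=(-10, 10), lon=(40, 60)))
```

Reserved characters such as `:`, `(` and `,` are percent-encoded by `urlencode`; conforming servers decode them before interpreting the request.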
Statistical Data
For Statistical Data, the infrastructure offers seamless access to the following data sources:
- IRD UMR EME/Observatoire Thonier SDMX Registry and Repository: This data source exposes (a) the Sardara database, which contains tuna catch data from several countries aggregated by CWP statistical squares (1°x1° or 5°x5°), and (b) the ObServe database, which contains catches of tuna and bycatch species recorded by scientific observers on board French industrial purse seiners.
- SDMX codelists, either accessed directly from the FAO Registry or uploaded manually through the facility developed in the context of ICIS www.fao.org/figis/sdmx.
- StatBase (Economic Commission for Africa): this data source collects and organises data about several sectors including Agriculture, Education, Energy, Environment, Industry, Population. Data are collected from several data providers including African Development Bank, Central Bank of Central African States, Freedom House, International Energy Agency, OECD, United Nations Industrial Development Organization.
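SDMX registries and repositories such as those above can be queried programmatically. Assuming an endpoint that implements the SDMX 2.1 RESTful web-service interface, a data query names a dataflow and a series key with one value per dimension. The following is a minimal sketch that composes such URLs. The base URL, the agency `IRD`, the dataflow `TUNA_CATCHES`, the codelist `CL_SPECIES` and the dimension values are hypothetical placeholders. The actual dimensions and their order come from the dataflow's data structure definition (DSD).

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode


def sdmx_key(dimensions: Sequence[Sequence[str]]) -> str:
    """Series key: one entry per dimension, in the order of the DSD.

    Values for the same dimension are OR-ed with "+"; an empty entry
    is a wildcard for that dimension.
    """
    return ".".join("+".join(values) for values in dimensions)


def sdmx_data_url(base_url: str, agency: str, flow: str, version: str,
                  dimensions: Sequence[Sequence[str]],
                  start_period: Optional[str] = None,
                  end_period: Optional[str] = None) -> str:
    # flowRef is "agencyID,resourceID,version".
    url = f"{base_url}/data/{agency},{flow},{version}/{sdmx_key(dimensions)}"
    query: Dict[str, str] = {}
    if start_period:
        query["startPeriod"] = start_period
    if end_period:
        query["endPeriod"] = end_period
    return f"{url}?{urlencode(query)}" if query else url


def sdmx_codelist_url(base_url: str, agency: str, codelist_id: str,
                      version: str = "latest") -> str:
    # Structure query for a single codelist; "latest" selects the newest version.
    return f"{base_url}/codelist/{agency}/{codelist_id}/{version}"


# Hypothetical: annual data for two species codes, all values of a third
# dimension (wildcard), from 2000 onwards.
print(sdmx_data_url("https://example.org/sdmx/rest", "IRD", "TUNA_CATCHES",
                    "1.0", [["A"], ["YFT", "SKJ"], []], start_period="2000"))
# https://example.org/sdmx/rest/data/IRD,TUNA_CATCHES,1.0/A.YFT+SKJ.?startPeriod=2000
```

The trailing dot in the key is deliberate: it leaves the last dimension empty, which SDMX interprets as "any value".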
Knowledge Bases and Repositories
The infrastructure also offers seamless access to the following knowledge bases and repositories:
- Aquatic Commons: this data source offers access to thematic material covering natural marine, estuarine/brackish and fresh water environments aquaticcommons.org;
- Biodiversity Heritage Library: this data source offers access to legacy literature of biodiversity held by a consortium of natural history and botanical libraries www.biodiversitylibrary.org;
- Bioline International: this data source offers access to quality open-access research journals published in developing countries www.bioline.org.br;
- Central and Eastern European Marine Repository (CEEMar): this data source offers material covering marine, brackish and fresh water environments www.ceemar.org/dspace;
- DataCite: this data source gives access to the DataCite service, whose mission is to provide access to research data www.datacite.org;
- DBpedia: this knowledge base is extracted from Wikipedia. It describes over 4 million things, including persons, places, creative works, organisations, species and diseases dbpedia.org/About;
- DRS at the National Institute of Oceanography: this data source offers institutional publications, including journal articles and technical reports drs.nio.org/drs;
- Dryad: this data source gives access to the Dryad repository, whose mission is to make the data underlying scientific publications available datadryad.org;
- FactForge: this knowledge base integrates a number of datasets, including DBpedia, WordNet, GeoNames and Freebase factforge.net;
- FAO FishFinder Factsheets: this data source gives access to the Aquatic Species Fact Sheets developed by the FAO FishFinder programme www.fao.org/fishery/fishfinder;
- FAO FLOD: a semantic knowledge base hosted by FAO, containing a dense network of relationships among the major entities of the fishery domain, including marine species, water areas, land areas and exclusive economic zones www.fao.org/figis/flod;
- iMarine TLO Warehouse: this warehouse integrates information from FishBase, WoRMS, ECOSCOPE, FLOD and DBpedia using a common top-level ontology developed for the marine domain (MarineTLO). It currently contains approximately 3 million triples about more than 40,000 entities, including marine species, ecosystems, water areas and vessels www.ics.forth.gr/isl/MarineTLO;
- Nature: this data source offers access to the articles published on nature.com;
- OceanDocs: this data source offers research and publication materials in marine science, aggregating content from 256 repositories www.oceandocs.net;
- OpenAIRE: this data source gives access to the publications aggregated by the European-funded OpenAIRE project www.openaire.eu;
- PANGAEA: this data source offers georeferenced data from Earth system research via OAI-PMH. Long-term availability of its content is guaranteed through a commitment of the operating institutions. It aggregates content from 475 repositories www.pangaea.de;
- Pensoft Journals: this data source gives access to a number of open-access journals. In particular, iMarine focuses on BioRisk, Comparative Cytogenetics, International Journal of Myriapodology, Journal of Hymenoptera Research, MycoKeys, Nature Conservation, NeoBiota, PhytoKeys, Subterranean Biology and ZooKeys;
- SmartFish Chimaera: this knowledge base offers a unified, integrated view of three marine fisheries information sources: FIRMS, an international knowledge base including fisheries and resources of the West Indian Ocean; StatBase, a statistical database containing statistics provided by West Indian Ocean countries; and WIOFish, a regional knowledge base on West Indian Ocean fisheries;
- WHOAS: this data source offers the output of the Woods Hole scientific community, including articles and datasets www.mblwhoilibrary.org/services/whoas-repository-services;
- YAGO2: this knowledge base extends the YAGO knowledge base by anchoring entities, facts and events in time and space. It is built from Wikipedia, GeoNames and WordNet and contains more than 440 million facts about 9.8 million entities www.mpi-inf.mpg.de/yago-naga/yago.
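PANGAEA, for instance, exposes its content via OAI-PMH. Under that protocol, large result sets arrive in pages chained by a `resumptionToken`. The following is a minimal sketch of such a harvesting loop. It parses `ListRecords` responses with Python's standard `xml.etree.ElementTree` and follows the tokens until an empty one marks the last page. The `fetch` callable and the endpoint are assumptions, and any HTTP client could supply them. Only record headers are extracted, because metadata parsing depends on the chosen `metadataPrefix`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, datestamp) pairs and the next resumptionToken."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        # noRecordsMatch is an ordinary empty result, not a failure.
        if error.get("code") == "noRecordsMatch":
            return [], None
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    headers = []
    for header in root.iterfind("oai:ListRecords/oai:record/oai:header", NS):
        if header.get("status") == "deleted":
            continue  # skip tombstones of deleted records
        headers.append((
            header.findtext("oai:identifier", default="", namespaces=NS),
            header.findtext("oai:datestamp", default="", namespaces=NS),
        ))
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=NS)
    # An empty or absent token marks the last page of the list.
    return headers, token.strip() or None


def harvest(fetch: Callable[[str], str], base_url: str,
            metadata_prefix: str = "oai_dc") -> Iterator[Tuple[str, str]]:
    """Yield record headers across all pages of a ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        headers, token = parse_list_records(fetch(f"{base_url}?{urlencode(params)}"))
        yield from headers
        if token is None:
            return
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Keeping `fetch` injectable separates transport from protocol logic. A real harvester could pass something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`, and should also honour the HTTP 503 `Retry-After` flow control that OAI-PMH servers may use.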