Ecosystem-Workbench

The BlueCloud2026 Ecosystem Workbench aims to improve the availability, quality, and interoperability of large collections of plankton observations and their associated extrapolated biogeographies. It provides a standardized modelling framework for predicting biogeographic patterns across space and time, while integrating the diversity of emerging biological sampling methods used to diagnose Biological Essential Ocean Variables (EOVs).
At the core of this workbench is CEPHALOPOD (Comprehensible Ensemble Pipeline for Habitat modelling of Large Ocean Plankton Observation Datasets; Schickele et al., 2025), a highly automated R-based modelling framework designed to handle complex, sparse, and biased marine observation data and transform them into continuous estimates of species distributions. CEPHALOPOD integrates state-of-the-art data processing, statistical approaches, and machine learning techniques within a standardized workflow that ensures reproducibility and comparability across data types. The pipeline directly accesses biological data from initiatives such as AtlantECO, OBIS, and GBIF, which in turn integrate biological outputs from platforms such as Ecotaxa and ELIXIR (MGnify).
CEPHALOPOD includes harmonized steps for data pre-processing, model fitting, ensemble prediction, and quality control. These steps enable consistent comparison of spatial distribution outputs derived from heterogeneous sources such as presence-only records, continuous measurements (e.g., abundance or biomass), and compositional data (e.g., metagenomic read proportions). Its scalability allows the simultaneous modelling of multiple taxa, making it particularly well suited for large-scale marine biodiversity assessments. CEPHALOPOD also produces automated, user-friendly diagnostic reports (including PDF outputs) that facilitate interpretation of model performance, uncertainty, and reliability, even for non-specialist users.
A fully documented notebook is provided in the /public folder, containing all code, explanations, and step-by-step guidance required to run the Ecosystem Workbench independently for user-specific analyses. The workbench can be executed either interactively in the RStudio virtual environment or automatically via the CCP (Cloud Computing Platform). Model outputs can be explored with the visualization application available through the Cephaloview app tab.
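As a rough illustration of what the ensemble prediction and quality control steps described above involve, the sketch below combines predictions from several models, drops models that fall below a validation threshold, and reports the ensemble mean together with the between-model spread as a simple uncertainty measure. It is written in Python for brevity and does not reproduce CEPHALOPOD's R code or API: the function name `ensemble_predict`, the model labels, the `min_score` threshold, and all numeric values are assumptions made up for this example.

```python
from statistics import mean, pstdev


def ensemble_predict(predictions, scores, min_score=0.5):
    """Combine per-model predictions into an ensemble mean and spread.

    predictions: dict mapping model name -> list of predicted values,
                 one value per grid cell (same grid for every model).
    scores:      dict mapping model name -> validation score (higher is better).
    min_score:   hypothetical quality-control threshold; models scoring
                 below it are excluded from the ensemble.
    """
    # Quality control: keep only the models that pass the threshold.
    kept = [name for name in predictions if scores[name] >= min_score]
    if not kept:
        raise ValueError("no model passed quality control")

    n_cells = len(predictions[kept[0]])
    if any(len(predictions[name]) != n_cells for name in kept):
        raise ValueError("all models must predict on the same grid")

    ens_mean, ens_sd = [], []
    for cell in range(n_cells):
        values = [predictions[name][cell] for name in kept]
        ens_mean.append(mean(values))   # ensemble estimate for this cell
        ens_sd.append(pstdev(values))   # between-model spread (uncertainty)
    return kept, ens_mean, ens_sd


# Toy values, invented for illustration only (three grid cells).
predictions = {
    "glm": [0.2, 0.6, 0.9],
    "gam": [0.4, 0.6, 0.7],
    "rf": [0.9, 0.1, 0.2],
}
scores = {"glm": 0.8, "gam": 0.7, "rf": 0.3}

kept, ens_mean, ens_sd = ensemble_predict(predictions, scores)
print("models kept:", kept)
print("ensemble mean:", ens_mean)
print("ensemble spread:", ens_sd)
```

In this toy case the third model fails the threshold and is excluded, so the ensemble reflects only the two retained models. Cells where the retained models disagree get a larger spread, which is the kind of information a diagnostic report can use to flag less reliable areas.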
