Project Description
The WORLDSOILS project aims to develop, in close cooperation with authoritative end users, a pre-operational Soil Monitoring System that provides yearly estimations of Soil Organic Carbon (SOC) at global scale. It exploits space-based Earth Observation (EO) data and leverages large soil data archives and modelling techniques to improve the spatial resolution and accuracy of SOC maps.
The WORLDSOILS Monitoring System (WOSOMS) aims to achieve the following characteristics:
Modular implementation to allow future extension to additional soil indices.
Spatial resolution of 100 m x 100 m globally and 50 m x 50 m over Europe.
Appropriate confidence metrics provision.
Use of large time series of a minimum of 3 years.
Validation over three European regions (in coordination with National Reference Centres of Soil).
The preliminary requirements for a SOC index will be specified in agreement with the engaged stakeholders. Initially, however, the project team is focusing on computing three intermediate SOC indexes (bare soils, forest and grassland) from a reflectance composite that is built by merging large time series. The feasibility of these requirements will then be tested using case studies from previous projects of the consortium members and literature studies.
Results of case studies obtained during the feasibility analysis, including possible implementation options, will be presented and consolidated in a dedicated stakeholder engagement and user requirement consolidation workshop to establish a system requirement baseline.
The consolidated baseline shall be used as a reference for generating a detailed technical specification of the system, which in turn forms the reference for analysing the suitability of different cloud environments for its implementation. Based on this specification, the WORLDSOILS Monitoring System (WOSOMS) will be implemented, together with a suitable graphical user interface (GUI) and a searchable online help functionality, on a suitable cloud environment.
After the system has been implemented on a suitable cloud environment and intensively verified and tested to ensure robust operation, it will be validated for a minimum of 1 year over three case study areas that have been designed together with European national entities in charge of monitoring and reporting on soils. The case studies shall be based on data acquired during the operations phase in addition to data from the previous two years (3 years in total), including the use of local/regional reference data and the derived relationship.
Results will be presented at a final symposium, which will also include discussions on possible future enhancements and extensions as well as options for a sustained Soil Monitoring System operation.
The overall project duration is 30 months. The project is understood as a first milestone towards a long-term initiative that could include more indexes beyond SOC and/or raise the spatio-temporal resolution ambition.
Project Phases
The project is structured in three phases (‘Go’ or ‘No-Go’ decisions to proceed to the next phase will be provided by ESA at the end of each phase):
Phase 1: Feasibility analysis and system requirement baseline definition
In Phase 1, a detailed analysis of the end-user requirements for different application domains (e.g. agriculture, policy reporting) will be performed in terms of the accuracy, spatial and temporal requirements for monitoring topsoil organic carbon.
In addition, feasibility studies will be carried out to investigate the implications of crucial choices in the stages of the SOC prediction. They will be used to consolidate an agreement on the soil monitoring system to be developed in Phase 2.
Analysis and findings shall be detailed in the User Requirement Document (URD), which shall be further updated according to a consolidation process within this phase.
Results of these analyses shall be used to establish an overall system requirement baseline with different implementation options, which shall be presented, discussed and consolidated at a dedicated stakeholder engagement and user requirement consolidation workshop at the end of this phase.
Phase 2: Design, implementation, verification and testing
Phase 2 will be sub-divided into a detailed design phase and a software development and implementation phase.
Based on the detailed design, WOSOMS will be implemented in a modular way on a suitable cloud environment, together with a suitable graphical user interface (GUI) and a searchable online help functionality.
Once WOSOMS is implemented, a demo will be performed at the Acceptance Review (AR).
Phase 3: Operation, validation and analysis
In this phase, WOSOMS will be validated for a minimum of 1 year over three case study areas. The case studies shall be based on data acquired during the operations phase in addition to data from the previous two years (3 years in total).
At the end of this phase, a final symposium will be organized for the end users engaged in the project to present and discuss the pros and cons of the different methods included in the validation report. This symposium will also include discussions on a possible future evolution/enhancement of the system.
Preliminary Approach
The proposed architecture for an EO based soil monitoring system consists of different processing stages to produce a final SOC product that can cope with terrain heterogeneity, different climate regimes and seasonality:

Pre-processing
- Pre-processing prepares the EO database that is used for the SOC retrievals. It is based on per-pixel composites from time series of EO satellite imagery (mainly from the Sentinel-2 archive), which collect exposed soil spectra from each single EO image and thus enlarge the area that can be analysed in the subsequent modelling steps.
- The pre-processing step further controls the application of the appropriate SOC retrieval approach by dividing the area of interest into (a) exposed soils (mainly croplands), (b) permanent forests and (c) permanent grassland.
- For (a), SOC estimation is based on spectral modelling; for (b) and (c), a digital soil modelling approach is applied. For the latter, the pre-processing module also produces EO covariates describing the dynamics of the vegetation.
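The per-pixel compositing idea can be illustrated with a minimal sketch. It assumes that exposed soil is flagged with a plain NDVI threshold, which is only a stand-in for whatever index the processor actually applies; the function name `exposed_soil_composite`, the threshold value and the array layout are hypothetical.

```python
import numpy as np

def exposed_soil_composite(red, nir, bands, ndvi_max=0.25):
    """Average the reflectance of exposed-soil observations per pixel.

    red, nir: arrays of shape (time, rows, cols)
    bands:    array of shape (time, n_bands, rows, cols)
    Returns (composite, count). The composite has shape
    (n_bands, rows, cols) and is NaN where no exposed soil was seen.
    """
    # Assumption: a low NDVI marks exposed soil (illustrative criterion only).
    ndvi = (nir - red) / (nir + red)
    bare = ndvi < ndvi_max                          # (time, rows, cols)
    count = bare.sum(axis=0)                        # (rows, cols)
    summed = np.where(bare[:, None], bands, 0.0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        composite = np.where(count > 0, summed / count, np.nan)
    return composite, count

# One pixel observed three times; the second date is vegetated.
red = np.array([[[0.20]], [[0.05]], [[0.22]]])
nir = np.array([[[0.25]], [[0.45]], [[0.28]]])
bands = np.stack([red, nir], axis=1)                # (time, 2, 1, 1)
composite, count = exposed_soil_composite(red, nir, bands)
```

Only the two exposed-soil dates contribute to the composite, so a pixel that is vegetated on some dates can still receive a soil spectrum.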
Spectral Soil Modelling
- WOSOMS will rely on the proven scientific expertise of its partners to deploy advanced machine learning tools (e.g. Random Forest, RF) to predict the soil properties of selected Soil Spectral Libraries (SSLs) and to test their suitability for transfer learning by fine-tuning. Given the spatial distribution and representativeness of the Land Use and Coverage Area Frame Survey (LUCAS) dataset across the European continent, we leverage this SSL in the project.
- The system will consider large standardized spectroscopic databases outside Europe (e.g. Brazil) to evaluate to what extent the developed algorithms can be applied to different soil types and climatic conditions.
- We will employ techniques to eliminate the influence of ambient factors, such as soil moisture, on reliability and precision. On that basis, the most fit-for-purpose methodologies will be selected to develop a solid framework in which the appropriate models and approaches are leveraged.
Digital Soil Mapping
- Digital Soil Mapping (DSM) techniques will be used to obtain a continuous product, especially over permanently vegetated areas. A statistical relationship will be established between measured soil properties and soil forming factors (terrain, vegetation and climate, to name a few) as measured by environmental covariates.
- Environmental covariates will be derived from a variety of sources and from other components of this project. In particular, vegetation and land cover will be derived from EO images, as a by-product of the SCMaP processor.
Prediction
- Computation of SOC indexes for the different cover types created in the previous stage (using different models, which are grounded in, among others, spectral libraries, in-situ surveys or physical property modelling).
Mosaicking
- Combination of the three prediction maps (exposed soils, permanent forests and permanent grasslands) to obtain a continuous map of estimated soil organic carbon.
- The products will be checked for artefacts caused by the different indexes developed for the different land cover types. We will use state-of-the-art methods to remove the boundary artefacts.
- The final map is the SOC index map containing continuous soil organic carbon estimates.
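The mosaicking step can be sketched as a per-pixel selection driven by a land-cover mask, followed by a simple check of the jumps at class borders. This is an illustrative sketch only: the class codes, the function names `mosaic_soc` and `border_jumps`, and the horizontal-neighbour check are assumptions, not the artefact-removal method of the project.

```python
import numpy as np

# Hypothetical class codes for the land-cover mask.
EXPOSED_SOIL, FOREST, GRASSLAND = 1, 2, 3

def mosaic_soc(mask, soc_soil, soc_forest, soc_grass):
    """Pick, per pixel, the SOC prediction that matches its land-cover class."""
    return np.select(
        [mask == EXPOSED_SOIL, mask == FOREST, mask == GRASSLAND],
        [soc_soil, soc_forest, soc_grass],
        default=np.nan,                 # pixels outside the three classes
    )

def border_jumps(mask, soc):
    """Absolute SOC differences between horizontal neighbours of different class."""
    different = mask[:, 1:] != mask[:, :-1]
    return np.abs(soc[:, 1:] - soc[:, :-1])[different]

mask = np.array([[1, 2],
                 [3, 3]])
soc_soil = np.full((2, 2), 12.0)        # g C/kg, made-up values
soc_forest = np.full((2, 2), 30.0)
soc_grass = np.full((2, 2), 22.0)

soc_map = mosaic_soc(mask, soc_soil, soc_forest, soc_grass)
jumps = border_jumps(mask, soc_map)
```

Large values in `jumps` would point to discontinuities between the class-specific models that need smoothing or model averaging.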
Feasibility Studies
The feasibility study investigates the implications of crucial choices in stages of the SOC prediction and will be used to consolidate an agreement on the soil monitoring system in Phase 1:
Topic 1
Case Studies:
Selection of LUCAS mineral soils (< 180 g C/kg) in cropland: Stratified random (Kennard Stone) splitting into 2/3 calibration and 1/3 validation datasets
Dataset:
LUCAS 2015 SSL (if available, otherwise LUCAS 2009)
Tech. Solution(s):
LUCAS will be re-sampled to the spectral resolution of Sentinel-2, Landsat and CHIME.
Machine and deep learning techniques will be used, such as:
- Local PLS (script of Ward et al., 2019)
- Random Forest (or Cubist)
- SVM
- Convolutional Neural Networks with local adaptive error correction mechanisms
Output:
Spectral models using different machine learning techniques and for different satellite spectral resolutions. Goodness of fit assessed by validation (R², RMSE, RPD, RPIQ)
References:
Local PLSR: Ward et al. (2019), who already carried out such an exercise by re-sampling LUCAS to EnMAP spectral resolution
PLSR and RF: Castaldi et al. (2019)
SVM: Gholizadeh et al. (2018) in combination with spectral indices
CNN: Padarian et al. (2019) and Tsakiridis et al. (2020)
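The calibration/validation split and the goodness-of-fit measures named in the case study above can be sketched as follows. This is a plain Kennard-Stone selection without any additional stratification, and it builds a full pairwise distance matrix, so it is only suitable for small demonstrations; the function names and the toy data are hypothetical. RPD is taken here as the standard deviation of the observations divided by RMSE, and RPIQ as their interquartile range divided by RMSE.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone algorithm."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance of each remaining sample to its nearest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected)

def validation_metrics(y_obs, y_pred):
    """Compute R2, RMSE, RPD and RPIQ for a set of predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_obs - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    rpd = np.std(y_obs, ddof=1) / rmse
    q1, q3 = np.percentile(y_obs, [25, 75])
    rpiq = (q3 - q1) / rmse
    return {"R2": r2, "RMSE": rmse, "RPD": rpd, "RPIQ": rpiq}

# Toy one-feature "spectra": 2/3 of the samples go to calibration.
X = np.array([[0.0], [1.0], [2.0], [9.0], [10.0], [5.0]])
cal_idx = kennard_stone(X, 4)
val_idx = np.setdiff1d(np.arange(len(X)), cal_idx)

# Made-up observed and predicted SOC values (g C/kg).
metrics = validation_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```

Kennard-Stone fills the calibration set with samples that span the feature space, which leaves the more redundant samples for validation.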
Topic 2
Case Studies:
Selection of mineral soils (< 180 g C/kg) in cropland of the Brazilian SSL: Stratified random (Kennard Stone) selection of samples with different climate, parent material and soil type.
Dataset:
LUCAS 2015 SSL
Brazilian SSL
Internal soil standard scanned with LUCAS protocol and Brazilian protocol
Tech. Solution(s):
Spectral transfer between SSLs using an internal soil standard
Re-sampling to S2, Landsat and CHIME spectral resolution
Local PLS and Random Forest (or Cubist) models developed on LUCAS SSL (topic 1)
Output:
After re-sampling to S2, Landsat and CHIME spectral resolutions, models developed on the LUCAS SSL (topic 1) predict the SOC content of samples in the Brazilian SSL. Goodness of fit assessed by validation (R², RMSE, RPD, RPIQ). Two cases: with and without the internal soil standard included in the LUCAS and Brazilian protocols.
References:
Internal soil standard: Kopackova and Ben Dor (2015)
Spectral modelling: see topic 1
Brazilian SSL: Dematte et al. (2019)
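One common way to use an internal soil standard (ISS) for spectral transfer is a per-wavelength multiplicative correction: the ratio between the benchmark reading of the standard and the reading obtained under the other protocol is applied to every spectrum measured under that protocol. The sketch below assumes this ratio form; the source does not spell out the exact procedure, and the function names and numbers are hypothetical.

```python
import numpy as np

def iss_correction_factor(iss_measured, iss_benchmark):
    """Per-wavelength factor mapping a protocol's ISS reading onto the benchmark."""
    iss_measured = np.asarray(iss_measured, dtype=float)
    iss_benchmark = np.asarray(iss_benchmark, dtype=float)
    return iss_benchmark / iss_measured

def align_spectra(spectra, iss_measured, iss_benchmark):
    """Apply the ISS correction factor to every spectrum (rows are samples)."""
    factor = iss_correction_factor(iss_measured, iss_benchmark)
    return np.asarray(spectra, dtype=float) * factor

# Two wavelengths only, made-up reflectance values.
iss_measured = [0.40, 0.50]     # standard scanned with the protocol to correct
iss_benchmark = [0.44, 0.45]    # standard scanned with the reference protocol
spectra = [[0.20, 0.30]]        # sample scanned with the protocol to correct

aligned = align_spectra(spectra, iss_measured, iss_benchmark)
```

Comparing model performance on `spectra` and on `aligned` corresponds to the two cases, without and with the internal soil standard.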
Topic 3
Case Studies:
Selection of mineral soils (< 180 g C/kg) in cropland of the Brazilian SSL: Stratified random (Kennard Stone) selection of samples with different climate, parent material and soil type.
Dataset:
LUCAS 2015 SSL (if available, otherwise 2009).
Spectral library with 106 topsoil samples, each at moisture contents of 0.05, 0.10, 0.15, 0.20 and 0.25 g/g
Tech. Solution(s):
Re-sampling to S2, Landsat and CHIME spectral resolution (see topic 1).
Soil moisture ratio (i.e. for each wavelength, the relative difference between the reflectance of a sample with a given moisture content and a dry sample; Diek et al., 2019)
Simulated SSLs for moist soils at satellite sensor resolution; machine learning models to predict SOC for dry soils and for simulated spectra at different soil moisture steps.
Models for dry soils applied to SSL of moist soils (different combinations of soil moisture classes)
Output:
Accuracy of spectral models for moist soils compared to those of dry soils.
Effect of moisture content on SOC prediction: soil moisture threshold.
References:
Spectral modelling: see topic 1
SSL with range of moisture content (Nocita et al., 2013; Ogen et al., 2019)
Soil moisture ratio (Diek et al., 2019)
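The soil moisture ratio described above, and its use for simulating moist spectra from a dry library, can be sketched as follows. The sketch takes the relative difference as (moist - dry) / dry and assumes that the ratio derived from one sample can be applied to the dry spectrum of another; both the sign convention and the function names are assumptions.

```python
import numpy as np

def moisture_ratio(refl_moist, refl_dry):
    """Relative reflectance difference per wavelength between moist and dry state."""
    refl_moist = np.asarray(refl_moist, dtype=float)
    refl_dry = np.asarray(refl_dry, dtype=float)
    return (refl_moist - refl_dry) / refl_dry

def simulate_moist(refl_dry, ratio):
    """Apply a moisture ratio to a dry spectrum to simulate its moist state."""
    return np.asarray(refl_dry, dtype=float) * (1.0 + np.asarray(ratio))

# Made-up reflectances at two wavelengths.
ratio = moisture_ratio([0.24, 0.30], [0.30, 0.40])   # sample with both states
simulated = simulate_moist([0.20, 0.50], ratio)       # dry-only library sample
```

Repeating this for each moisture step yields one simulated library per moisture class, to which the dry-soil models can then be applied.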
Topic 4
Case Studies:
Simulated aggregated and mixed pixels: Vegetation cover, residue cover, soil moisture and roughness will be added step-wise to the LUCAS SSL spectra.
Dataset:
LUCAS 2015 SSL (if available, otherwise 2009)
Data sources for vegetation, moisture, roughness signal
Tech. Solution(s):
Re-sampling to S2, Landsat and CHIME spectral resolution.
Local PLS or Random Forest (or Cubist) models developed on upscaled SSL (including vegetation, moisture and roughness signals).
Test effect of aggregated signal on SOC prediction.
Output:
Assessment of the decrease in performance due to the spatial and spectral degradation compared to a baseline of SOC prediction (topic 1).
References:
Roughness: Bablet et al. (2017); Diek et al. (2019).
Moisture: Nocita et al. (2013); Diek et al. (2019).
RT mixed vegetation-soil database: Kuester et al. (2020 submitted).
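The step-wise addition of a vegetation signal can be illustrated with a linear areal mixture of a soil spectrum and a vegetation spectrum. Linear mixing is an assumption made here for simplicity; it does not reproduce radiative-transfer effects, and the function name `stepwise_mixtures` and the spectra are hypothetical.

```python
import numpy as np

def stepwise_mixtures(soil, vegetation, fractions):
    """Linear soil/vegetation mixtures, one row per vegetation cover fraction."""
    soil = np.asarray(soil, dtype=float)
    vegetation = np.asarray(vegetation, dtype=float)
    f = np.asarray(fractions, dtype=float)[:, None]   # (n_steps, 1)
    return (1.0 - f) * soil + f * vegetation           # (n_steps, n_bands)

# Made-up reflectances in a red and a near-infrared band.
soil = [0.20, 0.30]
vegetation = [0.05, 0.50]
mixtures = stepwise_mixtures(soil, vegetation, [0.0, 0.2, 0.4])
```

Feeding each row of `mixtures` to a model trained on pure soil spectra shows how quickly prediction performance degrades as the cover fraction grows.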
Topic 5
Case Studies:
Simulated and aggregated mixed pixels of topic 4
Dataset:
See topic 4
Tech. Solution(s):
Apply unmixing techniques and test the performance of SOC models
Output:
Assessment of the SOC prediction performance after spectral unmixing compared to a baseline of SOC prediction (topic 1)
References:
Spectral unmixing (PV, NPV, soils): e.g. Asner and Heidebrecht (2010)
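A minimal sketch of linear spectral unmixing into photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV) and soil fractions is given below. It uses least squares with a softly enforced sum-to-one constraint, then clips negative fractions and renormalises; this is a simplification rather than a fully constrained solver, and the endmember spectra, the `weight` parameter and the function name `unmix` are made up for illustration.

```python
import numpy as np

def unmix(pixel, endmembers, weight=100.0):
    """Estimate endmember fractions of one pixel.

    endmembers: array of shape (n_bands, n_endmembers)
    weight:     strength of the sum-to-one constraint (assumed value)
    """
    n_end = endmembers.shape[1]
    # An extra weighted row of ones pushes the fractions to sum to one.
    A = np.vstack([endmembers, weight * np.ones((1, n_end))])
    b = np.append(pixel, weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Simplification: clip negatives and renormalise.
    fractions = np.clip(fractions, 0.0, None)
    return fractions / fractions.sum()

# Columns: PV, NPV, soil; rows: four hypothetical bands.
endmembers = np.array([
    [0.04, 0.20, 0.15],
    [0.45, 0.30, 0.22],
    [0.20, 0.40, 0.30],
    [0.10, 0.25, 0.32],
])
true_fractions = np.array([0.2, 0.1, 0.7])
pixel = endmembers @ true_fractions

estimated = unmix(pixel, endmembers)
```

With the soil fraction known, the soil contribution of a mixed pixel can be isolated before the SOC model is applied and compared against the baseline.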
Topic 6
Case Studies:
Pilot zones in different bioclimatic regions of Europe. First level test case will be Bavaria. Second level test cases could be areas in Belgium, Czech Republic and Greece.
Dataset:
Sentinel-2 time series.
Copernicus Land Cover (LC) information and other LC data sets.
National land cover maps for validation.
Tech. Solution(s):
SCMaP processor
Testing different available LC data sets for masking. Three approaches are foreseen:
- Static: Creating LC masks based on Copernicus LC data (e.g. CORINE, HR layer)
- Dynamic-I: using the masking output of SCMaP with the index published in Rogge et al. (2018) and further stratifying with Copernicus LC data (e.g. CORINE, HR layer)
- Dynamic-II: using the masking output of SCMaP with other indices (TBD) and further stratifying with Copernicus LC data (e.g. CORINE, HR layer)
Output:
Masks that trigger the use of SOC prediction models
References:
SCMaP processor: Rogge et al. (2018)
Topic 7
Case Studies:
Pilot zones in different bioclimatic regions of Europe. First level test case will be Bavaria.
Second level test cases could be areas in Belgium, Czech Republic and Greece.
Dataset:
S2 archive
Tech. Solution(s):
The SCMaP processor is used to generate outputs for different time ranges (TBD), such as 1 year, 2 years, etc. The exposed soil composites are then used for modelling SOC values. Validation will be performed based on the measures developed before (topic 1: R², RMSE, RPD, RPIQ).
Output:
Analysis of the difference between a manually user-controlled prediction and an operational system prediction
References:
SCMaP processor: Rogge et al. (2018)
Topic 8
Case Studies:
Pilot zones in different bioclimatic regions of Europe. First level test case will be Bavaria.
Second level test cases could be areas in Belgium, Czech Republic and Greece.
Dataset:
Archives of available mapping missions such as S2.
Tech. Solution(s):
Parameterise and analyse the spectral variability of the composed exposed soil pixels as an indicator for outliers (e.g. from haze and clouds), moisture and remaining vegetation influence.
Additionally, a measure of spectral variability can give information about the temporal variation and/or stability of soils in optical EO data and might be important information for SOC prediction.
Output:
Additional SCMaP layer of spectral variability of composed pixels.
References:
None.
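A spectral variability layer of this kind could, for example, be the per-pixel standard deviation of the exposed-soil observations that enter the composite. The sketch below assumes that a boolean exposed-soil mask per date is already available; the function name `spectral_variability` and the array layout are hypothetical.

```python
import numpy as np

def spectral_variability(bands, bare):
    """Per-pixel, per-band standard deviation of exposed-soil observations.

    bands: array of shape (time, n_bands, rows, cols)
    bare:  boolean array of shape (time, rows, cols), True for exposed soil
    Returns an array of shape (n_bands, rows, cols).
    """
    # Observations that are not exposed soil are ignored via NaN.
    masked = np.where(bare[:, None], bands, np.nan)
    return np.nanstd(masked, axis=0)

# One band, one pixel, three dates; the second date is not exposed soil.
bands = np.array([0.20, 0.90, 0.24]).reshape(3, 1, 1, 1)
bare = np.array([True, False, True]).reshape(3, 1, 1)
variability = spectral_variability(bands, bare)
```

High values flag pixels whose composite spectrum may be affected by residual haze, moisture changes or remaining vegetation.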
Topic 9
Case Studies:
Pilot zones in different bioclimatic regions of Europe. First level test case will be Bavaria.
Second level test cases could be areas in Belgium, Czech Republic and Greece.
Dataset:
Terrain: numerous derivatives from a digital elevation model (Copernicus) to describe the terrain and its shape.
Climate: The main available sources of climatic data are interpolations of long-term time series of ground station data (i.e. WorldClim, Fick et al., 2017), data from Earth Observation, mainly for temperature and precipitation, and data derived using data assimilation from climate models (e.g. ERA5 approach, C3S, 2017).
Vegetation indices derived from the SCMaP processor, in particular indices that will help model soil based on a strong relationship with the growing vegetation and its changes.
Tech. Solution(s):
Statistical relationship between measured soil properties and soil forming factors (terrain, vegetation and climate, to name a few) as measured by environmental covariates.
Output:
These indexes will provide information about the spatial changes in vegetation and the different temporal behaviour that can be related to vegetation types. The relationship between vegetation type and SOC will be modelled for all the pixels to obtain a continuous surface.
References:
McBratney et al. (2013); Minasny and McBratney (2016)
Topic 10
Case Studies:
Pilot zones in different bioclimatic regions of Europe. First level test case will be Bavaria.
Second level test cases could be areas in Belgium, Czech Republic and Greece.
Dataset:
Datasets obtained from topic 9
Tech. Solution(s):
Statistical relationship between measured soil properties and soil forming factors (terrain, vegetation and climate, to name a few) as measured by environmental covariates.
Random Forest and other suitable candidates will be tested for both accuracy and operational feasibility.
Output:
Models and insight in the most promising approach.
References:
McBratney et al. (2013); Minasny and McBratney (2016)
Topic 11
Case Studies:
Pilot zones in different bioclimatic regions of Europe. First level test case will be Bavaria.
Second level test cases could be areas in Belgium, Czech Republic and Greece.
Dataset:
Combining the outcomes of topics 6-10
Tech. Solution(s):
Mosaicking the three index maps generated for each of the land cover types considered. The product will be checked for artefacts and discontinuities at the borders between land cover types.
A simple GIS-based solution will be tested alongside model averaging approaches.
Output:
Smooth continuous final SOC map
References:
None