BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.staging.osgeo.org//foss4g-europe-2024-academic-tra
 ck//talk//WLUUSK
BEGIN:VTIMEZONE
TZID:EET
BEGIN:STANDARD
DTSTART:20000101T000000
RRULE:FREQ=YEARLY;BYMONTH=1;UNTIL=20001231T220000Z
TZNAME:EET
TZOFFSETFROM:+0200
TZOFFSETTO:+0200
END:STANDARD
BEGIN:STANDARD
DTSTART:20021027T050000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:EET
TZOFFSETFROM:+0300
TZOFFSETTO:+0200
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20020331T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:EEST
TZOFFSETFROM:+0200
TZOFFSETTO:+0300
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-europe-2024-academic-track-WLUUSK@talks.staging.osgeo.or
 g
DTSTART;TZID=EET:20240703T160000
DTEND;TZID=EET:20240703T163000
DESCRIPTION:Soil erosion\, the displacement of topsoil by water and wind\, 
 poses a significant threat to global land health\, impacting food security
 \, water quality\, climate change\, and ecosystem stability. Earth Observa
 tion (EO) and remote sensing technologies play a crucial role in monitori
 ng and assessing soil erosion\, offering valuable spatial and temporal dat
 a for informed decision-making. This paper applied three (3) Machine Learn
 ing (ML) models\, namely the XGBoost classifier\, LightGBM classifier\, an
 d CatBoost classifier to perform soil erosion classification in the Europe
 an Union (EU) region. The data used in this study were sourced from Kaggle
 \, a large repository of community-published machine learning models and 
 data\, and include several EO datasets\, namely Landsat 7 seasonal 
 Analysis Ready Data (ARD)\, BioClim v1.2 historical (1981-2010) average 
 climate data using the CHELSA classification system\, annual MODIS EVI 
 data\, climat
 ic variables (water vapour\, monthly snow probability\, annual MODIS LST i
 n daytime or night time\, annual CHELSA rainfall V2.1)\, Human footprint (
 Hengl et al.\, 2023)\, Land cover\, Landform and landscape parameters (Hen
 gl\, 2018)\, Lithology (Hengl\, 2018). The dataset has a total of 3754 sam
 ple points and 139 features. A detailed description of the dataset feature
 s can be found here.\n\nDuring the Exploratory Data Analysis (EDA) process
 \, the visual relationship between the Landsat bands and the target 
 variable (erosion category) revealed that the Near Infrared (NIR)\, 
 Short-Wave Infrared I (SWIR1)\, Short-Wave Infrared II (SWIR2)\, and 
 Thermal bands were more effective at differentiating between the various 
 erosion categories than the other bands. This insight guided the feature 
 engin
 eering process. As suggested by Puente et al. (2019)\, vegetation indices 
 could prove effective in predicting soil erosion. Consequently\, we comput
 ed various vegetation indices such as the Normalised Difference Water Inde
 x (NDWI)\, Normalised Difference Infrared Index (NDII)\, and Shortwave Inf
 rared Water Stress Index (SIWSI)\, and applied the Tasseled Cap 
 Transformation (Brightness\, Wetness and Greenness) to augment 
 the features. To capture textural variations at each pixel location\, 
 elevation- and slope-based measures were computed. The Topographic 
 Position Index (TPI) was computed for each position using a 
 100\,000-metre radius\, by subtracting the mean elevation of the points 
 within that radius from the elevation of each point. Other features 
 computed w
 ere the Topographic Wetness Index (TWI)\, Aspect\, LS-Factor\, and Stream 
 Power Index (SPI)\, which reflects the erosive power of streams. 
 Leveraging 
 the thermal band\, Land Surface Temperature (LST) was derived. As noted by
  Ghosal (2021)\, combining LST with temporal data can identify regions vul
 nerable to soil erosion.\n\nThe development of these models incorporated S
 cikit-Learn Recursive Feature Elimination (RFE) in the preliminary feature
  selection process using the XGBoost model as the estimator. RFE trains 
 the model on all features\, ranks them by importance\, and removes the 
 least important features until “n” features remain. Here “n” 
 was set to 200. Afterward\, an XGBoost model was trained on the 200 
 features\, and Scikit-Learn’s Randomised Search CV was used to optimise 
 its hyperparameters\, leading to an improved F1 score for the XGBoost 
 classifier. Using the XGBoost classifier’s feature importance ranking\, 
 the top 155 features 
 were selected for use in the final ensemble model for predictions. To prov
 ide a more reliable estimate of the performance of the training model\, Sc
 ikit-Learn's Stratified KFold was implemented with n_splits set to 5 and t
 he erosion category as the stratification variable. By using stratified KF
 old\, a balanced class representation in each fold during training was ach
 ieved. For modelling of erosion categories\, an ensemble voting classifier
  combined predictions from three optimised gradient boosting models (XGBoo
 st\, LightGBM\, CatBoost) using a "soft" voting scheme. This approach aime
 d to improve accuracy and reduce overfitting compared to individual models
 . The confusion matrix was used to evaluate the ensemble's performance\, c
 onsidering precision\, recall\, and F1-score metrics. These metrics assess
  the model's ability to correctly identify positive and negative cases\, w
 ith a higher F1 score indicating better overall performance.\n\nThe 
 weighted F1 score\, precision and recall all reached 0.86\, indicating 
 that the proposed method\, which uses various EO data to predict soil 
 erosion categories (No Gully/badland\, Gully\, Badland\, Landslides)\, 
 performed well. Specifically\, No Gully/badland (0.89\, 0.91) and 
 Landslides (1.00\, 1.00) had high precision and recall values\, 
 which means that the model identifies areas in these erosion categories 
 with few false positives and false negatives. Badland had the lowest 
 recall (0.49)\, indicating that the model missed a substantial share of 
 this category.\n\nAc
 cording to the feature importance analysis\, Year\, Latitude 
 Coordinates\, Topographic Wetness Index (TWI)\, Longitude Coordinates\, 
 Maximum Fraction of Absorbed Photosynthetically Active Radiation 
 (FAPAR)\, Minimum Annual Water Vapour\, Mean of Slope\, Weighted 
 Difference Vegetation Index (WDVI)\, Normalised Difference Snow Index 
 (NDSI) and Standard Deviation of Slope emerged as the top ten (10) 
 factors influencing soil erosion\, indicating that topographic factors 
 and vegetation indices were important for predicting soil erosion. Year 
 was the most important feature\, which suggests that temporal trends 
 strongly affect soil erosion prediction.\n\nIn 
 conclusion\, this project successfully explored the potential of ensemble 
 learning and EO data for classifying soil erosion\, highlighting its promi
 sing role in addressing this crucial environmental issue. The proposed fra
 mework indicates that topographic indices like the TWI and vegetation indi
 ces like the WDVI hold valuable information for predicting soil erosion. F
 urthermore\, band combinations using near-infrared (NIR)\, SWIR1\, SWIR2\,
  and thermal bands can significantly improve the classification of soil er
 osion categories. Crucially\, EO data like digital elevation models (DEMs)
  and Analysis Ready Landsat data serve as the foundation for accurate soil
  erosion prediction. The proposed approach to incorporate multi-temporal E
 O data offers exciting prospects for even more accurate soil erosion class
 ification.
DTSTAMP:20260509T090313Z
LOCATION:Omicum
SUMMARY:Mapping Soil Erosion Classes using Remote Sensing Data and Ensemble
  Models - Ayomide Oraegbu\, Emmanuel Jolaiya
URL:https://talks.staging.osgeo.org/foss4g-europe-2024-academic-track/talk/
 WLUUSK/
END:VEVENT
END:VCALENDAR