BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.staging.osgeo.org//foss4g-europe-2025//speaker//JR
 T3S9
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-europe-2025-MTM3ZA@talks.staging.osgeo.org
DTSTART;TZID=CET:20250718T113000
DTEND;TZID=CET:20250718T120000
DESCRIPTION:Quantitative thematic mapping is widely used in meteorology (e.
 g.\, weather maps)\, geology (e.g.\, topographic maps)\, and environmental
  science (e.g.\, pollution distribution). However\, mapping continuous
  quantitative data is challenging\, especially when data is sparse or
  unreliable. Sparse data arises from unevenly distributed measurements\,
  while unreliable data stems from inconsistencies or errors\, such as
  subjective self-reports or satellite-derived estimates. These challenges
  intensify when mapping hidden variables that can only be observed through
  indirect proxies [Ervin\, 2009]\, which introduce further uncertainty.
  Techniques that integrate multiple datasets\, combined with statistical
  methods such as interpolation or machine learning\, are essential for
  improving accuracy.\nMapping walkability and fire risk in wilderness
  areas is particularly difficult because both variables are hidden and
  the available data are limited. Wilderness walkability depends on
  factors such as trail connectivity\, slope\, surface quality\, and
  accessibility\, which are difficult to measure comprehensively. Desktop
  analyses often miss
  real-world obstacles like debris or vegetation overgrowth. Similarly\, fi
 re risk is influenced by vegetation dryness\, wind patterns\, topography\,
  and human activity\, and their complex interactions are hard to quantify
  because sensor coverage is sparse and environmental conditions vary. Both
  variables require indirect proxies\, such as fire behavior data or trail
  condition audits\, which
  introduce inaccuracies. Addressing these challenges demands mapping
  techniques that integrate multiple data sources\, including crowdsourced
  trail reviews\, IoT sensors\, and remote sensing data.\nThis study focuses
  on thematic mapping of walkability. Unlike urban walkability [Horak\,
  2022]\, which is linked to built infrastructure\, wilderness walkability
  depends on natural terrain features such as slope\, surface stability\,
  vegetation density\, and trail connectivity. Measuring these factors
  directly is difficult because terrain is heterogeneous and changes over
  time. For instance\, steep inclines and loose surfaces can impede
  movement\, while dense undergrowth or debris can block trails
  entirely.\nWalkability can be assessed using 
 GPX trail data by calculating walking speed along trails\, which provides a
  measurable indicator of terrain difficulty. GPX files contain time-stamped
  geographic coordinates\, so speed can be derived from the distance and
  time between consecutive points. However\, individual differences in
  fitness\, experience\, and preferences introduce subjectivity when
  walkability is expressed as walking speed: one hiker may struggle on
  rocky trails\, while another navigates them with ease. Aggregating data
  from multiple users helps mitigate these biases by capturing broader
  patterns and providing a more accurate representation of walkability.
  Walking speed alone is still insufficient for defining inherent
  walkability\, because each observed speed mixes terrain effects with the
  effects of the individual walker. We therefore propose matrix
  factorization as a technique for separating these effects and revealing
  latent walkability values. Using multiple GPX trails\, we evaluate
  different
  matrix factorization methods for thematic mapping.\nData\nWe collected
  1\,620 GPX trails from users across Croatia\, including mountain rescue
  teams\, hikers\, runners\, dog walkers\, and casual users. To ensure
  anonymity\, each GPX file was assigned a unique user ID containing no
  personal information. Each trail contained geographic coordinates and
  timestamps\, although differences in recording devices led to varying
  segment lengths. Movement speed was calculated by comparing the time and
  location of consecutive recorded points. After filtering out outliers\,
  we obtained 1\,795\,663 valid segments\, each described by location\,
  time\, user ID\, and speed.\nTo address inconsistencies\, segments were
  grouped into 100-meter spatial cells per user.
  The median movement speed was computed for each user-cell combination\,
  resulting in 127\,478 user-cell speed values. The final dataset was
  structured as a 1\,609 × 24\,349 sparse matrix in which rows represent
  users\, columns represent terrain cells\, and entries hold median walking
  speeds\; only about 0.33% of the entries are observed.\nMethods\nWhen
  factorizing user-item rating matrices\, various techniques uncover latent
  features and improve predictions [Khalitov\, 2021\; Du\, 2023].\nSingular
  Value Decomposition (SVD) factorizes the matrix into 
 three components (U\, Σ\, and V^T)\, capturing latent relationships ranked
  by their singular values. Truncated SVD retains only the top k singular
  values and is used primarily for dimensionality reduction in
  preprocessing. Non-Negative Matrix Factorization (NMF) also produces a
  low-rank factorization but constrains all factors to be non-negative\,
  which makes results more interpretable by representing additive
  user-item interactions. For large datasets\, Stochastic Gradient Descent
  (SGD) iteratively updates the latent factors to minimize prediction
  error\, while Alternating Least Squares (ALS) alternately fixes one set of
  factors and solves a least-squares problem for the other. Fast
  Independent Component Analysis (FastICA) extracts statistically
  independent latent factors under the assumption that they are
  non-Gaussian\, and is mainly used for feature extraction and
  preprocessing.\nEvaluation\nAll factorization techniques were tested on
  the dataset using Python\,
  scikit-learn\, and custom implementations. Each factorization extracted
  a single latent factor per user and per cell (a rank-1 factorization).
  Performance was evaluated as the RMSE between the reconstructed values
  (the product of the user and cell latent factors) and the original
  recorded speeds.\nA cell's latent factor represents its inferred
  walkability. The walkability maps generated by each technique were
  compared with satellite imagery\, topography\, and land cover data using
  GRASS GIS statistical tools.\nResults\nThe RMSE on the constructed
  dataset was 0.4148 for NMF and Truncated SVD\, 0.4665 for SVD\, 0.4666
  for FastICA\, 2.2839 for ALS\, and 5.2349 for SGD.\nConclusion\nResults
  confirm that
  matrix factorization effectively separates user and terrain latent
  factors in sparse datasets. NMF\, which tied with Truncated SVD for the
  lowest RMSE\, proves particularly useful for mapping hidden values
  because its non-negative components can be related directly to
  real-world factors influencing walkability\, which makes it especially
  well suited to this task.\nExtracted latent
  factors provide insights into spatial walkability patterns\, revealing ar
 eas where walking conditions are more or less favorable. These factors can
  serve as a foundation for extrapolating walkability across larger areas u
 sing geospatial datasets\, including land cover classifications\, slope gr
 adients\, and aspect orientations. Additionally\, integrating these result
 s with external environmental datasets could lead to predictive models for
  walkability in natural landscapes. Future research should explore the
  method's transferability to other geographic regions and to other
  applications\, such as fire risk mapping\, where fire behavior data could
  quantify fire susceptibility.
DTSTAMP:20260527T015035Z
LOCATION:PA01 (Quarticle)
SUMMARY:Evaluating Matrix Factorization Techniques for Thematic Mapping of 
 Wilderness Walkability Using Multiple GPX Datasets - Ljiljana Seric\, Bori
 s Draško
URL:https://talks.staging.osgeo.org/foss4g-europe-2025/talk/MTM3ZA/
END:VEVENT
END:VCALENDAR
