FOSS4G 2023 academic track

10:30
30min
Site Calibration with PROJ and WKT2
Javier Jimenez Shaw

In many projects in construction, civil engineering, mining, surveying, etc., it is common to work with coordinates referenced to a local coordinate reference system (CRS) that is established ad hoc for the project site. These CRSs are necessary for applications with requirements that cannot be fulfilled by more common and affordable GNSS surveying techniques, for example millimetre accuracy or controlled distortion. In these systems, assigning coordinates to an on-site location with the highest accuracy, or relying solely on the control points that define the system, is a laborious process that requires specialised and expensive tools and skills (for example, knowing how to perform a point triangulation using a total station, a device commonly used in land surveying).

On the other hand, not all tasks involving geolocation in the execution of a project have such strict requirements. In many cases, geolocation can be performed by less skilled staff by means of a GNSS receiver with real time kinematics (RTK) or post processing kinematics (PPK), reducing costs and work time. Geolocation can even be done without real time or post processing kinematics if the resulting accuracy is sufficient, requiring much cheaper equipment. However, coordinates still need to be referenced to the site local system used in the project. In georeferencing terms, the local system is completely arbitrary and disconnected from any well known CRS. A site calibration (or site localization) is the process of finding a bijection between coordinates in a well known CRS and a site local system with minimal error in the area of interest. The problem is normally formulated as a least squares optimization of the transformation between two sets of points. This transformation allows the geolocation of new positions with centimetre accuracy at a fraction of the cost of other high-accuracy surveying methods.

Many surveying devices provide a site calibration feature, but the algorithms are proprietary and the computed solution can only be exported to and used by software that is compatible with the closed proprietary formats involved. This effectively ties the user to the vendor ecosystem or requires performing a new and potentially different calibration for every incompatible software tool used in the project. In this paper we present a complete and interoperable solution that can be implemented purely in terms of open source software and standards. While the mathematical formulation is a well known and solved problem, to the best of our knowledge, the novelty of our approach resides in its complete openness.

Our main contribution is the precise description of the workflow involved in obtaining the mathematical solution of the site calibration problem and its representation as a self-contained coordinate reference system. The mathematical problem can be solved using any linear algebra toolbox, but we show how it can be implemented using functionality present in the open source library Eigen. As for the representation, our method relies on the OGC 18-010r7 open standard representation format [1], commonly known as WKT version 2. In this context, self-contained means that the final description of a site calibration embeds a well known CRS definition and the transformation method and parameters to transform coordinates from this system to the site local system. We have tested these coordinate transformations using several possible representations in the open source programming library PROJ version 9.2.0 [2]. The combination of WKT2 and PROJ allows for off-the-shelf interoperability for any application using them in an open and standard manner. The usage of WKT2 as a representation format is particularly convenient because it is a text-based representation that is very easy to store, transmit and process and, on top of that, human readable. Part of the work carried out in this research has been contributed to the PROJ 9.2.0 source code, as previous versions lacked required functionality or suffered from implementation issues.
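
As an illustration of the least squares step (a minimal sketch with synthetic points, not the authors' Eigen-based implementation, which also covers the vertical component), the horizontal part of a site calibration can be fitted as a 2D similarity (Helmert) transformation with any linear algebra toolbox:

```python
# Hedged sketch: least-squares fit of a 2D similarity (Helmert) transformation
# between projected coordinates and site-local coordinates, using NumPy instead
# of Eigen. The control points below are synthetic.
import numpy as np

def fit_similarity_2d(src, dst):
    """Fit x' = a*x - b*y + tx, y' = b*x + a*y + ty in the least squares sense."""
    n = src.shape[0]
    A = np.zeros((2 * n, 4))
    A[0::2, 0] = src[:, 0]; A[0::2, 1] = -src[:, 1]; A[0::2, 2] = 1.0
    A[1::2, 0] = src[:, 1]; A[1::2, 1] = src[:, 0];  A[1::2, 3] = 1.0
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params  # a, b, tx, ty

# Synthetic control points generated with a = 0.5, b = 0.1, tx = 10, ty = 20
src = np.array([[1000.0, 2000.0], [1500.0, 2100.0], [1200.0, 2500.0]])
dst = np.array([[310.0, 1120.0], [550.0, 1220.0], [360.0, 1390.0]])
print(fit_similarity_2d(src, dst))  # recovers approximately [0.5, 0.1, 10.0, 20.0]
```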

A site calibration can be solved in different ways ([3], [4]). Another important contribution of this paper is the comparison and accuracy analysis of two mathematical methods that result in two different WKT2 representations. Following the terminology presented in the third version of the ISO 19111 standard [5], the first and simpler method produces a derived projected system by solving a 3D problem and relying on a PROJ-specific 3D transformation. The second one splits the problem into its horizontal and vertical components. The output is a compound coordinate reference system made of a derived projected horizontal system and a vertical system with a vertical offset and slope derivation. This second method relies only on well known transformations registered in the EPSG Geodetic Parameter Dataset. We discuss the merits and disadvantages of each approach in terms of self-explainability of the solution and sensitivity to different types of measuring errors, in particular in the vertical axis, where GNSS receivers are known to have less accuracy.
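
For illustration only (the abstract does not spell out the exact PROJ operation used), a fitted horizontal calibration of this kind could be applied through PROJ's generic affine operation via pyproj; the parameter values below are placeholders, not a real calibration:

```python
# Hedged illustration: applying an already-fitted horizontal calibration as a
# PROJ "+proj=affine" pipeline step via pyproj. All numbers are placeholders.
from pyproj import Transformer

transformer = Transformer.from_pipeline(
    "+proj=pipeline "
    "+step +proj=affine "
    "+s11=0.999998 +s12=0.000021 +s21=-0.000021 +s22=0.999998 "
    "+xoff=-350124.5 +yoff=-4567890.2"
)

# projected easting/northing in, site-local coordinates out
x_local, y_local = transformer.transform(350200.0, 4567950.0)
print(x_local, y_local)
```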

UBT E / N209 - Floor 3
11:00
30min
An end-to-end deep learning framework for building boundary regularization and vectorization of building footprints
Simon Šanca

With increasing digitalization and automation, there is a need to develop automatic methods to maintain and update public information stored in spatial databases. The building register stores public, building-related information and is the fundamental record of building data necessary for taxation, public planning, and emergency services. Up-to-date building footprint maps are essential for many geospatial applications, including disaster management, population estimation, monitoring of urban areas, updating the cadaster, 3D city modeling, and detecting illegal construction cases (Bakirman et al., 2022). There are many approaches for building extraction from various data sources, including satellite, aerial, or drone images and 3D point clouds. However, there is still a demand for methodologies that can extract, segment, regularize and vectorize building footprints using deep learning in an end-to-end workflow.

Today, automatic and semi-automatic methods have achieved state-of-the-art results in building footprint extraction by combining computer vision and deep learning techniques. Semantic segmentation is a method for classifying each pixel in an image and can be used to extract building footprints from remote sensing data. In the case of building segmentation, the goal is to classify each pixel in an image as belonging to its corresponding class. Recent advances in deep learning for building segmentation have drastically improved the accuracy of the segmented building masks using Convolutional Neural Networks (CNNs).

Recently proposed semantic segmentation architectures include the application of advanced vision transformers. GeoSeg is one of the open-source semantic segmentation toolboxes for various image segmentation tasks. The repository provides seven different models that can be used for either multi-class or binary semantic segmentation tasks, including four vision transformers (U-NetFormer, FT-U-NetFormer, DCSwin, BANet) and three regular CNN models (MANet, ABCNet, A2FPN).

These specific methods for building segmentation involve training the neural network on a labeled image dataset, referred to as supervised learning. Semantic segmentation aims to distinguish between semantic classes in an image but does not individually label each instance. On the other hand, instance segmentation aims at distinguishing between semantic classes and the individual instances of each class. Many popular instance segmentation architectures exist, such as Mask R-CNN and its predecessors, R-CNN, Fast R-CNN, and Faster R-CNN. While the implementation of instance segmentation can be more challenging, the approach can be more effective in densely populated urban areas, where buildings may be close or overlapping.

A common problem with these methods is the irregular shape of the predicted segmentation mask. Additionally, the data contains various types of noise, such as reflections, shadows, and varying perspectives, making the irregularities more prominent. Further post-processing steps are necessary to use the results in many cartographic and other engineering applications (Zorzi et al., 2021).

The solution to the irregularity of the building footprints is to use regularization. Regularization is a technique in machine learning that applies constraints to the model and the loss function during the training process to achieve a desired behaviour (Tang et al., 2018). Applying regularization constrains the segmentation map to be smoother, with clearly defined and straight edges for buildings. As a result, the building footprint becomes less irregular when occluded and visually more appealing. Most studies apply regularization after image segmentation; to our knowledge, few studies apply regularization directly during model training. An alternative is to provide an end-to-end workflow for regularized building footprint extraction consisting of three parts: (1) segmentation, (2) regularization and (3) vectorization.

We propose an end-to-end workflow for building segmentation, regularization and vectorization using four different deep learning architectures for the binary semantic segmentation task: (1) U-Net, (2) U-NetFormer, (3) FT-U-NetFormer and (4) DCSwin. We further improve the building footprints by applying the projectRegularization method proposed by Li et al. (2021). The technique uses a boundary regularization network for building footprint extraction in satellite images, combining semantic segmentation and boundary regularization with an end-to-end generative adversarial network (GAN). Our approach first performs semantic segmentation with our trained models and then performs boundary regularization on the segmentation masks. We aim to demonstrate that projectRegularization scales to a different segmentation task, with aerial images as the data source. The last step in our approach is to develop a methodology for efficient vectorization of the segmented building masks using open-source software solutions, so that the results are practically applicable in any GIS environment. The dataset used for testing our method will be the MapAI dataset used for the MapAI: Precision in Building Segmentation competition (Jyhne et al., 2022), arranged with the Norwegian Artificial Intelligence Research Consortium in collaboration with the Centre for Artificial Intelligence Research at the University of Agder (CAIR), the Norwegian Mapping Authority, AI:Hub, Norkart, and The Danish Agency for Data Supply and Infrastructure.
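
As an illustration of the vectorization step (not necessarily the exact method used in the presented workflow; file names and the simplification tolerance are placeholders), a predicted binary building mask can be polygonized with rasterio and GeoPandas so that the result is usable in any GIS:

```python
# Hedged sketch of vectorization: polygonize a predicted binary building mask
# with rasterio and write a GeoPackage readable in any GIS.
import geopandas as gpd
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape

with rasterio.open("predicted_mask.tif") as src:    # hypothetical model output
    mask = src.read(1).astype("uint8")
    crs = src.crs
    polygons = [
        shape(geom)
        for geom, value in shapes(mask, mask=mask == 1, transform=src.transform)
        if value == 1
    ]

gdf = gpd.GeoDataFrame(geometry=polygons, crs=crs)
gdf["geometry"] = gdf.simplify(0.5)          # light edge generalization (CRS units)
gdf.to_file("buildings.gpkg", driver="GPKG")
```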

We aim to produce better representations of building footprints with more regular building boundaries. After successful application, our method generates regularized building footprints that are useful in many cartographic and engineering applications. Furthermore, our regularization and vectorization workflow is being developed into a working QGIS plugin that extends the functionality of QGIS. Our end-to-end workflow aims to advance the current research in convolutional neural networks and their application to automatic building footprint extraction and, as a result, further enhance the state of open-source GIS software.

UBT E / N209 - Floor 3
11:30
30min
Bulldozer, a free open source scalable software for DTM extraction
Dimitri Lallement

This paper introduces Bulldozer, a scalable software for extracting Digital Terrain Models (DTM) from Digital Surface Models (DSM). DTMs are useful for many application domains such as remote sensing, topography, hydrography, bathymetry, land cover mapping, 3D urban reconstruction (LOD), military needs, etc. Current and upcoming LiDAR and spaceborne Earth Observation missions will provide a massive quantity of 3D data. The CO3D space mission will deliver very high resolution DSMs at large scale over emerged landscapes, and the IGN LiDAR HD mission is currently delivering high density point clouds of the French national territory. This trend motivated the French space agency (CNES) to focus on the development of tools to process 3D data at large scale. In this context, we have developed a free open-source software, called Bulldozer, to extract a DTM from a DSM at large scale without any exogenous data while being robust to noisy and no-data values. Bulldozer is a pipeline of modular standalone functions that can be chained together to compute a DTM. A pre-processing step contains specific functions to clean the input DSM and prepare it for the DTM extraction, such as disturbed area detection, hole filling and outer/inner nodata management. The extraction of the DTM is then based on the original drape cloth principle, which consists of an inversion of the Digital Surface Model, followed by a multiscale representation of the inverted DSM on which an iterative drape cloth computation is applied to derive the DTM. Finally, a post-processing step is applied to obtain the final DTM.
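
The drape cloth principle can be illustrated with the following minimal single-scale sketch (illustrative only, not Bulldozer's implementation, which adds the multiscale pyramid and the pre- and post-processing steps described above):

```python
# Illustrative single-scale drape cloth: invert the DSM, let a "cloth" fall on
# it under gravity while smoothing it at each iteration, then invert back.
import numpy as np
from scipy.ndimage import uniform_filter

def drape_cloth_dtm(dsm, n_iter=100, gravity=0.1, window=41):
    inverted = -dsm.astype(np.float64)
    cloth = np.full_like(inverted, inverted.max())   # cloth starts above the surface
    for _ in range(n_iter):
        cloth -= gravity                             # gravity pulls the cloth down
        cloth = np.maximum(cloth, inverted)          # cloth cannot cross the surface
        cloth = uniform_filter(cloth, size=window)   # stiffness: smooth the cloth
        cloth = np.maximum(cloth, inverted)          # re-apply the collision constraint
    return -cloth                                    # back to terrain heights

# Synthetic example: flat ground at 100 m with a 10 m high, 20 px wide "building".
# Here the smoothing window is larger than the building; Bulldozer's multiscale
# pyramid plays that role for objects of any size.
dsm = np.full((200, 200), 100.0)
dsm[90:110, 90:110] += 10.0
dtm = drape_cloth_dtm(dsm)
print(float(dtm[100, 100]))   # close to 100: the building has been removed
```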

We have addressed a number of limitations that this type of algorithm may encounter. Indeed, in 3D stereoscopic satellite reconstruction, we can observe areas of residual noise in the DSM. They mainly come from uniform areas (shadows, water) or occlusions. These outliers disturb the drape cloth: it sticks to the edges of those disturbed areas and no longer fits the relief. This results in an underestimation of the DTM and generates pits in these noisy areas. To solve this problem, we have implemented a series of pre-processing steps to detect and remove these outliers. Once these areas are removed, we use a filling function that is more elegant than a basic interpolation method (e.g. the rasterio fill nodata function). In addition, after the DTM extraction, we detect and remove potential residual sinks in the generated DTM. In order to keep track of the areas that have been interpolated or filled in, Bulldozer also provides a pixel-wise quality mask indicating whether each pixel was detected as disturbed (and therefore removed and filled in) or interpolated following pit detection.

Current stereo and LiDAR DSMs have a centimetric spatial resolution. However, such a high spatial resolution is not always necessary for the DTM in numerous downstream applications. The multi-scale approach in Bulldozer makes it possible to produce a coarser DTM by simply stopping the process earlier in the pyramid. A final resampling of the DTM is done to match the user-specified resolution. One main advantage of this feature is the potentially short execution time to produce a high-quality DTM, depending on the DTM coarseness.

Another main contribution of our work is the adaptation of the original drape cloth algorithm to process DSMs of arbitrary size and from arbitrary sources. As explained in our previous paper, we introduce the concept of a stability margin in order to use a tiling strategy while ensuring results identical to those obtained if the DSM were processed entirely in memory. This tiling strategy allows a memory-aware extraction of the DTM in a parallel environment. This scalable execution relies heavily on the shared memory concept introduced in Python 3.8 and the multiprocessing paradigm.
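
The stability margin idea can be demonstrated with a simple smoothing filter (a hedged sketch, not Bulldozer's code): an interior tile padded by a margin at least as large as the filter radius yields exactly the same core result as processing the whole raster in memory.

```python
# Hedged sketch of the stability margin: filtering an interior tile padded by a
# margin >= the filter radius reproduces the in-memory result on the tile core.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
dsm = rng.normal(100.0, 5.0, size=(1000, 1000)).astype(np.float32)

radius = 4                         # half-size of the smoothing window
margin = radius                    # stability margin used by the tiling strategy
full = uniform_filter(dsm, size=2 * radius + 1)          # whole raster in memory

r0, r1, c0, c1 = 400, 600, 300, 500                      # core tile bounds
tile = dsm[r0 - margin:r1 + margin, c0 - margin:c1 + margin]
tile_filtered = uniform_filter(tile, size=2 * radius + 1)
core = tile_filtered[margin:-margin, margin:-margin]

print(np.allclose(core, full[r0:r1, c0:c1]))   # True: identical core results
```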

Since our previous version, we have been working on the accessibility of Bulldozer. Bulldozer can handle any input DSM as long as it is in a raster format. We have set up several interfaces to allow users of different levels to use it. A QGIS plugin was developed to allow novice users to run Bulldozer. For more advanced users, a Command Line Interface (CLI) allows the tool to be launched from a terminal. Finally, developers can use the Python API to launch the complete pipeline or to call the standalone functions of the pipeline.

The efforts to improve the algorithmic performance allow the management of large DSMs while guaranteeing stability in the results, memory usage, and runtime. Currently, we can extract a DTM from a 40,000 x 70,000 pixel input DSM in less than 10 minutes on a 16-core/64 GB RAM node. We believe that its ability to adapt to several kinds of sensors (high and low resolution optical satellites, LiDAR), its simplicity of use, and the quality of the produced DTMs may interest the FOSS4G community. We plan to present the tool during a workshop dedicated to CNES 3D tools, but we think that the method and the algorithmic optimizations could also interest the FOSS4G Academic Track audience through an academic paper.

The project is available on GitHub (https://github.com/CNES/bulldozer), and we are currently working to provide access to LiDAR and satellite test images in order to allow the community to reproduce the results.

UBT E / N209 - Floor 3
13:30
30min
Traffic speed modelling to improve travel time estimation in openrouteservice
Christina Ludwig

Time-dependent traffic speed information at the street level is important for routing services to estimate accurate arrival times and to recommend routes that avoid traffic congestion. Still, most open-source routing services that use OpenStreetMap (OSM) as the primary data source rely on static driving speeds for different highway types, since comprehensive traffic speed data is not openly available. In this talk, we will present a method to model traffic speed by hour of day for the street networks of ten different cities worldwide and its integration into route planning using the open-source routing engine openrouteservice.

Current datasets on traffic speed are either not openly available (e.g. the Google traffic layer may be viewed but not downloaded), have very limited spatial coverage or do not follow a consistent data format (e.g. data published by municipalities). In addition, these datasets are often not based on the OSM street network, which means extensive map matching procedures would be required to transfer the traffic speed information to the OSM features. The most promising dataset is currently provided by Uber Movement, containing hourly traffic speed data along OSM street segments in 51 cities worldwide from 2015 until 2020. Still, this data only covers roads for which enough Uber user data is available.

In recent years, several studies have proposed methods and evaluated different data sources for traffic speed modelling. Most of them model traffic speed using machine learning methods and different indicators such as OSM tags (e.g. highway=*), points-of-interest (Camargo et al., 2020), centrality indicators (Zhao et al., 2017) or social media data (Pandhare & Shah, 2017). All of these indicators proved to be suitable for modelling traffic flow, but none of these studies has evaluated the effect of the modelled traffic speed on route planning and arrival time estimation.

In this study, we modelled traffic speed by hour of day at the street level for ten cities worldwide based on OSM tags, an adapted betweenness centrality indicator and Twitter data. Uber traffic speed data was used as reference data to train and evaluate a gradient boosting regression model with different combinations of features. The simplest baseline model only used the OSM tags highway=* and maxspeed=* for prediction. The adapted betweenness centrality indicator was calculated to identify highly frequented street segments by simulating several thousand car trips in each city. In order to consider the geographic context, the original centrality indicator calculation was adapted to the spatial configuration of the city by including population distribution and relevant POIs in the calculation. Finally, Twitter data was used to account for the spatio-temporal distribution of human activity within the city. Using only the timestamp and geolocation of the tweets, the number of tweets in the vicinity of a street segment, aggregated by time of day, was used as an indicator. The quality of the different models was evaluated using the coefficient of determination (R2), the root mean square error (RMSE) and the mean absolute error (MAE). In all cities, the Twitter indicators improved the model, although this effect was only visible for certain road types. The Twitter indicators improved the accuracy especially for construction sites and motorways, while for medium sized roads such as residential streets the prediction did not improve. The centrality indicator improved the model as well, but to a lesser extent. The best results were achieved in Berlin with an RMSE of 6.58 and an R2 of 0.82.
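
A minimal sketch of this modelling setup (with synthetic data and assumed feature encodings, not the authors' code or results) using scikit-learn's gradient boosting regressor and the reported metrics:

```python
# Hedged sketch with synthetic data: gradient boosting regression of traffic
# speed from assumed indicators, evaluated with R2, RMSE and MAE.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
X = np.column_stack([
    rng.integers(0, 6, n),                 # encoded OSM highway=* class
    rng.choice([30, 50, 80, 120], n),      # OSM maxspeed=*
    rng.random(n),                         # adapted betweenness centrality
    rng.poisson(3, n),                     # tweets near the segment at that hour
    rng.integers(0, 24, n),                # hour of day
])
y = 0.6 * X[:, 1] - 8.0 * X[:, 2] + rng.normal(0, 5, n)   # synthetic "speed"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
pred = model.predict(X_test)

print("R2  ", r2_score(y_test, pred))
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))
print("MAE ", mean_absolute_error(y_test, pred))
```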

To use the modelled traffic speed data in route planning, an experimental traffic integration was implemented in openrouteservice, through which traffic speed data can be passed to openrouteservice as a CSV file. Each row contains the traffic speed at a certain hour of the day for a certain OSM street segment, specified by its OSM way id along with a start and end node. The data is structured in the same way as the Uber Movement data, making it possible to integrate either the raw Uber data or the modelled traffic speed. The effect of using external traffic speed data on travel time estimation was evaluated by calculating multiple random car trips within different cities and at different times of the day and comparing them to the estimated travel times of the Google Routing API as well as the original openrouteservice implementation. In addition, the raw and the modelled traffic data were compared. The comparison between travel times in Google and openrouteservice showed regional differences in the accuracy of estimated travel times. These differences could be partly alleviated by incorporating raw or modelled traffic speed information.

Future research on traffic speed modelling using open data includes further development of the models and their transferability to other cities for which no Uber data is available. In this regard, the potential of deep learning approaches should be evaluated. Since Twitter has stopped providing its API for free, data from other social media platforms needs to be integrated. The potential for this is high, though, since only the timestamp and geolocation of each tweet are used, making the general approach easily transferable.

UBT E / N209 - Floor 3
14:00
30min
Methods and challenges in time-series analysis of vegetation in the geospatial domain
Agata Elia

The increasing availability and ease of access of global, historical and high-frequency remote sensing data have offered unprecedented possibilities for monitoring and analysis of environmental variables. Recent studies in the field of ecosystem resilience relied on indicators derived from time-series analysis, such as the temporal autocorrelation and the variance of a system signal (Dakos et al., 2015). The availability of global, temporally and spatially dense time-series of indicators of vegetation biomass and greenness, such as the normalized difference vegetation index (NDVI) among others, has boosted scientific applications of ecosystem resilience to forests as well. The ecological definition of resilience corresponds to the capacity of a system to absorb and recover from a disturbance. For ecosystems increasingly affected by natural and anthropogenic pressures, such as forests, monitoring their health is particularly relevant.

Forest ecosystems play a crucial part in the global carbon cycle and in any climate change mitigation strategy, despite being increasingly affected by natural and anthropogenic pressures. While anthropogenic action on forests is mainly represented by stand replacement, natural perturbations include windthrows and fires, as well as extended insect and disease outbreaks, such as the recent outbreak affecting Central Europe. These natural disturbances are closely interconnected with the change in climate. A forest ecosystem with decreased resilience will be more susceptible to external drivers and their change, and more likely to shift into an alternative system configuration by crossing a tipping point.

However, remote sensing data quantifying vegetation and forest properties inherently carry information related to the climate as well. If not accounted for, these confounding factors, such as short-term climate fluctuations, may hide the actual vegetation anomalies that are the focus of a study and the importance of other drivers in the vegetation itself. In addition, they hinder the comparison of the same vegetation property between geographical areas naturally affected by different climates.

In order to explore the relationships of a set of environmental metrics with an indicator of forest resilience and their relative predictive importance, a machine learning (ML) model is implemented. In this paper, we aim to present the general workflow and the challenges encountered in processing and analyzing the time-series of vegetation, climate and other environmental variables. Rather than focusing on the scientific outcomes of the implemented model, the focus of this paper is on the workflow implemented to analyze these time-series and on the methods and tools used to account for climate effects on vegetation. Deseasonalization, detrending, growing season identification and removal of climatic confounding effects are targeted by the presented tools and methods, keeping in mind the variety and heterogeneity of methodologies existing in the field of time-series analysis.
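
As a hedged illustration of the deseasonalization and detrending steps (one of many possible methods, not necessarily the exact procedure used in the study), an 8-day series can be reduced to anomalies by removing the multi-year mean seasonal cycle and a linear trend:

```python
# Hedged sketch: deseasonalize an 8-day series by removing the multi-year mean
# of each 8-day period, then detrend the anomalies with a linear fit.
import numpy as np
import pandas as pd

dates = pd.date_range("2003-01-01", "2021-12-31", freq="8D")
ts = pd.Series(
    np.sin(2 * np.pi * dates.dayofyear / 365) + 0.001 * np.arange(len(dates)),
    index=dates, name="kndvi",            # synthetic series standing in for kNDVI
)

period = (ts.index.dayofyear - 1) // 8    # 8-day bin of the year (0..45)
climatology = ts.groupby(period).transform("mean")
anomaly = ts - climatology                # deseasonalized series

t = np.arange(len(anomaly))
trend = np.polyval(np.polyfit(t, anomaly.values, 1), t)
detrended = anomaly - trend               # deseasonalized and detrended series
print(detrended.describe())
```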

All data leveraged for this study are open. The long-term kNDVI is retrieved by processing the full time-series of daily MODIS Terra and Aqua Surface Reflectance at 500 m from 2003 to 2021. The kNDVI is a nonlinear generalization of the NDVI that shows stronger correlations than NDVI and NIRv with key forest parameters. kNDVI is also more resistant to saturation, bias, and complex phenological cycles, and it is more robust to noise and more stable across spatial and temporal scales (Camps-Valls et al., 2021). Hourly ERA5-Land data over the same timespan at 10 km are used to retrieve the set of climatic and environmental predictors, including temperature, precipitation, etc. Most data are computed as 8-day averages or sums in order to retrieve resilience metrics from high temporal resolution time-series.
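
For reference, the kNDVI can be computed from red and near-infrared reflectance; the sketch below uses the simplified form kNDVI = tanh(NDVI²) reported by Camps-Valls et al. (2021), with illustrative band values:

```python
# Reference sketch of the kNDVI in its simplified form kNDVI = tanh(NDVI^2)
# (Camps-Valls et al., 2021); reflectance values below are illustrative.
import numpy as np

def kndvi(nir, red):
    ndvi = (nir - red) / (nir + red)
    return np.tanh(ndvi ** 2)

nir = np.array([0.45, 0.50, 0.30])   # near-infrared surface reflectance
red = np.array([0.05, 0.08, 0.12])   # red surface reflectance
print(kndvi(nir, red))
```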

The data processing takes place mainly within Google Earth Engine (GEE) and the Joint Research Centre (JRC) Big Data Analytics Platform (BDAP). Google Earth Engine is a cloud-based geospatial analysis platform providing a multi-petabyte catalog of satellite imagery and geospatial datasets coupled with large analysis capabilities (Gorelick et al., 2017). The JRC Big Data Analytics Platform is a petabyte-scale storage system coupled with a processing cluster. It includes open-source interactive data analysis tools, a remote data science desktop and distributed computing with specialized hardware for machine learning and deep learning tasks (Soille et al., 2018). GEE is mainly used to pre-process the MODIS data. The ERA5 pre-processing and the core time-series analysis are performed within the JEODPP, where the main tools include R, the Climate Data Operators (CDO) and the netCDF Operators (NCO). The machine learning model is instead trained and run in R. The different platforms and tools used in the study also highlight the heterogeneity of the data involved, of data availability and of data formats, ranging from TIFF to netCDF and R objects.

The final aim of this paper is to present one of the many workflows that can be implemented when dealing with time-series of vegetation-related data in the geospatial domain, where climate plays a crucial role as a confounding effect. The importance of the availability of open data and open source tools and platforms in making this big data analysis possible is also strongly highlighted.

UBT E / N209 - Floor 3
14:30
30min
GeoAI for marine ecosystem monitoring: a complete workflow to generate maps from AI model predictions
Justine Talpaert

The world's oceans are being affected by human activities and strong climate change pressures. Mapping and monitoring marine ecosystems imply several challenges for data collection and processing: water depth, restricted access to locations, instrumentation costs and weather conditions for sampling. Nowadays, artificial intelligence (AI) and open source GIS software can be combined in new kinds of workflows to generate, for instance, marine habitat maps from deep learning model predictions. However, one of the major issues for GeoAI is tailoring the usual AI workflow to better deal with the spatial data formats used to manage both vector annotations and large georeferenced raster images (e.g. drone or satellite images). A critical goal is to enable the training of computer vision models directly with spatial annotations (Touya et al., 2019, Courtial et al., 2022), as well as delivering model predictions in spatial data formats, in order to automate the production of marine maps from raster images.
In this paper, we describe and share the code of a generic method to annotate and predict objects within georeferenced images. This has been achieved by setting up a workflow which relies on the following process steps: (i) spatial annotation of raster images by editing vector data directly within a GIS, (ii) training of deep learning models (CNN) by splitting large raster images (orthophotos, satellite images) while keeping raster (image) and vector (annotation) quality unchanged, (iii) model predictions delivered in spatial vector formats. The main technical challenge in the first step is to translate standard spatial vector data formats (e.g. GeoJSON or shapefiles) into standard data formats for AI (e.g. the COCO JSON format, a widely used standard for computer vision annotations, especially for object detection and instance segmentation tasks) so that a GIS can be used to annotate raster images with spatial polygons (semantic segmentation). The core process of the workflow is achieved in the second step, since the large size of raster images (e.g. drone orthophotos or satellite images) does not allow their direct use in a deep learning model without preprocessing. Indeed, AI models for computer vision are usually trained with much smaller images (most of the time not georeferenced) and do not provide spatialized predictions (Touya et al., 2019). To train the models with geospatial data, both the wide geospatial raster data and the related vector annotation data thus have to be split into a large number of raster tiles (for instance, 500 x 500 pixels) along with smaller vector files sharing the exact same boundaries as the raster tiles (converted into GeoJSON files). By doing so, we successfully trained AI models using spatial data formats for both raster and vector data. The last step of the workflow consists in translating the predictions of the models into geospatial vector polygons, either on small tiles or on large images. Finally, different state-of-the-art models, already pre-trained on millions of images, have been fine-tuned through a transfer learning strategy to create a new deep learning model trained on tiled raster images and matching vector annotations.
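
A minimal sketch of the tiling step (ii) is given below (assumed file names and a 500 x 500 pixel tile size; not the authors' published code): a large raster is split into windows and the annotation polygons are clipped to each window's footprint.

```python
# Hedged sketch of step (ii): split a large georeferenced raster into
# 500 x 500 pixel tiles and clip the vector annotations to each tile footprint.
import geopandas as gpd
import rasterio
from rasterio.windows import Window, bounds as window_bounds
from shapely.geometry import box

annotations = gpd.read_file("annotations.geojson")        # spatial annotations

with rasterio.open("orthomosaic.tif") as src:
    for row in range(0, src.height, 500):
        for col in range(0, src.width, 500):
            window = Window(col, row, 500, 500)
            tile = src.read(window=window)                 # raster tile (to be saved)
            footprint = box(*window_bounds(window, src.transform))
            clipped = gpd.clip(annotations, footprint)     # matching vector tile
            if not clipped.empty:
                clipped.to_file(f"tile_{row}_{col}.geojson", driver="GeoJSON")
```
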
We will present and discuss the results of this generic framework, which is currently being tested on three different applications related to marine ecosystem monitoring at different geographic scales: orthomosaics made of underwater or aerial drone images (for coral reef habitat mapping) and satellite images (for fishing vessel recognition). However, this method remains valid beyond the marine domain. The first use case relies on underwater orthomosaics of coral reefs built with a photogrammetry model and annotated with masks; this dataset covers three different orthomosaic acquisition sites. The second use case concerns the recognition of species and habitats from geolocated underwater photos collected in different Indian Ocean lagoons. The last implementation of this method was done using satellite images of fishing harbors in Pakistan, where vessels were labeled with bounding boxes. For the three use cases, model metrics are currently weak compared to similar computer vision tasks in the terrestrial domain but will be improved with better training datasets in the coming years. Nevertheless, the technical workflow managing spatialized predictions has been validated and already provides results which show that AI-assisted mapping will add value to different types of marine images. Special attention is paid to large objects that can be spread over several tiles when splitting the raster. In this case, the model can indeed make errors by predicting different classes for parts of the same object. A decision rule must therefore make it possible to choose the most probable class among the different classes predicted by the model to designate the whole object. The spatialization of the model results can then be decisive for reducing the number of misclassified objects.

The code is implemented with free and open source software for geospatial processing and AI. The whole framework relies on Python libraries for both geospatial processing and AI (e.g. PyTorch) and will be shared on GitHub and assigned a DOI on Zenodo, along with sample data. Moreover, a QGIS plugin is under development in order to facilitate the use of pre-trained deep learning models to automate the production of maps, whether on underwater orthomosaics, simple georeferenced photos or satellite images.

Beyond the optimization of model scores, one of the major perspectives of this work is to improve and ease AI-assisted mapping, as well as to include spatial information as input variables in a multi-channel deep learning model to make the most of spatial imagery (Yang & Tang, 2021; Janowicz et al., 2020).

UBT E / N209 - Floor 3
15:00
5min
GIS-based intelligent decision making support system for the disaster response of infectious disease
MIN YOUNG LEE

[Background and Purpose]
There are currently more than 7.5 million workers worldwide in the fields of fire, medical, and various other emergency services, with a total budget exceeding 400 billion euros. Additionally, approximately 15 billion euros are spent on equipment and other needs.
Water pollution caused by downpours and climate change has a fatal impact on our health, and the number of waterborne diseases continues to increase domestically and internationally. Therefore, the significance of technology which properly responds to various disasters caused by the climate crisis is increasing. While technologies for natural disasters have been widely developed, disaster response systems related to medical and biological emergencies are lacking: there is no technology development and response system platform for biological and medical risks, which are considered social disasters.

To this end, we aim to develop rapid and accurate pathogen detection technology which covers situational awareness, control/response methods, risk assessment, and epidemiological investigation methods. Eventually, by combining all these methods, we want to establish a user-centered GIS platform.

[Methods]
A decision support system that manages pathogen contamination was developed using information obtained from sensors and the field. Moreover, risk assessment and epidemiological investigation technologies developed with artificial intelligence and big data were included. The following three technologies were applied to analyze contaminated areas.

First, the system provides a preview of data taken by satellites and collects images of aquatic regions to analyze and report the degree of pollution. Moreover, the turbidity of the water is derived from continuously recorded imagery of aquatic regions. Lastly, it builds a water quality monitoring system based on the analysis of water samples acquired by drones.

These images were taken from regions that humans cannot easily access. The technology provides both the spatial analysis results and the images to users. Data and photos on social media are also analyzed to provide the severity of water pollution along with specific spatial locations. To effectively provide and manage information on the platform, the system consists of seven layers: source management, data collection, interoperability, data harmonization, data application, data process, and security. All components in the data collection, interoperability, data harmonization, and security layers provide geographic information and statistics for users.

[Results]
Considering the functions of the system, the platform can be applied in three fields: "Detection of pathogens and water pollution/situational response/post-investigation", "Infection management and decision support system" and "Protection and management of the first responder".

Two test locations were selected and a pilot case study was conducted at each location.

Limassol Pilot Case Study
- An earthquake near Limassol caused flash floods and landslides, polluting the Kouris Dam, which is a primary reservoir for Limassol.

  • The water pollution over time can be checked by satellite images analyzed through PathoSAT. The turbidity and temperature of water detected by PathoSENSE and the results of satellite image analysis can be checked on the PathoGIS platform.

  • The user can check the areas heavily affected by the flood, and the magnitude of the tide is visualized on the PathoGIS data panel through graphs.

  • A warning alert appears when the pollution level exceeds the threshold, so the user can check it. If victims report the location of polluted areas on Twitter, these reports can be checked through PathoTweet.

Korean Pilot Use Case Demonstration
A person in close contact with African swine fever (ASF)-infected wild pigs visited a farm near the Soyang Dam, and all pigs on the farm had to be removed due to mass infection. Inevitably, many ASF-positive cases were reported near the Soyang River.

  • Due to unusually high precipitation in summer, the Soyang River Dam overflowed, causing leachate to leak into the Soyang River.

  • PathoSAT satellite images can be used to identify the boundaries of areas that can be potentially damaged by flooding. The time series visualization shows that water pollution is more severe near the location of ASF-positive cases.

  • Since the government needs to respond to the rapidly increasing number of ASF cases, the results of the ASF case analysis can be checked using the analysis application. Using such data to prevent African swine fever from spreading south, analysts can determine the optimal distance from the SLL (Southern Limit Line), the CLL (Civil Defense Limit Line) and the primary fence, or the need for additional fence installation.

  • As the number of ASF-positive cases increases, pollution in tap water can easily be detected in Seoul, since a large portion of its water originates from the Soyanggang Dam.

  • The PathoSENSE turbidity sensor reports the current water pollution situation. If the turbidity of tap water increases, Twitter reports on health problems in Seoul increase simultaneously.

[Conclusion]
A platform that contains a database related to the spread of pathogens and provides AI-based information regarding the dangers of the situation will certainly help in responding to infectious diseases.

The platform will strengthen the ability to respond to infectious diseases and disasters by serving as a tool that improves the capability of first responders and reduces the time required to detect and respond to a situation.

In particular, by improving the ability to respond to unidentified risk situations that first-time field responders are likely to encounter, the number of industrial accidents will likely decrease. In the near future, once database expansion and maintenance costs stabilize, in-depth analysis of epidemiological big data will be possible using pattern recognition and deep learning models.

UBT E / N209 - Floor 3
15:05
5min
Mapping COVID-19 epidemic data using FOSS.
Paolo Zatelli

The recognition of spatial and temporal patterns in pandemic distribution plays a pivotal role in guiding policy approaches to its management, containment and elimination.
To provide information about the spatial and temporal patterns of a phenomenon, four steps are required: the collection of data, the organization and management of data, data representation as tables, charts and maps, and finally their analysis with geo-statistical tools (Trias-Llimós et al. 2020).
The collection of pandemic data poses a challenge: on the one hand, the highest spatial and temporal resolution is required to make the detection of patterns more effective (Carballada et al. 2021), allowing the application of containment tools that are as local as possible; on the other hand, it presents major privacy problems.
For these reasons public COVID-19 datasets and maps are usually available at low spatial and temporal resolutions (Franch-Pardo et al. 2020), because averaging over time and space automatically provides a layer of anonymization by data aggregation.
In this research project, a database has been built and is continuously updated for the COVID-19 pandemic in the Trentino region, in the eastern Italian alps, near the border between Italy and Austria.
The Province of Trento, with a population of about 542,000 inhabitants, represents the primary corridor for transporting people and products between Italy, Austria and Germany. The area also has intense tourist development, in particular for winter sports, with the presence of ski slopes, ski lifts and hotels.
These two features have played an important role in the diffusion of COVID-19 in the region, because the movement of people, both along the main communication routes and through the movement of tourists in the lateral valleys, has been the main driver of the virus spread. Therefore, the availability of a reliable database collecting COVID-19 cases is fundamental to map the pandemic evolution (Mollalo et al. 2020).
At the same time, the status of the Provincia Autonoma di Trento as an autonomous region allows greater discretion in the organization of health data, their scientific use and their dissemination. In this context, the local government and the University of Trento, in particular its Geo-cartographic Center (GeCo), have signed an agreement for sharing COVID-19 data and their analysis (Gabellieri et al. 2021).
The resulting dataset collects the official numbers of infected, clinically recovered and deceased people, and their age groups. The dataset contains daily data at the municipal level, from the beginning of the COVID-19 epidemic in March 2020 through the end of 2022.
Data anonymization has been carried out by aggregating data on a weekly basis and by hiding data with small numbers, with the threshold set to 5.
The sole use of official data created by public agencies tasked with managing public health, specifically the local Health Authority (Agenzia Provinciale per i Servizi Sanitari, APSS), ensures the validity of their production process and strict observance of patient data confidentiality rules.
A database management system and a WebGIS have been created using Free and Open Source Software.
The back end of the system runs a database management system (DBMS), which manages the data, including the spatial components, and a web server, which provides access to the users.
The DBMS runs on MySQL, a relational database management system (RDBMS) available as Free Software under the GNU General Public License. MySQL provides the capability of storing and processing geographic data, following the OpenGIS data model. A custom procedure has been created to update the dataset, with the capability to import data from suitably formatted spreadsheets. A rollback option is provided in case of failure of the import procedure. Database management and update functionalities are available only to authenticated WebGIS administrators and are accessible through a dedicated web page.
The main goals in the design and development of the WebGIS have been the ease of use and clarity of data presentation, both on large screens and on mobile devices. This approach maximizes the user performance while exploring the data, by splitting the processing tasks and load between server and clients.
The system is comprised of a back end, running on a server, and a front end running in the user’s web browser.
Cartographic data include background maps from the OpenStreetMap (OSM) project and a map of the municipalities boundaries for the Province of Trento, which serves as a spatial basis for the dataset. Tabular data are linked to the respective geographic components using the unique municipal code field as key. OSM maps are available with the Open Database License, while the municipalities boundaries have been provided by the Provincia Autonoma di Trento under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license.
A virtual machine that houses both software and data powers the system on the server side.
The client side uses the open source Leaflet JavaScript libraries, available under the BSD 2-Clause License, with custom scripts which create the user interface and render geographic data into maps. This approach ensures flexibility and responsiveness on desktop and mobile devices.
The exchange of data between the server and the client is performed using GeoJSON tables, created on the fly according to the user's request. In a similar way, the temporal variation graph of the data is created by the JavaScript library, which automatically reads the dates and times of the analyses, extracts the relevant data from the database and displays the graph.
As long as the data fit within the database structure, the system automatically uses all of the accessible data. To protect the privacy of the patients, WebGIS users cannot access the source data, even though maps and graphs can be downloaded as pictures.
The WebGIS is available at http://covid19mappa-trentino.geco.unitn.it/geosmart/index.php
Geo-statistical analysis aimed at the detection of spatial and temporal patterns is underway.

UBT E / N209 - Floor 3
15:10
5min
JAXA EARTH OBSERVATION DASHBOARD WITH COG AND WMS/WMTS
Shinichi Sobue

In May 2020, NASA, ESA, and JAXA initiated a collaborative effort aiming at the establishment of the COVID-19 Earth Observation Dashboard and later, in March 2021, extended its scope to global environmental change. Noting the increasing use of the joint Dashboard and continuous user requests for more information, NASA, ESA, and JAXA will continue through June 2024 to advance their joint work on the global understanding of the changing environment with human activities. This decision continues the collaboration on the analysis of the three agencies' datasets and the open sharing of data, indicators, analytical tools, and stories sustained by our scientific knowledge and expertise, to provide a precise, objective, and comprehensive view of our planet as an easy-to-use resource for the public, scientists, decision-makers, and people around the world, including people not familiar with satellites. Based on accurate remote sensing observations, the dashboard showcases examples of global environmental changes on seven themes: Atmosphere, Oceans, Biomass, Cryosphere, Agriculture, Covid-19, and Economy. The dashboard offers a precise, objective, and factual view of our planet without any artifacts. Users can explore countries and regions around the world to see how the indicators in specific locations changed over time.

ESA, JAXA, and NASA will continue to enhance this dashboard as new data becomes available. This session explores this EO dashboard architecture, function, examples of thematic content through storytelling, and its utility amongst the broader EO and Data Science community.

To monitor COVID-19 environmental and economic impacts from space, through the provision of related indicators to the general public and decision makers, JAXA has developed and implemented the Earth observation (EO) dashboard jointly with ESA and NASA. In parallel with the jointly developed EO dashboard, and to provide climate change and Earth science information to worldwide users, JAXA also develops and operates a one-stop portal site named "Earth-graphy", JAXA's website for all news, articles and images related to JAXA's Earth observation activities. Recently, to interconnect "Earth-graphy" with the EO Dashboard through an API, JAXA has developed the "JAXA Earth API" service to provide a wide variety of JAXA Earth observation satellite image data in an easy-to-use format and to promote the efficient and effective use of satellite data.

For Earth observation satellite data provision, JAXA develops and operates G-Portal, a portal system allowing users to search (by satellite, sensor or physical quantity) and download products acquired by JAXA's Earth observation satellites, including ALOS-2 ScanSAR data. In addition to the G-Portal standard product dissemination system, JAXA provides value-added product services including the Global Satellite Mapping of Precipitation (GSMaP), Himawari Monitor, JASMES, etc. For example, GSMaP provides a global hourly rain rate at a 0.1 x 0.1 degree resolution. JASMES provides information on the current status and seasonal/interannual variability of climate-forming physical quantities, including solar radiation reaching the earth's surface (photosynthetically available radiation), cloudiness, snow and sea ice cover, dryness of vegetation (water stress trend), soil moisture, wildfire, precipitation, land and sea surface, etc. (https://kuroshio.eorc.jaxa.jp/JASMES/index.html)

To provide easy access to JAXA's Earth observation data and information, JAXA developed JAXA Earth-graphy with an API. The JAXA Earth API service consists of three main components shown in Figure 1: an API (a Python version, popular in fields such as data science, and a JavaScript version (under development) for browser applications), a database, and a web application. The first is the "JAXA Earth Data Explorer", a browser application that allows users to browse the various satellite data stored in the database. The second is the "JAXA Earth API for Python", which allows users to acquire and use satellite data for any area efficiently, effectively and freely, without being aware of differences in satellite, sensor type, resolution, etc. The API also has an interface to the free GIS software QGIS, allowing for immediate acquisition and display of data. The third is the "JAXA Earth Database", which contains 74 types of data including elevation, surface temperature, vegetation index, precipitation, and land cover classification maps. The database stores data in cloud optimized GeoTIFF (COG) format and metadata in STAC format, following the CEOS Analysis Ready Data for Land (CARD4L) specification. In addition, since 2022 JAXA has implemented a prototype OGC WMS/WMTS system as a frontend to JASMES and Earth-graphy to provide data and information to the trilateral EO dashboard.
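
As a generic illustration of why the COG format eases access (this is plain rasterio usage, not the JAXA Earth API itself, and the URL is a placeholder), a small window of a remote COG can be read without downloading the whole file:

```python
# Generic illustration (not the JAXA Earth API): read a small window from a
# cloud optimized GeoTIFF over HTTP with rasterio. The URL is a placeholder.
import rasterio
from rasterio.windows import from_bounds

cog_url = "https://example.com/path/to/dataset_cog.tif"   # hypothetical COG

with rasterio.open(cog_url) as src:
    window = from_bounds(139.0, 35.0, 140.0, 36.0, transform=src.transform)
    data = src.read(1, window=window)   # only the tiles covering the window are fetched
    print(data.shape, src.crs)
```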

Through this WMS/WMTS and the JAXA Earth API, JAXA's EO dashboard is linked with the jointly developed EO dashboard. Thus, worldwide users can access JAXA's data and information through the jointly developed EO dashboard. Furthermore, JAXA has also started to develop a Japanese-language version of the UI for the jointly developed EO dashboard, with an increasing number of products and information. This paper describes an overview of JAXA's EO dashboard system development, with a particular focus on Japanese Advanced Land Observing Satellite-2 (ALOS-2) L-band SAR data for forest monitoring.

UBT E / N209 - Floor 3
15:15
5min
A Comparative Study of Methods for Drive Time Estimation on Big Geospatial Data: A Case Study in the U.S.
Devika Kakkar, Xiaokang Fu

Travel time estimation is used for daily travel planning and in many research fields such as geography, urban planning, transportation engineering, business management, operational research, economics, healthcare, and more (Hu et al., 2020). In public health and medical service accessibility studies it is often critical to know the travel time between patient locations and health services, clinics, or hospitals (Weiss et al., 2020). In support of a study aiming to characterize the quantity and quality of pediatric hospital capacity in the U.S., we needed to calculate the driving time between U.S. ZIP code population centroids (n=35,352) and pediatric hospitals (n=928), a total of over 32 million calculations. There are currently numerous methods available for calculating travel time, including (1) web service APIs provided by big tech companies such as Google, Microsoft, and Esri, (2) Geographic Information System (GIS) desktop software such as ArcGIS, QGIS, PostGIS, etc., and (3) open source packages based on programming languages such as OpenStreetMap NetworkX (OSMnx) (Boeing, 2017) and the Open Source Routing Machine (OSRM) (Huber & Rust, 2016). Each of these methods has its own advantages and disadvantages, and the choice of which method to use depends on the specific requirements of the project. For our project, we needed a low-cost, accurate solution with the ability to efficiently perform millions of calculations. Currently, no comparative analysis study evaluates or quantifies the existing methods for performing travel time calculations at the national level, and there is no benchmark or guidance available for selecting the most appropriate method.

To address this gap in knowledge and choose the best drive time estimator for our project, we created a sample of 10,000 ZIP/hospital pairs covering 49 of the 50 U.S. states with variable drive times ranging from a few minutes to over 4 hours. With this sample, we calculated the drive time using the Google Maps API, Bing Maps API, Esri routing web service, ArcGIS Pro Desktop, OSRM, and OSMnx and performed a comparative analysis of the results.

For the Google, Bing, and Esri web services we used the Python requests package to submit requests and parse the results. Within ArcGIS Pro, we manually used the Route functions to calculate routes on a road network provided by Esri and stored locally. For OSMnx we used Python to perform the street network analysis using input data from OpenStreetMap. OSRM is implemented in C++, and we used it through its web API. OSRM provides a demo server that enables testing the routing without loading the road network data locally, and we used this for calculating drive times for our 10,000 samples. For generating visualizations we used NetworkX and igraph to display the shortest path of the drive time routing results and the graphs of our comparative analysis.
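
For reference, a single OSRM drive time request against the public demo server looks as follows (coordinates are illustrative; a locally hosted instance exposes the same /route/v1/driving endpoint):

```python
# Reference sketch: one drive time request against the OSRM route service.
# Coordinates are illustrative (lon, lat); the demo server is rate limited.
import requests

origin = (-71.1167, 42.3770)
destination = (-71.0589, 42.3601)

url = (
    "https://router.project-osrm.org/route/v1/driving/"
    f"{origin[0]},{origin[1]};{destination[0]},{destination[1]}"
    "?overview=false"
)
response = requests.get(url, timeout=30)
duration_s = response.json()["routes"][0]["duration"]
print(f"Estimated drive time: {duration_s / 60:.1f} minutes")
```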

When comparing drive time estimations using these six technologies we found: (1) There is very little difference among Google, Bing, OSRM, the Esri web service, and ArcGIS Pro when the route drive time is less than roughly 50 minutes. (2) For travel time estimations of routes greater than 50 minutes, the Google and Esri methods were extremely close; the OSRM estimates produced travel times about 10% longer than the other methods, and Bing's estimates were about 10% lower than Google and Esri. (3) Overall, OSMnx estimates travel times lower than any other method because it estimates the shortest distance using the maximum velocity. In general, the different methods employ different strategies for considering traffic conditions. When long-distance travel is estimated, the use of highways is required, and each method employs specific parameters to account for traffic and the resulting travel speed. Because of the complexity of modeling traffic conditions, it is difficult to say which method provides the most accurate and realistic driving times without empirical data being collected. Regarding cost, OSMnx and OSRM are both open source, while the other methods have a cost for API usage (Google, Esri, Bing) or desktop software (ArcGIS Pro). For processing efficiency, Google, Esri and Bing were all efficient, each able to process the dataset in roughly one hour. We found the processing power of OSMnx was limited in the size of the road network it could handle, so we had to divide the ZIP/hospital pairs into subsets by state and calculate them separately, which was a laborious process. We found OSRM to be the most efficient, able to handle 10,000 requests in less than a minute. We ran OSRM in a high-performance cluster computing environment. This process included one hour of setup to download the OpenStreetMap data for the entire U.S. onto the cluster. Then we used Python requests to calculate the drive times and parse the results for analysis. The total processing time for the 32 million calculations ended up being 12 minutes.

Using OSRM provided us with a low-cost, accurate, and efficient solution for calculating drive times between 32 million origin/destination pairs. We feel our study provides valuable guidance on calculating drive times in the United States, offering a benchmark comparison of six different methods. We encourage others to use the code produced for this project; all of it is in the process of being published on GitHub as open source. Our analysis covered only the U.S., and performing similar analyses in other countries would provide more insight into how useful the different methods are globally. In summary, this comparative study allowed us to produce drive times in the most efficient manner in order to support the larger objective of characterizing the quantity and quality of pediatric hospital capacity in the U.S.

UBT E / N209 - Floor 3
15:20
15:20
5min
Developing a FOSS4G based Walkable Living Area Planning Support Module to Assist the Korean 15-minute City
Junyoung CHOI

The concept of 15-minute cities, which aims to provide residents with access to amenities and services within a 15-minute walk, has gained popularity in recent years [1]. In Korea, there have been discussions about supporting the planning of walkable neighborhoods based on Chrono-Urbanism, the idea underlying the 15-minute city that residents should be able to receive the services necessary for their daily lives in the same place where they live. Planning support based on Chrono-Urbanism measures, for various age groups, how easily services can be reached from small living-area units through physical activity such as walking and bicycling, and places the necessary living infrastructure (urban amenities) accordingly. A bottom-up planning approach that reflects the needs and living conditions of citizens, such as walking routines, can surface planning issues through iterative generation and evaluation of alternatives, supporting planning decisions by learning the surrounding environmental conditions with AI techniques.
However, to implement this concept, it is necessary to develop tools for spatial planning based on free and open source technologies. Previous studies have developed open source-based tools using OpenStreetMap (OSM) or the open data of individual cities and used them effectively for urban planning [2,3]. In this study, we aim to develop a tool that supports measuring walkability and distributing urban amenities by age group, considering pedestrian walkability, bicycle accessibility, and public transportation accessibility, by utilizing free and open source software for geospatial (FOSS4G) tools.
First, we design walkability measures, covering pedestrian walkability, bicycle accessibility, and transit accessibility, based on home-based (residence-based) trips.
To measure the walkability of a city, we need to consider pedestrian-friendly urban infrastructure elements such as sidewalks and crosswalks. The measurement design includes a walking network that captures the physical characteristics of the road network and a database containing the distribution of residents by gender and age. By analyzing data on the pedestrian network for different age groups, it is possible to determine the level of walkability under different urban space conditions. Similarly, the same data can be used to measure bicycle accessibility, taking into account bike lanes, bike parking facilities, and other factors. Access to public transportation can be measured using data from transportation agencies, including information about the frequency and routes of public transportation.
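To make the reachability idea concrete, the following sketch counts the amenities reachable within a 15-minute walk of an origin using OpenStreetMap data via OSMnx; the place name, origin coordinates, amenity tag, and walking speed are illustrative assumptions, and the function names assume a recent OSMnx release.

```python
# Minimal sketch of a 15-minute walkability check on an OpenStreetMap
# pedestrian network; all inputs are illustrative placeholders.
import osmnx as ox
import networkx as nx

WALK_SPEED_M_PER_MIN = 80            # ~4.8 km/h
CUTOFF_M = 15 * WALK_SPEED_M_PER_MIN # 15-minute walking range in metres

G = ox.graph_from_place("Busan, South Korea", network_type="walk")
amenities = ox.features_from_place("Busan, South Korea", tags={"amenity": "pharmacy"})

# Snap a residential origin and all amenities to the walking network
origin = ox.distance.nearest_nodes(G, X=129.0756, Y=35.1796)  # lon, lat (illustrative)
amenity_pts = amenities.geometry.centroid
amenity_nodes = ox.distance.nearest_nodes(G, X=amenity_pts.x, Y=amenity_pts.y)

# Network distances from the origin, capped at the 15-minute walking range
reach = nx.single_source_dijkstra_path_length(G, origin, cutoff=CUTOFF_M, weight="length")
reachable = [n for n in set(amenity_nodes) if n in reach]
print(f"{len(reachable)} amenities reachable within a 15-minute walk")
```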

Second, we design a Python-based geospatial tool to measure accessibility to urban amenities and to distribute the locations of urban amenities according to accessibility.
We develop a tool that integrates data on walkability, bicycle accessibility, and public transportation accessibility to determine the best locations for urban amenities. A network-based method of minimizing travel costs will be used to determine the locations [4] (a sketch of this siting idea follows this paragraph). The tool will be developed using QGIS and the Python programming language, and it is designed considering various parameters such as the resident and traveling populations, the distance from existing amenities, and the urban environment in the various living areas.
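As a minimal sketch of the network-cost-based siting idea, the following greedy heuristic selects candidate locations that reduce population-weighted travel cost on a randomly generated cost matrix; the data, the number of sites, and the heuristic itself are illustrative assumptions rather than the exact method of [4].

```python
# Greedy facility-location sketch: pick k candidate sites minimising total
# population-weighted travel cost. All data are randomly generated placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_blocks, n_candidates, k = 200, 30, 5
population = rng.integers(50, 500, size=n_blocks)           # residents per block
cost = rng.uniform(1, 30, size=(n_blocks, n_candidates))    # e.g. walking minutes

selected = []
best_cost = np.full(n_blocks, np.inf)  # current best travel cost per block
for _ in range(k):
    # Total weighted cost if candidate j were added to the current selection
    totals = [
        (population * np.minimum(best_cost, cost[:, j])).sum()
        for j in range(n_candidates)
    ]
    # Choose the unselected candidate with the lowest resulting total cost
    j_best = int(np.argmin([t if j not in selected else np.inf
                            for j, t in enumerate(totals)]))
    selected.append(j_best)
    best_cost = np.minimum(best_cost, cost[:, j_best])

print("chosen candidate sites:", selected)
```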
Third, the tool is used to evaluate local 15-minute cities.
The implemented tool is designed to be used and evaluated by officials, planners, and researchers working on 15-minute cities. The tool can be used to identify areas that need more urban amenities and to deploy existing amenities in ways that enhance walkability. The tool can also be used to determine the feasibility of locating new facilities such as parks, community centers, and other public spaces. In addition, the tool is designed to be customizable to meet the environmental needs of different cities.
The development of a FOSS4G-based urban amenity distribution tool based on walkability measures can provide the following benefits. First, it provides an age- and facility-related data-driven approach to the placement of urban amenities, ensuring that amenities are located in areas that are easily accessible to citizens. Second, it provides a spatial structure that can promote the use of sustainable transportation modes such as walking, biking, and public transit. Third, it can encourage more inclusive urban development by ensuring that amenities are distributed in a more equitable manner.
In conclusion, the development of a FOSS4G-based urban amenity distribution tool can play an important role in realizing walkable living areas, the Korean take on the 15-minute city concept. This tool can measure and distribute urban amenities based on walkability, bicycle accessibility, and public transportation accessibility, providing a way to create healthier, more equitable living areas. Using the tool to generate a range of alternatives will allow planners to learn which walkable urban amenity configurations are desirable. For urban planners and practitioners, open-source tools make it easy to take data-driven action and to learn and innovate from what others have done. Transparency in the planning process allows citizens to understand the planning process, engage with planners, and be part of the planning process.

UBT E / N209 - Floor 3
16:00
16:00
30min
Impact of Geolocation Data on Usability in Augmented Reality: A Comparative User Test
Julien Mercier

In 2017, the Media Engineering Institute (MEI) and the Institute of Territorial Engineering (INSIT) developed a proof-of-concept location-based augmented reality (AR) application that enabled the visualization of geospatial data on biodiversity. A test with ten-year-old pupils confirmed the relevance of using this technology to support educational field trips. However, it also revealed usability challenges that needed to be addressed in a subsequent iteration. More precisely, three main issues were outlined: The system should allow non-expert users to create AR experiences using open geospatial data [2]; Users should be able to publish observations in AR rather than being restricted to a passive viewing role; The instability of the points of interest (POIs) causes usability problems such as a prolonged interaction time with the screen.
In an attempt to address the first two of these challenges, we designed and developed a cartographic authoring tool for the creation of location-based AR experiences powered by open web frameworks (A-frame, leaflet.js, vue.js, hapi.js…), following a user-centered methodology [1]. We also developed a minimalist library for the creation of WebXR location-based POIs in A-frame. The resulting application allows anyone without technological know-how to create AR learning experiences by importing/exporting open geospatial data and customizing the appearance of POIs by attaching media (3D files, pictures, sound…) to them. These can be location-triggered (visible/audible) according to different conditions based on distance thresholds set by the user. The environments can be shared publicly so that anyone may contribute, or set to visible but non-editable for visualization only. The application also features geolocation tracing and in-app event logging for analysis.

The third challenge disclosed by the proof-of-concept application was attributed to the inaccuracy of the available geolocation data, as evidenced by previous studies [3–5]. Indeed, geolocated POIs are anchored in the AR interface by computing their geographical coordinates relative to the user's estimated position (a coordinate-conversion sketch follows the list of questionnaires below). On mobile devices, GNSS accuracy typically lies between 1 m and 30 m. Due to its impact on anchoring, this lack of accuracy can have deleterious effects on usability. We wondered whether using more accurate data would lead to a better usability score. We thus designed a comparative user test (n = 54) to evaluate the application in combination with two different geolocation data types: while half of the participants used the BiodivAR application with data provided by the devices' embedded GNSS as a control group, an experimental group used the application combined with Ardusimple RTK kits. During the test, in-app events and geolocated traces were recorded by the application. 47 participants also agreed to wear an eye-tracking device that captured their gaze direction, in order to measure how long they interacted with the screen versus the natural environment. Directly after the test, participants answered an online survey containing a demographic questionnaire, an open question, and three different usability questionnaires:
System Usability Scale (SUS), for a generic evaluation of the system.
User Experience Questionnaire (UEQ), for a comprehensive measure of user experience in terms of attractiveness, efficiency, reliability, stimulation, and novelty.
Handheld Augmented Reality Usability Scale (HARUS), a mobile AR-specific questionnaire.
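For illustration, the anchoring step mentioned above can be sketched by converting the latitude/longitude difference between the user's estimated position and a POI into local east/north metres; this is a generic equirectangular approximation, not a description of the BiodivAR implementation.

```python
# Rough sketch: place a geolocated POI relative to the user's estimated
# position by converting the coordinate difference to local east/north metres
# (equirectangular approximation, adequate at the ~100 m scale of a field trip).
import math

EARTH_RADIUS_M = 6_371_000

def enu_offset(user_lat, user_lon, poi_lat, poi_lon):
    lat0 = math.radians(user_lat)
    d_lat = math.radians(poi_lat - user_lat)
    d_lon = math.radians(poi_lon - user_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(lat0)
    north = EARTH_RADIUS_M * d_lat
    return east, north  # metres to place the POI in the AR scene

print(enu_offset(46.5197, 6.6323, 46.5200, 6.6330))  # illustrative coordinates
```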

The in-app events and geolocated traces also allowed us to compute variables such as the total distance traveled, the time spent visualizing media, or how long users used the interactive 2D map for navigation while in AR mode. Some of these results are still undergoing thorough analysis so that the role of each of these independent variables (interaction time, total distance, number of POIs visited, etc.) on user-reported usability can be investigated by means of multiple linear regression. For example, encoding the eye-tracking data to measure interaction with the screen versus nature is particularly challenging and time-consuming; once complete, it will allow us to compare how much time users in each group interacted with the screen versus nature, and to further observe the impact of geolocation data on usability. Finally, thanks to unstructured feedback gathered through the open question, we shall be able to further improve the BiodivAR application before it is tested in the field, in the context of an educational field trip with pupils.

The collected data allowed us to obtain an overall evaluation of the system as well as more specific observations on the impact of the different geolocation data. While we expected the RTK group to give a better usability score, the exact opposite happened. We initially noticed that using the RTK kit caused more frequent crashes than usual, because it required an additional NTRIP client application to run in the background, and we therefore assumed that these crashes degraded usability. But when looking at the logged events, the RTK group actually suffered fewer crashes than the control group. It was by observing the shapes of the geolocated traces that we noticed the RTK group's traces were star-shaped, revealing numerous outlying points in the data. The GNSS control group's traces did not feature such outliers, which we found was due to an embedded filter. It turns out this filter cannot be applied when using RTK positioning systems, because such filters also obliterate measurement times, which are typically required by professionals. Using our system in combination with RTK kits made the initial positioning of the augmented objects more accurate, but it also brought a new source of jittering, which we presume resulted in the lower reported usability score.
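For illustration, a very simple form of such trace filtering could drop fixes that imply an implausible movement speed between consecutive positions; the sketch below uses illustrative thresholds and is not the filter embedded in the devices' GNSS stack.

```python
# Generic jump filter: discard positions implying a speed above a walking
# threshold between consecutive fixes. Thresholds are illustrative only.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_trace(fixes, max_speed=3.0):
    """fixes: list of (t_seconds, lat, lon); drops fixes implying speed > max_speed m/s."""
    kept = [fixes[0]]
    for t, lat, lon in fixes[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = max(t - t0, 1e-6)
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed:
            kept.append((t, lat, lon))
    return kept

trace = [(0, 46.5197, 6.6323), (1, 46.51972, 6.63232), (2, 46.6, 6.7), (3, 46.51974, 6.63234)]
print(len(filter_trace(trace)))  # the third fix (a ~10 km jump in 1 s) is dropped
```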

From the results of our comparative test, we draw the following conclusions: while we failed to improve the usability of our location-based AR system by combining it with RTK data, our test nevertheless demonstrated that the geolocation data source can have a significant negative impact on usability. This reinforced our intention to keep researching hardware and software solutions for efficiently improving geolocation data.

UBT E / N209 - Floor 3
16:30
16:30
30min
Enabling Knowledge Sharing By Managing Dependencies and Interoperability Between Interlinked Spatial Knowledge Graphs
Nathan McEachen

Knowledge sharing is increasingly being recognized as necessary to address societal, economic, environmental, and public health challenges. This often requires collaboration between federal, local, and tribal governments along with the private sector, nonprofit organizations, and institutions of higher education. To achieve this, there needs to be a move away from data-centric architectures toward knowledge-sharing architectures, such as a Geographic Knowledge Infrastructure (GKI), to support spatial knowledge-based systems and artificial intelligence efforts. Location and time are dimensions that bind information together. Data from multiple organizations need to be properly contextualized in both space and time to support geographically based planning, decision making, cooperation, and coordination.

The explosive uptake of ChatGPT seems to indicate that people will increasingly be getting information and generating content using chatbots. Examples of AI-driven chatbot technology providing misleading, harmful, biased, or inaccurate information due to a lack of access to information highlight the importance of making authoritative knowledge accessible, interoperable, and usable through machine-to-machine interfaces via GKIs to support AI efforts.

Spatial knowledge graphs (SKG) are a useful paradigm for facilitating knowledge sharing and collaboration in a machine-readable way. Collaboration involves building graphs with nodes and relationships from different entities that represent a source of truth, trusted geospatial information, and analytical resources to derive new and meaningful insights through knowledge inferencing by location or a network of related locations.

However, due to a lack of standardization for representing the same location and for managing dependencies between graphs, interoperability between independently developed SKGs that reference the same geographies is not automated. This results in a duplication of effort across a geospatial ecosystem to build custom transformations and pipelines to ensure references to geographic data from different sources are harmonized within a graph for the correct version and time period and that these references are properly maintained over time.

What is needed is a way to manage graph dependencies, or linking, between organizations in a more automated manner. References to geographic features (i.e., geo-objects) from graphs that are curated by external (and ideally authoritative) entities should come from formally published versions with the time period for which they are valid (i.e., the period of validity). As newer versions of SKGs are published for different periods of validity, updating dependencies between graphs should be controlled and automated.
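As a purely hypothetical illustration, a versioned, namespaced reference to an externally curated geo-object could look like the following; the names and fields are illustrative and do not describe the GeoPrism Registry data model.

```python
# Hypothetical sketch of a namespaced, versioned geo-object reference with an
# explicit period of validity; all names and fields are illustrative only.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class GeoObjectRef:
    namespace: str      # authority that curates the object, e.g. "gov.moh.la"
    object_type: str    # e.g. "HealthFacility"
    code: str           # stable identifier within the namespace
    version: str        # published release of the source graph
    valid_from: date    # period of validity of that release
    valid_to: date

ref = GeoObjectRef(
    namespace="gov.moh.la",
    object_type="HealthFacility",
    code="HF-00123",
    version="2023.1",
    valid_from=date(2023, 1, 1),
    valid_to=date(2023, 12, 31),
)
```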

It turns out that an approach for a similar kind of dependency management has been in mainstream use for decades in a related field. Software developers long ago abandoned the practice of manually managing code artifacts on filesystems and manually merging changes to code. Rather, they use a combination of namespacing for identity and reference management along with formally managing versioned releases in a code repository. Although there are nuanced differences between software code versioning and dependency management between SKGs, there are enough similarities to indicate distinct advantages to treating geospatial data as code for the purpose of managing graph dependencies to automate knowledge sharing.

We have been developing such an approach since 2018, with the core principles implemented in an open-source application called GeoPrism Registry (GPR), which utilizes spatial knowledge graphs to provide a single source of truth for managing geographic data over time across multiple organizations and information systems. It is used to host, manage, and regularly update hierarchies and geospatial data through time for geographic objects. GPR is being used by the Ministry of Health of Laos to manage interlinked dependencies between healthcare-related geo-objects and geopolitical entities. More recently it has been installed in Mozambique for use by the national statistics division (ADE) to meet their National Spatial Data Infrastructure (NSDI) objectives and facilitate cross-sectoral information collaboration using common geographies for the correct periods of time.

Currently, GPR is being considered by the US Federal Geographic Data Committee (FGDC) to help build a GKI for GeoPlatform.gov, which is mandated by the United States Geospatial Data Act of 2018 (GDA) to improve data sharing and cooperation between public and private entities to promote the public good in a number of sectors. US federal agencies are developing spatial knowledge graphs, but they are not interoperable through machine-to-machine interfaces with those from other agencies. We led a requirements, design, and scoping effort which revealed that a GKI architecture for GeoPlatform will, at a minimum, require the following machine-readable characteristics to enable knowledge interoperability using SKGs at scale.

Authoritative:
Copies of data always remain authoritative by preserving the identity of their source.

Temporal:
The period of validity should be specified in metadata as a moment in time (such as a date), a frequency (e.g., annually or quarterly), or an interval (year 2000 to 2005) in which data have not changed relative to when they were published.

Distributed:
Utilize the Data Mesh architecture pattern by giving organizations the ability to publish locally hosted graph assets. Other organizations can build fit-for-purpose graphs by pulling and merging only what is needed from authoritative sources.

Transitive:
Changes made to graphs should automatically propagate to the graphs that reference them, even if the dependency occurs via multiple layers of indirection (i.e., a dependency of a dependency).

Versioned:
Metadata should capture the published version.

Interoperable:
The semantic identity of data types, attributes, and relationships should be defined such that equivalency and identity can be established. This would include the use of namespaces, controlled vocabularies, taxonomies, ontologies, geo-object types, and graph edge types.

In this paper we will present the approach for implementing these GKI requirements and GeoPlatform.gov interoperability use cases using open-source software. This will include the Common Geo-Registry concept for managing the authoritative and interoperable requirements, the Data Mesh framework for making the solution distributed and transitive, and the spatial knowledge graph repository for managing temporal and versioned dependencies. We will also present the metamodel architecture used by GeoPrism Registry for managing graph dependencies, facilitating interoperability, publishing, and how it is currently being used as a graph repository.

UBT E / N209 - Floor 3
10:30
10:30
30min
Mobile mapping solutions for the update and management of traffic signs in a road cadastre free open-source GIS architecture
Federica Gaspari

Digitization and updating of road network databases represent a crucial topic for the good management of critical infrastructure by public administrations. Similarly to other European countries such as Cyprus (Christou et al., 2021), since 2001 Italian road-owning agencies have been required by the Ministry of Infrastructure and Transport to build and maintain a road cadastre, i.e., a mapping inventory of their road networks. Such an architecture should include georeferenced information about streets as well as all ancillary elements regulated by road regulations, ranging from safety and protection assets to traffic signs. In particular, due to the high frequency of new sign installations and replacements, traffic signs require a well-structured, flexible, and efficient workflow for collecting and manipulating georeferenced data.

In agreement with the official national requirements, in 2019 the Province of Piacenza adopted and implemented a digital cadastre with GIS and WebGIS functionalities built on top of free and open-source software, with PostgreSQL as the database management system and QGIS for the manipulation of geodata. Such a software infrastructure ensures flexibility of use as well as the possibility to expand its functionalities with other easy-to-use open-source applications within the same architecture (Gonzalez Alba et al., 2019, Gharbi & Haddadi, 2020). In this framework, this work illustrates a case study of a flexible and low-cost mapping methodology for documenting the current state of traffic signs. Indeed, mobile applications can replace the old procedure of documenting element installation on paper, which implied the risk of transcription errors as well as the loss or deterioration of the original survey document.

Before defining the required steps of the mobile mapping, it was crucial to understand how traffic signs are modelled in the adopted database model. Such elements are implemented through a one-to-many relationship between an entity representing the sign holder (parent table) and another for the signs themselves (child table). In this way, it is possible to collect and manage information about each sign individually (both main and supplementary signs), always linked to its support pole.

Together with the person responsible for the road cadastre, an analysis was conducted to understand the specific needs for the application and the types of users involved in the in-situ survey process. This phase resulted in the choice of two possible open-source solutions to be tested and compared in terms of compatibility, usability by users with different technical backgrounds, integration with the existing infrastructure, and possibility of customisation: QField, because of its native compatibility with QGIS libraries, and ODK Collect, thanks to its simplified graphical user interface (GUI) that resembles commonly used data collection forms without a visible GIS GUI.

Differences between the two applications were evaluated across the entire workflow. For instance, QField directly inherits the original QGIS attribute table, while ODK Collect requires each attribute of the form to be defined. Peculiarities in implementing one-to-many relationships and widget formats were identified too, with the aim of understanding the reproducibility of both procedures. Once the form design was finalised for both applications, a guided field survey was conducted in order to train new users and to test the usability of the mobile mapping solutions. For this purpose, a series of test sites was chosen, identifying roads to be surveyed with different features or conditions. A diverse sample of test users was involved in the data collection activity, ranging from people with no previous experience in geospatial technologies to GIS technicians.

Finally, the data collected with both applications were reviewed in the QGIS environment in a validation phase aimed at identifying differences between the datasets, their completeness, their positional accuracy, and their coherence with a ground truth represented by photos of the corresponding traffic signs taken in the field with mobile devices. First, the validation consisted of checking whether the mapped elements were located within buffers (of 5, 25 and 50 meters) calculated along the surveyed streets and then evaluating the coherence between the street code entered in the element field and that of the road in whose buffer the sign falls. A similar approach was then adopted to compare the municipality value associated with each sign against the administrative boundary within which it falls. A semantic validation of the traffic sign type documented with the mobile mapping was conducted by comparing values with what was depicted in the photos taken in the field. The entire validation routine was automated as much as possible with Python scripts using the PyQGIS library. All the validation scripts, together with a sample dataset, will be included in a GitHub repository in order to make them openly reusable and adaptable to other specific project needs. The synchronization of the collected data with the original main database was also analysed, comparing different approaches involving plugins or automatic scripts.
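To illustrate the buffer-based checks described above, the following sketch reproduces the same logic with GeoPandas rather than PyQGIS; the file names, column names, and projected CRS are illustrative assumptions.

```python
# Illustrative buffer validation: check whether collected signs fall within
# road buffers and whether the recorded road code matches the containing road.
import geopandas as gpd

signs = gpd.read_file("collected_signs.gpkg").to_crs(epsg=32632)   # illustrative paths/CRS
roads = gpd.read_file("road_cadastre.gpkg").to_crs(epsg=32632)

for radius in (5, 25, 50):
    buffers = roads[["road_code", "geometry"]].copy()
    buffers["geometry"] = buffers.geometry.buffer(radius)
    joined = gpd.sjoin(signs, buffers, predicate="within", how="left")
    within = joined["index_right"].notna()
    # A sign is coherent if the road code typed in the field survey matches
    # the code of the road whose buffer contains it.
    coherent = joined["road_code_left"] == joined["road_code_right"]
    print(f"{radius} m buffer: {within.mean():.1%} inside, {coherent.mean():.1%} coherent")
```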

In order to evaluate user experiences with the different mobile applications, a LimeSurvey feedback form was provided to the users who tested the tools in the field. The form was designed to collect insights on the different steps of the workflow (form design, data collection, and post-processing), tracking and evaluating possible differences between users with different backgrounds, including those with no previous knowledge of geospatial concepts. This highlighted the potential and the issues linked to the adoption of QField or ODK Collect for traffic sign mapping.

This work aims to present a case study on the adoption of a mobile mapping solution in the field of public administration, understanding the potential and limitations of the possible approaches, also in terms of introducing new users to FOSS4G applications. To ensure transparency, the entire workflow is being documented in a dedicated GitHub repository with informative guides, a QGIS demo project, ODK form definition files, and all the code adopted for validation and synchronization purposes.

UBT E / N209 - Floor 3
11:00
11:00
30min
COMTiles: a case study of a cloud optimized tile archive format for deploying planet-scale tilesets in the cloud
Markus Tremmel

Motivation

The state-of-the-art container formats for managing map tiles are the Mapbox MBTiles specification and the OGC GeoPackage standard. Since both formats are based on an SQLite database, they are mainly designed for block-oriented, POSIX-conformant file system access. This design approach makes these file formats inefficient to use in a cloud-native environment, especially in combination with large tilesets. To deploy an MBTiles database in the cloud, the tiles must be extracted and either uploaded individually to an object storage or imported into a cloud database and accessed by an additional dedicated tile server. The main disadvantages of both options are the complex deployment workflow and the expensive hosting costs. The Cloud Optimized GeoTIFF (COG) format already solves this problem for providing large satellite datasets in the cloud, creating a new category of so-called cloud-optimized data formats. Based on the concepts of this type of format, geospatial data can be deployed as a single file on cheap and scalable cloud object storage like AWS S3 and accessed directly from a browser without the need for a dedicated backend. COMTiles adapts and extends this approach to provide a streamable and read-optimized single-file archive format for storing raster and vector tilesets at planet scale in the cloud.

Approach

The basic concept of the COMTiles format is to create an additional streamable index which stores the offset and size of the actual map tiles in the archive as so-called index entries. In combination with a metadata document, the index can be used to build a request for a specific map tile in an archive stored on cloud object storage, based on HTTP range requests. The metadata are based on the OGC "Two Dimensional Tile Matrix Set" specification, which enables the use of different tile coordinate systems. To minimize the amount of transferred data and to optimize decoding performance, a combination of two different approaches for the index layout is used. As lower zoom levels are accessed more frequently and the number of tiles is manageable up to a certain zoom level (0 to 7 for a planet-scale tileset), all index entries for these levels are stored in a root pyramid and retrieved at once when the map is initially loaded. To minimize its size, the root pyramid is compressed with a modified version of the RLE V1 encoding of the ORC file format. For lazy loading portions of the index at higher zoom levels, index fragments are used. To enable random access to the index without any additional requests, the index entries are bit-packed per fragment with a uniform size. Since the data are only lightly compressed, the index entries can also be stream-decoded and processed before the full fragment is loaded. To further minimize the number of HTTP requests, the queries for the index fragments and tiles can be batched, as both are ordered on a space-filling curve such as the Hilbert curve.
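To make the access pattern concrete, the following sketch fetches a single tile from the archive with one HTTP range request once an index entry has provided its byte offset and size; the URL and offsets are illustrative, and a real client would first read the metadata and index fragments in the same way.

```python
# Sketch: fetch one tile from a single-file archive on object storage via an
# HTTP range request. The URL and byte offsets are illustrative placeholders.
import requests

ARCHIVE_URL = "https://example-bucket.s3.amazonaws.com/planet.comt"  # illustrative

def fetch_tile(offset: int, size: int) -> bytes:
    headers = {"Range": f"bytes={offset}-{offset + size - 1}"}
    resp = requests.get(ARCHIVE_URL, headers=headers, timeout=30)
    resp.raise_for_status()  # expect 206 Partial Content
    return resp.content

tile = fetch_tile(offset=1_234_567, size=48_213)  # values would come from an index entry
```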

Results

One advantage that became obvious during the evaluation of COMTiles is the simplified workflow for deploying large tilesets. As only a single file must be uploaded to a cloud storage and no dedicated tile backend has to be set up, COMTiles can also be deployed by non-GIS experts in a quick and easy way. During the evaluation, the main hypothesis could be confirmed: COMTiles can be hosted on a cloud storage at only a fraction of the cost of a dedicated tile backend or an individual tile deployment. To determine the actual hosting costs, a planet-scale OSM tileset of 90 gigabytes was deployed on a Cloudflare R2 storage and accessed with 35 million tile requests. With the Cloudflare pricing plans at the time of writing, a cost of only $1.35 per month was incurred for the specified deployment. In this context, the tile batching approach turned out to be an additional effective way of reducing the number of tile requests and therefore the costs. For example, when displaying a map in fullscreen mode, the number of requests could be reduced by up to 80% on an HD display and up to 90% on a UHD display. In terms of user experience, test users rated the additional latency for the index requests as negligible, especially when an additional CDN was used. COMTiles was also tested against PMTiles, another cloud-optimized tile archive solution, using two different map navigation patterns to measure differences in the number of requests, the amount of data transferred, and decoding performance. COMTiles decoded portions of the index about 63 times faster than PMTiles, reducing the processing time from hundreds of milliseconds to a few milliseconds in a single user session. COMTiles also fetched about 3 times less data on average from the cloud storage. In addition, the random-access design of the COMTiles index requires one fewer initial round trip to the server, resulting in a faster initial map load. The main advantage of PMTiles is an about 10 times smaller planet-scale index (~91 MB versus ~880 MB). However, since cloud storage is cheap, the additional cost of the difference in index size proved to be negligible.

Conclusions

The evaluation showed that COMTiles can simplify the workflow for deploying large tilesets and significantly reduce storage costs while preserving almost the same user experience as a dedicated tile backend. The author is therefore confident that the concepts of the COMTiles format will play an essential role in the future for managing and deploying map tiles in a cloud-native environment.

Sources

The evaluation steps and further improvements of the existing COMTiles format which form the basis of this paper are available under https://github.com/mactrem/com-tiles-evaluation. The derived improvements will be merged into the main repository under https://github.com/mactrem/com-tiles.

UBT E / N209 - Floor 3
11:30
11:30
30min
Motivating environmental citizen scientists and open data acquisition on openSenseMap with Open Badges
Frederick Bruch, Mario Pesch

The openSenseMap(1) is an open-source(2) citizen cyber-science platform that facilitates environmental monitoring by allowing individuals to measure and publish sensor data. The platform is designed to create a community-driven network of sensors to monitor various environmental factors, such as air and water quality, and much more. A significant advantage of the platform is that it operates on open data principles, whereby all sensor data are accessible to the public(3). This openness encourages collaboration and facilitates innovation, which has led to numerous applications in environmental monitoring. Despite its success, the platform still faces challenges regarding user engagement and motivation, necessitating the incorporation of gamification strategies to enhance participation.

The Christmas Bird Count, started by ornithologist Frank Chapman in 1900, is one of the earliest and longest-running citizen science projects in the world. Today, it involves thousands of birdwatchers who count birds over a 24-hour period in mid-December. The data collected during the Christmas Bird Count provide scientists with valuable information about bird populations, migration patterns, and other important ecological trends. This project set the stage for the growth of citizen science initiatives, in which people participate in scientific research.
Recently, there has been an increase in the number of citizen (cyber-)science projects, which leverage the power of the internet and digital technology to involve people in scientific research. These projects have had a significant impact on society, contributing to advancements in fields such as astronomy, ecology, and health. While these projects can be a lot of fun, the tasks for participants can sometimes be monotonous, and they can lose the motivation to continue being part of the project. Therefore, project organizers need to keep participants engaged. This is where gamification comes into play. Applying game elements to anything that is not a game is known as gamification. Adding elements of competition and rewards can help people stay engaged in a project and continue making contributions (Haklay, 2012). This can be especially helpful for long-term projects that require continual effort from participants.

Digital badges can be earned in a variety of settings and are a recognized symbol of skill or accomplishment. Although badges are a common gamification component, they are typically only usable in closed environments. The possibility of awarding badges for voluntarily participating in scientific research can increase participant motivation. The ability to display, share, and verify badges alongside skills and credentials from other environments has changed the game of digital credentials. This technology is called Open Badges.
This paper focuses on the motivational impact Open Badges can have on citizen science in the context of the openSenseMap platform. Users of the openSenseMap platform were surveyed for this study. Based on the results, a prototype was implemented, combining an open badge platform with the existing openSenseMap platform. The prototype added an open badge component to the platform, allowing users to earn badges for various achievements, such as contributing a certain number of measurements or completing a specific task.
The badges were designed to be displayed on the users' profiles and could be shared on social media or other online platforms. This feature enabled participants to showcase their contributions and achievements, increasing their motivation to continue participating in the project. The survey results indicated that participants found that the open badge component made the citizen science platform more interesting, which may suggest that open badges have the potential to increase motivation and engagement in citizen science projects.
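As a purely hypothetical illustration of the kind of rule the prototype could evaluate when issuing a badge, consider the following sketch; the badge names, thresholds, and the way the measurement count is obtained are illustrative and do not describe the openSenseMap or mybadges APIs.

```python
# Hypothetical badge rules based on a user's total number of contributed
# measurements; names and thresholds are illustrative placeholders.
BADGE_RULES = [
    ("First Measurement", 1),
    ("Weather Watcher", 1_000),
    ("Sensor Veteran", 100_000),
]

def earned_badges(measurement_count: int) -> list[str]:
    """Return the badge names whose threshold the user has reached."""
    return [name for name, threshold in BADGE_RULES if measurement_count >= threshold]

print(earned_badges(1_500))  # ['First Measurement', 'Weather Watcher']
```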
Furthermore, it is important to note that the open badge platform (called mybadges(4)) used in this project is open source(5), aligning with the spirit of collaboration and transparency in citizen science. By leveraging the power of open badges and open-source technology, this project has the potential to drive significant positive change in the field of cyber-science and promote reproducibility in scientific research.
In addition to its potential impact on citizen cyber-science, open badges can also be adapted to the open (geo)education context. Open Badges can provide learners with an opportunity to showcase their knowledge and skills in a tangible and transferable way (Halavais, 2012). By earning badges for completing educational tasks, learners can build a portfolio of evidence that can be used to demonstrate their achievements and credentials. This can be particularly valuable in fields such as geospatial science, where there is a growing demand for individuals with specific technical skills and knowledge. The use of Open Badges in open (geo)education can enhance the learning experience and increase learner motivation, leading to improved educational outcomes and better-equipped professionals in the field.
This paper explores the use of Open Badges, a gamification component, to enhance engagement and motivation in citizen cyber-science projects. The proposed approach uses an open-source citizen cyber-science platform, the openSenseMap, to collect and publish sensor data, making them accessible to the public. The incorporation of Open Badges can incentivize participants to contribute to the project continually. The results of our survey indicated that participants found the open badge component to be an engaging and motivating feature, which suggests that Open Badges have the potential to increase engagement in citizen science projects. This paper's contribution aligns with the FOSS4G academic track audience's interest in exploring innovative approaches to using open-source technology to address environmental and social challenges. Therefore, this paper's findings and implementation approach could be of significant interest to the FOSS4G academic community.

1 - https://opensensemap.org
2 - https://github.com/sensebox/openSenseMap-API
3 - https://docs.opensensemap.org
4 - https://mybadges.org/public/start
5 - https://github.com/myBadges-org/badgr-server

UBT E / N209 - Floor 3
12:00
12:00
30min
SafoMeter - Assessing Safety in Public Spaces: The urban area of Prishtina
Gresa Neziri

As cities expose people to increasing threats, urban planning perspectives on safety remain on the periphery of urban design and policy. Public spaces evoke different emotions in individuals, and the feeling of safety is a primary emotion affecting their well-being and behavior (Pánek, Pászto, & Marek, 2017). For this reason, an urban planning strategy should pay special attention to providing a safe environment, especially in public spaces.

Negotiating the use of public spaces poses a more significant challenge for marginalized groups, especially for women in every social group, for whom sexual harassment and other forms of gender-based violence in public areas are a daily occurrence in every city worldwide (UN Women, 2021). Nonetheless, there is very limited data showcasing the level of safety of site-specific public spaces, especially for cities in developing countries like Kosovo.

In this regard, aiming to contribute to the effort to develop a methodology for assessing site-specific safety in public spaces, we have developed SafoMeter. SafoMeter is a methodological framework for assessing safety in public spaces and its spatial distribution. SafoMeter adheres to a human-centered approach that analyzes public spaces by looking closely at people's everyday experiences. Its framework is built on mediating indicators that assess both objective safety and subjective perceptions of safety.

The objective indicators for measuring safety fall into two broad categories: urban fabric and accessibility. Research on the relationship between the built environment and perceived safety highlights several physical components attributed to feelings of safety (UN-Habitat, 2020). In addition, spatial criteria/features used in previous research include urban structure and accessibility as two broad categories of spatial elements that positively or negatively affect people's sense of safety (Wojnarowska, 2016).

The subjective indicators for measuring emotional safety fall into the categories of threats and comfort. Contrary to conventional methods, the framework highlights the necessity for collecting data from the individual evaluation of perceived safety. Subjective evaluations of the users of public spaces are considered very important due to the low correlation between objective safety and subjective assessment of one's well-being, as shown in previous research (Von Wirth, Grêt-Regamey & Stauffacher, 2014).

The pilot location used for applying SafoMeter’s methodology to measure safety in public spaces was the urban area of Prishtina. The official population of the Municipality of Prishtina is about 200,000 inhabitants, of which almost 150,000 live in the city area. Being the capital city of Kosovo, the Municipality of Prishtina is the central city of significant political, economic, and social developments in the country.

The data for each indicator of the SafoMeter methodology were collected over a period of three months (July, August, and September 2022) at different hours of the day. The Mergin Maps mobile application was used to collect the field data, recording both objective and subjective indicators. The data collection project was developed in QGIS, version 3.22.12 LTR, and included 8 layers, one for each indicator. A hexagonal grid of 0.86 ha cells was used to aggregate the data into a Safety Index. The Safety Index results were then calculated and visualized in QGIS. A particular focus was placed on visualizing unsafe hotspots in the city and showcasing their spatial distribution to inform citizens and decision-makers about spaces that need more urgent intervention.
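To illustrate the aggregation step, the following sketch joins indicator observations to the hexagonal grid and averages them into a cell-level index with GeoPandas; the file names, column names, and exact index formula are illustrative assumptions.

```python
# Illustrative aggregation of indicator observations into a hexagonal grid.
import geopandas as gpd

hexes = gpd.read_file("hex_grid_086ha.gpkg")           # 0.86 ha hexagons (illustrative path)
obs = gpd.read_file("safometer_observations.gpkg")     # one row per indicator record
obs = obs.to_crs(hexes.crs)

# Assign each observation to the hexagon that contains it
joined = gpd.sjoin(obs, hexes[["hex_id", "geometry"]], predicate="within")

# Assumed: each observation carries a 0-10 score; the cell index is the mean
# over all indicators and visits recorded in that hexagon.
index = (joined.groupby("hex_id")["indicator_score"]
         .mean()
         .rename("safety_index")
         .reset_index())
hexes = hexes.merge(index, on="hex_id", how="left")
hexes.to_file("safety_index.gpkg", driver="GPKG")
```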

On the Safety Index scale from 0 (least safe) to 10 (most safe), all spaces evaluated in the study area score around or below the midpoint, with a maximum value of 5.57. It can therefore be concluded that the indicators measured in Prishtina point to an urgent need for intervention, both in physical infrastructure and in reducing the threats posed by the human factor. Additionally, the few areas considered safer within the city are not connected to each other, so users cannot move safely from one place to another. Parks and green spaces, which are scarce in Prishtina, turn out to be among the main hotspots with the lowest scores.

Applying the SafoMeter methodology generated valuable insights for assessing safety in the public spaces of Prishtina. The results of the pilot study reveal an urgent need for intervention. These findings suggest that policymakers and urban planners should prioritize the creation of safer public spaces in Prishtina and other cities facing similar challenges.

At the same time, a systematic safety assessment requires year-round data collection processes to design effective area-based interventions and policies. Therefore, a more detailed data collection process should be established. In addition, this process should aim to increase the number of citizens participating in the evaluation of the safety indicators. All the data collected via the SafoMeter framework will be published on a web-based platform where different user groups can use them. Finally, with SafoMeter we aimed to provide a tool that can be replicated for further studies by other users and shared according to the principles of open-source knowledge.

UBT E / N209 - Floor 3
13:30
13:30
30min
GEOSPATIAL BIG DATA ANALYTICS FOR SUSTAINABLE SMART CITIES
Muhammed Oguzhan Mete

Growing urbanization causes environmental problems such as vast amounts of carbon emissions and pollution all over the world.
Smart Infrastructure and Smart Environment are two significant components of the smart city paradigm that can create opportunities for ensuring energy conservation, preventing ecological degradation, and using renewable energy sources. United Nations Sustainable Development Goals (SDGs) such as "Sustainable Cities and Communities", "Affordable and Clean Energy", "Industry, Innovation and Infrastructure", and "Climate Action" can be achieved by implementing the smart city concept efficiently. Since a great portion of data contains location information, geospatial intelligence is a key technology for sustainable smart cities. A holistic framework is needed for the smart governance of cities, utilizing key technological drivers such as big data, Geographic Information Systems (GIS), cloud computing, and the Internet of Things (IoT). Geospatial big data applications offer predictive data science tools such as grid computing and parallel computing for efficient and fast processing to build a sustainable smart city ecosystem.

Handling geospatial big data for sustainable smart cities is crucial since smart city services rely heavily on location-based data. Effective management of big data in the storage, visualization, analytics, and analysis stages can foster the green building, green energy, and net zero targets of countries. The geospatial data science ecosystem offers many powerful open-source software tools. According to the vision of PANGEO, a community of scientists and software developers working on big data software tools and customized environments, parallel computing systems have the ability to scale up analysis on geospatial big data platforms, which is key for ocean, atmosphere, land, and climate applications. Those systems allow users to deploy clusters of compute nodes for big data processing. In the application phase of this study, the Pandas, GeoPandas, Dask, Dask-GeoPandas, and Apache Sedona libraries are used in a Python Jupyter Notebook environment. In this context, we carried out a performance comparison of two cluster computing systems: Dask-GeoPandas and Apache Sedona. We also investigated the performance of the novel GeoParquet geospatial data format together with other well-known formats.

There is a common vision, with policy recommendations and industry-wide actions, to achieve the 2050 net zero carbon emission scenario in the United Kingdom. The energy efficiency of the English housing stock has continued to increase over the last decade. However, there is a need for systematic action plans at parcel scale to deliver on the targets. In this study, open data sources are used, such as the Energy Performance Certificates (EPC) data of England and Wales, Ordnance Survey (OS) Open Unique Property Reference Number (UPRN), and OS buildings (OS Open Map), for analysing the energy efficiency of domestic buildings. Firstly, the EPC data are downloaded from the Department for Levelling Up, Housing & Communities data service in Comma Separated Value (CSV) format, the UPRN data from the OS Open Hub in GeoPackage (GPKG) format, and the buildings data from OS in GPKG format. After saving each file in GeoParquet format, the EPC data and the UPRN point vector data are joined on the unique UPRN id. Then the attributes of each UPRN point are appended to the corresponding building polygon by a spatial join operation. Read, write, and spatial join operations are conducted on both Dask-GeoPandas and Apache Sedona in order to compare the performance of the two big spatial data frameworks.
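To illustrate the join workflow, the following sketch reproduces the attribute and spatial joins with (Geo)Pandas; the file paths and column names (e.g., "UPRN", "CURRENT_ENERGY_RATING") are illustrative assumptions about the open datasets, and the same operations can be expressed with Dask-GeoPandas or Apache Sedona for cluster execution.

```python
# Illustrative single-machine version of the workflow: attribute join on UPRN,
# then point-in-polygon spatial join onto building footprints.
import pandas as pd
import geopandas as gpd

epc = pd.read_csv("certificates.csv")                   # EPC records (illustrative path)
uprn = gpd.read_file("osopenuprn.gpkg")                 # UPRN points
buildings = gpd.read_file("os_openmap_buildings.gpkg")  # building polygons

# Store intermediate layers as GeoParquet for faster subsequent reads
uprn.to_parquet("uprn.parquet")
buildings.to_parquet("buildings.parquet")

# Attribute join on the unique UPRN id, then spatial join (assumes both OS
# layers share the same projected CRS)
epc_points = uprn.merge(epc, on="UPRN", how="inner")
rated_buildings = gpd.sjoin(buildings, epc_points, predicate="contains", how="inner")

print(rated_buildings.groupby("CURRENT_ENERGY_RATING").size())
```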

Cluster computing systems enable much faster data handling compared with traditional approaches. To compare the performance of the frameworks, local computing hardware (11th Gen Intel Core i7-11800H 2.30 GHz CPU, 64 GB 3200 MHz DDR4 RAM) was used. According to the results, Dask-GeoPandas and Apache Sedona outperformed GeoPandas in the read, write, and spatial join operations, with Apache Sedona performing better overall during the tests. On the other hand, the GeoParquet file format was much faster and smaller in size than the GPKG format. After the spatial join operation, the energy performance attributes are included in the building data. In order to reveal regional energy efficiency patterns, SQL statements are used to filter the data according to the energy ratings. The query result is visualized using Datashader, which provides highly optimized rendering with distributed systems.

This study answers the question "Can geospatial big data analytics tools foster sustainable smart cities?". The volume, value, variety, velocity, and veracity of big data require different approaches than traditional data handling procedures in order to reveal patterns, trends, and relationships. Using spatial cluster computing systems for large-scale data enables effective urban management in the context of smart cities. On the other hand, energy policies and action plans such as decarbonization and net zero targets can be achieved by sustainable smart cities supported by geospatial big data instruments. The study aims to reveal the potential of big data analytics in the establishment of smart infrastructure and smart buildings using large-scale geospatial datasets on state-of-the-art cluster computing systems. In future studies, larger spatial datasets like Planet OSM can be used on cloud-native platforms to test the capabilities of geospatial big data tools.

UBT E / N209 - Floor 3
14:00
14:00
30min
The Role of 3D City Model Data as an Open Digital Commons: A Case Study of Openness in Japan's Digital Twin "Project PLATEAU”
Toshikazu Seto

This paper aims to clarify the state of development of highly accurate and open 3D city model data and its usage methods, which started in Japan in 2020, from two aspects: quantitative geospatial analysis using publicly available data, and qualitative evaluation analysis of 40 use cases using the data.

As a background to this study, digital twins, which are virtual replicas of the physical urban built environment (Shahat et al., 2021), are gaining global attention with the development of geospatial information technology to understand current conditions and plan future scenarios in cities (Lei et al., 2022). This trend can be applied in areas related to a wide range of urban issues, such as urban development, disaster prevention, and environmental and energy simulation, and has the potential to be used for urban planning through an intuitive approach via various GIS tools. On the other hand, the geospatial information required by the digital twin also needs to be accompanied by three-dimensional shape information and many attribute information of building units. Data development and related research using CityGML (Kolbe et al., 2021), a representative standard specification, has mostly been carried out in European and US cities, and there have been few efforts in Asia (https://github.com/OloOcki/awesome-citygml).

In Japan, urban planning has mainly been carried out using analogue methods such as paper maps and counter services. However, as citizens' lifestyles and socio-economic systems are changing drastically due to the high interest in smart cities and the spread of COVID-19 infection, urban policies using digital technology, such as disaster prevention and urban development, are becoming increasingly important, and their digital transformation has become an urgent issue. "Project PLATEAU (https://www.mlit.go.jp/plateau/)" is a project initiated in 2020 under the leadership of the Ministry of Land, Infrastructure, Transport and Tourism (MLIT) to develop high-precision, 1:2500-level 3D city models in CityGML format in a unified manner, to convert them into open data formats published via a CKAN data portal (CityGML, 3D Tiles, GeoJSON, MVT, ESRI Shapefile), to develop an open-source data viewer, and to explore use cases.

This study details the history of the "Project PLATEAU" initiative and discusses the relationship between openness and urban data commons. Many of the data specifications, converters and online viewers are closely related to FOSS4G. Next, data for 126 cities in Japan (about 19,000 square kilometers) developed as open data over a three-year period are regionally aggregated and then quantitatively compared with OSM building data in Japan. Trends such as coverage rates between cities and micro-regional analysis within Tokyo are then attempted. To analyze a large amount of data, this part was carried out using data converted to FlatGeobuf format.
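To illustrate the kind of comparison performed, the following sketch counts PLATEAU and OSM buildings per administrative district from FlatGeobuf files with GeoPandas; the file and column names are illustrative assumptions.

```python
# Illustrative per-district comparison of building counts from two FlatGeobuf
# datasets; paths and the "district_code" column are placeholders.
import geopandas as gpd

plateau = gpd.read_file("plateau_buildings_lod1.fgb")
osm = gpd.read_file("osm_buildings_japan.fgb")
districts = gpd.read_file("admin_districts.fgb")[["district_code", "geometry"]]

def count_per_district(buildings):
    # Use a representative point per footprint to avoid double counting
    pts = gpd.GeoDataFrame(geometry=buildings.representative_point(), crs=buildings.crs)
    joined = gpd.sjoin(pts, districts, predicate="within")
    return joined.groupby("district_code").size()

coverage = (count_per_district(osm) / count_per_district(plateau)).rename("osm_vs_plateau")
print(coverage.describe())
```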

Some of the results of the data preparation analysis are as follows: The basic analysis of the cities covered by PLATEAU showed that the total number of buildings in LOD1 was about 15.7 million, with a population coverage of about 38.4%. These cities have shown an increasing trend in population over the last five years (an average of about +10,000 for the 126 cities). By comparison, the total number of OSM buildings in the country is about 12.7 million, generally widely distributed across the country's 1903 administrative districts (about 38,000 square kilometers). Therefore, only the cities maintained by PLATEAU provide data with a higher level of detail than OSM. However, the detailed LOD2 building data with roof shape is limited to about 480,000 buildings (about 300 square kilometers in 97 cities nationwide), which are high-rise buildings and landmarks in large cities.

To identify more micro trends, we compared the accuracy of the building data for central Tokyo, which has the largest number of units in both datasets, in 2020, the year the PLATEAU data was created. The number of units in each building dataset is OSM (726,685 units) and PLATEAU (1,768,868 units). When PLATEAU is used as the base data, the coverage of OSM is about 40%. On the other hand, of the 3190 city blocks in central Tokyo, 502 (about 15.7%) were identified as having more OSM buildings than PLATEAU. As a factor contributing to this discrepancy, a historical analysis of the timestamps and versions of the building data (about 80,000 units) that exist only in OSM revealed that most of them were created more than two years before the PLATEAU data and have never been updated. Therefore, even in areas where OSM data are already widely distributed, if only data older than 2020 are maintained there, the PLATEAU data should be updated regularly to keep the dataset fresh.

In summary, open 3D city model data for cities of various sizes have been released in Japan, and they are highly accurate and complementary to OSM data. In addition, these data have begun to be used in administrative practice, and a total of 44 applications in new areas such as citizen participation and entertainment (especially services using XR) have been identified. The evaluation of these exploitation methods is explained in the paper; the cases related to smart cities and disaster prevention are particularly striking. The issues to be addressed in these efforts are extending the scope of maintenance nationwide, the organic merging with open data such as OSM, and further GIS education in the field of urban planning. Finally, as data contributing to the reproducibility of this study, the data sources used in the analysis are themselves open data and thus readily available. We therefore plan to provide a download list of each data source and GIS data summarizing the tabulation results as open data on GitHub.

UBT E / N209 - Floor 3
14:30
14:30
30min
An interoperable Digital Twin to simulate spatio-temporal photovoltaic power output and grid congestion at neighbourhood and city levels in Luxembourg
Ulrich Leopold

Background

Cities are home to 72% of the population in Europe and account for 70% of the energy consumption. Being particularly vulnerable to climate change impacts, urban areas play a key role in carbon mitigation and energy transition. It is, therefore, of particular importance to increase renewable energy production for urban areas and cities.

Cities urgently require information about their potential for renewable energy production to target ultra-sustainable policies. Luxembourg has set very ambitious goals with its Plan National Intégré Énergie Climat (PNEC). It describes policies and measures to achieve ambitious national targets for reducing greenhouse gas emissions (-55%) as well as pushing renewable energy production (+25%) and energy efficiency (+40-44%) by 2030.

Public authorities often lack the expertise in integrated assessment and the relevant simulation tools to support scientific, evidence-based decisions about energy strategies, enhance interaction with stakeholders and accelerate the energy transition. The main output of this study is a demonstration of the role that interoperable geographical digital twins, based on free and open-source geospatial software technologies, can play in the simulation, monitoring and management of current and near-future renewable-based energy systems.

Approach and Concept

The scope of the presented work is to demonstrate the role of a 3D geographical urban digital twin in the context of a high penetration and optimised installation of PV and the impact of its power generation on the grid. Free and open-source software technologies form the basis of a web platform that implements a fully interoperable geographical digital twin built on open data and open OGC standards. This allows open 3D CityGML data, simulation algorithms for renewable-energy potentials and the energy grid system to be integrated into one interoperable technical architecture.

The objective of this study is to simulate the potential for building-integrated and building-attached solar photovoltaic (PV) electricity generation in use-case cities, and later to scale up the results to the national level. The approach taken involves several key steps:

  1. Estimation of electricity consumption of the building stock at local level, in order to understand the demand for electricity and the potential for PV generation.

  2. Simulation of the electricity generation potential of building-integrated and building-attached PV systems, considering factors such as rooftop and facade orientation and shading effects (a minimal yield formula is sketched after this list).

  3. Development and analysis of scenarios for different PV technologies, including consideration of techno-economic parameters such as feed-in-tariffs, lifetime of installation, efficiency, and power consumption.

  4. Selection of optimal locations for PV placement across the city, based on a combination of rooftop and facade suitability, electricity demand, and electricity grid nodes.

  5. Implementation of all steps in an interoperable web-based decision support platform providing advanced simulation and assessment tools using high-resolution open building information.
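To make the core of step 2 concrete, the following minimal sketch shows a simple annual yield estimate for a single roof or facade segment once the shaded irradiation has been simulated; it is an illustration under assumed parameter values, not the iGuess®-based implementation used in the platform.
```python
# Illustrative sketch only (not the iGuess(R) implementation): a simple annual
# yield estimate for a single roof surface, once the shaded global irradiation
# has been simulated. All parameter values are assumptions.
def annual_pv_yield_kwh(irradiation_kwh_m2, usable_area_m2,
                        module_efficiency=0.20, performance_ratio=0.8):
    """Annual PV output E = G * A * eta * PR for one rooftop or facade segment."""
    return irradiation_kwh_m2 * usable_area_m2 * module_efficiency * performance_ratio

# Example: a 40 m2 roof segment receiving 1100 kWh/m2/year after 3D shading.
print(annual_pv_yield_kwh(1100, 40))  # ~7040 kWh/year
```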

Results

Geospatial software technologies and 3D and 4D algorithms form the core of the platform (based on iGuess®), enabling the planning of PV electricity generation from the local to the national scale. Global solar irradiation is simulated for each roof-top and façade at very high resolution, taking into account 3D shading effects of the surroundings in the urban environment. Scenarios for different PV technologies, feed-in tariffs, cost efficiencies and amounts of PV installations are computed to show the impacts of spatio-temporally varying PV generation. This simulates the large increase in PV installations required to accelerate the development of sustainable energy and climate action plans (SECAPs) for all municipalities in Luxembourg and the nation as a whole.

The developed platform serves multiple beneficiaries, e.g. municipalities and urban planners, supporting realistic 3D-based urban energy planning. Citizens and energy communities can help shape their city and get access to high-resolution information. The platform provides a tool for estimating short-, mid- and long-term PV power generation at high resolution across entire neighbourhoods and districts, generating time-series data.

Furthermore, we have implemented tools for identifying cost-efficient PV placement and integration on building roof-tops and facades, to test the different scenarios and allow interactive selection of optimal PV locations across the study area.

Conclusions

This paper demonstrates the importance of geographical digital twins as the core platform for the current energy transition from fossil fuels to renewables. An interoperable geographical urban digital twin, as proposed here, provides the flexibility necessary to simulate and test scenarios for rapid, integrated urban planning under climate change. Based on open source, open standards, open APIs and open data, simulation and assessment methods and tools can be seamlessly integrated to provide a 3D real-world environment in which to assess and develop energy transition approaches. Different stakeholders, such as citizens, municipalities and businesses, can act and be stimulated to enable a faster transition to renewable energy and harvest the full potential of improved urban planning based on geographical digital twin technologies.

UBT E / N209 - Floor 3
15:00
15:00
5min
UAVIMALS: the "open" remote sensing system for surface archaeological investigations.
Federica Vacatello

Today, there is a growing use of airborne sensors in archaeology, especially to investigate the surface of vast territories quickly and accurately. Airborne laser scanning from small remotely piloted aircraft is rapidly moving towards ever more capable solutions for the investigation of archaeological traces hidden by vegetation or substantial soil deposits. The proposed contribution fits into this field of archaeological research by presenting "UAVIMALS" (Unmanned Aerial Vehicle Integrated with Micro Airborne Laser Scanner), a new aerial remote sensing system for "shadow marks" (Masini – Lasaponara 2017; p. 32) designed for surface archaeological investigations and the result of an Early Career Grant funded by the National Geographic Society. The system, consisting of a custom drone based on an open architecture and of software for vehicle control and data processing, integrates a solid-state laser sensor originally designed for obstacle avoidance, here exploited to produce an accurate DTM (Digital Terrain Model) of small surfaces with a significant reduction in acquisition time and cost. The ambition of the UAVIMALS project was not to create a low-cost, lower-performing airborne LiDAR compared to those already on the market, but rather an instrument that is easily transportable, less expensive and equally precise. We believe the solution represents a breakthrough in research on airborne laser scanner technologies.
The acquisition of three-dimensional images at very high morphometric resolution has proved to be a fundamental practice for the study of various contexts of our planet. In the archaeological field in particular, drone remote sensing is extremely important for the investigation of ancient structures, sometimes still unexplored, that cannot be studied by other means such as excavation and reconnaissance because of difficult geomorphological conditions, places of difficult access, and traces invisible to the human eye at short distances and in particular climatic conditions (Štular - Eichert - Lozić 2021). Nevertheless, most of the instruments currently on the market still have costs that are prohibitive for archaeological research, as well as sizes that make transport to inaccessible places difficult. The system presented here tries to overcome these critical issues by working on the hardware solution best suited to the needs of aerial archaeological investigation, using a type of lidar sensor never before used for drone remote sensing. The instrument, with its low cost and small size, was originally developed as a system for autonomous driving of road vehicles (https://leddarsensor.com/solutions/m16-multi-segment-sensor-module/) and was customised on a self-built drone to obtain a prototype of the 'very light' class. Following experimentation in two different archaeological contexts, the work continued by addressing the second critical issue: the creation of software to control the vehicle in flight and to monitor the acquired data, producing a first graphical rendering. Currently, lidar point clouds can only be processed with dedicated software (CloudCompare, 3DF Zephyr, QGIS, etc.) which, not being connected to the drone, does not allow real-time visualisation of what the sensor sees and prevents a preliminary check for archaeological features hidden in the overflown area. The DEMs, meshes and point clouds obtained from the sensor can then be loaded into geospatial software such as QGIS, allowing spatial, territorial and geomorphological analysis of the acquired data using specific tools. While such a capability may be superfluous in other application contexts, in archaeology a system like the one proposed represents a concrete possibility of widening archaeological investigations, which would be sped up by such an observation tool and made affordable within the budgets available to university research. Moreover, the system would speed up the preliminary archaeological assessments required before the realisation of any public work, through an immediate verification of possible archaeological presence in the affected areas, thus avoiding costly design changes. The proposed contribution therefore presents not only the hardware and software solution developed, but also the preliminary results obtained from its application in the archaeological context of Leopoli - Cencelle, a medieval city about 60 km north of Rome, where critical issues such as the extent of the site, large elevation changes and dense vegetation have always complicated excavation activities on the hill, leaving much of the city still unexplored.
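As an illustration of the kind of post-processing that turns the sensor's point cloud into a DTM loadable in QGIS, the following sketch grids a LAS file with laspy, NumPy and rasterio; the file names, cell size and CRS are assumptions, and this is not the UAVIMALS control software itself.
```python
# Minimal post-processing sketch (assumed file names, CRS and 0.5 m cell size):
# grid a lidar point cloud into a simple DTM raster that can be opened in QGIS.
# This is not the UAVIMALS flight-control software, only an illustration.
import laspy
import numpy as np
import rasterio
from rasterio.transform import from_origin

las = laspy.read("survey.las")
x, y, z = np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)

res = 0.5  # cell size in metres
cols = np.floor((x - x.min()) / res).astype(int)
rows = np.floor((y.max() - y) / res).astype(int)
dtm = np.full((rows.max() + 1, cols.max() + 1), np.nan, dtype="float32")

# Keep the lowest return per cell as a crude ground-surface approximation.
for r, c, v in zip(rows, cols, z):
    if np.isnan(dtm[r, c]) or v < dtm[r, c]:
        dtm[r, c] = v

transform = from_origin(x.min(), y.max(), res, res)
with rasterio.open("dtm.tif", "w", driver="GTiff", height=dtm.shape[0],
                   width=dtm.shape[1], count=1, dtype="float32",
                   crs="EPSG:32633", transform=transform) as dst:  # assumed CRS
    dst.write(dtm, 1)
```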
In this context, drone remote sensing has indeed proved to be an effective method for investigating ancient structures with different degrees of archaeological visibility, where the evidence is not yet completely above ground and is obscured by high- and medium-stem vegetation. The examination, although the result of an experimental activity, not only made it possible to identify anomalies relating to structures not yet reached by excavation, but also encouraged the planning of future investigation campaigns, allowing a more informed selection of the areas of interest.

UBT E / N209 - Floor 3
15:05
15:05
5min
Agro-tourism impact analysis of climate change using Google Earth Engine in the Rahovec wine region of Kosovo.
Dustin Sanchez

This project develops statistical models using the Mann-Kendall test and Sen's slope on the complete MODIS LST mission record to analyse climate-change-driven thermal shifts across the Republic of Kosovo. It leverages Google Earth Engine open data to build the statistical models, which are then extracted and analysed in QGIS. This approach uses non-parametric statistical time-series analysis of the complete MODIS LST record to analyse day and night land surface temperature shifts over different temporal periods, in order to understand the current impacts of climate change and project the expected future impacts on various developing tourist economies in the Republic of Kosovo.
Water balance is used to understand the impacts of climate change on wine grape capacity and to anticipate future climate-driven disruptions through linear geographic regressions. These regressions will guide the understanding of the climate changes occurring within the country and provide a basis for developing resilience methods. The model will be broken down by viticultural region to capture the varying intensity of impacts across the country. The two datasets used are MODIS land surface temperature and Tropical Rainfall Measuring Mission (TRMM) data, which together characterise the surface temperature shifts and water balance shifts occurring due to climate change in Kosovo. The datasets will also be correlated using Pearson's correlation coefficient to test whether a relationship exists between land surface temperature and water balance within the wine region of Kosovo.
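A minimal sketch of the trend and correlation tests described above is given below, assuming the monthly MODIS LST and TRMM series have already been exported from Google Earth Engine to a CSV; the file and column names are assumptions, and Kendall's tau against time is used here as a stand-in for the full Mann-Kendall test.
```python
# Hedged sketch of the trend tests described above, assuming monthly MODIS LST
# and TRMM-derived water balance series have been exported from Google Earth
# Engine to a CSV (column names are assumptions, not the authors' actual export).
import pandas as pd
from scipy import stats

df = pd.read_csv("rahovec_timeseries.csv", parse_dates=["date"])
t = (df["date"] - df["date"].min()).dt.days / 365.25  # time in years

# Mann-Kendall-style monotonic trend test (Kendall's tau) and Sen's slope.
tau, p_trend = stats.kendalltau(t, df["lst_day_c"])
sen_slope, intercept, lo, hi = stats.theilslopes(df["lst_day_c"], t)
print(f"tau={tau:.3f}, p={p_trend:.4f}, Sen's slope={sen_slope:.3f} degC/year")

# Pearson correlation between day LST and the water-balance proxy.
r, p_corr = stats.pearsonr(df["lst_day_c"], df["water_balance_mm"])
print(f"Pearson r={r:.3f}, p={p_corr:.4f}")
```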
The findings of this project will reveal the geographic dispersion of anomalous rainfall patterns and long-term temperature shifts that can have disruptive impacts on agricultural grape production. Based on the known geographic extent of the wine grape region, the results will identify the significant temperature changes of the past 20 years and the trends for both day and night LST within the Republic of Kosovo. Further, the analysis seeks to develop an understanding of the immediate to long-term impacts based on the satellite data trends. The water balance analysis will quantify monthly precipitation shifts and, assessed together with land surface temperature, help identify areas susceptible to flood-based natural hazards and to amplification through increased temperatures and loss of water balance. The connection between the two can be assessed to understand systemic vulnerabilities in regions whose success depends on environmental quality.
Additionally, the project provides a novel framework for time-series analysis of big data to gain insights into climate change impacts on the economies of the developing world. The analysis will focus on the geographic dispersion of tourist economy assets being built and will improve the use of big data approaches to understand temperature changes in data-poor environments. The outputs of this paper are open datasets, an analysis of the impacts of temperature changes on the developing tourist economy in the Republic of Kosovo, and insights into the capacity of large geographic datasets for open climate change research.
The use of big data and open modelling provides a considerable resource for governments, municipalities, and NGOs to understand how climate change will impact their communities. The paper discusses the statistical concepts applied to the complete MODIS dataset and the interpretation of the results. The major concept is the use of Google Earth Engine for modelling remotely sensed data to understand the environmental conditions being caused by climate change. The underlying data analysis draws connections to local conditions and to how human environmental conditions relevant to wine tourism development are affected. This paper does not assess the loss of economic value but rather interprets the data to understand the state of the underlying environmental commodity conditions, such as snowpack and grape vine stock. We discuss the analysis within a novel framework for open-source climate intelligence building for regions without the resources for pay-to-use products and data. This paper will build an understanding of methodological approaches, with multiple models, to develop and deliver products capable of informing national and regional climate adaptation strategies in both the short and long term.
The Republic of Kosovo is working towards developing several tourist economic sectors that are heavily reliant on climate, including the wine regions of Rahovec and Prizren, both of which face tremendous uncertainty in the face of climate change. We develop tools and techniques to demonstrate the capabilities of open big data analysis and provide vital insight into the impacts of climate change. We seek to explore the use of open-source learning tools to build open-data models capable of providing vital insights into the impacts of climate change in countries that have the fewest resources and the most risk.

UBT E / N209 - Floor 3
15:10
15:10
5min
A free and open-access GIS for the documentation and monitoring of urban transformations in the area of the Expo 2015 exhibition in Milan
Federica Gaspari

This work is based on the design and development of a system aimed at monitoring the urban transformations of the area used for the Expo 2015 exhibition in Milano, exploiting the potential offered by the storage and management of geographic data in a GIS environment (Burrough, 1986). The system is designed to collect and analyse data showing the changes of the urban landscape from the pre-Expo, through the Expo, to the post-Expo transformations (Gaeta & Di Vita, 2021).

One of the reasons behind this work is the fact that a complete digital database documenting the urban transformations in the Expo 2015 area is not yet available. In fact, all the data needed to implement the GIS were originally represented by maps (on paper or in digital, non-georeferenced format) of development projects and by cartographic work attached to city plans. The maps had been scanned and made openly accessible to the public; after checking the compatibility of the process with the original licences, they were geo-referenced and vectorized so that the data could be inserted into the GIS database.

The implementation and use of GIS technology implied (i) the definition of the database conceptual and logical model; (ii) the acquisition of a large number of geographic data layers, which were structured according to the design of a relational database. Layers which were acquired included data on: cadastral parcels; buildings; players involved in the urban transformations; land regulations; open spaces; land cover; functional lots; public transport stops; roads and underground utility lines.

The structure of the DB has been designed based on a relational model (Codd, 1970), following the standard methodology defined in 1975 by the ANSI/SPARC Committee and going through successive phases that originate the external, conceptual and logical models. Following this strategy, the external model was defined based on what were assumed to be the future users' needs in terms of data storage, consultation and queries. Aiming at documenting also the timeline of the urban transformations of the area, the Entity Relationship Diagram (ERD) was designed by integrating into a unique conceptual scheme the temporal dimension of the transformations, from the pre-Expo, to the Expo that took place in Milano from May to October 2015, and finally the post-Expo layout of the area. Subsequently, the logical model of the database was designed.

The data acquisition required researching a large number of sources, mainly images of maps available online on the websites of the different stakeholders, ranging from public administration channels and OpenStreetMap crowdsourced geodata to official Expo 2015 communication platforms. These were then geo-referenced in order to acquire spatial elements in vector format, which were afterwards stored in the spatial database of the GIS, becoming easily manageable and upgradeable in an interactive way. Notably, the topological models of the streets and of the underground district heating network were implemented, in the latter case also connecting each building with the corresponding segment of the network (Cazzaniga et al., 2013). Finally, the topological consistency and coherence of the network and its components was validated.

The application of GIS technologies to monitor the transformations of the entire site made it possible to understand and analyse the different phases of the evolution of the urban territory, identifying critical issues and strengths of the development projects. Indeed, in the GIS environment it is now possible to perform reproducible elaborations and analyses useful for understanding how the area changed over time, especially from an urban planning point of view. This approach can provide insights on the surface covered by buildings in the different periods and on the change of use or decommissioning of exhibition pavilions in the post-Expo environment. Moreover, the database model allows users to query the data in order to identify underground services as well as buildings that may be affected by future works on roads or structures located in the area of interest. Such functionalities and retrieved information could be crucial especially considering the recent construction of a critical structure like the new Galeazzi hospital, operative since 2022. Finally, the possibility to present the project, the data and their related metadata, and to communicate them also to a wider audience of non-technical users, was envisaged through the publication of a WebGIS on the Internet, which was tested with a demo. In the future, with further improvements, this prototype could lead to a decision support system, to be used as a tool to understand the area for the benefit of all actors involved in the urban transformations, with their different expertise and backgrounds. In particular, the choice of the web platform was driven by the possibility to make the project as accessible as possible, also through expandable tools in support of geo-narratives and storytelling as well as easy-to-understand dashboards for visualizing quantitative analysis results.

The whole project has been developed by using free and open-source technologies, namely MySQL Workbench for the development of the database model, QGIS for the implementation of the system and GeoNode for the testing of the publication of the System on the Internet. The choice to use free and open-source technologies is both an economical and ethical solution aimed at knowledge sharing and at making the DB flexible and easily expandable, facilitating the integration of new data, their updating and the implementation of future functionalities, paying attention also to the technical accessibility even by non-expert users.

UBT E / N209 - Floor 3
15:15
15:15
5min
TOWARDS A PAN-EU BUILDING FOOTPRINT MAP BASED ON THE HIERARCHICAL CONFLATION OF OPEN DATASETS: THE DIGITAL BUILDING STOCK MODEL - DBSM
Pietro Florio

Currently, a reliable, harmonized and comprehensive pan-EU map of the building stock provided in vector format is not publicly available, not even at level of detail 0 (LOD0, according to the CityGML standard), where building footprints can be identified.
European countries offer vector maps of their building stock with a variety of levels of detail, formats and tools; data across countries are often heterogeneous in terms of attributes, accuracy and temporal coverage, available through different user interfaces, or hardly accessible due to language barriers. Bottom-up solutions from local cadastral data in the framework of the INSPIRE initiative, and top-down standard-setting regulations like EU Regulation 2023/138 laying down a list of specific high-value datasets and the arrangements for their publication and re-use [1], are increasing and improving the homogeneity of data availability.
Meanwhile, crowd-sourced providers of building footprint vectors like OpenStreetMap (www.openstreetmap.org) are covering an increasing fraction of the territory of the European Union. Simultaneously, improvements in remote sensing have increased the resolution of satellite imagery and allowed for building footprint segmentation on very high-resolution images based on deep learning: major stakeholders in the field of information technology, such as Microsoft and Google, have publicly disseminated large vector datasets with extensive territorial coverage. Other research institutions have released grid-based maps of built-up areas covering the world at a resolution of 10 metres (like the Built-Up Surface of the Global Human Settlement Layer) or Europe at a resolution of 2 metres (like the European Settlement Map). Another project called EUBUCCO [2] has compiled a vector database of individual building footprints for 200+ million buildings across the 27 European Union countries and Switzerland, by merging 50 open government datasets and OpenStreetMap, which have been collected, harmonized and partly validated.
The methodology presented here provides a replicable workflow for generating seamless building datasets for each of the EU-27 countries, by combining the best available public datasets.
After reviewing existing literature and assessing publicly available buildings data sources, the following were identified as core input datasets:
• OpenStreetMap (OSM): a free and open-source global dataset of geographic features, including building footprints and attributes;
• Microsoft Buildings (MSB): a freely available dataset of building footprints developed by Microsoft using machine learning algorithms on very high-resolution satellite imagery [3];
• European Settlement Map (ESM): a raster dataset of built-up areas classified using convolutional neural networks at 2-metre spatial resolution from very high-resolution imagery available through Copernicus [4].
Building footprints are available in OpenStreetMap across all 27 countries, but with different levels of completeness and coverage. Human contributors trace data in OSM manually, thus the available building footprints are considered of higher geometric quality than those extracted by the machine learning algorithms behind the MSB and ESM datasets. Microsoft provides high-resolution building footprints for all 27 countries, but their coverage within the country areas varies considerably. The ESM dataset was derived from a seamless mosaic covering the entire EU-27 area, so it is considered the most complete in terms of coverage, although its lower resolution and quality do not allow for extracting building footprints as detailed as those available in OSM and MSB.
The combination of the above-listed datasets is carried out with a stepwise approach. First, the MSB dataset is compared to OSM, and MSB buildings are selected for any area where they do not overlap or intersect OSM; MSB buildings below 40 m2 of surface are filtered out as outliers. Then, the ESM data is compared to the combined OSM and MSB buildings and vectorised, to fill in any gap not covered by the latter. Building footprints issued from ESM are further refined with various geo-spatial post-processing operations (e.g. buffering, hole filling, …), then filtered to retain only features above 100 m2 of surface, thus discarding outliers.
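A minimal GeoPandas sketch of this selection logic is shown below; the file names are assumptions, the ESM built-up areas are taken as already vectorised, and the layers are assumed to be in a metric CRS so that areas are in square metres. It illustrates the core of the stepwise approach rather than the full QGIS model.
```python
# Minimal sketch of the stepwise conflation (assumed file names; ESM is taken
# here as already vectorised). Not the full QGIS model, only the core logic.
import geopandas as gpd
import pandas as pd

osm = gpd.read_file("osm_buildings.gpkg")
msb = gpd.read_file("msb_buildings.gpkg").to_crs(osm.crs)
esm = gpd.read_file("esm_built_up_polygons.gpkg").to_crs(osm.crs)

def not_intersecting(candidates, reference):
    """Keep candidate footprints that neither overlap nor intersect the reference."""
    joined = gpd.sjoin(candidates, reference[["geometry"]],
                       how="left", predicate="intersects")
    return candidates.loc[joined[joined["index_right"].isna()].index.unique()]

# Step 1: MSB buildings not covered by OSM, discarding outliers below 40 m2.
msb_add = not_intersecting(msb, osm)
msb_add = msb_add[msb_add.geometry.area >= 40]

# Step 2: ESM polygons filling the remaining gaps, kept only above 100 m2.
base = pd.concat([osm, msb_add], ignore_index=True)  # OSM + retained MSB footprints
esm_add = not_intersecting(esm, base)
esm_add = esm_add[esm_add.geometry.area >= 100]
```
In the study itself, these steps are chained together in the QGIS model builder described next.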
To implement and automate the described logical workflow, an interactive model has been developed to work in the popular QGIS desktop software. The QGIS model builder allows for building logical processing workflows by linking input data forms, variables and all the analysis functions available in the software.
The conflation process is conducted at the country level since OSM and MSB sources are already conveniently provided in country extent packages. Depending on the geographic size of each country and the amount of data included, some countries are further split into tiles for processing. The resulting building footprints from each input dataset are kept in separate files for easier handling, but can be combined visually in GIS software or physically merged in a single file.
There are several known limitations to the data and the processing workflow:
• Many MSB building footprints present irregular geometries caused by faulty image interpretation. These can be filtered by calculating the vertex angle values of each polygon and removing specific outlier values (see the sketch after this list). A methodology was developed at small scale, but it has not yet been possible to implement it at country scale.
• The ESM geometries do not accurately describe the actual building footprints but only the rough block outline. While ESM has seamless coverage, its best application would be for guiding additional feature extraction from VHR imagery in areas where OSM and MSB have poor coverage.
• The default overlap settings could be tweaked and dynamically adjusted, based on the built-up pattern (e.g., less in urban areas, more in rural areas).
• Filters of minimum feature size of 40 m2 for MSB and 100 m2 for ESM can be optimised to find the most robust balance between including non-building features and actual smaller buildings.
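The sketch below illustrates the vertex-angle idea from the first limitation above: it computes the interior angles of a footprint's exterior ring and flags polygons with many implausibly sharp angles. The thresholds are assumptions for illustration, not the values of the small-scale methodology.
```python
# Sketch of the vertex-angle filter mentioned in the first limitation above
# (small-scale prototype idea, not the country-scale implementation).
# Thresholds are assumptions for illustration.
import numpy as np
from shapely.geometry import Polygon

def vertex_angles(poly: Polygon) -> np.ndarray:
    """Interior angles (degrees) at each vertex of the exterior ring."""
    coords = np.asarray(poly.exterior.coords)[:-1]  # drop repeated last point
    prev = np.roll(coords, 1, axis=0)
    nxt = np.roll(coords, -1, axis=0)
    v1, v2 = prev - coords, nxt - coords
    cosang = np.sum(v1 * v2, axis=1) / (np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def looks_irregular(poly: Polygon, min_angle=20.0, max_share=0.3) -> bool:
    """Flag footprints where many vertices form implausibly sharp angles."""
    ang = vertex_angles(poly)
    return np.mean(ang < min_angle) > max_share

# Example: a sliver-like triangle is flagged, a square is not.
print(looks_irregular(Polygon([(0, 0), (10, 0.2), (10, 0)])))       # True
print(looks_irregular(Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])))   # False
```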
The resulting buildings dataset is compared with the European Commission's GHSL built-up surface layer [5] to get an understanding of the respective coverage at pan-European level. A more focused comparison with available cadastral data for a particular city provides a preliminary understanding of the accuracy of the new layer along with its limitations.

UBT E / N209 - Floor 3
15:20
15:20
5min
Validating the European Ground Motion Service: An Assessment of Measurement Point Density
Joan Sala Calero

The European Ground Motion Service (EGMS) constitutes the first application of high-resolution monitoring of ground deformation for the Copernicus Participating States. It provides valuable information on geohazards and human-induced deformation thanks to the interferometric analysis of Sentinel-1 radar images. This challenging initiative delivers the first public ground motion dataset, open and available for various applications and studies.

The aim of this work is to validate the EGMS product in terms of spatial coverage and density of measurement points. A total of twelve sites have been selected for this activity, covering various areas of Europe and equally representing the EGMS data processing entities. To measure the quality of the point density we employ open land cover data and evaluate the density per class. Furthermore, we propose statistical parameters associated with the data processing and time-series estimation to ensure they are consistent across the selected sites.

The usability criteria to be evaluated concern the completeness of the product, its consistency, and the pointwise quality measures. Ensuring the completeness and consistency of the EGMS product is essential to its effective use. To achieve completeness, it is important to ensure that the data gaps and density measurements are consistent with the land cover classes that are prone to landscape variation. Consistency is also vital for point density across the same land cover class for different regions. For instance, urban classes will have higher density than farming grounds, and this density should be consistent between the ascending and descending products. Pointwise quality measures are critical in assessing the quality of the EGMS PSI results. For example, the temporal coherence is expected to be higher in urban classes, and the root-mean-square error should be lower. Overall, these measures and standards are crucial in ensuring the usefulness and reliability of the EGMS product for a wide range of applications, including environmental management, urban planning, and disaster response.

For the validation of point density, a dataset of 12 selected sites across Europe is used, representing the four processing entities (TRE Altamira, GAF, e-GEOS, NORCE). The aim of the point density validation activity is to ensure consistency across the EU territories by comparing the point density at three sites for each provider's algorithm, one in a rural mountainous area and two urban. The land cover dataset is obtained directly from the Copernicus Land Monitoring Service – Urban Atlas 2018 and contains validated Urban Atlas data with the different land cover class polygons, along with metadata and quality information. We have extensive Urban Atlas (2018 version) verified datasets for the cities of Barcelona/Bucharest (covered by TRE Altamira), Bologna/Sofia (covered by e-GEOS), Stockholm/Warsaw (covered by NORCE) and Brussels/Bratislava (covered by GAF). In parallel, we select four different rural and mountainous areas to analyse more challenging scenarios for the four providers' processing chains.

There are 27 different land cover classes defined in Urban Atlas. To facilitate the analysis and the interpretation of the results, we aggregate and present our findings for each of the main CLC groups: Artificial Surfaces, Forest and seminatural areas, Agricultural areas, Wetlands and Water bodies.

For the validation measures, key performance indices (KPI) with values between 0 and 1 are calculated. For Artificial surfaces, Agricultural areas and Forest and seminatural areas, the estimated density values of each service provider are normalised with respect to the highest value, since users expect consistently good densities in these classes, particularly in Artificial surfaces. For Wetlands and Water bodies, normalisation is done with respect to the lowest value: this enables outlier detection, since the applied algorithms should barely produce any measurement points on these surfaces.
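One possible reading of this normalisation is sketched below with pandas; the density values and the KPI formulas are illustrative assumptions, not the actual EGMS validation figures.
```python
# Hedged sketch of one possible reading of the KPI normalisation above:
# densities (points/km2) per provider and CLC group are assumed values.
import pandas as pd

density = pd.DataFrame(
    {"Artificial surfaces": [3200, 2900, 3100, 2500],
     "Agricultural areas": [450, 520, 480, 400],
     "Forest and seminatural areas": [120, 90, 150, 110],
     "Wetlands and Water bodies": [4, 9, 6, 12]},
    index=["TRE Altamira", "GAF", "e-GEOS", "NORCE"])

kpi = pd.DataFrame(index=density.index)
for group in ["Artificial surfaces", "Agricultural areas", "Forest and seminatural areas"]:
    kpi[group] = density[group] / density[group].max()   # 1 = densest provider
wet = "Wetlands and Water bodies"
kpi[wet] = density[wet].min() / density[wet]              # 1 = sparsest provider
print(kpi.round(2))
```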

Regarding the pre-processing of the data from EGMS, one of the challenges was the overlapping of bursts from different Sentinel-1 satellite tracks. If all bursts were included in the analysis, areas with more track overlaps would result in a higher point density, creating a bias in the data. To address this issue, a custom algorithm was designed to identify and extract the unique, non-overlapping polygon for each burst. This iterative algorithm was specifically designed to ensure a fair comparison among different areas, and to eliminate any biases that could impact the results of the analysis.
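The following sketch illustrates the idea of assigning each burst its unique, non-overlapping footprint with Shapely; it shows the principle only, not the custom algorithm actually implemented.
```python
# Illustrative sketch (not the EGMS team's exact algorithm): assign each burst
# footprint its unique, non-overlapping part by subtracting the area already
# claimed by previously processed bursts. Burst geometries are assumed inputs.
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.ops import unary_union

bursts = gpd.read_file("burst_footprints.gpkg")  # assumed file with burst polygons

claimed = Polygon()          # area already assigned to earlier bursts
unique_parts = []
for geom in bursts.geometry:
    unique = geom.difference(claimed)   # keep only the not-yet-claimed part
    unique_parts.append(unique)
    claimed = unary_union([claimed, geom])

bursts["unique_geom"] = unique_parts
# Measurement points can then be counted within 'unique_geom' instead of the
# full footprint, so overlap-rich areas do not bias the density statistics.
```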

In conclusion, as an open and freely available dataset, the EGMS will provide valuable resources for a wide range of applications and studies, including those that leverage free and open-source software for geospatial analysis. The validation results presented here will help to ensure the accuracy and reliability of the EGMS product, thereby enabling further research and applications in areas such as geohazards, environmental monitoring, and infrastructure management.

References

Costantini, M., Minati, F., Trillo, F., Ferretti, A., Novali, F., Passera, E., Dehls, J., Larsen, Y., Marinkovic, P., Eineder, M. and Brcic, R., 2021, July. European ground motion service (EGMS). In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS (pp. 3293-3296). IEEE.

Urban Atlas 2018, Copernicus Land Monitoring Service. European Environment Agency: Copenhagen, Denmark.

UBT E / N209 - Floor 3
16:00
16:00
30min
OpenStreetMap as an input source for producing governmental datasets: the case of the Italian Military Geographic Institute
Marco Minghini, Alessandro Sarretta

The collection, curation and publication of geospatial information has for centuries been the sole prerogative of public sector organisations. Such data has traditionally been considered the reference source for datasets and cartographic outputs. However, new geospatial data sources (e.g. from the private sector and citizen-generated data [1]) have emerged that are currently challenging the role of the public sector [2]. In response, governments are exploring new ways of managing the creation and update of their geospatial datasets [3].
Datasets of high relevance are increasingly produced by both private companies and crowdsourced initiatives. For example, in 2022 Microsoft released Microsoft Building Footprints, a dataset of around 1 billion building footprints extracted from Bing Maps imagery from 2014 to 2022. More recently, in December 2022, Amazon Web Services (AWS), Meta, Microsoft, and TomTom founded the Overture Maps Foundation (https://www.linuxfoundation.org/press/linux-foundation-announces-overture-maps-foundation-to-build-interoperable-open-map-data), a joint initiative in partnership with the Linux Foundation with the aim to curate and release worldwide map data from the aggregation of multiple input sources, including civic organisations and open data sources, especially OpenStreetMap data.
These initiatives aim to improve the coverage of existing governmental geospatial information through the release of open data and a strong dependency on OpenStreetMap. In particular, the Overture initiative has the explicit goal to add quality checks, data integration, and alignment of schemas to OSM data.
Recently, the Italian Military Geographic Institute (IGM, one of the governmental mapping agencies in Italy) released a multi-layer dataset called “Database di Sintesi Nazionale” (DBSN, https://www.igmi.org/en/dbsn-database-di-sintesi-nazionale). The DBSN is intended to include geospatial information relevant to analysis and representation at the national level, with the additional purpose of deriving maps at the 1:25,000 scale through automatic procedures. The creation of the DBSN builds on various information sources, with regional geotopographic data as the primary source of information and products from other national public bodies (e.g. cadastral maps) as additional sources. The source is recorded in a specific attribute field for each feature in the database, with a list of codes referencing the various sources. Among the external sources used as input for the integration work in the DBSN, OpenStreetMap was explicitly considered and used.
One of the elements of novelty, at least in the Italian context, is the release of the DBSN under the ODbL licence (https://opendatacommons.org/licenses/odbl), caused by the fact that the inclusion of OSM data requires derivative products to be released with the same licence.
Currently, the DBSN includes data covering only 12 out of the 20 Italian regions (Abruzzo, Basilicata, Calabria, Campania, Lazio, Marche, Molise, Puglia, Sardegna, Sicilia, Toscana, Umbria). The remaining ones will be released in the near future.
The datasets have been downloaded from the official IGM website in January 2023.
The DBSN schema is a subset of the specifications defined in the "Catalogue of Spatial Data - Content Specifications for Geotopographic Databases" (Decree of 10 November 2011) and is composed of 10 layers, 29 themes and 91 classes. We compared it with the OpenStreetMap specifications (based on the community-based tagging scheme at https://wiki.openstreetmap.org/wiki/Map_Features) and selected two main themes (buildings and streets).
The analysis was performed through a set of Python scripts available under the open source WTFPL licence at https://github.com/napo/dbsnosmcompare.
Firstly, we analysed—for buildings and streets in the IGM database—where OSM data was used as the primary source of information. The percentage of buildings derived from OSM is minimal, ranging from 0.01% in Umbria to 1.3% in Marche; regarding streets, the differences between regions increase, ranging from almost 0% in Abruzzo and Calabria to 94% in Umbria.
Secondly, we calculated the area covered by buildings and the length of streets in both the IGM and OSM databases, to assess the completeness of OSM compared to the official IGM dataset.
In the 12 regions, the area covered by buildings in OSM is on average about 55% of the corresponding area in IGM, while the percentage for the length of streets is about 78%. However, these numbers are highly variable among regions, ranging between 32% in Calabria and 105% in Puglia for buildings, and between 46% in Calabria and 103% in Umbria for streets.
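A minimal sketch of this completeness comparison for a single region is shown below; the file and layer names are assumptions, and the layers are assumed to be in a metric CRS so that areas and lengths are in metres.
```python
# Minimal sketch of the completeness comparison described above (file and
# layer names are assumptions; layers are assumed to be in a metric CRS).
import geopandas as gpd

igm_buildings = gpd.read_file("dbsn_region.gpkg", layer="buildings")  # assumed layer name
osm_buildings = gpd.read_file("osm_region_buildings.gpkg")
igm_streets = gpd.read_file("dbsn_region.gpkg", layer="streets")      # assumed layer name
osm_streets = gpd.read_file("osm_region_highways.gpkg")

building_ratio = osm_buildings.geometry.area.sum() / igm_buildings.geometry.area.sum()
street_ratio = osm_streets.geometry.length.sum() / igm_streets.geometry.length.sum()
print(f"OSM/IGM building area: {building_ratio:.1%}, street length: {street_ratio:.1%}")
```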
These first results show that the main source information in the DBSN (namely the official regional data) is highly variable across the 12 regions, which required the IGM to find additional data sources to fill the gaps. OSM plays a minor role for integrating buildings in the database, while it demonstrates a high potential for contributing to street information.
Results also show that, even where the OSM contribution to the DBSN is small, some elements present in OSM are still not included in the DBSN. This can be due to at least two reasons: (i) the current workflow for selecting elements in OSM (through tags) does not include some potentially relevant elements; (ii) the (ideally) daily update of OSM brings new features into the database at a pace that cannot be matched by the IGM, and by governmental organisations in general.
While this study highlights the importance that OpenStreetMap has achieved as a reference source of geospatial information for governmental bodies, providing evidence of its contribution to the national database of the IGM, it also paves the way for improving OpenStreetMap itself by importing data for various layers, benefiting from the release of the DBSN under the ODbL licence.

UBT E / N209 - Floor 3
16:30
16:30
30min
3D4DT: An Approach to Explore Decision Trees for Thematic Map Creation as an Interactive 3D Scene
Auriol Degbelo

Background & Problem: There are currently several software tools dedicated to the automatic creation of thematic maps. These can be proprietary (e.g., ArcGIS Online, Carto) or non-proprietary solutions (e.g., SDG Viz, AdaptiveMaps, the GAV Toolkit, the Geoviz Toolkit). An important drawback of these state-of-the-art solutions is that the expertise encapsulated in such software (e.g., how to choose a map type or visual variables depending on the characteristics of the data contained in the map) is usually not well communicated to the user. That is, users can use these tools to create meaningful maps for their open geographic datasets but are offered little support in understanding why certain thematic map types were suggested (e.g., why a dot map is proposed by a toolkit instead of a choropleth map). Put simply, users get little insight into the decision processes of current tools/toolkits for thematic web map creation.

Contributions & Target audience: To help users learn about the decision processes of software for automatic map creation, this work introduces the 3D4DT approach. The approach uses JSON (JavaScript Object Notation) as a machine-readable format to represent decision trees and subsequently maps JSON elements to user interface elements for an interactive 3D scene. The contributions of the work are twofold: 1) a controlled vocabulary to support the creation of machine-readable descriptions for decision trees of the Cartography literature; and 2) an approach to navigate these decision trees as interactive scenes in 3D. The approach is implemented as an open-source prototype. It is relevant to both developers and users of software for automatic thematic map creation. The controlled vocabulary is relevant to developers, who can encode the decision trees underlying their software as machine-readable data, and make the ‘brain’ of their software available for reuse in multiple use cases. The exploration of the decision trees as an interactive scene is relevant to users who can retrieve information about the inner workings of software for map creation in an interactive format.
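To give a flavour of what such a machine-readable decision tree can look like, the snippet below encodes a tiny, hypothetical tree as JSON-style data; it only illustrates the spirit of the controlled vocabulary, and the actual 3D4DT schema in the project repository may differ.
```python
# Hypothetical illustration only: a tiny decision tree encoded as JSON-style
# data, in the spirit of the controlled vocabulary described above. The actual
# 3D4DT schema (see the project repository) may differ.
import json

decision_tree = {
    "id": "thematic-map-choice",
    "question": "Is the attribute to be mapped quantitative?",
    "yes": {
        "question": "Are the values normalised by area or population?",
        "yes": {"decision": "choropleth map"},
        "no": {"decision": "proportional symbol map"},
    },
    "no": {"decision": "categorical (qualitative) area map"},
}

print(json.dumps(decision_tree, indent=2))  # machine-readable form for the 3D scene
```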

Implementation: The prototype is available as a web-based application on GitHub. The server is run using Node.js. To speed up the development of the frontend, we have used Vitejs. The 3D interactive scene is implemented using the JavaScript library Three.js. The choice of Three.js was motivated by the fact that it is 1) open source, 2) expressive enough to create a variety of 3D scenes in the browser, and 3) is actively maintained by a community of contributors.

Evaluation: To evaluate the expressiveness of the controlled vocabulary (contribution 1), the work used three decision trees for thematic map creation: 1) DecisionTreeA: the decision tree of the AdaptiveMaps open-source prototype (Degbelo et al., 2020); 2) DecisionTreeB: the decision tree for the choice of thematic map types from (Kraak et al., 2020); and 3) DecisionTreeC: the visual variable syntactics from (White, 2017), converted into a decision tree. To evaluate the usability of the 3D interactive scene (contribution 2), the open-source prototype was tested through a lab-based user study. The study compared the interaction with two decision trees using interactive 3D scenes to the same information displayed as a simple website (text+pictures). 12 participants were recruited via personal messages. They were asked to interact with DecisionTreeA and DecisionTreeB using both conditions (interactive 3D vs static). Six participants stated that they had no experience at all in the field of geoinformatics, four claimed to be slightly experienced and two considered themselves very experienced. None of the participants was familiar with the literature used for DecisionTreeA and DecisionTreeB. A critical difference between DecisionTreeA and DecisionTreeB is that the latter was simpler in its hierarchical structure. We measured efficiency (time taken to answer questions), effectiveness (number of correct answers during the interaction with the prototype), and memorability (number of correct answers to questions asked after the prototype had been closed). The key takeaways from the experiments were: 1) participants were slightly faster in the text+pictures condition, but the differences in efficiency were not statistically significant; 2) using the 3D interactive scene, participants could answer questions pertaining to DecisionTreeB more accurately, while differences in effectiveness for the more complex DecisionTreeA were not statistically significant; and 3) the differences in memorability between the two conditions (interactive 3D vs static) were not statistically significant. Hence, an interactive 3D scene could be used as a complementary means to help users understand how thematic maps are created, especially when designers wish to convey this information most accurately.

Relevance for the FOSS4G Community: Since DecisionTreeA is the brain of the AdaptiveMaps open-source prototype that helps create web maps semi-automatically, helping users visually explore that decision tree through the 3D4DT approach is one way of realizing the requirement of algorithmic transparency for intelligent geovisualizations. The controlled vocabulary is relatively simple and could be reused to promote algorithmic transparency for other types of open-source geospatial software, as long as their decision rules can be modelled as decision trees (i.e., if-then-else rules).

Reproducibility: the data collected during the user study, the script for the analysis as well as all questions answered by the participants can be accessed at https://figshare.com/s/60b1a4a12f9bd32d2759. The source code of the AdaptiveMaps prototype, which used DecisionTreeA to create various thematic maps, can be accessed at https://github.com/aurioldegbelo/AdaptiveMaps . The source code of the 3D4DT prototype, the JSON schemas, and the encoding of the decision trees as JSON can be accessed at https://github.com/aurioldegbelo/3D4DT .

References:
Degbelo, A., Sarfraz, S. and Kray, C. (2020) ‘Data scale as Cartography: a semi-automatic approach for thematic web map creation’, Cartography and Geographic Information Science, 47(2).
Kraak, M.-J., Roth, R.E., Ricker, B., Kagawa, A. and Sourd, G.L. (2020) Mapping for a sustainable world. New York, USA: The United Nations.
White, T. (2017) ‘Symbolization and the visual variables’, in J.P. Wilson (ed.) Geographic Information Science & Technology Body of Knowledge.

UBT E / N209 - Floor 3
10:30
10:30
30min
Human-wildlife conflict and road collisions with ungulates. A risk analysis and design solutions in Trentino, Italy
Marco Ciolli

Among human-wildlife conflicts, wildlife-vehicle collisions are one of the most evident to the general public. Human-wildlife conflicts can be defined as the breaking of a relationship of coexistence, which occurs when the needs or behaviour of a species negatively affect human activity. Among the causes are land use change, especially urbanization, with the construction of infrastructure that interrupts natural habitats; the conversion of forests to agriculture and pastures, which leads to crop damage and predation of livestock; and the increased presence of people in wilderness areas for recreational activities (Corradini et al. 2021). Often these conflicts lead to the killing and persecution of species, thus compromising their conservation. The problem is globally widespread, both in countries where land use change already occurred in historical times and where it is presently occurring at a dramatic pace. In recent decades, large mammal populations have actually recovered in Europe, due to legal protection and the abandonment of traditional agriculture (Chapron et al. 2014). The increased number of large mammals leads to increased human-wildlife interaction, including roadkill and car accidents.
This study investigates wildlife-vehicle collisions in the territory of the Italian Autonomous Province of Trento (PAT), which has 541,692 inhabitants, extends over 6,207 km2 and is a mountainous area with a significant summer and winter tourist presence. The species taken into account are roe deer (Capreolus capreolus) and red deer (Cervus elaphus), the species most commonly involved in road accidents in the area. In the last 10 years an average of 700 collisions per year were registered; the animals are often killed and the vehicles heavily damaged, leading to injuries and occasionally to human fatalities. A solution to the problem is becoming urgent in a highly anthropized environment like the Alpine one.
Different measures can be adopted to reduce the risk of collisions, e.g. underpasses, overpasses, viaducts and fly-overs, fences, animal detection systems, warning signs and nets, or a combination of these (van der Ree et al. 2015).
The main purpose of this work was to use FOSS4G to identify the road sections characterized by a greater number of collisions and to propose and design practical solutions, focusing mitigation efforts on these hotspots. The practical solutions were chosen among those most appropriate to each specific situation and, where a specific project is proposed, it includes the costs of realizing it.
Initially the work focused on the geostatistical study of road collisions with ungulates to determine their trends in space and time. The road sections characterized by a greater number of accidents were identified accurately and reliably by combining GIS geostatistical analysis with a detailed study of the morphology, land cover and other boundary conditions.
QGIS 3.16.6 was used to import and standardize the dataset, as well as to process data and produce heat maps, analyses and most of the final maps. GRASS GIS 8.2 was used to perform data integrity checks, fix data errors and resample or recombine data from different sources.
A large number of environmental covariates, such as forest coverage, ecological corridors, roads and infrastructure, were collected, while others (e.g. contours and slope) were derived from the Digital Elevation Model (DEM) and the Digital Terrain Model (DTM). Data about ungulate collisions were provided by the Wildlife Service of the Autonomous Province of Trento.
Since January 2000, every road collision caused by ungulates and reported by the Forest Service, the Hunters Association or the Road Service has been stored in a geodatabase. The database records the date, the species of the affected ungulate, the sex, an indication of the age and the geographical coordinates. The last update used for this study is from August 2022; the datum is ETRS89, frame ETRF2000, projection UTM zone 32N.
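As an illustration of how collision hotspots can be derived from such a geodatabase, the sketch below counts collisions within a buffer around each road section with GeoPandas; the file and column names and the 50 m search distance are assumptions, not the exact workflow used in the study.
```python
# Hedged sketch (not the authors' exact workflow): count collisions within a
# buffer around each road section and rank candidate hotspots. File and column
# names and the 50 m buffer are assumptions; layers assumed in ETRF2000/UTM 32N.
import geopandas as gpd

roads = gpd.read_file("roads.gpkg")                  # road sections with 'section_id'
collisions = gpd.read_file("ungulate_collisions.gpkg")

buffered = roads.copy()
buffered["geometry"] = roads.geometry.buffer(50)     # 50 m search distance

joined = gpd.sjoin(collisions, buffered[["section_id", "geometry"]], predicate="within")
counts = joined.groupby("section_id").size().rename("n_collisions").reset_index()

roads = roads.merge(counts, on="section_id", how="left").fillna({"n_collisions": 0})
hotspots = roads.sort_values("n_collisions", ascending=False).head(5)
print(hotspots[["section_id", "n_collisions"]])
```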
The ungulates are mainly active at dusk and dawn, when the greatest number of collisions is also recorded (Mayer et al. 2021). Speed limits on the roads in the hotspots are often disregarded. On a straight stretch of state road 47 in Valsugana, the maximum speed is set at 90 km/h and about 60% of the vehicles exceed this limit, with a daily average of more than 19,000 vehicles.
Once the areas of intervention were identified with QGIS we carried out on-site inspections to define the best solutions to be adopted in each specific case. GIS processing proved to be extremely informative both in the preliminary design phase and in the final design phase in which the works and interventions were defined in detail.
The five hotspots chosen for intervention are located along four state roads and one provincial road. For each case a specific analysis was carried out and a series of tailored interventions (underpasses, overpasses, viaducts and fly-overs, fences, road tunnels) and works aimed at mitigating road accidents with ungulates were identified. Each site was different and posed different construction problems, and for each site we developed a specific solution. In addition, a first rough cost estimate was developed to determine the order of magnitude of the investment required to implement the recommended interventions.
The proposed projects may provide a guideline for the future policies of the provincial government.
Moreover, with the aim of creating a tool for planning interventions at the provincial scale, a new map was created classifying the road sections into 5 categories based on the number of road accidents with ungulates.
Sharing the capabilities of FOSS4G to improve the procedures for designing interventions that reduce collisions can inspire other researchers and technicians to experiment with these solutions when planning the positioning of crossing structures, thus helping to mitigate human-wildlife conflict (HWC).

UBT E / N209 - Floor 3
11:00
11:00
30min
Methods and Evaluation in the Historical Mapping of Cities
Michael Page

Through a (re)mapping and spatial modeling of a city’s past, we can build data-rich exploratory platforms to examine urban histories and engage both scholars and the public. Geospatial technologies can be applied to extract data from archives and other data sources to build historical data models, geodatabases, and geocoders that subsequently enable the development of web-based dynamic map interfaces connected to rich digital content. This paper outlines a project within a larger consortium of institutions and researchers that focuses on methods in open data and open-source development of the historical mapping of cities.

OpenWorld Atlanta (OWA) is an example of the possibilities of such a web map platform. OWA seeks to provide public access to historical information about Atlanta, Georgia (United States) during the late 19th century and early 20th century through engaging 3D and dynamic interfaces. Drawing upon historical maps, city directories, archival collections, newspapers, and census data, projects like OWA allow researchers to analyze spatially grounded questions.

Recent effort on this project focuses on the 1920s, a dynamic period in the city’s history that saw the rapid expansion of the urban footprint driven by an increase in population and public infrastructure. Between 1870 and 1940, the city was shaped by its primary modes of transportation, heavy rail, and the electric streetcar. By the 1940s, the commuter automobile began transforming Atlanta into the sprawling landscape it is today. These developments happened under racist “Jim Crow” laws, and as such, the project thus allows new avenues into investigating the long and contentious histories of racial discrimination and the Civil Rights Movement.

This paper addresses the development of OWA, which was built on open-source methods and philosophy. The design of its interface and features, including the retrieval of spatial data and digital objects from server resources, the function of metadata, the evaluation of the project in usability studies, and the building of consortia around these methods are explored, as are the interdisciplinary approach of its research and development team and the engagement of students in the process, from coding and building to evaluation. OWA is built using Leaflet and other open-source components and is designed to pull spatial data and map overlays organized and stored on Emory's instance of GeoServer, which implements Open Geospatial Consortium (OGC) standards.

Furthermore, another vital component is the structure of the information, data, and digital objects that are stored on an instance of Omeka which is a free, open-source content management system (CMS) designed for the management and dissemination of digital collections and exhibitions. It is primarily used by archives, museums, libraries, and other cultural heritage institutions to create and manage their online collections and exhibitions. Omeka allows student researchers and assistants to prepare and upload non-spatial content that will be populated as features into the platform. With Omeka, users can create and manage items such as images, documents, and audio and video files, as well as add metadata to describe these items and make them searchable.

Metadata plays an especially significant role in the function of the OWA platform. Geospatial features are linked to records and the corresponding pieces of information, data, and digital objects, including images and 3D models. A modified Dublin Core schema was utilized in Omeka, with categories designed to better fit the geospatial and historical data collected. As an example, the fields for the buildings of a data layer include architects, date built/demolished, racial classification of residents or businesses, head of household (from census data and city directories), etc. To populate these fields, research teams composed of graduate and undergraduate students were assembled. Engaging with faculty and staff, the students collect historical information from newspapers, archives, and online resources and enter it into the database.
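To illustrate the modified Dublin Core schema described above, the sketch below shows what a single, entirely hypothetical building record could look like; the field names mirror the categories listed in the text, and every value is invented.

    # A hypothetical building record following the modified Dublin Core categories above.
    building_record = {
        "dc:title": "Example commercial block",                          # invented
        "dc:identifier": "owa-building-0001",                            # invented
        "architect": "unknown",
        "date_built": "1905",
        "date_demolished": "1962",
        "racial_classification": "as listed in the 1928 city directory",
        "head_of_household": "as listed in census data and city directories",
        "sources": ["1928 city directory", "newspaper archive"],
        "geometry": {"type": "Point", "coordinates": [-84.39, 33.755]},  # illustrative
    }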

The spatial data in OWA comprises many vector layers, including administrative boundaries, roads, rail lines, buildings, and more. The design includes multiple avenues for exploration based on specific years and special themes. A key feature is the buildings layer, which was populated with historical information, including people, race, entity name, addresses, and more, derived from the building of historical geocoders. The 1928 historical geocoder is complete and was used to populate the 1928 map layer; the 1878 geocoder is currently in production, and for the years surrounding 1928 we are using machine learning to produce geocoders for 1927, 1929, and 1930.

Another important aspect is the recognition of the necessity of usability and user experience studies. Researchers at Yonsei and Emory Universities have collaborated to evaluate the ease of use and overall user experience of the platform. The usability study's goal is to find areas of improvement in the user interface and user flow and to gather feedback on the product's design and functionality. A primary goal is to serve as an example of, and future framework for, usability studies centered on diverse user groups (insider vs. outsider, academic/public, etc.). Test participants were grouped by level of familiarity with Atlanta to capture the diversity of users of the platform. This investigation focused on analyzing and evaluating how users explore data and content, conduct analyses, and contribute, whether through feedback or directly to the resource. Our key questions for these groups therefore sought to address how we can better design interactive web maps of city histories to accommodate diverse user groups.

The authors of this paper include collaborators from Emory University, Yonsei University, Stanford University, and the University of Arkansas. Other collaborators include the University of São Paulo (USP), a public research university located in São Paulo, Brazil, and Kaziranga University, a private university located in the state of Assam, India, both of which are engaged in similar or related projects. The collaborators on these projects seek to share ideas and methods surrounding the historical mapping of cities.

UBT E / N209 - Floor 3
11:30
11:30
30min
Agroforestry in the Alas Mertajati of Bali, Indonesia. A case study in applying AI and GIS to sustainable small-scale farming practices.
marc böhlen, Rajif Iryadi, Jianqiao Liu

Small-scale food production has in the past not been a priority for AI-supported analysis of satellite imagery, mostly due to the limited availability of satellite imagery with sufficient spatial and spectral resolution. Additionally, small-scale food producers might find it challenging to articulate their needs and might not recognize any added benefit in new analysis approaches.

Our case study, situated in the geographically and politically complex Alas Mertajati in the highlands of Bali, demonstrates the opportunities of applying satellite assets and machine learning supported classification to the detection of one particular small-scale farming practice, agroforestry. To this end, we are collaborating with the non-governmental organization WISNU as well as BRASTI, a local organization representing the interests of the indigenous Tamblingan.

The practice of agroforestry is widespread across Southeast Asia [5]. Agroforestry plots are 3-dimensional food sources with a variety of species of trees, shrubs, and plants combined into a compact spatial unit. Agroforestry plots are typically small, ranging from fractions of a hectare to a few hectares, and they are often owned by local residents and farmers. The plots are tended manually due to the low cost of manual labor, the small sizes of the plots, the lack of appropriate farm automation systems, as well as a desire to maintain traditional, time-tested land use practices. Small-scale agroforestry can produce a continuous and stable source of valuable and essential foods. The assemblage of vegetation with varying root depth also assists in reducing landslides, an increasingly common event during extreme rainfall in the highlands of the Alas Mertajati. As such, agroforestry is a more robust hedge against some forms of climate change than monoculture farm plots [4].

In Bali, agroforestry sites typically contain several major cash crops including clove, coffee, and banana together with a variety of additional trees such as palms, as well as plants and shrubs such as mango, papaya, and taro. Because of the small plot sizes and the diversity of plants contained in agroforestry sites, detection of agroforestry in satellite imagery with statistical approaches is difficult [2].

While other researchers see in the explosion of remote sensing systems an opportunity for the exploration of new algorithms [1], our contribution focuses on the under-valued process of collecting ground truth data, both to improve land cover classification and to engage a local community that will profit from the process.

The latest generation of Planet Labs satellite imagery (Superdove) offers additional spectral information (Coastal Blue (431-452 nm), Blue (465-515 nm), Green I (513-549 nm), Green (547-583 nm), Yellow (600-620 nm), Red (650-680 nm), Red Edge (697-713 nm), Near-infrared (845-885 nm)) at the same spatial resolution (3.7 m) as the earlier Dove constellation [3]. These new spectral sources offer a new window onto the presence of plants associated with agroforestry practices in the Alas Mertajati (Figure 1). After collecting a first set of reference data, we selected several popular machine learning algorithms (Random Forest, SVM, Neural Networks) to produce classifiers that capture the distribution of agroforestry in the study area to varying degrees. The resulting maps are the first representations of agroforestry in Bali, Indonesia (Figures 2, 3).
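As a rough sketch of this classification step (not the authors' exact pipeline), the fragment below trains one of the named algorithms, a random forest, on per-pixel Superdove band values paired with reference labels; the file names, array shapes, and hyperparameters are illustrative assumptions.

    # Sketch: train and evaluate a random forest on per-pixel band values (assumed inputs).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # X: (n_samples, 8) Superdove band values at reference locations; y: labels (e.g. 1 = agroforestry).
    X = np.load("superdove_reference_pixels.npy")   # hypothetical file
    y = np.load("reference_labels.npy")             # hypothetical file

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))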

We shared these first-generation maps with members of the Tamblingan (through our project partners), who have long-standing claims to the Alas Mertajati as ancestral lands. Their observations showed some of the areas identified as agroforestry to be incorrect, revealing errors and slippages our research team was not aware of.

Together with a local guide, we collected additional ground truth examples in the field. We re-trained the classification systems on the augmented data set to produce updated agroforestry representations. The improvements are twofold. First, as a GIS product: the new map (Figure 4) shows a different distribution of agroforestry sites than the previous results. Agroforestry appears more widely established within the dominant clove gardens. The previous result had a kappa index of 0.714815, while the new result reaches a kappa index of 0.734687, and we expect this to improve further as we fine-tune our classification process.

Second, as a science communication project: in our discussions with our partners, it became clear that the first maps were visually difficult to understand. The “natural” coloration of water, forest, and settlements made it difficult for members not schooled in GIS to read the information. Consequently, we created a new visualization approach that limits the content to a single category. We projected this information onto an infrared image – from the same satellite asset that delivered the data – producing an ‘unnatural’ image with lower barriers to readability (Figures 5, 6).

We used the same approach to visualize the hydrology of the Alas Mertajati (Figure 7). The hydrology data, sourced from the Indonesian Government's Badan Informasi Geospasial, is superimposed on the same infrared image for visual clarity. However, the data is over 20 years old (Figure 8), and there is no updated hydrology map. As such, the image depicts a water-rich region that has more recently been identified as water poor, due to changes in weather patterns and a rapid rise in water use by an expanding tourism industry. In fact, a first round of data collected in the field during the rainy season of 2023 found multiple dry river beds (Figure 9). As a consequence, the Tamblingan, through BRASTI, are establishing this water-poor ground truth by verifying water flow (or the lack thereof) in river beds (Figure 10).

Finally, the project demonstrates the usefulness of our software repository COCKTAIL. Built upon GDAL, Orfeo ToolBox and QGIS modules, COCKTAIL allows us to invoke popular GIS land cover classification algorithms to classify Planet Labs and Sentinel-2 imagery. Moreover, COCKTAIL collects all settings used to create a classification and saves them, so the products can be easily reproduced. COCKTAIL works with remote storage providers to stash large files on low-cost servers. This is of particular interest when working in resource-constrained environments.
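COCKTAIL's internal interfaces are not described here, so the fragment below only sketches the reproducibility idea under stated assumptions: invoking an Orfeo ToolBox classifier from Python and saving every setting next to the output. The flag names follow OTB's TrainImagesClassifier application and, like the file names, are assumptions to verify against the installed OTB version.

    # Generic sketch of the settings-recording idea (not COCKTAIL's actual API).
    import json
    import subprocess

    settings = {
        "application": "otbcli_TrainImagesClassifier",   # OTB command-line application
        "io.il": "superdove_scene.tif",                  # input image (hypothetical file)
        "io.vd": "ground_truth.shp",                     # training polygons (hypothetical file)
        "sample.vfn": "class",                           # attribute holding the class label (assumed)
        "classifier": "rf",                              # random forest
        "io.out": "agroforestry_rf.model",
    }

    # Build the command line from the recorded settings and run the classification.
    cmd = [settings["application"]]
    for key, value in settings.items():
        if key != "application":
            cmd += ["-" + key, value]
    subprocess.run(cmd, check=True)

    # Save the complete parameter set next to the model so the run can be reproduced later.
    with open("agroforestry_rf.settings.json", "w") as f:
        json.dump(settings, f, indent=2)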

UBT E / N209 - Floor 3
12:00
12:00
30min
COMPARING DIFFERENT MACHINE LEARNING OPTIONS TO MAP BARK BEETLE INFESTATIONS IN REPUBLIC OF CROATIA
Nikola Kranjčić

This paper presents different approaches to mapping bark beetle infested forests in Croatia. Bark beetle infestation poses a threat to forest ecosystems and, because the affected areas are large and hard to access, mapping infested areas is difficult. This paper will analyse the machine learning options available in open-source software such as QGIS and SAGA GIS. All options will be applied to Copernicus data, namely Sentinel-2 satellite imagery. The machine learning and classification options that will be explored are the maximum likelihood classifier, minimum distance, artificial neural network, decision tree, K nearest neighbour, random forest, support vector machine, spectral angle mapper and normal Bayes. The maximum likelihood algorithm is considered the most accurate classification scheme, with high precision and accuracy, and because of that it is widely used for classifying remotely sensed data.
Maximum likelihood classification is a method that assigns an observation to the class under whose distribution it is most probable. An assumption of normality is made for the training samples. During classification, each unclassified pixel is assigned to the class with the highest relative probability (likelihood) of that pixel occurring within that category’s probability density function.
Minimum distance classification is probably the oldest and simplest approach to pattern recognition, namely template matching. In template matching, we choose a class or pattern to be recognized, such as healthy vegetation. An unknown pattern is then classified into the pattern class whose template best fits it; equivalently, an unknown distribution is classified into the class whose distribution function is nearest (minimum distance) to it in terms of some predetermined distance measure.
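To make the two classifiers above concrete, here is a minimal NumPy/SciPy sketch, assuming per-pixel band vectors and training labels: minimum distance assigns each pixel to the nearest class mean, while Gaussian maximum likelihood assigns it to the class with the highest log-likelihood under a normal distribution fitted per class. Array names are illustrative.

    # Sketch: minimum-distance and Gaussian maximum-likelihood classification of band vectors.
    import numpy as np
    from scipy.stats import multivariate_normal

    def fit_class_stats(X_train, y_train):
        """Per-class mean and covariance from training pixels of shape (n_samples, n_bands)."""
        return {
            c: (X_train[y_train == c].mean(axis=0), np.cov(X_train[y_train == c], rowvar=False))
            for c in np.unique(y_train)
        }

    def minimum_distance(X, stats):
        classes = list(stats)
        dists = np.stack([np.linalg.norm(X - stats[c][0], axis=1) for c in classes], axis=1)
        return np.array(classes)[dists.argmin(axis=1)]

    def maximum_likelihood(X, stats):
        classes = list(stats)
        loglik = np.stack(
            [multivariate_normal(mean=stats[c][0], cov=stats[c][1], allow_singular=True).logpdf(X)
             for c in classes],
            axis=1,
        )
        return np.array(classes)[loglik.argmax(axis=1)]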
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including the outcomes of random events, resource costs, and benefits. It is a way of representing an algorithm that contains only conditional control statements. Decision trees are commonly used in operations research, particularly in decision analysis, to identify the strategy most likely to achieve a goal, but they are also a popular tool in machine learning.
K nearest neighbour is a simple algorithm that stores all the available cases and classifies new data or cases based on a similarity measure. It is mostly used to classify a data point based on how its neighbours are classified.
Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by most trees; for regression tasks, the mean prediction of the individual trees is returned.
Support vector machines (SVM) are supervised learning models with associated learning algorithms that analyse data for classification and regression analysis. SVMs are among the most robust prediction methods, being based on statistical learning frameworks. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.
The spectral angle mapper is a spectral classifier that determines the similarity between image spectra and reference spectra by calculating the angle between them, treating each spectrum as a vector in a space with dimensionality equal to the number of bands used. Small angles between two spectra indicate high similarity, and large angles indicate low similarity.
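As a concrete illustration of the angle computation just described, the sketch below computes the spectral angle between a pixel spectrum and a reference spectrum; the band values in the example are made up.

    # Sketch: spectral angle (radians) between a pixel spectrum and a reference spectrum.
    import numpy as np

    def spectral_angle(pixel, reference):
        pixel, reference = np.asarray(pixel, float), np.asarray(reference, float)
        cos_angle = pixel.dot(reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
        return np.arccos(np.clip(cos_angle, -1.0, 1.0))  # small angle = high similarity

    # Example with invented band values:
    print(spectral_angle([0.12, 0.18, 0.35, 0.42], [0.10, 0.17, 0.33, 0.45]))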
Bayesian networks (normal Bayes) are a type of probabilistic graphical model that uses Bayesian inference to calculate probabilities. Bayesian networks aim to model conditional dependence by representing such dependencies as edges in a directed graph. They are designed to take an event that has occurred and predict the likelihood that any one of several possible known causes was a contributing factor.
Copernicus, previously known as Global Monitoring for Environment and Security (GMES), is a European programme for establishing a European capacity for Earth observation. The European Space Agency is developing satellite missions called Sentinels, with each mission based on a constellation of two satellites. The main objective of the Sentinel-2 mission is land monitoring, performed using a multispectral instrument. The Sentinel-2 mission has been active since 2015 and carries a multispectral imager (MSI) covering 13 spectral bands. The mission produces two main products, Level-1C and Level-2A. Level-1C products are tiles with radiometric and geometric corrections applied; geometric correction includes orthorectification. Level-1C products are projected using the UTM projection on the WGS84 ellipsoid. Level-2A products are considered the mission’s Analysis Ready Data.
Each method is evaluated with an error matrix, and the methods are compared with one another. A confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The name stems from the fact that it makes it easy to see whether the system is confusing two classes. Each error matrix is accompanied by a kappa value. The kappa coefficient is a statistic used to measure inter-rater reliability for qualitative (categorical) items. It is generally thought to be a more robust measure than a simple percent agreement calculation, as κ considers the possibility of the agreement occurring by chance.
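The evaluation step can be sketched as follows with scikit-learn, assuming reference labels and one classifier's predictions at validation pixels; the labels shown are invented.

    # Sketch: error (confusion) matrix and Cohen's kappa for one classification result.
    from sklearn.metrics import cohen_kappa_score, confusion_matrix

    y_true = ["healthy", "healthy", "infested", "infested", "healthy", "infested"]   # reference
    y_pred = ["healthy", "infested", "infested", "infested", "healthy", "healthy"]   # classifier output

    print(confusion_matrix(y_true, y_pred, labels=["healthy", "infested"]))
    print("kappa:", cohen_kappa_score(y_true, y_pred))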
All analyses are performed on data for Primorsko-goranska County in the Republic of Croatia.

UBT E / N209 - Floor 3
13:30
13:30
30min
Digital Earth Observation infrastructures and initiatives: a review framework based on open principles
Margherita Di Leo

In recent years, the democratisation of access to Earth Observation (EO) data, in parallel with the increased volume and variety of such data, has led to a paradigm shift towards “bringing the user to the data” [4]. This is exemplified by the European Copernicus Programme, which on a daily basis makes available terabytes of high-quality, openly licensed EO data suitable for a wide range of research and commercial applications. The computational power required to work with these large amounts of data, a renewed interest in Artificial Intelligence models, and the need for large storage volumes were met by a rise of cloud-based digital infrastructures and services. These infrastructures provide environments that can be readily instantiated and equipped with the necessary data and processing tools, all accessible in one place, in a highly automated and scalable manner to support users in analysing EO data in the cloud. Several such infrastructures as well as other initiatives (the latter also including services and components offering specific capabilities) have been developed, either as a byproduct of single companies leveraging enormous hyperscale computing powers (such as Google Earth Engine, Microsoft Planetary Computer and Earth on AWS) or as projects funded and operated by international communities that are primarily driven by specific policy objectives. Examples are projects publicly funded by the European Commission and the European Space Agency, such as the Data and Information Access Services (DIAS) platforms, and the Thematic and Regional Exploitation Platforms.
The current landscape of digital infrastructures and initiatives for accessing and processing EO data is fragmented, with varying levels of user onboarding and uptake success; see, e.g., [3]. Within this context, we offer a user-centric framework used to review 50+ existing digital infrastructures and initiatives for EO. Our work is expected to extend the scope and outlook of similar, smaller reviews [1], in which 7 digital infrastructures are qualitatively compared according to a set of ten criteria, mainly of a technical nature. The proposed review framework is conceptualised from a user-driven perspective by mapping user needs to current infrastructure and service offers, ultimately aiming at identifying overlaps and gaps in the existing ecosystem. The framework is organised around 5 pillars corresponding to common problem areas: 1) sustainability of the service, 2) redundancy of service, 3) user onboarding, 4) price and 5) user needs. Within each problem area, we further identified a number of good practices for user-centric developments of infrastructure and services. The good practices are derived from the authors’ longstanding experience in using digital EO infrastructures and are framed around several aspects related to open principles, from both the technical and the organisational side.
The first pillar is the sustainability of the infrastructure/initiative after the initial funding phase. Good practices include: fostering the creation of a community of users/developers that ensures preservation/evolution of the infrastructures/tools; releasing software under open source licenses, which encourages the reuse and growth of products considered to be useful by the community; adopting open standards and releasing specifications in the public domain, facilitating interoperability and reuse.
The second pillar is the fragmentation between infrastructures/initiatives causing redundancy of services. Relevant good practices involve the use of open source licensing models in favour of collaboration and reuse, the adoption of common open standards and Application Programming Interfaces (APIs), the federation of resources and federated authentication.
The third pillar consists of the steep learning curve often needed to start using digital infrastructures/initiatives; related good practices include, in addition to well-written and openly available documentation (including resources such as step-by-step videos and tutorials), the availability of sandboxing solutions that allow users to experiment with the infrastructure/initiative to understand if the offer matches the needs.
The fourth pillar is the price of using infrastructures, which is not always transparent and/or does not always clearly describe the services offered. The related good practice consists in providing a full and transparent list of services and related costs.
The fifth and last pillar is the top-down design and implementation of the infrastructure/initiative, with limited consideration of users’ needs. Good practices include co-design approaches, where users are actively involved in all phases and their feedback is used to adjust the developed prototype [2]; the establishment of helpdesks, forums, mailing lists and channels fostering community growth around the project; and the adoption of open source development and open governance.
The results of applying this review framework to 50+ digital EO infrastructures and initiatives shed light on a first set of limitations (from a user-driven perspective) common to many platforms. The most important include: discoverability of available datasets; steep learning curve to start using their services; difficulty to understand what the offered services are and whether they fit user needs; not fully transparent pricing; no reusability of software components; poor interoperability; vendor lock-in; no facilitation for code sharing/reuse; lack of guarantee of long-term sustainability of the infrastructure; internal policies hampering publication of commercial added-value code/algorithms. At the same time, the review identified some promising digital EO infrastructures and initiatives that already adopt most of the aforementioned good practices. These include, among others, the OpenEO API initiative, which aims to facilitate interoperability between cloud computing EO platforms, and the infrastructure of the Open Earth Monitor project, which adopts an open source, open data and open governance model by default.
This review, which is currently being applied to a growing number of infrastructures and initiatives, is expected to help the user community identify overlaps, gaps and synergies as well as to inform the providers of infrastructures and initiatives on how to improve existing services and steer the development of future ones.

UBT E / N209 - Floor 3
14:00
14:00
30min
Google Earth Engine and the Use of Open Big Data for Environmental and Climate-change Assessments: A Kosovo Case Study
Dustin Sanchez

Kosovo is one of the most environmentally degraded countries in Europe. It is also one of the poorest. The country lacks the capacity to conduct environmental assessments to gauge the scale of its environmental problems. It has even less capacity to understand its vulnerability to climate change and its prospects for sustainable development. This paper describes how available (open) resources can be used by the technically trained to understand environmental changes and to provide a framework for developmental research that yields practical understandings of climate impacts. There tends to be a lack of awareness of the tools and scant knowledge of their use towards sustainable development.
An environmental assessment of Kosovo using large and open remote-sensing data from Google Earth Engine is explained through an embedded multi-case design. Our approach used publicly available models and code walkthroughs from the book Cloud-based Remote Sensing with Google Earth Engine. The models were coded for Kosovo and the greater western Balkans region in JavaScript using Google Earth Engine open datasets to analyze environmental conditions in this region. This work demonstrates the value of free and open tool development and analysis for development of environmental sustainability. The use of open data requires careful analytical designs and the application of correct tools for specific regions and particular uses. Complex environmental conditions can muddle the data and analyses generated from open datasets. The “un-muddled” analysis performed here adds to the knowledge base of the environmental conditions within Kosovo and provides insight into regional assessment of changing climates.
Models for air pollution and population exposure, groundwater monitoring with GRACE, urban environments, and deforestation viewed from multiple sensors were compiled into an environmental assessment of the scopes and scales of several environmental issues that plague Kosovo. The air pollution and population exposure model assesses the human toll of air pollution in Kosovo. Groundwater monitoring with the Gravity Recovery and Climate Experiment (GRACE) appraises the health of aquifers and the security of water resources. Urban-environment analysis evaluates the changes occurring in urban locations in Kosovo, and the deforestation model is used to determine and evaluate the changes to several environments in Kosovo. The project also includes discussions of scalability to understand how the interconnected environmental conditions of the Balkans region can be further studied. The models, analytical frameworks, and overarching goals provide a robust strategy for the practical leveraging of remotely sensed data to deliver value in developing countries.
The methods are interchangeable and replicable for climate-change analysis, sustainability decision making, and the monitoring of environmental change. Urban expansion in Kosovo from 2010 to 2020 is studied with Landsat and MODIS mission data to understand the consequences of land-use change. The air pollution and population exposure model employs Sentinel-5P TROPOMI and population density data to help discern air pollution levels and the human toll of environmental degradation. The groundwater monitoring application uses Gravity Recovery and Climate Experiment data to clarify water storage capacities and trends within Kosovo’s aquifers. The forest degradation and deforestation model uses Landsat mission data to understand the changes occurring within the forests of Kosovo. The combination of these models creates a comprehensive case study of the environmental conditions within Kosovo and provides a baseline for understanding the effects of changing climates in the region. This information is crucial for developing effective strategies to address the challenges posed by climate change and to ensure a sustainable future for the region.
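The models themselves are written in the Earth Engine JavaScript API; purely as an illustration of the air-pollution component, the sketch below uses the Earth Engine Python API to average Sentinel-5P TROPOMI NO2 over an approximate Kosovo bounding box for one year. The dataset ID, band name, bounding box, and scale are assumptions to verify against the Earth Engine data catalog.

    # Sketch (Earth Engine Python API): mean tropospheric NO2 over an approximate Kosovo extent.
    import ee

    ee.Initialize()  # assumes Earth Engine authentication is already configured

    kosovo = ee.Geometry.Rectangle([20.0, 41.8, 21.8, 43.3])   # rough bounding box, illustrative
    no2 = (
        ee.ImageCollection("COPERNICUS/S5P/NRTI/L3_NO2")       # dataset ID to verify in the catalog
        .select("tropospheric_NO2_column_number_density")      # band name to verify
        .filterDate("2022-01-01", "2023-01-01")
        .filterBounds(kosovo)
        .mean()
        .clip(kosovo)
    )

    stats = no2.reduceRegion(reducer=ee.Reducer.mean(), geometry=kosovo, scale=7000)
    print(stats.getInfo())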
This paper clarifies the methods used for modeling big data sets in Google Earth Engine to generate products that can be used to assess both climate change and environmental change. We explore frameworks for cloud computing of open-data environmental analyses by evaluating data selection and analytical techniques, providing an analytical framework for future development. We further build a cross-sectional understanding of how Google Earth Engine can be leveraged with analytical frameworks that support academic resilience building and yield products that can carry over into government institutional knowledge building, private-sector sustainable-development gaps, and public-sector environmental and climate strategies.
The emergence of new technologies has provided opportunities for new approaches to broadly understanding the impacts of global climate change, and free-to-use frameworks make that understanding attainable for developing countries. The use of this technology enables the development of a regional understanding of climate change, its impacts, and approaches for enhancing resilience through the analysis of petabytes of open satellite data. This paper delivers a framework with which remotely sensed data can be assessed to understand how human-environment interactions in developing nations will be influenced by changing climates. The models, while functionally different, share environmental links and together point towards the use of open big data for building climate-change resilience through an end-to-end, remotely sensed understanding of what the data mean and how they can be applied.

UBT E / N209 - Floor 3
14:30
14:30
30min
An application-oriented implementation of hexagonal on-the-fly binning metrics for city-scale georeferenced social media data
Dominik Weckmüller

Introduction

The analysis of georeferenced social media (SM) data holds broad potential for informing municipal policy-making. Local adaptation to climate change and disaster resilience, transforming city centers, gentrification, and demographic change are significant challenges for municipalities.
In light of these pressing topics, a growing awareness of data-driven decision-making has fostered geospatial interfaces that allow practitioners to interactively explore data sources.
SM in particular offers the potential of a live feed and a continuous reflection of events at scale. Although many studies have an urgent need for a purpose-driven, customized visualization of spatial data, little emphasis has been put on how to display these data.
Many studies on map-based visualization in SM use traditional cartographic methods, such as pins or choropleth maps, with varying color scales or heatmaps to represent absolute or relative values. However, SM data presents challenges that require more sophisticated statistical metrics and flexible visualization techniques. We assess the signed chi metric, specifically designed for mapping via binning, and expand its use in a Bonn case study using an on-the-fly hexagonal binning method for frontend applications like dashboards. We then evaluate the advantages and disadvantages of the various proposed metrics and visualizations in terms of their practical applications.

Problem Statement

As the overview by Teles da Mota & Pickering (2020) has shown, research involving geo-SM from different platforms has become increasingly popular but bears specific problems inherent to the characteristics of volunteered geographic information (VGI) – volume, veracity, velocity and variety are just broad categories used to characterize these.

Firstly, access to SM databases, such as those of Meta or Twitter, is usually limited to capital-intensive partner companies. Instagram's public-facing API is largely undocumented and opaque to end users, causing uncertainty about data selection criteria (Dunkel 2023). Hence, the lack of knowledge about data context and possible biases can affect the representativeness of the data subset.

Second, "super users" sharing repeated content may create noise and skew analysis outcomes if absolute values are solely considered.

Third, as Teles da Mota & Pickering (2020) point out, research has been conducted mainly for large areas ranging from national parks to entire countries or, more rarely, even the whole world (cf. Dunkel et al. 2023). Studies working with data at the municipal level, where individual locations and differences of only a few meters play a significant role, usually focus not on methodological cartographic issues or appropriate metrics but rather on effectively communicating core research results. Due to this lack of reference material for the municipal level, we identify a research gap regarding proper visualization methods.

Lastly, VGI, as practiced by Instagram, poses a unique problem for researchers. Users are allowed to create public "Instagram Locations" and tag their posts with a coordinate of their choice, which can then be referenced by other users as well. However, the user is not obligated to provide a clear definition of what exactly is meant by the location they choose, creating ambiguity. For instance, the "Bonn" location's coordinates (50.7333, 7.1) are situated in the city's center. What it actually refers to is entirely subject to the interpretation of the user: it could refer to different extents of the city center, the official administrative boundaries of Bonn or anything loosely associated with Bonn, including cultural references or events. This ambiguity, of which Meta is aware (Delvi et al. 2014), can be observed at different zoom levels, such as city districts, cities, countries or continents, throughout Instagram data and poses an enormous challenge to researchers working with city-scale areas of interest.

Research Interest

Thorough data cleaning alone is insufficient to deal with these challenges. We propose an application-oriented system of metrics for data processing and visualization, depending on the user’s needs, by comparing possible application scenarios as well as limitations based on a case study for the city of Bonn with Instagram data from 2010 to 2022:
1. Absolute values – absolute number of observed posts per location or bin
2. Relative values – relation between observed and expected posts per location or bin
3. Signed chi – statistic value indicating significance and direction per location or bin

The observed value usually refers to a quantity found at a specific bin, using a specific query such as a thematic filter. In contrast, the expected value often refers to an average quantity of a generic query, such as the average of all SM posts in Bonn, and it is used to identify over- or underrepresented spatial patterns at local bins (Visvalingam 1978). However, which quantities are used as the observed and expected values for normalization is up to the analyst (Wood et al. 2007). One could also compare average thematic posts in all German cities (the expected value) to those found in Bonn, as a means to concentrate on the difference of the subject under analysis (posts in the city of Bonn). Another option could be to use discrete historical time intervals as the expected value and compare them to recent post quantities to identify recent and unusual spatial posting trends.
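To make the third metric concrete, the sketch below implements one common formulation of the signed chi value per bin, in which the expected count is the baseline (generic) count rescaled so that both datasets sum to the same total; the exact scaling used in the case study is an assumption.

    # Sketch: signed chi per bin from observed (thematic) and generic (baseline) counts.
    import numpy as np

    def signed_chi(observed, generic):
        observed = np.asarray(observed, dtype=float)
        generic = np.asarray(generic, dtype=float)
        # Rescale the baseline so that the expected counts sum to the observed total.
        expected = generic * observed.sum() / generic.sum()
        with np.errstate(divide="ignore", invalid="ignore"):
            chi = np.where(expected > 0, (observed - expected) / np.sqrt(expected), 0.0)
        return chi  # positive = over-represented, negative = under-represented

    # Example with invented bin counts:
    print(signed_chi([12, 3, 0, 40], [100, 80, 20, 300]))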

We evaluate these metrics through a hexagonal on-the-fly binning approach with different color scaling and propose easily customizable scripts for the leaflet-d3 plugin. We provide all our scripts for reproduction with explanations and usage recommendations as well as a demo dashboard in a public GitHub repository.
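The dashboard itself relies on the leaflet-d3 hexbin plugin; purely as a static, non-interactive analogue, the sketch below bins thematic and baseline post coordinates into the same hexagonal grid with Matplotlib and recolors each hexagon by the signed chi value defined above. All coordinates are random placeholders.

    # Static analogue of hexagonal on-the-fly binning: color hexagons by signed chi.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    lon_all, lat_all = rng.normal(7.10, 0.03, 5000), rng.normal(50.730, 0.020, 5000)   # baseline posts
    lon_thm, lat_thm = rng.normal(7.11, 0.02, 800), rng.normal(50.735, 0.015, 800)     # thematic posts

    extent = (7.0, 7.2, 50.68, 50.78)
    fig, ax = plt.subplots()
    hb_all = ax.hexbin(lon_all, lat_all, gridsize=30, extent=extent)
    generic = hb_all.get_array().copy()
    hb_all.remove()                                   # only needed to obtain baseline counts
    hb_thm = ax.hexbin(lon_thm, lat_thm, gridsize=30, extent=extent)
    observed = hb_thm.get_array()

    expected = generic * observed.sum() / max(generic.sum(), 1.0)
    chi = np.where(expected > 0, (observed - expected) / np.sqrt(expected), 0.0)

    hb_thm.set_array(chi)                             # recolor hexagons by the signed chi value
    hb_thm.set_cmap("RdBu_r")
    hb_thm.set_clim(-np.abs(chi).max(), np.abs(chi).max())
    fig.colorbar(hb_thm, ax=ax, label="signed chi")
    plt.show()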

Our findings suggest that all of the investigated metrics can offer insight into the data, but their appropriate use highly depends on the research question at hand. When using the dashboard frontend, outliers should be highlighted, non-significant values reduced in opacity, or intra-dataset validations carried out through automatic comparisons across metrics and filters. Overall, the absolute metric is to be used sparingly. The relative metric generates only a very narrow gain in knowledge, whereas the signed chi metric yields the best overall results and deals very well with the above issues.

UBT E / N209 - Floor 3
15:00
15:00
5min
Adaptation of QGIS tools in high school geography education
Jakub Trojan

Geographic Information Systems (GIS) have been around for more than 60 years and have become a significant part of many scientific disciplines with a spatial component. In the last decades, educators have been trying to figure out how to adopt GIS tools for their own field of study, the classroom (Milson et al., 2012). Since then, several studies of their efforts have been carried out. Thanks to the emergence of open source software and open data, new opportunities for their visions have unfolded (Petráš, 2015). QGIS in particular has lately been getting more attention in environments where teachers do not have access to sufficient funding.

Educators, backed by years of research, believe that by collecting, displaying and analyzing spatial data, students can solve local problems and foster and drive their learning of geographical phenomena. Through the use of GIS they are expected to gain digital skills and extraordinary thinking that can be essential for their future careers, and to be motivated to pursue a career in science and engineering (Bednarz, 2004).

Implementation of GIS software into high school geography classes is, however, a lengthy process that requires a lot of patience and confidence. A teacher may come across four major obstacles: 1) lack of hardware, software or data, 2) lack of teacher training and materials, 3) lack of support for innovations, and 4) lack of time to learn and teach GIS (Kerski, 2003). The biggest issue has come to be the insufficient pre-service and in-service teacher training in geoinformatics and its application. A recent systematic study (Bernhäuserová et al., 2022) has concluded that the majority of the limits were related to teachers and resources.

In our study, we have tried to create strategies that can lead to the successful adaptation of QGIS tools in high school geography education. To reach this goal and answer further questions, we designed ten lectures that focus on the basics of QGIS, drawing inspiration from several official QGIS cookbooks and manuals. In each lesson, we applied a set of the most essential tools. For our study, we chose the qualitative method of design-based research (DBR), which focuses on designing study materials, testing them in classes and developing a theory (methodology) that can innovate learning environments (Bakker, 2018). To pilot our ready-to-use lectures and data, we partnered with a four-year South Moravian high school based in Brno, Czechia, which offered us two classes of second-year and final-year students. The research lasted three months, during which we taught 12 courses. The older students tried out lectures 1 to 7, except 6 (1 and 2 at home), and the younger students tried lectures 1 to 3 and 8 to 9. After every class, students filled out a short questionnaire reflecting on their feelings and experience. As homework, they had to complete a set of exercises for each lecture and turn it in along with the finished maps. At the end of each trial, the groups were tested on their knowledge. Based on the observations carried out in each class, three categories of student experience were drawn up: those who had no problem following the lecturer's instructions, those who often faced problems, and those who worked independently. Students were asked to identify with one of these categories and then invited to participate in a voluntary interview in which their experience would be discussed.

During both trials, students had to bring their own computers, which, for some, caused several issues, from failed installations to technical complications during each lecture. The large number of students in each class (approx. 30) also showed that the lecturer cannot assist every student under such conditions. Students chose different approaches and strategies. Most of them wanted to finish the task and faced no problems. A much smaller number focused on understanding and worked independently. Only a few played with the program and found interest in it. In each group, only one student had previous experience with QGIS. Nevertheless, most of the students understood every lecture and found its content enjoyable, and in the test they proved to have learned the basics of the program. If it were up to them, they would implement GIS in the geography curriculum, change the tempo of the lectures (to progress more slowly) and divide the classes into smaller groups, which would benefit both parties. The older students were less motivated to participate; they were used to more passive classes and did not have enough free time to focus on anything except their graduation exam. The younger students were easier to motivate; more of them were interested in geography and had more time for homework. Both groups produced unique maps, which display their gradually acquired cartography skills and knowledge. They advise anyone interested in learning QGIS to have enough patience, gather good learning materials (referring to the ones we made) and work on a computer they know very well.

UBT E / N209 - Floor 3
15:05
15:05
5min
Teaching Geographic Information Science concepts with QGIS and the Living Textbook – towards a sustainable and inclusive Distance Education
Andre da Silva Mano

In recent years, the need for distance education solutions has been a point of attention for the Faculty ITC of the University of Twente (The Netherlands). Starting in 2017, a fully online program spread over nine months offered an alternative path to start an MSc in Geo-Information Science and Earth Observation. As using proprietary software is more difficult in distance courses, the focus shifted towards open-source alternatives. The experience and lessons learned came to their full potential when, in 2020, many students could not travel due to the travel restrictions imposed by the COVID pandemic. In response, ITC offered the fully online course Principles and Applications of Geographic Information Systems and Earth Observation as the first quartile of what is normally a fully on-campus MSc program. The course was developed around four fundamental principles: (1) the course was exercise-led; (2) every concept taught should be demonstrated and operationalized; (3) the number of different software tools should be minimized; (4) the software tools should be inclusive and encourage technological independence. Two open-source tools were selected: the Living Textbook, a digital textbook developed and maintained by us [1], and QGIS to operationalize the concepts. For synchronous communication and interaction, Big Blue Button conferences were integrated into the Learning Management System environment and organized according to time zones to serve a student population spread across eight time zones.

After running the course, we evaluated the impact of the new set-up on students (satisfaction and performance) and staff (attitude towards open source tools and open courseware). Additionally, we evaluated the impact of the course in strengthening the wider Open Science initiative. Results show that, for students, both satisfaction levels and attainment of the course’s learning outcomes were high. For the teachers, the feedback was generally positive, highlighting the importance of using flexible and inclusive tools. The courseware developed for the course is now offered to the Open Science community as open courseware [2]. It is the basis for having the Faculty recognized as a QGIS Certified Organization, thus strengthening the relationship between academia and FOSS4GIS, particularly QGIS.

Internally, this experience brought essential insights into successful online course design. These include, but are not limited to, (A) consistency – the tools and support materials should remain the same throughout the course; and (B) accessibility – the tools used should not have any accessibility barrier, especially when it comes to licenses, but also when it comes to imposing operating system platforms or assuming file format preferences. Important results include a change in the teaching staff's attitude towards a more aware and confident use of FOSS4GIS. That change resulted in a faculty-wide paradigm shift in which FOSS4GIS is now the primary choice for teaching. Finally, on a larger plane, ITC's commitment to adopting and contributing to the development of Open Source Software is an essential element of its commitment to the Open Science agenda.

[1] https://www.itc.nl/about-itc/organization/resources-facilities/living-textbook/
[2] https://principles-and-applications-of-rs-and-gis.readthedocs.io/en/latest/


UBT E / N209 - Floor 3
No sessions on Saturday, July 1, 2023.