07-03, 11:30–12:00 (Europe/Tallinn), GEOCAT (301)
Metadata, YAML files and pipelines? When I try to convince my colleagues that the approach described in this presentation is fun, they look at me in bewilderment.
This presentation will highlight the use of pygeometa, mdme and DevOps workflows in two projects from different domains of interest.
Land-Soil-Crop data
ISRIC is endorsing the pygeometa MCF format, a YAML-based representation originally developed as a subset of ISO 19115 metadata, advertised by the pygeometa community as 'Metadata Creation for the Rest of Us'. YAML reads much better than XML and is well suited to content versioning in Git. But YAML comes with its peculiarities, such as strict indentation and reserved characters.
'Average users should not look at code, instead use shiny (web) interfaces' is an often-used quote, but we are not used to reversing it: "As a DevOps engineer I hate shiny interfaces. I want to look at code, see the history of that code, who changed what, when, and how I can fix it".
This is where the fun part of pygeometa MCF comes in. CI/CD pipelines which run on content changes validate the YAML format and report errors to the submitters.
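The kind of check such a pipeline might run can be sketched in Python. This is a minimal sketch and not the pygeometa validator: the REQUIRED table and the function name check_mcf are assumptions made for illustration, and the record is taken as an already-parsed dictionary (in practice it would come from a YAML parser).

```python
# Hypothetical required keys per section; a real pipeline would rely on
# the MCF schema rather than on this hand-written table.
REQUIRED = {
    "mcf": ["version"],
    "metadata": ["identifier"],
    "identification": ["title", "abstract"],
}


def check_mcf(record):
    """Return a list of human-readable problems found in a parsed record."""
    problems = []
    for section, keys in REQUIRED.items():
        content = record.get(section)
        if not isinstance(content, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            # Treat absent keys and empty strings alike as missing values.
            if content.get(key) in (None, ""):
                problems.append(f"missing value: {section}.{key}")
    return problems
```

A CI job could print the returned messages on the submitter's change request and fail the build whenever the list is not empty.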
Should we then fully neglect the basic user? Of course not! So we crafted web-based forms that generate MCF (osgeo.github.io/mdme) and added import options for Excel sheets (every column is a metadata field). Consider that many data scientists (fortunately) are used to placing a README.md in any project folder. We just ask them to structure the content using YAML. We added an inheritance mechanism, so common properties (contact details, usage constraints) are entered only once and inherited by lower levels in the folder hierarchy. In addition, embedded metadata (bounds, projection, format) is extracted from data files or online sources.
All this metadata is crawled into a central search index (pycsw/pygeoapi/geonetwork). To increase the participatory experience we added 'Edit me on GIT' links to each of the records, which bring users back to the original MCF file to suggest changes.
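The inheritance idea can be illustrated with a small Python sketch. It assumes each folder's metadata has already been parsed into a dictionary and stored under its relative path (the empty string stands for the root). The function names merge and resolve and the sample keys are hypothetical and do not reflect the actual implementation.

```python
import copy


def merge(parent, child):
    """Deep-merge two dicts; values in child override those in parent."""
    result = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve(folder, records):
    """Merge the records of every ancestor of folder, root first."""
    parts = [p for p in folder.split("/") if p]
    effective = merge({}, records.get("", {}))
    for depth in range(1, len(parts) + 1):
        path = "/".join(parts[:depth])
        effective = merge(effective, records.get(path, {}))
    return effective


# Hypothetical example: contact details are declared once at the root.
records = {
    "": {"contact": {"organization": "Example Org"}},
    "soil/ph": {"identification": {"title": "Soil pH"}},
}
effective = resolve("soil/ph", records)
```

Here effective carries both the contact block from the root and the title declared in soil/ph. A deeper folder can still override an inherited value by declaring the same key again.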
Weather/climate/water metadata
The WMO Information System (WIS2) is the next-generation data exchange infrastructure for real-time and archive weather/climate/water data. Discovery metadata is a key component for cataloguing and discovery. In an event-driven architecture, metadata files are managed on GitHub; each change triggers a CI/CD workflow that generates compliant WMO discovery metadata, validates it and publishes it to an MQTT broker.
Tom Kralidis is with the Meteorological Service of Canada and is a longtime contributor to FOSS4G. He contributes to numerous projects in the Geopython ecosystem.
Tom is the co-chair of the OGC API - Records Standards Working Group, chair of the WMO Expert Team on Metadata, and serves on the OSGeo Board of Directors.
DevOps engineer at ISRIC - World Soil Information. We maintain a range of datasets and catalogues related to global soil property distribution (chemical, physical and biological).