BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.staging.osgeo.org//foss4g-europe-2025//talk//JQMHC
 W
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-europe-2025-JQMHCW@talks.staging.osgeo.org
DTSTART;TZID=CET:20250718T140000
DTEND;TZID=CET:20250718T143000
DESCRIPTION:Population census is one of the most complex statistical undert
 akings of a country\, resulting in detailed social\, demographic\, and eco
 nomic data about its population. Achieving and maintaining a populatio
 n's welfare relies heavily on effective socio-economic policies that ar
 e rooted in census data. According to the United Nations 2020 World Pop
 ulation and Housing Census Programme\, census data is the backbone for f
 ormulating\, implementing\, and monitoring such policies (United Nation
 s Statistical Commission\, 2015)\, as it allows policymakers to make da
 ta-driven decisions and target economic and social challenges more effe
 ctively.
 \nIn the European context\, the collection of census data has a long-st
 anding tradition and today is governed by legal frameworks such as EU R
 egulation 763/2008 on population and housing censuses\, which standardi
 zed methodology and ensured comparability across countries. However\, t
 he way census data is disseminated was transformed by the adoption of t
 he 2019 EU Open Data Directive\, which obliged governments to publish d
 ata without restrictions for anyone to reuse. Not only does the Directi
 ve argue for free and available data\, it also classifies census data a
 s high-value data. This classification underscores its immense potentia
 l for fostering societal and economic development and urges its provisi
 on in machine-readable formats or via suitable APIs to support these go
 als.\nPublishing census data as open data on the web undoubtedly improv
 es data accessibility\, but simply making the data available does
  not necessarily eliminate the challenges associated with data integration
  and interoperability. A potential solution to this problem lies in shi
 fting from a web of documents\, which is inherently designed for human co
 nsumption\, to a web of data\, where structured\, machine-accessible da
 ta enables automated processing and integration by computers (Hogan\, 2
 020). To assess how effectively open data supports the transition to a we
 b of data\, Tim Berners-Lee proposed the 5-star deployment scheme for L
 inked Open Da
 ta (LOD). This rating system evaluates the openness and interoperability o
 f data based on a set of key principles. At the most basic level (one-star
  data)\, data is merely published on the web\, regardless of format. As th
 e data becomes more structured and adheres to semantic web standards\, it 
 progresses through higher levels of the scheme. The highest level\, fiv
 e-star LOD\, represents a dataset that is semantically described\, stru
 ctured in standard formats\, and interlinked with other dataset
 s. This transition from isolated\, file-based datasets to interconnected\,
  structured data allows scattered data to be connected into a global knowl
 edge ecosystem\, paving the way for more intelligent data use.\nThe concep
 t of LOD is intrinsically linked to the Semantic Web\, as it relies on sem
 antic web technologies such as RDF (Resource Description Framework) tri
 ples\, SPARQL\, and ontologies to structure and interlink datasets acro
 ss the web. By adhering to the LOD principles (use of URIs\, HTTP acces
 s\, RDF structure\, and links to other datasets)\, LOD facilitates da
 ta discoverabi
 lity\, interoperability\, and reusability\, ultimately allowing for richer
 \, more insightful analyses across multiple domains.\nThe growing demand f
 or semantically interoperable population census data (LOD) has exposed the
  limitations of traditional data dissemination formats\, prompting the nee
 d for more flexible and machine-processable solutions. In Croatia\, the na
 tional statistical agency provides geocoded census data as open data pr
 imarily in .xlsx format\, which imposes significant constraints on auto
 mated data processing and cross-domain analysis. To overcome these limi
 tations\, t
 his research aims to provide the basis for publishing the Croatian census 
 as LOD by utilizing the capabilities of OpenRefine\, a fully open source d
 ata processing tool. By using open source technology\, we ensure that each
  step of the transformation – from data cleaning to RDF triple generatio
 n – is transparent\, reproducible and adaptable to diverse datasets and 
 research contexts\, which is in line with the principles of openness advoc
 ated by FOSS4G communities.\nTo achieve the proposed goal\, the methodolog
 y includes three main steps: (1) data source identification\, (2) semant
 ic description of the census data\, and (3) data transformation to RD
 F tripl
 es. The census data provided by the Croatian Bureau of Statistics is ident
 ified as the primary dataset. This data pertains to the 2021 Census\, is a
 ggregated at the administrative spatial unit level\, and is provided in .x
 lsx format. Additionally\, corresponding spatial unit geometries are obtai
 ned from the State Geodetic Administration\, which provides geospatial bou
 ndaries in ESRI Shapefile format. The semantic description reuses exist
 ing vocabularies and ontologies to define a structured conceptual model fo
 r census data. Specifically\, the W3C RDF Data Cube Vocabulary is employ
 ed to model the semantic structure of census attributes\, while the OGC Ge
 oSPARQL vocabulary is integrated to incorporate geospatial components\, en
 suring that census data is linked to its corresponding spatial regions an
 d geometries. In the final step\, the .xlsx census data is transformed in
 to RDF triples using OpenRefine. The existing tabular structure is mapp
 ed to the RDF Data Cube schema\, with spatial units classified as qb:Di
 mensionProperty\,
  population counts as qb:MeasureProperty\, and units of measurement as qb:
 AttributeProperty. Finally\, GeoSPARQL classes are utilized to extend spat
 ial units with geospatial properties\, such as polygon geometries.\nPublis
 hing census data as LOD using the OpenRefine data manipulation tool dem
 onstrates that existing open technology and conceptual models can easil
 y support the transition to a web of data. However\, a potential limita
 tion of the presented approach lies in its reliance on predefined conce
 pts within th
 e RDF Data Cube schema\, rather than extending the ontology to include mor
 e detailed domain-specific concepts. Nonetheless\, converting Croatia’s 
 census data into LOD represents a significant step toward improved data in
 tegration\, enhanced accessibility\, and the generation of new insights. F
 uture research efforts in this direction may focus on expanding the sco
 pe of the LOD cloud by integrating housing census data and establishing li
 nkages between population and housing statistics in an LOD framework.
DTSTAMP:20260527T053250Z
LOCATION:PA01 (Quarticle)
SUMMARY:Open Technologies Supporting Linked Open Data Publishing: Croatian 
 Population Census Case Study - Karlo Kević
URL:https://talks.staging.osgeo.org/foss4g-europe-2025/talk/JQMHCW/
END:VEVENT
END:VCALENDAR
