BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.staging.osgeo.org//foss4g-2022//talk//B8JZTD
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-2022-B8JZTD@talks.staging.osgeo.org
DTSTART;TZID=CET:20220825T171500
DTEND;TZID=CET:20220825T174500
DESCRIPTION:We present *dask-geomodeling*: an open source Python library fo
 r stream processing of GIS raster and vector data. The core idea is that d
 ata is only processed when required\, thereby avoiding unnecessary computa
 tions. While setting up a dask-geomodeling computation\, there is instant
  feedback on the result. This results in a fast feedback loop in the (geo)
  data scientist’s work. Big datasets can be processed by parallelizing
  multiple data queries\, either on a single machine or on a distributed
  system. \n\n### Abstract \nIn geographical information systems (GIS)\, we
  often deal with data pipelines to derive map layers from
  various datasets. For
  instance\, a water depth map is computed by subtracting the digital
  elevation model (DEM) from a water level map. Such procedures are often
  done using open source products such as PostGIS and QGIS. However\, for
  medium to large datasets (> 10 GB)\, these analyses are hampered by
  memory restrictions and high computational cost. As a rule\, these
  issues are tackled by manually cutting the dataset into smaller parts.
  However\, this is a tedious and time-consuming task. For analyses that
  must be repeated regularly\, this is not feasible. \n\nWe present the
  open source Python library *dask-
 geomodeling* [1] to solve this issue. Instead of a script\, dask-geomodeli
 ng requires a so-called “graph”\, which is the definition of all opera
 tions that are required to compute the derived dataset. This graph is gene
 rated by plain Python code\, for instance: \n\n```\n
 from dask_geomodeling.raster import RasterFileSource\n\n
 plus_one = RasterFileSource('path/to/tiff') + 1\n```\n\nNote that these
  operations are lazy: no actual computation is done\, so the above line
  executes fast. The computation only happens when data is requested:
  \n\n```\ndata = plus_one.get_data(\n    bbox=(155000\, 463000\, 156000\,
  464000)\,\n    projection='epsg:28992'\,\n    width=1000\,\n
     height=1000\,\n)\n```\n\nOnly then is an array containing the
  data comp
 uted. No need to load the whole TIFF-file in memory if you only use a smal
 l part! \n\nThe computation occurs in two steps. First\, a computational g
 raph is generated containing the required functions. While generating the 
 computational graph\, the operations may be chunked into smaller parts. Se
 cond\, this graph is evaluated by *dask* [2]\, using any scheduler (single
  thread\, multithreading\, multiprocessing\, distributed) provided by
  dask. \n\nThis library is open source under the name
  “dask-geomodeling” and is distributed on GitHub\, PyPI\, and Anaconda.
  A hosted cloud version is also available under the name Lizard Geoblocks
  [3]. Currently\, we
  have implemented a range of operations for rasters\, vectors\, and combin
 ations. The community is welcome to use our library\, benefit from it\, an
 d expand it! \n\nReferences \n\n1. dask-geomodeling\, https://github.com/n
 ens/dask-geomodeling\, https://dask-geomodeling.readthedocs.io/ \n2. dask\
 , https://dask.org/\n3. Lizard Geoblocks\, https://lizard.net/
DTSTAMP:20260412T044659Z
LOCATION:Room 4
SUMMARY:Agile Geo-Analytics: Stream processing of raster- and vector data w
 ith dask-geomodeling - Casper van der Wel
URL:https://talks.staging.osgeo.org/foss4g-2022/talk/B8JZTD/
END:VEVENT
END:VCALENDAR
