BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//talks.staging.osgeo.org//foss4g-2024-academic-track//tal
 k//KPYFTX
BEGIN:VTIMEZONE
TZID:-03
BEGIN:STANDARD
DTSTART:20000101T000000
RRULE:FREQ=YEARLY;BYMONTH=1
TZNAME:-03
TZOFFSETFROM:-0300
TZOFFSETTO:-0300
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-foss4g-2024-academic-track-KPYFTX@talks.staging.osgeo.org
DTSTART;TZID=-03:20241204T154500
DTEND;TZID=-03:20241204T161500
DESCRIPTION:Innovations\, such as voice recognition and natural language pr
 ocessing (NLP)\, have significantly impacted various fields by enabling mo
 re natural interactions between humans and machines (Mahmoudi et al.\, 202
 3). In geoinformatics\, these advances are crucial for visualising geospat
 ial data\, allowing the creation of interactive and dynamic maps (Craglia 
 et al.\, 2012). Online mapping applications\, like OpenStreetMap (OSM)\, h
 ave democratised spatial information by enabling public participation in i
 ts creation and maintenance (Haklay\, 2010). Geolocation is essential in c
 ontemporary applications\, such as navigation\, emergency services\, and l
 ocation-based services. Google Colaboratory (or Colab) Notebook Environmen
 t stands out in promoting open science due to its accessibility\, ease of
  use\, and collaborative capabilities\, all of which support the FAIR prin
 ciples (Camara et al.\, 2021). This study aims to develop a voice
  interaction application in Google Colab Notebook Environment to answer th
 e question: "Is it possible to develop a voice command application for geo
 location and visualisation of geospatial data within the Google Colab envi
 ronment?" The methodology includes FOSS libraries and tools such as geopy\
 , speech_recognition\, ffmpeg\, librosa\, and flask\, subdivided into six 
 stages: Audio Data Acquisition\, Audio Processing\, Speech Recognition\, G
 eocoding\, Visualization\, and Interface Development. The complete code\, 
 under an open license\, and how to reproduce this work are available on Gi
 tHub. Audio capture is performed using the Web Speech API in JavaScript (J
 S)\, which allows real-time voice recognition and integration with the Med
 iaDevices API to access the user's microphone. This method provides an int
 erface for high-quality audio recording\, essential for speech recognition
  and geocoding accuracy. Audio processing involves converting the ".webm" 
 format to ".wav" using ffmpeg while preserving the original audio quality.
  The Librosa library loads the audio\, adjusts the sampling rate\
 , and extracts relevant features from the audio signal\, such as spectrogr
 ams (Bisong\, 2019). Speech recognition is performed with the SpeechRecogn
 ition library in Python\, which provides an interface for various speech r
 ecognition services\, including the Google Web Speech API. This choice is 
 due to its high accuracy and support for multiple languages\, ensuring the
  system's flexibility and accessibility to a diverse audience (Nassif et a
 l.\, 2019). Geocoding transforms textual descriptions of locations into ge
 ographic coordinates\, allowing the visual representation of these locatio
 ns on an interactive map. The geopy library and the Nominatim service from
  OSM are used to convert addresses into latitude and longitude coordinates
  (Mooney & Corcoran\, 2012). For the visualisation of geocoded data\, a we
 b server was implemented using Flask\, a microframework for Python that al
 lows the creation of lightweight and efficient web applications. The user 
 interface was developed with HTML\, CSS\, and JS\, providing an intuitive 
 and interactive experience. The results show that interaction between the
  user and the machine was satisfactory. The first message displayed to the
  user instructs them to slowly state the name of the city\, state\, or
  country t
 hey wish to geolocate. The use of JS and the Web Speech API allowed the sy
 stem to detect specific voice commands to start and stop recording\, as in
 dicated by the interface colours and states. This step is crucial\, since s
 ubsequent steps require the captured audio to be clear and understandable.
  When the start command is recognised\, the interface changes to indicate 
 that the recording is in progress. The message "Command recognised: starti
 ng recording" confirms that the command was detected correctly. If the voi
 ce command is not recognised\, the interface displays a message asking the
  user to repeat the command. After recording\, the audio is saved in ".web
 m" format. If a previous audio file exists\, it is automatically overwritt
 en. This approach simplifies file management and avoids the accumulation o
 f unnecessary data. Next\, the audio is converted to ".wav" format using t
 he ffmpeg library. Then\, the audio is transcribed through the SpeechReco
 gnition interface to the Web Speech API in the recognised language\, and
  the system displays the transcription along with confirmation of the ge
 ocoded location and its respective latitude and longitude. The visual fe
 edback proved essential for the user to confi
 rm that the entered information was recognised\, improving the system's us
 ability. The displayed information includes city\, region\, country\, lati
 tude\, and longitude. The interactive map allows the user to visualise and
  interact with the located area\, altering the zoom level and receiving a 
 voice message informing the map's current zoom level. This work presented 
 the integration of tools that assist in advances in human-computer interac
 tion in geoinformatics\, offering an intuitive and accessible interface fo
 r users of different technical proficiency levels. The results confirm the
  feasibility of voice command geolocation in Google Colab\, a platform tha
 t can be used for education\, research\, collaboration\, and sharing in sc
 ience\, enabling this work's reproducibility. Future research can improve 
 voice interaction features\, explore geolocation methods such as bounding 
 boxes\, and reduce dependence on JS and Flask. Improving the requirements 
 for peripheral devices could further improve the system's accuracy\, acces
 sibility\, and user experience. The importance of geospatial accessibility
  lies in enhancing service provision\, urban planning\, and social inclusio
 n\, facilitating mobility for people with disabilities\, and improving urb
 an infrastructure (Han et al.\, 2020).
DTSTAMP:20260513T083941Z
LOCATION:Room II
SUMMARY:Natural Language Processing and Voice Recognition for Geolocation a
 nd Geospatial Visualization in Notebook Environment - Nathan Damas
URL:https://talks.staging.osgeo.org/foss4g-2024-academic-track/talk/KPYFTX/
END:VEVENT
END:VCALENDAR
