Andri Riid
Andri Riid received his M.Sc. and Ph.D. degrees in System Engineering from Tallinn University of Technology in 1997 and 2002, respectively. He currently works as a Senior Research Scientist in the Laboratory for Proactive Technologies, Department of Software Science of the same university and as AI Team Lead at EyeVi Technologies. His research interests include computational intelligence, signal processing, machine learning, neural networks and computer vision. He has published over 60 scientific papers in peer-reviewed journals and conference proceedings.
Sessions
Highway location markers are the modern-day equivalent of what are historically known as milestones and are typically small signs on the side of the road informing the driver of the kilometer count from the start of the road (in some countries the markers also carry the road ID).
In Spain, where the current study was carried out, there are four different classes of kilometric milestones, as they are called there, several of which also have color variations corresponding to different road classes: motorway, state road and regional roads of three levels. In addition, roads belonging to the European itinerary are complemented with a green plate carrying the European road number (only motorways, state roads and 1st level regional roads can belong to the European itinerary).
The goal of the current study was to identify and localize the kilometric milestones in the panoramic images collected by mobile mapping systems surveying Spanish roads.
To achieve that, two convolutional neural networks were employed alongside additional algorithms. As a first step, traffic signs were located in the images using a YOLOv5 object detection network [1], which yields bounding boxes of the detected traffic signs. This detector has evolved through several iterations of development at EyeVi, with the latest version trained on over 39 thousand annotated images.
In the next step, the kilometric milestone images are extracted from the bounding boxes, resized to a standard size of 224×224 pixels and presented to a classification network of the ResNet50 type [2] to determine the type of the kilometric milestone. The classification network was trained specifically for the project and the training data was annotated automatically using image embeddings. One can query the top k embeddings most similar to that of a manually selected image patch, or query for all similar images by defining a (cosine) similarity threshold. The embeddings of the image patches were computed using CLIP [3] as the encoder.
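The two kinds of embedding query described above can be illustrated with a minimal sketch. It assumes the embeddings are already available as plain lists of floats; the function names `query_top_k` and `query_by_threshold` and the toy 3-dimensional vectors are hypothetical, chosen for illustration only, and do not reflect the actual tooling used in the project (real CLIP embeddings have hundreds of dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def query_top_k(query, embeddings, k):
    """Indices of the k embeddings most similar to the query, best first."""
    scores = [cosine_similarity(query, e) for e in embeddings]
    order = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def query_by_threshold(query, embeddings, threshold):
    """Indices of all embeddings whose similarity to the query reaches the threshold."""
    return [i for i, e in enumerate(embeddings)
            if cosine_similarity(query, e) >= threshold]

# Toy stand-ins for image patch embeddings (hypothetical values).
patches = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
seed = [1.0, 0.0, 0.0]  # embedding of the manually selected patch

top2 = query_top_k(seed, patches, 2)
similar = query_by_threshold(seed, patches, 0.9)
```

In a labeling workflow of this kind, the patches returned for a seed image would presumably inherit the seed's class label, with the threshold controlling the trade-off between the number of collected samples and label noise.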
However, as the mobile mapping was in reality carried out only on motorways and state roads, not all existing kilometric milestone classes and variations were available for the training data, and the number of kilometric milestone classes was thus reduced to 3. Nevertheless, the kilometric milestone training samples were collected from over 500 thousand panoramic images and the resulting total number of collected kilometric milestone images was above 1800. The F1-scores obtained for the classification of the three kilometric milestone classes on test data were 97.1%, 98.7% and 98.2%, respectively.
Determining the geographical locations of detected kilometric milestones from panoramic images is a rather challenging task because one traffic sign can appear in a number of consecutive panoramic images (which are shot every 3 meters). Complementary tracking and localization algorithms were used for that purpose. The goal of the tracking algorithm is to determine which bounding boxes in the stream of consecutive images represent a single sign in the physical world. The idea behind the tracking algorithm is to create bounding box vectors that point from the position where the image was taken towards the bounding box in the observed image (placed one meter ahead of the image shooting position). Given a pair of bounding box vectors, a number of properties are calculated for the pair, such as the minimum separation of the vector lines, the angle between the two vectors and the convergence point of the two vectors. This is followed by triplet analysis in order to find strong triplets, whose convergence points are reasonably close to each other and whose vectors are not parallel. Strong triplets serve as the seeds of tracks. The algorithm then goes through a number of additional steps in which single bounding box vectors that are not yet part of any track are merged into an existing track if they are sufficiently close to it.
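The pairwise properties and the triplet test can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: each bounding box vector is modeled as a 3D ray (camera position plus direction), the convergence point is taken as the midpoint of the shortest segment between the two lines, and the thresholds `max_spread` and `min_angle_deg` are hypothetical placeholder values. The sketch also does not check whether the convergence point lies in front of the cameras.

```python
import math
from itertools import combinations

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _point_on_line(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))

def pair_properties(ray_a, ray_b, eps=1e-9):
    """Angle, minimum line separation and convergence point for two rays.

    Each ray is (origin, direction): the camera position and the vector
    towards the bounding box. Directions need not be unit length.
    """
    (p1, d1), (p2, d2) = ray_a, ray_b
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    cos_angle = max(-1.0, min(1.0, b / math.sqrt(a * c)))
    angle_deg = math.degrees(math.acos(cos_angle))
    denom = a * c - b * b  # zero when the two lines are parallel
    if denom < eps:
        return {"angle_deg": angle_deg, "separation": None, "convergence": None}
    w = tuple(x - y for x, y in zip(p1, p2))
    d, e = _dot(d1, w), _dot(d2, w)
    t = (b * e - c * d) / denom  # parameter of the closest point on line 1
    s = (a * e - b * d) / denom  # parameter of the closest point on line 2
    q1 = _point_on_line(p1, d1, t)
    q2 = _point_on_line(p2, d2, s)
    return {
        "angle_deg": angle_deg,
        "separation": math.dist(q1, q2),
        "convergence": tuple((x + y) / 2 for x, y in zip(q1, q2)),
    }

def is_strong_triplet(rays, max_spread=1.0, min_angle_deg=1.0):
    """True if no pair is (near-)parallel and all convergence points are close."""
    props = [pair_properties(r1, r2) for r1, r2 in combinations(rays, 2)]
    if any(p["convergence"] is None or p["angle_deg"] < min_angle_deg for p in props):
        return False
    points = [p["convergence"] for p in props]
    return all(math.dist(u, v) <= max_spread for u, v in combinations(points, 2))

# Three shots taken 3 m apart, all observing the same (hypothetical) sign.
sign = (10.0, 4.0, 2.0)
cameras = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
rays = [(cam, tuple(s - c for s, c in zip(sign, cam))) for cam in cameras]

first_pair = pair_properties(rays[0], rays[1])
strong = is_strong_triplet(rays)

# Parallel vectors never converge, so they cannot seed a track.
parallel = [(cam, (1.0, 0.0, 0.0)) for cam in cameras]
not_strong = is_strong_triplet(parallel)
```

With noise-free rays the lines intersect exactly at the sign, so the separation is zero. With real detections the separation is positive and can serve as one of the quality measures for a pair.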
From there, the localization is fairly straightforward. It is first performed pairwise by computing a single localization result for each bounding box pair in the track. The final location for the track is then found by a fairly advanced weighted averaging method.
In summary, the four described components of the pipeline are able to detect and localize the kilometric milestones encountered on the road and thus place them on the map with sufficient accuracy.
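The abstract does not specify the weighting scheme, so the following is only a sketch of the general idea of combining pairwise estimates into one track location. It uses a plain weighted mean, and it assumes that each pairwise estimate comes with a non-negative weight (for example, one that grows with the angle between the two vectors, since near-parallel pairs localize poorly). The function name `weighted_location` and the numbers are hypothetical.

```python
def weighted_location(estimates):
    """Weighted mean of pairwise location estimates.

    estimates: list of (point, weight) tuples, where point is a coordinate
    tuple and weight is a non-negative reliability score (an assumption of
    this sketch, not the method used in the study).
    """
    total = sum(weight for _, weight in estimates)
    if total <= 0:
        raise ValueError("at least one estimate must have a positive weight")
    dims = len(estimates[0][0])
    return tuple(
        sum(point[i] * weight for point, weight in estimates) / total
        for i in range(dims)
    )

# Two pairwise results for one track; the first is trusted three times more.
pairwise = [
    ((10.0, 4.0, 2.0), 3.0),
    ((10.4, 4.2, 2.0), 1.0),
]
track_location = weighted_location(pairwise)
```

The result is pulled towards the more reliable estimate, which is the behavior any weighted averaging scheme of this kind is meant to provide.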
References
[1] Solawetz, J. (2020). YOLOv5 New Version - Improvements And Evaluation. https://blog.roboflow.com/yolov5-improvements-and-evaluation/. Accessed 21.02.2022.
[2] He, K. et al. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[3] Radford, A. et al. (2021). Learning transferable visual models from natural language supervision. In Proceedings of the International conference on machine learning (pp. 8748-8763).