Road surface tags add valuable information to OpenStreetMap (OSM). Surface tags – such as paved and asphalt, unpaved and dirt – allow map users to select routes that stick to the tarmac or explore more adventurous, unpaved ways. For 14 years, hundreds of editors have added surface tags to over 287,000 roads and paths in Victoria, Australia. Their diligent work allows us to create maps that show all of the sealed, non-residential roads across the state, as in Map 1 below.
However, that monumental undertaking prompts two important questions: how many of these surface tags are accurate and up-to-date? And how can we find inaccurate surface tags and update them? In this post, I dive down a rabbit hole and compare road surface tags in OSM against satellite images and the Victorian government’s road dataset to investigate the accuracy of road surface tags in regional Victoria.
Want the TL;DR version? Thanks to amazing work by heaps of enthusiastic mappers, OpenStreetMap’s road surface tags in Victoria are extremely accurate. There are lots of words, tables and maps below, but that’s all they say.
Map 1. Sealed roads in Victoria based on road surface tags in OpenStreetMap. The map includes all roads tagged as highway=motorway, trunk, primary, secondary, tertiary and unclassified.
Updating Victorian roads
Except for the city of Melbourne, where many roads do not have surface tags, virtually all non-residential roads in regional Victoria – from motorways to unclassified roads – now have surface tags. (The other exception is highway=track. Most tracks are unsealed on the ground but don’t yet have a surface tag.)
In this project, I tried to find tags that needed an update by comparing OSM’s surface tags against the Victorian government’s road dataset. The government’s road data is excellent, but the accuracy of its surface tags is unknown. So, rather than assume that they were accurate, I made two simple assumptions. I first assumed that the government and OSM datasets were created independently, and neither copied surface tags from the other.
I then assumed that roads with the same surface tag in both datasets were more likely to be tagged accurately than roads that had contrasting surface tags. If one dataset said that a particular road was sealed while the other said it was unsealed, then one dataset must be ‘wrong’ (or out-of-date), but we don’t know which dataset is ‘correct’.
I’ve added more detailed methods at the end of this post but, briefly, I used QGIS to find all roads that had contrasting surface tags in the two datasets and then checked the surface of all of these roads using available imagery (Bing, Esri, Maxar and Mapillary). I recorded the surface shown on the imagery and updated OSM surface tags as required. The tables and maps below show the results of these comparisons.
I used satellite and street-level imagery (from Mapillary) to determine the surface of each road. This has obvious drawbacks. (1) Many images are dated, and some roads will have been sealed after images were taken. (2) Sealed and unsealed surfaces are sometimes hard to tell apart. I recorded the image as ‘unclear’ whenever I wasn’t sure, but I undoubtedly made mistakes. (3) Some editors have updated road surfaces using on-ground observations that are more recent than available images. To ensure I didn’t reverse these changes, I checked the changeset history of every road and retained all tags that were based on recent, on-ground observations. (For brevity, I have not repeated this statement every time I refer to ‘updating OSM surface tags’ in the text below.)
Two smaller limitations: (1) Many road ways in OSM are extremely short, so I only examined OSM ways that were longer than 1 km. (2) Most roads in Melbourne don’t have a surface tag, so I have not shown results from the Melbourne area in this post.
How did the datasets compare?
The good news is that OSM’s surface tags are highly consistent with those in the state government dataset. The analysis compared nearly 42,500 ways between the two datasets. Nearly all (96.8%) had the same surface tag in both. The discrepancies amounted to just 1,367 ways, with a total length of 3,311 km (Table 1). Most (83%) were on unclassified roads, which is not surprising as most major roads in Victoria are entirely sealed (as described earlier).
Table 1. The number and length of OpenStreetMap ways that were matched to roads in the government dataset and the number and percentage of these that had contrasting surface tags in the two datasets.
|Category||Number of ways||Length (km)|
|OSM ways > 1 km long matched to gov data||42,481||151,742|
|Discrepancies: OSM ways with contrasting surface tags to gov data||1,367||3,311|
When I checked these 1,367 roads against satellite images, OSM surface tags were about twice as likely to match the images as the government data was: 64% compared to 31% (Table 2).
Table 2. The number of times that OSM surface tags matched the surfaces seen in available images for all roads that had different surface tags in the government dataset and OSM.
|Outcome||Number||% Number||Length (km)||% Length|
|Surface on image matched existing OSM tag. OSM tag was retained.||877||64||2,000||60|
|Surface on image did not match OSM surface tag. OSM tag was updated.||427||31||1,149||35|
|Imagery unclear, no decision, OSM tag retained.||63||5||161||5|
When examined on images, 17% of the ways that were tagged in OSM as either sealed or unsealed were found to contain lengthy sections of both surfaces. I’ve called these ‘mixed’ in the next table. Nearly half (47%) of the roads that were tagged as unsealed in OSM (but sealed in the government data) were found to be partly or completely sealed on images. Presumably, many of these were sealed after the surface tag was last edited. By contrast, 10% of roads that were tagged as sealed in OSM (but unsealed in government data) were completely unsealed on the images; a further 12% contained long sections of unsealed roads.
Table 3. Comparison of the road surface seen on satellite imagery and Mapillary against the surface tag in OpenStreetMap.
|OSM surface tag||Number||% Sealed||% Mixed||% Unsealed|
In total, 427 surface tags needed to be updated after comparing the existing tags against available imagery (see Table 2). There was at least one update in nearly every local government area (LGA) in Victoria, as shown in Map 2. The greatest number of updates (42) was in the Greater City of Bendigo in central Victoria. By contrast, there were fewer than 5 updates in many LGAs of similar size.
Map 2. Number of roads in each local government area for which OSM surface tags were updated as a result of these comparisons. The area around Melbourne was not analysed.
One obvious reason why more surface tags were updated in some LGAs than in others is that some LGAs have far more roads than others. If the ‘error rate’ on surface tags was exactly the same in every LGA (for example, one update was needed for every 1,000 km of road in each LGA), then we’d expect to see more updates in LGAs that had a longer road network.
On average, about three changes were made to OSM surface tags for every 1,000 km of roads in each LGA. However, some LGAs had more updates than expected, even when the total length of roads was taken into account. In Bendigo, for example, there were 11 updates to OSM surface tags for every 1,000 km of road in the LGA (Map 3). My guess is that many factors contributed to these local quirks, including how long ago the surface tag was edited, who updated it, the quality of imagery that was available at the time, plus differing rates of road surfacing on the ground.
Map 3. The number of changes to OSM surface tags per 1,000 km of roads in each LGA.
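The rate shown in Map 3 is simply the number of updates divided by the length of the road network, scaled to 1,000 km. Here is a minimal Python sketch of that normalisation. The function name and the two example LGAs are hypothetical; their update counts and road lengths are invented for illustration and are not taken from the analysis.

```python
def updates_per_1000_km(updates, road_km):
    """Number of surface-tag updates per 1,000 km of road."""
    if road_km <= 0:
        raise ValueError("road_km must be positive")
    return 1000 * updates / road_km


# Hypothetical LGAs: (number of updates, total road length in km).
# These values are made up to illustrate the calculation.
example_lgas = {
    "LGA A": (12, 4000.0),
    "LGA B": (3, 1500.0),
}

rates = {
    name: updates_per_1000_km(updates, km)
    for name, (updates, km) in example_lgas.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} updates per 1,000 km")
```

Normalising by road length is what lets a small LGA with a handful of updates be compared fairly against a large one with dozens.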
Let’s pull these numbers together. In total, 96.8% of OSM ways (all > 1 km long) had the same surface tag in the government dataset and OSM (see Table 1). When the 3.2% of ways that had different surfaces in the two datasets were compared against images, OSM surface tags were consistent with images 64% of the time (see Table 2).
So, if we accept the underlying premise that roads with the same surface tag in both datasets are likely to have the correct surface tag in OSM, then these numbers suggest that OSM surface tags are likely to be correct (or, at least, consistent with the best available images) 98.8% of the time. We’ll test this premise below.
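The 98.8% figure can be reproduced from the counts in Tables 1 and 2. The short Python sketch below uses my own variable names and bakes in the premise stated above: every way with the same tag in both datasets is counted as correct, and only the contrasting ways that matched the imagery are added to that total.

```python
# Counts taken from Table 1 and Table 2; variable names are illustrative.
matched_ways = 42_481      # OSM ways > 1 km matched to government data
contrasting_ways = 1_367   # ways with different surface tags in the two datasets
osm_matched_image = 877    # contrasting ways where imagery agreed with OSM

# Premise: ways with the same tag in both datasets are assumed correct.
agreeing_ways = matched_ways - contrasting_ways

# Add the contrasting ways where OSM turned out to match the imagery.
estimated_correct = agreeing_ways + osm_matched_image
accuracy_pct = 100 * estimated_correct / matched_ways

print(f"Estimated accuracy: {accuracy_pct:.1f}%")
```

Ways where the imagery was unclear are treated here as unconfirmed rather than correct, so the estimate leans slightly conservative.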
A final accuracy test
The accuracy estimate above is based on the untested assumption that roads with the same surface tag in both datasets are tagged accurately. This may not be the case. We can estimate the accuracy of OSM surface tags by taking a random sample of OSM ways and comparing their surface tags against available imagery. To do this, I downloaded a new dataset of OSM roads, from highway=motorway to unclassified. I restricted the analysis to ways > 100 m long, to avoid spending ages examining tiny ways. I then selected a random subset of 500 of these ways and inspected their surfaces on satellite imagery and Mapillary.
The results were highly consistent with the estimate shown above (Table 4). For four of the 500 ways, the imagery was unclear and I couldn’t tell whether the surface was sealed or not. For the others, 98% of surface tags were consistent with the surfaces seen on images. An additional 1.4% of ways were ‘partly correct’, as the images showed a mixture of sealed and unsealed surfaces within the way.
Table 4. Comparison of OpenStreetMap road surface tags against satellite and Mapillary imagery for a random subset of 500 roads in Victoria.
|Outcome||Number||%|
|Surface on image matched existing OSM tag.||486||98.0|
|Mixed. Surface on image partly matched OSM surface tag.||7||1.4|
|Surface on image did not match OSM surface tag.||3||0.6|
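A sample of 500 ways is small, so it can help to attach a rough uncertainty range to the 98% figure. The sketch below was not part of the original analysis. It applies a standard Wilson score interval to the 486 matches out of the 496 ways with clear imagery, and it assumes the sample was a simple random sample of independent ways.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 486 full matches out of 496 ways with clear imagery (Table 4).
low, high = wilson_interval(486, 496)
print(f"Approximate 95% interval: {100 * low:.1f}% to {100 * high:.1f}%")
```

The Wilson interval is used here because it behaves sensibly for proportions close to 100%, where the simpler normal approximation can spill past the upper bound.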
Only three OSM surface tags were completely inconsistent with the images (0.6%). In two of these, the data was probably just out-of-date and the roads were presumably sealed after the surface tags were last edited. Only one road with a sealed surface tag was totally unsealed on all available imagery; a dirt road just 500 m long that was as far from Melbourne as one could imagine: on the South Australian border in the far west of the state.
Why were the 10 cases where OSM surface tags did not match the images not detected in the earlier test, when OSM surface tags were compared against the government data? In seven cases, the entire way or the discrepancy was < 1 km long, so the roads were not included in the first analysis. In the other three cases, the two datasets were probably out-of-date, as surfaces were said to be unsealed but the images showed surfaces were now sealed. By chance perhaps, given the small sample size in this final test, no ways > 1 km long were found that were tagged as sealed in OSM and the government dataset but unsealed on the ground. That’s a good position to be in.
Map 4. All unsealed roads and tracks in Victoria based on data from OpenStreetMap. Residential roads are not shown. (This map differs from other maps in this post, and from all analyses, as it includes tracks that do not have a surface tag.)
This is an expanded version of the methods shown above, for those interested.
On September 3, 2021, I downloaded the Victorian government road dataset (VicMap Transport TR_ROAD). OSM has a waiver to use this data. I then downloaded all Victorian roads in the following classes from OpenStreetMap: highway = motorway, trunk, primary, secondary, tertiary, unclassified and track. I imported and edited both datasets in QGIS.
For the OSM dataset, I pooled all officially accepted OSM tags for paved surfaces (paved, asphalt, concrete, etc) into a broad category called sealed, and all officially accepted OSM tags for unpaved surfaces (unpaved, dirt, earth, etc) into a category called unsealed.
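The pooling step amounts to a lookup from each surface value to one of two broad categories. Here is a minimal Python sketch. The two sets contain only the values named in the paragraph above, so they are deliberately incomplete (the "etc" is not filled in), and the function name is my own.

```python
# Only the values named in the text; the full lists used in the
# analysis were longer ("etc") and are not reproduced here.
SEALED_VALUES = {"paved", "asphalt", "concrete"}
UNSEALED_VALUES = {"unpaved", "dirt", "earth"}


def surface_category(surface_tag):
    """Map an OSM surface value to 'sealed', 'unsealed' or None if unknown."""
    if surface_tag is None:
        return None
    value = surface_tag.strip().lower()
    if value in SEALED_VALUES:
        return "sealed"
    if value in UNSEALED_VALUES:
        return "unsealed"
    return None  # untagged or not in either list


print(surface_category("asphalt"))
print(surface_category("dirt"))
```

Returning `None` for anything outside the two lists keeps unrecognised values out of the comparison instead of silently forcing them into a category.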
For ease of management, I clipped both datasets into five regions based on local government boundaries in OSM and worked through each region one at a time. In each region, I separated the government data (hereafter called VicGov for brevity) into sealed and unsealed layers. All roads that were tagged as surface = unknown in the VicGov data were ignored.
I then placed a 20 m buffer around the sealed and unsealed VicGov datasets. To match VicGov roads and OSM ways, I clipped the OSM road dataset to the buffered VicGov road layers. From these clipped layers, I created a layer of roads with contrasting surface tags, which included all ways tagged as sealed in OSM but unsealed in the VicGov data, and vice versa. I then created a subset of ways that were > 1 km long to examine against satellite images.
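The matching logic can be approximated outside QGIS. The Python sketch below is a simplification and not the QGIS workflow itself. Instead of clipping lines to a buffer polygon, it tests whether every vertex of an OSM way lies within 20 m of a government road, then flags the way if the surface categories differ and the way is longer than 1 km. All names are hypothetical, coordinates are assumed to be in a projected system measured in metres, and each road needs at least two vertices.

```python
import math

BUFFER_M = 20.0        # buffer width used in the text
MIN_LENGTH_M = 1000.0  # only ways longer than 1 km were examined


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b; all are (x, y) in metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(way, road, buffer_m=BUFFER_M):
    """True if every vertex of `way` is within buffer_m of polyline `road`."""
    segments = list(zip(road, road[1:]))
    return all(
        min(point_segment_distance(p, a, b) for a, b in segments) <= buffer_m
        for p in way
    )


def way_length(way):
    """Length of a polyline in metres."""
    return sum(math.dist(a, b) for a, b in zip(way, way[1:]))


def needs_checking(way, way_surface, road, road_surface):
    """Flag matched ways with contrasting surfaces that are longer than 1 km."""
    return (
        within_buffer(way, road)
        and way_surface != road_surface
        and way_length(way) > MIN_LENGTH_M
    )


# Invented geometries for illustration only.
osm_way = [(0, 5), (600, 8), (1200, 4)]
vicgov_road = [(0, 0), (1200, 0)]
print(needs_checking(osm_way, "sealed", vicgov_road, "unsealed"))
```

A vertex-only test can miss a long segment that bows outside the buffer between two vertices, and like the buffer approach in the text it can still pair a way with a close parallel road, so a manual check remains necessary.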
Some caveats on road matching. The analysis only included roads/ways that had a surface tag in both VicGov and OSM. Any roads that lacked a surface tag in either dataset were not examined. A small number of parallel roads were incorrectly matched when OSM ways were clipped to the buffered VicGov data. I checked each way when I compared it against satellite images and discarded incorrectly matched roads.
I compared OSM surface tags for all of these roads against available imagery using JOSM. I used Bing, Esri and Maxar satellite images, plus Mapillary where available (which was rarely). I classed the surface shown on the most recent image into one of four categories: sealed, unsealed, mixed (if the imagery showed that the way being examined contained lengthy sections of both sealed and unsealed surfaces) or unclear. I also recorded whether the existing OSM surface tag was retained or updated as a result of the comparison. To ensure I did not over-ride recent on-ground observations, I examined the changeset history of each way and recorded the date that the surface tag was most recently edited. I then compiled the results to create the tables and maps shown above.
After the entire process was finished, I repeated these steps using a 50 m buffer to see if this would capture many roads that were not matched when a 20 m buffer was used. The 50 m buffer captured a small number of extra roads but mostly matched adjacent, parallel roads that should not have been paired. The results in this post are based on the results of the 20 m buffer process and I have not included the extra roads derived from the 50 m buffer (although I checked those ways against satellite images and updated OSM tags where required).
After the comparison of the VicGov and OSM datasets was completed, I downloaded a new set of OSM data on November 25, 2021, for the following highway types: motorway, trunk, primary, secondary, tertiary and unclassified, across all of Victoria. I did not include highway=track in this step as most tracks do not have a surface tag. I restricted this dataset to ways > 100 m long to exclude extremely short ways. I then created a random sub-sample of 500 of these ways in QGIS. Finally, I compared the surface tags on these 500 ways against available imagery, using the process described above. I classed a way as ‘mixed’ if images showed that more than 20% of the road surface differed from the existing OSM tag. It would have been good to include more than 500 ways in the random subset test but this step was s-o-o-o tedious that I gave up after 500.
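The sampling and classification rules can be sketched in a few lines of Python. This is an illustration and not the QGIS procedure used for the post. The function names, the seed and the range of way IDs are all hypothetical. The 20% threshold comes from the text, while treating a way as a full mismatch only when its whole length differs is my assumption.

```python
import random


def sample_ways(way_ids, k=500, seed=None):
    """Draw k distinct way IDs at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(way_ids), k)


def classify(differing_fraction, mixed_threshold=0.20):
    """Classify a way from the share of its length (0 to 1) whose surface
    on imagery differs from the existing OSM tag.

    Assumption: 'no match' means the entire way differs from the tag.
    """
    if not 0.0 <= differing_fraction <= 1.0:
        raise ValueError("differing_fraction must be between 0 and 1")
    if differing_fraction >= 1.0:
        return "no match"
    if differing_fraction > mixed_threshold:
        return "mixed"
    return "match"


# Hypothetical way IDs; a fixed seed makes the draw repeatable.
sample = sample_ways(range(10_000), k=500, seed=1)
print(len(sample), classify(0.1), classify(0.5), classify(1.0))
```

Fixing the seed means the same 500 ways can be drawn again later, which is useful if the check is ever repeated or extended.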