Microsoft road detections

Continued from TIGER data quality:

The blog post announcing the detections came out last week but vanished for a while until today:

For what it’s worth, early reviews on OSMUS Slack seem to indicate a definition of “roads” that largely consists of driveways, alleys, and parking aisles in built-up areas – stuff that TIGER has plenty of in some counties but none of in other counties. This might not make for quite as exciting a headline, but it could still be a useful aid to finding things to map, especially in rapidly growing suburbs or areas outside the U.S. where mappers don’t have access to something like the annual TIGER roads overlay to work off of.


For what it’s worth (quite possibly “not a huge amount,” but quite certainly “not nothing”), the annual TIGER roads overlay should be a very important data source for cleaning up existing TIGER data in the USA, the subject of another, rather interesting and fruitful, thread at https://community.openstreetmap.org/t/tiger-data-quality.

I’m not saying one should “slavishly” tie TIGER cleanup efforts to the latest TIGER overlay; it is only one strategy among many. But it is, or should be, one very important part of an overall strategy for approaching “TIGER cleanup” in any given area, alongside other strategies.

Minh seems to suggest that there are “areas outside the U.S. where mappers don’t have access to something like the annual TIGER roads overlay to work off of.” Yes, that makes sense, because TIGER data are US-only, so there wouldn’t be TIGER roads to “clean up” outside the USA. I would think that in 100% of the places with “bad” TIGER data (data not yet cleaned up, what some places call “unreviewed” or “less-reviewed” TIGER data), the annual TIGER roads overlay is also available. I mean 100%, and they’ll all be in the USA. Right? Am I missing or misunderstanding something here?

The Microsoft road detections are available globally, not just in the U.S. where we can supplement them with the TIGER overlay as an additional signal.


I’ve failed to download Europe, and the website does not seem to allow resumable downloads (at least with curl).

I did download South Asia and, to my surprise, it didn’t contain Pakistan: historically one of the least mapped countries anywhere on OSM, but the one for which I’m best equipped to evaluate the usefulness (or otherwise) of this data.

Be warned: the data is tab-separated, with an ISO 3-letter country code in the first field and the GeoJSON in the second. This does mean that individual countries can be examined, if one succeeds in downloading the file and the relevant country is in it!
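Because the country code is the first field, checking which countries a regional file actually contains (for example, whether Pakistan is in the South Asia file) only needs the text before the first tab. A minimal sketch, assuming the “<CODE><tab><GeoJSON>” layout described above; the function name is made up for illustration:

```python
from collections import Counter
from typing import Iterable


def count_by_country(lines: Iterable[str]) -> Counter:
    """Count records per 3-letter country code in a "<CODE>\t<GeoJSON>" dump.

    Only the text before the first tab is inspected; the GeoJSON itself is not
    parsed, which keeps this cheap on large regional files. Lines without a tab
    (blank lines, stray header lines) are ignored.
    """
    counts: Counter = Counter()
    for line in lines:
        code, sep, _ = line.partition("\t")
        if sep:
            counts[code.strip()] += 1
    return counts
```

It accepts any iterable of text lines, such as an open file or sys.stdin. Note that unzip -p writes only the file data to standard output, whereas unzip -c also prints the archive and member names, so -p is the more convenient option to pipe into a script like this.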

I succeeded in downloading the Canada “All ML Derived Roads” file (there is no “Missing” file for Canada).

Is there any tool to read such data, or do we have to convert it before importing into QGIS or JOSM?

Format:
CAN	{"type":"Feature","geometry":{"type":"LineString","coordinates":[[-104.515600204468,52.0099157988393],[-104.515514373779,52.0099025911018]]},"properties":{}}
CAN	{"type":"Feature","geometry":{"type":"LineString","coordinates":[[-113.556071519852,50.7644222238046],[-113.556028604507,50.7642593571165],[-113.555899858475,50.7639607667158]]},"properties":{}}
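Since the second field of each line is already a complete GeoJSON Feature, one tool-neutral option is to wrap one country’s features in a FeatureCollection, which gives an ordinary .geojson file. QGIS can open such a file directly, and JOSM should also be able to open GeoJSON files. A minimal sketch, assuming the tab-separated layout shown above; the function name is hypothetical, and it keeps all features in memory, which may be too much for a large country:

```python
import json
from typing import Iterable


def to_feature_collection(lines: Iterable[str], country: str) -> dict:
    """Collect the features for one country code into a GeoJSON FeatureCollection.

    Assumes each line is "<CODE>\t<Feature JSON>". Lines for other countries,
    and lines without a tab, are skipped.
    """
    features = []
    for line in lines:
        code, sep, payload = line.rstrip("\r\n").partition("\t")
        if sep and code == country:
            features.append(json.loads(payload))
    return {"type": "FeatureCollection", "features": features}
```

The result can then be written out with json.dump() to a file such as can.geojson (a placeholder name) and opened in QGIS.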

The download was bumpy and needed a couple of retries. I used ogr2ogr to import the GeoJSON sequence file into PostGIS:

unzip -c Europe-Full.zip | grep -e "^DEU" | cut -c5- > deu.geojson
ogr2ogr  -f postgresql PG:"host=localhost port=5432 dbname=openstreetmap schemas=public" deu.geojson -nln deu
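Where grep and cut are not at hand (on Windows, for instance), the same extraction can be done without unpacking the archive first. A sketch using Python’s standard zipfile module, assuming every member of the archive uses the tab-separated layout; the function name is hypothetical:

```python
import io
import zipfile
from typing import Iterator


def iter_country_geojson(zip_path, country: str) -> Iterator[str]:
    """Yield the GeoJSON part of every line for `country`, streaming from the archive.

    Roughly what `unzip -c Europe-Full.zip | grep -e "^DEU" | cut -c5-` does:
    keep lines starting with the code, then drop the 3-letter code and the tab.
    zip_path may be a filesystem path or a binary file-like object.
    """
    prefix = country + "\t"
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):  # directory entries hold no data
                continue
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if line.startswith(prefix):
                        yield line[len(prefix):].rstrip("\r\n")
```

Writing each yielded string plus a newline to deu.geojson should give the same newline-delimited file as the shell pipeline, ready for the same ogr2ogr command.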

MS road detections in black on the OSM Standard map background in QGIS:


On Linux, I would, of course, have used ... | awk '/^DEU/ {print $2}' | ... 😀

In practice I was on Windows and loaded the whole thing into PostgreSQL with a COPY statement using the PROGRAM option. SQL is then used to filter the records and convert the GeoJSON linestrings into something usable.
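The SQL itself isn’t shown, so this is only an illustration of the conversion step. Inside PostGIS, ST_GeomFromGeoJSON applied to the “geometry” member of each record is one way to get a geometry. For anyone who prefers to preprocess outside the database, a minimal sketch that turns one LineString record into WKT, assuming lon/lat coordinates as in the samples above (the function name is made up):

```python
import json


def linestring_to_wkt(feature_json: str) -> str:
    """Convert one GeoJSON LineString Feature (given as text) to a WKT string.

    Only x/y are kept; any third (Z) value in a coordinate is dropped.
    """
    geometry = json.loads(feature_json)["geometry"]
    if geometry["type"] != "LineString":
        raise ValueError(f"expected LineString, got {geometry['type']}")
    # repr() keeps the full float precision of each coordinate.
    points = ", ".join(f"{x!r} {y!r}" for x, y, *_ in geometry["coordinates"])
    return f"LINESTRING({points})"
```

WKT in this form can be read by PostGIS (for example via ST_GeomFromText), QGIS and GDAL.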

Lots of the random TIGER driveway squiggles and similar features were deleted over the years, so that is easily believable.

@mmd the 2nd screenshot seems to indicate that the vast majority of line segments in the MS detections dataset correspond to existing OSM road data, is that correct?

I think that depends a bit on the area. In this case it was a pretty well mapped city (OpenStreetMap) where I didn’t expect any new roads to show up in MS road detections. I wanted to find out how much of an existing network could be detected.

I would sum up my findings as:

  • highway=residential ways are quite ok. There were some false positives a human reviewer could spot.
  • pedestrian areas: average results, frequently not detected correctly, or not at all
  • highway=motorway had quite a few artifacts, switching back and forth between two ways and a single way; (larger) motorway junctions look like random doodling
  • Most ways in forests seem to be missing for obvious reasons

To answer your question, I don’t see how the data corresponds directly to OSM road data. I haven’t read all the papers mentioned on their website, but I suppose they’ve at least used existing OSM data to train their model.

You have to keep in mind that MS road detections include geometry data only. The published data comes with exactly zero additional tags to indicate the type of road, or any other road attributes.


Thanks for sharing your findings, @mmd. The statement in the GitHub README:

We have detected 47.8M km of all roads and 1165K km of roads missing from OSM

led me to believe that it would be straightforward to filter out the features they detected that are not in OSM yet. That would perhaps make the data more valuable to OSM.


Curiously, I had the same expectation. As it turned out, the files included every geometry they detected, not only the ones missing from OSM. That’s when I started comparing MS road detections with existing OSM ways. It would be interesting to take a closer look at the missing ones. In well mapped areas, most of them could be false positives.
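One rough way to turn such a comparison into numbers, and so approximate a “missing from OSM” filter, is to sample points every few metres along a detected line and measure what share of them lies within a tolerance of any existing OSM way. This is a sketch under stated assumptions, not Microsoft’s method: the 10 m tolerance and 5 m step are arbitrary, distances use a local equirectangular approximation, and the search is brute force, so at real data volumes a spatial index or PostGIS ST_DWithin would be the practical route. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

LonLat = Tuple[float, float]
XY = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0


def _to_local_xy(p: LonLat, lat0: float) -> XY:
    # Equirectangular projection around latitude lat0; adequate over short distances.
    x = math.radians(p[0]) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(p[1]) * EARTH_RADIUS_M
    return x, y


def _point_segment_distance(p: XY, a: XY, b: XY) -> float:
    # Planar distance from p to the closest point on segment a-b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0
    if length_sq > 0:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def _densify(line: List[XY], step_m: float) -> List[XY]:
    # Keep the vertices and add evenly spaced points so no gap exceeds step_m.
    points = [line[0]]
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(bx - ax, by - ay) / step_m))
        points.extend((ax + (bx - ax) * i / n, ay + (by - ay) * i / n) for i in range(1, n + 1))
    return points


def covered_fraction(detected: Sequence[LonLat], osm_ways: Sequence[Sequence[LonLat]],
                     tolerance_m: float = 10.0, step_m: float = 5.0) -> float:
    """Share of sample points along `detected` within `tolerance_m` of any OSM way."""
    if len(detected) < 2:
        raise ValueError("a detected road needs at least two coordinates")
    lat0 = detected[0][1]
    samples = _densify([_to_local_xy(p, lat0) for p in detected], step_m)
    segments = []
    for way in osm_ways:
        xy = [_to_local_xy(p, lat0) for p in way]
        segments.extend(zip(xy, xy[1:]))
    if not segments:
        return 0.0
    near = sum(
        1 for p in samples
        if min(_point_segment_distance(p, a, b) for a, b in segments) <= tolerance_m
    )
    return near / len(samples)
```

A value close to 1.0 suggests the road is already mapped; values near 0.0 mark candidates for review. Given the false positives discussed here, a low value is a prompt to look, not proof that a road is missing, and it could also mean an existing way is simply offset from where the detection sits.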

That’s true of some of the datasets (Europe and South Asia for sure), but not all of them: for some regions the “missing roads” are available as a separate dataset.

Would it be a big effort to extract the Road Detections for individual countries to create a MapRoulette Challenge per country? What would be the best way to proceed?

A MapRoulette challenge could be useful / interesting, but I think we’d be best off doing more investigating into the quality of the data before we do so. If we launch challenges with lots of low quality tasks, mappers will likely abandon them anyway.


Where OSM is pretty well mapped, a comparison of the “present or not present” differences between MS’ detected roads and what’s in OSM already would likely just prompt questions about whatever MS have detected. I suspect that’s why there isn’t a separate “missing roads” download for e.g. Europe.

However, even when there are roads in both, it might be useful to compare the alignment on which MS have detected roads. For example, in the UK some of the service roads in OSM were added by paid mappers using detections based on offset imagery. They’re topologically correct (and obviously better than what was there before, which was nothing), but a layer of “MS detections” might be a useful prompt to a local mapper cleaning up these problems. It might also help to give feedback if MS’ detections are themselves based on offset imagery.


There is only a Canada-Full file, from which I extracted data for northern Québec. I can speak for the vast forestry areas of northern Québec, where most roads visible on imagery are temporary access roads for forestry activities, with infrastructure that is not maintained in the long term. There are many MS road detections there, and we have to assume that many of them are false positives: roads that have not been maintained for years. I suppose the same is true of many forestry areas around the world.

MS road detections in black on the OSM Standard map background in JOSM


Is there any benefit over the MapWithAI road detections from Meta/Maxar, or is Microsoft just trying to build a better mousetrap? Or is this the same dataset?

https://mapwith.ai/