Overture Maps first dataset release

Hi, I’ve updated my visualization to include the confidence score on click, as well as a slider to filter the visualization by minimum confidence level:

https://bdon.github.io/overture-tiles/places.html

You can find the scripts to reproduce the visualization here:


They have Rocky Mountain National Park located in Loveland, Colorado, about 20 miles from the nearest part of the actual park. They also have Roosevelt National Park located in Loveland. There is no Roosevelt National Park. There is a Theodore Roosevelt National Park in North Dakota, over 500 road miles away. There is also a Roosevelt National Forest, but its nearest part is about five miles away.

Thanks!

Raising the confidence threshold revealed some really big bloopers that were previously buried in all the other noise. For example, there’s a node with the name of our municipality, with confidence 0.82, that is a) located in a neighbouring municipality, and b) classified as a mountain (both are nonsense).

I had a look at some POIs in rural Austria. Mostly businesses and restaurants, pretty good actually. The only landmark feature (confidence 0.96) was about 1 km off (so they haven’t copied from us :slightly_smiling_face:).

Similar picture in Vienna. Lots of shops, but every now and then a POI with confidence > 0.95 is off by a few hundred meters.

Quite a few duplicate landmarks, some of them wildly misplaced and strangely named. Don’t use for hiking! :skull_and_crossbones:

Nothing that couldn’t be fixed, I guess. But they certainly have work to do.


Note, if you filter by confidence, that all points with source = msft have the same confidence of 0.6. So if you filter for confidence > 0.6, you exclude all of them.
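To make the strict-versus-inclusive pitfall concrete, here is a minimal sketch on made-up records. The field names `source` and `confidence` are simplified assumptions; the real Overture schema stores source information differently, and `filter_places` is a hypothetical helper.

```python
# Hypothetical, simplified place records; real Overture data nests
# source information differently, so these field names are assumptions.
places = [
    {"name": "A", "source": "msft", "confidence": 0.6},
    {"name": "B", "source": "msft", "confidence": 0.6},
    {"name": "C", "source": "meta", "confidence": 0.82},
    {"name": "D", "source": "meta", "confidence": 0.45},
]


def filter_places(records, threshold, inclusive=True):
    """Keep records whose confidence meets the threshold.

    With inclusive=False (a strict '>' comparison), every msft record
    sitting at exactly 0.6 is dropped when threshold == 0.6.
    """
    if inclusive:
        return [r for r in records if r["confidence"] >= threshold]
    return [r for r in records if r["confidence"] > threshold]


strict = filter_places(places, 0.6, inclusive=False)
inclusive = filter_places(places, 0.6)
print([r["name"] for r in strict])     # ['C']
print([r["name"] for r in inclusive])  # ['A', 'B', 'C']
```

Using `>=` keeps the msft points at the threshold. Using `>` silently removes that entire source.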

Plot for a bounding box around Germany:

Confidence of points with source = meta:


Hello! I also made a visualization with Planetiler of all the layers in the July Overture dump (except transportation connectors), if you want to explore:

https://msbarry.github.io/planetiler-overture-demo/

I have some more notes on how the data was transformed in the github repo.


It’s kind of interesting how they cite OpenStreetMap as the source there. Not good or bad, just interesting. Hopefully a less raw (is that even the right word?) version has, or will have, a link to the site. If not, then at least the tagline about OpenStreetMap data coming from a global community of mappers, or whatever it is. If you have to click on a POI to find the source, and there’s no way to find out more about it from there, that’s kind of meh, though. Not to mention it also leaves out the licensing information, which seems important (or, conversely, maybe that’s just how Mike Barry decided to implement it :man_shrugging:).

The original source data looks like: [{"dataset": "USGS Lidar"}, {"dataset": "OpenStreetMap", "recordId": "w29663417@3"}] and I process it into a string using this code. The record ID appears to be conveyed as a string that looks like "{w,n,r}<id>@<version>". Licensing information is conveyed at the dataset level which I show in the bottom-right corner.
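The linked conversion code isn't reproduced here. The following is only a rough sketch of the same idea in Python. It assumes the record ID format inferred above (`{w,n,r}<id>@<version>`). `parse_record_id` and `describe_sources` are hypothetical names, not the repo's functions.

```python
import json
import re

# Record ID pattern inferred from the example in the post: a type letter
# (w, n or r), a numeric OSM id, '@', and a version. Not a published spec.
RECORD_ID = re.compile(r"^([wnr])(\d+)@(\d+)$")
OSM_TYPES = {"w": "way", "n": "node", "r": "relation"}


def parse_record_id(record_id):
    """Split e.g. 'w29663417@3' into ('way', 29663417, 3), or return None."""
    m = RECORD_ID.match(record_id)
    if m is None:
        return None
    kind, osm_id, version = m.groups()
    return OSM_TYPES[kind], int(osm_id), int(version)


def describe_sources(sources_json):
    """Render a JSON list of source dicts as one human-readable string."""
    parts = []
    for src in json.loads(sources_json):
        label = src["dataset"]
        parsed = parse_record_id(src.get("recordId", ""))
        if parsed:
            kind, osm_id, version = parsed
            label += f" ({kind} {osm_id} v{version})"
        parts.append(label)
    return ", ".join(parts)


raw = ('[{"dataset": "USGS Lidar"}, '
       '{"dataset": "OpenStreetMap", "recordId": "w29663417@3"}]')
print(describe_sources(raw))  # USGS Lidar, OpenStreetMap (way 29663417 v3)
```

Sources without a `recordId`, or with one that doesn't match the pattern, fall back to just the dataset name.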


My first diary entry covers what I thought about the Places data OMF released, plus a small sample of lost beach resorts and whatnot in my suburb. Just thought I’d share it here :slight_smile: UndueMarmot's Diary | There are beach resorts inside subdivisions in my area, Overture data claims | OpenStreetMap


Interesting analysis from @wille on the places data quality:

Key Findings:

  1. General Accuracy: In a sample analysis of 308 places in Salvador de Bahia, Brazil, 63% of the places were found to be correct, irrespective of their confidence score.
  2. Confidence Score Correlation: The accuracy of the dataset increases with the confidence score. For items with a confidence score of 0.6 or higher, the accuracy rate was 81.2%. It further increased to 95% for places with a confidence score of 0.9 or higher.
  3. Category Issues: The categorization of places needs improvement. For example, similar categories like ‘cafe’ and ‘cafeteria’ or ‘psychologist’ and ‘psychotherapist’ exist, making it confusing.
  4. Data Quality: The main issues in the dataset are inaccurate locations and outdated places. For correct places, “small deviation” in location was the most common issue, while for incorrect places, being “outdated” was the primary issue.
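The "accuracy by confidence threshold" figures above come from hand-checking a sample. A minimal sketch of that calculation follows. The sample below is invented for illustration and is not @wille's data. `accuracy_at_threshold` is a hypothetical helper.

```python
def accuracy_at_threshold(samples, threshold):
    """Share of hand-verified places that are correct among those with
    confidence >= threshold. Returns (accuracy, count), or (None, 0)."""
    kept = [ok for conf, ok in samples if conf >= threshold]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)


# Invented hand-checked sample: (confidence, verified_correct)
samples = [
    (0.95, True), (0.92, True), (0.88, True), (0.75, False),
    (0.66, True), (0.61, True), (0.55, False), (0.40, False),
]

for t in (0.0, 0.6, 0.9):
    acc, n = accuracy_at_threshold(samples, t)
    print(f"confidence >= {t}: {acc:.1%} correct (n={n})")
```

Note that the count shrinks as the threshold rises, so the high-threshold accuracy rests on fewer checked places.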

The article concludes that the Overture Places dataset can be highly useful when filtered by a confidence score of 0.6 or higher, although improvements in categorization and data quality are needed.


I also took a quick look at the data by examining a small, randomly chosen area in my hometown that I know very well:

The result at >= 0.6 confidence:


And at >= 0.9 confidence


“Missing” is relative to the POIs in the sample area that exist in OSM and that I know to be correct.


If I understand correctly, @westnordost, Overture lacks POIs in this sample area that can be found in OSM?


An interesting datapoint, but I wonder how representative it would be of the overall dataset. Reasons being:

  • Very small sample size (26 POIs?).
  • Done in a location (Germany) that, I assume, has much better OSM coverage than most of the world.
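To put a rough number on the sample-size concern, a 95% Wilson score interval for a proportion shows how much wider the uncertainty is at 26 POIs than at the 308 places of the Salvador study. The counts below assume a hit rate of roughly 63% for both samples. They are illustrative, not measured results.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half


# Assumed counts: about 63% correct in each sample (illustrative only).
for correct, n in ((16, 26), (194, 308)):
    lo, hi = wilson_interval(correct, n)
    print(f"{correct}/{n}: {lo:.2f} to {hi:.2f}")
```

With 26 items, the interval spans several times more percentage points than with 308, which is why a single small neighbourhood says little about the dataset as a whole.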

Having seen various similar comparisons from other people about their places, this first POIs dataset release feels like “it’s something but it’s not that great”.
For my area, about half of the POIs currently exist. The other half are POIs that either closed years ago or never existed at all (imaginary names). It felt like an import of online store catalogs that already contained incorrect info, which I have unfortunately seen imported into other maps as well.
And that’s what I’m afraid of with this dataset: people coming to OSM to import whatever from the dataset doesn’t exist there yet.


Hasn’t that ship already sailed? Aka Facebook implicitly telling people to import the junk with Rapid?


Could OSMF become some sort of observer of the foundation behind Overture Maps, and would that be a good idea?

Details from TomTom on the transportation layer:

Main gist: they need the linear referencing system to be able to reliably conflate sources.

“To make the transportation layer we take open data, which is mostly OpenStreetMap (OSM) at this point, and manipulate it to meet Overture’s requirements,” Clarysse explains.

“We quality check the OSM data to make sure it’s consistent according to Overture’s specification, we segment the road network using a consistent logic for the whole world, and then remap all the relevant road attributes that we know about from the open data to the new map,” he adds.

It’s easy to see how quickly this could get out of hand, when there’s a whole world of people and organizations adding unique data to a map.
“At some point, it just doesn’t scale well anymore. If everyone using the map adds things on top that require their own sectioning or segmenting, then it will not scale, and everyone would be fighting over the splitting, and how the roads are segmented,” Clarysse says.
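A toy model may clarify why linear referencing avoids the "fighting over the splitting" problem. Each provider attaches its attributes to a fractional range `[start, end]` of one shared, stably identified segment instead of cutting the geometry at its own boundaries. The classes and field names below are hypothetical and are not Overture's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class RangedValue:
    """An attribute value that applies to a fraction [start, end] of a segment."""
    start: float  # 0.0 = start of the segment
    end: float    # 1.0 = end of the segment
    value: object


@dataclass
class Segment:
    segment_id: str
    length_m: float
    # attribute name -> list of RangedValue; the geometry itself is never split
    attributes: dict = field(default_factory=dict)

    def add(self, name, start, end, value):
        if not 0.0 <= start < end <= 1.0:
            raise ValueError("range must satisfy 0 <= start < end <= 1")
        self.attributes.setdefault(name, []).append(RangedValue(start, end, value))

    def value_at(self, name, distance_m):
        """Return the first value whose range covers the given distance."""
        frac = distance_m / self.length_m
        for rv in self.attributes.get(name, []):
            if rv.start <= frac <= rv.end:
                return rv.value
        return None


# Two hypothetical providers annotate the same 400 m segment independently.
seg = Segment("seg-1", 400.0)
seg.add("speed_limit_kmh", 0.0, 0.5, 50)  # provider 1
seg.add("speed_limit_kmh", 0.5, 1.0, 30)  # provider 1
seg.add("surface", 0.0, 1.0, "asphalt")   # provider 2
print(seg.value_at("speed_limit_kmh", 300))  # 30
```

Because every party references the same segment ID and fractions along it, new attributes can be layered on without re-segmenting the network for each contributor.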

“We have started with roads, but as we take data from open data sources, we are working to add pedestrian walkways in the upcoming releases. This will further increase the Overture transportation layer’s completeness, which is a big focus for us going forward.”


They’ll start improving the Places dataset in 2024, with feedback mechanisms:

  • Growth in feedback as we begin to use signals and community feedback to update, correct, and refresh our data, starting with Places Signals.

https://overturemaps.org/looking-forward-to-2024/

And they’re starting to consider other sources:

I would hope that everyone now realizes that the cooperation and “not an OSM competitor” statements were worth exactly what every person familiar with US corporate-speak said they were: nothing.


As I thought, we’re being reeled in.