This is pretty huge!
link to data:
I don't see that Apache 2.0 is compatible with ODbL.
Seems to suggest it is compatible, but I'm no lawyer. Here's hoping
It's definitely an odd licence to choose, though.
Another database of very (!) varying quality.
Similar to Overture, this does not contain opening hours; you have to pay to get access to those fields.
Between this release and Overture places, it must be an interesting time to be in the business of selling basic POI data on the basis of big splashy numbers. Not sure how big of a deal it'll be, but the related Placemaker program says it'll gather a mix of the objective facts we care about as well as more subjective attributes.
I've had a look at a bit of the data in country and it seems a -lot- better than the Linux Foundation Overture Maps data, but then that isn't exactly hard.
The differences seem to be mainly a lot of offices that we don't have, but obviously that will depend on where you look.
License: using Apache 2.0 as a data license is more than weird and leaves many many questions open (the text from the EU site is machine generated nonsense).
Seems as if either the quality varies hilariously widely or I was just very lucky, see Harel Dan on LinkedIn: #opensource | 18 comments
Yes, Foursquare and Overture places are like many geolocation-centric datasets: users aren't supposed to ever see the raw data, either in a list or on the map. You have to filter by a confidence score. Otherwise, you'll get tons of user-generated junk: pranks, mistakes, etc. It's probably even messier than Overture in some places, because Foursquare was essentially a mobile location game for many years.
This kind of data is more readily consumable in a geocoder than on a rendered map. In the past, Foursquare would charge big bucks for the confidence scores as an upsell. If these scores aren't part of the dataset, then no wonder the company feels comfortable releasing the data.
Here's a marketing idea for OSM: publish a dataset of all the POIs that have ever been added to OSM in any form, including deleted ones, and tout the relatively huge total number of them with an asterisk that refers to an OSM confidence score and the URL to a forum thread where we bikeshed about what that confidence metric should be.
Many of my schoolmates were into Foursquare back in the day (specifically, the City Guide app that they're discontinuing on December 15). They became "mayors" of various places around town by checking in frequently. It doesn't take very much of value to juice youth engagement with an app. Similar to the gymgoers of Pokémon Go, there were some pranksters who created fake venues near them specifically so they could be mayors.
Anyhow, here's a viewer you can use to get a sense of what the data is like if you don't purchase Foursquare's confidence scores:
Happily, my favorite tells of Foursquare-sourced data are still around to "amplify the Digital Echo Chamber effect", as they put it.
... and here's my traditional link to "businesses unfeasibly allegedly located inside York Minster"
So I gave my surroundings another look thanks to Oliver's map, and my initial impression is more or less confirmed: fewer outright inventions than the Linux Foundation data, more offices than OSM.
The offices have the same issue as the office POI data from Google (typically sourced from trade registry data): a lot of it is just the home addresses of people's sole-proprietor companies, which are of very, very limited use beyond propping up the numbers.
What is disappointing is that date_refreshed doesn't seem to indicate anything QA related: there are two POIs nearby (a bank and a post office) that have been gone for more than 5 years (so did actually exist at one point in time) but have a date_refreshed from this year.
I kind of suspect that, given 4sq was never popular here (unlike Facebook), there was less motivation to add outright nonsense, and that is the reason I'm getting a more positive impression than others.
PS: just in case people have forgotten, this isn't the first open data from 4sq. GitHub - foursquare/quattroshapes has been around for a long time, and at least its "locality" data has always been quite amusing.
Someone made a blog post looking into the Foursquare dataset, which is currently being discussed on Hacker News
I see there are even POIs where date_closed correctly indicates that they closed many years in the past, but date_refreshed is in 2024 (although closed POIs without any indication they are closed seem far more common).
They define date_refreshed as "The date the POI last had any single reference refreshed from crawl, Listing Syndicators, users or human validation", which I guess could mean almost anything really.
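For anyone who wants to count these cases rather than spot them by eye, here is a minimal sketch of the check. It assumes the rows have already been loaded as dicts and that date_closed and date_refreshed are ISO strings (YYYY-MM-DD); the sample rows are made up for illustration and not taken from the dataset.

```python
from datetime import date


def refreshed_after_closing(poi):
    """Return True if the POI is marked closed but was 'refreshed' afterwards."""
    closed = poi.get("date_closed")
    refreshed = poi.get("date_refreshed")
    if not closed or not refreshed:
        # Open POIs, or rows without a refresh date, cannot show the pattern.
        return False
    # Assumption: both fields are ISO-formatted date strings.
    return date.fromisoformat(refreshed) > date.fromisoformat(closed)


# Made-up sample rows (hypothetical names and dates).
sample = [
    {"name": "Old Bank", "date_closed": "2018-03-01", "date_refreshed": "2024-05-10"},
    {"name": "Cafe", "date_closed": None, "date_refreshed": "2024-01-02"},
    {"name": "Kiosk", "date_closed": "2020-06-30", "date_refreshed": "2019-11-11"},
]

suspicious = [p["name"] for p in sample if refreshed_after_closing(p)]
print(suspicious)
```

This only catches POIs that carry a date_closed at all; the far more common case of closed places with no closing date needs ground truth from somewhere else.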
I have not yet had time to do a systematic, quantitative analysis - but when I look at Rapperswil (SG) [1] nearby for example, I see so many POIs that are literally miles away or just rubbish, that I can confirm what many people like Simon have already said: This data seems to be of very questionable quality at least in European regions.
For those interested, Oliver Wipfli added an example of how to directly extract a region as GeoJSON using DuckDB under Linux [2].
[1] Foursquare OS Places PMTiles
[2] GitHub - wipfli/foursquare-os-places-pmtiles: All 100M+ open source places of Foursquare in a single PMTiles file
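The example in [2] uses DuckDB. If you already have the rows in memory, the same idea (keep the points inside a bounding box and write them out as GeoJSON) can be sketched in plain Python. This is not Oliver's code: the function name, the latitude/longitude field names and the sample coordinates are my assumptions for illustration.

```python
import json


def to_geojson(pois, west, south, east, north):
    """Build a GeoJSON FeatureCollection from the POIs inside a bounding box."""
    features = []
    for poi in pois:
        # Assumption: coordinates are stored in "longitude" / "latitude" fields.
        lon, lat = poi.get("longitude"), poi.get("latitude")
        if lon is None or lat is None:
            continue  # rows without coordinates cannot be mapped
        if west <= lon <= east and south <= lat <= north:
            props = {k: v for k, v in poi.items() if k not in ("longitude", "latitude")}
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": props,
            })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows; the bounding box is a rough box around Rapperswil (SG).
sample = [
    {"name": "Inside example", "longitude": 8.818, "latitude": 47.227},
    {"name": "Outside example", "longitude": 8.540, "latitude": 47.370},
    {"name": "No coordinates", "longitude": None, "latitude": None},
]

geojson = to_geojson(sample, west=8.80, south=47.21, east=8.85, north=47.24)
print(json.dumps(geojson, indent=2))
```

For the full 100M+ rows you would still want DuckDB or similar to do the filtering before anything reaches Python.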
I'm curious, do you think trying to cross reference this Foursquare dataset with Overture would create a more accurate dataset, or do you think both are subject to the same systemic bias that would defeat the purpose of using both?
I took a quick look at the FSQ data for the couple of hundred McDonald's and Burger King restaurants in Switzerland. I applied a filter like the one below (the "date_refreshed" attribute is unusable because it contains recent dates like 2024, even though it's not checked...) and compared the results with the OSM POIs:
"date_closed" IS NULL AND "fsq_category_labels" NOT LIKE '[Event%' AND "fsq_category_labels" NOT LIKE '[Arts%'
At least ~30% of the FSQ POIs were unusable. Mostly because they were hundreds of metres - even kilometres(!) - off, and that makes any automated matching almost impossible (please correct me if someone has a matching algo that is smart enough).
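To make the matching problem concrete, here is a minimal sketch of the naive approach: pair each FSQ POI with its nearest OSM POI and accept the pair only if it lies within a distance threshold. The function names, the lat/lon field names, the 100 m threshold and the sample points are all assumptions for illustration, not what I actually ran.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def match_nearest(fsq_pois, osm_pois, max_distance_m=100):
    """Pair each FSQ POI with its nearest OSM POI, or None if too far away."""
    matches = []
    for f in fsq_pois:
        best, best_d = None, None
        for o in osm_pois:  # brute force, fine for a few hundred POIs
            d = haversine_m(f["lat"], f["lon"], o["lat"], o["lon"])
            if best_d is None or d < best_d:
                best, best_d = o, d
        if best is not None and best_d <= max_distance_m:
            matches.append((f["name"], best["name"], round(best_d)))
        else:
            matches.append((f["name"], None, None))
    return matches


# Made-up sample points (hypothetical, not from either dataset).
osm = [{"name": "McDonald's (OSM)", "lat": 47.2270, "lon": 8.8180}]
fsq = [
    {"name": "McDonald's (FSQ, slightly off)", "lat": 47.2273, "lon": 8.8183},
    {"name": "McDonald's (FSQ, far off)", "lat": 47.2400, "lon": 8.8180},
]

matches = match_nearest(fsq, osm)
for fsq_name, osm_name, dist in matches:
    print(fsq_name, "->", osm_name, dist)
```

A POI that is off by a kilometre simply falls out as unmatched here, and raising the threshold far enough to catch it would start pairing it with the wrong restaurant in any town that has more than one.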
It may be that some of the nearly 12,000(!) FSQ categories are of better quality. But without the "confidence" attribute - which is withheld in the free distribution - the FSQ data seems to be practically unusable for many main categories, according to my and other assessments - at least unusable for OSM IMHO.
If you want to check the map but only see current things (so exclude all items that are "closed"):
open devtools in your browser by pressing F12, paste this into the console and press Enter
map.setFilter("foursquare", ["!", ["has", "date_closed"]])
and do it again with
map.setFilter("foursquare-10M", ["!", ["has", "date_closed"]])
Looked a bit around my hometown (in the Netherlands) and most points are very out of date (and missing date_closed), not in the right location or not in the dataset to begin with...
I had a look around in the areas I'm familiar with and the data there is... interesting, to say the least.
POIs and even a bridge that are not even within the correct ZIP code. Name of the operator instead of the POI's name. Places that closed 10+ years ago. Many MANY POIs that were off by one building, usually on the wrong side of the street. Street names that are obviously from Monopoly. Temporary stuff like ships that had been moored at that place at that point in time. Various areas and stages from the Wacken Open Air festival.
But I also found one business that is still missing from OSM and is within the correct building in Foursquare. I only know this because I know the guy running it. There is nothing signed outside as far as I know, making it unverifiable even if you stand right in front of it.