Foursquare releases 100M+ POI dataset under Apache 2.0

This is pretty huge!

link to data:

4 Likes

I don’t see that Apache 2.0 is compatible with the ODbL.

Seems to suggest it is compatible, but I’m no lawyer. Here’s hoping :crossed_fingers:
It’s definitely an odd licence to choose, though.

3 Likes

Another database of very (!) varying quality.

Similar to Overture, this does not contain opening hours; you have to pay to get access to those fields.

Between this release and Overture places, it must be an interesting time to be in the business of selling basic POI data on the basis of big splashy numbers. Not sure how big of a deal it’ll be, but the related Placemaker program says it’ll gather a mix of the objective facts we care about as well as more subjective attributes.

2 Likes

I’ve had a look at a bit of the data in country and it seems a -lot- better than the Linux Foundation Overture Maps data, but then that isn’t exactly hard.

The differences seem to be mainly a lot of offices that we don’t have, but obviously that will depend on where you look.

License: using Apache 2.0 as a data license is more than weird and leaves many many questions open (the text from the EU site is machine generated nonsense).

5 Likes

Seems as if either the quality varies hilariously widely or I was just very lucky, see Harel Dan on LinkedIn: #opensource | 18 comments

3 Likes

Yes, Foursquare and Overture places are like many geolocation-centric datasets: users aren’t supposed to ever see the raw data, either in a list or on the map. You have to filter by a confidence score. Otherwise, you’ll get tons of user-generated junk – pranks, mistakes, etc. It’s probably even messier than Overture in some places, because Foursquare was essentially a mobile location game for many years.

This kind of data is more readily consumable in a geocoder than on a rendered map. In the past, Foursquare would charge big bucks for the confidence scores as an upsell. If these scores aren’t part of the dataset, then no wonder the company feels comfortable releasing the data.

Here’s a marketing idea for OSM: publish a dataset of all the POIs that have ever been added to OSM in any form, including deleted ones, and tout the relatively huge total number of them with an asterisk that refers to an OSM confidence score and the URL to a forum thread where we bikeshed about what that confidence metric should be. :nerd_face:

11 Likes

Many of my schoolmates were into Foursquare back in the day (specifically, the City Guide app that they’re discontinuing on December 15). They became “mayors” of various places around town by checking in frequently. It doesn’t take very much of value to juice youth engagement with an app. Similar to the gymgoers of Pokémon Go, there were some pranksters who created fake venues near them specifically so they could be mayors.

Anyhow, here’s a viewer you can use to get a sense of what the data is like if you don’t purchase Foursquare’s confidence scores:

Happily, my favorite tells of Foursquare-sourced data are still around to “amplify the Digital Echo Chamber effect”, as they put it.

2 Likes


 and here’s my traditional link to “businesses unfeasibly allegedly located inside York Minster” :slight_smile:

2 Likes

So I gave my surroundings another look thanks to Oliver’s map, and my initial impression is more or less confirmed: fewer outright inventions than the Linux Foundation data, more offices than OSM.

The offices have the same issue as office POI data from Google (typically sourced from trade registry data): a lot of it is just the home addresses of people’s sole-proprietor companies, with very very very limited usefulness outside of propping up the numbers.

What is disappointing is that date_refreshed doesn’t seem to indicate anything QA-related: there are two POIs nearby (a bank and a post office) that have been gone for more than 5 years (so they did actually exist at one point in time) but have a date_refreshed from this year.

I kind of have the suspicion that given 4sq was never popular here (contrary to Facebook) the motivation to add outright nonsense was less and that is the reason that I’m getting a more positive impression than others.

PS: just in case people have forgotten this isn’t the first open data from 4sq, GitHub - foursquare/quattroshapes has been around for a long time, and at least its “locality” data has always been quite amusing.

Someone made a blog post looking into the Foursquare dataset; it is currently being discussed on Hacker News.

I see there are even POIs where date_closed indicates correctly that they closed many years in the past, but date_refreshed is in 2024 (although closed POIs without any indication they are closed seem far more common).

They define date_refreshed as “The date the POI last had any single reference refreshed from crawl, Listing Syndicators, users or human validation”, which I guess could mean almost anything really.
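For anyone who wants to count how often this happens, here is a rough Python sketch that flags records whose date_refreshed is later than their date_closed. It assumes both fields are ISO-style `YYYY-MM-DD` strings in a plain dict of properties; the function name and the sample record layout are made up for illustration and are not part of the Foursquare tooling.

```python
from datetime import date


def refreshed_after_close(props):
    """Return True if a POI was 'refreshed' after it was marked closed.

    Assumption: date_closed and date_refreshed are 'YYYY-MM-DD' strings
    (or missing/None). Records missing either date are not flagged.
    """
    closed = props.get("date_closed")
    refreshed = props.get("date_refreshed")
    if not closed or not refreshed:
        return False
    return date.fromisoformat(refreshed) > date.fromisoformat(closed)


def count_suspicious(records):
    """Count records whose refresh date is later than their closing date."""
    return sum(1 for r in records if refreshed_after_close(r))
```

Note that this only catches POIs that carry a date_closed at all; closed POIs with no closing date, which seem to be the more common case, would need a comparison against another source.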

1 Like

I have not yet had time to do a systematic, quantitative analysis - but when I look at Rapperswil (SG) [1] nearby for example, I see so many POIs that are literally miles away or just rubbish, that I can confirm what many people like Simon have already said: This data seems to be of very questionable quality at least in European regions.
For those interested, Oliver Wipfli added an example of how to directly extract a region as GeoJSON using DuckDB under Linux [2].

[1] Foursquare OS Places PMTiles
[2] GitHub - wipfli/foursquare-os-places-pmtiles: All 100M+ open source places of Foursquare in a single PMTiles file

I’m curious, do you think trying to cross reference this Foursquare dataset with Overture would create a more accurate dataset, or do you think both are subject to the same systemic bias that would defeat the purpose of using both?

I took a quick look at the FSQ data for the couple of hundred McDonald’s and Burger King restaurants in Switzerland. I applied a filter like this (the ‘date_refreshed’ attribute is unusable because it contains recent dates like 2024 even for POIs that have evidently not been checked) and compared the results with the OSM POIs:

"date_closed" IS NULL AND "fsq_category_labels" NOT LIKE '[Event%' AND "fsq_category_labels" NOT LIKE '[Arts%' .
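If you work with the features outside of a SQL tool, the same filter can be approximated in Python. This is only a sketch: it assumes each feature’s properties are a dict and that fsq_category_labels arrives as a single string starting with `[` (as the LIKE patterns above suggest); the function name is hypothetical.

```python
def keep_poi(props):
    """Approximate the SQL filter above for one feature's properties.

    Keeps a POI only if date_closed is empty and the category label string
    does not start with '[Event' or '[Arts'. As in SQL, a missing (NULL)
    label drops the row, because NOT LIKE on NULL is not true.
    """
    if props.get("date_closed") is not None:
        return False
    labels = props.get("fsq_category_labels")
    if labels is None:
        return False
    return not labels.startswith(("[Event", "[Arts"))


def filter_pois(features):
    """Return only the property dicts that pass keep_poi."""
    return [p for p in features if keep_poi(p)]
```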

At least ~30% of the FSQ POIs were unusable. Mostly because they were hundreds of metres - even kilometres(!) - off, and that makes any automated matching almost impossible (please correct me if someone has a matching algo that is smart enough).
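To illustrate why the offsets hurt, here is a minimal nearest-neighbour matching sketch in Python. It is not the method used for the comparison above, just the simplest distance-only approach: each FSQ POI is paired with the closest OSM POI and the match is rejected if it is further away than a threshold. The 100 m default, the dict layout (`name`, `lat`, `lon`) and the function names are all assumptions for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_pois(fsq_pois, osm_pois, max_dist_m=100.0):
    """Pair each FSQ POI with its nearest OSM POI (brute force).

    Returns a list of (fsq_name, osm_name_or_None, distance_m_or_None).
    osm_name is None when the nearest candidate is beyond max_dist_m.
    """
    results = []
    for f in fsq_pois:
        best_name, best_dist = None, None
        for o in osm_pois:
            d = haversine_m(f["lat"], f["lon"], o["lat"], o["lon"])
            if best_dist is None or d < best_dist:
                best_name, best_dist = o["name"], d
        if best_dist is not None and best_dist > max_dist_m:
            best_name = None  # too far away to trust the match
        results.append((f["name"], best_name, best_dist))
    return results
```

With a point that is kilometres off, any threshold tight enough to avoid false matches in a dense town centre will reject it, and loosening the threshold just pairs it with the wrong restaurant. The brute-force loop is fine for a few hundred POIs; larger sets would want a spatial index.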

It may be that some of the nearly 12,000(!) FSQ categories are of better quality. But without the ‘confidence’ attribute - which is withheld in the free distribution - the FSQ data seems to be practically unusable for many main categories, according to my and other assessments - at least unusable for OSM IMHO.

1 Like
Small tip: you can run a filter on both layers to filter them out.

if you want to check the map but only see current things (so exclude all items that are ‘closed’):
open devtools in your browser by pressing F12 and add this text to the console and press enter

map.setFilter('foursquare', ['!', ['has', 'date_closed']])

and do it again with

map.setFilter('foursquare-10M', ['!', ['has', 'date_closed']])

1 Like

I looked around my hometown (in the Netherlands) a bit, and most points are very out of date (and missing date_closed), not in the right location, or not in the dataset to begin with.


I had a look around in the areas I’m familiar with and the data there is interesting to say the least.

POIs and even a bridge that are not even within the correct ZIP code. Name of the operator instead of the POI’s name. Places that closed 10+ years ago. Many MANY POIs that were off by one building, usually on the wrong side of the street. Street names that are obviously from Monopoly. Temporary stuff like ships that had been moored at that place at that point in time. Various areas and stages from the Wacken Open Air festival.

But I also found one business that is still missing from OSM and is within the correct building in Foursquare. I only know this because I know the guy running it. There is nothing signed outside as far as I know, making it unverifiable even if you stand right in front of it.