Can I bulk add data for NYC? (50,000+ businesses)

Hi Everyone,

Is there a way I can bulk add metadata for 50,000+ NYC businesses and not be flagged as spam?

I’m new to this forum and new to OSM, but I’m trying to create an app using OSM data, and it needs many businesses to have up-to-date information. That led me to this idea, but I wanted to know what the community thinks first, whether it’s even possible, and the best way to do it.

The general approach would be this:

  1. Send postcards to every business in NYC (I’ll try to crowd source the cost or cover it myself)
  2. The postcard has simple instructions to visit a URL and enter a unique business identifier code into a basic form for the business’s data (name, address, phone, hours of operation, type of business, subtypes, description, etc. I’d want to check with the community here first on appropriate data and labeling.)
  3. Collect all that data in a separate, independent database outside of OSM
  4. Occasionally push large amounts of that data to OSM.
  5. Repeat steps 1-4 in perpetuity for new businesses that open in NYC
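Steps 2 and 3 could be sketched roughly like this, as a minimal example assuming Python and a hypothetical mapping from form field names to OSM keys. The keys shown (name, addr:*, phone, opening_hours, description) are common OSM tags, but the exact set should be agreed with the community first. The business type is left out because it maps to different keys (shop=*, amenity=*, and so on) depending on the business, and the form would need to collect hours in OSM’s opening_hours syntax or convert them.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical mapping from the postcard form's field names to OSM keys.
FORM_TO_OSM = {
    "name": "name",
    "housenumber": "addr:housenumber",
    "street": "addr:street",
    "unit": "addr:unit",
    "phone": "phone",
    "hours": "opening_hours",  # value must already use opening_hours syntax
    "description": "description",
}


@dataclass
class Submission:
    code: str  # unique business identifier printed on the postcard
    received: date  # when the business answered, useful to judge staleness
    answers: dict = field(default_factory=dict)


def to_osm_tags(sub: Submission) -> dict:
    """Translate known form answers into OSM tags, skipping blanks and unknown fields."""
    tags = {}
    for form_key, value in sub.answers.items():
        osm_key = FORM_TO_OSM.get(form_key)
        value = (value or "").strip()
        if osm_key and value:
            tags[osm_key] = value
    return tags


# The separate, independent database from step 3, keyed by postcard code.
submissions: dict[str, Submission] = {}


def record(sub: Submission) -> None:
    # A later answer for the same code replaces the earlier one.
    submissions[sub.code] = sub
```

Keeping the received date with each answer is what later makes it possible to tell fresh data from stale data.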

I can do all of the above on my own except for the data injection into OSM. I don’t know how to do that and I also don’t want to be flagged as spam. I’d prefer to work with the community, or someone with authority for something of this scale (if OSM is structured that way) to find an efficient way to do that.

Thank you for listening.

-LS

Import/Guidelines - OpenStreetMap Wiki would likely be relevant


A quick query of the OSM database with:

{{geocodeArea:New York City}}->.searchArea;
nwr[shop](area.searchArea);
out geom;

shows there are 25,322 shops in NYC already mapped. I doubt there are 50,000 more, and that’s counting only “shop”, not any other types of business.
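If only the total is needed, the query can end in `out count;` instead of `out geom;`. The `{{geocodeArea:New York City}}` shortcut only works in Overpass turbo; against the plain Overpass API you need the area id, which for a boundary relation is 3600000000 plus the relation id (the NYC relation id is left for you to look up). A small Python sketch, assuming the JSON shape Overpass returns for `out count;`, which is a single element of type `count` whose totals are strings:

```python
import json


def build_shop_count_query(area_id: int) -> str:
    """Overpass QL that counts shop=* nodes, ways and relations inside an area."""
    return (
        "[out:json];\n"
        f"area({area_id})->.searchArea;\n"
        "nwr[shop](area.searchArea);\n"
        "out count;"
    )


def parse_count(response_text: str) -> int:
    """Read the total from an `out count;` JSON response."""
    data = json.loads(response_text)
    for element in data["elements"]:
        if element["type"] == "count":
            return int(element["tags"]["total"])
    raise ValueError("no count element in response")
```

The query text can be sent to any Overpass API endpoint; counting on the server avoids downloading 25,000+ geometries just to get one number.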


For adding they can use


Thank you, will read.

I pulled data from Google a few months ago for a limited set of business types (restaurants and a few others), and it already came to 27,000, with a lot of business types left out. So I made an educated guess of 50,000+, just for illustrative purposes. I also left out some of the boroughs; I’d be expanding my region a bit.

Unfortunately I can’t use the Google data for my purposes due to licensing.

Thanks for the screenshot, good tip on usage.


Thank you, will check this out.

That’s OK because Google has become unreliable in so many ways, and it’s getting worse.

The other thing is, why would you make a separate database, then try to import that into OSM? OSM is a database, so just use it directly.


This part does not seem doubtful to me at all: even well-mapped areas typically miss a very large share of POIs, with only a fraction actually mapped.

Has any count or estimate ever been made of the number of businesses in a large city anywhere in the world? That would be cool. Let’s limit it to shops.

You’ll want to make sure that whatever they send back is recorded in a format that OSM can accept. If each shop takes 1 minute to add, I can tell you without doing much math that it’s going to take a ton of time (50,000 shops at 1 minute each is over 800 hours). Also, merging the address data with existing addresses should preferably keep the nodes’ history (:
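A sketch of the “keep the node’s history” idea, assuming Python and that the existing elements nearby have already been downloaded as dicts: when exactly one existing element has the same address, modify it (same id and version, so its history continues) instead of creating a duplicate. Matching on address alone and the action dicts are simplifications for illustration, not a real upload format.

```python
def addr_key(tags: dict) -> tuple:
    return (tags.get("addr:housenumber"), tags.get("addr:street"), tags.get("addr:unit"))


def plan_edit(new_tags: dict, existing: list[dict]) -> dict:
    """Decide whether to modify an existing OSM element, create one, or ask a human.

    Each existing element is assumed to look like
    {"type": "node", "id": 123, "version": 4, "tags": {...}}.
    """
    key = addr_key(new_tags)
    if key[0] is None or key[1] is None:
        # Without housenumber and street there is nothing safe to match on.
        return {"action": "review", "candidates": [], "tags": dict(new_tags)}
    matches = [el for el in existing if addr_key(el.get("tags", {})) == key]
    if len(matches) == 1:
        el = matches[0]
        # Reusing the id and version keeps the element's edit history intact.
        merged = {**el["tags"], **new_tags}
        return {"action": "modify", "type": el["type"], "id": el["id"],
                "version": el["version"], "tags": merged}
    if not matches:
        # Negative ids are placeholders for not-yet-created objects in OSM uploads.
        return {"action": "create", "type": "node", "id": -1, "tags": dict(new_tags)}
    # Several elements share the address: don't guess.
    return {"action": "review", "candidates": [el["id"] for el in matches],
            "tags": dict(new_tags)}
```

The naive merge keeps every old tag, so something like a leftover shop=* from a previous occupant survives; that is exactly the kind of trap a human review step should catch.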

I guess today is the day I learned about Enshittification.

The other thing is, why would you make a separate database, then try to import that into OSM? OSM is a database, so just use it directly.

The reason is to reduce friction for businesses responding with their data. I’d be a middleman with my temporary db, so on their end it is minimal effort.

Yes, you’d do for them what they don’t know how to do for themselves and don’t want to learn how to do. They want to concentrate on running their business, not on becoming OSM mappers. It’s a good idea that can help OSM, and it’s worth charging a fee to do. I don’t know how much that fee could be, but supply and demand will sort that out. Already OSM is better than other maps for certain things in certain areas, and your idea could help it be even better if done well. Tell them that! Also show them how Google Maps is now almost completely enshittified and shows you not what you are looking for but what someone paid to show you. Finally they came clean last November with this:

Maybe the way to approach it is to just collect data, then daily or weekly, upload it to OSM. I do this with EV charging stations, but I just use a text file for each station. What’s in the text file? The OSM tags, in text form, and lots of notes. Even with the in-browser iD editor, you can copy/paste the text tags, and that’s quick.
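A text file that mixes notes and tags can be turned into a paste-ready block with a few lines of Python. This sketch assumes a hypothetical convention where note lines start with `#` and every other non-blank line is one key=value tag; the output is one key=value pair per line, which is the form the text view of iD’s tag editor accepts.

```python
def extract_tags(text: str) -> dict:
    """Pull key=value tag lines out of a notes file, ignoring '#' note lines."""
    tags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or note (our own assumed convention)
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep or not key.strip():
            raise ValueError(f"not a key=value line: {raw!r}")
        tags[key.strip()] = value.strip()
    return tags


def paste_block(tags: dict) -> str:
    """One key=value per line, ready to paste into the editor."""
    return "\n".join(f"{k}={v}" for k, v in tags.items())
```

Raising on a malformed line, rather than skipping it, means a stray note without `#` gets noticed before anything is pasted.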

Often I am waiting on a station to be constructed, in which case I cannot map it on OSM because there’s nothing on the ground yet (one of OSM’s most basic rules is “Map what’s on the ground”, and that means right now, not what was there yesterday and not what might be there next week). After construction has started and we get photo confirmation, I can map it with construction:amenity=charging_station. Only addr:housenumber=* shows up in the default OSM renderer. Then we wait for construction to finish and the station to go live. At that point, I’ll get the final station name, branch, address, and some other details from the charging company’s website, and the number of ADA-accessible stalls from street-level photos. Then I remove the construction: prefix to leave amenity=charging_station, and the EV charging icon and station name get rendered. Here’s an example in metro Denver, CO.
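The lifecycle step described above, dropping the construction: prefix once the station is live and then adding the details gathered afterwards, fits in a small function. A Python sketch; `go_live` is a hypothetical helper name, and it assumes the element never carries both construction:amenity and amenity at the same time.

```python
def go_live(tags: dict, details: dict) -> dict:
    """Promote construction:<key> tags to <key> and merge in post-opening details."""
    live = {}
    for key, value in tags.items():
        # construction:amenity=charging_station -> amenity=charging_station
        live[key.removeprefix("construction:")] = value
    live.update(details)  # e.g. final name and operator from the company website
    return live
```

`str.removeprefix` needs Python 3.9 or later and leaves keys without the prefix unchanged.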

I don’t use a database but just use the file system on my computer, with a few levels of nested folders to contain the text file and any related photos (and sometimes drone videos). Would there be a better way? Maybe. I could use a database on my computer (or a server somewhere), but then I create a database problem of my own. You’d have the same problem. If you used a database and it could export a set of text tags ready to be copy-pasted into OSM, you’d have no data conversion import issues, but if you try to connect up the two databases, then your life gets complicated and that’s why you’re here.
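For the database route mentioned here, a sketch assuming Python’s built-in sqlite3: one row per (POI, key) pair, exported straight to key=value text so nothing needs converting on the way into the editor. The table layout and the POI identifiers are assumptions for illustration.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # One row per tag; the primary key makes each (poi, key) pair unique.
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "poi TEXT, key TEXT, value TEXT, PRIMARY KEY (poi, key))"
    )
    return db


def set_tag(db: sqlite3.Connection, poi: str, key: str, value: str) -> None:
    # Overwrites the previous value for the same POI and key.
    db.execute("INSERT OR REPLACE INTO tags (poi, key, value) VALUES (?, ?, ?)",
               (poi, key, value))


def export_text(db: sqlite3.Connection, poi: str) -> str:
    """All tags of one POI as sorted key=value lines, ready to copy-paste."""
    rows = db.execute("SELECT key, value FROM tags WHERE poi = ? ORDER BY key",
                      (poi,)).fetchall()
    return "\n".join(f"{k}={v}" for k, v in rows)
```

This keeps the database as a private notebook and the copy-paste step as the only link to OSM, so the two databases never have to be synchronized automatically.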

Sometimes a simple or simplified approach that’s not 100% automatic works well. The danger of 100% automation is that with 1 click you accidentally insert 1000 POIs with bad or incomplete data into OSM. Then someone (you or someone else) has to fix it. This happened last summer with US Tesla Supercharger stations, though that was an intentional bulk import that was severely half-baked. Almost a year later we are still fixing the bad data, because after being criticized for the bogus bits, the importer refused to fix all the issues, packed up his toys, and went home.
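One way to keep a single click from inserting 1000 bad POIs is a gate that refuses the whole batch if any record fails basic checks, and refuses oversized batches outright. A Python sketch; the required keys and the batch limit of 50 are arbitrary assumptions, while the 255-character cap matches the OSM API limit on tag values.

```python
REQUIRED = ("name", "addr:housenumber", "addr:street")  # assumed minimum


def validate(tags: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = [f"missing {k}" for k in REQUIRED if not tags.get(k, "").strip()]
    for key, value in tags.items():
        if len(value) > 255:
            problems.append(f"{key} longer than 255 characters")
    return problems


def gate(batch: list[dict], max_size: int = 50) -> list[dict]:
    """Pass the batch through only if it is small and every record is clean."""
    if len(batch) > max_size:
        raise ValueError(f"batch of {len(batch)} exceeds limit of {max_size}")
    bad = {i: p for i, p in ((i, validate(t)) for i, t in enumerate(batch)) if p}
    if bad:
        raise ValueError(f"{len(bad)} of {len(batch)} records failed: {bad}")
    return batch
```

Failing the whole batch is deliberately strict: it forces a look at the data instead of silently uploading the records that happened to pass.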

How would you distinguish between:

  • a new business replacing an old one in the same place
  • a new business right next to an old one
  • a business owner wanting to replace the entry of a competitor?

Thank you for this, a lot of good points and ideas.


Yep, it’s a problem. At least in NYC, business names can be connected to addresses through the nyc.gov database for new businesses and, for restaurants at least, through health inspection records. So attaching a date to each business entry would give at least some measure of how stale it might be and whether anyone else is registering the same address. Unfortunately, multiple businesses do share identical addresses, which was a problem I had with the Google data: entries stomped on each other when the address was treated as unique.
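Treating the address as unique is what made records stomp on each other. Keying on housenumber, street, and unit together, and comparing names only within that key, gives a rough triage for the three cases asked about above. A Python sketch with hypothetical outcome labels; it only sorts submissions for a human to review and cannot tell a legitimate owner from a competitor. The one-year staleness threshold is an assumption.

```python
from datetime import date


def address_key(tags: dict) -> tuple:
    return (tags.get("addr:housenumber"), tags.get("addr:street"), tags.get("addr:unit"))


def classify(incoming: dict, existing: list[dict]) -> str:
    """Rough triage of a submission against POIs already mapped nearby."""
    key = address_key(incoming)
    # Several businesses may legitimately share one full address.
    matches = [t for t in existing if address_key(t) == key]
    if any(t.get("name") == incoming.get("name") for t in matches):
        return "same"  # already mapped: an update, not a new POI
    if matches:
        return "same_address_other_name"  # replacement, co-tenant, or hijack: human decides
    if any(address_key(t)[:2] == key[:2] for t in existing):
        return "same_building_other_unit"  # likely a neighbor in the same building
    return "new"


def is_stale(last_confirmed: date, today: date, max_age_days: int = 365) -> bool:
    return (today - last_confirmed).days > max_age_days
```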

Overall I would REALLY REALLY strongly encourage you to start smaller. Do a test run with 10 postcards and enter them manually, with verification by survey to see how it is going.

Do not start with 50 000 of them and full automation. The “Occasionally push large amounts of that data to OSM.” step has a massive number of traps and issues.

  1. The address will be the same, so you leave that and update the business name, hours, phone, website, etc. This is a never-ending task that keeps us busy. Check this restaurant in metro Denver: it was Elephant Bar, then Bar Louie, and now it’s vacant and waiting for the next occupant. The address remains the same and I’ve added 2 different dated old_name tags so far.

  2. For businesses in the same building, sometimes the housenumber of their street address will be different. Others will have the same housenumber but a unique unit number. Check this shop in Golden, CO that is in a building with several other shops. The housenumber and street are the same for all of them (17121 South Golden Road) but each has a unique unit number, so in this case the full address is 17121 South Golden Road Unit 140. These change all the time and keep us busy.

  3. Something like that actually happened some years ago with a client who owned an art gallery in metro Denver. Someone tried to hijack the pin on Google Maps and they had to prove to Google that they were the real owner. I don’t know how this works on OSM, because anyone in the world can come in and change the info on your business (or almost anything, for that matter). Then, you change it back and it becomes an edit war. Maybe there needs to be a locking mechanism so you can claim your business, but then who’s in charge of the locking?

    I’ve actually been looking into the best way to automatically monitor changes to certain OSM objects that I’m interested in. Maybe have an email sent to me, or a text message if it’s really important. If I have a business mapped on OSM, I’d want to know immediately if the info got changed without my knowledge. It could be a bad actor or, these days maybe more likely, a half-baked mechanical edit or bot that accidentally screwed up a bunch of info. Someone could get AI to do a huge amount of damage in a very short time. I don’t know what plans OSM has to defend against stuff like this.
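A starting point for the monitoring idea, assuming Python and the OSM editing API’s JSON output for single elements (`/api/0.6/node/<id>.json`): remember the version number of each watched element, fetch again later, and report every element whose version has moved on. The User-Agent string is a placeholder. A deleted element makes the API answer with an HTTP error (410 Gone) rather than JSON, which `urlopen` raises as `HTTPError`; a real monitor would need to handle that.

```python
import json
import urllib.request

API = "https://api.openstreetmap.org/api/0.6"


def fetch_element(kind: str, osm_id: int) -> dict:
    """Fetch the current version of one node/way/relation (network call)."""
    req = urllib.request.Request(f"{API}/{kind}/{osm_id}.json",
                                 headers={"User-Agent": "poi-watch-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["elements"][0]


def changed(watched: dict, current: list[dict]) -> list[tuple]:
    """Compare remembered versions with freshly fetched elements.

    `watched` maps (type, id) to the last version seen. Returns
    (type, id, old_version, new_version) for every element that changed.
    """
    out = []
    for el in current:
        old = watched.get((el["type"], el["id"]))
        if old is not None and el["version"] != old:
            out.append((el["type"], el["id"], old, el["version"]))
    return out
```

Run on a schedule (cron or similar) with a modest number of watched objects; polling thousands of elements one by one would be unkind to the API.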

That is the understatement of the year!

A bad account would be banned by the Data Working Group (see Data Working Group - OpenStreetMap Wiki).

If vandalism persisted, people would watch the affected POIs, manually or via a watchlist.

(See also MTB trail vandals, “I want to delete my house/pool from the map”, or “I want to hide the existence of this path for reasons”.)

Various rate limits were introduced, so “create a bunch of new accounts and mass-vandalize the map” is much harder to do.

What about automated queries against the editing API, Overpass, or Postpass, and then sending a message by email or some other channel once a change is spotted?
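Once a change is spotted, the message is more useful if it says what changed. A Python sketch of a tag-level diff whose text could be sent by email or any other channel; the `+`/`-`/`~` line format is just an illustrative choice.

```python
from typing import Optional


def tag_diff(old: dict, new: dict) -> list[str]:
    """List added (+), removed (-) and changed (~) tags, sorted by key."""
    lines = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before == after:
            continue
        if before is None:
            lines.append(f"+ {key}={after}")
        elif after is None:
            lines.append(f"- {key}={before}")
        else:
            lines.append(f"~ {key}: {before} -> {after}")
    return lines


def notification(label: str, old: dict, new: dict) -> Optional[str]:
    """Message body for a changed element, or None if no tag changed."""
    diff = tag_diff(old, new)
    if not diff:
        return None  # e.g. only the geometry moved
    return f"Tags changed on {label}:\n" + "\n".join(diff)
```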

What about generating a map of the relevant area and comparing before/after images?
