Proposed import of All the Places data onto existing grocery store POIs in the US

Hi all,

Over the last week or two I have been working on prepping some All the Places data for import into OSM, following preliminary interest in the OSM US Slack #imports channel. I have limited the data to grocery store chains in the US only. This post is to formally solicit feedback and community approval before I proceed with importing the grocery store data only. Of particular note is that I do not plan to overwrite existing OSM key/value pairs except for the brand:wikidata tag. I also do not plan to add, remove, or move OSM objects based on this import alone, but will create separate MapRoulette challenges to resolve those issues as I identify them.

I have documented my process for the import in much greater detail in the OSM wiki:

Import/All the Places US data

All the GeoJSON files I plan to use are available in my GitHub repository in the /data folder here: atp-import. GitHub has a handy built-in GeoJSON viewer if you want to peruse them quickly.

Much more information is available in the Wiki page but feel free to ask about anything I haven’t addressed. Two outstanding questions I’d appreciate feedback on, besides the usual:

  1. How large (geographically) are people comfortable with the changesets being, and are people comfortable with me uploading multiple brands within one changeset?
  2. How would people want me to notify the community as I move on to other commercial sectors (I have hotels planned next)? A new topic, or just a new post in this thread?

In general I really like this idea! I actually have plans to do something like that in Poland.

Have you verified this data? Maybe by looking at mismatches between OSM data and ATP where both claim something?

When I started looking at ATP data in Poland I immediately started noticing mismatches. For example, one brand had opening_hours for all days except Sunday. If such an import had been done in Poland before the ATP processing was fixed, it would have resulted in importing thousands of wrong opening_hours tags.
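A minimal sketch of the kind of mismatch check meant here, assuming the OSM and ATP tags for an already-matched pair are available as plain dicts. `find_conflicts` is a hypothetical helper, not part of ATP or of the import tooling, and the tag values are made up for illustration:

```python
def find_conflicts(osm_tags, atp_tags, ignore=()):
    """Return {key: (osm_value, atp_value)} for keys set on both sides with different values."""
    conflicts = {}
    for key, atp_value in atp_tags.items():
        if key in ignore or key not in osm_tags:
            continue  # only compare keys where both sources claim something
        if osm_tags[key].strip() != atp_value.strip():
            conflicts[key] = (osm_tags[key], atp_value)
    return conflicts


# Made-up example values, not real store data.
osm = {"name": "Example Market", "opening_hours": "Mo-Su 07:00-22:00"}
atp = {"name": "Example Market", "opening_hours": "Mo-Sa 07:00-22:00", "phone": "+1 555 0100"}
print(find_conflicts(osm, atp))
```

Counting such conflicts per spider and per key would show quickly whether one brand disagrees with OSM systematically (a sign of a parsing problem) or only here and there.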

Have you at least spot-checked each spider for suspicious, garbled, or mismatching data?

Have you run the JOSM validator on the data you plan to upload?

I worry about this part: in my experience, putting up such a MapRoulette challenge will result in people editing blindly, and a lot of shop data is misplaced, more or less heavily.

"This method of information gathering was recently given an “it depends” green-light for use in OSM that appears to grant permission for use in at least the US, per this Licensing Working Group recommendation"

As I understand it, this is not limited to the USA (unless I missed something?).

A new topic, though most of the text can be reused (the same goes for the wiki page).

Great initiative!

How large (geographically) are people comfortable with the changesets being, and are people comfortable with me uploading multiple brands within one changeset?

Why would you want to make the changesets larger? Since the input files are per brand, why not upload per brand?

Another question: Why would the cleanup process reduce the number of locations? E.g. Kroger is listed with 6,796 locations from ATP and only 2,857 after cleaning. I didn’t get that from the process description.

Some shops are not mapped in OSM, so they will not be edited.

I think this is a good idea, thanks for putting this together. I have a few questions/comments though:

  • All of your examples in the wiki page already have a lot of information tagged, so it’s not obvious to me what information you’re planning to add to less well mapped nodes. Would you be able to provide at least one example of what you plan to do with a node that, say, just has amenity=supermarket + name=* or something similarly bare-bones?

  • Based on your table, in the Whole Foods and Trader Joe’s entries it looks like the examples you used contain branch information in the name tag, which you plan to treat. However the table makes it look like you leave the nodes without any name tags at the end? Is that really what you plan to do or am I just misinterpreting? I’d expect you to leave them with a simple name=Whole Foods or whatever the case may be.

  • A hyper-local concern, but perhaps more broadly applicable: I spot checked one grocery brand, Ralphs, a Kroger subsidiary that is common near me, and spotted some oddities in your geojson file. For one, some of the stores in the file have name=Ralphs Fresh Fare, while others have name=Ralphs. Apparently, this is a branding that Ralphs uses, but I had never noticed, even though I shop at one every week. Apparently no one else had either, since according to taginfo name=Ralphs Fresh Fare does not currently appear in the database at all. Or perhaps it’s because if you tried to enter it in iD, NSI would prompt you to correct it to name=Ralphs. I tend to think that name=Ralphs is more correct for these, since that’s what everyone refers to them as, though I don’t feel that strongly. But more systematically, have you checked whether your changes would add branding names that no one uses organically to the database? Or whether some of your edits would prompt warnings due to NSI? Neither of these necessarily means the edit is wrong, but it might be worth taking a closer look at any such cases.

  • Another thing I spotted was that these local Ralphs/Ralphs Fresh Fares tend to have mysterious operator tags, which seem to reflect long-ago acquisitions and mergers (for instance, one common value is operator=Alpha Beta Company, a former local grocery chain that hasn’t existed since 1995). I don’t think there’s any visible difference in the stores with different alleged operators, and I wonder how much of it is internal shell games by the parent companies. Is operator a tag you plan to add?

Re: your questions, I definitely think you should not do more than one brand per changeset, and that you should probably break them up at least regionally, maybe even state by state, although I recognize that might be too much more work for you. Again, thanks for spearheading this!
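To illustrate the one-brand-per-changeset, state-by-state split, here is a rough sketch. It assumes each GeoJSON feature carries `brand` and `addr:state` in its properties, which may not hold for every file; `group_for_changesets` is a hypothetical name, not an existing tool:

```python
from collections import defaultdict


def group_for_changesets(features):
    """Group GeoJSON features by (brand, state) so each group can become one changeset."""
    groups = defaultdict(list)
    for feature in features:
        props = feature.get("properties") or {}
        # Assumption: brand and addr:state are present; otherwise fall back to "unknown".
        key = (props.get("brand", "unknown"), props.get("addr:state", "unknown"))
        groups[key].append(feature)
    return dict(groups)
```

Each resulting group covers a single brand in a single state, which keeps the changeset bounding boxes small and makes a revert of one brand or region easier.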

Branded store formats such as Kroger Marketplace/Signature/Food & Drug come up fairly regularly in the NSI project. I’m not sure we have a consistent answer for them, since the formats vary more significantly in some chains than in others. A great many manually inputted store formats have been removed from name over the years due to consistency checks. In my opinion, the information should be captured somehow, whether in one of the name keys (not necessarily name) or a more specific brand tag. Or maybe a new key along the lines of branch.

Speaking of Kroger, they’re currently attempting an acquisition of Albertsons. The acquisition is currently stuck in litigation, but if the deal closes, this import will have to be re-run multiple times as the combined company sheds several brands and hundreds of individual store locations.

This is actually because some of the original files include amenity=fuel and amenity=pharmacy POIs operated by the grocery stores, which I’ve removed as they’re beyond the scope of this import proposal.
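As a sketch of that cleanup step, assuming the ATP GeoJSON stores the OSM-style `amenity` value in each feature's properties (`drop_out_of_scope` is a hypothetical name, and the actual cleaning is described on the wiki page):

```python
# Amenity values named in the post as out of scope for this proposal.
EXCLUDED_AMENITIES = {"fuel", "pharmacy"}


def drop_out_of_scope(collection):
    """Return a copy of a GeoJSON FeatureCollection without fuel and pharmacy POIs."""
    kept = [
        feature
        for feature in collection["features"]
        if (feature.get("properties") or {}).get("amenity") not in EXCLUDED_AMENITIES
    ]
    return {**collection, "features": kept}
```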

Yes, and generally the information in the ATP files looks good after cleaning it up. Again, I don’t plan on overwriting hand-inputted data that may be better (or worse). The quality of the coordinates is hit or miss depending on the brand, but generally within 500 meters, which is sufficient to make matches. Yes, I’ve run the JOSM validator and will do so again when uploading as well.
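For illustration, a distance-based match with a 500 meter cutoff could be sketched as below. This is a simplified stand-in that uses only the haversine distance between (lat, lon) pairs; it is not necessarily how the matching in the import is done, and `nearest_within` is a hypothetical helper:

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearest_within(atp_point, osm_points, max_m=500.0):
    """Return the closest (lat, lon) in osm_points within max_m of atp_point, or None."""
    best, best_d = None, max_m
    for point in osm_points:
        d = haversine_m(atp_point[0], atp_point[1], point[0], point[1])
        if d <= best_d:
            best, best_d = point, d
    return best
```

In practice the candidates would also be limited to objects of the same brand, so that a neighboring store from another chain is never picked up as the match.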

I’m sensitive to this, but I also think the benefits tend to outweigh the risks here. I can play around and look into whether there are ways to prevent this before proceeding.

That may be the case. I’m just keeping this import to the US because I know how to spot and fix poorly parsed US data.

Basically, all the information that doesn’t overlap with current information would be added. So if a basic grocery store node only has amenity=supermarket and a name, and the reference data I’ve prepared has amenity, name, phone, website, opening_hours, addr:*, brand, and brand:wikidata, then I would merge the phone, website, opening_hours, addr:*, brand, and brand:wikidata values.
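In code, that merge rule looks roughly like the sketch below, with tags as plain dicts. `merge_tags` is a hypothetical helper rather than the script used for the import, and the only key allowed to overwrite an existing value is brand:wikidata, as stated in the proposal. The example values are made up:

```python
def merge_tags(existing, reference, overwrite_keys=("brand:wikidata",)):
    """Add reference tags to an OSM object without replacing existing values.

    Keys in overwrite_keys are the only ones whose existing value is replaced.
    """
    merged = dict(existing)
    for key, value in reference.items():
        if key not in merged or key in overwrite_keys:
            merged[key] = value
    return merged


# Made-up example: a bare-bones node and a fuller reference record.
node = {"amenity": "supermarket", "name": "Example Market #12"}
reference = {
    "amenity": "supermarket",
    "name": "Example Market",
    "phone": "+1 555 0100",
    "brand": "Example Market",
    "brand:wikidata": "Q0",
}
merged = merge_tags(node, reference)
```

Here `merged` keeps the hand-entered name and gains phone, brand, and brand:wikidata.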

Yeah that’s my oversight in the documentation. Presumably all the objects I’m merging data onto will already have a name tag that I would not replace, but in theory, I would merge a basic Whole Foods Market or Trader Joe's value. I have it right in the data :sweat_smile:

Again, this might just be academic given that I think every object I add info to will already have a name that I won’t touch, but it’s a good point and I can look into whether this affects brand or brand:wikidata, for example.

I can remove this on Kroger nodes, and take a closer look at this in the future. I agree this is likely of dubious value.

Interesting, and probably a bridge that I or the community will cross if and when it happens. This would have to happen whether or not this import goes forward, unless I’m misunderstanding. Might be some work all for nothing, but such is life.