Proposed bulk removal of brand:wikipedia tag from United States POIs

The main visualization I’m aware of is Open Etymology Map, which only considers name:etymology, name:etymology:description, and name:etymology:wikidata, but not name:etymology:wikipedia. The MapComplete Etymology theme similarly considers name:etymology:wikidata but not name:etymology:wikipedia. But maybe someone included it in an import or bulk edit just in case.

By the way, there are 1,061 elements in the U.S. with brand:wikipedia=* but not brand:wikidata=*. Needless to say, we should keep these occurrences of brand:wikipedia=* for now, or replace them with brand:wikidata=* if there’s a straightforward way to do that en masse. At a glance, most are for brands that are in name-suggestion-index, so the brand:wikidata=* tags will be added eventually as mappers encounter validator warnings about them being missing.

There is also EqualStreetNames.

According to https://github.com/EqualStreetNames/module-process/blob/39b77bb3feefdd2fa8d8887b7f84f1e956d47f2f/docs/geojson.md?plain=1#L15-L17, the tags it uses are:

  • name
  • wikidata
  • name:etymology:wikidata

I am a bit torn on this proposal. On one side I agree with the motivations cited in favor above.

On the other side it seems wasteful to drop all the good values for this tag on the map. Also, and maybe most importantly, this tag is currently documented in the wiki as a valid key that can be used normally, so if we proceeded we would be erasing from the map a tag that is actively used and documented (its usage is in decline, but still, it’s used). If as a community we conclude that this is a redundant tag that adds very little useful information but a lot of maintenance burden (and I would agree with that), then we should first document it as such and discourage its usage, and only then delete the existing usages.

Also, whatever the outcome of the proposal for this tag, I think we should expand the conversation to all secondary Wikipedia tags, because concepts and issues described here apply to most of these tags and in most countries.

In the meantime I would propose a third way: the wikidata<->wikipedia links from WD are licensed under CC0. In general WD’s CC0 is not considered sufficiently reliable for importing/using it in mapping, but in this case WD is the original source of the information, so the CC0 is reliable. We could use queries like the one proposed above to fetch all the elements with a wrong value of brand:wikipedia and use this information to clean up the bad data without wasting all the good values.

For example:

I improved on the query above with this query, which removes most of the redirect articles (the ones where the new name case-insensitively includes or is included in the old one; I may have mistakenly removed some non-redirect articles, but they should be very few), reducing the result from 24k to 14k POIs.

From QLever’s interface the result can be easily exported as CSV and used to filter the elements that actually need their brand:wikipedia to be removed.
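The same redirect heuristic could also be applied to the exported CSV offline. The following is only a minimal sketch, not the actual query: the column names `brand_wikipedia` and `sitelink_title` are hypothetical and would need to match whatever the export really contains, and both columns are assumed to hold bare article titles without a language prefix.

```python
import csv
import io


def is_probable_redirect(old_title: str, new_title: str) -> bool:
    """Heuristic from the post: case-insensitively, one title contains the other."""
    old = old_title.replace("_", " ").strip().casefold()
    new = new_title.replace("_", " ").strip().casefold()
    if not old or not new:
        # An empty title tells us nothing, so do not treat it as a redirect.
        return False
    return old in new or new in old


def mismatches_to_review(csv_text: str,
                         old_col: str = "brand_wikipedia",
                         new_col: str = "sitelink_title") -> list:
    """Return the CSV rows whose titles do NOT look like a simple rename.

    old_col and new_col are hypothetical column names (an assumption).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if not is_probable_redirect(row[old_col], row[new_col])]
```

As noted above, a containment check like this can misclassify some genuine mismatches as redirects, so the remaining rows are candidates for review rather than a final deletion list.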

Are you planning to do this work, and keep it updated over time? Because that’s exactly the problem, that the :wikipedia links will drift over time. A complete removal fixes this problem, while leaving in a partial set simply sets us up for future bad data. If someone wants to set up some kind of bad data surveillance system to monitor brand wikipedia/wikidata mismatches and keep on top of it then I’ll happily step aside and let them do it. I think that’s unrealistic in a volunteer project.

I do not want to expand the scope of this edit or discussion beyond this specific key precisely because it can get bogged down in broader discussions (for example, there is still plenty of support for the plain wikipedia key). I contend that this specific key has been abandoned by the tool that primarily put it there and should be cleaned up. If that spawns other meta looks at other keys in similar boats, I’d welcome that. Given that we are 24 posts into this discussion, and the tag count is quite significant, I’d like to keep this focused on brand:wikipedia and understand if there are people that feel strongly that it should be kept in the United States.

This is fair, though name-suggestion-index added (then abandoned) not only brand:wikipedia but also operator:wikipedia and network:wikipedia. I think it would be reasonable to delete these three keys in the same fell swoop while leaving the others, which aren’t so common after all. Of these three, brand:wikipedia and operator:wikipedia are the least stable because of how frequently companies change hands and brands rebrand.

Edit: Looks like NSI never added flag:wikipedia for some reason, so we’re good on that front.

That’s a fair concern. But still, before doing a mass deletion we should have a wide consensus to make this change structural, at the very least adding to the Wiki page something like:

If brand:wikidata=* is already specified, the usage of this tag is discouraged because:

  • from the brand’s Wikidata entity it’s already possible to reach its Wikipedia page in any available language
  • adding this tag creates duplicate data prone to misalignment
  • the human readable name of the brand can already be specified with brand=*

I would definitely be in favor of this.
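As a rough illustration of the first bullet in the suggested wiki text, the Wikipedia title can be derived from the Wikidata item's sitelinks. This is a sketch only: it assumes `entity` is the per-item object from a Wikidata `wbgetentities` JSON response that includes sitelinks, the function name is made up for this example, and it only handles simple language codes whose site ID is the code followed by `wiki`.

```python
from typing import Optional


def wikipedia_tag_from_entity(entity: dict, lang: str = "en") -> Optional[str]:
    """Build an OSM-style "lang:Title" value from a Wikidata entity's sitelinks.

    Returns None when the item has no article in the requested language.
    Assumes the site ID is simply "<lang>wiki" (true for codes like "en"
    or "de", not for every language code).
    """
    sitelink = entity.get("sitelinks", {}).get(f"{lang}wiki")
    if sitelink is None:
        return None
    return f"{lang}:{sitelink['title']}"
```

Because the value can be recomputed like this whenever it is needed, storing it on the element as well only creates a second copy that can fall out of date.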

In the absence of consensus, selective removal would be the next best solution.

PS:

I know it’s not important for the thread, but just for clarity: I was only referring to secondary wikipedia tags (*:wikipedia=*). wikipedia=* has similar issues, but its usage is too broad to think about touching it.

I agree that the wiki could use an update to specifically lay out NSI’s contribution to this tag’s proliferation and a link to this discussion proposing to remove it in the United States. I am hesitant to say that the tag is discouraged on a global level because we have not had this discussion globally (and I do not want to bog down the US edit in such a discussion if there’s domestic support). But certainly your wiki update suggestion is a good idea.

Note that I proposed doing part of such a cleanup, and there was opposition to doing this, as some people wanted this tag gone rather than fixed.

Are you referring to a recent proposal to update brand:wikipedia=* tags by replacing Wikipedia redirects with their targets? That would make my earlier QLever query more accurate, but you’d have to be careful to keep the redirects that match the brand:wikidata=* and/or the redirects that are categorized as redirects with possibilities, redirects from subtopics, redirects to sections, or redirects to embedded anchors. It’s kind of complicated because you’d have to know which redirects are in which categories, and Wikidata can’t help much because its ability to link to a redirect is only a little over a year old. And this assumes the redirects are correctly categorized in the first place – relatively few Wikipedia editors know that’s even a thing.

Yes, but only where the redirect target matches the brand:wikidata that is already present.

(Is it possible to somehow end up with a case where brand:wikipedia and brand:wikidata are both present, brand:wikipedia redirects to another page whose Wikidata entry matches the brand:wikidata tag, and editing the brand:wikipedia tag to match the new target would lower OSM quality?

Note that if the redirect target has no Wikidata item, or has one different from the present brand:wikidata tag, or if no brand:wikidata tag is present, then the edit would not be done.
)
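The safeguards just described could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual edit script: the function name and parameters are hypothetical, and resolving the redirect and looking up the target page's Wikidata ID are assumed to happen elsewhere and be passed in.

```python
from typing import Optional


def safe_redirect_retag(tags: dict,
                        target_value: str,
                        target_qid: Optional[str]) -> Optional[str]:
    """Return the new brand:wikipedia value, or None if the edit is skipped.

    tags         -- the element's current OSM tags
    target_value -- the redirect target as a tag value, e.g. "en:New Title"
    target_qid   -- Wikidata ID of the redirect target, or None if it has none
    """
    current_value = tags.get("brand:wikipedia")
    current_qid = tags.get("brand:wikidata")
    # No brand:wikipedia or no brand:wikidata on the element: do nothing.
    if not current_value or not current_qid:
        return None
    # Redirect target has no Wikidata item, or a different one: do nothing.
    if target_qid is None or target_qid != current_qid:
        return None
    # The tag already names the target: nothing to change.
    if target_value == current_value:
        return None
    return target_value
```

Every uncertain case falls through to "no edit", so the only elements touched are those where the existing brand:wikidata already confirms the redirect target.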

It is possible, technically, if someone manually changes the brand:wikipedia=* tag to one of these “redirects with possibilities” but doesn’t retag the accompanying brand:wikidata=* tag. The same scenario is how folks in this thread have theorized that a completely mismatching brand:wikipedia=* tag could have some value worth preserving until further inspection. However, I think it’s quite unlikely compared to the analogous scenario with wikipedia=* and wikidata=*, because editor support for brand:wikipedia=* has always been nonexistent.

If you perform an edit with the safeguards you’ve mentioned, I’m not terribly concerned about intentional sitelinks to redirects getting wiped away. But we’re still left with the question of why we should even maintain these tags that no one has a reason to use. You’d have to run this cleanup constantly, indefinitely, but I think the only practical benefit is that my QLever query will more accurately reflect the irrelevance of brand:wikipedia=*.

I just wanted to chime in with my full support for the purge of brand:wikipedia. I also think starting with the US is the right way to go about it, since we can (hopefully) point to it afterwards as a successful example for the following worldwide edits. And lastly, I agree that the edit should be preceded by a change to the wiki page, so as not to delete a tag that’s technically encouraged.

As always, great initiative @ZeLonewolf!

So there are no autogenerated Wikidata entries for these “redirects with possibilities”?

I’m unaware of any effort to automatically generate items for these redirects, though it’s an interesting idea. As far as I know, any “intentional sitelink to redirect” between Wikidata and Wikipedia has been created manually, typically as a result of splitting an item.

I’ve added a section to the wiki summarizing the reason that brand:wikipedia was introduced, and pointing to this thread and proposal for removal.

They are supposed to, and they certainly change less than Wikipedia page titles and mappings between concepts and wikipedia pages. However, it does happen that they change and what was all expressed as one wikidata item becomes multiple ones. I’ve seen this kind of issue mainly with geographic places vs administrative entities.

Thanks everyone for the discussion. Based on what I’ve seen here, I’m satisfied that there’s consensus for removing this tag from objects in the United States and will be proceeding with the edit.

There is not a single brand:wikipedia in my area, so I do not have to ask you to do the same here.

I came across this discussion because I saw the mass edit that just went out across California.

As a random OSM contributor who has recently focused a bit more on adding shops/stores and improving their details, I support this decision.

I typically use JOSM and Vespucci.
As a software dev myself, entering a Wikidata ID is my preference, as it is not an ID that I have to worry about changing as often.

Entering a Wikipedia name feels redundant, and (as others have mentioned) is not as reliable. I have ignored it for the most part, as it does require additional work to verify that I am using the correct name. Grabbing the QID from Wikidata has been a lot more straightforward.
