Proposed bulk removal of brand:wikipedia tag from United States POIs

For context, brand:wikipedia was added begrudgingly in response to concerns that brand:wikidata=* wasn’t human-readable enough, and that brand=* was too ambiguous. Later, for consistency, the same pattern was extended to operator/network/flag:wikipedia=* to complement operator/network/flag:name=*.

In particular, there was concern about multiple brands sharing the same name in different geographies. This happens a lot with the names of banks, for example. Because Wikipedia requires every article to have a unique title, the articles about two such brands would have different titles, even if both brands have the same brand=*. In theory, this makes it easier to verify that the brand:wikidata=* tag corresponds to the correct brand.

Since then, name-suggestion-index has been able to clean up many local and regional brands that had originally been classified as global brands, making it much harder for mappers to accidentally tag the wrong brand based on its shared name. It has also gained the ability to scope a brand to a specific geometry rather than a whole country. For example, the scope of the multinational Burger King fast food chain excludes the famously unaffiliated Burger King in Mattoon, Illinois. Finally, iD added support for the not:brand:wikidata=* key so that mappers can affirm a very subtle distinction that would otherwise escape notice.
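To make the not:brand:wikidata=* mechanism concrete, here is a simplified Bash sketch of the kind of check an editor can make before suggesting a brand. It is not iD's actual implementation: the function names are hypothetical, tags are assumed to be stored as newline-separated key=value lines, multiple values are assumed to be semicolon-separated as is usual in OSM, and the Q-IDs are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a newline-separated list of
# key=value tags. Returns 1 (and prints nothing) if the key is absent.
tag_value() {
  local tags="$1" key="$2" line
  while IFS= read -r line; do
    if [[ "${line%%=*}" == "$key" ]]; then
      printf '%s\n' "${line#*=}"
      return 0
    fi
  done <<< "$tags"
  return 1
}

# Hypothetical check: succeed if a brand with Wikidata ID QID may be suggested,
# i.e. the mapper has not listed QID in not:brand:wikidata=* (assumed to be a
# semicolon-separated list).
may_suggest_brand() {
  local tags="$1" qid="$2" excluded item
  local -a items
  excluded="$(tag_value "$tags" "not:brand:wikidata")" || return 0
  IFS=';' read -ra items <<< "$excluded"
  for item in "${items[@]}"; do
    [[ "$item" == "$qid" ]] && return 1
  done
  return 0
}

# Usage with made-up IDs: prints "suggestion suppressed".
tags=$'name=Burger King\nnot:brand:wikidata=Q1111111'
may_suggest_brand "$tags" Q1111111 || echo "suggestion suppressed"
```

The point of the design is that the negative tag is checked before any name-based match, so a feature that merely shares a name with a chain stays untouched once a mapper has affirmed the difference.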

Since the beginning, the name-suggestion-index developers have caught a lot of flak for the inclusion of brand:wikipedia=* tags in its presets. Many mappers view the presence of three different keys to track the same information as a bit excessive. brand:wikipedia=* is the least stable of the three keys, since Wikipedia doesn’t consider its article titles to be even somewhat stable identifiers.

The raw brand:wikipedia=* values were never particularly good at telling you whether brand=* and brand:wikidata=* referred to the right brand anyway. At the time that name-suggestion-index removed *:wikipedia=* from its presets, two-thirds of the presets (11,810 of 17,992) had brand/operator/network/flag:name=* values that didn’t match the article titles in their *:wikipedia=* values. Even ignoring parenthetical disambiguators in the Wikipedia article titles, 61% (10,958) still didn’t match.

Instructions for reproducing this analysis
```shell
git clone https://github.com/osmlab/name-suggestion-index.git
cd name-suggestion-index
# The trailing ^ checks out the parent of this commit
git checkout 82b4751e6c141bf112656423d5c99863e4247b0b^
npm install
npm run build
# Count presets whose name differs from the article title (language prefix removed) in the matching *:wikipedia=* tag
jq '.presets | map(.addTags) | map((.brand // .operator // .network // .["flag:name"]) as $name | (.["brand:wikipedia"] // .["operator:wikipedia"] // .["network:wikipedia"] // .["flag:wikipedia"]) as $wikipedia | select($wikipedia) | select($name != ($wikipedia | sub("^[^:]*:"; "")))) | length' dist/presets/nsi-id-presets.json
```
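The 61% figure additionally ignores parenthetical disambiguators, which the jq command does not strip. A minimal Bash sketch of that normalization follows. It is not necessarily the method behind the 10,958 count; the function names and the "Example Bank" values are hypothetical, and it assumes a *:wikipedia=* value of the form language:Title, optionally ending in a single " (...)" disambiguator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a *:wikipedia=* value such as
# "en:Example Bank (Ohio)" to its bare article title "Example Bank".
wikipedia_title() {
  local value="$1" title
  title=${value#*:}           # drop the language prefix, e.g. "en:"
  title=${title%" ("*")"}     # drop one trailing " (...)" disambiguator, if any
  printf '%s\n' "$title"
}

# Hypothetical check: succeed when NAME equals the normalized article title.
name_matches_wikipedia() {
  local name="$1" wikipedia_value="$2"
  [ "$name" = "$(wikipedia_title "$wikipedia_value")" ]
}

# Usage: prints "match", then "mismatch".
name_matches_wikipedia "Example Bank" "en:Example Bank (Ohio)" && echo match
name_matches_wikipedia "Example" "en:Example Bank" || echo mismatch
```

Removing only the shortest prefix up to the first colon keeps article titles that themselves contain a colon intact.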

Many of these discrepancies arose because Wikipedia chooses to conflate some brands with the company that owns or operates the brand, or even with the company that historically owned the brand before selling it off. This is especially common with oil distribution companies and convenience store companies. In these cases, Wikidata would ideally have a more specific item about the brand proper. In the meantime, the brand=* tag indicates what OSM prefers to record on the feature. When Wikidata splits out a brand item, name-suggestion-index sometimes needs to replace the *:wikidata=* tag, and it would previously remove the *:wikipedia=* tag at the same time, since there’s no exact match on Wikipedia. In other words, one way or another, these *:wikipedia=* tags would become irrelevant anyway.
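The replacement described above can be sketched roughly as follows. This is not name-suggestion-index's own code; it is a minimal Bash illustration that assumes tags are held as newline-separated key=value lines, and the function name and Q-IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: when a brand's Wikidata item is replaced by a newly split-out
# brand item, rewrite brand:wikidata=* and drop the now-stale brand:wikipedia=*.
replace_brand_wikidata() {
  local tags="$1" old_qid="$2" new_qid="$3" line
  # Leave features that don't carry the old item untouched.
  if ! grep -Fxq "brand:wikidata=$old_qid" <<< "$tags"; then
    printf '%s\n' "$tags"
    return 0
  fi
  while IFS= read -r line; do
    case "$line" in
      "brand:wikidata=$old_qid") printf 'brand:wikidata=%s\n' "$new_qid" ;;
      brand:wikipedia=*) ;;  # drop: no Wikipedia article matches the new brand item
      *) printf '%s\n' "$line" ;;
    esac
  done <<< "$tags"
}

# Usage with made-up values; prints:
#   brand=Example Oil
#   brand:wikidata=Q2222222
tags=$'brand=Example Oil\nbrand:wikidata=Q1111111\nbrand:wikipedia=en:Example Oil Company'
replace_brand_wikidata "$tags" Q1111111 Q2222222
```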
