Proposed double-entry of Consolidated City-Counties

On some level, it is correct for me to (put blinders on) and say I don’t give a rat’s behind about Wiki-stuff.

Pretending that I am a sixth-grader (eleven-year-old), offer me a hard-and-fast rule, the same with some explainable exceptions, or say “we’ll keep track of it in the table.” Which (the latter) I think we are, as the OP says, “except for the Wikidata tags of late.”

Maybe we are tightening up the semantics of the column headings’ meanings for Status, Coextensive? and Emerged, but otherwise, yeah.

Loren and Minh, you’ve bitten off some gum here. Really we have as a project, you two are leaders right this instant, I’m in the dugout sort of stunned, Brian will chime in soon I’m sure, others have every right to say something, and we continue to chew on several, maybe as many as a dozen of these (I count 44 total now). I think most of them we agree on, maybe the table makes it more clear which remain ambiguous and which are “under discussion / construction.”

Keeping discussion to Denver, San Francisco can be helpful, as one bubble, maybe two, can be blown at a time and everybody says, “aw, yeah, how pretty.”

Minh chews pretty fast and blows lots of pretty bubbles, he’s amazing like that. Loren made a nice table, with flinty columns, with discussion here; people are watching.

Yeah. I don’t want to add any “eating popcorn,” I’ll just shut up and watch.


I am sympathetic to this argument, since they are administrative boundaries after all and it’s the state that administers them. So if the California court decides that San Francisco County doesn’t exist, then that seems like a strong indication that it doesn’t exist. Meanwhile, I think everyone in the conversation agrees that something containing “San Francisco” in the name is an admin_level=6 subdivision of California, and that every other boundary with that value in California is a County.

All I can say is that as someone who lived in the City and County of San Francisco for 18 years, not having the territory of San Francisco tagged as belonging to some admin_level=8 territory feels wrong: when you’re in SF, you’re in a City of the state of California, just a really weird one that’s also a county at the same time. To be a tagging purist, one admin_level=6;8 boundary might be most correct, but there seem to be concerns that that would be impractical for use by many OSM applications. To me, it seems like a more reasonable compromise to duplicate the admin_level=6 relation with a separate admin_level=8 relation, to represent the “city” and “county” nature of the City-and-County, than to omit an admin_level=8 relation altogether and have San Francisco be a strange exception of the only incorporated municipality of California not covered with an admin_level=8 relation. I guess it is an exceptional governance structure anyways though.

You’re telling me! I am currently a resident of the District (I miss my congressional representation already…). My confusion stems from the fact that to my eyes, DC and San Francisco seem to be governed in much the same way: by one unitary authority exercising the rights of a city, county, and in DC, quasi-state. Additionally, as far as I can tell, the City of Washington doesn’t exist: all the local government apparatuses reference only the District of Columbia (Mayor of the District of Columbia, DC Council, DC Department of Parks and Recreation, etc). I believe the City of Washington used to exist, and was abolished in the 1870s when the District was unified into one government. So why would Washington be admin_level=8 but not San Francisco? Both seem to have nearly the same history and current structure.

To me, admin_level=8 doesn’t necessarily mean “city”. admin_level=* tells us which boundaries nest inside which boundaries, which affects not only geocoding results but also the boundary line’s dash pattern or the label’s size on a rendered map. admin_level=8 can mean the name is less prominent than the surrounding admin_level=6, and the line is fainter with shorter dashes.

We just generally call level 8 “city” because many well-known cities have admin_level=8 boundaries. In the surrounding San Francisco Bay Area, there are almost a dozen admin_level=8 boundaries representing the third kind of municipality, towns. border_type=city is what actually means city, and border_type=city;county seems like a reasonable encoding for city and county. One can get very lucky assuming admin_level=8 is some kind of municipality, but they’ll find a little less luck assuming the inverse, that every municipality is admin_level=8.

Speaking of naïve data consumers, the main website’s search tool contributes to these assumptions by associating the levels with a set of terms that apparently don’t correspond to any country anywhere. This prompted the Puerto Rico community to demote barrios to admin_level=9 so they’d be labeled as “villages” instead of “cities”.

The Washington Metro calls one of its stops Washington National Airport. :wink:

Names really do matter with this one. People expect to see the string “Washington, D.C.”, or “Washington, District of Columbia”, in the same contexts as “Columbus, Ohio”, or “Miami, Florida”, perhaps more so because we’re used to having to distinguish the city from Washington State. But “Washington, D.C.”, makes for an ugly map label. In the context of a map, “Washington” suffices. Data consumers will automatically form “Washington, D.C.”, in the right contexts if there are two boundaries, but if there’s only one, we force them into either “Washington” or “Washington, D.C.”, without a real choice.

Right. You definitely can’t make this assumption, and it gets worse in other countries with differing schemes. I ran into this in the last few days, trying to untangle what counts as a city in Poland for my app.

I would say that admin_level values of sibling objects should ideally correspond. So if San Francisco is “more like” its admin_level=6 peers or “more like” its admin_level=8 peers, then that should be the guide. I don’t think there’s any hard and fast rule that level 6 must be space-filling within a state and either option is somewhat right and somewhat wrong.

I agree that border_type=city;county is an appropriate encoding, and choosing the right admin_level value is really more a philosophical debate that doesn’t matter too much.

(Philosophical debate)

Virginia ICs have less of a claim to being a county-equivalent than a CCC. So I wouldn’t change San Francisco to level 8 before we dropped the Virginia ICs to level 8.

And that stop is in Virginia! :wink:


But does this tagging for the geocoder require an admin_level=8 + border_type=city for Washington? It seems like this would already be accomplished by the existence of a place=city node for Washington within the District of Columbia boundaries, as well as addr:city=Washington + addr:state=DC for addresses within it, which are both clearly correct.

It seems inconsistent to me to cite obscure California court rulings to justify the existence or non-existence of one city boundary while justifying another by the ease of producing a map label, when legally and practically they seem indistinguishable to me.

And what I’ve been arguing all along is that San Francisco is “much like” both of them, neither more one nor the other. If you produced a list of counties of California, you’d expect San Francisco on the list. If you produced a list of cities of California, you’d expect San Francisco on the list.

For typical usage examples, locals frequently refer to the “nine-county” Bay Area, including San Francisco as one of the nine. Meanwhile, any list of the largest cities in the Bay Area will be led by San Jose, San Francisco, and Oakland. Its natural peers are both other counties and other cities.

We map city boundaries at all because a place node doesn’t communicate an inherent area, and a fuzzy point cloud of addresses doesn’t really equal the city limits. Some Washington, D.C., addresses are physically in Maryland or Virginia and vice versa, because postal cities and ZIP codes are based on delivery routes, not boundaries.

That said, no, the Washington boundary probably does not need to be an administrative boundary to satisfy the common geocoding use case I described. Ignoring the history, I would liken it to the boundary for Alaska’s Unorganized Borough or Maine’s Unorganized Territory: a process of elimination, except that in D.C. there’s nothing to eliminate in the first place. The Unorganized Borough is currently mapped as an admin_level=7 administrative boundary, as opposed to the boundary=census boundaries for census areas. The UT goes unmapped, but at a more local level, Maine’s gores are admin_level=8, on par with its governmentless townships.

Washington definitely exists in the present day, not just as a historic artifact, and it has a well-defined territory. But as you point out, it’s administratively irrelevant these days, just a geographic catch-all. Could we tag its boundary as boundary=place, similar to the well-defined but governmentless neighborhoods in some cities? People looking for the boundaries of all the cities would still find it via border_type=city, but not via admin_level=8.

After 2+ weeks and 100+ posts, it seems clear we’ll have to accept that any tagging convention or boundary decision will be inconvenient, at least to some users to some degree.

In other words . . .

“There are no solutions, only trade-offs”
~ Thomas Sowell :nerd_face:


The discussion has focused mainly on San Francisco with Denver a distant second, but whatever single-city CCC convention is agreed to for them would seem to apply to Broomfield, Nantucket, Honolulu*, and the six Alaska City and Borough of XXX entities (in total, the 11 single-entity CCCs).

(* - Honolulu seems unique in that Hawaii lacks municipal governments, but the official name of Honolulu is “City and County of Honolulu”)

New Orleans / Orleans Parish and occasionally Philadelphia / Philadelphia County and a few others have been discussed as examples, but I don’t see any appetite for changing them or any of the other CCCs from two entities in OSM or altering their current tagging.

It also seems the trade spaces we’re debating and the agreed preferences are:

  1. more accuracy > less accuracy
  2. easier for data consumers > more difficult for data consumers
  3. affecting fewer data consumers > affecting more data consumers
  4. easier for mappers > more difficult for mappers
  5. affecting fewer mappers > affecting more mappers

Could the convention for the 11 single-entity CCCs (Broomfield, Denver, Honolulu, Nantucket, San Francisco, and the six Alaska City and Borough of XXXs) be as simple as leaving them with a single boundary in OSM and tagging them:

admin_level=6;8
border_type=county;city
wikidata=Q123;Q456

This seems to hit most of the above preferences, and I imagine those adversely affected could resolve their particular issues with modest adjustments.

Does anyone see this as obviously not workable for either the naive user (me) or the sophisticated user (probably everyone else in this conversation)?

Semi-colon separated admin_level values are NOT workable and will break data consumers of many stripes. Take a look at taginfo to see the distribution of values.
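To illustrate the failure mode, here is a minimal sketch of how a naive consumer might read the tag. The function name and sample tag sets are hypothetical and not taken from any particular tool; the point is only that code expecting a plain integer has no sensible result for a list.

```python
def parse_admin_level(tags):
    """Return admin_level as an int, or None if missing or not a plain integer."""
    raw = tags.get("admin_level")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # A value such as "6;8" lands here, so the boundary is silently skipped.
        return None


# Hypothetical tag sets for illustration only.
single = {"boundary": "administrative", "admin_level": "6"}
multi = {"boundary": "administrative", "admin_level": "6;8"}

print(parse_admin_level(single))  # 6
print(parse_admin_level(multi))   # None
```

A consumer written this way would not crash on the list value; it would simply lose the boundary, which is arguably harder to notice.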

So this?

admin_level=6
border_type=county;city
wikidata=Q123;Q456

And would @willkmis just accept not seeing San Francisco listed at admin_level=8?

(And actually I think it would be many more than @willkmis that would expect San Francisco to be represented there.)

Or would @Minh_Nguyen’s solution from a few days ago seem more palatable:

There are a handful of examples of tagging a single OSM feature with multiple Wikidata QIDs, but at a glance they all seem to be relatively obscure features. If we do the same on a world-class city (and county), I imagine we’ll eventually find out about more naïve data consumers. :wink:

Major downstream processors such as OsmAnd, OpenMapTiles, Mapbox, and Overture join OSM and Wikidata to backfill translated place names that OSM doesn’t or won’t provide, discover images from Wikimedia Commons, rank places based on ephemeral attributes like Wikipedia page views, and discover attributes independent of OSM classifications (like the CCC query I shared previously). This usage has often arisen as a way to bridge some of the tradeoffs between accuracy and usability that we’re discussing.

Given these data consumers’ publicly documented schemas, it seems unlikely that they’re prepared to expose all the values in the list. The best-case scenario is that they can expose the first value and toss the rest. So if we do end up tagging a single San Francisco boundary relation with both values, San Francisco (Q62) probably needs to come before San Francisco County (Q13188841).
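As a rough sketch of that best-case behavior, assuming a consumer that splits on the semicolon and keeps only the first entry (the helper name is made up for illustration and does not reflect any specific processor's code):

```python
def first_value(tag_value):
    """Split a semicolon-delimited tag value and keep only the first entry."""
    parts = [p.strip() for p in tag_value.split(";") if p.strip()]
    return parts[0] if parts else None


# The order of the list decides which Wikidata item the consumer ends up with.
print(first_value("Q62;Q13188841"))  # Q62
print(first_value("Q13188841;Q62"))  # Q13188841
```

Under that assumption, whichever QID is listed first is the only one that survives downstream.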

I was being somewhat tongue-in-cheek. Site relations are real, but basically nothing knows what to do with them because they’re so diverse. It’s a bit of a running joke that the Great Lakes is a “site”. I guess it is, from an interplanetary perspective.

Based on the intervening discussion, it seems pretty clear that San Francisco County is not really an administrative boundary, so if we take this approach, we still would tag the “county” boundary as boundary=place and nix the admin_level=6 tag. Even then, we’ll be misleading many more data consumers into saying either “San Francisco, San Francisco County, California”, or even “San Francisco County, San Francisco, California”. So I’m not sure this compromise would actually make anyone happy.

Not that it will solve anything, I’ll say this. In my life and especially in OSM, I say “what is true.” I worry (or show concern, better stated) not at all or very little for “what people think” or “whether this (truth) makes somebody happy.”

That might make me unpopular, but it is very likely NOT going to make me wrong.

My apologies in advance if that doesn’t actually move this conversation forward.

With respect to the Wikidata tags, I think the difficulty is that Wikidata and OSM have different project scopes and therefore different inclusion criteria. OSM has a narrower scope than some other databases that cover geography. Most of the time, this is not a surprise, but it becomes a surprise in a few edge cases like this.

Wikidata necessarily has a separate item about the county, in order to represent the fact that it has existed in the past and some databases like GNIS and OpenHistoricalMap continue to refer to it. GNIS doesn’t expect to have a one-to-one correspondence to OSM elements, and OHM pointedly doesn’t, but for better or worse, many have grown accustomed to the idea that there’s a one-to-one correspondence between OSM and Wikidata within certain classes.

The good news is that border_type=city;county (or county;city) doesn’t seem to be controversial or problematic in terms of backwards compatibility. How about we implement that for now while we consider the rest?


I gave your post a thumbs-up (let’s implement “that” while we consider…), although I caution that whenever we do these sorts of things, we are, in a sense, “coding for (a, the…) renderer.” Yes, it is important to look at entire toolchains of both syntax (tagging) and semantics (including how use cases parse our data and make sense of them for a given context). Yet it is the fact that a “given context” can differ, here, there, somewhere else, that gets us into trouble as we “stovepipe” ourselves into a single, “let’s do this for now” (potential or actual) “solution.”

There is minuscule prior art, but for the semi-colon delimited examples tagged so far, it seems like the larger entity wants to go first in the list. So, border_type=county;city.


I see this as possibly workable. Perhaps this could also be applied to the other Consolidated City Counties too? But I haven’t thought through the implications of that.

I agree that this would be an improvement on the current tagging.

I also think that more mappers would expect, and more data consumers might produce better results with, San Francisco present as an admin_level=8 boundary than without it. That’s not the be-all end-all argument to be sure, but I think it’s a strong one. But these places are always going to be edge cases, so there’s never going to be a perfect way to represent them.

As an aside, I appreciate the discussion and how you and all the participants have been trying to work together to find a workable solution.

I would be fully on board with making all of the consolidated cities and independent cities admin_level=8. I think these are “more like” cities and towns more commonly tagged =8 than counties. But, this isn’t a hill I’d die on.

I’d lean toward admin_level=6. As long as border_type=* contains city, that’s adequate for anyone who needs to know whether the city and county has anything in common with a stereotypical American city, even though the boundary is not just a city limits.
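For what it's worth, a check like that is easy to write on the consumer side. This is a minimal sketch under the assumption that border_type is treated as a semicolon-delimited list; the function names and sample tags are hypothetical.

```python
def border_types(tags):
    """Return the set of border_type values, treating ';' as a list separator."""
    raw = tags.get("border_type", "")
    return {v.strip().lower() for v in raw.split(";") if v.strip()}


def is_city(tags):
    """True if any of the boundary's border_type values is 'city'."""
    return "city" in border_types(tags)


# Hypothetical tag sets for illustration only.
consolidated = {"admin_level": "6", "border_type": "county;city"}
plain_county = {"admin_level": "6", "border_type": "county"}

print(is_city(consolidated))  # True
print(is_city(plain_county))  # False
```

Because the values go into a set, the check works the same for county;city and city;county.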

One possible test of San Francisco’s admin level is the maritime boundary around the Farallon Islands. Like California’s counties and unlike its cities, San Francisco has maritime boundaries into the Pacific Ocean, a consequence of the consolidation.

I don’t think we can definitively determine whether the net benefit to data consumers is positive or negative. It depends on the use case. We’ve talked a lot about forward and reverse geocoding in this discussion, because some of the participants are personally grappling with some geocoding challenges. For that use case, consolidating the San Francisco boundary leads to higher quality output, however surprised the developer might be about that outcome, and the admin level ultimately doesn’t matter as long as the number is higher than the containing boundaries and lower than the contained boundaries. A geocoder will “walk the tree” to figure out the hierarchy, which remains intact.
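Here is a toy version of that tree walk, assuming the spatial lookup has already returned the boundaries containing a point. The data structure and function name are invented for the example, and real geocoders are considerably more involved.

```python
def format_place(containing):
    """Order containing boundaries from most local to least local and join names."""
    ordered = sorted(containing, key=lambda b: b["admin_level"], reverse=True)
    return ", ".join(b["name"] for b in ordered)


state = {"name": "California", "admin_level": 4}

# A single consolidated boundary, tagged at either level.
as_level_6 = [state, {"name": "San Francisco", "admin_level": 6}]
as_level_8 = [state, {"name": "San Francisco", "admin_level": 8}]

# Two separate boundaries, one per level.
both = [
    state,
    {"name": "San Francisco County", "admin_level": 6},
    {"name": "San Francisco", "admin_level": 8},
]

print(format_place(as_level_6))  # San Francisco, California
print(format_place(as_level_8))  # San Francisco, California
print(format_place(both))        # San Francisco, San Francisco County, California
```

In this sketch only the relative order of the levels matters, so the single boundary gives the same output at level 6 or level 8, while two boundaries produce the doubled name.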

But I want to impress upon this group the importance of renderers in these discussions, because unlike geocoders, renderers need the specific values of admin levels to be predictable between neighbors, and no other tagging scheme can satisfy this need. For the purpose of depicting boundaries, a renderer doesn’t care about the territory’s designation or similarity to a stereotypical city; it only cares about the name and how deeply nested it is. As I described before, the nesting level affects the line treatment and label. In some maps, more local boundaries get fainter colors. In others, they get more dots and fewer dashes (or vice versa); you’re supposed to be able to count the dots between the dashes to determine the nesting level.

More importantly, the admin level affects the scale or zoom level at which the boundary appears. In Rand McNally’s Quick Reference World Atlas (rev. 2000) and Goode’s World Atlas (22nd ed., 2010), there isn’t enough detail to warrant more than two levels of boundaries. Notice how they gloss over the various regional distinctions between different kinds of boundaries. Whether it’s a state, province, etc., what they intend to show is a second-order boundary. Imagine a map that only shows third-order boundaries – should it show San Francisco’s boundary?

As a rule of thumb, we should use the minimum applicable value numerically. As you zoom in, the map continues to show the lower-order boundaries while progressively adding higher orders. If you choose too high an admin level for San Francisco, the border around the Farallons won’t appear until too late. If you choose a lower admin level for San Francisco, it’ll stick around even as you zoom in.
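A stylesheet rule of that kind might look roughly like the sketch below. The zoom thresholds are invented purely for illustration and do not come from any actual map style; the lookup table and function name are likewise hypothetical.

```python
# Hypothetical minimum zoom at which each admin_level starts to be drawn.
MIN_ZOOM = {2: 0, 4: 3, 6: 8, 8: 11}


def is_visible(admin_level, zoom):
    """True if a boundary of this admin_level is drawn at the given zoom."""
    min_zoom = MIN_ZOOM.get(admin_level)
    return min_zoom is not None and zoom >= min_zoom


# At a mid-range zoom, a level 6 boundary is drawn but a level 8 one is not yet.
print(is_visible(6, 9))   # True
print(is_visible(8, 9))   # False
print(is_visible(8, 12))  # True
```

With thresholds like these, tagging San Francisco at level 8 would hide its boundary, including the line around the Farallons, at zooms where the neighboring counties are already showing.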

What happens when a boundary line happens to be both an international boundary and a second-order boundary? Of course, the map depicts only the international boundary, ignoring the second-order boundary, except perhaps as an edge label. Any map will show the boundary of San Francisco as a county boundary on every side, even if it still acknowledges San Francisco County by force of habit.

Regardless of the use case, admin_level=* is how we tag the order of a boundary, answering the question: “What does it subdivide?” San Francisco and the neighboring counties all subdivide California.


Earlier, I mentioned that in Ohio we do keep withdrawn cities and villages at level 8 rather than promoting them to level 7 alongside the townships they withdrew from. This is not only because of the convenience of keeping all the cities and villages at the same level. By leaving a gap at level 7, we’re saying that there’s something there – the paper township – but we choose not to map it.

This is the approach we’d take with San Francisco if we were to determine that San Francisco County does exist, but it’s too obscure to map. We’d take this approach with New Orleans if we were to come to the conclusion that Orleans Parish is too obscure to map. But as it is, those parish limit signs aren’t the only way motorists come into contact with Orleans Parish as they make their way to New Orleans.


I am curious whether, by this logic, we would promote Washington (of Washington, D.C. fame) to admin_level=6. After all, it is the sole division of the admin_level=4 District of Columbia.