Proposed double-entry of Consolidated City-Counties

ZeLonewolf · December 4, 2024, 7:53pm

If I may elaborate more. Data consumers of US boundaries (forget the rest of the world) already have to handle:

A place with counties but no cities/towns (HI)
Places with townships, which sit above cities/towns and below counties (IL and others)
Places with space-filling boundaries and places with unincorporated parts of counties
Places with extremely sparse numbers of municipalities (VA/MD) in which you need to rely on CDPs to approximate the boundaries of settlements
A place where a city is sometimes admin_level=6 and sometimes 8 (VA)
The special case of New York City
Places with counties that only exist on paper (RI, MA)
A place with counties that aren’t the recognized county-equivalent by the US census bureau (CT)
A place where townships are given alpha-numeric codes rather than real names (ME)
A place with a consolidated city-county that is organized exactly the same as other counties in the state with the exception of how it’s named (HI)
Towns that span two counties

I’m sure I’m missing some oddities.

It seems to me that there is already so much legitimately accepted divergence from a universally-standard way of organizing boundaries in the real world, that we might as well just go for accuracy rather than try to sanitize things because we think we know better for any arbitrary data consumer’s needs.

ezekielf · December 4, 2024, 8:33pm

That’s reasonable. I just hope there’s at least some use case that is made simpler by the real-world-accurate, but inconsistent approach. I haven’t thought about the issue enough to know if that’s the case or not.

Minh_Nguyen · December 4, 2024, 8:42pm

A map probably shouldn’t show a label for “San Francisco County” since, in the real world, that toponym never appears in the usual places that a California county name appears. If every county were like San Francisco County, a map would never label any counties.

A reverse geocoding result for 37°47′44″N, 122°23′37″W should report something like the following breadcrumb:

United States ‣ California ‣ San Francisco ‣ Ferry Building

not:

United States ‣ California ‣ San Francisco County ‣ San Francisco ‣ Ferry Building

or:

United States ‣ California ‣ San Francisco ‣ San Francisco ‣ Ferry Building

whereas a reverse geocoding result for 37°24′11″N, 121°58′11″W should report something like:

United States ‣ California ‣ Santa Clara County ‣ Santa Clara ‣ Levi’s Stadium

not:

United States ‣ California ‣ Santa Clara ‣ Levi’s Stadium

or:

United States ‣ California ‣ Santa Clara County ‣ Levi’s Stadium

(This is actually a case that trips up Pelias. Who knew that a city could exist in a county by the same name?)
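One way to get both breadcrumbs right is to treat each enclosing boundary as a single object that may carry more than one admin level, and emit it once. The following is a minimal sketch under that assumption; the `Boundary` class, the `breadcrumb` function, and the hard-coded level sets are hypothetical illustrations, not how Pelias or any other geocoder actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    """One enclosing boundary; `levels` holds every admin level it carries."""
    name: str
    levels: frozenset


def breadcrumb(boundaries, place):
    """Join enclosing boundary names, broadest first, then the place name."""
    # Order by the broadest (numerically smallest) level each boundary carries.
    ordered = sorted(boundaries, key=lambda b: min(b.levels))
    # Each boundary object is emitted exactly once, however many levels it has.
    return " ‣ ".join([b.name for b in ordered] + [place])


usa = Boundary("United States", frozenset({2}))
california = Boundary("California", frozenset({4}))

# Assumption: San Francisco is one boundary acting as both county and city.
san_francisco = [Boundary("San Francisco", frozenset({6, 8})), usa, california]

# Santa Clara County and the city of Santa Clara are two separate boundaries.
santa_clara = [
    Boundary("Santa Clara", frozenset({8})),
    Boundary("Santa Clara County", frozenset({6})),
    usa,
    california,
]

print(breadcrumb(san_francisco, "Ferry Building"))
print(breadcrumb(santa_clara, "Levi's Stadium"))
```

The sketch deliberately does not deduplicate by name, since a state and a city can legitimately share one.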

Now let’s imagine that each of the breadcrumbs above links to a list of sibling places. In my opinion, the list of places under California should ideally include San Francisco among the counties. If this tree can also list all the cities in California, it should list San Francisco among the cities too. Maybe the site navigation for the California Superior Court system would differ, but I kind of feel like that’s their problem.

To me, the only relevant difference between San Francisco and an independent city in Virginia is that we could conceive of a distinct “San Francisco County” and it wouldn’t be total fiction. It would be a paper county, like the paper townships of Ohio.

The argument for a San Francisco County boundary is that otherwise, it’s hard for a user to come up with a complete list of California counties. I acknowledge the problem statement but disagree that this boundary would get the user closer to a solution. They’d have to special-case this county either way, because it’s San Francisco that’s the county – unless the objective is only to get boundaries that they can merge together into a shape of California and discard any county-specific information.

Tagging the one boundary as admin_level=6;8 would directly communicate the dual nature. The only question is if it would surprise enough renderers and geocoders to require another big long discussion about semicolons. That tagging approach used to be pretty common back when boundary relations were exotic so we tagged boundary ways for backwards compatibility. I don’t recall if any data consumers had a problem with dual admin levels then.

ezekielf · December 5, 2024, 2:45am

A post was split to a new topic: Oklahoma Townships

Loren_Maxwell · December 6, 2024, 3:04am

I haven’t been overly engaged for the last couple of days because I’ve been visiting (of all places) New Orleans. Or Orleans Parish? Or both?

Anyway, I’ve added another column to the table showing how each CCC emerged (since creation or merged): Talk:United States/Boundaries - OpenStreetMap Wiki

One point that keeps coming up is that the CCCs of San Francisco / San Francisco County, Denver / Denver County, etc., have never been separate entities because they were created as one entity (although it could be argued that two entities were consolidated upon creation, but I’ll leave it to others who might be more familiar with these topics as to if that’s a reasonable rebuttal).

This is true of the six Alaska “City and Borough of XXX” entities and also true of New Orleans / Orleans Parish.

So from the table I’m building, I can see a difference between San Francisco and Philadelphia, since Philadelphia / Philadelphia County were once separate entities that merged (1952) versus San Francisco existing only ever as a CCC.

But I’m not seeing any difference between New Orleans / Orleans Parish and San Francisco / San Francisco County, other than that New Orleans / Orleans Parish were not always coextensive. However, they have been since 1870, so I’m not sure that would be relevant for a boundary discussion in 2024.

In other words, if San Francisco and Denver are a single entity, what would the rationale be for New Orleans to be a double entry? Or if New Orleans is double, why would the others be single?

@Minh_Nguyen’s position, “strongly held”, is that New Orleans and Orleans Parish are two separate entities.

@ZeLonewolf’s position, also seemingly strongly held, is that San Francisco City and County is one entity.

But what’s the difference? Is it really that over 150 years ago they were not coextensive? Or is it really that if you lived there then you’d just know?

In defense of usability for the naïve user (like myself)

I very much like the implicit criteria @stevea mentions:

Imagine “somebody not from around here” trying to figure this out.

In other words, how does a naïve user (like myself) walk from the most likely sources a naïve user would start with and make their way to and through OSM data, such as Wikipedia → Wikidata → OSM, with full traceability as to what is happening to each entity along the way so that they are not surprised when San Francisco County and Denver County do not have entries in OSM?

If the solution isn’t to add the boundaries to OSM, then is the solution most helpful for future naïve users to start further upstream, perhaps at Wikidata?

Should Denver County (Q15906757), Broomfield County (Q16088503), Nantucket County (Q2991355), and San Francisco County (Q13188841) (and possibly Orleans Parish (Q486231) based on the above) be merged with their city entry in Wikidata?

Should those Qids only be an “instance of” (P31) a “Wikimedia redirect” (Q21528878) instead of county of California (Q13212489) or county of Colorado (Q13410403) or a county of Massachusetts (Q13410485) or parish of Louisiana (Q13410524)?

Assuming that doesn’t disrupt data consumers of Wikidata, this would at least not lead the naïve user to expect those boundaries in OSM if they are taking that particular naïve path through Wikidata.

But Wikidata is only one naïve path.

Imagine a naïve user only interested in counties who starts with census data and sees San Francisco County (Explore Census Data) and Denver County (Explore Census Data), but cannot find those in OSM and is not aware that they are coextensive and were created as one entity with the city and county consolidated.

Or a list of counties in California from Zillow (https://www.zillow.com/browse/homes/ca/), or Ballotpedia (Counties in California - Ballotpedia), or Cal Berkeley (Table 1: List of Counties in California & Zoning Percentage | Othering & Belonging Institute), or even the government of California itself (https://notary.cdn.sos.ca.gov/forms/notary-county-codes.pdf), or any other place a naïve (but reasonable) user might start.

With a little research they can probably find out those are CCCs, but I’m not sure the naïve user wouldn’t be even more confused when they see some CCCs have both county and city in OSM but some do not, with no obvious rhyme or reason to it.

Of course there are only 44 of them and with a few days of study they can figure out that whether the county and city of a CCC are coextensive is an important clue as to if OSM has them listed as two separate entities or one.

But that doesn’t quite explain all the discrepancies, so with some additional research they might realize that whether the city and county were consolidated simultaneously with their creation is another important clue as to if OSM has two separate boundaries.

And with some more work they might compare the official names of the 44 CCCs to realize that is yet another clue as to if OSM has two separate boundaries, along with examining the signs entering the city, and the organization of the sheriff and police departments, and the …, etc., etc., etc.

But at what point can the naïve user just get back to their actual application, which doesn’t overly concern itself with whether San Francisco County is a separate entity than San Francisco as much as it just needs a list of counties and cities in California, consolidated or not?

I’m not suggesting a specific solution, but whatever it is, the end result should be that it is relatively easy for the naïve user to travel a reasonable path from the most likely sources a naïve user would start with to OSM, otherwise OSM becomes too difficult to use unless that naïve user invests the time and effort to become an expert in details that would elevate him from being a naïve user.

But the naïve user loves being naïve. He wants to be naïve. He needs to be naïve – not for the sake of being naïve, but so he can focus his efforts on his true passion, his application!

I don’t think anything proposed here would necessarily make the OSM data “unusable”, as long as the naïve user could easily figure out why San Francisco County and Denver County are present in Wikidata and other sources but not OSM.

However I think the defense of accuracy at the expense of usability is not as straightforward as it’s sometimes presented.

No doubt accuracy is important and should be defended, but accuracy isn’t categorically better than usability.

At the extreme, if it’s accurate but unusable, what’s the point?

willkmis · December 6, 2024, 3:47am

Minor note: San Francisco City and County were not created as one entity: they were merged in 1856. When the state of California was created in 1850, San Francisco County included all of what is now San Mateo County, and the City’s limits were much smaller than they became when the entities were merged.

Loren_Maxwell · December 6, 2024, 4:00am

Ah, ok – I went by the Wikipedia CCC page.

But this makes it even more similar to New Orleans / Orleans Parish or Philadelphia / Philadelphia County.

ZeLonewolf · December 6, 2024, 4:49am

Essentially, nomenclature. We are tagging the names of territory, which is a concept we seem to be nodding our heads at. Whether we want to admit it or not, this all revolves around how territory is named, not 19th century history or zoning laws or who gets to elect the dogcatcher.

The bit about “a query that a naive data user can query for a list of counties” is a red herring. You can query admin_level=6 in California, and get all the counties, including San Francisco. You’ll have a complete, space-filling, California-shaped jigsaw puzzle of territory.

In Virginia, this same query would get all the counties AND all the Independent cities, for a Virginia-shaped jigsaw puzzle. But for some reason we aren’t discussing replicating the independent cities to have double-entry with additional copies, each at admin_level=8. Why is that?
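As a rough illustration of the query being described, here is a small Python helper that builds the Overpass QL text for one state. The function name, the timeout default, and the choice to select the state by its `ISO3166-2` tag are assumptions made for this sketch; the thread does not say how the query was actually written.

```python
def county_level_query(iso_code, timeout=60):
    """Build an Overpass QL query for admin_level=6 boundaries in one state."""
    return f"""[out:json][timeout:{timeout}];
area["ISO3166-2"="{iso_code}"]["admin_level"="4"]->.searchArea;
relation["boundary"="administrative"]["admin_level"="6"](area.searchArea);
out tags;"""


# California: counties, including the City and County of San Francisco.
print(county_level_query("US-CA"))

# Virginia: counties plus independent cities, per the tagging described here.
print(county_level_query("US-VA"))
```

Note that the exact match on `"6"` only finds boundaries whose `admin_level` value is exactly `6`; a value such as `6;8` would need a different filter.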

We accept that there is one place named Falls Church, Virginia (admin_level=6) and therefore we have one boundary relation to represent that named bit of territory. We accept that there is both a city of Washington and the District of Columbia, so we’re representing the two things with two OSM objects.

The difference for all the “City and County of”, “Town and County of”, and “City and Borough of” examples, is that by their name, they are one named thing.

The CCC Wikipedia article has a pretty good list of consolidated cities (one named place with both in the name) versus merged ones (separately named cities and counties). That’s where I would draw the distinction. It seems reasonable to have a boundary relation for Muscogee County, Georgia and a separate one for its sole city of Columbus.

And lastly, a naive data consumer will fail, correctly, because the boundary hierarchy simply isn’t a naive construct. It’s complex and messy in its real-world form, and a data consumer must be aware of the complexities. They should also all thank @stevea for painstakingly documenting this complexity as it has been discovered over the years.

stevea · December 6, 2024, 5:03am

I have been a mere scribe. And it hasn’t been easy. And it continues. (Though, we have accomplished MUCH).

See, the complexity exists in the real world; the real world is messy. We’re (simply?) a bunch of mappers trying to come to agreement and cope with the mess. It’s very real life, it’s very OSM.

Loren_Maxwell · December 6, 2024, 5:28am

I see your point in that a naive user will get the information they’re looking for with the admin_level=6 query, but it still lacks traceability from a source like Wikidata.

The difference is that there is traceability from the list of independent cities in Wikipedia → Wikidata → OSM, but not for all the CCCs in the same way. Some CCCs show as two entities, others as one, but it’s not obvious why.

For independent cities, it’s obvious from Wikipedia and Wikidata I would only expect one entity.

Ok, speaking just for me (on behalf of all the naive users now and in perpetuity), if the nomenclature is the correct distinction to make, which I definitely see the argument for even if I’m not fully convinced, would the tagging be admin_level=6;8 and border_type=county;city for those with the “consolidated name”?

And a quick side question out of curiosity that would also be helpful here – does Overpass support two Wikidata tags separated by a semicolon?
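To make the admin_level=6;8 plus border_type=county;city idea concrete, here is a minimal sketch of how a data consumer might read such a pair of tags. It assumes the two lists are parallel (first level goes with first type), which is purely an assumption of this sketch and not an established tagging convention; `split_values` and `roles_by_level` are hypothetical helper names.

```python
def split_values(value):
    """Split a semicolon-delimited tag value into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def roles_by_level(tags):
    """Map each admin level to a border_type, assuming parallel lists."""
    levels = [int(v) for v in split_values(tags.get("admin_level", ""))]
    types = split_values(tags.get("border_type", ""))
    if len(levels) != len(types):
        # Without matching lengths the pairing is ambiguous, so refuse to guess.
        raise ValueError("admin_level and border_type lists differ in length")
    return dict(zip(levels, types))


consolidated = {"admin_level": "6;8", "border_type": "county;city"}
ordinary = {"admin_level": "6", "border_type": "county"}

print(roles_by_level(consolidated))
print(roles_by_level(ordinary))
```

A consumer that only wants counties could then test whether level 6 is a key of the result, regardless of whether the boundary is also a city.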

Loren_Maxwell · December 6, 2024, 5:35am

@ZeLonewolf – and what would your take on this be?

ZeLonewolf · December 6, 2024, 12:02pm

Most likely yes, Wikidata should be updated. We changed P131 (located in the administrative territorial entity) on Denver to point to Colorado rather than the item that goes with the Denver County redirect page after a discussion on Slack.

Loren_Maxwell · December 6, 2024, 12:30pm

I’m trying to log into the Slack discussion but I get the error “___ @gmail.com doesn’t have an account on this workspace.”

Is it restricted, or is there a way to be added?

Minh_Nguyen · December 6, 2024, 12:30pm

Incidentally, this article listed New Orleans under the “Consolidated since their creation” heading, alongside San Francisco. It also claimed, without evidence, that the city has always served as the parish government since establishment. This is incorrect. In fact, unlike any other parish, it had not one but two parish governments prior to consolidation. I’ve corrected the article, moving New Orleans down to the “Merged” section and adding a fuller explanation with sources. There was a suggestion to correct this and other entries back in 2013, but apparently no one noticed it.

Apologies if you were taking Wikipedia at its word and I’ve pulled the rug out from under you.

The former was just a way of explaining what you can plainly see on the ground as you enter the city. In OSM, the quintessential mappable boundary is one that is marked on the ground. If we didn’t place such an importance on such real-world artifacts, the project wouldn’t have even allowed us to map boundaries in the first place.

The latter is the very raison d’être of OSM: local knowledge. Yes, sometimes we have to temper our local knowledge of obscure quirks for the public good. But the public is also served by us bringing something unique to the table. OSM is well-known for its detail-oriented coverage of geography, even sometimes at the cost of uniformity. In the event of a conflict between the Census Bureau and reality, we proudly and unapologetically choose reality. By trotting out a variety of mass-manufactured experiences that put San Francisco in San Francisco County, you’re reminding me of how unfortunate it was when OSM used to cause lots more data consumers to replicate that bug.

And yes, it is a bug. The Supreme Court has affirmed on multiple occasions that the City and County of San Francisco is a direct subdivision of the state, exclusive of any county. The official status is not charter city, not charter county, but rather charter city and county. It is as distinct from a city and from a county as peanut butter and jelly is yummier than peanut butter or jelly alone. How else would we get seemingly redundant phrases like this all over the Constitution and California Codes?

…any town, city, county, city and county, municipal corporations, private persons, partnerships or corporations…

In other words, the sign on the Golden Gate Bridge isn’t merely combining two names to save space; it’s stating the name of a single combined jurisdiction. This is not the case in Orleans Parish, despite it having shared a government with New Orleans for so many years.

This seems to be optimizing for naïve users who aren’t using Wikidata effectively. As you point out, each county is an instance of a statewide subclass, such as county of California (Q13212489), reflecting the fact that counties are a matter of state law. For better or worse, it’s normal and necessary to query recursively for subclasses of whatever class you’re looking for:

?county wdt:P31/wdt:P279* wd:Q47168.

On the other hand, if you only query Wikidata for direct instances of county of the United States (Q47168), you’ll get only 36 results, all of them defunct – including St. Vrain’s County (Q2323678) and the other 11 counties of the extralegal Jefferson Territory, which not even OpenHistoricalMap currently covers. To exclude the historic counties, you need to filter out anything that’s an instance of former administrative territorial entity (Q19953632) or that has a dissolved, abolished or demolished date (P576) statement. This too is normal. In fact, every data consumer that relies on Wikidata statements ends up having to do something similar to weed out surprises.
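Putting those pieces together, a query along the following lines would combine the recursive subclass search with the two exclusions just described. This is a sketch only: the item and property IDs are the ones quoted in this thread, the function name is hypothetical, and whether the query completes within the public query service's time limit has not been checked here.

```python
def current_counties_sparql(class_qid="Q47168"):
    """Build a SPARQL query for non-defunct instances of a county class."""
    return f"""SELECT ?county ?countyLabel WHERE {{
  # Instances of the class or of any of its subclasses, recursively.
  ?county wdt:P31/wdt:P279* wd:{class_qid} .
  # Drop anything that is a former administrative territorial entity.
  FILTER NOT EXISTS {{ ?county wdt:P31/wdt:P279* wd:Q19953632 . }}
  # Drop anything with a dissolved, abolished or demolished date.
  FILTER NOT EXISTS {{ ?county wdt:P576 ?dissolved . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""


# Default: county of the United States (Q47168) and all of its subclasses.
print(current_counties_sparql())

# Narrower: county of California (Q13212489), as discussed in the thread.
print(current_counties_sparql("Q13212489"))
```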

As far as I know, each Wikidata item representing a consolidated city is an instance of consolidated city–county (Q3301053). In turn, the consolidated city–county (Q3301053) item is a subclass of county of the United States (Q47168). Therefore, if the user queries recursively for U.S. counties and subclasses thereof, they will get San Francisco (Q62). That’s fine, except they will also get San Francisco County (Q13188841), an entity that legally does not exist but whose name still pops up in miscellaneous contexts. Maybe the user will notice that the two items are coextensive with (P3403) and said to be the same as (P460) each other and filter out the one they don’t want?

But let’s suppose the user only cares about California, and instead of searching for U.S. counties that are located in the administrative territorial entity (P131) of California (Q99), they search for instances of county of California (Q13212489). They get only San Francisco County, not San Francisco, because the latter isn’t an instance of a California county per se. Yet the former doesn’t have an OpenStreetMap relation ID (P402) to point to. You got me!

There are a couple solutions to this problem that automatically get the naïve user the data they need without having to think about it:

Make San Francisco (Q62) an instance of county of California (Q13212489) too. Qualify that statement with a nature of statement (P5102) of de facto (Q712144) if that feels better. To avoid a situation where there are too many counties of California, also make San Francisco County (Q13188841) an instance of former administrative territorial entity (Q19953632).
Create a new item to represent consolidated city–counties in California, as a subclass of both city of California (Q13218357) and county of California (Q13212489).

Yes, Overpass has a dedicated lrs_in() function for testing whether a value appears in a semicolon-delimited value list. For example, if you merely search for cuisine=french in New Orleans:

nwr["cuisine"="french"](area.searchArea);

you’ll miss Café du Monde, which is tagged cuisine=coffee_shop;french. (No donut?) Fortunately, you will get your beignets if you treat the tag as a list:

nwr["cuisine"](area.searchArea)(if: lrs_in("french", t["cuisine"]));

Granted, this is not very discoverable or memorable, and it might run a bit slower. Most people use regular expressions instead:

nwr["cuisine"~"french"](area.searchArea);
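The trade-off between these three approaches can be seen offline with a few sample values. The following Python sketch mimics exact matching, list membership, and an unanchored regular expression on plain strings; `list_contains` is a hypothetical stand-in written for this example rather than Overpass's own implementation, and the `french_fries` value is invented purely to show a false positive.

```python
import re


def list_contains(needle, value):
    """Return True if needle is one item of a semicolon-delimited value."""
    return needle in [part.strip() for part in value.split(";")]


# The last value is hypothetical, included only to show a false positive.
values = ["french", "coffee_shop;french", "french_fries"]

exact = [v for v in values if v == "french"]
as_list = [v for v in values if list_contains("french", v)]
loose_regex = [v for v in values if re.search("french", v)]
# Anchoring on list boundaries narrows the regex to whole items.
anchored_regex = [v for v in values if re.search(r"(^|;)french(;|$)", v)]

print(exact)
print(as_list)
print(loose_regex)
print(anchored_regex)
```

In this sample the anchored pattern selects the same values as the list test, while the loose pattern also picks up the invented value. The anchored pattern has only been exercised in Python here, not in Overpass.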

Minh_Nguyen · December 6, 2024, 12:31pm

Invite yourself here.

willkmis · December 6, 2024, 1:31pm

I mean, that’s because there’s a difference in how the states are administered? California has Counties, direct subdivisions of the state (usually tagged admin_level=6 in OSM), and municipalities (called Cities or Towns, the terms are equivalent in CA), usually smaller than counties and with different responsibilities (usually tagged admin_level=8). Virginia has Counties and Independent Cities, direct subdivisions of the state (usually tagged admin_level=6 in OSM), and Towns, usually tagged admin_level=8. Virginia doesn’t have any non-independent Cities, because every incorporated municipality that’s within a County is a Town.

In California, every City is admin_level=8 and every County is admin_level=6. In Virginia, every City is admin_level=6 and every County is admin_level=6. These are both fine by me, as they reflect the real administrative structure of the state. The argument about California is that for the one City and County (City-and-County?), it may or may not make sense to tag it differently than other Counties, since it’s also a City, as to some (like me) it feels ‘more right’ to tag it both like every other County in the state and like every other City in the state.

Loren_Maxwell · December 6, 2024, 2:00pm

Off topic: @Minh_Nguyen – It’s funny you mention Café du Monde of all places.

I was there on Tuesday for the first time ever on a first date with someone I had never met before.

It being a first date, she and I started what might be considered some normal first date discussion while we were waiting on a waiter, such as some of our odd experiences in the dating world, how we typically meet people, and eventually the etiquette of who should pay for a date.

Being an older gentleman from the south, I said I believe the man should always pay for the first date. To me, it signals his willingness to provide for her.

She seemed pleased with that answer and the rationale.

Just as we finished discussing that topic the waiter happened to arrive and we placed our orders. She ordered a large Café au Lait while I got a Black Coffee & Chicory and of course we got an order of beignets to share.

Now apparently at Café du Monde you pay upfront when you order. No problem, so I pull out my card.

Oh, well – and they only take cash, of which I routinely carry zero . . .

So she paid . . .

Loren_Maxwell · December 6, 2024, 2:22pm

Agree, and I certainly didn’t mean that as a slight against local knowledge.

I only meant to highlight that the naive user would have no way to identify that New Orleans / Orleans Parish is different than San Francisco / San Francisco County based on using local knowledge.

Perhaps this problem is best thought of as two separate things (no irony intended):

how a logical distinction is arrived at (of which the naive user is probably relatively indifferent) versus
making obvious the fact that a logical distinction has been arrived at and what the distinction is (of which the naive user is probably vested)

I can see my mistake in this discussion has been in tackling the first part – trying to make or challenge some logical distinction between San Francisco and New Orleans / Orleans Parish only to find out I am a minnow amongst a school (or frenzy?) of sharks. A true victim of the Dunning-Kruger effect.

While I’ve learned a lot, I’ve also realized I should retreat to focusing on the implications of the second part – basically how can a naive user make their way through the OSM data without tripping up and also maintaining their valued naive user status?

I’m content to largely leave the San Francisco / San Francisco County discussion up to the experts and just focus on the simple question of how can someone best make the naive journey from Wikipedia → Wikidata → OSM?

Loren_Maxwell · December 6, 2024, 2:43pm

One quick question about how the community works – we’ve discussed a lot and my sense is that a solution is forming, but how is a decision actually made, especially when there might not be consensus on all items?

Also, I’m happy to do the tagging, update the wiki, etc., and also happy to update the Wikidata portion if the solution involves that.

This has all been very instructive – I’m glad I stumbled across this group and am more than willing to contribute!

ZeLonewolf · December 6, 2024, 3:39pm

Great question.

Personally, I am guided by the William Golding novel Lord of the Flies when describing the community mechanics. Additionally, many cite the IETF RFC 7282 definition of consensus…