Manhattan Community Boards as administrative boundaries

Minh_Nguyen · July 9, 2024, 7:05pm

To reiterate, census geographies are a rule of thumb, not a rule. The Census Bureau, as a rule, does not acknowledge any reportable subdivisions within an organized, incorporated municipality, going as far as to automatically abolish all the CDPs within a New England town as soon as it incorporates as a city, regardless of whether there are any real-world demographic changes to justify that step. On the other hand, it mostly ignores Puerto Rico’s administrative structure in favor of an ahistoric but more demographically useful place hierarchy of its own. And its Boundary and Annexation Survey is perhaps the leading source of information about Ohio’s paper townships, which exist only on paper.

ZeLonewolf · July 9, 2024, 7:06pm

Seems about as likely as a simple definition of when a railway becomes abandoned

ElliottPlack · July 9, 2024, 8:25pm

I am only looking at the census for guidance and inspiration here. They may have figured this out already. I’ll look deeper at PR though.

stevea · July 9, 2024, 9:52pm

I’m with others who think this is some “for-the-renderer” goof; faulty reverse-engineering going on in somebody’s imagination or unexplainable brokenness somewhere. I cheer and applaud Sarah for outstanding work on Nominatim. But, right now, this geocodes only roughly (correctly, though not in the local vernacular of which members matter, which in the present algorithm, don’t). For example, in the USA, people often say we are from “City (or town), State” (the USA is implied). When I’m in a more international context (I’m not just right here, as we are in the United States Discourse Community), I’ll say I’m from “California, in the United States of America.”

For Sarah’s engineering to capture so many subtle aspects of “local, because it is appropriate, code-switching” like omitting a city or including only state and country for more-global friends, would be a colossal task. I code-switch like this all the time and I still don’t always feel like I’ve cleared all the grinding out of the gears as I pedal forward in a new gear. So, it isn’t Sarah being stupid, it is Sarah (Nominatim) saying “here’s the best damn geocode I am able to give you right now” (and they’re pretty damn good, imo, with “what we have today”) and it is up to you to code-switch it into something better (your preferred local perfection), if you’d like to, downstream user.

It’s not that the “Results from…” are problematic. It is that “perfect geocoding” (with its likely billions or trillions of sub-codings of subtle and local nuance) is solvable, but it is “large and complex.” Nominatim is part-way there.

The USA in OSM with admin_level has a sorta-fragile, still-holding, slowly growing and improving skeleton of consensual hallucination about it that has acquired decades of heads nodding about it “down to” level 8. And even there, there are minor errors, what might even be called minor skirmishes, and good discussion that quenches squabbles. I think that’s OK, we stand firm on this foundation. With 9 and 10, there is more of a sort of “what the locals think kinda makes it so, even if they can’t explain it to the rest of us.” Heck, I find a vast distance between 10 (Community Districts) and 6 (Borough), unique in the USA as NYC is unique in the USA. And I am nodding and see others nodding (about these being 10s in OSM). Oh, I’ve updated our wiki (US_admin_level and United_States/Boundaries) so NYC Community Districts have scooched over from 8 to 10 in our map data, simply reflecting the reality that Elliott called to our attention at the top.

We chug along forward. We do now have methods to tag these (admin boundaries, census boundaries…). I do find this really is great discussion; thanks everyone. I’ve learned a lot here.

I especially thank OP @ElliottPlack for his topic origination and continuing research with our Department of Commerce. “GIS-worlds” (for lack of a way of better describing disparate GIS communities) including OSM benefit from bumping up against one another and learning from each other, I think that’s awesome.

Minh_Nguyen · July 9, 2024, 10:17pm

Special tax districts always have well-established boundaries – you’d never hear the end of it otherwise! But I don’t think we should be mapping TIF zones and TIDs and so forth. They overlap in all sorts of ways and by definition exist only on (very important) paper. Then there are the hundreds of kinds of special districts, which can also impose their own taxes.

Besides, the administrative in boundary=administrative refers to the concept of a political boundary.^[1] Not every boundary used in government administration is a political boundary, and some political boundaries are only minimally used in government administration. To the extent that community districts count as political boundaries, it’s only because the government has somehow made them function enough like bona fide political boundaries that we’re able to model cleanly. By this standard, some close calls are inevitable, like how we’ve administratively subdivided Connecticut into planning regions instead of counties, siding with the Census Bureau and the USGS against other geography-heavy agencies like the National Weather Service and the CDC.

stevea:

ZeLonewolf:

Personally I think it’s stupid that “Manhattan Community Board 5” is part of the geocoding there, but if the people of NYC think it fits the bill, who am I to argue?

I’m with others who think this is some “for-the-renderer” goof; faulty reverse-engineering going on in somebody’s imagination or unexplainable brokenness somewhere. I cheer and applaud Sarah for outstanding work on Nominatim. But, right now, this geocodes only roughly (correctly, though not in the local vernacular of which members matter, which in the present algorithm, don’t). For example, in the USA, people often say we are from “City (or town), State” (the USA is implied). When I’m in a more international context (I’m not just right here, as we are in the United States Discourse Community), I’ll say I’m from “California, in the United States of America.”

Ultimately, this is a mismatch between the geocoder’s capabilities and user expectations influenced by the UI. Nominatim comes up with a correct list of bounding administrative areas, but the user misperceives it as an address, because the UI presents it as a comma-separated list of places where other geocoders display the address. Most of the elements are correct but some are off – an uncanny valley. This frequently generates complaints about an address that should name a city other than the one it’s physically located in. (On the other hand, I’m glad we finally got rid of the CAL FIRE districts and TxDOT districts, which were inaccurately listed as bounding administrative areas for a while.)

[1] Not to be confused with an electoral boundary, yay English…

stevea · July 9, 2024, 10:38pm

Awesome reply, Minh. In Connecticut, OSM tipped over admin_level=6 into COG-land as “say the locals,” yet county boundaries remain in OSM: judicial districts, a branch of government, are still defined by county lines here, so in effect, the state has two middle-tier governments (though if COGs are “modern,” county boundaries might be called “antiques used for judicial districts”). Personally (and I don’t think I’m alone), I find Connecticut counties to be “useful geographical data” and I am glad OSM has them (as well as COGs, and if locals call these 6 and so does the Census Bureau, OK, and yes, we denote this in our wiki).

Uncanny valleys happen, they are a weird edge. Humans and robots are simply like this at this strange edge. Geocoding is difficult and almost has to read people’s minds to be perfect; “local” rules and subtle aspects of appropriateness (given certain data, starting with that “stack of numbers’ names”) are seemingly endless.

I recently saw CAL FIRE districts, but yes, they have decreased; maybe I had old caches. Those were a sore spot for a while.

Little spats between this or that federal government agency (and OSM must choose) are nothing new; see US Virgin Islands, which you can chop into three islands or two municipalities, and one (federal) Department chose three, another chose two.

As a general observation, OSM seems to lean towards denoting “as things are” while the Census Bureau leans towards “everything a county or county equivalent” (in one way of how they express data) or they simply invent categories and utter into them. That’s not what OSM means by admin_level, so we (rightly) apply more strictness.

Such a rich topic of discussion we’re enjoying here; again, I’ve learned a lot.

ZeLonewolf · July 10, 2024, 8:00pm

Absolutely. The output of reverse geocoding in the general sense, as done by Nominatim, is not an address and isn’t intended to be. In Nominatim, it’s a hierarchical listing of the political boundaries that a place is in. For example, counties do not appear in addresses, but they do appear, correctly so, in the Nominatim reverse geocoder.
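To make the "hierarchical listing" idea concrete, here is a minimal sketch of how a consumer might turn a set of enclosing boundaries into the comma-separated string discussed in this thread. It is not Nominatim's actual code: the `hierarchy_label` function, the dict layout, and the place names are assumptions for illustration, and the admin_level values simply follow the ones mentioned in this thread (10 for community districts, 6 for boroughs, 5 for NYC as a whole).

```python
def hierarchy_label(boundaries):
    """Join enclosing boundary names from most local to most global.

    Each boundary is a dict with a "name" and a numeric "admin_level";
    a higher admin_level means a more local boundary.
    """
    ordered = sorted(boundaries, key=lambda b: b["admin_level"], reverse=True)
    return ", ".join(b["name"] for b in ordered)


# Hypothetical enclosing boundaries for a point in Midtown Manhattan.
enclosing = [
    {"name": "United States", "admin_level": 2},
    {"name": "New York", "admin_level": 4},
    {"name": "City of New York", "admin_level": 5},
    {"name": "Manhattan", "admin_level": 6},
    {"name": "Manhattan Community Board 5", "admin_level": 10},
]

print(hierarchy_label(enclosing))
```

Every boundary tagged into the hierarchy ends up in the string, which is why the choice of what counts as boundary=administrative shows up so directly in the result.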

But, my point was lost above, so I’ll re-state it hopefully more clearly. In understanding whether a feature fits in the same category as other features with the same tag, one way to examine this fact is to view how data consumers do or might interpret the feature in context. So, I prepared a couple completely made-up and exaggerated mockups to demonstrate pictorially how I see the pitfall of having non-political boundaries in the political hierarchy using examples that we all agree aren’t political boundaries:

Now, if these examples look wrong, which hopefully they do – why do they look wrong? For me, it’s because these places – a police precinct and a school district – are not the type of place that I expect to see in a general-purpose setting when I want a description of where a place is. Aside from a deep ontological analysis of what differentiates a police precinct or school district from a municipality, I would sum this up as: the examples fail the duck test – a police precinct or school district is not in the same category as a city, county, state, or country. They’re in a different category.

Similarly, I, as an occasional (2-3x per year) visitor to NYC, feel that these community ~~boards~~ districts similarly fail this duck test, and the deep dive into what they are has only served to confirm rather than contradict my understanding of them. Now – an occasional visitor is not the same as being a resident – and if the predominant view amongst New Yorkers is that they do pass the duck test, then they are right and I am wrong.

Now, if we think instead that these mockups simply demonstrate examples of a data consumer misinterpreting data, that’s certainly a valid position. But what happens when boundary=administrative is used for both entities that pass the duck test for place and entities that fail the duck test? You are then demanding that a data consumer that wishes to use boundary=administrative for sensible reverse geocoding maintain collections of rules to differentiate the boundaries that are useful for general-purpose use cases from the ones that are esoteric and obscure.

I think that’s unfair to data consumers if we can’t tag these in some way that differentiates the two cases. I’m certain given the complexity of the US that we will eventually come up with boundaries that so uncontroversially should sit at =9 or =10 that we would want to differentiate them from ones that are so obscure that the general public would be confused by their appearance in applications and maps.
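As a rough illustration of the "collections of rules" burden described above, a consumer-side filter might look something like the following sketch. Everything here is an assumption made up for the example: the `for_general_use` function, the `OBSCURE_BORDER_TYPES` set, and the border_type values `borough` and `community_district` are hypothetical, not established tagging.

```python
# Hypothetical consumer-maintained list of border_type values to hide
# from general-purpose output. Each consumer would have to curate its own.
OBSCURE_BORDER_TYPES = {"community_district"}


def for_general_use(boundaries, obscure=OBSCURE_BORDER_TYPES):
    """Return names of administrative boundaries not listed as obscure."""
    kept = []
    for boundary in boundaries:
        tags = boundary["tags"]
        if tags.get("boundary") != "administrative":
            continue  # not part of the administrative hierarchy at all
        if tags.get("border_type") in obscure:
            continue  # administrative, but filtered by the consumer's own rule
        kept.append(boundary["name"])
    return kept


sample = [
    {"name": "Manhattan",
     "tags": {"boundary": "administrative", "admin_level": "6",
              "border_type": "borough"}},
    {"name": "Manhattan Community Board 5",
     "tags": {"boundary": "administrative", "admin_level": "10",
              "border_type": "community_district"}},
]

print(for_general_use(sample))
```

The filter only works if the two cases are distinguishable by tag in the first place; without some differentiating tag, the rule list degenerates into a list of individual place names.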

Minh_Nguyen · July 10, 2024, 8:28pm

Oddly enough, the colloquial form seems to be “locations ~~near~~ in Community ~~District~~ Board 5”. Community districts are space-filling, so if you’re only “near” one, then you’re in another one entirely. (Same goes for boroughs.) The “Board” part drives me almost as batty as having to stand “on” line for a table, but who am I to judge? To a New Yorker, I probably don’t even pronounce “Ruby Tuesday” correctly. Fortunately, there seems to be some appetite for renaming them to “districts”, no doubt over Buffalo wings.

stevea · July 10, 2024, 8:34pm

Hyperlocal goes bogus. That school district and that Louisiana restaurant (together) are nonsensical. And, I might not be so smart next time.

Be thankful that a precinct and a school district are weird places and get your creepy feelers itching. We must be wary of (obvious? only maybe…) bogus localities.

If you say I am confused by mockups, you have lost me, otherwise I’m too simpleminded; not sure why you show us those.

I am certain of nothing below 8 (for now). I am hopeful in building structure ahead on better and best 9s and 10s. We seem to have some forward momentum.

For example, OSM denotes Connecticut (a wild ride until a few months ago). I’ll add, “fairly accurately, fairly simply, fairly well-documented as to why and how.” The USA already does this, to a large degree (with many patches blooming), and then when we get to 8 or so, and below that to 9 and 10, it remains blurry. Fine. We can land this without crashing. Coming up with “uncontroversial” boundaries, well, that’s a tall order.

First, we take Manhattan.

If there is anything I have learned, it is that there is always another book at the library about this that one might read. Let’s listen. There’s somewhere between 3 (4, 5…) and 8 billion of us.

MxxCon · July 11, 2024, 4:58am

Personally, I disagree with this position, not just for NYC but anywhere. I believe whenever possible OSM should try to avoid “local preference supersede global norm”.
If every nook and cranny comes up with their own style, spec and schema of mapping and tagging things, data consumers will have a very difficult time creating maps and apps based on that data. They would be spending a ton of time creating exceptions for every non-standard area, or have just very broad rules that would include a lot of data irrelevant to a given purpose.
Yes, I know it’s already happening to some degree, but I think we should be aiming to eliminate such things whenever possible.

I think such community districts are potentially useful to have on the map, however, I wouldn’t want NYC to be a non-standard exception to the norm. (I know that NYC as a whole is an =5 exception.)

I’d like to have a cohesive way to tag such boundaries, if not all over the world, then at least in the US.

Minh_Nguyen · July 11, 2024, 6:10am

A bad reason for using boundary=administrative would be “because the locals feel like it” or “because the locals don’t think [insert general principle] should apply to them”. A bad reason for not using boundary=administrative would be “it makes the world more complicated” or “it makes the database less consistent”. A good reason would be that it does or does not adhere to general principles such as the on-the-ground rule or the duck test, when considering all the facts.

In other words, I view this discussion as an effort to determine the true nature of these boundaries, not about whether to make an exception for them. Your local knowledge would be very helpful for filling in our gaps in knowledge, especially because we seem to be in the business of characterizing what your average New Yorker on the street would find acceptable.

Reading between the lines, it seems like some of us might concede that the boundaries are real and meet the criteria for boundary=administrative but only pedantically. We don’t have an established syntax for characterizing the tangibility of a boundary like we do for route numbers. (Marginally signposted routes can be tagged as unsigned_ref=* despite the signs.) For boundaries, what we do have are admin_level=* and border_type=*, which together convey local nuances so much more effectively than a boundary=* tag that somehow has to be applied consistently around the world.

I just find it rich that we’re rabbitholing on New York’s community districts when many other “administrative” boundaries out there are clearer cases for retagging. Even if the community districts aren’t the paragon of a political boundary, why focus all of the effort on the 20% that are already 80% right, while ignoring the 80% that are 20% right?

stevea · July 11, 2024, 10:53am

Yup. Thanks again, Minh, this time for putting your finger right on the beating pulse of this.

ezekielf · July 11, 2024, 2:40pm

Oh you mean like all these Census Designated Places in New York that are currently tagged boundary=administrative + admin_level=8 + border_type=hamlet;CDP?

ZeLonewolf · July 11, 2024, 4:21pm

We aren’t discussing these because we have a clear consensus on how to re-tag CDPs which are erroneously tagged as admin boundaries. I even wrote a tool that flags these issues. It’s just a matter of someone taking the time to go through the 2800 issues flagged for New York State and fixing them.

The reason we’re discussing these boundaries in particular is the lack of consensus.
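For readers wondering what flagging a CDP mis-tagged as an admin boundary could involve, here is a minimal sketch based only on the tag combination quoted above (boundary=administrative + admin_level=8 + border_type=hamlet;CDP). It is not the tool mentioned in this post; the `is_cdp_tagged_as_admin` function and its matching rule are assumptions for illustration.

```python
def is_cdp_tagged_as_admin(tags):
    """Flag a tag set that is boundary=administrative but whose
    border_type (a semicolon-separated list) names it as a CDP."""
    if tags.get("boundary") != "administrative":
        return False
    border_types = [
        value.strip().lower()
        for value in tags.get("border_type", "").split(";")
    ]
    return "cdp" in border_types


flagged = is_cdp_tagged_as_admin(
    {"boundary": "administrative", "admin_level": "8",
     "border_type": "hamlet;CDP"}
)
print(flagged)
```

A check like this only finds candidates; deciding how each one should be retagged is still a manual review step.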

ZeLonewolf · July 11, 2024, 4:31pm

I agree for sure. It’s not local preference I’m fishing for, but local understanding. I can apply concepts like the “duck test” quite readily in places I’ve lived and have intimate knowledge of. When it comes to places I’ve only visited, or worse, places I’ve never visited, I find it much more difficult to substitute my judgement for that of someone truly on the ground. So it’s not a preference thing, it’s more of a “hey person on the inside, does this look from the inside the same way it looks from the outside?” kind of thing.