I’ve got an application where we’re trying to tag ‘urban’ areas in the United States. Yes, it’s a fairly nebulous term, but people have said that they’d like to use it in heuristics.
I started by just marking everything within a polygon tagged “admin_level” 8 and “boundary” “administrative”, threw this data set over the fence, and instantly got back complaints about New York City, which is listed as “admin_level” 5.
What’s the intent on the data going forward? Should I special-case New York City? Am I going to get way too much area if I include all admin_level 5 regions in the United States? The two Wiki pages which cover this don’t seem to have a lot on how this decision evolved. I’d love to get some pointers to the discussions where this got hashed out.
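One way to compare how much gets selected at each level is to run the same Overpass query with different admin_level sets. Here is a minimal sketch that only builds the query text and contacts no server. The function name and the timeout value are illustrative choices of mine, and it assumes the United States can be selected as an Overpass area through its ISO3166-1 tag.

```python
def build_boundary_query(admin_levels, timeout=300):
    """Build Overpass QL selecting US administrative boundary relations.

    Hypothetical helper for illustration only.
    admin_levels is an iterable such as [8] or [5, 8].
    """
    # De-duplicate and sort so the same input set always gives the same query.
    levels = "|".join(str(level) for level in sorted(set(admin_levels)))
    return (
        f"[out:json][timeout:{timeout}];\n"
        # Assumption: the US is selected as an area by its ISO3166-1 tag.
        'area["ISO3166-1"="US"][admin_level=2]->.us;\n'
        # Regex filter keeps only the requested admin_level values.
        f'relation["boundary"="administrative"]["admin_level"~"^({levels})$"](area.us);\n'
        # "out tags" returns ids and tags without geometry, enough for counting.
        "out tags;"
    )


if __name__ == "__main__":
    print(build_boundary_query([8]))
    print(build_boundary_query([5, 8]))
```

Counting the relations returned for [8] versus [5, 8] would give a first idea of how many extra regions admin_level 5 brings in before any geometry is downloaded.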
New York City had some changes to accommodate the borough concept. I don’t quite understand all the details surrounding boundary tagging, but there is more information at http://wiki.openstreetmap.org/wiki/United_States_admin_level .
Yeah, I saw that. I guess I need to dive into the borough geometry and see if I’m seeing stuff that’s within the city but outside of a borough, or if boroughs use some other admin level.
Know if there’s anywhere else in the US that this might be the case?
Dan, I don’t know whether too many months have passed since you posted this, but here is a reply. I have been active in both documenting admin_level accurately and seeing that admin_level gets used properly in the USA. To do that, I have “channeled consensus” over years regarding “proper” admin_level values in the USA, such that the admin_level wiki page we now have rather accurately documents how 50 states and territories (should) do this. (This is the so-called “prescriptive” method for wikis. There is also a “descriptive” page called United States/Boundaries, written by fellow OSM contributor Minh Nguyen, with whom I also work on the USBRS WikiProject.) There may be six or eight states in the admin_level wiki which still have some rough edges (many in New England, whose six states have a rich, lengthy and complex history of how they break up their administrative governance), but “forty-something, plus all US territories” are fairly well documented as to how we should tag them regarding admin_level (+ boundary=administrative). The topic is complex, but ultimately understandable.
By including all admin_level=8 areas (cities; what are effectively “incorporated cities, towns and villages” in the USA) you are on the right track to including a “good swath” of urban areas in a geo-set. However, it isn’t that simple. There are unincorporated areas (again, these are complex in the USA, and our wikis note this), exceptions like Consolidated City-Counties and Independent Cities (all of which are polygons or multi-polygons), and place=town nodes (among others) which might also be considered urban areas. I’m not sure it is as programmatically determinable as you appear to wish or to treat it, but again, admin_level=8 is a good start.
I’m not sure how you might add further algorithmic elements to arrive at a satisfactory data set, but I do offer my listening ear: please ask further questions about anything you find right, wrong or puzzling in OSM’s data in these regards in the USA.
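To make the “admin_level=8 plus exceptions” idea concrete, here is a rough sketch of a tag-based filter. The function name and the extra_admin_levels parameter are hypothetical, and treating place=city nodes like place=town nodes is my own assumption. It does not try to detect Consolidated City-Counties, Independent Cities or unincorporated areas from tags; those would have to be supplied explicitly.

```python
# Assumption: place=city nodes are treated the same way as place=town nodes.
URBAN_PLACE_VALUES = {"town", "city"}


def is_urban_candidate(tags, element_type, extra_admin_levels=frozenset()):
    """Return True if an OSM element looks like an urban-area candidate.

    tags: dict of the element's OSM tags (values are strings).
    element_type: "node", "way" or "relation".
    extra_admin_levels: hypothetical allowlist of additional admin_level
        values to accept, e.g. {"5"} for the New York City case.
    """
    if tags.get("boundary") == "administrative":
        level = tags.get("admin_level")
        # admin_level=8 is the baseline; other levels must be opted in.
        if level == "8" or level in extra_admin_levels:
            return True
    # Place nodes carry no polygon, so they only flag a location.
    if element_type == "node" and tags.get("place") in URBAN_PLACE_VALUES:
        return True
    return False


if __name__ == "__main__":
    nyc = {"boundary": "administrative", "admin_level": "5"}
    print(is_urban_candidate(nyc, "relation"))
    print(is_urban_candidate(nyc, "relation", extra_admin_levels={"5"}))
```

Keeping the exceptions in an explicit allowlist means every special case stays visible and reviewable, rather than being hidden inside a broader rule that might pull in too much area.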
If you want a draft OSM-derived set of Urban Areas for North America (US & Canada) you can try: https://github.com/SK53/ua2/tree/master/north-america
These are derived using road length according to the technique originated by vlasvlasvlas (paper Sotm-14) and minor refinements described in my blog about 18 months ago. They are by no means perfect, but the idea is that the data on GitHub can be refined by multiple means.