At first I was going to open a Trac ticket titled “Nominatim incorrectly lists Redondo Beach as being in Ventura County”.
Then, after a little digging, I was going to write “Nominatim lists way too many cities as being in Ventura County that are really in Los Angeles and other neighboring counties”.
Then, after a little digging, I decided to first ask about some Nominatim basics.
One of the old-timers prefaced his answer to one of my earlier questions by saying “The Nominatim details.php is more or less a dump of the internal data Nominatim uses, so much of it is probably not very understandable unless you know how Nominatim works internally.”
Fair enough, but in practice whatever the details.php page shows ends up reflected in the search results. So when the details page for Ventura County (a node tagged place=county), http://open.mapquestapi.com/nominatim/v1/details.php?place_id=1619288, shows it as “parent of” dozens of cities, including Beverly Hills, Santa Monica and, um, Los Angeles, you can be sure that searching for those cities on the OSM main page (or anywhere else that uses Nominatim, explicitly or under the hood) will list them, incorrectly, as being in Ventura County.
I can see that tickets along the lines of “Nominatim incorrectly lists place X as being in place Y” have been created, and resolved, in Trac, but first I want to understand what’s going on.
Let me first list what I already know:
In addition to a node tagged name=Ventura, place=county (see link above), there is also a relation tagged name=Ventura County, boundary=administrative, admin_level=6, etc.: http://open.mapquestapi.com/nominatim/v1/details.php?place_id=79488864. Unlike the node, this object is “parent of” exactly the ten cities found in Ventura County.
This is a fairly typical situation in Southern California (I have not checked the rest of the US or the world): a county or a city gets both a node and a polygon (a relation if necessary). I think the node is added for the sake of a nice-looking label on the rendered map. I can’t see much other use for it, since the node’s “parent of” list is usually badly out of whack, while the polygon’s is usually spot on.
The nodes for the three neighboring counties that I checked (Santa Barbara, Ventura and Los Angeles) are placed far, far from the geographic center of each county, very near its southern border. As a result, each one lists many of the neighboring counties’ cities in its “parent of” list and few of its own. Why should it be this way? Who put them there? Were they placed to coincide with each county seat? That doesn’t look to be the case. Were they placed near the statistical center of each county’s population? Or were they just arbitrarily placed near county borders to help identify where one county ends and another begins?
I thought I was beginning to understand the method to this madness until I checked the node for Orange County and found it more or less smack in the middle of the county, yet still listing almost all cities incorrectly under “parent of”. It’s missing the cities right near it, but includes far-removed cities in Los Angeles County.
So I am left with guesses and questions, which are these:
Am I right in assuming that a polygon lists as “parent of” those objects that are wholly contained in it?
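To pin down what I mean by “wholly contained”, here is a minimal Python sketch of containment using the standard even-odd (ray casting) point-in-polygon test. It only checks a single point, such as a city node, against a single outer ring. The function names and the toy square “county” are made up for illustration, and this is not a claim about how Nominatim itself computes parents (that happens inside its database and may well differ).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar x/y

def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd rule: cast a ray east from pt and count edge crossings."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only an edge that straddles the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing toggles inside/outside
    return inside

def children_by_containment(ring: List[Point], places: Dict[str, Point]) -> List[str]:
    """Names of the places whose point lies inside the ring, in sorted order."""
    return sorted(name for name, pt in places.items() if point_in_polygon(pt, ring))

# Toy example with made-up coordinates: a square "county" and two "cities".
toy_county = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
toy_places = {"Inside City": (5.0, 5.0), "Outside City": (15.0, 5.0)}
print(children_by_containment(toy_county, toy_places))  # ['Inside City']
```

A real county relation can have several outer rings and holes, and a “wholly contained” city polygon would need all of its area inside, not just one point, so this is only the simplest version of the idea.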
What does “parent of” mean, then, for a node tagged place=*? Wild guess: it’s the objects located within X miles of the node, X being different for every admin_level (a higher X for a lower admin_level). But if so, why does it include faraway objects and leave out nearby ones, seemingly at random?
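My wild guess can be written down as a small Python sketch: a place node “adopts” every object within some radius, with the radius depending on the kind of place. The radius table, the function names and the toy coordinates are all hypothetical; I am not claiming Nominatim uses these values or this rule.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (lat, lon) in degrees

# HYPOTHETICAL radii per place type, invented for illustration only.
HYPOTHETICAL_RADIUS_KM = {"county": 50.0, "city": 10.0}

def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km, assuming a spherical Earth."""
    earth_radius_km = 6371.0  # mean Earth radius
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))

def children_within_radius(node: LatLon, place_type: str, places: Dict[str, LatLon]) -> List[str]:
    """Hypothetical rule: every place within the type's radius counts as a child."""
    radius_km = HYPOTHETICAL_RADIUS_KM[place_type]
    return sorted(name for name, pt in places.items() if haversine_km(node, pt) <= radius_km)

# Toy example with made-up coordinates: one place ~11 km away, one ~111 km away.
toy_node = (0.0, 0.0)
toy_places = {"Near": (0.0, 0.1), "Far": (0.0, 1.0)}
print(children_within_radius(toy_node, "county", toy_places))  # ['Near']
print(children_within_radius(toy_node, "city", toy_places))    # []
```

Writing it down makes one thing obvious: a pure distance rule can never include a far object while leaving out a nearer one, so if my observations are right, whatever Nominatim actually does must involve something other than distance alone. It does show, though, why a node sitting near a county’s southern border would sweep up cities on the other side.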
I am guessing that Nominatim developers and data consumers may tell me to ignore the inaccurate parent-child relations involving nodes and concentrate instead on the accurate relations involving polygons. Fine, but shouldn’t the effects of the “bogus” relations then be removed from the search results?
If a node’s parentage is not bogus by design, is the situation in Southern California the result of poorly placed county nodes? Is there a way to determine how they were placed? And should someone like me (who has the enthusiasm but not the background) move them to the geographic centers of their respective polygons, or should it be done by someone wiser?
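If anyone does move a node to the “geographic center”, one reasonable definition is the area-weighted centroid of the county polygon. Here is a minimal Python sketch using the shoelace formula. It treats longitude and latitude as flat x/y coordinates (a rough approximation at county scale), handles only a single outer ring without holes, and the function name and toy shapes are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar x/y

def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Works for either vertex order; the sign of the area cancels out.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return (cx / (3 * area2), cy / (3 * area2))

# Toy shapes with made-up coordinates.
print(polygon_centroid([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]))  # (5.0, 5.0)
print(polygon_centroid([(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]))  # (2.0, 2.0)
```

The centroid of a concave polygon can fall outside the polygon itself, so it is worth checking the result before placing a node there. Whether moving the nodes would actually fix Nominatim’s “parent of” lists is still the open question.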
Well, I guess that’s enough questions for one post. Thanks for reading!