What is the easiest way to go about counting the occurrences of street names in the US?
What I want to do is count how often the name of each of the 50 states is used in road names throughout the country, and compile a list showing the total usage of each.
Download an .osm.pbf file of the United States from the Geofabrik Download Server.
Convert it into a .gol file using GeoDesk’s gol-tool.
Open the .gol file and use a GOQL query to select the subset of objects you are interested in. In this case that is roads, similar to the safe_for_cycling query here: Sets of Features | GeoDesk Documentation
Then, for each state, count how many roads contain its name.
You can use regular expressions to do string matching: Query Language | GeoDesk Documentation
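To make the counting step concrete, here is a minimal sketch in plain Python. It assumes you have already pulled the road names out of the .gol file into a list of strings; it uses Python's own `re` module rather than GeoDesk's query syntax, and `count_state_names` is just a hypothetical helper name, not part of any library.

```python
import re
from collections import Counter

STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]

# One alternation with word boundaries, so "Kansas" does not match inside
# "Arkansas". Matching is leftmost-first, so "West Virginia Avenue" yields
# "West Virginia", not "Virginia"; sorting longest-first is only a safeguard.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(STATES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_CANONICAL = {s.lower(): s for s in STATES}


def count_state_names(road_names):
    """Count, per state, how many road names contain that state's name."""
    counts = Counter()
    for name in road_names:
        # Use a set so a state is counted at most once per road name.
        found = {_CANONICAL[m.lower()] for m in _PATTERN.findall(name)}
        counts.update(found)
    return counts
```

Calling `count_state_names(names).most_common()` gives the list sorted by usage. Note that this cannot tell whether "Washington Street" refers to the state or the president.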
While that would probably solve 99% of cases, there is still a sizable set of “discontinuous” named streets, i.e. those interrupted by an unnamed bridge, by a pedestrian street, or by an odd-shaped junction. Each piece of such a street would be counted separately.
A more bulletproof method would be to count each street name only once within a jurisdiction, but it’s tricky to find out which jurisdiction level is the right one, i.e. the one where street names are guaranteed to be unique. (And I’d bet that it varies from state to state, as most things in the US do.)
If those odd cases are not of major interest for your application, ignore them, but keep in mind that they exist.
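For what it's worth, the per-jurisdiction idea could look roughly like this in Python. It is only a sketch: it assumes you have somehow obtained one `(jurisdiction, street_name)` pair per road segment (how to pick and assign the jurisdiction is exactly the open question above), and `count_unique_streets` is a made-up helper name.

```python
from collections import Counter


def count_unique_streets(segments):
    """Count each street name once per jurisdiction.

    segments: iterable of (jurisdiction, street_name) pairs, one per road
    segment. Returns a Counter mapping the normalized street name to the
    number of jurisdictions in which it occurs.
    """
    unique = {
        # Collapse repeated whitespace and ignore case so that the pieces
        # of a discontinuous street fall together.
        (jurisdiction, " ".join(name.split()).casefold())
        for jurisdiction, name in segments
        if name
    }
    return Counter(name for _, name in unique)
```

With this, three separate pieces of "Ohio Street" in the same town count as one street, while "Ohio Street" in two different towns counts as two.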