How to extract streets with name, location/polygon AND village-name, city-name, from downloaded OSM data file (pbf), using osmfilter preferably

I have found these 2 relevant topics

but, unfortunately, it is not clear to me how to add on them so as to get city-name, village-name etc.

I would like to list all streets (residential, etc.) with their location coordinates (or a set of coordinates) and with all the administrative levels it belongs to, e.g. village-name, neighbourhood-name, district-name, etc.

Ideally I would like to do this from my linux terminal, using osmfilter and not via online overpass query etc.

Or, I can use python code and osmium.SimpleHandlerjust like Maksym does here: https://max-coding.medium.com/extracting-open-street-map-osm-street-data-from-data-files-using-pyosmium-afca6eaa5d00 unfortunately I do not know how to add the village names etc. to that.

1 Like

‘adding’ settlement data, essentially everything that is an area, to enclosed elements fundamentally requires, one way or the other, spatial operations.

In the case of OSM this requires actually creating those areas out of the raw data and typically employing heuristics for places that don’t have associated areas.

None of the above is going to be possible by simply ‘filtering’ OSM data.

If you don’t want to write bespoke code, you have a number of options from one of the ‘query-ready’ data formats (for example GOL) to simply importing in to a postgis database with osm2pgsql.

2 Likes

thank you Simon.

I have now imported my regional PBF file into postgis using osm2pgsqlwith something like:


osm2pgsql -d "${DBNAME}" -W \
    --create --slim  -G --hstore \
    --multi-geometry \
    --tag-transform-script \
        "${BASESTYLEDIR}/openstreetmap-carto.lua" \
    -C 2500 --number-processes 2 \
    -S "${BASESTYLEDIR}/openstreetmap-carto.style" \
    "${PBF}"

(borrowed from the many tutorials on how to create a local vector-tiles OSM data server, martin).

Now I have the following tables in my DB:

mygisdb=# \d
                    List of relations
 Schema |         Name         | Type  |      Owner      
--------+----------------------+-------+-----------------
 public | geography_columns    | view  | postgres
 public | geometry_columns     | view  | postgres
 public | osm2pgsql_properties | table | postgres
 public | planet_osm_line      | table | postgres
 public | planet_osm_nodes     | table | postgres
 public | planet_osm_point     | table | postgres
 public | planet_osm_polygon   | table | postgres
 public | planet_osm_rels      | table | postgres
 public | planet_osm_roads     | table | postgres
 public | planet_osm_ways      | table | postgres
 public | spatial_ref_sys      | table | postgres


First of all, do I need to do some more prerpocessing and create more tables, because I do not see there administrative data?

Secondly, can you point me to some tutorial which shows how to do this with SQL and postgis?

thank you

You’ll find the administrative boundary data in the planet_osm_polygon table you will need to use appropriate values for boundary and admin_level to retrieve the boundaries you are interested in.

One way to do it is to query for the boundary that you want to use and then query the database for the streets in question from table planet_osm_line. Note that you will get the “segments” of the streets in question that you might have to assemble to complete roads, depending on your requirements.

I don’t really have a good tutorial handy, but I would be surprised if it doesn’t exist. If you have some kind of basic SQL and OSM knowledge, you probably can figure spatial queries out from the Postgis docs.

2 Likes

I think I have something. It prints the name of the road and all the names of the administrative groups it belongs to. It also gives the bbox of the geometry of the street and also its half-way point. Note that it produces JSON but it is not GeoJSON compatible because I filtered out things like type which add hugely to the final size. Disclaimer Google’s AI assistant got me started on the SQL by providing a rudimentary skeleton:



DBNAME=mydbname
DBUSER=mydbuser
OUTFILE='out.json'
# set a limit as to how many to fetch, e.g. 'LIMIT 10' or leave empty
LIMIT='LIMIT 10'

psql -U postgres -d "${DBNAME}" -o "${OUTFILE}" -v ON_ERROR_STOP=1 -t <<EOS
    SELECT jsonb_build_object(
      'f', jsonb_agg(feature)
    ) AS jsonout
    FROM (
      SELECT
	jsonb_build_object(
	--- geometry of the street
	--- note how we filter coordinates using ->'coordinates'
	--- note how we add bbox by the last 1 in options of ST_AsGeoJSON(...,1)
	--- decimal digits is 9 is the last-but-one option
	--- WARNING: GeoJSON needs a 'type' to be valid, but we don't want GeoJSON
	'p', jsonb_build_object(
		--- linesegment points on the street
		'EPSG:4326', jsonb_build_object(
			'p', ST_AsGeoJSON(ST_LineMerge(ST_Transform(roads.way, 4326)), 9, 1)::json->'coordinates',
			'b', ST_AsGeoJSON(ST_LineMerge(ST_Transform(roads.way, 4326)), 9, 1)::json->'bbox'
		),
		'EPSG:3857', jsonb_build_object(
			'p', ST_AsGeoJSON(ST_LineMerge(roads.way), 9, 1)::json->'coordinates',
			'b', ST_AsGeoJSON(ST_LineMerge(roads.way), 9, 1)::json->'bbox'
		)
	),
	--- halfway point of the street
	'c', jsonb_build_object(
		--- centroid of the linesegment falling on the line
		--- see https://gis.stackexchange.com/questions/254151/how-to-find-the-center-point-which-lies-on-the-linestring-geometry
		--- see http://postgis.net/docs/manual-1.5/ST_Line_Interpolate_Point.html
		--- last parameter (0.5) puts the point HALFway on the street, 0.0 will be at the start, 1.0 will be at the end
		'EPSG:4326', ST_AsGeoJSON(ST_LineInterpolatePoint(ST_LineMerge(ST_Transform(roads.way, 4326)), 0.5), 9, 0)::jsonb->'coordinates',
		'EPSG:3857', ST_LineInterpolatePoint(ST_LineMerge(roads.way), 0.5)::jsonb->'coordinates'
	),
	'n', roads.name,
	'a', ARRAY_AGG(admin.name ORDER BY admin.admin_level DESC)
      ) AS feature
      FROM
        planet_osm_roads AS roads
      JOIN
        planet_osm_polygon AS admin ON ST_Intersects(roads.way, admin.way)
      WHERE
        roads.highway IS NOT NULL
        AND roads.name IS NOT NULL
        AND admin.boundary = 'administrative'
        AND admin.admin_level IN ('2', '4', '6', '8', '9', '10')
      GROUP BY
        roads.osm_id, roads.name, roads.way
      ${LIMIT}
) AS subquery;
EOS
if [ $? -ne 0 ]; then echo "$0 : error, failed to execute SQL"; exit 1; fi

echo "$0 : done, check output in '$OUTFILE'."

run against the postgis db created from OSM regional pbf file.

I have one question though, How can I cache certain data? like: ST_LineMerge(ST_Transform(roads.way, 4326)

which I use several times and is exactly the same? Unless it is cached already by Pg?

You could use GeoDesk for this. Here’s an example for extracting U.S. states and counties. Once you have the admin area of interest, it’s trivial to obtain all the streets within it.

Or, you could work in reverse: Query all streets first, then retrieve the admin areas using a containing() query:

import geodesk

france = geodesk.Features('france.gol')

streets = france("w[highway][highway!=footway,path,steps][name]")
admin_areas = france("a[boundary=administrative][name]")
for street in streets:
    print(f"{street.name} is in:")
    for area in admin_areas.containing(street.nodes.first.centroid):
        print(f"- {area.name} ({area.admin_level})")

For each street, this will print something like:

Route de Creully is in:
- Cairon (8)
- Normandie (4)
- Calvados (6)
- Caen (7)
- France (2)
- France métropolitaine (3)

To get started, download the GOL Tool and pip install geodesk.

You can build a GOL (Geo-Object Library, a compact single-file OSM database) from any PBF file (or download a pre-made GOL for any region).