trouble loading data using the osm2pgsql tool

I am using the data found at http://download.geofabrik.de … is there a better data source?

I am using Ubuntu 18.04

I have been having some trouble loading data with the osm2pgsql tool, and I have some questions about using it. I assume I am missing something obvious and likely just need to tune something so this all works as expected.

  1. If I understand the tool correctly and I want to load data from multiple .pbf files, I would use --create once (which clears the database) and then use --append for the remaining files. Correct?

(Based on the behavior I am seeing, --create and --append do not appear to work the way I think they do.)

  2. Let’s say that I load the data from north-america-latest.osm.pbf using --create and then use --append to load alaska-latest.osm.pbf. There is obviously duplicate information between these two files. Would there be duplicate information in the database as well, or is data duplication handled well?

  3. I downloaded all of the US state .pbf files and then issued two commands:

	osm2pgsql -v -d gis --create --slim -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/pbf/rhode-island-latest.osm.pbf

	osm2pgsql -v -d gis --append --slim -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/states/*

Processing rhode-island-latest.osm.pbf is successful.

The output from the second command indicates that it processed alabama-latest.osm.pbf and alaska-latest.osm.pbf correctly, but failed on arizona-latest.osm.pbf with the error:

	Processing: Node(140k 2.8k/s) Way(0k 0.00k/s) Relation(0 0.00/s)DB writer thread failed due to ERROR: DELETE FROM planet_osm_point WHERE osm_id IN (5213494297,5213494298, ... ,41780255,41780260) failed: ERROR:  invalid memory alloc request size 1073741824

Any idea why this might have happened? Is this error expected?

  4. To try to work around the error in #3, I wrote a shell script which just contains all of the osm2pgsql commands to create or append each individual state. The script looks like:
	osm2pgsql -d gis --create --slim  -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/rhode-island-latest.osm.pbf
	...
	...
	...
	osm2pgsql -d gis --append --slim  -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/pbf/wyoming-latest.osm.pbf
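For reference, the flat list of commands could equally be produced by a small loop that uses --create for the first file and --append for the rest. A minimal sketch (the helper name `gen_osm2pgsql_cmds` is mine and the flags are abbreviated; it only prints the commands so they can be inspected before running):

```shell
# Hypothetical helper: print one osm2pgsql command per .pbf in a directory.
# The first file gets --create, every subsequent file gets --append.
# It prints rather than runs the commands, so nothing is executed here.
gen_osm2pgsql_cmds() {
    mode=--create
    for f in "$1"/*.osm.pbf; do
        printf 'osm2pgsql -d gis %s --slim -G --hstore %s\n' "$mode" "$f"
        mode=--append
    done
}
```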

However, I received an error message similar to the one in #3.

It processed alabama-latest.osm.pbf, alaska-latest.osm.pbf, arizona-latest.osm.pbf, and arkansas-latest.osm.pbf successfully. However, when it tried to process california-latest.osm.pbf, I got:

	Processing: Node(10310k 3.8k/s) Way(0k 0.00k/s) Relation(0 0.00/s)DB writer thread failed due to ERROR: DELETE FROM planet_osm_point WHERE osm_id IN (65602137,65602157, ... ,122869803,122869809) failed: server closed the connection unexpectedly
		This probably means the server terminated abnormally
		before or while processing the request.

  5. I grabbed the planet pbf from https://planet.openstreetmap.org and executed the following command:
	osm2pgsql -v -d gis --create --slim -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/planet-190408.osm.pbf

and got the following output:


	osm2pgsql version 0.96.0 (64 bit id space)

	Allocating memory for dense node cache
	Allocating dense node cache in one big chunk
	Allocating memory for sparse node cache
	Sharing dense sparse
	Node-cache: cache=800MB, maxblocks=12800*65536, allocation method=11
	Mid: pgsql, cache=800
	Setting up table: planet_osm_nodes
	Setting up table: planet_osm_ways
	Setting up table: planet_osm_rels
	Using lua based tag processing pipeline with script /home/renderaccount/src/openstreetmap-carto/openstreetmap-carto.lua
	Using projection SRS 3857 (Spherical Mercator)
	Setting up table: planet_osm_point
	Setting up table: planet_osm_line
	Setting up table: planet_osm_polygon
	Setting up table: planet_osm_roads

	Reading in file: /home/renderaccount/data/planet-190408.osm.pbf
	Using PBF parser.
	Processing: Node(1513140k 875.2k/s) Way(0k 0.00k/s) Relation(0 0.00/s)fish: “osm2pgsql -v -d gis --create --…” terminated by signal SIGKILL (Forced quit)

Any idea what may have caused this error?

  6. I get a similar error when I try to process oregon-latest.osm.pbf:
	osm2pgsql -d gis --create --slim  -G --hstore --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua -S ~/src/openstreetmap-carto/openstreetmap-carto.style ~/data/pbf/oregon-latest.osm

produces:

	osm2pgsql version 0.96.0 (64 bit id space)

	Allocating memory for dense node cache
	Allocating dense node cache in one big chunk
	Allocating memory for sparse node cache
	Sharing dense sparse
	Node-cache: cache=800MB, maxblocks=12800*65536, allocation method=11
	Mid: pgsql, cache=800
	Setting up table: planet_osm_nodes
	Setting up table: planet_osm_ways
	Setting up table: planet_osm_rels
	Using lua based tag processing pipeline with script /home/renderaccount/src/openstreetmap-carto/openstreetmap-carto.lua
	Using projection SRS 3857 (Spherical Mercator)
	Setting up table: planet_osm_point
	Setting up table: planet_osm_line
	Setting up table: planet_osm_polygon
	Setting up table: planet_osm_roads

	Reading in file: /home/renderaccount/data/pbf/oregon-latest.osm
	Using XML parser.
	Processing: Node(17403k 446.3k/s) Way(1417k 23.23k/s) Relation(540 540.00/s)fish: “osm2pgsql -d gis --create --sli…” terminated by signal SIGKILL (Forced quit)

Why might this be happening?

I’d suggest downloading all the data that you want, merging it together (e.g. with osmosis) and then loading that once with --create. As you’ve found, you will get problems trying to merge data with osm2pgsql.
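For example, with two extracts the merge would look something like this (a sketch, not a tested pipeline; each osmosis --merge task combines two input streams, so N files need N-1 --merge tasks):

```shell
# Sketch: merge two Geofabrik extracts with osmosis, then load once.
# For more files, add another --read-pbf and another --merge per extra file.
osmosis --read-pbf alabama-latest.osm.pbf \
        --read-pbf alaska-latest.osm.pbf \
        --merge \
        --write-pbf merged.osm.pbf

# Then a single --create import of the merged file:
osm2pgsql -d gis --create --slim -G --hstore \
  --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua \
  -S ~/src/openstreetmap-carto/openstreetmap-carto.style \
  merged.osm.pbf
```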

Understood. Thank you for the suggestion.

However, why would processing planet-190408.osm.pbf fail? Isn’t it already merged, as you suggest?

Perhaps I am running out of space or something while processing it? But that would not explain why processing the Oregon pbf is failing.

The most likely reason is that you have nowhere near enough memory for the processing that you’re trying to do.

Any thoughts on why processing the much smaller Oregon pbf might fail in a similar manner?

How much RAM or disk space might be required to process the planet data?

The same reason, perhaps? How much disk space and RAM have you actually got?

I’ve no idea how much in the way of resources you’ll need for a full planet file these days. England from http://download.geofabrik.de/europe/great-britain.html is about 800 MB, and imports happily for me in a virtual machine with 5 GB of RAM. Scotland is about the same size as Oregon, and should import happily in about 3 GB.
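It is worth checking concretely. The "cache=800MB" line in your output is osm2pgsql's node cache (the -C/--cache option, in MB), which is the usual knob to size to your machine. A sketch (adjust the paths and the cache figure to what you actually have free):

```shell
free -m                      # available RAM, in MB
df -h /var/lib/postgresql    # free disk where the database cluster lives
                             # (adjust the path to your PostgreSQL data dir)

# e.g. give osm2pgsql a larger node cache than the 800 MB default,
# leaving headroom for PostgreSQL itself when running --slim:
osm2pgsql -C 2500 -d gis --create --slim -G --hstore \
  --tag-transform-script ~/src/openstreetmap-carto/openstreetmap-carto.lua \
  -S ~/src/openstreetmap-carto/openstreetmap-carto.style \
  ~/data/planet-190408.osm.pbf
```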

I have ~250 GB of disk space (can get more if required) and ~25 GB of RAM available to me.

Furthermore, I have been able to use the tool with --create on the California pbf without any problems, and that one is a lot bigger, also about 800 MB.

What I am wondering is whether the problem might be that (a) the Oregon pbf is corrupt, or (b) my switching between --create and --append has corrupted the PostgreSQL database.

Neither of those options seems likely, but until I can come up with a better thought, I will see what I can do to investigate both of them. If you (or anyone else) have any better ideas, please let me know.

What’s weird is that I used https://extract.bbbike.org to obtain a pbf containing Oregon and appeared to hit the same problem: processing the data just does not work. An extract of a different region worked without any problems.

I am beginning to think the problem is indeed corrupt data in that area. Perhaps someone else could try it and see if they experience the same problem.

The latest Oregon from download.geofabrik.de works for me. The actual script I used was https://github.com/SomeoneElseOSM/SomeoneElse-style/blob/master/update_render.sh (but using “oregon” instead of “new-york”).

Without changing anything about how I was doing things, I re-downloaded the Oregon data, which has a different md5sum than the version I had been using, and it now works.

Without a better explanation, I will blame corrupted data which has now been fixed.