Some of you might be interested in a project I’ve been working on, processing OSM planet files on Hadoop:
It’s geared toward quite a specific task: extracting the linear features that carry a given tag from a planet.pbf file and rasterizing them. We needed to do this for all highways and railways as input to an accessibility model.
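To give a flavour of that task, here is a minimal, self-contained Java sketch of the filter-then-rasterize step. It is not the project's code: the class and method names are hypothetical, it assumes node coordinates have already been mapped to grid cells, and it uses Bresenham's line algorithm as a stand-in for whatever rasterization is actually used. It also leaves out Hadoop and PBF parsing entirely.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical and are not
// taken from the project or from the Osmosis API.
class TagRasterSketch {

    // Minimal stand-in for an OSM way: its tags plus its node positions,
    // assumed to be already converted to grid cells as {column, row}.
    static class Way {
        final Map<String, String> tags;
        final List<int[]> cells;

        Way(Map<String, String> tags, List<int[]> cells) {
            this.tags = tags;
            this.cells = cells;
        }
    }

    // True when the way carries the tag key being extracted, e.g. "highway".
    static boolean hasTag(Way way, String key) {
        return way.tags.containsKey(key);
    }

    // Mark every cell on the segment (x0,y0)-(x1,y1) using Bresenham's
    // line algorithm. Cells outside the grid are skipped.
    static void drawSegment(boolean[][] grid, int x0, int y0, int x1, int y1) {
        int dx = Math.abs(x1 - x0);
        int sx = x0 < x1 ? 1 : -1;
        int dy = -Math.abs(y1 - y0);
        int sy = y0 < y1 ? 1 : -1;
        int err = dx + dy;
        while (true) {
            if (y0 >= 0 && y0 < grid.length && x0 >= 0 && x0 < grid[0].length) {
                grid[y0][x0] = true;
            }
            if (x0 == x1 && y0 == y1) {
                break;
            }
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; }
            if (e2 <= dx) { err += dx; y0 += sy; }
        }
    }

    // Keep only the ways with the given tag key and burn them into a
    // height x width boolean grid, indexed as grid[row][column].
    static boolean[][] rasterize(List<Way> ways, String key, int width, int height) {
        boolean[][] grid = new boolean[height][width];
        for (Way way : ways) {
            if (!hasTag(way, key)) {
                continue;
            }
            for (int i = 0; i + 1 < way.cells.size(); i++) {
                int[] a = way.cells.get(i);
                int[] b = way.cells.get(i + 1);
                drawSegment(grid, a[0], a[1], b[0], b[1]);
            }
        }
        return grid;
    }
}
```

Calling `TagRasterSketch.rasterize(ways, "highway", width, height)` returns a grid in which only cells crossed by highway ways are set; ways with other tags are ignored. In a MapReduce setting the tag filter would presumably sit in the map stage, but that split is an assumption rather than a description of the project.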
We’ve used the Osmosis pbf2 library for the deserialization, which has just worked, except that seeking between file blocks is impossible; see the readme for the workaround.
I’d be interested to hear about anyone else’s experience of processing OSM data with big data tech. I’d also like to work on a more generic framework, perhaps with support for Apache Hive or other analytic frameworks.
Any ideas on how to take things forward, which formats to support, etc. would be welcome.