I am interested in downloading the full set of notes and the delta updates, and I only see these two options:
However, I was looking for something different, because processing a 1 GB XML file is complex; something like a flat file, JSON, or another format would be easier. It would also be useful to have the Map Notes changes from the last minute, last hour, or last day. In fact, I don't see any way to download delta changes for Map Notes.
The only thing I have found is Pascal Neis's site, and many other sites use the data it provides via RSS or by scraping the page.
Am I missing other ways to download Map Notes data, and process them easily?
What about NotesReview?
Its source code is at GitHub - ENT8R/NotesReview: 📝 Interface for searching and resolving OpenStreetMap notes …
But I have no clue how it loads its data initially.
The OSM API endpoints for notes do in fact support JSON output, and with the Notes Search endpoint you can run global queries and restrict them by from and to timestamps.
For JSON output, simply append “.json” to the URL:
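As a minimal sketch of how such a request URL can be built (the endpoint path and the `from`/`to`/`limit` parameters are from the OSM API 0.6; the helper name is mine):

```python
from urllib.parse import urlencode

def notes_search_url(from_ts: str, to_ts: str, limit: int = 100) -> str:
    """Build a Notes Search URL; the ".json" suffix selects JSON output."""
    base = "https://api.openstreetmap.org/api/0.6/notes/search.json"
    return base + "?" + urlencode({"from": from_ts, "to": to_ts, "limit": limit})

print(notes_search_url("2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z", limit=1000))
```

Fetching that URL (e.g. with `curl` or `urllib.request`) then returns the matching notes as JSON.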
NotesReview limits to 100 notes.
OSM API limits to 10 000 notes.
I need to analyze how many notes have been resolved this year in Latam (from Mexico to Argentina), which involves far more notes than these limits allow.
That's why I was hoping to find a dump file on the Planet in a different format (CSV, JSON), not only a huge XML file.
Right, but that's per single API call. It doesn't mean you can't split your time period into smaller chunks and supply start/end times as needed.
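A small sketch of that chunking idea (the function name is mine; each chunk would then become one Notes Search call kept under the per-call limit):

```python
from datetime import date, timedelta

def day_chunks(start: date, end: date):
    """Yield (from, to) ISO date pairs covering [start, end) one day at a time."""
    cur = start
    while cur < end:
        nxt = cur + timedelta(days=1)
        yield cur.isoformat(), min(nxt, end).isoformat()
        cur = nxt

# Three one-day chunks for the period 2024-01-01 .. 2024-01-04.
for frm, to in day_chunks(date(2024, 1, 1), date(2024, 1, 4)):
    print(frm, to)
```

If a single day still exceeds the limit, the same idea works with hourly chunks.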
What exactly is the issue with large XML file processing? SAX-based parsers do exist. Why not use them?
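For illustration, a minimal `xml.sax` sketch that counts `<note>` elements in constant memory. The element names here are illustrative stand-ins for the notes dump's structure, and the sample input is inline; a real run would pass the dump file instead:

```python
import io
import xml.sax

class NoteCounter(xml.sax.ContentHandler):
    """Counts <note> elements; memory use stays flat however big the input is."""
    def __init__(self):
        super().__init__()
        self.notes = 0

    def startElement(self, name, attrs):
        if name == "note":
            self.notes += 1

# Tiny inline sample; a real run would open the planet notes dump here.
sample = io.BytesIO(b'<osm-notes><note id="1"/><note id="2"/></osm-notes>')
handler = NoteCounter()
xml.sax.parse(sample, handler)
print(handler.notes)  # 2
```

The handler never holds more than the current element, which is exactly what avoids the OOM problem with DOM-style parsers.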
In fact, I am trying to insert the notes into a database to perform queries.
Inserting the XML file directly into the database is possible, but then processing it with XPath is complicated (for me).
Thus, I am preprocessing the notes on the command line to import flat files (CSV) into the database. xmlproc and other XML parsers on Linux require a lot of memory, and I am hitting OOM errors.
So this is becoming much more complicated, because the only format for all notes is XML.
Also, extracting all notes from the API in chunks would overload that mechanism. I would like to run this from time to time, and I think using the API is feasible but not the best way.
I found a Python script that parses osn files and converts them into OSM file format as nodes: GitHub - sekilab/osn2osm: Transform OpenStreetMap notes *.osn to *.osm file
I haven't tested this, but it seems to process the whole file without hitting memory limits.
Maybe instead of writing out the osm file, you could insert the data into a database table.
I am not aware of this, but what is the purpose of converting the osn file into osm?
I tried the repository you mentioned and got this:
- The latest version does not compile: IndentationError: expected an indented block after function definition on line 54
- The previous version fails: IndentationError: expected an indented block after function definition on line 54
You’re right, the code has some formatting issues, and I didn’t test it before. I’ve uploaded a fixed version here: osn2osm.py · GitHub
Now, when running
bunzip2 -c planet-notes-221009.osn.bz2 | python3 osn2osm.py > res.osm
all non-closed notes are transformed into OSM format within a few minutes.
There isn't much benefit in converting osn to OSM format as such. But you could use osmium tools to filter all the resulting nodes by some polygon (e.g. Osmium manual pages – osmium-extract (1)) and use the resulting file for your analysis.
An alternative would be to do the XML parsing in Python and implement some database code of your own.
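A sketch of that alternative, combining a streaming parse (`ElementTree.iterparse`) with database inserts. The `<note>` attribute names are assumptions based on the notes dump format, the input here is a tiny inline sample, and SQLite stands in for whatever database you actually use:

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")  # a real run would use a file or Postgres
db.execute("CREATE TABLE notes (id INTEGER, lat REAL, lon REAL)")

# Illustrative input; the planet dump would be opened with bz2.open() instead.
sample = io.BytesIO(
    b'<osm-notes>'
    b'<note id="1" lat="49.66" lon="14.24"/>'
    b'<note id="2" lat="4.60" lon="-74.08"/>'
    b'</osm-notes>'
)

for event, elem in ET.iterparse(sample, events=("end",)):
    if elem.tag == "note":
        db.execute(
            "INSERT INTO notes VALUES (?, ?, ?)",
            (int(elem.get("id")), float(elem.get("lat")), float(elem.get("lon"))),
        )
        elem.clear()  # drop the parsed element so memory stays bounded

db.commit()
print(db.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 2
```

The `elem.clear()` call after each note is what keeps this streaming rather than accumulating the whole tree.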
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="osm notes">
<bounds minlon="-180" minlat="-90" maxlon="180" maxlat="90" origin="http://www.openstreetmap.org/api/0.6"/>
<node id="122" version="6" timestamp="2013-04-24T16:26:39Z" uid="0" user="anonimous" changeset="122" lat="49.6652946" lon="14.2467785">
<tag k="name" v="opened by anonimous at 2013-04-24T16:26:39Z: Pokracuje tu cesta
commented by gorn at 2013-05-08T22:40:01Z: Omlouvám se, ale nerozumím, jestli je to otázka (Pokračuje to cesta?) a nebo oznámení že tu cesta je a že jí máme domalovat.
commented by anonimous at 2013-05-08T22:51:06Z: Pokud víte kudy cesta vede, projeďte jí se zapnutou GPSkou a trasu pošlete, někdo jí domaluje.
commented by anonimous at 2013-07-28T10:06:52Z: Vim kudy vede ale nemam cas ji projit a nakreslit do mapy :)
commented by gorn at 2013-08-20T08:13:43Z: stčí projít se zapnutou GPS
commented by gorn at 2015-01-12T22:39:18Z: nebo aspoň na mapě označte kam až vede. Na Hůrku? Na východ od ní? Na západ?"/>
<tag k="osm_note" v="yes"/>
Interesting idea, making an extract of the notes dump file with osmium. I have tried it, but it gives me
OSM tag value is too long
Hmm, yes, osmium has this hardcoded length check in place, because keys and values in the OSM XML format may only be 255 Unicode characters long: libosmium/types.hpp at master · osmcode/libosmium · GitHub
If you're only interested in statistics, you could probably cut the note description to 255 characters, so that the OSM file created by the Python script can still be processed by osmium.
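A tiny sketch of that trim step (the function name is mine; `len()` counts code points here, which should satisfy osmium's per-character check):

```python
def trim_value(value: str, limit: int = 255) -> str:
    """Trim a tag value to osmium's length limit, counting Unicode characters."""
    if len(value) <= limit:
        return value
    return value[: limit - 1] + "…"  # keep a marker that text was cut

print(len(trim_value("x" * 300)))  # 255
print(trim_value("short note"))
```

This would be applied to each `v="…"` value before the script writes the OSM file out.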
Have you tried a streaming XML parser? I am not a great fan of XML, but so far I have managed to get data from XML into other formats, including databases, this way.
Finally, I did it with Saxon; Java needed 5 GB of memory to process the Planet notes file. Now I have the notes in Postgres, and I can query things about note resolution in Latam (number of days to close a note, the user who has closed the most notes, users who have closed notes in different countries, etc.)
Regarding the XML, that's why my opening message asked for different ways to download the notes dump, in case another method exists.
If that becomes a problem, you can try a streaming XML parser (one that does not require loading the full file to parse it).
I recently used GitHub - sopherapps/xml_stream: A simple XML file and string reader that is able to read big XML files and strings by using streams (iterators), with an option to convert to dictionaries (it had more comprehensible docs than the more widely used solutions).