I am interested in downloading the full set of notes and the delta updates, and I only see these two options:
However, I was looking for something different, because processing a 1 GB XML file is complex; something like a flat file, JSON, or another format would be easier. It would also be useful to have the Map Notes changes from the last minute, last hour, or last day. In fact, I don't see any way to download delta changes for Map Notes.
The only thing I have found is Pascal Neis's site, and many other sites use the data it provides via RSS or by scraping the page.
Am I missing other ways to download Map Notes data, and process them easily?
What about NotesReview?
Its source code is at GitHub - ENT8R/NotesReview: 📝 Interface for searching and resolving OpenStreetMap notes …
But I have no clue how it loads its data initially.
The OSM API endpoints for notes do in fact support JSON output, and with the Notes Search endpoint you can run global queries and restrict them by from and to timestamps.
For JSON output, simply append “.json” to the URL:
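As a minimal sketch of how such a request URL can be built (the endpoint path and the `from`/`to`/`limit` parameters are from the OSM API 0.6; the helper name is mine):

```python
from urllib.parse import urlencode

def notes_search_url(from_ts: str, to_ts: str, limit: int = 100) -> str:
    """Build a Notes Search URL; the ".json" suffix selects JSON output."""
    base = "https://api.openstreetmap.org/api/0.6/notes/search.json"
    return base + "?" + urlencode({"from": from_ts, "to": to_ts, "limit": limit})

print(notes_search_url("2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z", limit=1000))
```

Fetching that URL (e.g. with `curl` or `urllib.request`) then returns the matching notes as JSON.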
NotesReview limits to 100 notes.
OSM API limits to 10 000 notes.
I need to analyze how many notes have been resolved this year in Latam (from Mexico to Argentina), which involves far more notes than these limits allow.
That's why I was hoping to find a dump file on the Planet in a different format (CSV, JSON), not only a huge XML file.
Right, but that's per single API call. It doesn't mean you can't split your time period into smaller chunks and supply start/end times as needed.
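A small sketch of that chunking idea (the function name is mine; each chunk would then become one Notes Search call kept under the per-call limit):

```python
from datetime import date, timedelta

def day_chunks(start: date, end: date):
    """Yield (from, to) ISO date pairs covering [start, end) one day at a time."""
    cur = start
    while cur < end:
        nxt = cur + timedelta(days=1)
        yield cur.isoformat(), min(nxt, end).isoformat()
        cur = nxt

# Three one-day chunks for the period 2024-01-01 .. 2024-01-04.
for frm, to in day_chunks(date(2024, 1, 1), date(2024, 1, 4)):
    print(frm, to)
```

If a single day still exceeds the limit, the same idea works with hourly chunks.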
What exactly is the issue with large XML file processing? SAX-based parsers do exist. Why not use them?
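For illustration, a minimal `xml.sax` sketch that counts `<note>` elements in constant memory. The element names here are illustrative stand-ins for the notes dump's structure, and the sample input is inline; a real run would pass the dump file instead:

```python
import io
import xml.sax

class NoteCounter(xml.sax.ContentHandler):
    """Counts <note> elements; memory use stays flat however big the input is."""
    def __init__(self):
        super().__init__()
        self.notes = 0

    def startElement(self, name, attrs):
        if name == "note":
            self.notes += 1

# Tiny inline sample; a real run would open the planet notes dump here.
sample = io.BytesIO(b'<osm-notes><note id="1"/><note id="2"/></osm-notes>')
handler = NoteCounter()
xml.sax.parse(sample, handler)
print(handler.notes)  # 2
```

The handler never holds more than the current element, which is exactly what avoids the OOM problem with DOM-style parsers.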
In fact, I am trying to insert the notes into a database to perform queries.
Inserting the XML file directly into the database is possible, but then processing it with XPath is complicated (for me).
Thus, I am preprocessing the notes on the command line to import flat files (CSV) into the database. xmlproc and other XML parsers on Linux require a lot of memory, and I am hitting OOM errors.
So this is becoming much more complicated, because the only format for all notes is XML.
Also, extracting all notes from the API in chunks would overload that mechanism. I would like to run this from time to time, and I think using the API is feasible but not the best way.
I found a Python script that parses osn files and converts them into OSM file format as nodes: GitHub - sekilab/osn2osm: Transform OpenStreetMap notes *.osn to *.osm file
I haven't tested this, but it seems to process the whole file without hitting memory limits.
Maybe instead of writing out the osm file, you could insert the data into a database table.
I am not aware of this, but what is the purpose of converting the osn file into osm?
I tried the repository you mentioned and got this:
- The latest version does not compile: IndentationError: expected an indented block after function definition on line 54
- The previous version fails: IndentationError: expected an indented block after function definition on line 54
You’re right, the code has some formatting issues, and I didn’t test it before. I’ve uploaded a fixed version here: osn2osm.py · GitHub
Now, when running
bunzip2 -c planet-notes-221009.osn.bz2 | python3 osn2osm.py > res.osm
all non-closed notes are transformed into OSM format within a few minutes.
There isn't much benefit in converting osn to OSM format as such. But you could use osmium tools to filter all the resulting nodes by some polygon (e.g. Osmium manual pages – osmium-extract (1)) and use the resulting file for your analysis.
An alternative would be to do the XML parsing in Python and implement some database code of your own.
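A sketch of that alternative, combining a streaming parse (`ElementTree.iterparse`) with database inserts. The `<note>` attribute names are assumptions based on the notes dump format, the input here is a tiny inline sample, and SQLite stands in for whatever database you actually use:

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")  # a real run would use a file or Postgres
db.execute("CREATE TABLE notes (id INTEGER, lat REAL, lon REAL)")

# Illustrative input; the planet dump would be opened with bz2.open() instead.
sample = io.BytesIO(
    b'<osm-notes>'
    b'<note id="1" lat="49.66" lon="14.24"/>'
    b'<note id="2" lat="4.60" lon="-74.08"/>'
    b'</osm-notes>'
)

for event, elem in ET.iterparse(sample, events=("end",)):
    if elem.tag == "note":
        db.execute(
            "INSERT INTO notes VALUES (?, ?, ?)",
            (int(elem.get("id")), float(elem.get("lat")), float(elem.get("lon"))),
        )
        elem.clear()  # drop the parsed element so memory stays bounded

db.commit()
print(db.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 2
```

The `elem.clear()` call after each note is what keeps this streaming rather than accumulating the whole tree.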
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="osm notes">
<bounds minlon="-180" minlat="-90" maxlon="180" maxlat="90" origin="http://www.openstreetmap.org/api/0.6"/>
<node id="122" version="6" timestamp="2013-04-24T16:26:39Z" uid="0" user="anonimous" changeset="122" lat="49.6652946" lon="14.2467785">
<tag k="name" v="opened by anonimous at 2013-04-24T16:26:39Z: Pokracuje tu cesta
commented by gorn at 2013-05-08T22:40:01Z: Omlouvám se, ale nerozumím, jestli je to otázka (Pokračuje to cesta?) a nebo oznámení že tu cesta je a že jí máme domalovat.
commented by anonimous at 2013-05-08T22:51:06Z: Pokud víte kudy cesta vede, projeďte jí se zapnutou GPSkou a trasu pošlete, někdo jí domaluje.
commented by anonimous at 2013-07-28T10:06:52Z: Vim kudy vede ale nemam cas ji projit a nakreslit do mapy :)
commented by gorn at 2013-08-20T08:13:43Z: stčí projít se zapnutou GPS
commented by gorn at 2015-01-12T22:39:18Z: nebo aspoň na mapě označte kam až vede. Na Hůrku? Na východ od ní? Na západ?"/>
<tag k="osm_note" v="yes"/>
Interesting idea, making an extract of the notes dump file with osmium. I have tried it, but it gives me
OSM tag value is too long
Hmm, yes, osmium has this hardcoded length check in place, because keys and values in the OSM XML format may only be 255 Unicode characters long: libosmium/types.hpp at master · osmcode/libosmium · GitHub
If you're only interested in statistics, you could probably cut the note description to 255 characters, so that the OSM file created by the Python script can still be processed by osmium.
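A tiny sketch of that trim step (the function name is mine; `len()` counts code points here, which should satisfy osmium's per-character check):

```python
def trim_value(value: str, limit: int = 255) -> str:
    """Trim a tag value to osmium's length limit, counting Unicode characters."""
    if len(value) <= limit:
        return value
    return value[: limit - 1] + "…"  # keep a marker that text was cut

print(len(trim_value("x" * 300)))  # 255
print(trim_value("short note"))
```

This would be applied to each `v="…"` value before the script writes the OSM file out.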
Have you tried a streaming XML parser? I am not a great fan of XML, but so far I have managed to get data from XML into other formats, including databases, this way.
Finally, I did it with Saxon; Java needed 5 GB of memory to process the Planet notes file. Now I have the notes in Postgres, and I can query things about note resolution in Latam (number of days to close a note, the user who has closed the most notes, users who have closed notes in different countries, etc.)
Regarding the XML, that's why my opening message asked for different ways to download the notes dump, in case another method exists.
If that becomes a problem, you can try a streaming XML parser (one that does not require loading the full file to parse it).
I recently used GitHub - sopherapps/xml_stream: A simple XML file and string reader that is able to read big XML files and strings by using streams (iterators), with an option to convert to dictionaries (it had more comprehensible docs than the more widely used solutions).