However, I was looking for something different, because processing a 1 GB XML file is complex: something like a flat file, JSON, or another format. It would also be interesting to have the Map Notes changes from the last minute, last hour, and last day. In fact, I don’t see a way to download delta changes for Map Notes.
The only thing I have found is the Pascal Neis site; many other sites use the data it provides, via RSS or by scraping the page.
Am I missing other ways to download Map Notes data and process it easily?
The OSM API endpoints for Notes do in fact support JSON output, and with the Notes Search endpoint you can run global queries and restrict them with from and to timestamps.
For JSON output, simply append “.json” to the URL.
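As a rough sketch of what such a query could look like from Python: the helper names below (`notes_search_url`, `fetch_notes`) are made up for illustration, and the parameter values are assumptions. `closed=-1` is meant to include closed notes, and the server caps `limit`, so check the API documentation for the current maximum before relying on it.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.openstreetmap.org/api/0.6"


def notes_search_url(start, end, limit=100, closed=-1):
    """Build a Notes Search URL that requests JSON output.

    start and end are datetimes assumed to be in UTC.
    closed=-1 asks for closed notes as well as open ones.
    """
    params = {
        "from": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "to": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "limit": limit,
        "closed": closed,
    }
    # ".json" appended to the endpoint path selects JSON output
    return f"{API_BASE}/notes/search.json?{urlencode(params)}"


def fetch_notes(start, end, **kwargs):
    """Download and decode one page of results (performs a network request)."""
    with urlopen(notes_search_url(start, end, **kwargs)) as response:
        return json.load(response)
```

Calling `fetch_notes(datetime(2023, 1, 1), datetime(2023, 1, 2))` would request the notes for a one-day window. For a large area or a long period you would have to walk through many such windows, which is where the result limits start to hurt.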
I need to analyze how many notes have been resolved this year in Latam (from Mexico to Argentina), which returns an output much bigger than these limits.
That’s why I was hoping to find a dump file in the Planet in a different format (CSV, JSON), and not only a huge XML file.
In fact, I am trying to insert the notes into a database, to perform queries.
Inserting the XML file directly into the database is possible, but then processing it with XPath is complicated (for me).
Thus, I am preprocessing the notes on the command line to import flat files (CSV) into the database. xmlproc and other XML parsers for Linux require a lot of memory, and I am hitting OOM.
So this is becoming a lot more complicated, because the only format for all notes is XML.
Also, extracting all notes from the API in chunks overloads this mechanism. I would like to run this from time to time, and I think using the API is feasible but not the best way.
all non-closed notes are transformed into OSM format within a few minutes.
There isn’t much benefit in converting osn to OSM format. You could maybe use osmium tools to filter all nodes by some polygon (e.g. Osmium manual pages – osmium-extract (1)) and use the resulting file for your analysis.
An alternative would be to use XML parsing in Python and implement the database code on your own.
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="osm notes">
<bounds minlon="-180" minlat="-90" maxlon="180" maxlat="90" origin="http://www.openstreetmap.org/api/0.6"/>
<node id="122" version="6" timestamp="2013-04-24T16:26:39Z" uid="0" user="anonimous" changeset="122" lat="49.6652946" lon="14.2467785">
<tag k="name" v="opened by anonimous at 2013-04-24T16:26:39Z: Pokracuje tu cesta
commented by gorn at 2013-05-08T22:40:01Z: Omlouvám se, ale nerozumím, jestli je to otázka (Pokračuje to cesta?) a nebo oznámení že tu cesta je a že jí máme domalovat.
commented by anonimous at 2013-05-08T22:51:06Z: Pokud víte kudy cesta vede, projeďte jí se zapnutou GPSkou a trasu pošlete, někdo jí domaluje.
commented by anonimous at 2013-07-28T10:06:52Z: Vim kudy vede ale nemam cas ji projit a nakreslit do mapy :)
commented by gorn at 2013-08-20T08:13:43Z: stčí projít se zapnutou GPS
commented by gorn at 2015-01-12T22:39:18Z: nebo aspoň na mapě označte kam až vede. Na Hůrku? Na východ od ní? Na západ?"/>
<tag k="osm_note" v="yes"/>
</node>
Have you tried a streaming XML parser? I am not a great fan of XML, but so far I have managed to get data from XML into other formats, including databases, this way.
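To illustrate what I mean by streaming, here is a minimal sketch in Python using `iterparse` from the standard library. It assumes the Planet notes dump has one `<note>` element per note directly under the root, with `id`, `lat`, `lon`, `created_at` and `closed_at` attributes. Please check that against your actual file, since I am writing the layout from memory. The function name `notes_to_csv` is just something I made up.

```python
import csv
import xml.etree.ElementTree as ET

# Attribute names assumed for <note> elements in the notes dump
NOTE_FIELDS = ["id", "lat", "lon", "created_at", "closed_at"]


def notes_to_csv(xml_source, csv_out):
    """Stream <note> elements from xml_source and write one CSV row per note.

    xml_source is a filename or a binary file object; csv_out is a text
    file object. Returns the number of notes written.
    """
    writer = csv.writer(csv_out)
    writer.writerow(NOTE_FIELDS)
    count = 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "note":
            # Missing attributes (e.g. closed_at on open notes) become ""
            writer.writerow([elem.get(name, "") for name in NOTE_FIELDS])
            count += 1
            # Discard notes already written so memory use stays flat
            root.clear()
    return count
```

You would call it with something like `notes_to_csv("planet-notes.osn", open("notes.csv", "w", newline=""))`, where both file names are placeholders. Because each note is dropped from memory as soon as its row is written, the memory needed should not grow with the size of the dump. The comments could go into a second CSV in the same loop.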
Finally, I did it with Saxon, and I needed 5 GB of memory for Java to process the Planet notes file. Now I have the notes in Postgres, and I can query things about note resolution in Latam (number of days to close a note, the user who has closed the most notes, users who have closed notes in different countries, etc.).
Regarding the XML, that’s why my first message asked about other ways to download the notes dump, in case there is another method.