Parsing an OSM file with PHP SimpleXML: getting a few more tags

I am new to PHP's SimpleXML, and I want to use it to work with OSM files.

This question builds on an earlier one: OSM Data parsing to get the nodes with child, https://stackoverflow.com/questions/16129184/osm-data-parsing-to-get-the-nodes-with-child

I am thankful that hakre offered a great example in the comments, which makes an excellent starting point for my project. Below I have added my own answer to the question of how to refine the code to add more tags. I can work with SimpleXML and XPath; the job is most easily done with XPath, because the PHP XML library is based on libxml, which supports XPath 1.0, and that covers the various querying needs very well.

**Goal:** get more out of it. I want to filter the data to get the nodes with a specific category; in this sample of the OSM data, I want to get all the schools within an area. The first script runs well, but now I want to refine the search and add more tags. Finally, I want to store everything in MySQL.

So we need to do some XML parsing with PHP. The following is a small OSM Overpass API example with PHP SimpleXML (should this be added in this part?):


```php
# get all school nodes with xpath
$xpath = '//node[tag[@k = "amenity" and @v = "school"]]';
$schools = $result->xpath($xpath);
printf("%d School(s) found:\n", count($schools));
foreach ($schools as $index => $school)
{
    # Get the name of the school (if any), again with xpath
    list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
    printf("#%02d: ID:%' -10s  [%s,%s]  %s\n", $index, $school['id'], $school['lat'], $school['lon'], $name);
}
```
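To try part 2 without calling the Overpass API, the same XPath queries can be run against a tiny inline document. This is a minimal sketch: the XML below is hypothetical sample data in the OSM XML layout (not real OSM objects), and `simplexml_load_string()` stands in for `simplexml_load_file()`.

```php
<?php
// Hypothetical sample data in the OSM XML layout (not real OSM objects),
// so the XPath part can be tried without querying the Overpass API.
$xml = <<<XML
<osm version="0.6">
  <node id="1" lat="39.50" lon="16.27">
    <tag k="amenity" v="school"/>
    <tag k="name" v="Scuola Primaria"/>
  </node>
  <node id="2" lat="39.33" lon="16.18">
    <tag k="amenity" v="school"/>
  </node>
  <node id="3" lat="39.40" lon="16.20">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Bar Centrale"/>
  </node>
</osm>
XML;

$result = simplexml_load_string($xml);

// Same queries as in part 2: select school nodes, then look up each name relatively
$schools = $result->xpath('//node[tag[@k = "amenity" and @v = "school"]]');
printf("%d School(s) found:\n", count($schools));
foreach ($schools as $index => $school) {
    list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
    printf("#%02d: ID:%' -10s  [%s,%s]  %s\n", $index, $school['id'], $school['lat'], $school['lon'], $name);
}
```

With this sample, nodes 1 and 2 are reported and the cafe is skipped; node 2 gets the `(unnamed)` default.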

Since I am learning, I break the code down into pieces. For my question, the second part is the more interesting one.

That part queries the XML data we already have. As mentioned above, this is most easily done with XPath: the PHP XML library is based on libxml, which supports XPath 1.0, and that covers the various querying needs very well. The example above lists all schools and tries to obtain their names as well.

The key points here are the XPath queries. Two are used; the first one gets the nodes that have certain tags:

```
//node[tag[@k = "amenity" and @v = "school"]]
```

This expression says: give me all `node` elements that contain a `tag` element whose `k` attribute value is "amenity" and whose `v` attribute value is "school". In other words, it is the condition that filters out the nodes tagged amenity=school.
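One detail of this predicate is worth checking: both conditions sit inside one `tag[...]`, so they must hold on the same `tag` element. The sketch below (hypothetical sample data) contrasts it with a looser form, `//node[tag/@k = "amenity" and tag/@v = "school"]`, where each condition can be met by a different `tag` element.

```php
<?php
// Hypothetical sample: node 2 has amenity=cafe plus an unrelated tag whose value is "school".
$xml = <<<XML
<osm>
  <node id="1"><tag k="amenity" v="school"/></node>
  <node id="2"><tag k="amenity" v="cafe"/><tag k="note" v="school"/></node>
</osm>
XML;
$result = simplexml_load_string($xml);

// Both conditions must hold on the SAME tag element
$strict = $result->xpath('//node[tag[@k = "amenity" and @v = "school"]]');

// Each condition may be satisfied by a DIFFERENT tag element
$loose = $result->xpath('//node[tag/@k = "amenity" and tag/@v = "school"]');

printf("strict: %d, loose: %d\n", count($strict), count($loose)); // strict: 1, loose: 2
```

The looser form wrongly picks up the cafe, which is why the nested predicate is the right filter for key/value pairs.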

Further on, XPath is used a second time, now relative to each of those school nodes, to see whether there is a name and, if so, to fetch it. For that we loop over the results with foreach:

```php
foreach ($schools as $index => $school)
{
    # Get the name of the school (if any), again with xpath
    list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
    printf("#%02d: ID:%' -10s  [%s,%s]  %s\n", $index, $school['id'], $school['lat'], $school['lon'], $name);
}
```

The important part is the relative expression passed to `$school->xpath()` inside the loop:

```
tag[@k = "name"]/@v
```

This expression says: relative to the current node, give me the `v` attribute of a `tag` element that has the `k` attribute value "name". As you can see, some parts are similar to the first expression; both can be adapted to your needs.
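The word "relative" matters here. In the sketch below (hypothetical sample data), the same name query is run once without and once with a leading `//`; with `//` the search starts at the document root, so it can return the name of a different node than the one in `$school`.

```php
<?php
// Hypothetical sample: only node 1 has a name
$xml = <<<XML
<osm>
  <node id="1"><tag k="amenity" v="school"/><tag k="name" v="Liceo Classico"/></node>
  <node id="2"><tag k="amenity" v="school"/></node>
</osm>
XML;
$result = simplexml_load_string($xml);
$schools = $result->xpath('//node[tag[@k = "amenity" and @v = "school"]]');
$second = $schools[1]; // node 2, which has no name tag

$relative = $second->xpath('tag[@k = "name"]/@v');   // searches only node 2's children
$absolute = $second->xpath('//tag[@k = "name"]/@v'); // leading // starts at the document root

printf("relative: %d, absolute: %d\n", count($relative), count($absolute)); // relative: 0, absolute: 1
```

With the absolute form, node 2 would wrongly be reported as "Liceo Classico".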

Because not all school nodes have a name, a default string is provided for display purposes by adding it to the (then empty) result array:

```
list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
                                                    ^^^^^^^^^^^^^^^
                                                Provide Default Value
```
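The default works because `+` on arrays is PHP's union operator: it keeps every key of the left-hand array and only adds keys that are missing there. A minimal illustration:

```php
<?php
// The + operator on arrays keeps the left-hand entries and only adds missing keys.
$found   = ['Scuola Media'] + ['(unnamed)']; // key 0 already exists, so the default is ignored
$missing = [] + ['(unnamed)'];               // key 0 is missing, so the default fills it in

list($a) = $found;
list($b) = $missing;
echo $a, "\n"; // Scuola Media
echo $b, "\n"; // (unnamed)
```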

Here are some of the results of that code example:

```
Query returned 907 node(s) and took 1.10735 seconds.
more than 2000 School(s) found:
#00: ID:332534486   [39.5017565,16.2721899]  Scuola Primaria
#01: ID:1428094278  [39.3320912,16.1862820]  (unnamed)
#02: ID:1822746784  [38.9075566,16.5776597]  (unnamed)
#03: ID:1822755951  [38.9120272,16.5713431]  (unnamed)
#04: ID:1903859699  [38.6830409,16.5522243]  Liceo Scientifico Statale A. Guarasci
#05: ID:2002566438  [39.1347698,16.0736924]  (unnamed)
#06: ID:2056891127  [39.4106679,16.8254844]  (unnamed)
#07: ID:2056892999  [39.4124687,16.8286119]  (unnamed)
#08: ID:2272010226  [39.4481717,16.2894353]  SCUOLA DELL'INFANZIA SAN FRANCESCO
#09: ID:2272017152  [39.4502366,16.2807664]  SCUOLA MEDIA
```

Now I am trying to figure out how to add more XPath queries to the code above.

**Goal:** get even more important data out of it; see Key:contact on the OpenStreetMap Wiki.

We are already extracting the name. If we want more data, we just have to run a few more XPath queries inside our loop, for all the address keys and the website. In addition, we must not forget to look for the `website` key as well as `contact:website`; cf. https://wiki.openstreetmap.org/wiki/Key:website
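A minimal sketch of that idea, assuming a hypothetical helper `firstTagValue()` (not part of SimpleXML) that tries several keys in order and returns the first value found. Here `website` is preferred over `contact:website`; that order is an arbitrary choice. The sample node is hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of SimpleXML): return the value of the first
 * tag whose key is in $keys, or $default if none of them exists on $node.
 */
function firstTagValue(SimpleXMLElement $node, array $keys, $default = '')
{
    foreach ($keys as $key) {
        // Relative XPath query against this node's <tag> children only
        $found = $node->xpath(sprintf('tag[@k = "%s"]/@v', $key));
        if ($found) {
            return (string) $found[0];
        }
    }
    return $default;
}

// Hypothetical sample node
$xml = <<<XML
<osm>
  <node id="1">
    <tag k="amenity" v="school"/>
    <tag k="contact:website" v="https://example.org/school"/>
    <tag k="addr:city" v="Cosenza"/>
  </node>
</osm>
XML;
$school = simplexml_load_string($xml)->node[0];

$website = firstTagValue($school, ['website', 'contact:website'], '(no website)');
$city    = firstTagValue($school, ['addr:city'], '(no city)');
$street  = firstTagValue($school, ['addr:street'], '(no street)');
printf("%s | %s | %s\n", $website, $city, $street);
```

Inside the school loop, each extra field then becomes one call, e.g. `firstTagValue($school, ['addr:postcode'], '')`.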

**Conclusion:** I think I need to extend the XPath requests within the loop, where XPath is used relative to the school nodes to check for a name and, if present, fetch it:

```
tag[@k = "name"]/@v
tag[@k = "contact:website"]/@v
tag[@k = "contact:email"]/@v
```
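Instead of one XPath query per key, another option is to read all `k`/`v` pairs of a node once into an associative array and then look up the wanted keys. This sketch relies on keys being unique per OSM element; the sample data is hypothetical.

```php
<?php
// Hypothetical sample node
$xml = <<<XML
<osm>
  <node id="1" lat="39.49" lon="16.38">
    <tag k="amenity" v="school"/>
    <tag k="name" v="Liceo Classico"/>
    <tag k="contact:email" v="info@example.org"/>
  </node>
</osm>
XML;
$school = simplexml_load_string($xml)->node[0];

// Collect every k/v pair of this node into one associative array (one pass, no extra XPath)
$tags = [];
foreach ($school->tag as $tag) {
    $tags[(string) $tag['k']] = (string) $tag['v'];
}

// Look up the keys of interest, with an empty string for missing ones
$wanted = ['name', 'contact:website', 'contact:email'];
$row = [];
foreach ($wanted as $key) {
    $row[$key] = isset($tags[$key]) ? $tags[$key] : '';
}
print_r($row);
```

Every school then yields a `$row` with the same keys, which would also map naturally onto table columns later.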

What do you say…?

I did some further tests and found out some very interesting things.

Here is the code, which runs very well:


```php
<?php
/**
 * OSM Overpass API with PHP SimpleXML / XPath
 *
 * PHP Version: 5.4 - Can be back-ported to 5.3 by using 5.3 Array-Syntax (not PHP 5.4's square brackets)
 */
//
// 1.) Query an OSM Overpass API Endpoint
//

$query = 'node
  ["amenity"~".*"]
  (38.415938460513274,16.06338500976562,39.52205163048525,17.51220703125);
out;';

$context = stream_context_create(['http' => [
    'method'  => 'POST',
    'header' => ['Content-Type: application/x-www-form-urlencoded'],
    'content' => 'data=' . urlencode($query),
]]);

# please do not stress this service, this example is for demonstration purposes only.
$endpoint = 'http://overpass-api.de/api/interpreter';
libxml_set_streams_context($context);
$start = microtime(true);

$result = simplexml_load_file($endpoint);
printf("Query returned %2\$d node(s) and took %1\$.5f seconds.\n\n", microtime(true) - $start, count($result->node));

//
// 2.) Work with the XML Result
//

# get all school nodes with xpath
$xpath = '//node[tag[@k = "amenity" and @v = "school"]]';
$schools = $result->xpath($xpath);
printf("%d School(s) found:\n", count($schools));
foreach ($schools as $index => $school)
{
    # Get the name of the school (if any), again with xpath
    list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
    printf("#%02d: ID:%' -10s  [%s,%s]  %s\n", $index, $school['id'], $school['lat'], $school['lon'], $name);
}

//
// 3.) A second query: fast-food nodes within 1000 m of the nodes with postcode RM12
//
// (XPath expressions used above, for reference:
//    //node[tag[@k = "amenity" and @v = "school"]]
//    tag[@k = "name"]/@v )

$query = 'node
  ["addr:postcode"~"RM12"]
  (51.5557914,0.2118915,51.5673083,0.2369398);
   node
  (around:1000)
  ["amenity"~"fast_food"];
           out;';

$context = stream_context_create(['http' => [
    'method'  => 'POST',
    'header' => ['Content-Type: application/x-www-form-urlencoded'],
    'content' => 'data=' . urlencode($query),
]]);

$endpoint = 'http://overpass-api.de/api/interpreter';
libxml_set_streams_context($context);
$start = microtime(true); // reset the timer, otherwise the first query's time is included

$result = simplexml_load_file($endpoint);
printf("Query returned %2\$d node(s) and took %1\$.5f seconds.\n\n", microtime(true) - $start, count($result->node));
```
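Part 1 above downloads every node with any `amenity` tag and filters for schools afterwards with XPath. If only schools are needed, the Overpass query itself could be narrowed. A sketch, assuming the same bounding box (Overpass order: south, west, north, east); the request lines are commented out so the snippet does not hit the public server:

```php
<?php
// Ask the Overpass API for school nodes only, instead of every node with an amenity tag.
// Same bounding box as in part 1: (south, west, north, east).
$bbox  = '38.415938460513274,16.06338500976562,39.52205163048525,17.51220703125';
$query = 'node["amenity"="school"](' . $bbox . ');' . "\nout;";

$context = stream_context_create(['http' => [
    'method'  => 'POST',
    'header'  => ['Content-Type: application/x-www-form-urlencoded'],
    'content' => 'data=' . urlencode($query),
]]);

// Uncomment to send the request (please keep the load on the public service low):
// libxml_set_streams_context($context);
// $result = simplexml_load_file('http://overpass-api.de/api/interpreter');
echo $query, "\n";
```

The XPath part can stay unchanged; it would simply match every returned node.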


See the results:

```
linux-3645:/home/martin/dev/php # php o1.php
Query returned 2799 node(s) and took 17.02055 seconds.

33 School(s) found:
#00: ID:332534486   [39.5018840,16.2722854]  Scuola Elementare
#01: ID:1428094278  [39.3320912,16.1862820]  (unnamed)
#02: ID:1822746784  [38.9075566,16.5776597]  (unnamed)
#03: ID:1822755951  [38.9120272,16.5713431]  (unnamed)
#04: ID:2002566438  [39.1349460,16.0736446]  (unnamed)
#05: ID:2056891127  [39.4106679,16.8254844]  (unnamed)
#06: ID:2056892999  [39.4124687,16.8286119]  (unnamed)
#07: ID:2272010226  [39.4481717,16.2894353]  Scuola dell'infanzia San Francesco
#08: ID:2272017152  [39.4502366,16.2807664]  Scuola Media
#09: ID:2358307794  [39.5015031,16.3905965]  I.I.S.S. Liceo Statale V. Iulia
#10: ID:2358307796  [39.4926280,16.3853662]  Liceo Classico
#11: ID:2358307797  [39.4973761,16.3858275]  Scuola Media
#12: ID:2358307800  [39.5015527,16.3941156]  I.T.C. e per Geometri
#13: ID:2358307801  [39.4983862,16.3807796]  Istituto Professionale
#14: ID:2448031004  [38.6438417,16.3873106]  (unnamed)
#15: ID:2458139204  [39.0803263,17.1291649]  Sacro Cuore
#16: ID:2552412313  [39.0765212,17.1224610]  (unnamed)
#17: ID:2582443083  [39.0815417,17.1178983]  Liceo Socio Biologico Gravina
#18: ID:2585754364  [38.8878393,16.4076323]  Scuola Elementare
#19: ID:2585754366  [38.8877600,16.4076216]  Scuola Media
#20: ID:3071126720  [38.6022703,16.5554408]  Scuola Media
#21: ID:3071127683  [38.6027273,16.5563125]  Scuola Elementare
#22: ID:3081362915  [39.2865638,16.2601963]  Convitto Nazionale Bernardino Telesio
#23: ID:3081362921  [39.2856714,16.2613594]  Liceo Classico B. Telesio
#24: ID:3081362926  [39.2888949,16.2577446]  Scuola
#25: ID:3732551794  [39.5132435,16.2863285]  (unnamed)
#26: ID:3740289655  [39.5167318,16.2838146]  scuola media
#27: ID:3740289656  [39.5164344,16.2821103]  scuola elementare
#28: ID:4004532684  [38.7804787,16.5122952]  Liceo Artistico
#29: ID:4589289756  [38.6794209,16.1063084]  Scuola Comprensiva Trentacapilli
#30: ID:4843966477  [39.0709866,17.1288384]  Pegaso
#31: ID:5297629775  [38.5768845,16.3263536]  Scuola Media Statale "Ignazio La Russa"
#32: ID:5316865306  [39.0807997,17.1264225]  Enrico Fermi
Query returned 3 node(s) and took 17.44780 seconds.
```


So far so good. But if I add some lines to part 2, I run into errors (see below).

Background: I want to get more information out of the dataset, so I coded it like this, within part 2, which works with the XML result:



```php
//
// 2.) Work with the XML Result
//

# get all school nodes with xpath
$xpath = '//node[tag[@k = "amenity" and @v = "school"]]';
$schools = $result->xpath($xpath);
printf("%d School(s) found:\n", count($schools));
foreach ($schools as $index => $school)
{
    # Get the name of the school (if any), again with xpath
    list($name) = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
    list($name) = $school->xpath('tag[@k = "contact:website"]/@v');
    list($name) = $school->xpath('tag[@k = "contact:email"]/@v');
    printf("#%02d: ID:%' -10s  [%s,%s]  %s\n", $index, $school['id'], $school['lat'], $school['lon'], $name);
}
```
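A likely cause of the errors: all three `list($name) = ...` lines assign to the same variable, so only the last result (the email) survives, and the two new lines have no `+ [default]`. For a school without such a tag, `xpath()` returns an empty array and `list()` then reads index 0 from it, which raises a notice (PHP 5/7) or warning (PHP 8) and leaves `$name` as `null`. A minimal corrected sketch with one variable per key; the sample data and the `-` placeholder are assumptions:

```php
<?php
// Hypothetical sample: node 1 has a name and a website, node 2 has neither
$xml = <<<XML
<osm>
  <node id="1" lat="39.50" lon="16.39">
    <tag k="amenity" v="school"/>
    <tag k="name" v="I.T.C. e per Geometri"/>
    <tag k="contact:website" v="https://example.org"/>
  </node>
  <node id="2" lat="39.33" lon="16.18">
    <tag k="amenity" v="school"/>
  </node>
</osm>
XML;
$result = simplexml_load_string($xml);

$schools = $result->xpath('//node[tag[@k = "amenity" and @v = "school"]]');
printf("%d School(s) found:\n", count($schools));
foreach ($schools as $index => $school) {
    // One variable per key, each with its own default, so nothing is overwritten
    // and list() never reads index 0 from an empty array.
    list($name)    = $school->xpath('tag[@k = "name"]/@v') + ['(unnamed)'];
    list($website) = $school->xpath('tag[@k = "contact:website"]/@v') + ['-'];
    list($email)   = $school->xpath('tag[@k = "contact:email"]/@v') + ['-'];
    printf("#%02d: ID:%' -10s  [%s,%s]  %s  %s  %s\n",
        $index, $school['id'], $school['lat'], $school['lon'], $name, $website, $email);
}
```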


The question is: how do I get more out of it, at least the address and the website? I am trying to figure out how to add more XPath queries to the code above to get even more important data; see Key:contact on the OpenStreetMap Wiki:

```
contact:phone
contact:fax
contact:website
contact:email
```
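To fetch all `contact:*` keys at once rather than listing them one by one, XPath 1.0's `starts-with()` can be used in the predicate. A sketch with hypothetical sample values; note that the plain `phone` key is not matched, so (as with `website`) it would need its own query:

```php
<?php
// Hypothetical sample node with made-up numbers
$xml = <<<XML
<osm>
  <node id="1">
    <tag k="amenity" v="school"/>
    <tag k="contact:phone" v="+39 0984 000000"/>
    <tag k="contact:fax" v="+39 0984 000001"/>
    <tag k="phone" v="+39 0984 000002"/>
  </node>
</osm>
XML;
$school = simplexml_load_string($xml)->node[0];

// starts-with() selects every tag whose key begins with "contact:"
$contact = [];
foreach ($school->xpath('tag[starts-with(@k, "contact:")]') as $tag) {
    $contact[(string) $tag['k']] = (string) $tag['v'];
}
print_r($contact);
```

The same pattern with `"addr:"` would collect all address keys of a node.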

I will dig into all the documents, come back later this weekend, and report my findings.
