NOAA ENC import

Ben_Gamari · September 25, 2023, 2:23am

I am writing to propose importing United States marine navigational aids from NOAA's S-57 electronic navigational charts (ENCs) into OpenStreetMap. These charts are in the public domain and serve as the canonical source for navigational mark locations in United States waters.

While OSM currently includes some United States seamarks, a prototype import of lateral buoys suggests that the majority of these buoys are currently missing from the OSM dataset. Furthermore, many of the seamarks which are present are out-of-date in either their position or metadata.

I propose that NOAA seamarks be consistently added into OSM and be kept up-to-date via automated import.

Project Overview

I am working to develop infrastructure to import and update seamarks from NOAA ENCs into OSM. The import includes lateral (BOYLAT/BCNLAT), cardinal (BOYCAR/BCNCAR), safe-water (BOYSAW/BCNSAW), and special-purpose (BOYSPP/BCNSPP) buoys and beacons, capturing a variety of attributes of each including colour, light, and fog signal information. This is similar to a seemingly stalled import attempt from 2013.

Currently the import would entail:

9318 modifications to existing OSM seamarks
13038 new seamarks which are not currently represented in OSM

Additionally, there are 1403 seamarks in OSM which do not have an obvious corresponding NOAA seamark. This is addressed below in the “Open Questions” section.

To make the import more manageable and potentially facilitate manual review, I have broken it into regions of roughly 500 seamarks each, divided along state boundaries and subdivided further where necessary.
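
A minimal sketch of size-bounded partitioning, assuming each seamark has already been tagged with a state (the `(seamark_id, state)` input format and the `partition` helper are hypothetical). It splits each state by list order rather than by geography, which is a simplification of the "further subdivision" described above.

```python
from collections import defaultdict
from typing import Iterable


def partition(seamarks: Iterable[tuple[str, str]], max_size: int = 500) -> list[list[str]]:
    """Group seamark IDs by state, then cut each state into chunks of at most max_size."""
    by_state: dict[str, list[str]] = defaultdict(list)
    for seamark_id, state in seamarks:
        by_state[state].append(seamark_id)

    chunks: list[list[str]] = []
    for state in sorted(by_state):  # deterministic chunk order
        ids = by_state[state]
        for start in range(0, len(ids), max_size):
            chunks.append(ids[start:start + max_size])
    return chunks
```

A state with 1200 seamarks would yield chunks of 500, 500 and 200.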

Covered tags

Imported NOAA seamarks will bear the following OSM tags using the following S-101 attributes:

OSM tag	S-101 schema analogue
`seamark:name`	`OBJNAM`
`seamark:fog_signal_category`	`CATFOG`
`seamark:beacon_lateral:system`	`MARSYS`
`seamark:beacon_lateral:shape`	`BOYSHP`
`seamark:beacon_lateral:colour`	`COLOUR`
`seamark:beacon_lateral:category`	`BOYCAT`
`seamark:light:colour`	`COLOUR`
`seamark:light:visibility`	`LITVIS`
`seamark:light:exhibition`	`EXCLIT`
`seamark:light:character`	`LITCHR`
`seamark:light:period`	`SIGPER`
`seamark:light:multiple`	`MLTYLT`
`seamark:light:range`	`VALNMR`
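
As an illustration of how a row of this table might be applied, here is a hypothetical sketch for two of the attributes of a lateral beacon. It assumes attribute values arrive as plain strings with list-valued `COLOUR` comma-separated, and it only covers a subset of the S-57 colour codes; it is not the pipeline's actual translation code.

```python
# Hypothetical sketch: translate OBJNAM and COLOUR of a lateral beacon (BCNLAT)
# into the OSM tags listed above. Only a subset of S-57 COLOUR codes is mapped;
# an unmapped code raises KeyError rather than being guessed.
S57_COLOUR = {"1": "white", "2": "black", "3": "red",
              "4": "green", "5": "blue", "6": "yellow"}


def colour_list(value: str) -> str:
    """Turn an S-57 COLOUR list such as "3,4,3" into OSM's "red;green;red"."""
    return ";".join(S57_COLOUR[code.strip()] for code in value.split(",") if code.strip())


def beacon_lateral_tags(attrs: dict[str, str]) -> dict[str, str]:
    """Emit tags only for attributes that are present in the ENC feature."""
    tags: dict[str, str] = {}
    if attrs.get("OBJNAM"):
        tags["seamark:name"] = attrs["OBJNAM"]
    if attrs.get("COLOUR"):
        tags["seamark:beacon_lateral:colour"] = colour_list(attrs["COLOUR"])
    return tags
```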

Provenance tracking

To track the provenance of imported seamarks, I currently use a new key, source:noaa_lnam, to record the S100 LNAM field used to identify the feature in the source NOAA ENCs.

In addition to LNAM, it may also be useful to preserve the SORDAT (source date) field from the ENCs. This would make it easy to identify changes to seamarks in future updates.
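
A sketch of how both fields could be recorded and used, assuming SORDAT is stored as `YYYYMMDD`. `source:noaa_lnam` is the key used above; `source:noaa_sordat` is a hypothetical key name chosen only for this illustration.

```python
from datetime import date, datetime
from typing import Optional

LNAM_KEY = "source:noaa_lnam"      # key used by the proposal
SORDAT_KEY = "source:noaa_sordat"  # hypothetical key, for illustration only


def parse_sordat(value: str) -> date:
    """Parse an S-57 date of the form YYYYMMDD."""
    return datetime.strptime(value, "%Y%m%d").date()


def provenance_tags(lnam: str, sordat: Optional[str] = None) -> dict[str, str]:
    """Tags recording which ENC feature (and source date) an OSM node came from."""
    tags = {LNAM_KEY: lnam}
    if sordat:
        tags[SORDAT_KEY] = sordat
    return tags


def newer_in_enc(osm_tags: dict[str, str], enc_sordat: str) -> bool:
    """True if the ENC source date is later than the one stored in OSM, or none is stored."""
    stored = osm_tags.get(SORDAT_KEY)
    return stored is None or parse_sordat(enc_sordat) > parse_sordat(stored)
```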

Matching algorithm

To find OSM seamarks associated with NOAA seamarks a simple distance-thresholding method is currently used, taking the closest match within 200 meters. Here we call a pair of a NOAA seamark and its closest match in the OSM dataset a pair of “correlated seamarks”.
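
A brute-force sketch of the closest-match-within-200-m rule, using great-circle (haversine) distance. The `(osm_id, lat, lon)` tuple format is an assumption for illustration; a production pipeline would use a spatial index rather than a linear scan.

```python
import math
from typing import Optional, Sequence

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def closest_within(lat: float, lon: float,
                   osm: Sequence[tuple[str, float, float]],
                   threshold_m: float = 200.0) -> Optional[str]:
    """Return the ID of the nearest OSM seamark within threshold_m, else None."""
    best_id: Optional[str] = None
    best_d: Optional[float] = None
    for osm_id, olat, olon in osm:
        d = haversine_m(lat, lon, olat, olon)
        if d <= threshold_m and (best_d is None or d < best_d):
            best_id, best_d = osm_id, d
    return best_id
```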

As with all automated matching algorithms, this may produce false positives and false negatives:

A false-positive correlation is one where two unrelated buoys are incorrectly deemed the same by the algorithm. This would cause the import to overwrite an existing OSM seamark with the data of an unrelated NOAA seamark.
A false-negative correlation is one where two related buoys are deemed distinct. This would mean that instead of updating the existing OSM entry, the import would introduce a new, redundant seamark.

False-positive characterisation

To quantify the false-positive rate, I conducted a study comparing the NOAA OBJNAM and OSM seamark:name of correlated seamarks as an oracle for buoy equivalence. Of the 1455 seamarks that satisfy the distance-threshold criterion, 114 match exactly in their names.

Of those that do not match exactly in name, the majority match via their buoy number. For instance, here are a few typical pairings:

`seamark:name`	`OBJNAM`
7	Shilshole Bay Buoy 7
Spa Creek Channel Junction Buoy SC	Spa Creek Channel Junction Buoy SC
2DW	Dundalk Terminal West Channel Buoy 2DW
Upper Chesapeake Channel Lighted Buoy 40	Upper Chesapeake Channel Lighted Ice Buoy 40
R N “38”	Merrimack River Buoy 38
SD 4	Sea Dog Creek Buoy SD4
R18	Lewis Bay Approach Channel Lighted Buoy 18
R “4” Gong	Hampton Harbor Channel Gong Buoy 4

To match via buoy number, I compared the last tokens (as split by whitespace after dropping double-quotes) of the two names. Counting matches in buoy number, 1304 of the 1455 correlated seamarks match in name.
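
The last-token comparison described here can be sketched as follows. Both ASCII and typographic double quotes are dropped, since the OSM names above use the latter; the helper names are hypothetical.

```python
# Remove ASCII and typographic double quotes before tokenising.
QUOTES = str.maketrans("", "", '"\u201c\u201d')


def last_token(name: str) -> str:
    """Last whitespace-separated token after dropping double quotes."""
    tokens = name.translate(QUOTES).split()
    return tokens[-1] if tokens else ""


def buoy_numbers_match(osm_name: str, noaa_objnam: str) -> bool:
    """True when both names end in the same non-empty token."""
    a, b = last_token(osm_name), last_token(noaa_objnam)
    return a != "" and a == b
```

Taken literally, this rule does not pair `SD 4` with `Sea Dog Creek Buoy SD4`, or `R18` with `Lewis Bay Approach Channel Lighted Buoy 18`; whether the actual pipeline normalises such cases is not stated.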

Examining the remaining non-matching pairs by hand revealed that only 66 of the 1455 correlated seamarks disagree substantially in their names. This suggests that the false-positive rate is quite low for this method (4.5%).
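
For reference, the proportions behind these figures are:

```latex
\frac{114}{1455} \approx 7.8\,\% \ \text{(exact name match)}, \qquad
\frac{1304}{1455} \approx 89.6\,\% \ \text{(buoy-number match)}, \qquad
\frac{66}{1455} \approx 4.5\,\% \ \text{(substantial disagreement)}
```

Of the 1455 − 1304 = 151 pairs left for hand examination, 151 − 66 = 85 were therefore not counted as substantial disagreements.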

Open Questions

What to do with seamarks in the covered region that are not present in NOAA ENCs? Some of these may be non-NOAA seamarks while in other cases these correspond to seamarks that were previously present but have since been decommissioned. This puts us in the awkward situation of either losing useful information or keeping redundant, inaccurate marks.
What, if anything, should be done to mitigate false matches?
Currently seamark:name of OSM seamarks is often a descriptive string (e.g. G C 13, short for “green can 13”) rather than the proper NOAA seamark name (that is, the S57 OBJNAM field).

To take a specific example, currently the seamark with NOAA name Piscataqua River Lighted Buoy 2 is given the seamark:name value R "2" in the OSM dataset. The R in the latter name is redundant with the seamark:colour attribute (which indicates that the seamark is red). To avoid this sort of redundancy, I suggest that seamark:name should be changed to reflect the object’s NOAA name. Is there a precedent for how seamark:name should be defined?

Future work

Here I only propose to include coastal seamarks; seamarks for United States inland waterways are maintained by the Army Corps of Engineers, which provides a similar ENC product (IENC) that could be imported in the same way in the future.

Beyond seamarks, there is a wealth of additional navigational information in NOAA ENCs (e.g. cable areas, anchorage areas). Perhaps this would be a useful addition to OSM in the future.

Ben_Gamari · September 25, 2023, 2:26am

Apologies for the missing links; Discourse appears to prohibit new users from including more than three links. The missing links are:

NOAA: Charting | National Oceanic and Atmospheric Administration
IENC: https://navigation.usace.army.mil/Survey/InlandCharts

pnorman · September 25, 2023, 2:32am

What are all the tags you are proposing to use?

Ben_Gamari · September 25, 2023, 3:06am

@pnorman, my current import pipeline populates the source tag and the following seamark:* tags:

seamark:name
seamark:fog_signal_category
seamark:beacon_lateral:system
seamark:beacon_lateral:shape
seamark:beacon_lateral:colour
seamark:beacon_lateral:category
seamark:light:colour
seamark:light:visibility
seamark:light:exhibition
seamark:light:character
seamark:light:period
seamark:light:multiple
seamark:light:range

Note that all of these are already defined (see Seamarks/Seamark Attributes) and have clear analogues in the NOAA ENC dataset. This set is, in my opinion, a fairly minimal useful set and can be extended as necessary.

I have amended the proposal to make this explicit.

quantenschaum · September 25, 2023, 10:39am

Hi @Ben_Gamari, I am interested in this as well and I started working on the exact same thing, except focusing on Germany and the Netherlands. You may want to have a look at GitHub - quantenschaum/mapping. Generally, what I came up with so far can be applied to US ENCs as well. Maybe we can put our work together?

Cheers!

Ben_Gamari · September 25, 2023, 11:46am

Hi @quantenschaum. It indeed looks like there is a good amount of overlap here; in light of this I wish I had opened this proposal earlier. My pipeline has been essentially ready to run since last year, but I held off opening the proposal as I felt that the false-positive characterisation needed more work.

Ultimately, I’m rather hesitant to start over with a new pipeline, especially as I have already put a fair amount of manual assessment effort into my pipeline’s output. However, I would be happy to have a chat about our respective pipelines so that we might share our approaches, learn from one another’s mistakes, and exchange ideas or implementation.

Ben_Gamari · September 25, 2023, 12:00pm

It occurs to me that I had also deleted the “infrastructure” link target to meet the three-link limit. I have now fixed this. For the record, the implementation of this import can be found in Ben Gamari / buoy · GitLab.

Also, on re-reading the proposal I realize I had never updated the tense of the prose to reflect the fact that the import pipeline itself is, to first order, complete. The work that remains from my perspective is:

build consensus around the goal and approach
ensure that OSM upstream is satisfied with the false-positive and false-negative rates (or finish the manual audit that I had started)
restructure the proposal into a Wiki page and merge in usage documentation
make any adaptations to the implementation called for during the proposal process
perform initial import
configure an automated periodic update job

quantenschaum · September 25, 2023, 12:51pm

Cool! Seems we are in the same boat here (pun intended).

I will have a look at your pipeline code. For German data there are some legal issues to clarify first, before it can be imported into OSM. For personal use you may already use the available data. I actually started with this in mind: I converted data from various sources to OSM XML, then converted it to OBFs and used it in https://osmand.net/, which is based on OSM data. So I could extend publicly available OSM data with imports from other sources. There are OBFs with depth data available for the US.

The import of this data into OSM is not trivial because the imported data has to be merged with existing data in OSM. This is the most difficult part of the process; the actual conversion may be tedious to set up but is otherwise trivial.

In the wiki I found these import guidelines.

The script I wrote tries to match existing points (buoys) by name and/or position. I separated modification of existing data points, addition of new points, and removal of points into separate steps. After each run all changes can and need to be reviewed thoroughly. This is assisted by the remote control feature of JOSM: you step through each change and JOSM zooms to the location automatically.
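
One way to drive such a review loop is JOSM's remote control, which listens on `http://127.0.0.1:8111` by default and accepts a `zoom` request with `left`, `right`, `top` and `bottom` bounds. The sketch below is a hypothetical illustration, not quantenschaum's script; it assumes remote control is enabled in JOSM's preferences, and the 0.002° margin and `(change_id, lat, lon)` input format are arbitrary choices.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

JOSM_RC = "http://127.0.0.1:8111"  # JOSM remote control, default port


def josm_zoom_url(lat: float, lon: float, margin_deg: float = 0.002) -> str:
    """Build a remote-control URL that zooms JOSM to a small box around a point."""
    params = {
        "left": f"{lon - margin_deg:.6f}",
        "right": f"{lon + margin_deg:.6f}",
        "top": f"{lat + margin_deg:.6f}",
        "bottom": f"{lat - margin_deg:.6f}",
    }
    return f"{JOSM_RC}/zoom?{urlencode(params)}"


def review(changes: list[tuple[str, float, float]]) -> None:
    """Step through proposed changes one at a time, zooming JOSM to each."""
    for change_id, lat, lon in changes:
        urlopen(josm_zoom_url(lat, lon))  # needs a running JOSM with remote control on
        input(f"Reviewing {change_id}; press Enter for the next change...")
```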

A fully automatic import will never be possible IMHO. There are so many things that can go wrong. So far I see the following problems.

automated removal can cause huge damage, especially if the nodes are connected to other objects
the ENCs contain redundant data: the same area can be covered by several ENCs of different usage bands (sometimes there are even multiple ENCs of the same usage band covering the same area). The objects in these ENCs sometimes disagree (different positions, metadata, present/missing); this is a real mess.
false positives/negatives mess up the data as well
there are data sources that contain more up-to-date information, like the notices to mariners. If someone imports data from these sources (usually manually, very brave those folks), these changes would be reverted by the automatic import.
there is data related to the imported data that is not directly connected to it, waterway=fairway for example: if buoys are moved, the fairway has to be moved as well, but it is not connected to the buoys or anything else. In NL the buoys are moved often (weekly) due to shifting sands, so an automated update would just mess up the map

I think it’s quite hard to actually get a fully automated import working w/o causing damage. So I took a semi-automatic approach: the tedious work is done by the script, the review has to be done by a human.

Ben_Gamari · September 25, 2023, 1:31pm

Indeed; however, I believe I have addressed this concern fairly thoroughly in my proposal. Please do have a look at the “Matching algorithm” section. Furthermore, I have structured the import to facilitate manual review in bounded-size chunks. I have also performed a significant amount of manual review on the preliminary import to validate that my correlation is working as expected.

I agree that the initial import will need to be done manually. However, subsequent updates should be doable in an automated fashion without heuristics, relying only on the identifiers in the OSM source attribute.
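
A sketch of what a heuristic-free update could look like once every imported node carries its LNAM: a plain set difference between the LNAMs in OSM and those in the new ENC release. The input shapes and helper names are hypothetical, only `seamark:*` tags are compared (so unrelated tags on a node are ignored), and position changes are left out for brevity.

```python
from typing import Mapping


def managed(tags: Mapping[str, str]) -> dict[str, str]:
    """Restrict a tag set to the seamark:* keys an import would manage."""
    return {k: v for k, v in tags.items() if k.startswith("seamark:")}


def plan_update(osm: Mapping[str, Mapping[str, str]],
                noaa: Mapping[str, Mapping[str, str]]) -> dict[str, list[str]]:
    """osm: source:noaa_lnam value -> tags of the OSM node carrying it.
    noaa: LNAM -> tags generated from the new ENC release."""
    osm_ids, noaa_ids = set(osm), set(noaa)
    return {
        "create": sorted(noaa_ids - osm_ids),
        "modify": sorted(lnam for lnam in osm_ids & noaa_ids
                         if managed(osm[lnam]) != managed(noaa[lnam])),
        # Only proposed here; deletions may still warrant a manual look.
        "delete": sorted(osm_ids - noaa_ids),
    }
```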

By the way, thanks for all of your efforts in improving OsmAnd’s nautical mapping capabilities. I have been using it routinely for the last few years and it has been a joy to see it improve release after release.

quantenschaum · September 25, 2023, 6:52pm

I will read it! I have given this some thought too, and am very interested in your ideas. I also had the idea of adding a marker in seamark:source that uniquely identifies each object once it has gone through the initial import. The problem was that I did not find a stable marker/ID in the imported dataset that I could put into OSM. And there remains the problem of updating surrounding objects when buoys are moved.

OsmAnd is really nice and I also use it quite a lot. It is actually capable of much more than it does out of the box, but it of course needs good (nautical) data. With depth data from the ENCs and generated light sectors you almost get a full nautical chart.

aighes · September 26, 2023, 1:22am

On those 9318 objects which already exist in OSM, did you make any checks regarding the differences? Are the positions matching? Did you verify the data you want to import against the ground truth for some area? Just to understand the quality of the data you want to import.

pnorman · September 26, 2023, 3:15am

Unfortunately the seamark related tags suffer from how they’ve been adopted as if they were their own sub-project that didn’t interact with wider OSM tagging.

In this case, seamark:name is a problem - the feature is a seamark (although it needs a tag to indicate that), and a feature’s name should be indicated with name.

quantenschaum · September 26, 2023, 6:08am

IMHO they do not “suffer”. It is actually quite good to have the seamarks in a kind of separate layer. Seamarks are usually placed less densely than other objects in OSM, so when editing them you usually download a very large area via Overpass containing only seamarks. If you then move them around or delete them, you would mess up the other data if they were connected to “regular” OSM nodes; if they are kept separate, this does not happen. seamark:name is not a problem: it contains the dedicated name of the seamark that appears in a nautical chart, which can differ from the name landlubbers use for the object. The seamark prefix is also handy for (semi)automated updates, because these scripts can operate only on these tags and leave the other data untouched.

quantenschaum · September 26, 2023, 8:43am

@Ben_Gamari the SQL-based processing of the data is nice. I didn’t know before that ogr2ogr can write to SQL databases. Why did you choose PostGIS? Because it supports arrays and SQLite does not?

quantenschaum · September 26, 2023, 11:49am

@Ben_Gamari some thoughts on your matching algorithm

Using the lnam as a unique ID is very good. I missed that in my approach. Is this ID unique across (overlapping) ENCs? This would allow selecting the most accurate version of an object. I came across cases where the same object appears in different ENCs at different positions and with different attributes. Here one could select the version from the highest usage band, with the smallest scamax/scamin, or with the most recent date. It also avoids adding the same object multiple times.
Just taking the closest match within 200m may lead to false matches when there are other similar objects close by; I would require a unique match (one single matching object within the radius, not just the first/closest).
The 200m threshold may work well on average but not in all cases. In densely buoyed areas it is too big; if buoys are moved more than 200m (yes, that happens) it is too small.
How do you handle the case if someone else updated a buoy based on more recent data (notices to mariners) than is in the ENC? Your script would simply overwrite these changes, wouldn’t it?
How do you ensure that objects in the proximity of the updated points are updated as well? Classical case: 2 buoys are moved, there is a waterway=fairway going between these buoys, which has to be moved along but it is not connected to the buoys and it does not even have a seamark tag.
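
The unique-match requirement suggested in the list above can be sketched like this: accept a candidate only when it is the single OSM seamark inside the radius. The `(osm_id, lat, lon)` format is an assumption, and the equirectangular distance is an approximation that is adequate at sub-kilometre scales.

```python
import math
from typing import Optional, Sequence


def approx_dist_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation of ground distance in metres."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6_371_000.0 * math.hypot(x, y)


def unique_match(lat: float, lon: float,
                 candidates: Sequence[tuple[str, float, float]],
                 radius_m: float = 200.0) -> Optional[str]:
    """Return the candidate ID only if exactly one candidate lies within radius_m."""
    hits = [cid for cid, clat, clon in candidates
            if approx_dist_m(lat, lon, clat, clon) <= radius_m]
    return hits[0] if len(hits) == 1 else None
```

Compared with closest-match, this trades some false negatives (ambiguous cases left for review) for fewer false positives.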

I really like your SQL-based processing. But independent of the way it is done, I think a fully automated import will not work properly. Because the imported data has to be merged, and other users can add/remove/modify data and do all sorts of unpredictable things, I currently do not see a way around a manual review of every single change.

IMHO seamark:name should be set to OBJNAM, because this is the official name of the seamark. The “description” (color, shape, …) is in the other attributes.

quantenschaum · September 26, 2023, 12:15pm

Thoughts on the source tag. This tag can be misleading and redundant. When data is added to OSM, the source can be attached to the changeset. This is, in my opinion, the place where it actually belongs, because this particular changeset is based on the given source (assuming it’s not a lie). Usually objects in OSM undergo many changes, contributed by different users, at different points in time, based on different primary sources. So an object in OSM with its tags is most likely not based on a single source, as a single source tag suggests. Actually, there might not be a single tag in the latest version of an object whose value originates from the source given in source. You can find out which tag was changed in which changeset, and evaluate the actual source of each tag and of the object's position, by going through all changesets. Having a single source for data composed of parts from different sources is not right IMHO.

With modern editors, the source=* tag is typically added to the changeset when a change is made, not as a tag on a piece of information which may be updated at different times based on different sources.

says the Wiki

quantenschaum · September 27, 2023, 11:52am

Thoughts on lnam. The lnam is a globally unique object identifier used in ENCs and is required for each object in an ENC (IHO Transfer Standard p18). It is unique between different providers of ENCs and also between overlapping ENCs (IHO S-57 Appendix p130).

This is great! If the lnam gets stored in OSM it really helps to uniquely identify seamarks. And it can also be used to avoid importing duplicates.
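
A sketch of LNAM-based de-duplication across overlapping ENCs, using one of the preference rules mentioned earlier in the thread (highest usage band first, then most recent source date). The `EncFeature` record is hypothetical and assumes the usage band (1 overview to 6 berthing) has already been determined for each feature's cell.

```python
from typing import Iterable, NamedTuple


class EncFeature(NamedTuple):
    lnam: str
    usage_band: int  # 1 (overview) to 6 (berthing); higher means larger scale
    sordat: str      # YYYYMMDD, so string order equals date order
    lat: float
    lon: float


def dedupe(features: Iterable[EncFeature]) -> dict[str, EncFeature]:
    """Keep one version per LNAM: highest usage band, ties broken by newest SORDAT."""
    best: dict[str, EncFeature] = {}
    for f in features:
        cur = best.get(f.lnam)
        if cur is None or (f.usage_band, f.sordat) > (cur.usage_band, cur.sordat):
            best[f.lnam] = f
    return best
```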

Since one is encouraged to invent new tags and ENCs are an international standard, I would like to propose seamark:lnam as a straightforward place to store it in OSM.

@Ben_Gamari started this as a US-specific project and I started my work in Europe, but it is actually applicable to any ENC from anywhere in the world.

aighes · September 27, 2023, 9:59pm

AFAIK this is the “solution” to avoid rendering the name on Carto, or was the solution for Mapnik.

pnorman · September 28, 2023, 1:34am

It isn’t up to the tagger if a particular style wants to render a feature. No functional style will show a name just because a feature has a name tag - it will show names of specific types of objects.

I am concerned by the lack of a tag indicating what the feature is (e.g. a seamark=* tag)

aighes · September 28, 2023, 4:14am

Agreed, just saying this dates back to around 2008, and at that time Mapnik was rendering every name.