Minutely updated vector tiles demo

Hello.
Is there new news about the progress of this topic?

7 Likes

It’s been a busy couple of weeks for me with non-OSMF work stuff as I was at both PGConf.dev and State of the Map US. I then picked up a cold at the end of SOTM US, so I am just now getting back to work.

My status talk was not accepted into SOTM US, so I did a birds of a feather session where I talked with @Branko_Kokanovic and @jake-low. I also got to talk to more people than I can list at the conference, generally about vector tiles and my work. I still need to write up all of the discussions, so I’ll provide details later.

Tilekiln is mostly feature complete. I sent an update to the EWG recently and the remaining work for this part of the project is error handling, logging, documentation, and upstream Shortbread work. Since I was away I haven’t had a chance to hear back yet.

The original project agreed with the OSMF was to include development of a style that can “show off what can be done with OpenStreetMap data” and deployment of both that style and Shortbread to OSMF hardware. Shortly after the contract was signed it was reduced in scope to meet budget requirements, cutting out both the style showing off what can be done with OSM data and the deployment. The plan was to extend the contract if the first part was successful and the OSMF was able to go ahead.

I proposed to the EWG that the contract be extended to cover the second part of the work. They discussed it while I was at the conferences. I think whether the work gets picked up depends mainly on funding availability.

Because deploying to production OSMF hardware is part of what was cut from phase 1, I’m not working on it right now. I have some flexibility to adapt what I am doing as long as I stay within the total budget, but that does not extend to bringing in work that was explicitly cut.

My hope is that by SOTM I will be able to demo a page showing minutely updated vector tiles hosted by the OSMF and possibly have submitted it to be a featured layer.

I will have a blog post coming out at some point with details of the recent work, and another with a summary of discussions I’ve had with others.

26 Likes

One question that it would be interesting to know the answer to - what is expected to be the schema used for the vector tiles that eventually appear on osm.org ?

I suspect that the answer might be both “something based on Shortbread” and “we don’t know for sure yet because the EWG and other interested parties need to discuss it” but it’d be interesting to hear what the plans are.

For the benefit of anyone who may not be aware, the “vector tile schema” is what determines what data is available for people to use in their own map styles. Too little, and people won’t have access to the data that they want**; too much, and the tiles would become unfeasibly large.

** to save anyone checking, busway is in the shortbread schema.

6 Likes

too much, and the tiles would become unfeasibly large

This seems to imply the existence of some features that make up a significant portion of all OSM data but at the same time are of no interest to map stylers.

I’m curious what those are. In my experience, the bulk of OSM data is a pretty small subset of features.

1 Like

There absolutely are features that some map styles don’t want, but alas some other map styles will want them. A decision about where to draw the line needs to be made.

That’s a good point. Let’s take a niche tag (for instance ski pistes :stuck_out_tongue_winking_eye:) that accounts for a small amount of data.
Do we have an idea of the strain imposed on the tool chain to include them?

With raster tiles, none (people just omit it from their database).

With vector tiles, there’s a storage space cost of including it “just in case” it is useful for someone, and also a difficult to quantify human cost of making the schema more complicated than it would otherwise be.

If you play around with something like tilemaker you can actually measure the first of those directly.

The first item on the EWG top 10 task list (Top Ten Tasks - OpenStreetMap Wiki) is ‘Localized map rendering’, which means multilingual display.

However, the current Shortbread schema only includes the

  • name_en,
  • name_de

labels.

I believe we should establish some guidelines for multilingual support now.

We also need a technical solution that ensures no language community is told, “Sorry, but your language is too small to have a map in your own language.”

https://wiki.openstreetmap.org/wiki/Map_internationalization

5 Likes

Further down the road one can imagine thematic layers of vector tiles to mix. But I guess I’d like ‘Openstreetmap vector tiles’ to be pretty heavy to begin with.
If it’s just a base layer with all languages, then this thread should better be addressed first.

2 Likes

For an excerpt of Canada that includes from roughly Seattle to a little northeast of Edmonton (so it has e.g. Whistler-Blackcomb and the ski resorts of the Canadian Rockies):

  • vanilla shortbread: 528,906,518 bytes
  • vanilla shortbread plus things with piste:type: 529,378,109 bytes

So ~0.1% larger.
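For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the two byte counts quoted above (the variable names are just illustrative):

```python
# Sizes quoted above for the Seattle-to-Edmonton excerpt.
base_bytes = 528_906_518         # vanilla shortbread
with_pistes_bytes = 529_378_109  # vanilla shortbread plus things with piste:type

extra_bytes = with_pistes_bytes - base_bytes
increase_pct = 100 * extra_bytes / base_bytes

print(f"{extra_bytes:,} extra bytes, {increase_pct:.2f}% larger")
# prints "471,591 extra bytes, 0.09% larger"
```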

Some features will be much more expensive to include, but ski pistes in particular seem likely to be cheap: there are few of them and they’re often relatively low resolution.

2 Likes

Assuming the second part is picked up by the OSMF, it will be Shortbread initially, followed by Street Spirit. The former guarantees a stable schema that can be used by others, while the latter will include only what is needed by its cartography.

There’s a few sources. A single attribute present on all features in a tile can increase size by 20% and will be worse in most real-world situations. Merging features is also impacted by additional attributes. For example a road split into two ways where the speed limit changes can be merged back into one linestring if you don’t care about the speed limits. If you start putting them into tiles you’re back to needing two linestrings. Merging features can cut the size of tiles in half.
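To make the merging point concrete, here is a rough Python sketch. It is not how Tilekiln or any other tile generator actually does it; `merge_ways` is a hypothetical helper that only joins ways that arrive in order and share an endpoint, and the coordinates and tag values are made up for illustration.

```python
def merge_ways(ways, keep_keys):
    """Join consecutive ways that touch end-to-start and whose kept
    attributes are identical. Hypothetical helper for illustration only."""
    merged = []
    for coords, tags in ways:
        # Only the attributes that go into the tile matter for merging.
        kept = {k: v for k, v in tags.items() if k in keep_keys}
        if merged:
            prev_coords, prev_tags = merged[-1]
            if prev_tags == kept and prev_coords[-1] == coords[0]:
                # Same attributes and shared endpoint: extend the linestring.
                merged[-1] = (prev_coords + list(coords[1:]), prev_tags)
                continue
        merged.append((list(coords), kept))
    return merged


# One road split into two ways where the speed limit changes.
ways = [
    ([(0, 0), (1, 0)], {"highway": "primary", "maxspeed": "50"}),
    ([(1, 0), (2, 0)], {"highway": "primary", "maxspeed": "80"}),
]

without_speed = merge_ways(ways, keep_keys={"highway"})
with_speed = merge_ways(ways, keep_keys={"highway", "maxspeed"})

print(len(without_speed))  # 1 linestring when maxspeed is dropped
print(len(with_speed))     # 2 linestrings when maxspeed is kept
```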

A high-cardinality attribute present on buildings would be the worst case. For example, ref:bag is present on many buildings in the Netherlands. Adding it to each building polygon would stop collection into multipolygons and then add an additional attribute on to one of the most numerous features.

This also shows how the total size added is not always a useful measure. If you had a similar tag in Luxembourg it would add almost nothing to the average tile size, but the experience for users in Luxembourg would be much worse.

This is more of a Shortbread question than an OSMF vector tiles one. Shortbread is considering how best to support multiple languages. I’m leaning towards name_XX for all supported languages. This lets styles choose how to manage fallbacks and do things like “Aachen (Aix-la-Chapelle)”. We also may leave picking what languages to support up to people generating tiles. Any style will have to fall back to the plain name attribute, so compatibility is still maintained.
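As a rough illustration of the fallback logic a style could apply, here is a Python sketch. It assumes the name_XX attribute naming discussed above, which is not settled, and `label` is a hypothetical function rather than part of any style or library.

```python
def label(props, lang):
    """Pick a label for a feature, falling back to the plain name.
    Assumes attributes are named name_XX, which is only a proposal."""
    name = props.get("name")
    localized = props.get(f"name_{lang}")
    if not localized:
        return name                   # no translation: plain name
    if not name or localized == name:
        return localized              # nothing extra to show
    return f"{name} ({localized})"    # e.g. local name plus requested language


props = {"name": "Aachen", "name_de": "Aachen", "name_fr": "Aix-la-Chapelle"}

print(label(props, "fr"))  # Aachen (Aix-la-Chapelle)
print(label(props, "de"))  # Aachen
print(label(props, "pl"))  # Aachen
```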

Figuring out what languages to support is tricky. There’s no programmatic way to know what languages are on a feature so a manually curated list is required. With Tilekiln’s templating this wouldn’t be too hard to do.

6 Likes

I’m exploring a solution whereby all languages are in the underlying tile storage, but when requested, there’s a querystring that gets passed to the tile server with a language list, and the tile is returned with other languages excluded.
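A minimal Python sketch of that idea, working on already-decoded feature attributes rather than real MVT bytes. The `lang` query parameter, the name_XX attribute naming and both function names are assumptions made for illustration, not an existing tile server API.

```python
from urllib.parse import parse_qs


def requested_languages(query_string):
    """Parse a hypothetical ?lang=de,fr parameter into a set of codes."""
    values = parse_qs(query_string).get("lang", [])
    return {code for value in values for code in value.split(",") if code}


def filter_names(props, languages):
    """Drop name_XX attributes for languages that were not requested.
    The plain name and all non-name attributes are always kept."""
    out = {}
    for key, value in props.items():
        if key.startswith("name_") and key[len("name_"):] not in languages:
            continue
        out[key] = value
    return out


stored = {
    "kind": "city",
    "name": "Aachen",
    "name_de": "Aachen",
    "name_fr": "Aix-la-Chapelle",
    "name_nl": "Aken",
}

langs = requested_languages("lang=de,fr")
served = filter_names(stored, langs)
print(sorted(served))  # ['kind', 'name', 'name_de', 'name_fr']
```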

5 Likes

I like numbers, so I looked at the storage space taken by niche tags in an OSM planet file.
Just to give an idea of the amount of initial data involved, I filtered a Planet.pbf that I keep up to date with osmium (84.4GB) for:

  • The 65 tags used by osm2pgsql “default style” => 77.5GB or 91.2%
    source
  • The 6746 unique tags used by projects registered at taginfo => 84.3GB or 99.97%
    source

But this is raw data, and as Paul explained above, this is a bit more complicated and doesn’t compress so well in tiles when you have lots of different tags on single objects.

This also illustrates the maintenance burden of the tag list involved, which goes from a no-brainer 65 carefully picked tags to a 100x increase.

In the EWG Top Ten Tasks, there’s another feasible task:

“The goal is to enable clickable POIs on the osm.org front page. Additional UI features, such as highlighting icons on hover, could significantly improve the user experience. Note: nowadays, Vector Tiles would likely cover these requirements more or less out of the box.”

For this, it will probably be necessary to extend the Shortbread POI layer with the osm_id:

1 Like

The issue is you don’t know what all languages are unless you pre-define it.

Yep. What I’m tentatively thinking of for cycle.travel (which uses its own schema) is defining region polygons where a language other than that in the name= tag is widely spoken. Then, for objects in those region polygons, outputting that name additionally.

Examples might be Catalonia and Wales, where I would want to output name:ca and name:cy respectively, as well as name:es and name:en. But I wouldn’t want to bloat the tile with all the various transliterations that might find their way into a city node.
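Sketched in Python, the region approach might look something like this. It is not the cycle.travel implementation; the rectangle is a placeholder and not a real boundary of Wales, and `REGIONS`, `point_in_polygon` and `names_to_output` are hypothetical names.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test for a point against a simple polygon ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# (extra name keys to output, region ring as (lon, lat) pairs).
# The ring is a crude placeholder rectangle, NOT a real boundary.
REGIONS = [
    (["name:cy", "name:en"],
     [(-5.4, 51.3), (-2.6, 51.3), (-2.6, 53.5), (-5.4, 53.5)]),
]


def names_to_output(lon, lat, tags):
    """Keep name, plus the extra name:xx tags configured for the region."""
    out = {"name": tags["name"]} if "name" in tags else {}
    for keys, polygon in REGIONS:
        if point_in_polygon(lon, lat, polygon):
            for key in keys:
                if key in tags:
                    out[key] = tags[key]
    return out


tags = {"name": "Cardiff", "name:cy": "Caerdydd", "name:en": "Cardiff",
        "name:ja": "カーディフ"}
in_region = names_to_output(-3.18, 51.48, tags)   # inside the placeholder box
out_region = names_to_output(-0.12, 51.50, tags)  # outside it
print(in_region)
print(out_region)
```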

No. Vector tiles already have the concept of an id field separate from the key/value attributes. Read the MVT spec.

1 Like

That’s the approach I use (actually with raster tiles): regions are processed based on language, and in my case the “name” tag might be set to (perhaps) name:cy if in a Welsh-speaking part of Wales. Currently I only do that at initial data load; doing it at minutely update would be an extra challenge.

A similar approach could result in names having a main “name” tag and only the name:xx tags relevant to that region (which is what would be needed here), but the challenge is that there are a lot of xx across OSM as a whole.

1 Like

You just include everything with name:*. In the OSM US tile server I’ve defined an exhaustive list of every language with usage above a non-trivial threshold because planetiler only takes a whitelist at this point. But that seems an artificial limit. Is there a constraint in the MVT spec that would prevent doing this by wildcard?

1 Like