"Why OSM is in trouble"

Petsamo · September 13, 2019, 11:03pm

I recently came across an in-depth English-language article that I found very interesting. It wasn’t terribly useful in practice, but it got me thinking. I recommend it to anyone who has been editing OSM for a while.

The article covers OSM’s technical implementation and its future.
If you make it all the way to the end, do share your opinion.

Source: https://blog.emacsen.net/blog/2018/02/16/osm-is-in-trouble/

Why OpenStreetMap is in Serious Trouble

I was a contributor for OpenStreetMap for a long time, and I advocated for OpenStreetMap for a long time, but the project has stalled while the proprietary mapping world has continued to improve in data quality. For those of us who care about Free and Open data, this is a problem. In this article, I explore the reasons why I think OSM has stalled, as well as solutions to get the project back on track.

I was a contributor to the OpenStreetMap project from 2008 until roughly 2016. I was heavily invested in the project. In that time, I mapped, organized mapping groups in two cities, contributed to the founding of OpenStreetMap US, a non-profit dedicated to OpenStreetMap in the United States, gave talks on and about OpenStreetMap, contributed to the OpenStreetMap.org codebase, mentored two students for OSM through Google Summer of Code, started a working group dedicated to import of data into OpenStreetMap in the US, created and coordinated a massive bot run in the US, moderated the Reddit page r/OpenStreetMap and was a member of the OpenStreetMap Data Working Group, which gave me escalated privileges (both politically and technically) for the project. There are few parts of OpenStreetMap where I didn’t have some either direct or indirect involvement.

I’m also the author of an article called Why the World Needs OpenStreetMap, which appeared in two large publications, The Guardian Online and Gizmodo, was on the front page of Hacker News twice and translated into at least four different languages. I was a proud OpenStreetMap advocate!

Before I criticize the project, I want to state emphatically that I still believe wholeheartedly in the core principles of OpenStreetMap. We need a Free as in Freedom geographic dataset just as much today as we did in the past. When I wrote my article about OSM in 2012, self-driving cars and other services were still a dream. Today the importance of having a highly accurate, libre geographic dataset is more important than ever, and I support those working to make it happen.

That said, while I still believe in the goals of OpenStreetMap, I feel the OpenStreetMap project is currently unable to fulfill that mission due to poor technical decisions, poor political decisions, and a general malaise in the project. I’m going to outline in this article what I think OpenStreetMap has gotten wrong. It’s entirely possible that OSM will reform and address the impediments to its success- and I hope it does. We need a Free as in Freedom geographic dataset.

As long as this post is, it’s not a comprehensive list of all the problems with the project, only the ones I found most directly affect the project’s success and that I wasn’t able to address myself during my time on the project.

When the World Needs a Map, Give them a Database

The first problem that I feel plagues OSM is that the OpenStreetMap Foundation views the mission of the project to provide the world a geographic database, but not geographic services. OSM gives people the tools to create their own map rather than offering them a simple, out of the box solution. Providing the ability for individuals and organizations to make their own map may work well for some, but it discourages small and medium size organizations from using OSM and thus engaging with the project. And even if they do use our data, their engagement is through a third party, rather than directly with us.

When you go to OpenStreetMap.org, you see a map and a few extras, such as a search window, along with a few extra buttons such as “Log In” and “Edit.” It would be reasonable to assume that OpenStreetMap is a map, like Google Maps or other map projects, but while there is a map on OpenStreetMap.org, OpenStreetMap doesn’t want you to use it. Instead, they want you to use the information from OpenStreetMap to make your own map, or find someone else to make the map for you.

If you find this strange or confusing, you’re not alone.

A map is nothing more than a visualization of a collection of facts. We can understand this in terms of geometry. Let’s imagine a furniture store called Frita’s Furniture. Our map is a simple two-dimensional plane just like we had in geography class, and the store sits at 10, 10. We might also imagine a road called Main Street that runs from location 2, 9 all the way down to 15, 9.

The location of the store and the road are geographic facts, but if we wanted to represent this data visually, we’d typically use a map. We’d choose just how to draw the road. Would we use a simple line or a more road-like picture? How wide should the line be? What color would it be? Where would we put the name of the road? We could put it across the line, or alongside it, or some other way entirely. And do we want to represent the store as a dot, or an icon of a store?
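As a rough illustration of the split between facts and rendering, here is a minimal Python sketch. The two style dictionaries and the `render` helper are invented for this example; they are not part of any OSM tooling.

```python
# Geographic facts from the example: a store at a point, a road as a line.
FACTS = [
    {"kind": "store", "name": "Frita's Furniture", "coords": [(10, 10)]},
    {"kind": "road", "name": "Main Street", "coords": [(2, 9), (15, 9)]},
]

# Two made-up rendering styles (assumptions, not real OSM map styles).
PLAIN_STYLE = {"store": "dot", "road": "thin line"}
FANCY_STYLE = {"store": "shop icon", "road": "wide line with label"}


def render(facts, style):
    """Describe how each fact would be drawn under the given style."""
    return [f"{f['name']}: {style[f['kind']]} at {f['coords']}" for f in facts]


plain = render(FACTS, PLAIN_STYLE)
fancy = render(FACTS, FANCY_STYLE)
```

The facts are identical in both calls; only the style argument changes what the "map" would look like.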

Years ago, map makers would handle this process manually, but with computers, we generally call this process map rendering, and there are many decisions around the rendering of a map, such as the intended use of the map, local conventions and even just keeping within a given organization’s map style.

But most people don’t care about any of this. They just want a map. OSM has a map on its website but discourages its use by third parties. Instead, users are expected to either find a commercial service to render the map for them or else do it themselves.

The project leaders claim that this is because they want people using OpenStreetMap to understand the difference between the geographic data and its visual representation and to encourage a free market ecosystem of rendered map providers, but it’s also a fact that many of the individuals who push for this separation also sell commercial map services. I explore this conflict of interest later in this post.

Unclear Usage Policies

I mentioned earlier that OSM discouraged use of its maps on other websites. It does this through technically enforced usage policies. Understanding a usage policy is usually a straightforward process. An individual or organization gets permission to use a service a certain amount. We could imagine this being done by the number of map requests, or by bandwidth, etc. But OSM’s usage policy is entirely different. They allow but discourage the use of the free map and then disallow any single application that is using over 5% of the map bandwidth. This policy is bizarre on several levels.

To understand why this is so strange, we can use an analogy. Let’s imagine that I make ice cream. I put the recipe for my ice cream outside my house and suggest people make their own. I also offer free samples out of my home. Above my door, I hang a sign saying “Please don’t ask for free samples”. Then when people come in and ask for a sample, I give it to them. People may spread the word about my free ice cream and suggest their friends use it. Imagine we have a person named Fred who is a fan of my ice cream and recommends that all his friends go to my house for ice cream. I continue to dish out free ice cream to anyone who asks. But if one individual like Fred refers too many people to me, I will cut off access to everyone Fred sent.

Furthering this analogy, I will tell Fred’s friends that they’ve eaten too much free ice cream, instead of telling Fred. And how many people is too many people? The answer for OpenStreetMap is anything over five percent of the total amount of free ice cream that I’ve dished out that day. Fred has no idea how many people I’ve served, so the only thing he can do is ultimately not refer people to my house.

This analogy works because no single service knows what any other service is doing, so there’s no way to know how many map requests other applications have made. Also, since the top services and applications will change over time, you may be fine one day and in trouble the next. Again, there’s no way to know. And when you do cross the line of using the service too much, it is your users who get an unfriendly message about unavailability, not you.
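A small Python sketch makes the problem with a share-based limit concrete. The 5% threshold is the figure given in the text; the application names and request counts are invented, and `over_limit` is a hypothetical helper, not how the tile servers are actually implemented.

```python
def over_limit(app, requests_by_app, max_share=0.05):
    """True if `app` accounts for more than `max_share` of all requests."""
    total = sum(requests_by_app.values())
    return requests_by_app[app] / total > max_share


# "fred_app" sends exactly the same traffic on both days.
# Only the traffic of the other (invented) applications changes.
monday = {"fred_app": 400, "big_app_a": 6000, "big_app_b": 5000}
tuesday = {"fred_app": 400, "big_app_a": 3000, "big_app_b": 2500, "small": 100}

fred_blocked_monday = over_limit("fred_app", monday)
fred_blocked_tuesday = over_limit("fred_app", tuesday)
```

With these numbers the application is under the limit on Monday (400 of 11,400 requests) and over it on Tuesday (400 of 6,000), even though its own behavior did not change and it has no access to the totals.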

OSM could create a standard usage policy, spelling out exactly how much free usage is allowed. It could also choose to create a “premium membership” and encourage people to use its tile (rendered map) service, but right now, using OSM tiles without going through a third party is hard.

A Bad Geocoder

When you type an address into a map and it gives you the location, that is called Geocoding. When your GPS or phone knows where you are and gives you a building or street address, that is called Reverse Geocoding. The geocoder featured on OpenStreetMap.org is called Nominatim, and it’s awful.

Nominatim is not the only OSM geocoder. Much like the map rendering, it is possible to write your own or use a commercial geocoding service. But Nominatim is the most popular geocoder available for OpenStreetMap: it’s used on the website, and it is the service listed on OpenStreetMap.org under its APIs.

In its defense, let me say that Geocoding is hard and Nominatim itself is quite complex; it’s almost a feat of engineering. The developers who work on it put enormous effort into writing Nominatim. The problem is that such software needs to be maintained or sometimes replaced entirely to be useful. While there have been Nominatim maintainers, it’s not been given the time or attention that it needs and deserves.

To understand why Nominatim is bad, one has to understand how most people use a geocoder. They’re most often looking up a business or something vague like “Staples downtown Springfield”. That simple three-word query is asking quite a bit. It is asking the computer to know what Springfield is and to limit the query to that. It’s then asking to limit it to (or near!) an area called “downtown”, and finally, it’s limiting results to Staples.

But Nominatim can’t handle such queries. It can barely handle simple address queries, such as “123 Main Street”. As an example, if I typed an address into Nominatim near my location in New York City, it might come up with a result in Iowa, which for whatever reason, it’s more inclined to offer me.

If I try to specify my location as “Manhattan”, as of the time of this writing Nominatim will first assume that I mean Manhattan, Kansas, ignoring both the prominence of Manhattan in New York and the fact that the query itself is originating from New York City. Worse still, it’s not possible to search for intersections. If I type “53rd and 6th, New York City” into Nominatim, it doesn’t understand it. Even if I try to refine the search as “53rd Street and 6th Avenue, New York City”, it doesn’t work. Intersections are not addresses to Nominatim. It doesn’t understand stores, or “near” or categories such as “restaurant”. The results it comes up with are often irrelevant, and the service is quite slow.

While other geocoders for OSM exist, such as Pelias and Photon, only Nominatim is run and supported by the OSM Foundation.

No moderation/review model

One of the most significant technical problems with OSM is the lack of a review model, that is for a change to the map to be staged and then reviewed before being applied. Not having this functionality caused ripples of problems throughout the system, some of which I’ll discuss here.

New mapper problem

Editing on OSM can be challenging for a beginner, and as the project tried to attract new mappers (editor contributors), we would run into people who just mapped incorrectly. Unfortunately, because OSM’s data model doesn’t include a review stage, bad edits are committed to the map and often left undiscovered, or even if they’re removed, the original editor doesn’t usually see why.

Having the ability for a mapper to contribute changes and then have those changes be reviewed would have potentially left the map with higher quality data and a sort of mentorship model between new contributors and more experienced editors.

I hoped that this would be improved when I mentored a feature placed into OpenStreetMap called “Changeset Comments”, in which users could leave feedback for one another’s changes. Unfortunately, this ended up not being something many people used constructively, and it was a mess.

Without Moderation, Bots are Hard

Bots could be very useful in OSM in finding mistakes caused either by inaccurate data sources or by editing blunders. For example, if there was a road named “Main Street” connected to another road called “Main Stret,” it was likely a spelling error and should be corrected. But it would be a good thing if changes were reviewed by a human being first.

But since OSM doesn’t provide any mechanism for reviewed edits, these kinds of suggested changes don’t exist. Either bot edits are executed without oversight, which could lead to errors, or they aren’t done at all and the project misses out.
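To make the idea of a "suggested change" concrete, here is a minimal Python sketch of a bot that only proposes fixes and never applies them. It uses the standard library’s `difflib.SequenceMatcher`; the function name, the 0.9 similarity threshold and the sample names are assumptions made for illustration, and no such review queue exists in OSM itself.

```python
from difflib import SequenceMatcher
from itertools import combinations


def suggest_name_fixes(connected_road_names, threshold=0.9):
    """Return (name_a, name_b, similarity) for names that look like typos.

    Nothing is edited here; the pairs are meant for a human reviewer.
    """
    suggestions = []
    for a, b in combinations(sorted(set(connected_road_names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            suggestions.append((a, b, round(ratio, 2)))
    return suggestions


# Names of roads that connect to each other (invented sample data).
queue = suggest_name_fixes(["Main Street", "Main Stret", "Oak Avenue"])
```

Here `queue` holds a single candidate pair, "Main Street" and "Main Stret", which a reviewer could accept or reject. A string-similarity check alone cannot tell which spelling is the correct one, which is exactly why a human step is useful.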

Imports are Difficult

Imports are challenging for a variety of reasons, but having a moderation or review model would make things much easier by allowing changes to be staged. The inability to stage and review massive changes to the map has caused problems in the past, and many bad imports go unnoticed. If the project instead required that a human review edits before they are committed to the system, bad imports could be detected before they cause problems.

Due to the lack of staging inside OSM itself, staging systems have been written for other OSM related projects. These systems often require editors to then manually place those changes in OSM. This process is labor intensive and makes some imports so challenging that they die before they begin.

Vandalism is hard to manage

Like Wikipedia, OSM has people who purposefully vandalize the project. Vandals have a variety of motivations. Some vandals are your run of the mill Internet trolls who enjoy causing problems. Sometimes a company wants something on the map despite community consensus against it and changes the map to suit its needs, even if the project as a whole is against it. Sometimes mappers use OSM to make political statements such as in the case of disputed territories, and sometimes a geospatially based game, such as Pokemon Go, will use OSM to generate its data and players find that they can change the map to gain an advantage. Whatever the reason, OSM has vandals.

Vandalism is difficult to deal with in OSM because without a moderation system, it has to be “cleaned up” rather than prevented in the first place. Detecting vandalism is difficult. Several people ran monitoring tools to try to find problematic edits and imports. I was one of those people.

Even if problematic edits are detected, removing them means making even more edits. The history of the project is littered with lots of small changes that are only there to remove some previous change. Worse still, if the vandalism isn’t detected early then someone else might modify an object that was previously vandalized, creating a situation in which either a tool or a person would have to separate the good edits from the bad, a manual process that can be labor intensive.

A moderation tool would prevent much of this. Many vandals would find that their work would not get into the database and would move on. While some malicious edits would still get through, we would be able to address a majority before it became a problem.

External Tools are Hard

One of my contributions to OpenStreetMap was working to improve MapRoulette, a tool which helps find problems in OSM and offers users the opportunity to fix them. One feature that we wanted in MapRoulette was to be able to present users with simple “Yes/No” type questions. Unfortunately, while not impossible, this would have been a complicated task in OpenStreetMap. If these edits could have gone to a moderation queue, we could have been more confident, and possibly not needed MapRoulette at all in some cases.

Many developers wanted to solve this same problem, offering the ability to add helpful but anonymous edits to the project. But since OpenStreetMap requires every edit be committed by an individual user, rather than a company or bot account, the barrier of entry for casual mappers was often too high.

OSM’s Lack of Layers

Most geographic databases use a layered approach to represent different features. One layer may represent political boundaries; another may represent the road network, a third may represent water features, and so on.

Instead of the traditional layers, OSM chooses to use a single layer and then tags (key/value pairs) on individual objects. At first this seems like a good idea, but ultimately it ends up creating a huge mess.

Tools are Harder to Write

Imagine if we wanted to write an editor for OSM that only worked with the road network. This task would seem straightforward in that we would just need to extract features that are tagged as roads, such as highway=*. Unfortunately, it’s not that simple.

First, an editor that edits these road features must not only pick up the roads (ways in OSM terminology) but also the points (nodes) that make up that road. Secondly, if the road is particularly complicated, it may be represented as a relation. Editing this way is time consuming but straightforward.

What is not as straightforward is that a feature such as a road may also be playing double-duty as another feature, such as a political boundary, as may any of its associated features. Editing roads may inadvertently result in changing a political boundary.

Having map features that represent such radically different meaning puts an onus on both tool makers and the individual editor working on OSM to be aware of any changes they make possibly having consequences that go beyond what they think they’re doing.
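The following Python sketch shows the kind of extra check a roads-only tool ends up needing. The input is a simplified stand-in for OSM data, not a real file format or API: each way is a dict, and the `boundary_relations` field (IDs of boundary relations the way is a member of) is a hypothetical convenience assumed to have been filled in beforehand.

```python
def split_roads(ways):
    """Split highway=* ways into plain roads and roads that double as boundaries."""
    plain, shared = [], []
    for way in ways:
        if "highway" not in way["tags"]:
            continue  # not part of the road network
        is_boundary = "boundary" in way["tags"] or bool(way["boundary_relations"])
        (shared if is_boundary else plain).append(way["id"])
    return plain, shared


# Invented sample data.
ways = [
    {"id": 1, "tags": {"highway": "residential"}, "boundary_relations": []},
    {"id": 2, "tags": {"highway": "tertiary"}, "boundary_relations": [9001]},
    {"id": 3, "tags": {"highway": "track", "boundary": "administrative"},
     "boundary_relations": []},
    {"id": 4, "tags": {"waterway": "river"}, "boundary_relations": [9001]},
]

plain, shared = split_roads(ways)
```

In this sample only way 1 is safe to treat as "just a road"; moving ways 2 or 3 would also move a boundary. With separate layers, the filter on `highway` alone would have been enough.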

Imports Are Difficult Without Layers

One of the keys of the Free and Open Source software movements has been code reuse, the idea that you can integrate software together from different sources and have it work seamlessly together. One would think that it would be much the same with geographic data, but because of the lack of layers, it’s very challenging to import data into OSM.

Without layers, it’s difficult to extract a specific region by feature and analyze or replace that. Instead, because of its complex tagging system, it needs to be analyzed as a whole. Imports are possible but made more difficult without layers to make the job of data analysis by isolation easier.

No Support For Observational, or Other Datasets

One of the core tenets of OpenStreetMap is that it only stores persistent, personally verifiable data. The only exceptions to this are political boundaries- and even these exceptions can be problematic. Unfortunately, this also presents a problem when third parties want to use OSM for things outside of the project scope.

As an example, let’s take Pokemon Go. Pokemon Go is an augmented reality game in which real-life features are connected with imaginary creatures which the player must battle and collect. How often and where these creatures appear is based on various map features.

Pokemon Go players wanted to use OSM to document the location of creatures to make it easier for other players to find rare creatures and improve their collection. OSM disallows this kind of data in the same way that it might for bird watchers- while it’s interesting, the impermanence of this data made it a poor candidate for the project and thus the data would be immediately removed.

But it doesn’t have to be something as trivial as a game- layers could also allow other specialized data such as potholes, red light cameras or even bird or animal sightings. It would make the project useful to many more people.

Lack of Permanent IDs

In any database, objects have an ID field, usually a numeric value to look the record up by. OSM is no different and every object inside OSM has an ID field. Unfortunately, in OSM the ID fields represent the low-level objects rather than any high-level concept. This creates a huge problem. I will call this idea a “Conceptual Object”, and show how the lack of permanent IDs for them is problematic.

Diving Deeper

To understand why the lack of permanent IDs is a problem, we have to dive a little deeper into how OSM works. While many of these low-level details are beyond the scope of this article, I will present the basics of how OSM stores information. A point in OSM is called a node, and every node has an ID. Points may be collected into a line, and that line is called a way, and collections of nodes and ways may be combined into a more complex object called a relation. A relation may also contain other relations. Nodes, ways, and relations all have ID fields.

To illustrate this, let’s think of a building. That building has properties: it’s at a certain location, it has a specific size and shape, and it has an address. If it’s large enough, it may have multiple addresses. But the concept of the building is unified. In OSM, that building could be represented by a single node, representing the address. Or a building may be represented by a way of the building outline, or a building may be represented by a relation, encompassing details of the various building elevations, levels, and roof types. The problem is that if I’m doing a lookup, there is no straightforward way to ask about the building. Instead, I will have to look at aspects of the building, such as its address, or its location.
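A minimal Python sketch of this situation follows. The `Element` class is a deliberately simplified model of nodes, ways and relations, not the real OSM schema, and the IDs and address are invented. The tag keys used (`addr:housenumber`, `addr:street`, `building`) are common OSM tagging conventions.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str   # "node", "way", or "relation"
    id: int     # each kind has its own ID space
    tags: dict = field(default_factory=dict)
    refs: list = field(default_factory=list)  # node IDs (way) or members (relation)


def find_by_address(elements, housenumber, street):
    """Return (kind, id) for every element tagged with the given address."""
    return [
        (e.kind, e.id)
        for e in elements
        if e.tags.get("addr:housenumber") == housenumber
        and e.tags.get("addr:street") == street
    ]


# One real-world building, represented in three different ways.
address = {"addr:housenumber": "123", "addr:street": "Main Street"}
elements = [
    Element("node", 101, dict(address)),
    Element("way", 202, {"building": "yes", **address}, refs=[1, 2, 3, 4, 1]),
    Element("relation", 303, {"type": "building", **address},
            refs=[("way", 202, "outline")]),
]

hits = find_by_address(elements, "123", "Main Street")
```

The lookup returns three unrelated IDs, one per element kind, and nothing in the data says that they describe the same building. The only handle on "the building" is an attribute such as its address.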

History Lost

As strange as this may seem, it’s entirely possible in OpenStreetMap to take a node from one side of the world, move it to the other side of the world and use it for something else entirely. For example, it is technically possible to take a part of a house, move it to another continent and use it as part of a road. While this is highly unusual, this is not disallowed. If I look at the history of the node, I will see it move. While this may keep the history of the element, it does not keep the conceptual history of an object.

For example, if we start with a representation of an object being the building as a single node, then move to a complex relation, that won’t be reflected in the object history, and thus the changes over time are lost.

Permanent IDs on conceptual objects could help with this by providing a history of what the data represents rather than just the data itself.

Import Conflation

Among other problems, not having a permanent ID for a conceptual object in OSM makes it challenging to conflate objects in OSM with objects in other datasets. For example, if we’re given a building database from a local government, each building in that dataset will have an ID. We will want to compare that ID to our existing objects. Unfortunately, to do that, we’re left with two choices- either we create a new identifier (key) in which to do the conflation, or we have to use the second dataset’s ID inside OSM- neither of which is an optimal solution.
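As a sketch of the second option, the Python below matches buildings by an external ID stored in a tag. The tag key `ref:city_buildings`, the IDs and the `conflate` helper are all hypothetical and chosen only for this example.

```python
def conflate(osm_buildings, external_ids, ref_key="ref:city_buildings"):
    """Match external building IDs to OSM elements via an ID stored in a tag.

    Returns (matched, unmatched): a dict of external ID -> OSM element ID,
    and a sorted list of external IDs with no counterpart in OSM.
    """
    by_ref = {b["tags"][ref_key]: b for b in osm_buildings if ref_key in b["tags"]}
    matched = {ext: by_ref[ext]["id"] for ext in external_ids if ext in by_ref}
    unmatched = sorted(ext for ext in external_ids if ext not in by_ref)
    return matched, unmatched


# Invented sample data.
osm_buildings = [
    {"id": 202, "tags": {"building": "yes", "ref:city_buildings": "B-17"}},
    {"id": 203, "tags": {"building": "yes"}},
]
matched, unmatched = conflate(osm_buildings, ["B-17", "B-18"])
```

The match only holds as long as every editor preserves the tag. If a mapper replaces way 202 with a new element, or deletes the tag, the link to the government dataset is silently lost.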

It’s Hard to Build Connections to Other Datasets

Many people have envisioned projects that connect to OSM to offer reviews or other data associated with OSM, but without a permanent ID, this is not practical. Objects in OpenStreetMap may contain some data such as cuisine type or opening hours along with the name and address, but the review site will need to be able to have a permanent link to objects in OSM, which it can’t currently.

No Standards in Data Representation

In OpenStreetMap, there are no formal standards in the project for the representation of features on the map. As an example, let’s take the example of a sidewalk. Sidewalks are useful things to have on a map because they tell us if the road is pedestrian friendly. Sometimes sidewalks are represented by an attribute on the road itself. Sometimes sidewalks are represented as a line (way) that runs parallel to the road. Sometimes those ways have the name of the street as their own name, and sometimes they don’t have any name at all.

If you are a mapper, this is confusing, since there’s no one standard way to map things. If you’re trying to build tools to work with OpenStreetMap, the lack of standardization of data across the project makes it challenging to work with as a whole.

There is an informal process for data representation, mainly done on the Wiki, but because this isn’t formally enforced, and changing data en masse may be considered a form of vandalism, data consumers are forced to write tools that accept many representations of the same data.
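For the sidewalk case, a data consumer might end up with something like the Python sketch below. It checks the `sidewalk=*` attribute on the road first and then falls back to separately mapped ways tagged `highway=footway` with `footway=sidewalk`. The function and the idea that `nearby_ways` has already been selected by some spatial lookup are assumptions for illustration, and the value sets are not an exhaustive list of what appears in the data.

```python
def has_sidewalk(road, nearby_ways):
    """Best-effort sidewalk check across two common mapping styles.

    Returns True, False, or None when the data does not say either way.
    """
    value = road["tags"].get("sidewalk")
    if value in {"both", "left", "right", "yes"}:
        return True
    if value in {"no", "none"}:
        return False
    # Fall back to sidewalks drawn as their own parallel ways.
    for way in nearby_ways:
        tags = way["tags"]
        if tags.get("highway") == "footway" and tags.get("footway") == "sidewalk":
            return True
    return None


tagged = has_sidewalk({"tags": {"highway": "residential", "sidewalk": "both"}}, [])
separate = has_sidewalk(
    {"tags": {"highway": "residential"}},
    [{"tags": {"highway": "footway", "footway": "sidewalk"}}],
)
unknown = has_sidewalk({"tags": {"highway": "residential"}}, [])
```

Every additional representation in use means another branch in code like this, in every tool that consumes the data.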

The APIs are Slow to Evolve

As of writing, the current official OpenStreetMap API is 0.6. The API version hasn’t changed since 2009. While a stable API can be a good thing for a mature software project, in the case of OpenStreetMap, this is more a reflection of poor project management.

An API is part of a protocol that either allows a client to talk to a server or for servers to communicate with each other. In this case, we’re talking about OpenStreetMap’s editing API which is used between OSM and editing software.

The OpenStreetMap editing API is very powerful and complete, but it has some design choices that made sense in 2009 that have largely been replaced by better technical options in the nine years since. These include small changes, such as the data serialization format, as well as more significant changes such as the internal data representation.

As an example, in 2012 there were several proposals made to create a new datatype called an area that would greatly simplify the representation of certain types of geographic features. Despite this and the offer of technical help, the project has not made any significant progress on this or other important technical issues.

OSM has Hidden Gatekeepers

Dovetailing on the previous section, we have to ask why the project has not made more technical progress, and the answer is that sadly the keys to the OSM castle largely do not lie in the hands of the OpenStreetMap Foundation, but instead in the hands of one or two individuals who act as gatekeepers to the project’s source code and infrastructure.

While it’s not uncommon for a Free Software or Open Source project to have a “Benevolent Dictator for Life”, these roles are often replaced by a more formal structure as the needs of the project grow. In the case of OpenStreetMap, there is a formal entity which owns the data, called the OpenStreetMap Foundation. But at the same time, the ultimate choices for the website, the geographic database and the infrastructure are not under the direct control of the Foundation, but instead rest largely on one individual, who (while personally friendly) ranges from skeptical to openly hostile to change.

As a former professional system administrator, I relate strongly to these types of individuals. At the same time, their desires need to be balanced against the overall needs of the project to make progress and keep momentum to keep its userbase happy and engaged.

That is not the case here, and it’s to the detriment of the project.

The OpenStreetMap Foundation Culture

It would be easy to think about the OpenStreetMap Foundation (the OSMF) as similar to the Wikimedia Foundation, but aside from the high-level view of being the holder of Free Data, the two projects are managed radically differently.

The Wikimedia Foundation is a multi-million dollar organization that manages not only Wikipedia but other projects as well, such as the lesser known Wikidata and Wikinews. These projects aid in the organization’s broad mission to provide high-quality information to the world. To serve this mission, the foundation spends a great deal of money on its infrastructure as well as directing and funding development of new tools for the community to use.

OpenStreetMap, on the other hand, relies primarily on donated hosting services and runs on a shoestring budget. It has no paid employees and does not fund or direct the development of its software base.

This has led to some organizations trying to take up the mantle and improve the situation, including an organization that I helped found called OpenStreetMap US, which is a US based non-profit organization focused on promoting OSM in the United States. Among our goals for the organization was to fill in the gaps of development and mapping resources left by the OSMF, which we partially succeeded in doing, but because of the fragmentation of organizations, we were less successful than we hoped.

In addition to OpenStreetMap US and other “chapters” around the world, there is the Humanitarian OpenStreetMap Team, whose mission it is to help promote OSM in developing nations and rally the OSM community during humanitarian crises. There is no reason that HOT needed to be an independent organization other than the unwillingness by the OSMF to expand its role. Even Steve Coast, one of the founders of OpenStreetMap saw and tried to address this problem with his organization, “Map Club.”

The obvious question is why the OpenStreetMap leadership takes the positions that it does, despite the clear need for change. The answers in my view are commercialism in the project, along with a cultural desire to retain the feel of the project’s early days.

While there are companies built around Wikipedia’s engine (MediaWiki), there are not many companies making money from repackaging Wikipedia. OpenStreetMap, on the other hand, has a commercial ecosystem around it, largely from the business of creating customized maps for customers.

Many of the founders of the project, as well as others, have launched commercial services around OSM. Unfortunately, this creates an incentive to keep the project small and limited in scope so that the gap can be filled by commercial services which they can sell. This also applies to HOT, which has a financial incentive to get grant money for itself and not have those resources going to the OSMF.

In addition to these conflicts of interest is a desire to keep the project small in scope by senior members of the community who see the project as being about people and the mapping hobby and want to avoid imports or other activity that could be seen as removing the human factor from the project. They also see the dangers inherent in creating an organizational structure that demands money and fear it would create a perpetual cycle of needing to find donors simply to support a management layer.

I disagree and view the lack of a more active structure by the OSMF as the cause of both the project’s stagnation and the significant commercial influence on it.

The World Has Changed

When OSM was launched, governments did not release their data under free licenses. They only began doing so because OSM exists now as competition. Yet due to the problems I’ve outlined, OSM imports are difficult, and updating imports once they are in OSM is nearly impossible. This is a critical problem for the project.

Similarly, when OSM was launched, drones were not cheap and available. AI wasn’t able to do good visual detection of roads, and flying cars were still science fiction. Now all of these tools exist, and yet OSM is still stuck largely editing by hand.

If OSM relies exclusively on manual labor and remains unable to work with other datasets, its data quality will continue to decline and the project will ultimately stagnate and fail.

Just the Roadblocks

It may appear at first that this article is a comprehensive list of everything I find wrong with OSM. It’s not. There are many more concerns I have about the project, but I’ve limited my article to the scope of concerns I have that I feel are stopping the entire project from progressing. There will be time to fix the small issues if (and only if) the project as a whole succeeds. If it doesn’t, then the small nit-picky problems are going to be irrelevant anyway.

It’s my sincere hope that this article will be a call to action for OSM. There are many brilliant and inspiring individuals in the project. If I’m allowed a pun, I hope OSM will once again find its way.

Posted by Serge Wroclawski Fri 16 February 2018