Proposed automated edit of phone numbers in the USA

I don’t really want to go ahead with this while there are dissenters, but those were exactly my two points.

  1. Anything starting with + is an odd form in the US.
  2. There is no standard (or alternatively, there are lots of standards).

Based on the straw poll, the format chosen is the most preferred/least disliked.

You’re both right, mostly. E.164 is a standard for the format of dialing sequences. E.123 is a standard for the format of written numbers in printed matter. In practice, phone=* has become a tagging scheme for both at the same time, hence the differing views on what should be the ideal.

To the extent that E.164 disallows punctuation, it’s because only numeric digits and pauses exist in an electronic signaling context. You do not have separate keys or rotary positions for dashes, spaces, and letters on your Bell landline phone like you do on your Smith-Corona typewriter. That sentence in the Wikipedia article is why we shunt phonewords over to phone:mnemonic=* instead of tagging phone=+1-710-555-BEEF.

If you want to rid phone=* of punctuation and make it fully machine-readable, so that software always has to format it before presenting it to the user, then just say so. That’s a perfectly reasonable stance without taking a standard out of its context. There’s even a standard that’s pretty close to what you’re suggesting, RFC 3966, for tel: URIs. But any such migration requires a discussion outside of the United States category and should not derail an edit proposal that maintains the current human-readable approach. Rest assured, the proposed edit doesn’t have to be the final word on phone numbers.
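As a rough illustration of what "fully machine-readable" could mean here, this Python sketch reduces a tagged value to a leading + and digits. The helper name `to_dial_string` is made up for this example, and refusing phonewords outright is an assumption, not something any standard or tool prescribes.

```python
import re

def to_dial_string(tagged: str) -> str:
    """Drop visual separators, keeping only a leading '+' and the digits."""
    tagged = tagged.strip()
    if re.search(r"[A-Za-z]", tagged):
        # Phonewords would need a keypad mapping; this sketch refuses them.
        raise ValueError(f"letters are not handled: {tagged!r}")
    prefix = "+" if tagged.startswith("+") else ""
    return prefix + re.sub(r"\D", "", tagged)

print(to_dial_string("+1-212-555-1212"))
# The simplest tel: URI form; RFC 3966 also permits some visual separators.
print("tel:" + to_dial_string("+1 212-555-1212"))
```

With storage like this, software would have to re-insert separators before showing the number to a person.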

I have a really hard time believing that people in the US broadly and significantly prefer “+1-212-555-1212” to “+1 212-555-1212”. I also expect that in the worldwide community, “+cc nnn nn nnnn” (however the spaces are supposed to be, for that particular cc) is preferred.

As for “Rest assured, the proposed edit doesn’t have to be the final word on phone numbers.” that’s sort of true in a theoretical sense, but this is going to bot-overwrite the judgement of humans – including edits that I have personally and carefully made – and I think it’s pretty likely that will never be undone.

So I’d like to ask, if one analyzes all telephone numbers in the entire OSM database:

  • What is the pattern of space vs hyphen vs nothing, between country code and phone number?
  • What is the pattern of space vs hyphen vs nothing, within phone numbers?
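One way to tally those two patterns, sketched in Python under some assumptions: the values have already been extracted from the database into a list, the caller supplies the country code to look for, and the function name `separators` is purely illustrative.

```python
import re
from collections import Counter

def separators(value: str, country_code: str):
    """Return (separator after the country code, separators within the rest).

    Returns None when the value does not start with '+<country_code>'.
    """
    value = value.strip()
    prefix = "+" + country_code
    if not value.startswith(prefix):
        return None
    rest = value[len(prefix):]
    # Whatever non-digit run directly follows the country code (may be empty).
    lead = re.match(r"\D*", rest)
    after_cc = lead.group(0)
    # Every non-digit run between the remaining digit groups.
    within = re.findall(r"\D+", rest[lead.end():])
    return after_cc, within

sample = ["+1 212-555-1212", "+1-212-555-1212", "+1 212 555 1212", "+12125551212"]
after_cc = Counter(repr(separators(v, "1")[0]) for v in sample)
within = Counter(sep for v in sample for sep in separators(v, "1")[1])
print(after_cc)
print(within)
```

Prefix matching is naive, so passing "4" would also match +49 numbers; a real analysis would need a proper list of country codes.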

Except (if I’m reading the proposal correctly), the bot edit will not change the numbers you have formatted because the only difference would be spacing characters:

It sounds like this bot edit would add the +1 and maybe change periods to hyphens, but won’t change spaces to hyphens or vice versa.

If this is really just going to add a “+1 “ if missing, and change periods to hyphens, and thus “+1 212 555 1212” is left alone, “212 555 1212” is changed to “+1 212 555 1212” and “212-555-1212” is changed to “+1 212-555-1212”, then I’m totally fine with this. I’m also ok with dropping parens around the area code, and replacing “(212) 555-1212” with “212-555-1212”.
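To make that reading concrete, here is a minimal Python sketch of the behaviour as described in this post. It is not the bot's actual code; the function name `normalize_us_phone` and the digit-count checks are assumptions for illustration.

```python
import re

def normalize_us_phone(value: str) -> str:
    """Add '+1 ' where missing and tidy periods and parens, keeping other separators."""
    v = value.strip()
    # "(212) 555-1212" -> "212-555-1212"
    v = re.sub(r"^\((\d{3})\)\s*", r"\1-", v)
    # Periods become hyphens; existing spaces and hyphens are left as they are.
    v = v.replace(".", "-")
    if v.startswith("+"):
        return v
    digits = re.sub(r"\D", "", v)
    if len(digits) == 10:
        return "+1 " + v
    if len(digits) == 11 and v.startswith("1"):
        # Already has the national "1" prefix; only the plus sign is missing.
        return "+" + v
    # Not recognisably a ten-digit NANP number: return the input untouched.
    return value

for raw in ["+1 212 555 1212", "212 555 1212", "212-555-1212", "(212) 555-1212"]:
    print(raw, "->", normalize_us_phone(raw))
```

The point of the sketch is that the separator style a mapper chose survives: only the missing prefix, periods, and parentheses are touched.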

I’m still not ok with adding “+1-”. Especially to “212 555 1212” which is already in the world-mainstream-space-separated style, conforming to E.123 as I read above. So I’d like to see: when making a change, the changed bits will incrementally conform to E.123 with spaces as separation. This seems like a really straightforward thing to do, and an approach that should be universally acceptable for representation in the db. I don’t understand the reluctance to do it that way.

(I do see a theoretical argument for E.164 and then presentation separately, but I think it’s far more mapper friendly to use the E.123 space-separated variant. Especially since I suspect even most people that mostly get this subject didn’t realize that E.164 doesn’t allow spaces and really they are writing E.123 when they thought they were writing E.164.)

More than a million elements worldwide have one or more hyphens in phone/contact:phone/fax/contact:fax=*. Unsurprisingly, these elements are most heavily concentrated in the NANP area, but there are also concentrations in several other countries.

The concentrations in Germany and Austria are due to the DIN 5008 convention of setting off direct inward dialing with a hyphen. (Think extensions, but without having to pause.) Unlike the hyphens in North American formats, E.123 explicitly forbids this convention in both national and international numbers, recommending a space if anything. However, the German-speaking community adopted this convention for practical reasons, initially on a worldwide basis, unaware of the incompatibility.

Bucketing phone numbers by country code is a bit tricky because of all the messy invalid phone numbers in the database, but this very conservative query finds that the NANP is not alone in frequently tagging hyphens:

| Country code | Region | Hyphen prevalence |
|---|---|---|
| +998 | Uzbekistan[1] | 97% |
| +58 | Venezuela | 89% |
| +1 | North America | 85% |
| +504 | Honduras | 64% |
| +501 | Belize | 63% |
| +54 | Argentina | 58% |
| +507 | Panama | 56% |
| +81 | Japan | 48% |
| +55 | Brazil | 46% |
| +60 | Malaysia | 45% |
| +380 | Ukraine[2] | 37% |
| +46 | Sweden | 34% |
| +977 | Nepal | 34% |
| +353 | Ireland | 33% |
| +91 | India | 29% |
| +972 | Israel and Palestine | 25% |
| +506 | Costa Rica | 22% |
| +40 | Romania[3] | 19% |
| +591 | Bolivia | 19% |
| +82 | South Korea | 19% |
| +375 | Belarus | 17% |
| +886 | Taiwan | 17% |
| +62 | Indonesia | 15% |
| +993 | Turkmenistan | 14% |
| +880 | Bangladesh | 14% |
| +976 | Mongolia | 14% |
| +373 | Moldova | 13% |
| +975 | Bhutan | 13% |
| +381 | Serbia | 13% |
| +855 | Cambodia | 11% |
| +43 | Austria | 11% |
| +503 | El Salvador | 10% |
| +92 | Pakistan | 10% |
| +960 | Maldives | 10% |
| +66 | Thailand | 9.4% |
| +505 | Nicaragua | 8.3% |
| +509 | Haiti | 8.2% |
| +852 | Hong Kong | 7.7% |
| +973 | Bahrain | 7.5% |
| +593 | Ecuador | 7.3% |
| +352 | Luxembourg | 6.8% |
| +20 | Egypt | 6.6% |
| +49 | Germany | 6.6% |
| +7 | Russia and Kazakhstan | 6.5% |
| +974 | Qatar | 6.3% |
| +592 | Guyana | 5.9% |
| +52 | Mexico | 5.8% |
| +502 | Guatemala | 5.3% |
| +387 | Bosnia and Herzegovina | 5.2% |
| +968 | Oman | 4.7% |
| +63 | Philippines | 4.7% |
| +998 | Uzbekistan[4] | 4.6% |
| +40 | Romania[5] | 4.3% |

Taken together with the previous query, this query shows a correlation between the countries that use hyphens anywhere and the countries that use hyphens directly after the country code:

| Country code | Region | Hyphen prevalence |
|---|---|---|
| +58 | Venezuela | 67% |
| +1 | North America | 49% |
| +60 | Malaysia | 39% |
| +501 | Belize | 36% |
| +81 | Japan | 34% |
| +353 | Ireland | 32% |
| +91 | India | 28% |
| +977 | Nepal | 27% |
| +591 | Bolivia | 19% |
| +46 | Sweden | 18% |
| +886 | Taiwan | 15% |
| +40 | Romania[6] | 15% |
| +972 | Israel and Palestine | 15% |
| +855 | Cambodia | 11% |
| +976 | Mongolia | 10% |
| +506 | Costa Rica | 10% |
| +993 | Turkmenistan | 8.3% |
| +975 | Bhutan | 8.2% |
| +852 | Hong Kong | 6.5% |
| +880 | Bangladesh | 6.3% |
| +62 | Indonesia | 5.9% |
| +504 | Honduras | 5.9% |
| +973 | Bahrain | 5.7% |
| +66 | Thailand | 5.1% |
| +92 | Pakistan | 4.9% |

The lower prevalence across the board is mainly because a relative handful of mappers have tried their darnedest to replace the hyphen with a space based on the style preferred by some validators. I’ve already pointed out that 6% more users prefer to use a hyphen than a space in this position, and that’s an undercount due to other mechanical edits, such as the one that migrates website=* from HTTP to HTTPS or the various AllThePlaces-related edits. As long as editors continue to take a hands-off approach, this is a losing battle. Consider this a generous upper bound to the degree of consistency we can ensure if we decide on a space here.


  1. When tagged as +99 (8…).

  2. When tagged as +38 (0…).

  3. When tagged as +4 0….

  4. When tagged as +998 ….

  5. When tagged as +40 ….

  6. When tagged as +4-0….

Hello!

We have, once again, generated a lot of discussion about “What is the best format for OSM?” and “let me tell you about the various formats” but gotten way way way away from “is this an acceptable edit”. Please take discussions about your preferred format to another thread. I presume there are many to choose from.

Once those threads have reached an agreement you are free to make a new thread “Proposed automated edit of phone numbers Globally” but we should not block improvement of the USA tagging any more.

I have not seen a specific critique of this edit proposal in some time and I would encourage the original poster to proceed.

I was out last night doing some on-the-ground surveying in Seattle, WA, US and took this picture because it reminded me of this thread :slight_smile: NNN-NNNN-NNN

I entered it as contact:phone=+1-400-108-0899 :smiling_face_with_horns: Let’s spend the next week debating contact: prefix use!

+1[ -]YYY[ -]XXX[ -]XXXX
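Read as a regular expression, and assuming Y and X each stand for any single digit, that pattern could be checked in Python roughly like this (the names `NANP_SHAPE` and `has_nanp_shape` are invented for the example):

```python
import re

# +1, then groups of 3, 3 and 4 digits, each preceded by a space or a hyphen.
NANP_SHAPE = re.compile(r"\+1[ -]\d{3}[ -]\d{3}[ -]\d{4}")

def has_nanp_shape(value: str) -> bool:
    """True when the whole value matches the +1[ -]YYY[ -]XXX[ -]XXXX shape."""
    return NANP_SHAPE.fullmatch(value) is not None

for candidate in ["+1 212-555-1212", "+1-212-555-1212", "+12125551212", "212-555-1212"]:
    print(candidate, has_nanp_shape(candidate))
```

This only tests the shape of the string. It says nothing about whether the area code is actually assigned.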

Is that actually their phone number? According to Wikipedia the 400 area code doesn’t exist…

400–499

Code Numbering plan area or use Date Notes
400 not in use; available for non-geographic assignment * easily recognizable code (ERC)

List of North American Numbering Plan area codes - Wikipedia

Wow, good catch. +86 400-1080-899 is the chain’s international toll-free number. They also post it on the wall at all their locations in China, also without the +86 country code:

If ever there were a need to tag international phone numbers with country codes! I guess they kept it in the national format on the logo for authenticity. Even then, toll-free numbers in China normally follow a 400-NNN-NNNN format that would be familiar in North America. Go figure.

Just commenting to note my support for this automated edit!

The first edit has been made, for Alabama, for just a single element.

Further edits are delayed, awaiting an import role on the bot account (which has been requested).

Although the other edits have not been made, they are not showing on the website, meaning that everything you see here are edits that will not be made by the bot.

Your bot can continue to operate; the fixups can be performed in relatively small changesets at a low rate.

The issue is that it was trying to upload more than 1000 changes at once and getting blocked. Adding workarounds for the rate limiting seems against the point.

Rate limiting in this case isn’t wrong: this kind of edit consists of many independent changes, so there is no need to make them atomically.

I put in some limits, and also now have the importer role.
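For anyone curious what such limits might look like, here is a generic Python sketch of splitting a list of independent changes into smaller uploads. It is not the bot's code: the `batches` helper and the batch size of 500 are assumptions chosen only to stay under the 1000-change threshold mentioned above.

```python
from typing import Iterator, List, Sequence

def batches(changes: Sequence[str], max_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most max_size changes."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(changes), max_size):
        yield list(changes[start:start + max_size])

# Hypothetical workload: 1052 independent tag changes, 500 per changeset.
pending = [f"change-{i}" for i in range(1052)]
for number, batch in enumerate(batches(pending, 500), start=1):
    print(f"changeset {number}: {len(batch)} changes")
```

Because each change is independent, a failed batch can simply be retried later without affecting the others.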

Today’s edits:

Arkansas (1 edit) or OSMCha

Arizona (1052 edits) or OSMCha

California (1093 edits) or OSMCha

Colorado (5 edits) or OSMCha

I am happy with these edits and so I will let the bot continue with the rest of the states from tomorrow, unless I hear any objections.
