I am once again foolishly attempting to standardize the formatting of a tag. I presume there’s been some discussion about which is preferable etc.
Standard formatting makes it easier to discover actual data entry issues and direct folks more cleanly to things needing human intervention and investigation (phone numbers missing digits, etc).
I have a set of validator rules that standardize to the all dashes form but that just happened to be the most common in the dataset I first pulled (King County WA I think?). I am happy to modify them to whatever the consensus is.
I will not be attempting to solve the way folks tag extensions at this time.
Just for fun, here’s a list of the various formats that are most common in my experience:
I personally like +1.213.456.7890 which I started using when I was working with customers in a few countries around the world and it seemed to be the one format that everyone could use without asking me what it meant.
However it seems that the wiki slightly prefers spaces for separators so I use +1 213 456 7890 when entering phone tag values. Given the expectations of US mappers I doubt that very many are in that format. The only time I correct other mappers is when they leave off the international dialing code (“+”) and the country code (“1”).
By the way, the “1” code is actually for all countries participating in the North American Numbering Plan which includes Canada and some Caribbean countries.
Here’s the output of my validator rules when run on KY. Note that the first one detects all combinations of space and dash as delimiter where the format still isn’t +1-NNN-NNN-NNNN. (ex: +1-NNN-NNN NNN).
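For anyone who wants to see the idea of that first check outside of JOSM, here is a rough Python sketch. It is not a copy of the actual validator rules (those aren't reproduced in this post); the function names and the 555 sample values are made up for illustration. It only flags values whose digit grouping is already right but where a space is used for at least one delimiter.

```python
import re

# Target form discussed in this thread: +1-NNN-NNN-NNNN
CANONICAL = re.compile(r"\+1-\d{3}-\d{3}-\d{4}")
# Same grouping, but each delimiter may be a space or a dash
MIXED = re.compile(r"\+1[- ]\d{3}[- ]\d{3}[- ]\d{4}")


def has_mixed_delimiters(value: str) -> bool:
    """True when the grouping is correct but some delimiter is a space."""
    return bool(MIXED.fullmatch(value)) and not CANONICAL.fullmatch(value)


def fix_mixed_delimiters(value: str) -> str:
    """Swap spaces for dashes, but only for values this check flagged."""
    return value.replace(" ", "-") if has_mixed_delimiters(value) else value


if __name__ == "__main__":
    for sample in ["+1-502-555 0123", "+1 502 555 0123", "+1-502-555-0123"]:
        print(sample, "->", fix_mixed_delimiters(sample))
```

A value like +1-NNN-NNN NNN (one digit short) is deliberately not matched here, so it would be left for a human to look at.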
If you take a look at the 1399 phone= tags in Kentucky (picking a random state I haven’t touched), the most common format is all dashes, at 292. The second most common is all dashes but missing the +1, followed in third by (NNN) NNN-NNNN. All dots is extremely uncommon as it turns out.
This is pretty typical of my experience: all dashes is the most common format (a plurality, not a majority) for phone numbers in the US.
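If you want to reproduce this kind of tally on your own extract, a small sketch like the following would do it. It assumes you already have the phone= values as a list of strings; the `shape` and `tally_shapes` names and the sample numbers are hypothetical, not taken from my rules.

```python
import re
from collections import Counter


def shape(value: str) -> str:
    """Reduce a phone value to its format, e.g. '+1-NNN-NNN-NNNN'."""
    value = value.strip()
    # Keep a leading +1 literal so its presence shows up in the tally
    prefix = "+1" if value.startswith("+1") else ""
    return prefix + re.sub(r"\d", "N", value[len(prefix):])


def tally_shapes(values):
    """Count how often each format occurs."""
    return Counter(shape(v) for v in values)


if __name__ == "__main__":
    # Made-up sample values, not real data
    sample = ["+1-502-555-0123", "502-555-0145", "(502) 555-0167", "+1-859-555-0189"]
    for fmt, count in tally_shapes(sample).most_common():
        print(count, fmt)
```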
And to illustrate what we get for having one format… Once the rules are applied as outlined here, it becomes super easy to see the residual numbers that need non-robot attention:
My vote is for this format. I think it’s both recognizable to Americans who don’t often encounter the +1, while making the “core” of the number obvious. I have a JOSM validation rule that changes every other format you have listed to it, although I think it leaves +1-NXX-NXX-NXXX and +1 NXX NXX-NXXX alone. Very easy to change obviously if there is a different consensus.
Just as a hint - there is an ITU-T E.164 phone number standard. It does not say a lot about formatting but just as another datapoint for the discussion.
E.164 is about the digits, not the number’s presentation. E.123 is the ITU standard for formatting numbers; however, it gives considerable leeway to regional norms, particularly the North American practice of grouping digits with hyphens. (The hyphens conflict with the DIN 5008 standard, which is for German phone numbers in German-language publications. Bizarrely, parts of the wiki call for conformance to DIN 5008 in every language and region.)
A couple years ago, @Kovoschiz sent me down a rabbit hole investigating the correct format for phone and fax numbers in the NANP region. The short story is that there isn’t a single standard, because OSM is trying to have it both ways: most consumers of phone numbers either strive for human readability, in which case consistency is a matter of house style, or machine readability, in which case there are no delimiters. But OSM tries to strike a compromise between both formats, probably because editors and data consumers haven’t historically done much to format the numbers themselves.
In the real world, people rarely specify the +1 calling code, because the area code is already part of an international scheme. The 1 mostly appears as part of a toll-free number, such as a 1-800 number, in which case most people habitually include the hyphen.
Wikivoyage: +1 YYY XXX‑XXXX or +1 YYY‑XXX‑XXXX (if the NPA requires ten-digit dialing) or +1‑YYY‑XXX‑XXXX (if the area requires 11-digit dialing, or for a toll-free number)
If I’m reading this code correctly, it’ll flag any phone number that contains an extension. This is the other rabbit hole I went down. There are many styles for indicating an extension; I added a section on extensions to the wiki post above.
Yeah, it’ll throw a warning, but there’s no autofix for that one so presumably you’d take a look and leave it alone. Didn’t want to make that regular expression even more complicated
(this is now phone validator chat digression)
I wanted to see what the delta is between your much more compact rule and my own method, mostly to make sure I wasn’t botching anything too badly. I downloaded all of the numbers in Tennessee and did two validation runs, once with @whammo’s rules and once using the methodology above.
The only big difference is that my method has an “aggressive” phase that strips out all the common delimiters and then tries to format a number from what is left. This means a number like +(901) 249-0549 will eventually get to a format that is “correct”. A small difference is that I flag numbers that contain [A-Z] and have a formatter that converts those to digits while moving the original number to phone:mnemonic.
You may consider adjusting your fixup to not include numbers that have a “1” in places that make it an invalid phone number, but that’s also an easy check to write on its own.
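To make the aggressive phase concrete, here is a minimal Python sketch of the idea. It is an approximation and not my actual rule set: the `normalize` function and `KEYPAD` table are names I made up for this post, and it assumes the +1-NNN-NNN-NNNN target form. Values with extensions are not handled; they simply fail the length check and come back untouched for manual review.

```python
import re

# Standard telephone keypad letters -> digits (ABC=2 ... WXYZ=9)
KEYPAD = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "22233344455566677778889999",
)


def normalize(value: str):
    """Return (phone, mnemonic); both are None if no NANP number is recovered.

    mnemonic holds the original value when it contained letters,
    mirroring the idea of moving it to phone:mnemonic.
    """
    had_letters = re.search(r"[A-Za-z]", value) is not None
    # Convert letters to keypad digits, then drop every delimiter
    digits = re.sub(r"\D", "", value.upper().translate(KEYPAD))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the country code
    if len(digits) != 10:
        return None, None  # leave for a human to look at
    phone = f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return phone, (value if had_letters else None)


if __name__ == "__main__":
    print(normalize("+(901) 249-0549"))
    print(normalize("1-800-FLOWERS"))
```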
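That check could look roughly like this in Python. It relies on the NANP rule that neither the area code nor the exchange (central office code) can start with 0 or 1. The `is_plausible_nanp` name is hypothetical, and the sketch assumes the value has already been normalized to the all-dashes form.

```python
import re

# Area code and exchange are both NXX, where N is 2-9 and X is 0-9
NANP = re.compile(r"\+1-[2-9]\d{2}-[2-9]\d{2}-\d{4}")


def is_plausible_nanp(value: str) -> bool:
    """True if a +1-NNN-NNN-NNNN value passes the leading-digit rule."""
    return NANP.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["+1-502-555-0123", "+1-152-555-0123", "+1-502-155-0123"]:
        print(sample, is_plausible_nanp(sample))
```

This only catches structurally impossible numbers; it says nothing about whether an area code is actually assigned.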
Extensions are a massive annoyance and I’d love for it to just be " ext. "
Here are the residual phone numbers for both methods.
I was curious if there was a “definitely avoid” formatting that some very common software stack would NOT recognize as a phone number. I added a couple to the wiki post. Feel free to add as you see fit.
For humor:
ChatGPT: 15/18 - says 15 of them are phone numbers when asked how many are phone numbers
WolframAlpha: 0/18 - when asked “is +1 213 4567890 a valid phone number in the united states?” etc
Until I researched this topic, I never quite appreciated the reasons for all these different formats. Clearly, some variants like 000.000.0000 and 000/000-0000 are purely stylistic. But other variants communicate subtle nuances about local dialing procedures, at least according to some style guides:
| Format | Dialing procedure | Trunk prefix | Area code |
|---|---|---|---|
| 1 000 000-0000, 1 (000) 000-0000, (1-000) 000-0000 | 7-digit dialing | Optional | Optional |
| 1 000-000-0000, (+1) 000-000-0000 | 10-digit dialing | Optional | Required |
| 1-000-000-0000 | 11-digit dialing | Required | Required |
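As a rough illustration of how a formatter could encode these nuances, here is a small Python sketch. The `format_for_dialing` function is hypothetical, it picks just one variant per row of the table above, and it takes the dialing procedure as an argument instead of looking it up per NPA.

```python
def format_for_dialing(digits: str, procedure: int) -> str:
    """Format a 10-digit NANP number for 7-, 10-, or 11-digit dialing areas."""
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected exactly 10 digits")
    npa, nxx, line = digits[:3], digits[3:6], digits[6:]
    if procedure == 7:
        # Trunk prefix and area code are both optional when dialing
        return f"1 ({npa}) {nxx}-{line}"
    if procedure == 10:
        # Area code is required, trunk prefix is optional
        return f"1 {npa}-{nxx}-{line}"
    if procedure == 11:
        # Trunk prefix and area code are both required
        return f"1-{npa}-{nxx}-{line}"
    raise ValueError("procedure must be 7, 10, or 11")


if __name__ == "__main__":
    for procedure in (7, 10, 11):
        print(procedure, format_for_dialing("5025550123", procedure))
```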
I suppose these nuances can be the responsibility of a software library that includes a table of dialing procedures by NPA. But that raises the question of why we’re even bothering to format these tags for human consumption. Omitting any punctuation would be less error-prone and more compatible with the tel: URI format that smartphones actually use to place calls. Another consideration is that NPAs sometimes adopt ten-digit dialing; either we’d have to make a bulk edit, or every software application would need to update its table of dialing procedures.
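For comparison, turning any of the human-readable variants into a tel: URI is nearly trivial, which is part of why the punctuation question feels academic from a data consumer's side. A minimal sketch, assuming a NANP number without an extension (the `to_tel_uri` name is made up here):

```python
import re


def to_tel_uri(value: str) -> str:
    """Build a tel: URI in global (+1...) form from a NANP phone value."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 10:
        digits = "1" + digits  # add the country code if it was left off
    if len(digits) != 11 or not digits.startswith("1"):
        raise ValueError(f"not a NANP number: {value!r}")
    return f"tel:+{digits}"


if __name__ == "__main__":
    for sample in ["+1-213-456-7890", "(213) 456-7890", "+1 213 456 7890"]:
        print(sample, "->", to_tel_uri(sample))
```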
Osmose has this pin for ‘phone number not in the expected format’ (in Italy). I don’t know if it has some sort of AI routine that learns the most common formats for numbers of different digit counts, with the city 0 prefix (which has to be dialed even from abroad) or without it for mobile numbers. For the US/Canada, I can only remember it complaining if the international +1 prefix is missing.