Data: singular or plural?

What do folk think about the word “data”: is it singular or plural?

These data were imported and are now stored on the uMap server.

or

This data was imported and is now stored on the uMap server.

  • In English, “data” is singular
  • In English, “data” are plural

(I’m doing some translation of uMap documentation into English.)

“Data” is very often treated as a mass noun (uncountable noun), which calls for singular grammatical forms even when it refers to lots and lots of stuff. This is especially true in technology and geography, the fields most relevant to OSM, where “datum” has a different meaning than “data”.

If you want to refer to data in general, use singular. If you want to refer to a single point of data – a datapoint – use singular. If you want to refer to a single set of datapoints – a dataset – use singular.

And now you have not one but two alternatives that skirt the issue using collective nouns. I hope you find these informations helpful.

It’s an interesting case as noted before. The word itself is plural, but is commonly used like a singular.

Compare: “This datum has been uploaded” and “This data has been uploaded”.
“Datum” is rarely used nowadays; “data point” is more common for a single, self-contained piece of information (also an interesting construction).

In that sense, it’s similar to information which has no grammatical plural, and signifies an unspecified amount. So data is generally being used as an uncountable noun - what’s special is that it’s actually derived from a plural instead of a singular.

You basically said what I wanted to say. Interesting that you used the plural of information, even though I mostly encounter it as an uncountable noun :slight_smile:

You need to change the wording on the vote. Strictly speaking data is plural but few people stick to that when using it in writing (or conversation). So the question should not be whether it is plural or singular. Instead you should ask what “style guide” we should adopt.

You could avoid the whole issue and use the word dataset instead: “This dataset was imported and is now stored on the uMap server.” That covers the case where a single table was uploaded. I’m not sure if uMap can upload multiple files in one go, but if it can, then: “These datasets…”

Both are valid forms, and the people that prefer data in the plural are monsters.

I’d suggest that both are correct in most cases, but whichever is more correct depends on context. If I was talking about lots of pieces of data from lots of different sources, or just one dataset, I’d use the singular form; if stressing that several different datasets were being used I might use the plural.

In general (British) English the singular form is more common, but often organisations have a house style, like the ONS (plural) or the Graun (singular).

It makes a change from discussing tracktype, I guess :smiley:

I have always understood data as plural because in reality data always come en masse. A dataset (which is singular) usually contains many data. It is very much the same in German, where data come as “Daten”, which is 100% plural, while a “Datensatz” (containing lots of “Daten”) is singular.

There is no commonly used term in German for a “single data unit”. In fact, the singular form of “Daten” is “Datum”, which is primarily used for the calendar date.

Coming back to English, there is also “datum” for a single value. And then there is “date” for the calendar date, which is also the word you need to arrange a date, for instance.

My vote for plural 100% :smiling_face_with_sunglasses:.

As a native English speaker I can tell you that (for me) “these data were imported” just sounds wrong.

“These datasets were imported” is fine though.

So IMO “data” is singular but “datasets” is plural.

Sure, datasets is plural whereas dataset is singular. As the word says, a dataset contains a set of data. This alone should imply that data is plural, because if it were singular the dataset would contain only a single data unit.

If that is correct, which plural form of data would you use to make clear that the dataset contains many of them and not only a single one?

I always use data singular, as a mass noun, comparable to water.

Data is stored in containers (files, databases) and flows (or is sucked or pumped) from storage to storage.

If it’s about the actual pieces of information, in Dutch I use the Dutch word “gegevens” (“Daten” in German). In English this happens to be “data”, and in this case… I use it as a plural noun. Or, to emphasize it’s about the information, I would say details. But most of the time, data is just data (s.).

In Welsh it’s treated as plural, but for English I’m not sure. Singular sounds more natural.

I assume the original intention of Latin speakers was to distinguish between one fact and several. As peoples other than the Romans adopted the concept, they sometimes kept and sometimes changed the word. Usage and the need for precision will determine the choice of form.

p.

Also, for translation I always treat it as plural, because it’s usually used when multiple values are referenced.
I hate English for the cases where, even if you know it perfectly, the exact same phrase can have multiple meanings, and you can’t tell which one applies unless you ask the writer what they meant. I face that issue sometimes during translations.