Hapax legomenon
A hapax legomenon is a word that occurs only once in a corpus of text. The plural is either hapax legomena or hapaxes. The term comes from Ancient Greek and means "(something) said only once".
In this context, a word that occurs twice is called a dis legomenon (/ˈdɪs/), one that occurs three times a tris legomenon (/ˈtrɪs/), and one that occurs four times a tetrakis legomenon (/ˈtɛtrəkɪs/).
Hapax legomena are quite common, as predicted by Zipf's law,[1] which states that the frequency of any word in a corpus is roughly inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the distinct words (counting by type, not by token) are hapax legomena, and another 10% to 15% are dis legomena.[2] In the Brown Corpus of American English, about half of the roughly 50,000 distinct words are hapax legomena within that corpus.[3]
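As an illustration of counting by type, the following is a minimal Python sketch that finds the words occurring exactly n times in a text and the share of word types that are hapax legomena. The function names (`tokenize`, `legomena`, `hapax_share`) and the sample sentence are made up for this example, and the tokenizer is a deliberately naive assumption (lowercase the text, then treat runs of letters and apostrophes as words). Real corpus studies use more careful tokenization, so their figures will differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (naive rule, assumed for this sketch)."""
    return re.findall(r"[a-z']+", text.lower())


def legomena(text, n=1):
    """Return the sorted word types that occur exactly n times in text.

    n=1 gives hapax legomena, n=2 dis legomena, n=3 tris legomena, and so on.
    """
    counts = Counter(tokenize(text))
    return sorted(word for word, count in counts.items() if count == n)


def hapax_share(text):
    """Return the fraction of distinct words (types) that occur only once."""
    counts = Counter(tokenize(text))
    if not counts:
        return 0.0  # empty text: no types, so no hapax legomena
    return sum(1 for count in counts.values() if count == 1) / len(counts)


# Hypothetical sample text, chosen only to keep the counts easy to check by hand.
sample = "the cat sat on the mat and the dog sat on the rug"

print(legomena(sample, 1))  # ['and', 'cat', 'dog', 'mat', 'rug']
print(legomena(sample, 2))  # ['on', 'sat']
print(hapax_share(sample))  # 0.625
```

In this tiny sample, five of the eight distinct words occur once. A text this short says nothing about the 40% to 60% range quoted above, which applies to large corpora.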
Note that the term hapax legomenon describes only how often a word appears in a body of text. It says nothing about the word's origin or about how often the word is used in speech. For this reason, it is different from a nonce word, which may never be recorded at all, may catch on and be widely recorded, or may appear several times in the work that coins it.
References
- Paul Baker, Andrew Hardie, and Tony McEnery, A Glossary of Corpus Linguistics, Edinburgh University Press, 2006, p. 81, ISBN 0-7486-2018-4.
- András Kornai, Mathematical Linguistics, Springer, 2008, p. 72, ISBN 1-84628-985-8.
- Kirsten Malmkjær, The Linguistics Encyclopedia, 2nd ed., Routledge, 2002, p. 87, ISBN 0-415-22210-9.