Measuring Prefixation and Suffixation in the Languages of the World

Harald Hammarström

doi:10.18653/v1/2021.sigtyp-1.8

Abstract

It has long been recognized that suffixing is more common than prefixing in the languages of the world. More detailed statistics on this tendency are needed to sharpen the explanations proposed for it. The classic approach to gathering data on the prefix/suffix preference is for a human to read grammatical descriptions (948 languages), which is time-consuming and involves discretization judgments. In this paper we explore two machine-driven approaches to prefix and suffix statistics which are crude approximations, but have advantages in terms of time and replicability. The first simply searches a large collection of grammatical descriptions for occurrences of the terms ‘prefix’ and ‘suffix’ (4 287 languages). The second counts substrings from raw text data in a way that indirectly reflects prefixation and suffixation (1 030 languages, using New Testament translations). The three approaches largely agree in their measurements, but there are important theoretical and practical differences. In all measurements, there is an overall preference for suffixation, albeit only a slight one, at ratios ranging between 0.51 and 0.68.
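As a rough illustration of the first machine-driven approach, the following sketch counts mentions of the two terms in the text of one grammatical description and turns them into a ratio. It is not the paper's implementation: the function name `suffix_ratio`, the choice of word forms matched by the regular expression, and the definition of the ratio as suffix mentions divided by all prefix and suffix mentions are assumptions made here for illustration only.

```python
import re
from typing import Optional

# Assumption: which inflected/derived forms count as a mention is our own
# choice; the abstract only says the terms 'prefix' and 'suffix' are searched.
TERM_RE = re.compile(r"\b(prefix|suffix)(?:es|ed|ing|ation)?\b", re.IGNORECASE)


def suffix_ratio(text: str) -> Optional[float]:
    """Return suffix mentions / (prefix + suffix mentions), or None if neither occurs.

    Hypothetical helper: the ratio definition is an assumption, not taken
    from the paper.
    """
    counts = {"prefix": 0, "suffix": 0}
    for match in TERM_RE.finditer(text):
        counts[match.group(1).lower()] += 1
    total = counts["prefix"] + counts["suffix"]
    if total == 0:
        return None  # the description mentions neither term
    return counts["suffix"] / total


sample = "The suffix -s marks plural. Suffixes are common; one prefix exists."
ratio = suffix_ratio(sample)
```

Under this assumed definition, a value above 0.5 would indicate that suffix mentions outnumber prefix mentions in the description.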

Anthology ID: 2021.sigtyp-1.8
Volume: Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Month: June
Year: 2021
Address: Online
Editors: Ekaterina Vylomova, Elizabeth Salesky, Sabrina Mielke, Gabriella Lapesa, Ritesh Kumar, Harald Hammarström, Ivan Vulić, Anna Korhonen, Roi Reichart, Edoardo Maria Ponti, Ryan Cotterell
Venue: SIGTYP
SIG: SIGTYP
Publisher: Association for Computational Linguistics
Pages: 81–89
URL: https://aclanthology.org/2021.sigtyp-1.8
DOI: 10.18653/v1/2021.sigtyp-1.8
Cite (ACL): Harald Hammarström. 2021. Measuring Prefixation and Suffixation in the Languages of the World. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 81–89, Online. Association for Computational Linguistics.
Cite (Informal): Measuring Prefixation and Suffixation in the Languages of the World (Hammarström, SIGTYP 2021)
PDF: https://aclanthology.org/2021.sigtyp-1.8.pdf
