Dragonfly: Advances in Non-Speaker Annotation for Low Resource Languages
Cash Costello, Shelby Anderson, Caitlyn Bishop, James Mayfield, Paul McNamee
Abstract
Dragonfly is an open source software tool that supports annotation of text in a low resource language by non-speakers of the language. Using semantic and contextual information, non-speakers of a language familiar with the Latin script can produce high quality named entity annotations to support construction of a name tagger. We describe a procedure for annotating low resource languages using Dragonfly that others can use, which we developed based on our experience annotating data in more than ten languages. We also present performance comparisons between models trained on native speaker and non-speaker annotations.
- Anthology ID: 2020.lrec-1.862
- Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month: May
- Year: 2020
- Address: Marseille, France
- Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association
- Pages: 6983–6987
- Language: English
- URL: https://aclanthology.org/2020.lrec-1.862
- Cite (ACL): Cash Costello, Shelby Anderson, Caitlyn Bishop, James Mayfield, and Paul McNamee. 2020. Dragonfly: Advances in Non-Speaker Annotation for Low Resource Languages. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6983–6987, Marseille, France. European Language Resources Association.
- Cite (Informal): Dragonfly: Advances in Non-Speaker Annotation for Low Resource Languages (Costello et al., LREC 2020)
- PDF: https://aclanthology.org/2020.lrec-1.862.pdf
Export citation
@inproceedings{costello-etal-2020-dragonfly,
    title = "{D}ragonfly: Advances in Non-Speaker Annotation for Low Resource Languages",
    author = "Costello, Cash and Anderson, Shelby and Bishop, Caitlyn and Mayfield, James and McNamee, Paul",
    editor = "Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.lrec-1.862",
    pages = "6983--6987",
    abstract = "Dragonfly is an open source software tool that supports annotation of text in a low resource language by non-speakers of the language. Using semantic and contextual information, non-speakers of a language familiar with the Latin script can produce high quality named entity annotations to support construction of a name tagger. We describe a procedure for annotating low resource languages using Dragonfly that others can use, which we developed based on our experience annotating data in more than ten languages. We also present performance comparisons between models trained on native speaker and non-speaker annotations.",
    language = "English",
    ISBN = "979-10-95546-34-4",
}