The next Cardamom Seminar Series will feature Ekaterina Vylomova talking about "Documenting and modelling inflectional paradigms in under-resourced languages". Register on Eventbrite: https://www.eventbrite.ie/e/cardamom-seminar-series-tickets-241186263607
The seminar will take place on Zoom at 10am (Dublin time) on 1 February 2022.
This talk will present the UniMorph project, an attempt to create a universal (cross-lingual) annotation schema. UniMorph allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and a bundle of universal morphological features defined by the schema. Since 2016, the UniMorph database has been gradually developed and updated with new languages, and the SIGMORPHON shared tasks have served as a platform for comparing computational models of inflectional morphology. From 2016 to 2021, the shared tasks made it possible to explore the ability of data-driven systems to learn declension and conjugation paradigms and to evaluate how well they generalize across typologically diverse languages. This matters because developing formal techniques for cross-language generalization and for predicting universal entities across related languages should open up new possibilities for modelling and documenting under-resourced languages. The talk will outline the major challenges we faced while converting language-specific features into the UniMorph schema, especially for under-resourced languages. In addition, it will discuss typical errors made by the majority of the systems, e.g. incorrectly predicted instances due to allomorphy, form variation, misspelt words, and looping effects. Finally, it will provide case studies for Russian, Tibetan, and Nen.
The Cardamom project is funded by the Irish Research Council and supported by Science Foundation Ireland as part of the Insight SFI Centre for Data Analytics.