Recent Developments for the Linguistic Linked Open Data Infrastructure

written by Thierry Declerck, John McCrae, Matthias Hartung, Jorge Gracia, Christian Chiarcos, Elena Montiel, Philipp Cimiano, Artem Revenko, Roser Sauri, Deirdre Lee, Stefania Racioppa, Jamal Nasir, Matthias Orlikowski, Marta Lanau-Coronas, Christian Fäth, Mariano Rico, Mohammad Fazleh Elahi, Maria Khvalchik, Meritxell Gonzalez, Katharine Cooney on 2020-04-01

In this paper we describe the contributions made by the European H2020 project “Prêt-à-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) to the further development of the Linguistic Linked Open Data (LLOD) infrastructure.

Evaluating the Impact of Bilingual Lexical Resources on Cross-lingual Sentiment Projection in the Pharmaceutical Domain

written by Matthias Hartung, Matthias Orlikowski, Susana Veríssimo on 2020-03-12

Rolling out text analytics applications or individual components thereof to multiple input languages of interest requires scalable workflows and architectures that do not rely on manual annotation efforts or language-specific re-engineering per target language. These scalability challenges aggravate …
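
As a rough illustration of the underlying idea, the following Python sketch projects polarity labels from a source-language sentiment lexicon onto target-language words through a bilingual dictionary. The toy entries and the averaging strategy are assumptions made for illustration, not the paper's actual pipeline.

```python
# Minimal sketch of lexicon-based cross-lingual sentiment projection.
# The lexicon entries and the bilingual dictionary are illustrative
# placeholders, not data from the paper.

# Source-language (English) sentiment lexicon: word -> polarity score.
en_sentiment = {"effective": 1.0, "adverse": -1.0, "tolerable": 0.5}

# Bilingual dictionary: English word -> set of German translations.
en_de = {
    "effective": {"wirksam", "effektiv"},
    "adverse": {"unerwünscht", "nachteilig"},
    "tolerable": {"verträglich"},
}

def project_sentiment(sentiment, bilingual):
    """Transfer each source word's polarity to all of its translations,
    averaging when several source words map to the same target word."""
    scores, counts = {}, {}
    for word, polarity in sentiment.items():
        for translation in bilingual.get(word, ()):
            scores[translation] = scores.get(translation, 0.0) + polarity
            counts[translation] = counts.get(translation, 0) + 1
    return {w: scores[w] / counts[w] for w in scores}

de_sentiment = project_sentiment(en_sentiment, en_de)
print(de_sentiment)  # e.g. {'wirksam': 1.0, 'unerwünscht': -1.0, ...}
```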

Linguistic Linked Open Data for All

written by John P. McCrae, Thierry Declerck on 2020-01-14

In this paper we briefly describe the European H2020 project "Prêt-à-LLOD" ('Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors'). This project aims to increase the uptake of language technologies by exploiting the combination of linked data and language technologies …

Creation and Enrichment of a Terminological Knowledge Graph in the Legal Domain

written by Patricia Martín-Chozas on 2020-01-02

This Doctoral Consortium paper presents a methodology to automate the creation of rich terminologies from plain text documents, by establishing links to external resources and by adopting the W3C standards for the Semantic Web. The proposed method comprises six tasks: refinement, disambiguation, …
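
One step such a pipeline might include is exposing an extracted term as a SKOS concept and linking it to an external resource. The rdflib sketch below shows this shape; the term, namespaces, and external URI are hypothetical placeholders, not the thesis's actual data or method.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

# Placeholder namespace for the terminology being built.
EX = Namespace("http://example.org/terminology/")

g = Graph()
g.bind("skos", SKOS)

concept = EX["contract"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("contract", lang="en")))
# Link the term to an external knowledge base; the target URI is a
# hypothetical placeholder, not a real resource.
g.add((concept, SKOS.exactMatch,
       URIRef("http://example.org/external-kb/contract")))

print(g.serialize(format="turtle"))
```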

Automatic Detection of Language and Annotation Model Information in CoNLL Corpora

written by Abromeit, Frank, Chiarcos, Christian on 2019-11-27

We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish …
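
The core of tagset detection can be pictured as matching the tags observed in a CoNLL column against known tag inventories. The sketch below illustrates that idea; the two sample inventories and the column index are illustrative assumptions, not AnnoHub's actual registry or logic.

```python
# Guess the annotation scheme of a CoNLL-style corpus by scoring the
# tags found in one column against known tagset inventories.

KNOWN_TAGSETS = {
    "UD-UPOS": {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "PUNCT"},
    "Penn-PTB": {"NN", "NNS", "VB", "VBD", "JJ", "RB", "DT", "IN"},
}

def tags_in_column(conll_text, column=3):
    """Extract the set of tags appearing in a given column (0-based)."""
    tags = set()
    for line in conll_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) > column:
            tags.add(fields[column])
    return tags

def guess_tagset(tags):
    """Rank known tagsets by the fraction of observed tags they cover."""
    def coverage(inventory):
        return len(tags & inventory) / len(tags) if tags else 0.0
    return max(KNOWN_TAGSETS, key=lambda name: coverage(KNOWN_TAGSETS[name]))

sample = "1\tDogs\tdog\tNOUN\n2\tbark\tbark\tVERB\n"
print(guess_tagset(tags_in_column(sample)))  # -> "UD-UPOS"
```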

Results of the Translation Inference Across Dictionaries 2019 Shared Task

written by Jorge Gracia, Besim Kabashi, Ilan Kernerman, Marta Lanau-Coronas, Dorielle Lonke on 2019-11-27

The objective of the Translation Inference Across Dictionaries (TIAD) shared task is to explore and compare methods and techniques that infer translations indirectly between language pairs, based on other bilingual/multilingual lexicographic resources. In its second (2019) edition, the participating systems …
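
The baseline idea behind the task can be sketched as dictionary composition through a pivot language: if a source word translates to a pivot word, and that pivot word translates to a target word, the source–target pair is proposed as a candidate. The toy dictionaries below are invented for illustration; real systems additionally have to filter out the wrong candidates that pivot polysemy introduces.

```python
# Pivot-based translation inference: compose two bilingual dictionaries
# through a shared pivot language to propose candidates for an unseen pair.

en_es = {"dog": {"perro"}, "cat": {"gato"}}     # English -> Spanish (pivot)
es_fr = {"perro": {"chien"}, "gato": {"chat"}}  # Spanish -> French

def infer_via_pivot(src_pivot, pivot_tgt):
    """Propose source->target translations by chaining through the pivot."""
    inferred = {}
    for src, pivots in src_pivot.items():
        targets = set()
        for p in pivots:
            targets |= pivot_tgt.get(p, set())
        if targets:
            inferred[src] = targets
    return inferred

print(infer_via_pivot(en_es, es_fr))  # {'dog': {'chien'}, 'cat': {'chat'}}
```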

Translation Inference through Multi-lingual Word Embedding Similarity

written by Donandt, Kathrin, Chiarcos, Christian on 2019-11-27

This paper describes our contribution to the Shared Task on Translation Inference across Dictionaries (TIAD-2019). In our approach, we construct a multilingual word embedding space by projecting new languages into the feature space of a language for which a pretrained embedding model exists. …
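
Once all languages are projected into one embedding space, translation candidates can be retrieved by nearest-neighbour search under cosine similarity. The sketch below shows only that retrieval step, with random stand-in vectors; it is not the authors' projection method itself.

```python
# Nearest-neighbour translation retrieval in a shared embedding space.
# The vectors are random stand-ins, not real multilingual embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim = 50
src_vec = rng.normal(size=dim)                       # embedding of a source word
tgt_vocab = ["chien", "chat", "maison"]
tgt_matrix = rng.normal(size=(len(tgt_vocab), dim))  # target-word embeddings

def cosine_nearest(vec, matrix, vocab):
    """Return the vocabulary item whose embedding is most similar to vec."""
    sims = matrix @ vec / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec))
    return vocab[int(np.argmax(sims))]

print(cosine_nearest(src_vec, tgt_matrix, tgt_vocab))
```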

Validating the OntoLex-lemon lexicography module with K Dictionaries' multilingual data

written by Julia Bosque-Gil, Dorielle Lonke, Jorge Gracia, Ilan Kernerman on 2019-11-27

The OntoLex-lemon model has gradually acquired the status of a de facto standard for the representation of lexical information according to the principles of Linked Data (LD). Exposing the content of lexicographic resources as LD brings benefits for their easier sharing, discovery, reusability, and …
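
For readers unfamiliar with the model, the following rdflib sketch builds a minimal OntoLex-lemon lexical entry with a canonical form and a sense reference. The namespace and entry are placeholders, not K Dictionaries' data, and the lexicography (lexicog) module that the paper validates adds further structure not shown here.

```python
# A minimal OntoLex-lemon lexical entry expressed as RDF with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
EX = Namespace("http://example.org/lexicon/")  # placeholder namespace

g = Graph()
g.bind("ontolex", ONTOLEX)

entry = EX["bank-n"]
form = EX["bank-n-form"]
sense = EX["bank-n-sense1"]

g.add((entry, RDF.type, ONTOLEX.LexicalEntry))
g.add((entry, ONTOLEX.canonicalForm, form))
g.add((form, RDF.type, ONTOLEX.Form))
g.add((form, ONTOLEX.writtenRep, Literal("bank", lang="en")))
g.add((entry, ONTOLEX.sense, sense))
g.add((sense, ONTOLEX.reference,
       URIRef("http://dbpedia.org/resource/Bank")))

print(g.serialize(format="turtle"))
```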

Challenges for the Representation of Morphology in Ontology Lexicons

written by Bettina Klimek, John P. McCrae, Maxim Ionov, James K. Tauber, Christian Chiarcos, Julia Bosque-Gil, Paul Buitelaar on 2019-10-25

Recent years have seen a growing trend in the publication of language resources as Linguistic Linked Data (LLD) to enhance their discovery and reuse and the interoperability of tools that consume language data. To this aim, the OntoLex-lemon model has emerged as a de facto standard to represent …

A Character-Level LSTM Network Model for Tokenizing the Old Irish text of the Würzburg Glosses on the Pauline Epistles

written by Adrian Doyle, John P. McCrae, Clodagh Downey on 2019-08-28

This paper examines difficulties inherent in the tokenization of Early Irish texts and demonstrates that a neural-network-based approach may provide a viable solution for historical texts which contain unconventional spacing and spelling anomalies. Guidelines for tokenizing Old Irish text are presented.
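
Framed as per-character boundary classification, such a tokenizer can be sketched in a few lines of PyTorch. The architecture below, including the bidirectional LSTM and its hyperparameters, is an illustrative assumption rather than the paper's exact model.

```python
# Character-level tokenization as per-character boundary classification:
# an LSTM reads a line character by character and predicts, for each
# position, whether a token boundary follows it.
import torch
import torch.nn as nn

class CharTokenizer(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        # One logit per character: 1 = a token boundary follows here.
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, char_ids):            # (batch, seq_len)
        h, _ = self.lstm(self.embed(char_ids))
        return self.out(h).squeeze(-1)      # (batch, seq_len) logits

model = CharTokenizer(vocab_size=100)
logits = model(torch.randint(0, 100, (1, 40)))  # one 40-character line
boundaries = torch.sigmoid(logits) > 0.5        # predicted split points
print(boundaries.shape)  # torch.Size([1, 40])
```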