A Domain Independent Semantic Measure for Keyword Sense Disambiguation

written by María G. Buey, Carlos Bobed, Jorge Gracia, Eduardo Mena on 2021-03-23

Understanding the user's intention is crucial for many tasks that involve human-machine interaction. To that end, word sense disambiguation (WSD) techniques play an important role. WSD techniques typically require well-formed sentences as context to operate, as well as pre-defined catalogues of

Lemmatized English Word2Vec data

written by Christian Chiarcos, Tomas Mikolov et al. on 2021-01-07

This is a version of the original GoogleNews-vectors-negative300 Word2Vec embeddings for English. In addition, we provide the following modified files:
- converted to conventional CSV format (and gzipped)
- subclassified: for the most frequent 1.000.000 wo

Bilingual Lexicon Induction across Orthographically-distinct Under-Resourced Dravidian Languages

written by Bharathi Raja Chakravarthi, Navaneethan Rajasekaran, Mihael Arcan, Kevin McGuinness, Noel E. O'Connor, John P. McCrae on 2020-12-21

Bilingual lexicons are a vital tool for under-resourced languages and recent state-of-the-art approaches to this leverage pretrained monolingual word embeddings using supervised or semi-supervised approaches. However, these approaches require cross-lingual information such as seed dictionaries to tr

COST Action "European network for Web-centred linguistic data science" (NexusLinguarum)

written by Thierry Declerck, Jorge Gracia, John P. McCrae on 2020-12-21

We present the current state of the large "European network for Web-centred linguistic data science". In its first phase, the network has put in place several working groups to deal with specific topics. The network has also already implemented a first round of Short Term Scientific Missions (

English WordNet: A new open-source WordNet for English

written by John P. McCrae, Ewa Rudnicka, Francis Bond on 2020-12-21

No description

iLOD: InterPlanetary File System based Linked Open Data Cloud

written by Jamal A. Nasir, John P. McCrae on 2020-12-21

The proliferation of the World Wide Web and the Semantic Web applications has led to an increase in distributed services and datasets. This increase has put the infrastructural load in terms of availability, immutability, and security, and these challenges are being failed by the Linked Open Data (L

Overview of the track on Sentiment Analysis for Dravidian Languages in Code-Mixed Text

written by Bharathi Raja Chakravarthi, Ruba Priyadharshini, Vigneshwaran Muralidaran, Shardul Suryawanshi, Navya Jose, Elizabeth Sherly, John P. McCrae on 2020-12-21

Sentiment analysis of Dravidian languages has received attention in recent years. However, most social media text is code-mixed and there is no research available on sentiment analysis of code-mixed Dravidian languages. The Dravidian-CodeMix-FIRE 2020, a track on Sentiment Analysis for Dravidian Lan

CoNLL-RDF ontology

written by Christian on 2020-12-18

The CoNLL-RDF ontology provides machine-readable semantics for an inventory of CoNLL properties (and classes) for a growing collection of about two dozen CoNLL and related formats currently used in language technology.

Leveraging Linguistic Linked Data for Cross-Lingual Model Transfer in the Pharmaceutical Domain

written by Jorge Gracia, Christian Fäth, Matthias Hartung, Max Ionov, Julia Bosque-Gil, Susana Veríssimo, Christian Chiarcos, Matthias Orlikowski on 2020-12-15

We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, in order to project a deep lear

Mmorph_ttl

written by Stefania Racioppa, Thierry Declerck on 2020-09-01

This zip file contains the results of the conversion of Mmorph morphologies into the OntoLex-Lemon model, using the Turtle syntax as the serialization method. The content of the file is: 380.405 base forms and 2.534.735 fullforms, covering English, German, French, Spanish, Italian and Du