written by
María G. Buey, Carlos Bobed, Jorge Gracia, Eduardo Mena
on 2021-03-23
Understanding the user's intention is crucial for many tasks that involve human-machine interaction. To that end, word sense disambiguation (WSD) techniques play an important role. WSD techniques typically require well-formed sentences as context to operate, as well as pre-defined catalogues of
written by
Christian Chiarcos, Tomas Mikolov et al.
on 2021-01-07
# Lemmatized English Word2Vec data
This is a version of the original GoogleNews-vectors-negative300 Word2Vec embeddings for English.
In addition, we provide the following modified files:
- converted to conventional CSV format (and gzipped)
- subclassified:
for the most frequent 1.000.000 wo
written by
Bharathi Raja Chakravarthi, Navaneethan Rajasekaran, Mihael Arcan, Kevin McGuinness, Noel E. O'Connor, John P. McCrae
on 2020-12-21
Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to building them leverage pretrained monolingual word embeddings in supervised or semi-supervised settings. However, these approaches require cross-lingual information such as seed dictionaries to tr
written by
Thierry Declerck, Jorge Gracia, John P. McCrae
on 2020-12-21
We present the current state of the large "European network for Web-centred linguistic data science". In its first phase, the network has put in place several working groups to deal with specific topics. The network has also already implemented a first round of Short Term Scientific Missions (
written by
John P. McCrae, Ewa Rudnicka, Francis Bond
on 2020-12-21
No description
written by
Jamal A. Nasir, John P. McCrae
on 2020-12-21
The proliferation of the World Wide Web and of Semantic Web applications has led to an increase in distributed services and datasets. This increase has added infrastructural load in terms of availability, immutability, and security, and these challenges are not being met by the Linked Open Data (L
written by
Bharathi Raja Chakravarthi, Ruba Priyadharshini, Vigneshwaran Muralidaran, Shardul Suryawanshi, Navya Jose, Elizabeth Sherly, John P. McCrae
on 2020-12-21
Sentiment analysis of Dravidian languages has received attention in recent years. However, most social media text is code-mixed and there is no research available on sentiment analysis of code-mixed Dravidian languages. The Dravidian-CodeMix-FIRE 2020, a track on Sentiment Analysis for Dravidian Lan
written by
Christian
on 2020-12-18
The CoNLL-RDF ontology provides machine-readable semantics for an inventory of CoNLL properties (and classes) for a growing collection of about two dozen CoNLL and related formats currently used in language technology.
written by
Jorge Gracia, Christian Fäth, Matthias Hartung, Max Ionov, Julia Bosque-Gil, Susana Veríssimo, Christian Chiarcos, Matthias Orlikowski
on 2020-12-15
We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, in order to project a deep lear
written by
Stefania Racioppa, Thierry Declerck
on 2020-09-01
This zip file contains the results of converting Mmorph morphologies into the OntoLex-Lemon model, serialized in Turtle syntax.
The file contains 380.405 base forms and 2.534.735 full forms, covering English, German, French, Spanish, Italian and Du