Prêt-à-LLOD Projects

The source code of all our projects is available on GitHub, and Docker images are available on Docker Hub

TermitUp Github and Dockerhub

TermitUp is a tool for terminology enrichment: given a domain-specific corpus, TermitUp performs statistical terminology extraction (with TBXTools) and cleans the resulting term list with a series of linguistic processes. It then queries several language resources (some of them part of the Linguistic Linked Open Data cloud) for candidate terms matching those in the term list. TermitUp builds sense indicators for both the source and the candidate terms and performs Word Sense Disambiguation (with Semantic Web Company's service), matching the concepts closest to the domain. From the concepts matched in the external resources, TermitUp retrieves every piece of information available (translations, synonyms, definitions and terminological relations), already disambiguated, and enriches the source term lists, creating links among the resources in the LLOD. Afterwards, TermitUp offers the possibility of creating hierarchical relations among the terms in the source list and of validating the synonymy relations retrieved from the external resources. Finally, the results are published in separate JSON-LD files, modelled in SKOS-XL, which makes it possible to keep the provenance of specific pieces of the retrieved data.
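
As a rough illustration of the output model (not TermitUp's actual code), the sketch below uses rdflib to build one enriched term as a SKOS-XL concept and serialize it as JSON-LD; the namespace, term and linked concept are made up for the example.

```python
# Minimal sketch of a SKOS-XL modelled term, serialized as JSON-LD.
# The example namespace, concept and labels are hypothetical; TermitUp's
# real output additionally records provenance for each retrieved datum.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, SKOS

SKOSXL = Namespace("http://www.w3.org/2008/05/skos-xl#")
EX = Namespace("http://example.org/terms/")          # hypothetical namespace

g = Graph()
g.bind("skos", SKOS)
g.bind("skosxl", SKOSXL)

concept = EX["contract"]
label = EX["contract-label-en"]

g.add((concept, RDF.type, SKOS.Concept))
g.add((label, RDF.type, SKOSXL.Label))
g.add((label, SKOSXL.literalForm, Literal("contract", lang="en")))
g.add((concept, SKOSXL.prefLabel, label))
g.add((concept, SKOS.definition,
       Literal("An agreement enforceable by law.", lang="en")))
# Link to a matching concept in an external LLOD resource (made-up IRI).
g.add((concept, SKOS.exactMatch, URIRef("http://example.org/llod/concept/42")))

print(g.serialize(format="json-ld"))   # JSON-LD support ships with rdflib >= 6
```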

Teanga Github and Dockerhub and Documentation

Teanga is a command-line tool that automates and facilitates the use of sequences of containerized REST APIs as a workflow. It aims at the creation of complex Natural Language Processing workflows using Airflow, the OpenAPI specification and Docker.
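
To make the idea concrete, here is a hand-written sketch of the kind of Airflow workflow Teanga assembles: two tasks that call containerized NLP services over REST. The service URLs, endpoints and payloads are hypothetical, and this is not code generated by Teanga itself.

```python
# A hand-written Airflow DAG of the kind Teanga builds automatically:
# each task calls a containerized NLP service over its REST API.
# Service hostnames, endpoints and payloads below are hypothetical.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def tokenize(**context):
    # Hypothetical tokenizer container exposing an OpenAPI-described endpoint.
    resp = requests.post("http://tokenizer:8080/tokenize",
                         json={"text": "Terminologies are fun."})
    resp.raise_for_status()
    context["ti"].xcom_push(key="tokens", value=resp.json())


def tag(**context):
    tokens = context["ti"].xcom_pull(task_ids="tokenize", key="tokens")
    resp = requests.post("http://tagger:8080/tag", json={"tokens": tokens})
    resp.raise_for_status()
    print(resp.json())


with DAG(dag_id="teanga_like_pipeline",
         start_date=datetime(2021, 1, 1),
         schedule_interval=None,
         catchup=False) as dag:
    t1 = PythonOperator(task_id="tokenize", python_callable=tokenize)
    t2 = PythonOperator(task_id="tag", python_callable=tag)
    t1 >> t2
```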

Term-à-LLOD Github

A system for transforming and publishing terminologies as linked data which relies on a virtualization approach (i.e. Docker). The system simplifies the transformation and hosting of terminological resources. As a proof of concept, we publish and link the well-known IATE terminology as well as various smaller terminologies. The program is written in Java.

Policy Driven Data Manager (PDDM) Github and Deployed here

PDDM offers everything necessary for policy-driven data management: data models (vocabularies to support RDF licensing data), data (licenses as RDF!), and services (an HTTP REST API for compliance checking).

CIDER-EM Github and Dockerhub

Context and Inference baseD ontology alignER. CIDER-EM is a word-embedding-based system for monolingual and cross-lingual ontology alignment. It evolves the CIDER-CL tool by incorporating word embeddings.
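
As a minimal sketch of the underlying idea (not CIDER-EM's actual implementation, which combines several features of the ontological context), one can represent each entity by the average embedding of the words in its label and compare entities across ontologies with cosine similarity; the toy vectors below are invented.

```python
# Toy sketch of embedding-based label matching for ontology alignment.
# Real systems like CIDER-EM use pre-trained (cross-lingual) embeddings and
# richer ontological context; the 3-d "embeddings" here are invented.
import numpy as np

EMBEDDINGS = {
    "car":     np.array([0.9, 0.1, 0.0]),
    "auto":    np.array([0.85, 0.15, 0.05]),
    "vehicle": np.array([0.7, 0.3, 0.1]),
    "person":  np.array([0.0, 0.2, 0.9]),
}

def label_vector(label: str) -> np.ndarray:
    """Average the embeddings of the words in an entity label."""
    vectors = [EMBEDDINGS[w] for w in label.lower().split() if w in EMBEDDINGS]
    return np.mean(vectors, axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Compare a class label from ontology A with candidate labels from ontology B.
source = "car"
for candidate in ["auto", "vehicle", "person"]:
    print(candidate, round(cosine(label_vector(source), label_vector(candidate)), 3))
```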

Linghub Deployed here

Linghub is the new version of Linghub.org, a centralised repository of metadata about linguistic resources that helps you find language resources for your applications.

Ontology Lexicalization Github and Dockerhub

The system extracts property lexicalization patterns from a set of annotated (i.e. DBpedia/Wikidata-annotated) corpora, extracts the RDF graph patterns corresponding to an n-gram, and builds a lemon lexicon. The program is developed using Perl and Java.
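
To illustrate what such a lexicon contains (this is not output of the tool itself), the sketch below encodes a single hypothetical lemon/OntoLex entry: the n-gram "birth place" lexicalizing the DBpedia property dbo:birthPlace.

```python
# Sketch of one lemon/OntoLex lexicon entry linking an n-gram to a DBpedia property.
# The entry IRIs are made up; the vocabularies (OntoLex-Lemon, DBpedia) are real.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
DBO = Namespace("http://dbpedia.org/ontology/")
LEX = Namespace("http://example.org/lexicon/")      # hypothetical lexicon namespace

g = Graph()
g.bind("ontolex", ONTOLEX)

entry = LEX["birth_place"]
form = LEX["birth_place_form"]
sense = LEX["birth_place_sense1"]

g.add((entry, RDF.type, ONTOLEX.LexicalEntry))
g.add((entry, ONTOLEX.canonicalForm, form))
g.add((form, ONTOLEX.writtenRep, Literal("birth place", lang="en")))
g.add((entry, ONTOLEX.sense, sense))
g.add((sense, ONTOLEX.reference, DBO.birthPlace))   # links the n-gram to the property

print(g.serialize(format="turtle"))
```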

Fintan Github and Dockerhub

Flexible INtegrated Transformation and Annotation eNgineering platform

Otic Github and Dockerhub

This service executes the "One Time Inverse Consultation" (OTIC) algorithm to obtain indirect translation pairs between two languages through a pivot language.

This method was used as a baseline in the TIAD (Translation Inference Across Dictionaries) task in 2019, showing good results in comparison with the participating systems.

The One Time Inverse Consultation (OTIC) method was proposed by Tanaka and Umemura [1] in 1994 and adapted by Lin et al. [2] for the creation of multilingual lexicons. In short, the idea of the OTIC method is to explore, for a given word, the possible candidate translations that can be obtained through intermediate translations in the pivot language. Then, a score is assigned to each candidate translation based on the degree of overlap between the pivot translations shared by the source and target words.
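
A minimal sketch of the OTIC idea (not the deployed service's code): collect the pivot-language translations of a source word, follow them to target-language candidates, and score each candidate by the overlap between the pivot translations of the source word and those of the candidate. The toy dictionaries and the Dice-style score are illustrative; the exact weighting used by the service may differ.

```python
# Toy illustration of One Time Inverse Consultation (OTIC) via a pivot language.
# Dictionaries and the Dice-style overlap score are illustrative only.

# source (Spanish) -> pivot (English) and target (French) -> pivot (English)
es_en = {"coche": {"car", "automobile"}, "banco": {"bank", "bench"}}
fr_en = {"voiture": {"car", "automobile"}, "banque": {"bank"}, "banc": {"bench"}}

def otic_candidates(source_word):
    pivots = es_en.get(source_word, set())
    scores = {}
    for target_word, target_pivots in fr_en.items():
        shared = pivots & target_pivots
        if shared:
            # Dice-style overlap of the two sets of pivot translations.
            scores[target_word] = 2 * len(shared) / (len(pivots) + len(target_pivots))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(otic_candidates("coche"))   # [('voiture', 1.0)]
print(otic_candidates("banco"))   # both 'banque' and 'banc' score 2/3
```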

Cycles Github and Dockerhub

This project implements a cycle-based method to obtain indirect translation pairs between two languages.

This method participated in the TIAD (Translation Inference Across Dictionaries) task in 2020 in combination with the OTIC method, showing good results in comparison with other participants [1].

This technique was proposed by [2] in 2006. The idea is to exploit the properties of the Apertium RDF graph, using cycles to identify potential targets that may be translations of a given word.
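
As a rough illustration of the cycle idea (not the project's actual algorithm), one can build an undirected graph whose nodes are (language, word) pairs and whose edges are known translations, and accept a target word as a candidate translation when it lies on a cycle together with the source word. The toy graph below is made up.

```python
# Toy illustration of cycle-based translation inference over a multilingual graph.
# Nodes are (language, word) pairs; edges are known translation pairs.
# The data and the acceptance criterion (sharing a cycle) are simplified.
import networkx as nx

G = nx.Graph()
translations = [
    (("es", "coche"), ("en", "car")),
    (("en", "car"), ("fr", "voiture")),
    (("fr", "voiture"), ("ca", "cotxe")),
    (("ca", "cotxe"), ("es", "coche")),      # closes a cycle es-en-fr-ca
    (("es", "banco"), ("en", "bank")),       # no cycle through this pair
]
G.add_edges_from(translations)

def cycle_candidates(source, target_lang):
    """Target-language words that share a cycle with the source node."""
    candidates = set()
    for cycle in nx.cycle_basis(G):
        if source in cycle:
            candidates.update(w for (lang, w) in cycle if lang == target_lang)
    return candidates

print(cycle_candidates(("es", "coche"), "fr"))   # {'voiture'}
print(cycle_candidates(("es", "banco"), "fr"))   # set(): no supporting cycle
```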