Contributions of Prêt-à-LLOD to standardization activities

written by Thierry Declerck on 2021-09-28


Using and contributing to standards is a way to ensure the sustainability of the results (algorithms, data, formats, guidelines, etc.) delivered by a project. Prêt-à-LLOD is particularly involved in W3C standardisation activities, focusing on the further development of modules for the OntoLex-Lemon specification, which is developed in the context of a W3C Community Group (https://www.w3.org/2016/05/ontolex/). As early as September 2019, an extension module called “Lexicog”, which deals with the representation of dictionaries and other linguistic resources containing lexicographic data, was published (https://www.w3.org/2019/09/lexicog/).

In this blog post, we list the standardisation activities in which Prêt-à-LLOD partners are currently involved:

  • OntoLex Module for Morphology: Several partners of Prêt-à-LLOD are contributing to the OntoLex module for Morphology, which is now at an advanced stage, so we can expect it to be published within the lifetime of the project. The current state of the module development is available at https://www.w3.org/community/ontolex/wiki/Morphology.
  • OntoLex Module for Frequency, Attestations and Corpus Information (OntoLex-FRaC): Goethe University Frankfurt is leading this standardisation activity, which aims to (1) extend OntoLex with corpus information to address challenges in lexicography, (2) model lexical and distributional-semantic resources (dictionaries, embeddings) as RDF graphs, and (3) provide an abstract model of relevant concepts in distributional semantics that facilitates applications integrating both lexical and distributional information. The current state of development of the FRaC module can be seen at: https://github.com/ontolex/frequency-attestation-corpus-information/blob/master/index.md. More recently, this working group has also been discussing how to include multimodal language data.
  • The Prêt-à-LLOD partners UPM and DFKI are working on a first version of a candidate extension to OntoLex-Lemon for representing terminologies generated from heterogeneous data sources. A draft has already been published in the wiki of the W3C community group: https://www.w3.org/community/ontolex/wiki/Terminology. The OntoLex Community Group will soon start discussing this draft.
  • LexInfo 3.0: LexInfo is an ontology that was defined to provide data categories for the Lemon model. It has since been updated for the new OntoLex-Lemon model of the OntoLex community group. A new version of the LexInfo ontology has been published under the direction of the Prêt-à-LLOD partner NUIG. This version is available at https://github.com/ontolex/lexinfo/blob/master/ontology/3.0/lexinfo.owl
  • META-SHARE metadata ontology: Prêt-à-LLOD partners are contributing to the finalisation of the META-SHARE metadata ontology. The current state of development is documented at: http://w3id.org/meta-share/meta-share
  • The Prêt-à-LLOD partner UPM continues its contributions to the Open Digital Rights Language (ODRL) W3C Community Group and is implementing aspects of the project's Task 5.2, “Policy-driven language resource discovery and access”, in compliance with the current version (2.2) of ODRL (https://www.w3.org/TR/2018/REC-odrl-model-20180215/).

Prêt-à-LLOD partners are also involved in, and are leading, another W3C Community Group: Linked Data for Language Technologies (LD4LT). This community group was founded in the previous FP7 project “LIDER” and is used for the broader discussion of issues related to linked data and its applications in NLP. The Prêt-à-LLOD project has revitalized the work of this Community Group, which is now discussing public activities such as the META-SHARE OWL ontology mentioned above. More recently, a focus has been on establishing contacts with relevant past and present initiatives in the ISO standardization communities. For example, work has started on porting the ISO SynAF standard onto OntoLex-Lemon, also using the OLiA and LexInfo vocabularies as well as insights from the W3C Web Annotation data model (https://www.w3.org/TR/annotation-model/). An LD4LT annotation workshop was recently organized (https://www.w3.org/community/ld4lt/wiki/LD4LT_Annotaton_Workshop_Zaragoza_2021) as a satellite event of the 3rd Language, Data and Knowledge (LDK 2021) conference, which was also supported by Prêt-à-LLOD (http://2021.ldk-conf.org/).