Abstract
A vision of a truly multilingual Semantic Web has found strong support within the Linguistic Linked Open Data community. Standards, such as OntoLex-Lemon, highlight the importance of explicit linguistic modeling in relation to ontologies and knowledge graphs. Nevertheless, there is room for improvement in terms of automation, usability, and interoperability. Neural Language Models have achieved several breakthroughs and successes considerably beyond Natural Language Processing (NLP) tasks and recently also in terms of multimodal representations. Several paths naturally open up to port these successes to the Semantic Web, from automatically translating linguistic information associated with structured knowledge resources to multimodal question-answering with machine translation. Language is also an important vehicle for culture, an aspect that deserves considerably more attention. Building on existing approaches, this article envisions joint forces between Neural Language Models and Semantic Web technologies for multilingual, transcultural, and multimodal information access and presents open challenges and opportunities in this direction.
Introduction
One central endeavor of the Semantic Web (SW) [7] is intelligent access to heterogeneous and distributed sources of knowledge. However, limiting this access to the natural languages predominant in the world inevitably creates biases and hegemonies. Supporters of a multilingual SW can point to several successes in overcoming the language barrier, from multilingual structured knowledge resources, such as BabelNet [55] and Framester [23], to multilingual methods and applications (cf. e.g. [9]). Nevertheless, approaches that further improve the level of automation, usability, and interoperability are required.
This article proposes a vision that is based on Neural Language Models (NLMs) to foster a multilingual, transcultural, and multimodal Semantic Web. Its contribution is a detailed exploration of this vision based on existing approaches and an outline of currently valid challenges and envisioned opportunities, which provides a solid starting point for the Semantic Web and NLP community to initiate and/or advance such interdisciplinary research.
A language model is designed to assign probabilities to an input sequence, i.e., learn a joint probability function of sequences of signs. Based on this idea, powerful Natural Language Processing (NLP) applications from machine translation (e.g. [67]) and natural language generation (e.g. [24]) to textual entailment (e.g. [5,15]) have been proposed.
NLMs learn implicit semantic representations of sequences in their hidden layer(s), resulting in a dense real-valued vector for each word, phrase, sentence, document, or knowledge base triple, which has turned out to be a powerful representation. Such embeddings have been applied to a large variety of traditional SW tasks, from link prediction to ontology alignment [31]. Recent NLMs have provided a strong backbone to many Artificial Intelligence (AI) applications that go beyond traditional NLP tasks, see for instance [59] for a wide range of tasks, including a new best performance on the Winograd Schema Challenge. Resolving pronouns in such schemas requires world knowledge, such as spatio-temporal relations and mental states.
Regarding automation and usability of SW technologies, NLMs have successfully been applied to translating from natural language to natural language but also to ontology representation [58] and structured query [71] languages. Automatically translating natural language questions to queries can improve the usability of SW query interfaces. However, the usage of NLMs goes considerably beyond translating languages, structured or unstructured. Neural Machine Translation (NMT) based on NLMs has even been applied to noise-tolerant RDFS reasoning [48].
Language enables communication and at the same time serves as a vehicle for cultural and social identity. This function of natural language should find consideration in approaches to the multilingual Semantic Web by building on decades of research on cross-cultural and transcultural communication (e.g. [36]). NLMs potentially provide interesting methods to port information learned for one language and culture to another in form of domain adaptation and transfer learning [6,47]. Nevertheless, a more thorough basis is required to capture cultural aspects, such as cognitive principles guiding our communication.
Communication in natural language is by no means confined to textual boundaries and can be signed, spoken, or written. This calls for multimodal representations of language in relation to SW technologies, which finds strong support in state-of-the-art language modeling. Recent advances of NLMs provide powerful approaches that allow flexible alignments between text and video [65] and translate directly from speech to speech without a need for textual transcriptions [40].
In short, this vision goes beyond plurality of language and envisions multilingual, transcultural, and multimodal information access backed by NLMs and the Semantic Web. As preliminaries, this article first briefly defines language models and the Multilingual Semantic Web. The sections Multilingual, Transcultural, and Multimodal detail existing joint approaches on different SW tasks, each of which is followed by a description of the challenges and opportunities for joining language modeling and SW approaches. None of these can be fully accounted for in this article, but each is detailed to the point of grounding envisioned future research directions.
Language model: A brief definition
Language modeling has been key to the success of NLP applications and tasks, such as machine translation, speech recognition, question-answering, spelling correction, and many more. A language model (LM) assigns a probability to a previously unseen sequence of words based on a probability distribution learned over the vocabulary of the training corpus. In general, the joint probability of a sequence is decomposed as the product of conditional probabilities, each word conditioned on a fixed number of preceding words: one in the case of bigrams, two in the case of trigrams, and so on. Smoothing is a procedure to avoid zero probabilities caused by words or n-grams unseen in training (cf. e.g. [28] for more details).
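As a minimal illustration, the decomposition into conditional probabilities and the role of smoothing can be sketched for the bigram case (a toy corpus stands in for real training data):

```python
from collections import Counter

# Toy corpus; in practice the counts come from a large training corpus.
corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = set(corpus)

def bigram_prob(prev, word, k=1.0):
    """P(word | prev) with add-k (Laplace) smoothing to avoid zero probabilities."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * len(vocab))

def sequence_prob(seq):
    """Joint probability of a sequence, decomposed as a product of bigram conditionals."""
    p = 1.0
    for prev, word in zip(seq, seq[1:]):
        p *= bigram_prob(prev, word)
    return p

seen = sequence_prob("the cat sat".split())
unseen = sequence_prob("the mat ate".split())  # contains an unseen bigram, yet p > 0
```

Without smoothing, the unseen bigram ("mat", "ate") would drive the second probability to zero; with add-k smoothing it merely becomes small.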
Neural language models (NLMs) can learn distributed representations without smoothing and generalize well across contexts. Training tasks generally consist in predicting the context words of a sequence given its center word (skip-gram) or predicting a center word given its context (CBOW), both popularized by [52]. These tasks train word embeddings as the weights of their hidden layers, and the base methods have been extended to train vector representations of knowledge graph triples (e.g. [8]).
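The two training tasks can be illustrated by the (input, target) instances they derive from a sentence (a sketch; real implementations additionally use techniques such as negative sampling):

```python
def training_pairs(tokens, window=2):
    """Generate (input, target) instances for skip-gram and CBOW from one sentence.

    Skip-gram: predict each context word given the center word.
    CBOW: predict the center word given the bag of its context words.
    """
    skipgram, cbow = [], []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        skipgram.extend((center, c) for c in context)
        cbow.append((tuple(context), center))
    return skipgram, cbow

sg, cb = training_pairs("bees produce honey".split(), window=1)
# sg contains pairs such as ("produce", "honey"); cb contains (("bees", "honey"), "produce")
```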
A common architectural style is that of encoder-decoder, where either component can be an independent model. One recent best performer in machine translation, the Transformer model [67], was soon adopted for many NLP tasks, from multi-task approaches [59] to text-video combinations [65]. A Transformer combines multi-head attention, that is, a mechanism to single out central words in sequences for given queries, with feedforward layers in an encoder-decoder architecture. Frequently, this architecture is combined with Byte Pair Encoding (BPE), a form of data compression [22] that iteratively replaces the most frequent pair of characters or character sequences with a single, unused symbol. Since it operates on the character level, it strongly mitigates the problem of unknown words.
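The iterative merging behind BPE can be sketched as follows (a simplified learner over a toy vocabulary; production implementations add details such as end-of-word markers):

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent symbol pair
    into a new symbol (a sketch of the compression idea behind [22])."""
    # Each word starts as a sequence of characters, weighted by its frequency.
    vocab = {tuple(w): f for w, f in Counter(words).items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for sym, freq in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = {}
        for sym, freq in vocab.items():
            out, i = [], 0
            while i < len(sym):
                if i < len(sym) - 1 and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])  # merge the pair into one symbol
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
```

Unknown words can then be segmented into known subword units by replaying the learned merges, which is what mitigates the out-of-vocabulary problem.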
Multilingual Semantic Web: A long-standing endeavor
For several decades multiple research endeavors [9,50] have made it their mission to provide a truly multilingual SW. To this end, algorithms and systems are required that help overcome linguistic and national boundaries, to grant information access to users of different cultures and languages. Limiting such access to languages spoken by majorities inevitably creates a bias. The SW, with its language-independent representation of knowledge, provides an excellent anchor point for multilingual, transcultural, and multimodal information access.
As a first step towards a multilingual SW, several mediation mechanisms to translate between abstract conceptual layers and lexical manifestations, which frequently differ across languages and cultures, have been proposed. In fact, concepts might exist in one language but not in another, so-called lexical gaps, such as the German “Schadenfreude” (joy at someone else’s misfortune), which has been readily adopted in English for lack of an equivalent.
Knowledge representation needs to be able to accommodate such differences. First, the OntoLex-Lemon model, which provides an ontology-lexicon interface, has found broad uptake in the community and has recently been published as a W3C report [14]. Second, similar models have been proposed to interchange domain-specific terminological information grounded in ontological resources [29]. Combined representations of linguistic, terminological, and ontological knowledge have also been modeled [17]. As a final example, the NLP Interchange Format (NIF) [35], based on Linked Data principles, serves to improve the interoperability of NLP tools.
Rich combinations of structured knowledge and linguistic information can be applied to a variety of tasks, such as ontology-based information extraction [21], completing and correcting natural language information [30], translating from knowledge resource to natural language and/or vice versa [26], and ontology learning from text [60].
Over the past few decades, the Linked Open Data (LOD) cloud and resources published in the Resource Description Framework (RDF) and Web Ontology Language (OWL) have experienced tremendous growth, however predominantly in English, with several notable exceptions, such as BabelNet [55] and Wikidata [68]. To foster this endeavor, automated means, such as NLMs, can improve and accelerate existing approaches. While this section detailed initiatives by the multilingual SW community, Section 4 focuses on the utilization of NLMs towards a multilingual SW.
Multilingual
Within the context of this article, multilingual refers to this aspect of the presented vision, that is, how NLMs can contribute to multilingual SW tasks and technologies.
Machine translating the SW One of the most immediate application scenarios of NLMs is the translation of the natural language contents of the SW. Ontology labels, especially in domain ontologies, provide a rich terminological layer, but are still predominantly in English. To overcome this problem, Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) models have been applied to translate ontology labels [6]. As an interesting side aspect, the impact on translation quality of injecting domain-specific terminological knowledge into NMT and SMT is evaluated. The most promising knowledge augmentation method is domain adaptation of a trained model with terminological expressions, which has been utilized before to translate ontology labels [49] and fine-tune machine translation [62].
Challenges and opportunities As concluded in a recent survey on machine translation and SW technologies [54], this combination is still in its infancy. SW technologies have the potential to aid NMT models for disambiguating senses and targeting NMT to particular domains of discourse, which in turn can be applied to produce multilingual domain-specific ontology descriptions.
A most promising direction for such combinations lies in the injection of domain, lexical, and terminological knowledge into NMT systems. While some preliminary evaluations, such as the impact of linguistic processing results injected into a neural question-answering system, are available [38], a systematic investigation is yet to be performed. Knowledge injection or augmentation holds the potential to help bridge the neural-symbolic gap (cf. e.g. [37]) and support Explainable AI (cf. e.g. [44]). It refers to the task of actively including external (knowledge) resources in the training process of NLMs, e.g. by continuing training on a pre-trained model with such a resource or by adjusting the attention mechanism during training with external knowledge.
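As a toy illustration of injection by continued training, knowledge base triples can be verbalized into sentences and appended to an NLM's training corpus (the predicates and templates below are hypothetical):

```python
# Hypothetical templates mapping KB predicates to sentence patterns; a real
# system would continue pre-training an NLM on the resulting sentences.
TEMPLATES = {
    "subClassOf": "Every {s} is an {o}.",
    "produces": "A {s} produces {o}.",
}

def verbalize(triples):
    """Turn (subject, predicate, object) triples into natural language sentences
    that can extend the training data of a pre-trained language model."""
    return [TEMPLATES[p].format(s=s, o=o) for s, p, o in triples if p in TEMPLATES]

sentences = verbalize([("bee", "subClassOf", "insect"),
                       ("bee", "produces", "honey")])
```

This is only one of the strategies mentioned above; adjusting the attention mechanism with external knowledge requires intervening in the model architecture itself.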
This topic of injecting knowledge to NLMs implicitly raises an important challenge posed to the Semantic Web community. Current semantic representations, such as RDF and OWL, while readily embraced by the multilingual Semantic Web and Linguistic Linked Open Data (LLOD) community, might require an adaptation towards the more lightweight end in order to be readily adopted by the machine learning and NLP community.
Finally, a fully automated and NLM-based translation of existing ontology labels to rich linguistic representations in form of ontology-lexicon or ontology-terminology models would be a very interesting application of NMT, which brings us to the next topic of learning structured languages.
Machine translating to structured languages NMT can translate more than natural languages. Early neural approaches utilized joint knowledge base and language embeddings to extract relations [69]. [25] utilize multilingual natural-language patterns to learn RDF predicates, which are refined by way of a feedforward neural network. Recent approaches treat the entire problem of structure learning as a machine translation task and utilize an NMT system to learn a specific subset of Description Logic formulas from definitions [58]. For instance, from the input A bee is an animal that produces honey the model produces an axiom along the lines of Bee ⊑ Animal ⊓ ∃produces.Honey.
A long-standing endeavor in Semantic Web research has been the automated translation of natural language questions to SPARQL queries. Since SPARQL requires syntactic and semantic expertise, a translation from natural language could considerably boost its uptake and make Semantic Web resources broadly available without any prior knowledge of representation and query languages. A broad test of existing NMT models to the task of translating from natural language to SPARQL has been proposed [71].
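Such sequence-to-sequence systems are typically trained on parallel corpora pairing questions with queries. A hypothetical training instance is sketched below (the IRIs are illustrative, and whitespace tokenization stands in for the more elaborate query encodings used in practice):

```python
# A hypothetical parallel training instance for translating a natural language
# question into a SPARQL query (the example IRIs do not refer to a real dataset).
question = "Who wrote Dune ?"
sparql = (
    "SELECT ?author WHERE { "
    "<http://example.org/Dune> <http://example.org/writtenBy> ?author . }"
)

def linearize(query):
    """Linearize a SPARQL string into the flat token sequence an NMT decoder
    would be trained to emit (simple whitespace tokenization as a stand-in)."""
    return query.replace("{", " { ").replace("}", " } ").split()

source_tokens = question.split()
target_tokens = linearize(sparql)
```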
Challenges and opportunities One substantial future application scenario of NLMs is that of learning structured knowledge resources. Ontology learning experiments with NLMs focus on a subset of logical expressions and on English only. However, automating the process of extracting structured knowledge from natural languages holds the promise of obtaining conceptualizations specific to the language and culture.
This joining of both technologies is attractive not only for its promised speed of creating resources, but also for the ability to adapt trained models to new domains and languages, for example via zero-shot translation, the ability to translate from one language to another without ever explicitly training the model on this particular language pair. Thus, these approaches have the potential to account for low-resource languages. Since a central problem in NLP is the predominance of certain widespread language varieties in applications, this could also boost the uptake of SW technologies in machine learning.
Machine translation for reasoning A very recent approach is that of tailoring embeddings to accommodate RDFS reasoning in an NMT task [48]. To this end, RDF graphs are layered and encoded as adjacency matrices, where each layer layout represents a graph word. Input graph and entailments are then represented as sequences of graph words, which enables treating RDFS reasoning as a machine translation task.
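A simplified sketch of such a layered encoding, with one adjacency matrix per predicate, is given below (the exact graph-word construction in [48] differs in detail):

```python
def graph_layers(triples):
    """Encode an RDF graph as one adjacency matrix per predicate ('layer'),
    in the spirit of the layered graph-word encoding of [48] (simplified)."""
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {e: i for i, e in enumerate(entities)}
    n = len(entities)
    layers = {}
    for s, p, o in triples:
        layer = layers.setdefault(p, [[0] * n for _ in range(n)])
        layer[index[s]][index[o]] = 1  # edge from subject row to object column
    return entities, layers

entities, layers = graph_layers([
    ("bee", "rdf:type", "Insect"),
    ("Insect", "rdfs:subClassOf", "Animal"),
])
```

Sequences of such layers can then be fed to a sequence-to-sequence model, with the entailed graph serving as the target sequence.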
Challenges and opportunities Deductive reasoning as a machine translation task is attractive due to its potential reasoning speed, a major challenge for reasoning engines. Encoding information as input to NLM-based reasoning engines is an open research topic. [48] suggest layered graph word embeddings as a first approach. However, there is a lot of room for experimentation and further proposals in this regard.
It would be interesting to evaluate whether embeddings learned for the purpose of reasoning in one formalism allow for transitioning to, or computing similarity measures between, elements or statements in different representation languages. In other words, this could be a potential approach for tackling the diversity of ontology languages, as proposed by [53] in form of a meta language. For instance, distributed vector representations learned for an ontology represented in RDF might be comparable to embeddings trained on an OWL ontology or on information encoded in the Unified Modeling Language (UML).
The challenge in combining machine learning and logic lies in reconciling the advantages of both without aggravating their limitations. The advantage of symbolic approaches is the provision of sound and explainable reasoning, while neural approaches with NLMs have the potential of providing fast and robust learning. The difficulty in combining them lies in the low-level representation of information in neural approaches, as opposed to the symbolic representations used in logic. An additional opportunity here is a hybrid interaction of both methodologies, as proposed for image segmentation by [4]. The ability of a fully integrated or hybrid solution to support the explainability of NLMs might currently be the most timely opportunity. A discussion specific to neural-symbolic systems can be found in [37] in this issue.
NLM-based ontology alignment has been successfully applied to matching knowledge bases. For instance, utilizing multilingual pretrained embeddings, domain-specific industry classification standards could be aligned [31]. The task of aligning large ontologies has been subdivided into smaller, tractable tasks utilizing a lexical index, neural embeddings, and locality models [41].
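A minimal sketch of embedding-based label matching, assuming pretrained (possibly multilingual) label embeddings are available (the toy 2-d vectors below are purely illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def align(source, target, threshold=0.8):
    """Propose label correspondences whose embedding similarity exceeds a threshold.
    In a real system the vectors would come from pretrained multilingual embeddings."""
    return [
        (ls, lt)
        for ls, vs in source.items()
        for lt, vt in target.items()
        if cosine(vs, vt) >= threshold
    ]

matches = align(
    {"car": (0.9, 0.1), "tree": (0.1, 0.9)},
    {"automobile": (0.85, 0.15), "Baum": (0.05, 0.95)},
)
```

Multilingual embedding spaces make the same thresholding work across languages, as in the cross-lingual match of "tree" and the German "Baum" above.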
A broader alignment strategy is that of bringing together a multitude of resources from the Linguistic Linked Open Data (LLOD) cloud with ontology resources in Framester [23]. Based on this resource, frame-based embeddings are trained and utilized for knowledge reconciliation purposes [3], but could also be applied to a wide range of NLP tasks.
Challenges and opportunities NLM-based alignment strategies could benefit from the previously described tasks, for instance by using neural-symbolic reasoning to align large multilingual, transcultural, and multimodal ontologies. In addition, the substantial surge of knowledge graph embedding approaches could be joined with the multitude of word embedding models, building on and contributing to the SW community's tradition of modeling at the ontology-natural language interface. Joint NLM- and SW-based alignment approaches can potentially also foster the transitioning of knowledge represented in one cultural context to another.
Transcultural
When it comes to culture, a multitude of prefixes is commonplace: cross-cultural, intercultural, multicultural, and transcultural. Cross-cultural refers to analytic comparative approaches of different cultures. Intercultural generally establishes a certain understanding for different cultures. Multicultural refers to a plurality of cultures even within a society. And finally transcultural refers to a social concept that denotes a joint shared culture irrespective of origin or nationality. With an ever-growing global connectivity, this last prefix best denotes what this vision entails. Rather than a mere coexisting alignment of cultural representations, a capacity to move between and within cultural and social identity is foreseen. The importance of differences in semantic modeling across cultures finds support in cross-cultural neuroscientific findings that show differences in categorization and in processing semantic relationships across cultures [34].
Cultural evolution Cultural evolution is closely tied to evolutionary biology and Darwinian evolutionary principles [51]. A set of algorithms based on evolution by natural selection, that is, variation, heredity, and selection, has been put forward and recently extended by fission, fusion, and cooperation in their application to cultural phenomena [66]. As a basic assumption, biological concepts for the origin of living beings can be mapped to the cultural and linguistic domain, which has then been combined into a theory of cultural language evolution [64]. A SW application tests this assumption in terms of ontology alignments and evolutionary alignment repair in cultural environments utilizing a multi-agent system [20].
Challenges and opportunities A theory and experiments for the cultural evolution of human language has been thoroughly investigated [64]. It studies, for instance, how linguistic variants are generated in a population and on which basis some variants survive and become dominant. As a social phenomenon, language features cooperative interaction patterns, such as open-ended questions. While many of these phenomena are language-specific, it is highly unlikely that vocabularies and grammars are stored as different language systems by users. Instead, a widely accepted theory is that of storing such knowledge in form of constructions, associating meaning with form. One construction stores many constraints for efficient parsing and is presumed to incorporate several different language systems [64].
Various grammars have been proposed to capture such linguistic constructions. NLMs potentially complement tested construction grammar approaches, increasing the level of automation and potentially domain coverage by means of transfer learning. In particular, NLM-based multi-agent system negotiations of meaning could foster transcultural modeling of cultural evolution. An alternative way of modeling cultural language evolution is that of formal game-theoretic semantics. Bringing NLMs and formal semantics of constructions spanning across language systems together could potentially boost a transcultural SW. In fact, a SW that facilitates cultural exchange cannot only support human users, but building on evolutionary theories can foster a dynamically evolving SW that embraces diversity rather than rigidity, which again raises the challenge of less focus on formalization and more focus on usability and robustness.
Cultural heritage denotes physical artifacts as much as intangible attributes of a culture or society from the past. Several SW approaches can be found, from cultural heritage modeling (e.g. [39]) to creating ontology-based lexicographic tools for the study of ancient culture that enable object-multilingual links [56]. Culture-specific knowledge graphs of cultural heritage have been proposed, such as for Italy [13]. While there is a multitude of NLM-based approaches, little overlap can be detected between NLM- and SW-based research on cultural heritage.
Challenges and opportunities The range of possible joint approaches of NLM and SW technologies to model cultural heritage includes all of the multilingual approaches presented above and most of the multimodal approaches presented below. For instance, based on knowledge graphs, NLMs can be utilized to analyze similarities and differences across cultural heritages as well as refine technologies to share and analyze cultural data. Neural-symbolic reasoning could be particularly powerful for such alignments.
One of the most central challenges in terms of cultural heritage is the linking and representing of cultural data in a harmonized way across individual data collections. Here symbolic ontology-based integration methods could strongly benefit from NLMs and their ability to detect similarities even with noisy data, bringing together rich semantic representations and noise-tolerant, robust learners.
On a more local level, organizations procuring cultural heritage data are frequently interested in a possibility to provide highly personalized user experiences. For instance, a virtual tour through a museum or archaeological site should be fully in the control of each user. In this direction it would be interesting to explore the joint power of NLM-based SW technologies or SW-empowered NLMs to suggest or predict interesting paths or items for each individual user.
Culture-specific modeling Another important transcultural SW connection is that of utilizing ontologies for culture-specific modeling. For instance, [16] explore Australian Indigenous knowledge systems utilizing SW technologies. When utilizing SW technologies for cross-cultural modeling, lexical gaps quickly become unavoidable. Cross-language information retrieval (CLIR) tasks equally encounter this problem and have produced NLM-based methods to bridge such lexical gaps [45]. Embedding spaces have also been analyzed for their ability to represent culture-specific associations [42] and their suitability for macro-cultural modeling.
Challenges and opportunities Going from modeling individual culture-specific knowledge representations to a transcultural one represents the biggest challenge in this task. Domain ontologies potentially provide a language-independent anchor for transcultural knowledge modeling, joined with NLM-based cross-language information retrieval and analysis approaches. Bringing both together enables transcultural question-answering and potentially automated localization approaches. By transcultural question-answering, this article refers not only to providing multilingual answers to queries posed to a language-agnostic knowledge base, but also to the ability to verbalize responses from such a knowledge base in a variety of cultural spheres and states of cultural language evolution. Localization differs from translation in that it focuses on a regional adaptation of contents more than their transformation to a different language or linguistic variation. As such, it takes cultural preferences into account.
One powerful aspect that could potentially boost transcultural modeling is a solid cognitive basis, such as multilingual knowledge extraction and modeling related to embodied cognition [32,33]. Such a cognitive framework can be utilized to analyze and model cultural differences on a cognitive-conceptual basis rather than a primarily data-driven approach.
One important aspect of culture is regional linguistic variation. Considering dialects and linguistic variations in machine translation and semantic speech technologies is still an open field of research. Rich variation-aware linguistic representation models in connection to ontologies, that is, extensions of ontology-lexicon and ontology-terminology models, injected into NLMs are promising for this task. Especially in this regard, a connection to other modalities, such as speech synthesis approaches, could bring significant benefits.
Multimodal
NLMs promise to boost not only the SW’s multilinguality but also to contribute to its multimodality. For the sake of this vision, a broad perspective is adopted that also considers multisensory approaches, from vision to tactile sensing. Such multimodal representations can be utilized in intelligent conversational agents, multimodal information extraction, and robotics, among many others.
Semantic speech technologies Speech technologies building on SW resources and NLM systems promise to support important present-day applications, such as assisted living. Google registered a patent on utilizing language models for understanding conversations based on SW resources [1]. A speech interface for question-answering systems has been proposed [43], which, in combination with the above multilingual strategies for NLM-based question answering, could provide broad access to SW resources. Another Google patent for reformulations of speech queries has been registered [18], providing alternative queries if the submitted one returns no results. Most of these systems rely on text transcriptions utilizing automated speech recognition (ASR) systems. The recently published Translatotron [40] omits this step and translates directly from speech to speech in the speaker’s voice.
Challenges and opportunities Intelligent voice interaction is a booming business model as much as a vibrant research field. Building on neural-symbolic reasoning, such systems could enable a multilingual, multimodal query-answering system on formally structured resources. Major challenges here are similar to those of transcultural modeling. Local contexts and linguistic variations need to be taken into account to grant broad information access and high usability. Nevertheless, achieving speech-empowered SW technologies would strongly further the endeavor to break down access barriers to represented information.
Semantic video technologies In order to include the visual-manual modality to convey meaning in form of sign language, knowledge needs to be conveyed by video. [19] combine speech synthesis, machine translation, and SW technologies to create a machine-readable knowledge representation for the Turkish sign language. In consequence, NMT can be utilized to translate between natural language and sign language, as has been suggested utilizing the above sign language representation for Turkish [61].
Challenges and opportunities Semantic video technologies still suffer from a lack of broad coverage in terms of language and visual-manual modality. Establishing such a coverage potentially provides all users, including users with special needs or illiterate users, with access to information truly breaking down information barriers. In fact, very few SW sign language approaches can be found. Latest NLM advances can contribute to automating and improving existing approaches. For instance, VideoBERT [65] treats video frames as “video words” utilizing a vector quantization approach and an off-the-shelf speech recognition system to transcribe audio. Resulting representations allow for a seamless transition between text and video. Further adding cross-modal reasoning approaches could boost the interface between video and natural language [27]. A video-enabled NLM-based SW can strongly support barrier-free online communication, especially if transformations between different modalities are provided.
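The quantization step behind such "video words" can be sketched as follows (toy vectors stand in for clustered video features; VideoBERT itself applies hierarchical k-means to features from a pretrained video network):

```python
def quantize(frames, centroids):
    """Map each frame feature vector to the id of its nearest centroid, yielding
    a sequence of discrete 'video words' (a simplified sketch of VideoBERT-style
    vector quantization)."""
    def nearest(v):
        return min(range(len(centroids)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))
    return [nearest(f) for f in frames]

centroids = [(0.0, 0.0), (1.0, 1.0)]           # toy cluster centers ("video words")
frames = [(0.1, 0.2), (0.9, 1.1), (0.2, 0.1)]  # toy per-frame feature vectors
video_words = quantize(frames, centroids)
```

The resulting discrete token sequence can then be modeled jointly with text tokens by a standard language model.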
Semantic sensor web A Semantic Sensor Web in the Internet of Things vision [63] is probably the closest corpus of related approaches. Building on SW enablement or Linked Data standards, sensor data are linked and annotated. Thus, SW query technologies can be applied to sensor data [12]. Its link to language models arises from the necessity of connecting sensor data to human communication, i.e., the human-robot interface: natural language understanding of robot instructions, for instance, has been shown to benefit from ontology-natural language groundings [57].
Challenges and opportunities Linking sensory data and language can boost human-robot interactions, as sensory information, their semantic representation, and neural-symbolic reasoning could be highly beneficial to the task of explainable robotics and AI [44]. Recent advances in terms of cross-modal predictions [46] connected to SW technologies can potentially boost cognitive AI systems.
One major challenge of connecting NLMs with semantic sensor data is that of magnifying biases. NLMs have been shown in multiple studies to easily suffer from biases, which unfortunately is also true for sensor data, thereby bearing the risk of multiplying and intensifying biases across modalities. For example, sensors in self-driving cars have been shown to detect lighter skin tones better than darker ones [70].
Multisensor semantic data might also relate to neural patterns and the ability to automatically decode them. A recent approach managed to reconstruct a word from neural activation patterns from auditory inputs [2]. Thus, one future scenario of this neural-symbolic vision is the application of SW and NLM models in the brain-computer interface.
Conclusion
Building on selected existing approaches, this article laid out a vision of a multilingual, transcultural, and multimodal Semantic Web (SW) utilizing Neural Language Models (NLMs). Joining forces of SW and NLMs can boost a wide variety of tasks, such as extracting data from different languages and channels, formally interlinking them, and verbalizing logical answers in natural language or sign language in response to multimodal queries.
The biggest challenge and at the same time opportunity is a seamless connection of and transition between multilingual, transcultural, and multimodal knowledge representations. Individual bridges across this big gap have been built, such as transitioning from natural language text to video and sign language. However, integrating transcultural representations requires further investigations. In fact, cultural modeling and transcultural alignments might be the pillar that requires most further construction work to provide a footing for this vision, which targets a truly unbiased and fully accessible Semantic Web.
One central opportunity of this vision is the fact that seamlessly accessible and dynamic SW technologies can foster not only cultural language evolution but simultaneously knowledge evolution. One key to this vision is thus an easily accessible representation mechanism – one that can easily be adopted by other communities, such as machine learning – that strongly embraces diversity and boosts diversity-aware AI, which in the end will foster robustness.
While the focus here was on benefits for SW technologies, some advantages that NLMs can obtain by joining forces with SW technologies have also been touched upon. For instance, injecting structured and formal knowledge into NLM architectures has shown improvements for machine translation and textual entailment tasks. Further investigation into the ability of SW technologies to support NLM tasks and applications would be interesting.
