OLiA – Ontologies of Linguistic Annotation

Abstract

This paper describes the Ontologies of Linguistic Annotation (OLiA), one of the data sets currently available as part of the Linguistic Linked Open Data (LLOD) cloud. Within the LLOD cloud, the OLiA ontologies serve as a reference hub for annotation terminology covering linguistic phenomena in a broad range of languages. They have been used to facilitate the interoperability and information integration of linguistic annotations in corpora, NLP pipelines and lexical-semantic resources, and to mediate their linking with multiple community-maintained terminology repositories.

Keywords

Linguistic terminology, grammatical categories, annotation schemes, corpora, NLP, tag sets, interoperability

1 Background

The heterogeneity of linguistic annotations has been recognized as a key problem limiting the interoperability and reusability of NLP tools and linguistic data collections. Several repositories of linguistic annotation terminology have been developed to facilitate annotation interoperability by means of a joint level of representation, or an ‘interlingua’, the most prominent probably being the General Ontology of Linguistic Description [12, GOLD] and the ISO TC37/SC4 Data Category Registry [20, ISOcat].

Still, these repositories are developed by different communities and are thus not always compatible with each other, either with respect to their definitions or with respect to the technologies used (e.g., there is no commonly agreed formalism for linking linguistic annotations to terminology repositories).

The Ontologies of Linguistic Annotation (OLiA) have been developed to support applications that benefit from a well-defined terminological backbone even before the GOLD and ISOcat repositories have converged into a generally accepted repository of reference terminology. They introduce an intermediate level of representation between ISOcat, GOLD and other repositories of linguistic reference terminology, and they are interconnected with these resources. Moreover, they provide means to formalize not only reference categories, but also annotation schemes and the way these are linked with reference categories.

2 Architecture

The Ontologies of Linguistic Annotation [3] represent a modular architecture of OWL2/DL ontologies that formalize the mapping between annotations, a ‘Reference Model’ and existing terminology repositories (‘External Reference Models’).

The OLiA ontologies were developed as part of an infrastructure for the sustainable maintenance of linguistic resources [28], and their primary fields of application include the formalization of annotation schemes and concept-based querying over heterogeneously annotated corpora [10,24].

In the OLiA architecture, four different types of ontologies are distinguished (cf. Fig. 1 for an example):

The OLiA Reference Model specifies the common terminology that different annotation schemes can refer to. It is derived from existing repositories of annotation terminology and extended in accordance with the annotation schemes that it was applied to.

Multiple OLiA Annotation Models formalize annotation schemes and tagsets. Annotation Models are based on the original documentation, so that they provide an interpretation-independent representation of the annotation scheme.

For every Annotation Model, a Linking Model defines ⊑ relationships between concepts/properties in the respective Annotation Model and the Reference Model. Linking Models are interpretations of Annotation Model concepts and properties in terms of the Reference Model.

Existing terminology repositories can be integrated as External Reference Models, if they are represented in OWL2/DL. Then, Linking Models specify ⊑ relationships between Reference Model concepts and External Reference Model concepts.

The OLiA Reference Model specifies classes for linguistic categories (e.g., olia:Determiner) and grammatical features (e.g., olia:Accusative), as well as properties that define relations between these (e.g., olia:hasCase).

Conceptually, Annotation Models differ from the Reference Model in that they include not only concepts and properties, but also individuals: individuals represent concrete tags, while classes represent abstract concepts similar to those of the Reference Model. Figure 1 gives an example of the individual PDAT from the STTS Annotation Model, the corresponding STTS concepts, and their linking with Reference Model concepts. Taken together, these allow the individual (and the part-of-speech tag it represents) to be interpreted as an olia:Determiner, etc.
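How the four model types combine can be sketched in a few lines of Python. This is a minimal sketch, not the OLiA tooling: it assumes each ontology is reduced to a set of ⊑ pairs between class names (ignoring all other OWL2/DL constructs) and computes superclasses by transitive closure, which is what a reasoner would derive from plain subclass axioms. The PDAT chain follows Fig. 1; stts:DemonstrativePronoun and the link from olia:Determiner to gold:Determiner are hypothetical placeholders.

```python
from collections import deque

# Each model is reduced to a set of (subclass, superclass) pairs, i.e. ⊑ axioms.
# The PDAT chain follows Fig. 1; stts:DemonstrativePronoun and the GOLD link
# are hypothetical placeholders used only for illustration.
ANNOTATION_MODEL = {("stts:AttributiveDemonstrativePronoun", "stts:DemonstrativePronoun")}
LINKING_MODEL = {("stts:AttributiveDemonstrativePronoun", "olia:DemonstrativeDeterminer")}
REFERENCE_MODEL = {("olia:DemonstrativeDeterminer", "olia:Determiner")}
EXTERNAL_LINKING_MODEL = {("olia:Determiner", "gold:Determiner")}  # hypothetical

# Individuals (concrete tags) are typed with Annotation Model classes.
INDIVIDUAL_TYPES = {"stts:PDAT": {"stts:AttributiveDemonstrativePronoun"}}


def superclasses(cls, *models):
    """Return every class reachable from cls via the ⊑ edges of the given models."""
    edges = {}
    for model in models:
        for sub, sup in model:
            edges.setdefault(sub, set()).add(sup)
    seen, queue = set(), deque([cls])
    while queue:  # breadth-first traversal = transitive closure of ⊑
        for sup in edges.get(queue.popleft(), ()):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return seen


def interpret(individual, *models):
    """Classes an individual belongs to, given whichever models are loaded."""
    classes = set()
    for cls in INDIVIDUAL_TYPES[individual]:
        classes |= {cls} | superclasses(cls, *models)
    return classes


core = (ANNOTATION_MODEL, LINKING_MODEL, REFERENCE_MODEL)
print(sorted(interpret("stts:PDAT", *core)))
print("gold:Determiner" in interpret("stts:PDAT", *core, EXTERNAL_LINKING_MODEL))
```

Because each model is passed in separately, dropping LINKING_MODEL leaves only STTS classes, and adding EXTERNAL_LINKING_MODEL extends the interpretation to the external repository without touching the Annotation Model.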

3 Data set description

The OLiA ontologies are available from http://purl.org/olia under a Creative Commons Attribution license (CC-BY).

The OLiA ontologies cover different grammatical phenomena, including inflectional morphology, word classes, phrase and edge labels of different syntax annotations, as well as prototypes for discourse annotations (coreference, discourse relations, discourse structure and information structure). Annotations for lexical semantics are only covered to the extent that they are found in syntactic and morphosyntactic annotation schemes. Other aspects of lexical semantics are beyond the scope of OLiA: existing reference resources for lexical semantics available in RDF include WordNet, VerbNet and FrameNet; their linking with OLiA is recommended as part of the lexicon model lemon [21] and has been implemented, for example, in lemonUby [11].

Since their first presentation [3], the OLiA ontologies have been continuously extended. At the time of writing, the OLiA Reference Model distinguishes 280 subclasses of MorphosyntacticCategory (word classes), 68 of SyntacticCategory (phrase labels), 18 of MorphologicalCategory (morphemes) and 7 of MorphologicalProcess, as well as 405 different values for 18 morphosyntactic, 5 syntactic and 6 semantic features (MorphosyntacticFeature, SyntacticFeature and SemanticFeature; used for glosses, part-of-speech annotation and edge labels in syntax annotation).

Fig. 1. Interpreting annotations in terms of the OLiA Reference Model.

As for morphological, morphosyntactic and syntactic annotations, the OLiA ontologies include 32 Annotation Models for about 70 different languages, including several multi-lingual annotation schemes, e.g., EAGLES [3] for 11 Western European languages, and MULTEXT/East [8] for 15 (mostly) Eastern European languages. As for non-(Indo-)European languages, the OLiA ontologies include morphosyntactic annotation schemes for languages of the Indian subcontinent, for Arabic, Basque, Chinese, Estonian, Finnish, Hausa, Hungarian and Turkish, as well as multi-lingual schemes applied to languages of Africa, the Americas, the Pacific and Australia. The OLiA ontologies also cover historical varieties, including Old High German, Old Norse and Old Tibetan. Additionally, 7 Annotation Models for different resources with discourse annotations have been developed.

External Reference Models currently linked to the OLiA Reference Model include GOLD [3], the OntoTag ontologies [2], an ontological remodeling of ISOcat [4], and the Typological Database System (TDS) ontologies [26]. Of these, only the TDS ontologies are currently available under an open (CC-BY) license,¹ but they focus on typological databases rather than on NLP and annotation interoperability. GOLD and ISOcat are intended to be available under an open license,² but currently cannot be redistributed because of an uncertain licensing situation (no explicit license information). The OntoTag ontologies are available to the author but have not been publicly released.

¹ http://languagelink.let.uu.nl/tds/ontology/LinguisticOntology.owl

² In fact, the developers are sympathetic to the idea of releasing this data under an open license (Helen Aristar-Dry for GOLD and Menzo Windhouwer for ISOcat, personal communication, June 2012).

In this context, the function of the OLiA Reference Model is not to provide a novel and independent view on linguistic terminology, but rather to serve as a stable intermediate representation between (ontological models of) annotation schemes and these terminology repositories. This allows any concept that can be expressed in terms of the OLiA Reference Model to be interpreted in the context of ISOcat, GOLD, OntoTag or TDS as well. OLiA aggregates annotation terminology as found in linguistic resources and provides a middle ground between these resources and the External Reference Models linked to it. We emphasize that OLiA is not meant as a substitute for any of these repositories; rather, it serves to facilitate their further harmonization and interoperability, as they are maintained by different communities and will remain in a continuous state of enrichment and specialization for the foreseeable future. Initial efforts towards their gradual convergence include the support in GOLD and ISOcat for linking mechanisms to external knowledge bases. In a GOLD context, for example, OLiA may be regarded as a Community-of-Practice Extension for the NLP community. From the perspective of ISOcat, it may be seen as an ontological view on annotation terminology among the otherwise unstructured data categories. Along with ontologies for other ISOcat profiles, e.g., metadata [35], OLiA may provide a seed for populating RELcat [29], an ongoing effort to provide structured views on ISOcat data.

Compared to a direct linking between Annotation Models and these terminology repositories, the modular structure has several advantages. First, it limits the number of linkings that need to be defined: if a new Annotation Model is linked to the Reference Model, it inherits the Reference Model's linking with ISOcat, GOLD, OntoTag and TDS. Second, it provides stability, whereas GOLD and ISOcat are developed in community processes with occasional revisions. Third, it offers a clear and non-redundant taxonomic organization, similar to GOLD, TDS and OntoTag but very different from the semi-structured ISOcat. Finally, it establishes interoperability between GOLD and ISOcat, which, despite ongoing harmonization efforts [19], are maintained by different communities and developed independently. Using the OLiA Reference Model, it is thus possible to develop applications that are interoperable in terms of both GOLD and ISOcat, even though both are still under development and differ in their conceptualizations. Such applications are briefly described in the following section.
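The first advantage is a matter of simple counting. Assuming, as a simplification, that one Linking Model is needed per pair of linked ontologies, linking n Annotation Models directly to m terminology repositories requires n·m Linking Models, whereas routing them through the Reference Model requires only n + m. The sketch below plugs in the 32 morphological, morphosyntactic and syntactic Annotation Models and the four External Reference Models mentioned in Sect. 3 purely for illustration; it counts Linking Models, not the number of axioms they contain.

```python
def direct_linkings(n_annotation_models: int, n_external_models: int) -> int:
    """Every Annotation Model is linked separately to every repository."""
    return n_annotation_models * n_external_models


def hub_linkings(n_annotation_models: int, n_external_models: int) -> int:
    """Each Annotation Model is linked once to the Reference Model, and the
    Reference Model is linked once to each External Reference Model."""
    return n_annotation_models + n_external_models


# Illustrative figures: 32 Annotation Models, 4 External Reference Models
# (GOLD, OntoTag, ISOcat, TDS).
n, m = 32, 4
print(direct_linkings(n, m), hub_linkings(n, m))  # prints: 128 36
# Extra linkings needed when one more Annotation Model is added:
print(direct_linkings(n + 1, m) - direct_linkings(n, m),
      hub_linkings(n + 1, m) - hub_linkings(n, m))  # prints: 4 1
```

The second line captures the inheritance argument: under the hub architecture, a new Annotation Model costs a single linking regardless of how many repositories are attached to the Reference Model.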

4 Application

Initially, the OLiA ontologies were intended to serve a documentation function, i.e., as a formal means to specify the semantics of annotation schemes [28]. From the ontologies, dynamic HTML can be generated,³ and tags in the annotation can be represented as hyperlinks pointing to the corresponding definition [10].

³ http://code.google.com/p/co-ode-owl-plugins/wiki/OWLDoc

In earlier corpus query systems, e.g., ANNIS [9], OLiA was used to formulate interoperable corpus queries. Assume we want to retrieve noun phrases from German newspaper corpora: instead of querying for cat="NX" on TüBa-D/Z [33] or cat="NP" on NEGRA [30], a query for cat in {olia:NounPhrase} was expanded into a disjunction of the possible tags and formatted according to the query language under consideration. Only if corpora are represented as Linked Data (which is currently the exception) can they be linked directly with OLiA Annotation Models and queried without a query preprocessor [6]. For non-RDF corpora, ontology-based query rewriting using OLiA can be applied as sketched above; it was implemented, for example, in a generic query framework for linguistic corpora in heterogeneous XML formats [24].
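The query expansion described above can be sketched as follows, assuming that each corpus's Annotation Model has already been combined with its Linking Model into a table from tags to Reference Model classes (superclass reasoning is omitted). The tag tables are simplified, partly hypothetical excerpts, and the `|`-separated output is a generic placeholder; a real preprocessor would emit the syntax of the target query language.

```python
# Simplified, hypothetical excerpts of two Annotation Models after linking:
# each phrase label maps to the OLiA Reference Model classes it is linked to.
TUEBA_DZ = {"NX": {"olia:NounPhrase"}, "PX": {"olia:PrepositionalPhrase"}}
NEGRA = {
    "NP": {"olia:NounPhrase"},
    "CNP": {"olia:NounPhrase"},  # coordinated NP, assumed to be linked to NounPhrase
    "PP": {"olia:PrepositionalPhrase"},
}


def expand(concept, annotation_model):
    """Tags whose linked Reference Model classes include the queried concept."""
    return sorted(tag for tag, classes in annotation_model.items() if concept in classes)


def rewrite(attribute, concept, annotation_model):
    """Rewrite `attribute in {concept}` as a disjunction of attribute-value tests.
    The output syntax is a generic placeholder, not that of a specific query language."""
    return " | ".join(f'{attribute}="{tag}"' for tag in expand(concept, annotation_model))


print(rewrite("cat", "olia:NounPhrase", TUEBA_DZ))
print(rewrite("cat", "olia:NounPhrase", NEGRA))
```

An empty expansion (a concept to which no tag of the corpus is linked) yields an empty string here; a real system would need to signal that the query cannot match.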

In a similar vein, OLiA can be employed in NLP pipeline systems for tagset-independent, interoperable information processing [2]. In this function, OLiA is part of the NLP Interchange Format (NIF) specification,⁴ where it is used to formalize linguistic annotations in a conceptually interoperable way. Using OLiA, the NLP2RDF platform developed on this basis unifies the outputs of various NLP tools and maps them into RDF, as currently implemented for English [17] and Korean [16].

⁴ http://persistence.uni-leipzig.org/nlp2rdf/

Figure 1 illustrates how annotations can be mapped onto Reference Model concepts for the German phrase Diese nicht neue Erkenntnis ‘this well-known (lit. not new) insight’ from the Potsdam Commentary Corpus [31, file 4794]. Given the information that its part-of-speech annotations follow the STTS scheme [27], we may consult the corresponding Annotation Model⁵ and find that the tag PDAT matches the string value of the property hasTag of the individual stts:PDAT. The associated class stts:AttributiveDemonstrativePronoun is a subconcept of olia:DemonstrativeDeterminer.⁶ The word diese ‘this’ from the example can thus be described in terms of the OLiA Reference Model as olia:DemonstrativeDeterminer, etc.

⁵ http://purl.org/olia/stts.owl

⁶ http://purl.org/olia/stts-link.rdf

These ontology-based descriptions are comparable across different corpora and/or NLP tools, across different languages, and even across different types of language resources: recently, the OLiA ontologies have also been applied to represent grammatical specifications in machine-readable dictionaries, which are thus interoperable with OLiA-linked corpora [11,21]. Moreover, through the linking with External Reference Models, OLiA-linked resources are also interoperable with resources directly grounded in GOLD, ISOcat, etc.

Using Semantic Web formalisms to represent corpora and annotations also makes it possible to develop novel, ontology-based NLP algorithms. One application is ensemble combination architectures, in which different NLP modules (say, part-of-speech taggers) are applied in parallel to produce annotations for the same phenomenon, and these annotations are then integrated. Using OLiA Reference Model specifications to integrate the analyses of multiple NLP tools for German, [5] showed that a simple majority-based combination increased both the robustness and the level of detail of morphosyntactic and morphological analyses: despite rigid ontological consistency constraints, abstracting from tool-specific representations and integrating the different annotations on this basis increased recall. Similar results have been obtained with the OntoTag ontologies for Spanish [23].
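A simplified majority-based combination over OLiA classes might look as follows. This is not the algorithm of [5]: it assumes each tool's tag has already been mapped to a set of Reference Model classes, uses a fixed vote threshold, and models the consistency constraints as a single hypothetical disjointness axiom. All class names and tool outputs are illustrative.

```python
from collections import Counter

# Hypothetical disjointness axiom between two OLiA Reference Model classes,
# standing in for the ontological consistency constraints mentioned above.
DISJOINT = [frozenset({"olia:CommonNoun", "olia:ProperNoun"})]


def combine(analyses, min_votes=2):
    """Combine several tools' analyses of one token.

    Each analysis is the set of Reference Model classes that one tool's tag is
    interpreted as (superclasses included). A class is kept if at least
    `min_votes` tools support it; if two kept classes are declared disjoint,
    the less-supported one is dropped, and on a tie both are dropped.
    """
    votes = Counter(cls for analysis in analyses for cls in analysis)
    kept = {cls for cls, n in votes.items() if n >= min_votes}
    for pair in DISJOINT:
        if pair <= kept:
            a, b = sorted(pair)
            if votes[a] == votes[b]:
                kept -= pair  # tie: no consistent decision possible
            else:
                kept.discard(a if votes[a] < votes[b] else b)
    return kept


# Four hypothetical tool outputs for the German noun "Erkenntnis":
tools = [
    {"olia:Noun", "olia:CommonNoun"},                                    # coarse tagger
    {"olia:Noun", "olia:CommonNoun", "olia:Feminine"},                   # morphology
    {"olia:Noun", "olia:CommonNoun", "olia:Feminine", "olia:Singular"},  # parser
    {"olia:Noun", "olia:ProperNoun"},                                    # erroneous tool
]
print(sorted(combine(tools)))
```

Because votes are counted per class rather than per tag string, the result keeps olia:Feminine, a detail the coarse tagger never produced, while the outlier analysis olia:ProperNoun is outvoted.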

We see possible applications of this technology in situations where multiple, domain-specific NLP tools are available. In a monolingual setting, this may be the case where rule-based morphologies [34] or parsers [32] are to be combined with robust statistical part-of-speech taggers, whose coarse-grained tagsets cannot be trivially mapped onto the detailed annotations provided by deep, rule-based systems. Here, OLiA representations make it possible to combine tools of different granularity. We are currently experimenting with multilingual annotation projection, where annotations are projected from multiple source languages onto parallel (translated) text in a less-resourced language. Using conceptual representations instead of complex tag strings as the basis of projection, different types of information can be drawn from different sources, e.g., aspect from Russian verbal morphology and definiteness from English morphosyntax. For a (hypothetical) language that exhibits both features, information present in both sets of projected annotations may be adopted, merged and, most importantly, pruned using OLiA specifications.
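The adopt, merge and prune steps can be sketched as follows. The sketch assumes that projected annotations have already been converted into OLiA-style property/value pairs and that a profile of the target language's features is available; the feature names, the target profile and the pruning policy (discarding conflicting values) are illustrative assumptions rather than a description of our experimental setup.

```python
# Projected annotations for one target-language token, expressed as
# OLiA-style property/value pairs. All names are illustrative assumptions.
from_russian = {"olia:hasAspect": "olia:Perfective", "olia:hasNumber": "olia:Singular"}
from_english = {"olia:hasDefiniteness": "olia:Definite", "olia:hasNumber": "olia:Plural"}

# Hypothetical profile of the features the target language is assumed to express.
TARGET_FEATURES = {"olia:hasAspect", "olia:hasDefiniteness", "olia:hasNumber"}


def merge_projections(sources, target_features):
    """Adopt features from any source, merge agreeing values, and prune
    (a) features the target language does not express and (b) features
    for which the sources disagree."""
    merged, conflicts = {}, set()
    for source in sources:
        for feature, value in source.items():
            if feature not in target_features:
                continue  # pruned: not expressed in the target language
            if feature in merged and merged[feature] != value:
                conflicts.add(feature)  # pruned below: sources disagree
            merged.setdefault(feature, value)
    return {f: v for f, v in merged.items() if f not in conflicts}


print(merge_projections([from_russian, from_english], TARGET_FEATURES))
```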

5 Discussion

This paper summarized the development of the OLiA ontologies since 2006, their current status, and a number of applications that have been developed on this basis.

The fundamental idea of the OLiA architecture is that annotation schemes are linked to community-maintained terminology repositories through an intermediate ‘Reference Model’, thereby minimizing the number of mappings necessary to establish interoperability of one annotation scheme with multiple terminology repositories. Further, annotation schemes and their linking to the Reference Model are formalized as separate OWL2/DL ontologies, so that interpretation-independent conceptualization (annotation documentation) and its interpretation in terms of the Reference Model (linking) are properly distinguished.

The OLiA ontologies differ from related approaches in that they focus on modeling annotation schemes and their linking with reference categories rather than merely providing reference categories. The differentiation between Annotation Models, the OLiA Reference Model and External Reference Models (community-maintained terminology repositories) represents increasing levels of abstraction and, possibly, loss of information. However, because the Annotation Models are retained, no information about the original annotation is lost, and tools may choose the appropriate level of abstraction. Unlike the direct mapping approach apparently favored by GOLD and ISOcat, OLiA makes it possible to recover information about the sources of mismatches between Reference Model concepts and Annotation Model concepts, and its declarative linking supports inspection and refinement using standard RDF/OWL tools.

The relationship between annotations and reference concepts is not only represented transparently; conceptual mismatches can be represented as well. When confronted with conceptual overlap/fusion or ambiguity, many tagsets for, say, part-of-speech annotation introduce hybrid categories (e.g., IN for preposition or subordinating conjunction, or TO for all functions of English to [25]) or tagset-specific notations, e.g., | for ambiguities [25] or + for cliticization/fusion [14]. As opposed to such ad-hoc solutions, which may or may not be transparent to tagset users, OWL2/DL provides a W3C-standardized vocabulary to formalize these relationships that also extends beyond individual tagsets: ambiguity can be modeled as disjunction (⊔), conceptual overlap/fusion as conjunction (⊓).
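The difference between the two constructors can be illustrated with a small, hypothetical evaluator. It assumes tag interpretations are given as class expressions built from named classes with ⊔ and ⊓, and checks them against the classes asserted for a token under a closed-world reading, which is a simplification of the open-world semantics of OWL2/DL. Negation is omitted; tag names follow the cited tagsets, while the OLiA class names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str     # a named Reference Model class


@dataclass(frozen=True)
class Or:
    parts: tuple  # ⊔: ambiguity


@dataclass(frozen=True)
class And:
    parts: tuple  # ⊓: conceptual overlap/fusion


def holds(expr, classes):
    """Evaluate a class expression against the set of classes asserted for one
    token. Closed-world simplification; an OWL reasoner uses open-world semantics."""
    if isinstance(expr, Atom):
        return expr.name in classes
    if isinstance(expr, Or):
        return any(holds(part, classes) for part in expr.parts)
    if isinstance(expr, And):
        return all(holds(part, classes) for part in expr.parts)
    raise TypeError(f"unsupported expression: {expr!r}")


# Penn Treebank IN: preposition or subordinating conjunction (ambiguity).
IN = Or((Atom("olia:Preposition"), Atom("olia:SubordinatingConjunction")))
# Brown PPSS+MD (e.g., "I'll"): personal pronoun fused with a modal (fusion).
PPSS_MD = And((Atom("olia:PersonalPronoun"), Atom("olia:ModalVerb")))

print(holds(IN, {"olia:Preposition"}))           # one disjunct suffices
print(holds(PPSS_MD, {"olia:PersonalPronoun"}))  # fusion requires both parts
```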

Moreover, negation (¬) is available in OWL2/DL. This is of particular importance for the linking between External Reference Models and the OLiA Reference Model. For example, an olia:ProQuantifier (a pronominal quantifier that can substitute for an independent noun phrase, e.g., someone) can be defined as a subclass of gold:Quantifier. According to its definition, however, gold:Quantifier pertains to determiners only, so a more appropriate superclass would be gold:Quantifier ⊓ ¬gold:Determiner.

The physical separation of Linking Models from the Annotation Models and the Reference Model introduces a clear distinction between externally provided information and the ontology engineer’s interpretation. Annotation Models formalize annotation documentation, and the Reference Model is based on a generalization over a broad range of resources. However, different terminological traditions may be involved, so that apparently similar concepts found in the Reference Model and an Annotation Model are in fact unrelated. If an incorrect identification nevertheless takes place, the linking can be inspected with existing ontology browsers and corrected independently of the interpretation-invariant Annotation Model and Reference Model. Furthermore, multiple alternative linkings between an Annotation Model and the Reference Model can be implemented, e.g., to accommodate systematic tagger errors (i.e., more extensive use of ⊔) or multiple dialects of the same tagset (e.g., the STTS tagset distinguishes indefinite attributive pronouns in indefinite noun phrases [PIAT] and in definite noun phrases [PIDAT], but in the TüBa-D/Z corpus, PIAT covers both uses).
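Selecting among alternative linkings can be sketched as a lookup keyed by tagset dialect. The sketch assumes each Linking Model has been reduced to a table from tags to sets of Reference Model classes, with multi-element sets read as disjunctions (⊔); the ex: class names are hypothetical placeholders rather than actual OLiA classes.

```python
# Two alternative Linking Models for the same STTS Annotation Model. Each maps a
# tag to the set of Reference Model classes it may denote; more than one class
# stands for a disjunction (⊔). The ex: class names are hypothetical placeholders.
IN_INDEF_NP = "ex:IndefiniteAttributivePronounInIndefiniteNP"
IN_DEF_NP = "ex:IndefiniteAttributivePronounInDefiniteNP"

LINKINGS = {
    "stts": {"PIAT": {IN_INDEF_NP}, "PIDAT": {IN_DEF_NP}},
    # TüBa-D/Z dialect: PIAT covers both uses, so it is linked to a disjunction.
    "tueba-dz": {"PIAT": {IN_INDEF_NP, IN_DEF_NP}},
}


def interpret(tag, dialect):
    """Interpret a tag with the Linking Model selected for a corpus or tool.
    The Annotation Model and the Reference Model are shared and unchanged."""
    return LINKINGS[dialect].get(tag, set())


print(sorted(interpret("PIAT", "stts")))
print(sorted(interpret("PIAT", "tueba-dz")))
```

Only the Linking Model differs between the two dialects, which is what allows such corrections without editing the documentation-based Annotation Model.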

In ISOcat, the problem of conflicting interpretations of data categories is currently not addressed, and the definitions provided are not always sufficient to distinguish classes. For example, the category definite/DC-2004 is defined as ‘value referring to the capacity of identification of an entity’. The concept is (at least partially) grounded in MULTEXT/East [15], which, however, conflates different uses of ‘definite’: (1) postfixed determiners in Romanian, Bulgarian and Persian nouns or adjectives, (2) the difference between ‘full’ and ‘reduced’ adjectives in Slavic (diachronically, full forms reflect a clitic pronominal), (3) a pattern of quantifier agreement in Slavic, and (4) the so-called ‘definite conjugation’ of Hungarian verbs. Even though the generic definition captures most of these different meanings, they remain incompatible with each other. Without modeling relations between different language-specific annotation schemes and a data category registry from a global perspective, such ill-defined data categories and/or links may remain undetected.⁷

⁷ Providing a top-down perspective does not automatically disclose such inconsistencies, but the resulting dialog between tagset provider and ontology developer may facilitate their detection, as in the example given above.

Within MULTEXT/East, for example, only the ontological modeling of language-specific annotation schemes and the common morphosyntactic specifications led to the proper differentiation between these different conceptions of ‘definite’ [8]. The OLiA Reference Model provides such a fully developed taxonomy of linguistic categories. Recent activities to augment ISOcat with a relation category registry [29] may eventually lead to a comparable global perspective, so that the problem of conflicting interpretations of data categories may become more obvious to the ISOcat community, but these are still ongoing developments.

In comparison to GOLD, OLiA is more focused on NLP and corpus interoperability, whereas GOLD originates from the language documentation community. Therefore, a number of data categories commonly assumed in NLP were not originally represented in GOLD; for example, gold:CommonNoun was added only following a request by the first author. While the GOLD community process will eventually compensate for such coverage issues, a more fundamental problem is that the views of academic linguists and NLP engineers may deviate with respect to the overarching taxonomy of concepts. GOLD, for example, seems to conflate semantic roles (‘case’ in the sense of [13], e.g., gold:BenefactiveCase) and syntactic roles under gold:CaseProperty. Therefore, OLiA adopts a relatively agnostic view on the taxonomic order of concepts. While the taxonomy is modeled in a specific way (mostly following established annotation schemes), it is not assumed that this is the only possible way of modeling it. In fact, alternative taxonomies can be formulated as External Reference Models, and OWL2/DL allows one to formulate specific conditions for the linking, including the use of negation and disjunction. This complements the concept of Community-of-Practice Extensions in GOLD, which presuppose GOLD as an upper model providing the top-level categorizations for dependent ontologies, whereas OLiA remains agnostic about which External Reference Model is to be adopted.

Conceptually, the OLiA ontologies are more closely related to the OntoTag ontologies [1], which were also applied to develop NLP applications on the basis of ontological representations of linguistic annotations [23]. One important difference is that the OntoTag ontologies consider only Iberian Romance languages (in particular Spanish). Further, they are partially designed from a top-down perspective (whereas the development of the OLiA Reference Model is guided by the annotation schemes it is applied to) and are thus richer in consistency constraints (which are, however, often language-specific). Finally, the OntoTag ontologies are not publicly available at the moment. Within the OLiA architecture, the morphosyntactic layer of the OntoTag ontologies is integrated as an External Reference Model [2].

The OLiA ontologies may play an important role in NLP, corpus and annotation interoperability in that they relate these activities to initiatives in different linguistic communities to establish reference repositories for linguistic annotation terminology, e.g., recent developments towards the creation of a Linguistic Linked Open Data (LLOD) cloud.⁸ In this context, the OLiA ontologies are used to provide linguistic reference terminology for lexical-semantic resources such as lemon [22] and Uby [11], as well as for linguistic corpora such as the Manually Annotated Sub-Corpus of the Open American National Corpus [18].⁹

⁸ http://linguistics.okfn.org/llod

⁹ http://datahub.io/dataset/masc

Acknowledgments

We would like to thank Menzo Windhouwer, Steve Cassidy and four anonymous reviewers for valuable feedback and comments on this paper and its immediate predecessor [7]; beyond this, we thank OLiA users, contributors and funders. OLiA was originally developed at the Collaborative Research Center (SFB) 441 “Linguistic Data Structures” (University of Tübingen, Germany) in the context of the project “Sustainability of Linguistic Resources”, in cooperation with SFB 632 “Information Structure” (University of Potsdam, Humboldt-University Berlin, Germany) and SFB 538 “Multilingualism” (University of Hamburg, Germany), from 2006 to 2008. From 2007 to 2011, it was maintained and further developed at SFB 632 in the context of the project “Linguistic Data Base”. In 2012, the first author continued his research on OLiA in the context of a PostDoc fellowship at the Information Sciences Institute of the University of Southern California, funded by the German Academic Exchange Service (DAAD). The work of the second author was conducted in the context of the LOEWE cluster “Digital Humanities” at the Goethe-University Frankfurt (2011–2014).

In parts, this data set description is based on [7], shortened, updated and thoroughly revised.

References

[1] Aguado de Cea, Gomez-Perez, Alvarez de Mon and Pareja-Lora, OntoTag’s linguistic ontologies: Improving semantic web annotations for a better language understanding in machines, in: Proc. of the International Conference on Information Technology: Coding and Computing (ITCC’04), Vol. 2, IEEE Computer Society, Washington, DC, 2004, pp. 124–128.

[2] Buyko, Chiarcos and Pareja-Lora, Ontology-based interface specifications for a NLP pipeline architecture, in: Proc. of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, Calzolari, Choukri, Maegaard, Mariani, Odijk, Piperidis and Tapias, eds, European Language Resources Association (ELRA), 2008.

[3] Chiarcos, An ontology of linguistic annotations, GLDV-Journal for Language Technology and Computational Linguistics 23(1) (2008), 1–16, http://www.jlcl.org/2008_Heft1/Chiarcos.pdf.

[4] Chiarcos, Grounding an ontology of linguistic annotations in the data category registry, in: Proc. of the LREC 2010 Workshop Language Resource and Language Technology Standards (LT&LTS). State of the Art, Emerging Needs, and Future Developments, Valletta, Malta, Budin, L.R.T. Declerck and Wittenburg, eds, 2010, pp. 37–40.

[5] Chiarcos, Towards robust multi-tool tagging. An OWL/DL-based approach, in: Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, Hajic, Carberry and Clark, eds, Association for Computational Linguistics, 2010, pp. 659–670.

[6] Chiarcos, Interoperability of corpora and annotations, in: Linked Data in Linguistics, Chiarcos, Nordhoff and Hellmann, eds, Springer, Heidelberg, 2012, pp. 161–179.

[7] Chiarcos, Ontologies of linguistic annotation: Survey and perspectives, in: Proc. of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, Calzolari, Choukri, Declerck, M.U. Doğan, Maegaard, Mariani, Moreno, Odijk and Piperidis, eds, European Language Resources Association (ELRA), 2012, pp. 303–310.

[8] Chiarcos and Erjavec, OWL/DL formalization of the MULTEXT-East morphosyntactic specifications, in: Proc. 5th Linguistic Annotation Workshop (LAW 2011), Portland, Oregon, Ide, Meyers, Pradhan and Tomanek, eds, Association for Computational Linguistics, 2011, pp. 11–20.

[9] Chiarcos and Götze, A linguistic database with ontology-sensitive corpus querying, in: System demonstration at Datenstrukturen für linguistische Ressourcen und ihre Anwendungen, Frühjahrstagung der Gesellschaft für Linguistische Datenverarbeitung (GLDV 2007), Tübingen, Germany, 2007.

[10] Chiarcos, Dipper, Götze et al., A flexible framework for integrating annotations from different tools and tag sets, TAL (Traitement Automatique des Langues) 49(2) (2008), 217–246.

[11] Eckle-Kohler, McCrae and Chiarcos, lemonUby – A large, interlinked, syntactically-rich resource for ontologies, Semantic Web Journal (2013), Special Issue on Multilingual Linked Open Data.

[12] Farrar and D.T. Langendoen, An OWL-DL implementation of GOLD: An ontology for the Semantic Web, in: Linguistic Modeling of Information and Markup Languages: Contributions to Language Technology, A.W. Witt and Metzing, eds, Springer, Dordrecht, The Netherlands, 2010.

[13] C.J. Fillmore, The case for case, in: Universals in Linguistic Theory, Bach and Harms, eds, Holt, Rinehart, and Winston, New York, 1968, pp. 1–88.

[14] W.N. Francis and Kucera, Brown corpus manual, Tech. rep., Department of Linguistics, Brown University, Providence, Rhode Island, 1979, http://icame.uib.no/brown/bcm.html.

[15] Francopoulo, Declerck, Sornlertlamvanich et al., Data Category Registry: Morpho-syntactic and syntactic profiles, in: Proc. LREC-2008 Workshop on Uses and Usage of Language Resource-Related Standards, Marrakech, Morocco, Witt, Sasaki, Teich, Calzolari and Wittenburg, eds, 2008, pp. 31–40.

[16] Hahm, Lim, Park, Yoon and K.S. Choi, Korean NLP2RDF resources, in: Proc. 10th Workshop on Asian Language Resources (ALR 2012), Mumbai, India, Weerasinghe, Hussain, Sornlertlamvanich and R.E.O. Roxas, eds, The COLING 2012 Organizing Committee, 2012, pp. 1–10.

[17] Hellmann, Lehmann, Auer and Brümmer, Integrating NLP using Linked Data, in: Proc. 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, Alani, Kagal, Fokoue, Groth, Biemann, J.X. Parreira, Aroyo, Noy, Welty and Janowicz, eds, Lecture Notes in Computer Science, Vol. 8219, Springer, 2013, pp. 98–113.

[18] Ide, Baker, Fellbaum, Fillmore and Passonneau, MASC: The Manually Annotated Sub-Corpus of American English, in: Proc. of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, Calzolari, Choukri, Maegaard, Mariani, Odijk, Piperidis and Tapias, eds, European Language Resources Association (ELRA), 2008, pp. 68–73, http://www.lrec-conf.org/proceedings/lrec2008/.

[19] Kemps-Snijders, RELISH: Rendering endangered languages lexicons interoperable through standards harmonisation, in: 7th SaLTMiL Workshop on Creation and Use of Basic Lexical Resources for Less-resourced Languages, Valletta, Malta, 2010.

[20] Kemps-Snijders, Windhouwer, Wittenburg and Wright, ISOcat: Remodelling metadata for language resources, International Journal of Metadata, Semantics and Ontologies 4(4) (2009), 261–276.

[21] McCrae, Spohr and Cimiano, Linking lexical resources and ontologies on the semantic web with lemon, in: Proc. 8th Extended Semantic Web Conference (ESWC 2011), Heraklion, Greece, 2011, pp. 245–259.

[22] McCrae, Montiel-Ponsoda and Cimiano, Integrating WordNet and Wiktionary with lemon, in: Linked Data in Linguistics, Chiarcos, Nordhoff and Hellmann, eds, Springer, Heidelberg, 2012, pp. 25–34.

[23] Pareja-Lora and Aguado de Cea, Ontology-based interoperation of linguistic tools for an improved lemma annotation in Spanish, in: Proc. of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta, Calzolari, Choukri, Maegaard, Mariani, Odijk, Piperidis, Rosner and Tapias, eds, European Language Resources Association (ELRA), 2010.

[24] Rehm, Chiarcos, Eckart and Dellert, Ontology-based XQuery’ing of XML-encoded language resources on multiple annotation layers, in: Proc. of the Sixth International Conference on Language Resources and Evaluation (LREC’08), Marrakech, Morocco, Calzolari, Choukri, Maegaard, Mariani, Odijk, Piperidis and Tapias, eds, European Language Resources Association (ELRA), 2008, http://www.lrec-conf.org/proceedings/lrec2008/.

[25] Santorini, Part-of-speech tagging guidelines for the Penn Treebank Project, Tech. rep., Department of Computer and Information Science, University of Pennsylvania, 1990.

[26] Saulwick, Windhouwer, Dimitriadis and Goedemans, Distributed tasking in ontology mediated integration of typological databases for linguistic research, in: Advanced Information Systems Engineering, Proc. of the 17th International Conference, CAiSE 2005, Porto, Portugal, June 13–17, 2005, Pastor and J.F. e Cunha, eds, Springer, 2005, pp. 303–317.

[27] Schiller, Teufel, Stöckert and Thielen, Guidelines für das Tagging deutscher Textcorpora mit STTS, Tech. rep., Universities of Stuttgart and Tübingen, Germany, 1999.

[28] Schmidt, Chiarcos, Lehmberg, Rehm, Witt and Hinrichs, Avoiding data graveyards: From heterogeneous data collected in multiple research projects to sustainable linguistic resources, in: Proc. E-MELD Workshop on Digital Language Documentation, East Lansing, Michigan, 2006.

[29] Schuurman and Windhouwer, Explicit semantics for enriched documents. What do ISOcat, RELcat and SCHEMAcat have to offer?, in: Proc. 2nd Supporting Digital Humanities Conference (SDH 2011), Copenhagen, Denmark, 2011.

[30] Skut, Brants, Krenn and Uszkoreit, A linguistically interpreted corpus of German newspaper text, in: Proc. ESSLLI Workshop on Recent Advances in Corpus Annotation, Saarbrücken, Germany, 1998.

[31] Stede, The Potsdam Commentary Corpus, in: Proc. ACL-2004 Workshop on Discourse Annotation, Barcelona, Spain, Webber and D.K. Byron, eds, ACL, 2004, pp. 96–102.

[32] Tapanainen and Järvinen, A nonprojective dependency parser, in: Proc. 5th Conference on Applied Natural Language Processing (ANLP 1997), Washington, DC, 1997, pp. 64–71.

[33] Telljohann, E.W. Hinrichs and Kübler, Stylebook for the Tübingen treebank of written German (TüBa-D/Z), Tech. rep., Seminar für Sprachwissenschaft, Universität Tübingen, Tübingen, Germany, 2003.

[34] Zielinski and Simon, Morphisto: An open-source morphological analyzer for German, in: Proc. of the Conference on Finite State Methods in Natural Language Processing (FSMNLP), Ispra, Italy, Piskorski, Watson and Yli-Jyrä, eds, IOS Press, 2008.

[35] Zinn, Hoppermann and Trippel, The ISOcat registry reloaded, in: Proc. 9th Extended Semantic Web Conference (ESWC 2012), Heraklion, Greece, Simperl, Cimiano, Polleres, Corcho and Presutti, eds, Springer, 2012, pp. 27–31.
