Abstract
This paper presents the new developments of CRMtex, an ontological model based on CIDOC CRM, created to describe ancient texts and other semiotic features appearing on inscriptions, papyri, manuscripts and other similar supports. The model is also designed to describe in a formal way the phenomena related to the production, use, conservation, study and interpretation of textual entities. CRMtex was originally intended to detect the close relationship linking ancient texts with the physical objects on which they are supported, the tools and writing systems used for their production, and the various scientific investigations and readings carried out on the text by modern scholars. It eventually evolved to provide researchers with the fundamental concepts for the correct and complete rendering of textual objects, the events representing their history and the cultural and social environments in and for which they were created. The full compatibility of CRMtex with the CIDOC CRM ontology and its extensions ensures persistent interoperability of data encoded by means of its entities with other semantic information produced in cultural heritage and digital humanities. The new entities presented in this paper deal more closely with textual and intertextual structures and try to deepen the close relationships existing between fragments of text or sequences of signs and the underlying meaning they were originally intended to convey.
Introduction
An intense debate has recently been animating the world of epigraphists and papyrologists about the need to find, or possibly develop, conceptual models able to express the complex entities of their domains in a semantically rich encoding, and to establish interoperability of their data with those generated, for example, in the areas of archaeological, historical and linguistic studies. The large-scale integration effort undertaken by Papyri.info [36] and Trismegistos [42], and the various attempts made by projects such as EAGLE [15] to develop a semantic model in the field of epigraphy, testify to a constant and growing interest in the use of advanced and efficient conceptual tools for the generation of standardised, integrated and interoperable information in these disciplines. In the epigraphic world, another important initiative, Epigraphy.info [19], aimed at establishing a collaborative environment for digital epigraphy, is trying to raise awareness in the community of epigraphists about the importance of publishing information in a uniform format and, possibly, in a Semantic Web fashion. This initiative has the merit of having brought all the major players in the epigraphic world around a table and having directed their efforts towards the development of ecosystems in which epigraphic data coming from different sources can be easily retrieved and analysed.
In the same perspective, many European and international initiatives are also focussing their attention on information concerning ancient texts and on the interoperability challenges in which they are involved. The recently completed PARTHENOS project [37] has placed interdisciplinarity at the centre of its activities by designing a system in which historical, archaeological and linguistic data coexist in a single digital environment. ARIADNEplus [1,33], an initiative recently started as a continuation of the first, successful, ARIADNE project, is also attempting to integrate archaeological information and data from other disciplines, with particular regard to the study of archaeological artefacts bearing inscriptions, including amphorae, coins and other similar objects, with the clear intent of creating an interoperable archive based on FAIR principles [21] and international standards. ARIADNEplus is also looking for an ontology or application profile capable of relating textual and archaeological data in a consistent manner. This is one of the gaps that our work aims to fill.
In two of our previous publications [22,24] we tried to give an account of what had been done in the field of epigraphy and what tools had been used to describe, in semantic format, textual entities created in antiquity. In these publications, we also laid the foundations for the definition of a semantic model (CRMepi, later expanded to become CRMtex) centred on the semantic definition of the ancient text and the description of its multifaceted aspects. In the present paper, after a quick review of similar recent initiatives, we present the latest developments of the CRMtex model and the conceptual considerations that underlie its evolution.
Looking for new semantic tools
Ontologies and application profiles: Work in progress
Despite the great interest that many scientific communities show in the tools of the Semantic Web, it is interesting to note that some wide-ranging initiatives such as EPIDAT [29], whose purpose is to publish epigraphic data in LOD format, complain of the absence of an ontology able to confer semantic value on epigraphic information. However, a growing number of activities conducted by groups interested in ontologies for ancient texts have emerged in recent years.
The ontological approach is also pursued by some major players in the field of epigraphy with various degrees of success: the Epigraphic Database Heidelberg [18], for instance, has released a very basic ontology for the encoding of its vast digital repertoire in Linked Open Data format; however, its model still seems less suitable as a tool for deep integration.
The Economics and Political Network project (EPNet) [3] is an interesting initiative building an ontological model based on CIDOC CRM to deal with the events and objects connected with the distribution of food in the Roman world. The EPNet ontology looks promising and has already been investigated by the ARIADNEplus project as a prospective part of the application profile for epigraphic data. With the same intent, the Epigraphic Ontology Working Group (EpOnt) [20] is trying to establish an application profile based on concordance of ontologies for recording epigraphic editions. The initiative is interesting and we believe it will produce results very soon.
It should be noted that all these initiatives aim at developing very specific tools for solving the problems concerning the disciplines for which they are conceived. None of them aims to give a common conceptual basis or to look for points of contact, which also exist between these various disciplines, keeping in mind the objective of interoperability.
CRMtex was designed as an ontological model since no existing model is able to investigate thoroughly the textual entities from antiquity, their intrinsic nature as primary sources of knowledge and their link with the archaeological, artistic and historical spheres that make them so precious for the comprehension of the ancient world. In fact, the existing models have been unable to describe the various nuances of a text, from the physical aspect, a set of features created with particular techniques, materials and tools, to the semantic and conceptual aspects, whereby it bears a message that, by means of these same features, is transmitted and disseminated through time and space.
CRMtex was created precisely to respond to such needs and to provide tools for modelling textual entities appearing in different contexts in a standard way. These needs justify the use of a tool that, like the whole CIDOC CRM ecosystem [5], certainly requires a considerable investment in expertise, but which gives its data a richness and a level of interoperability that is difficult to achieve using other systems. The solid foundations of the CIDOC CRM, on which it is built, already provide the necessary top-level classes and properties to model objects, events, actors, spatial and temporal entities in a standard way, leaving the classes and properties of CRMtex free to focus on the issues concerning text and its material production in antiquity [2]. This same compatibility is what allows the model to define textual entities as
EpiDoc: A de facto standard for ancient texts
It should be emphasised that epigraphists and papyrologists have long since chosen TEI EpiDoc [17] as their own metadata standard, as this tool is extremely versatile for representing texts and the phenomena that typically characterise them, with particular attention to the needs of a rich and well-rendered visualisation.
EpiDoc, which for the treatment of the text is based on the Leiden conventions [30], provides a series of tags for detecting specific elements, since the text itself may contain semantically relevant information that needs to be captured in some way [16]. Interesting examples in this sense are the tags that identify temporal entities, actors and place names, which give EpiDoc the ability to bind external semantic elements starting from identifiable textual fragments.
Nevertheless, it should also be noted that EpiDoc does not offer the typical descriptive tools used by ontologies to capture the conceptual
CRMtex: An ontology for ancient texts
The need to create a new ontology for ancient texts started from the assumption that, unlike printed texts, non-mechanised written texts (including inscriptions, papyri and manuscripts) have specific features that must be taken into account for their study.
We have based our model on the solid foundations of CIDOC CRM because it constitutes one of the most widely used ontologies in the field of Cultural Heritage. In its core version, CIDOC CRM already provides most of the entities necessary to model common elements such as actors, objects, places, events and their mutual interrelations on a chronological basis.
CRMtex has its foundation in the semiotic aspects of language and text [8]; the core concept of our model is therefore the notion of “text” as the product of a semiotic process involving an encoding (“writing”) and a decoding (“reading”) process. Writing is in turn a particularly sophisticated human technology allowing the encoding of a linguistic message through a series of signs specifically selected for this purpose [27,28].
A detailed investigation of the close relationship that links the text with the writing event first requires some considerations that clarify its nature.
Although every speech can be transposed into an equivalent written message, and
In this semiotic perspective, it is worth considering that in writing, as in the analysis of the linguistic system, it is necessary to distinguish the concrete level of the personal execution (i.e., the real act of tracing signs on a surface) from the abstract level, to which all the single occurrences must be traced back on the basis of a principle of identity or sameness (e.g., the identification of an “A”, independently of the peculiar shape someone may give to it).
Thus, a “text” is constituted by a number of signs physically traced (i.e.,
Because of their non-mechanised origin, ancient texts are unique and unrepeatable entities; in addition, along with their support, they form an inextricably linked, unique object of study. Thus from a conceptual point of view, whether it is painted, written in ink or engraved, a text preserves its physical nature, which is a feature deriving its existence from its strict dependence on the support on which it is located.
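The distinction drawn above between the concrete execution of a sign and the abstract unit with which it is identified can be illustrated with a minimal Python sketch. The class name `Glyph`, its fields and the sample observations are assumptions made for illustration only; they are not CRMtex identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One physically traced sign, i.e. a concrete occurrence on the support."""
    position: int    # order of the sign within the text
    shape_note: str  # free-text remark on how this occurrence was executed
    grapheme: str    # abstract unit the reader identifies the sign with


# Hypothetical observations: three differently shaped signs, two read as "A".
glyphs = [
    Glyph(0, "broad strokes, no crossbar", "A"),
    Glyph(1, "narrow, slanted", "V"),
    Glyph(2, "angular crossbar", "A"),
]


def group_by_grapheme(observed):
    """Collect the positions of all occurrences under their abstract unit."""
    groups = defaultdict(list)
    for glyph in observed:
        groups[glyph.grapheme].append(glyph.position)
    return dict(groups)


occurrences = group_by_grapheme(glyphs)
print(occurrences)
```

Each `Glyph` keeps its own description, so the uniqueness of the individual occurrence is preserved even when several occurrences are identified as the same unit.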
CRMtex provides specific entities to describe all these phenomena and, being an extension, takes advantage of the power of CIDOC CRM and its other extensions (e.g., CRMsci, the scientific observation model [11,34], CRMarchaeo, a model for archaeological excavation documentation [9,14]) to describe general, non-textual information (i.e., actors, places, objects, temporal entities, observations, archaeological contexts and so forth). In its current version (v1.0), CRMtex is composed of 9 classes and 11 properties; all of them are defined as subclasses and sub-properties of CIDOC CRM classes and properties. A brief description of each of them is provided below. The full documentation, with all the scope notes and examples, can be found on the CIDOC CRM Extensions web pages [6]. An RDFS version of the model is also provided [12].
CRMtex classes
CRMtex in its previous version provided a set of 6 classes, which we have covered extensively in [24] and [22]. In Section 4 of this paper we present the new classes (
In addition to dealing with a text as an object, our model also focusses on the research procedures, and provides classes and relationships to describe the typical operations that scholars from different disciplines perform in order to gain knowledge about textual entities. It is evident, in this perspective, that the study of ancient texts typically starts from the analysis of the physical characteristics of the text itself before moving to the investigation of their archaeological, palaeographic, linguistic and historical features. In this regard, we have defined the following classes:
CRMtex properties
CRMtex also provides adequate properties to link instances of its classes. A list of the properties is provided below. The use of the new properties in version 1.0 (

General overview of the new CRMtex model.
The overall scheme of CRMtex classes and properties is presented in Figs 1 and 2. In particular, Fig. 1 presents the modelling of the text and its production, considering the three different encoding levels: the level concerning the written text, i.e., the glyphs (physical level); the level concerning the graphemes and the other symbolic entities encoded at the time of writing (symbolic level); the level concerning the ideas and concepts intended to be expressed and communicated by the text (conceptual level). Figure 2 offers the point of view of the investigation of the text with the entities related to its reading (i.e., the accurate observation of its physical features) and its transcription.

Reading and transcription activities in CRMtex.
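As a rough illustration of the three encoding levels described above (physical, symbolic, conceptual), the following sketch stores a few statements as plain Python tuples and walks from a glyph to the message it contributes to. The node prefixes and the predicate names `represents` and `encodes_part_of` are hypothetical placeholders, not the properties defined by CRMtex.

```python
# Hypothetical labels, used only to make the three levels tangible.
triples = [
    ("glyph:1", "represents", "grapheme:A"),
    ("glyph:2", "represents", "grapheme:V"),
    ("grapheme:A", "encodes_part_of", "message:dedication"),
    ("grapheme:V", "encodes_part_of", "message:dedication"),
]

# Assumed mapping from node prefix to encoding level.
LEVEL = {"glyph": "physical", "grapheme": "symbolic", "message": "conceptual"}


def level_of(node):
    """Return the encoding level of a node from its prefix."""
    return LEVEL[node.split(":", 1)[0]]


def path_to_concept(start, graph):
    """Follow outgoing statements from a node until the conceptual level."""
    path = [start]
    current = start
    while level_of(current) != "conceptual":
        current = next(o for s, _, o in graph if s == current)
        path.append(current)
    return path


path = path_to_concept("glyph:1", triples)
levels = [level_of(node) for node in path]
print(path, levels)
```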
Written text segments
In designing the new entities of our model, we began by thoroughly investigating the interconnections existing between the text and its various components. We have also tried to establish a complete chain of connections to link these components and the whole text with the linguistic level they encode. Some elements have proved to be essential for this purpose. Concerning the reading process (i.e., the decoding of the text), and therefore the investigation of the text by scholars, one element has proved particularly important: the text segment.
We therefore introduced the new
Scholars of different disciplines need to identify such segments, based on the requirements of their study, and to focus their attention on them in order to describe their physical properties (form, layout etc.), to verify their legibility or to identify particular phenomena (e.g., linguistic or palaeographic aspects) that are connected to them. When modelling, it is important to define unambiguously such segments and their relationship with the text in its entirety, so as to be able to assign specific properties to the individual segments, independently of the text as a whole. Particular production (
The relationship between a written text (

Written text and written text segment in CRMtex.
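The idea that a segment is defined unambiguously with respect to the whole text, and can carry properties of its own, can be sketched as follows. The class names, the character-offset addressing and the `legibility` property are illustrative assumptions rather than the CRMtex definitions.

```python
from dataclasses import dataclass, field


@dataclass
class WrittenTextSegment:
    """A portion of a text, addressed by character offsets (an assumption)."""
    start: int
    end: int
    properties: dict = field(default_factory=dict)


@dataclass
class WrittenText:
    """A whole text together with the segments identified on it."""
    signs: str
    segments: list = field(default_factory=list)

    def add_segment(self, start, end, **properties):
        """Register a segment, rejecting ranges that fall outside the text."""
        if not (0 <= start < end <= len(self.signs)):
            raise ValueError("segment must lie inside the text")
        segment = WrittenTextSegment(start, end, dict(properties))
        self.segments.append(segment)
        return segment

    def signs_of(self, segment):
        """Return the signs covered by a segment."""
        return self.signs[segment.start:segment.end]


text = WrittenText("IMP CAES FL CONSTANTINO")
segment = text.add_segment(4, 8, legibility="clear")
print(text.signs_of(segment), segment.properties)
```

Properties are attached to the segment object only, so they can be recorded independently of anything stated about the text as a whole.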
The physical signs composing a
Phonographic writing systems [40,41] represent phonological units of one size or another, but the 1:1 correspondence between sound (phoneme, syllables
Concerning the message retrieval, reading the written message presupposes the ability to read the language of the writer since each grapheme is bound to a given linguistic unit of specific languages.
In this view, the model provides two new classes to represent the units the scholars deal with:
Moving to the level of linguistic sounds, it is the decoders (readers, including scholars) who, on the basis of their knowledge of the linguistic system, attribute the appropriate phonetic value to each sign or group of signs. In doing so they also rely on the spelling conventions in force in a given graphic system at a given historical moment, since orthographic rules can change over time, even if less quickly than the linguistic system does. The ontological description of the link between linguistic and graphic units is under preparation by the authors.
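The lack of a one-to-one correspondence between signs and sounds, and the dependence of the reading on a convention, can be made concrete with a small lookup table. This is only a sketch: the table name, the convention key and the simplified values (vowel length is ignored) are assumptions for illustration and do not anticipate the ontological description mentioned above.

```python
# Illustrative table: candidate phonetic values per grapheme, per convention.
# In Latin capital writing the sign V stands for both vowel and semivowel,
# and I behaves similarly; the values are deliberately simplified.
CONVENTIONS = {
    "latin_capitals": {
        "V": ("u", "w"),
        "I": ("i", "j"),
        "A": ("a",),
    },
}


def candidate_values(grapheme, convention):
    """Return the phonetic values a reader may choose from for a grapheme."""
    return CONVENTIONS[convention].get(grapheme, ())


def is_ambiguous(grapheme, convention):
    """True when the convention alone does not fix a single value."""
    return len(candidate_values(grapheme, convention)) > 1


v_values = candidate_values("V", "latin_capitals")
print(v_values, is_ambiguous("V", "latin_capitals"))
```

The function returns candidates rather than a single value because the final choice is made by the reader, on the basis of linguistic knowledge that the table does not contain.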
Application scenarios
CRMtex and EpiDoc
In designing our model, we have always tried to maintain the compatibility of our entities with those of EpiDoc. A natural compatibility with our model obviously exists for the information present in the
The same information can be expressed in CRMtex by combining the
More details can be specified for each character, if necessary, by instantiating a
Diversely, an erasure indicating a text that is lost and is thus illegible, encoded in TEI EpiDoc (XML) as:
implies, according to CRMtex, the use of an
RDF notation is certainly less concise than that provided by EpiDoc, but it is also more expressive and able to specify the historical or environmental circumstances that determined a particular condition of the text, to provide a deeper level of standardisation in the formalisation of knowledge (for example, through the use of thesauri) and to link external relevant entities for implementing information enrichment in multiple stages, including after the initial encoding. The use of the
The complexity of encoding in RDF responds to the need to describe in detail all the events involved in the life of the text to be encoded. It is clear, however, that it is not necessary to choose between EpiDoc and CRMtex since the two tools respond to different research needs. CRMtex is not aimed at the digital edition of the text, for which the EpiDoc XML encoding already works well, but at capturing and describing knowledge related to the text itself in a holistic perspective. However, the two models can be used in synergy to create richer metadata and build more structured and complete information from both a descriptive and semantic point of view, thus fostering interoperability of textual information in the typical integrated scenarios of the Semantic Web.
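One way in which the two tools might be used in synergy is to derive graph statements from an existing XML edition. The sketch below uses only the Python standard library; the XML fragment is an invented illustration in general TEI style (real EpiDoc files use the TEI namespace, omitted here), and the predicate names `has_segment`, `condition`, `legibility` and `extent` are hypothetical placeholders, not CRMtex properties.

```python
import xml.etree.ElementTree as ET

# Invented fragment: an erased passage whose content is lost.
FRAGMENT = (
    '<ab>IMP <del rend="erasure">'
    '<gap reason="lost" quantity="3" unit="character"/>'
    "</del> CAES</ab>"
)


def erasures_to_triples(xml_text, text_id):
    """Turn each <del> element into statements about a text segment."""
    root = ET.fromstring(xml_text)
    triples = []
    for n, deletion in enumerate(root.iter("del"), start=1):
        segment = f"{text_id}/segment/{n}"
        triples.append((text_id, "has_segment", segment))
        triples.append((segment, "condition", deletion.get("rend", "unknown")))
        gap = deletion.find("gap")
        if gap is not None:
            triples.append((segment, "legibility", gap.get("reason", "unknown")))
            extent = f'{gap.get("quantity")} {gap.get("unit")}'
            triples.append((segment, "extent", extent))
    return triples


statements = erasures_to_triples(FRAGMENT, "text:1")
for statement in statements:
    print(statement)
```

The XML remains the place where the edition is maintained, while the derived statements can later be enriched with events, actors or thesaurus terms that the markup does not record.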
The inscription on the Arch of Constantine
To illustrate the features of the new version of the CRMtex, we propose an epigraphic example: the inscriptions on the Arch of Constantine, one of the most famous ancient monuments in Rome.
Other examples of the application of our model are illustrated in our previous publications: in [24] it is applied to the encoding of an inscription in Oscan, a language of fragmentary attestation; in [22] it is used in a different field of application, that of papyrology, for the encoding of the Derveni papyrus.
The arch, still located in its original position between the Colosseum and the Roman Forum, is a triumphal marble arch (the largest monument of this kind of the Roman era) dedicated in 315/316 A.D. by the Roman Senate to the emperor Constantine after his victory over Maxentius in the Battle of the Milvian Bridge in 312 A.D. Among the other decorations (including statues, panels, reliefs and similar decorative material), the arch carries, on its attic, two identical inscriptions [7], originally inlaid with gilded bronze letters, explaining the reason for its construction.
The bronze letters are now lost and only the large cuttings in the marble, in which the letters were fixed, remain. The text is repeated, identically, on the South and North faces of the arch. A transcription and an English translation of the inscription are presented below.
Transcription of the inscription:
IMP(ERATORI) · CAES(ARI) · FL(AVIO) · CONSTANTINO · MAXIMO · P(IO) · F(ELICI) · AVGUSTO · S(ENATUS) · P(OPULUS) · Q(UE) · R(OMANUS) · QVOD · INSTINCTV · DIVINITATIS · MENTIS · MAGNITVDINE · CVM · EXERCITV · SVO · TAM · DE · TYRANNO · QVAM · DE · OMNI · EIVS · FACTIONE · VNO · TEMPORE · IVSTIS · REMPVBLICAM · VLTVS · EST · ARMIS · ARCVM · TRIVMPHIS · INSIGNEM · DICAVIT
Translation of the inscription:
“To the Emperor Caesar Flavius Constantine, the Greatest, Pius, Felix, Augustus: inspired by (a) divinity, in the greatness of his mind, he used his army to save the state by the just force of arms from a tyrant on the one hand and every kind of factionalism on the other; therefore, the Senate and the People of Rome have dedicated this exceptional arch to his triumphs”.
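The transcription above separates words with interpuncts and gives the expansion of abbreviations in parentheses. A minimal sketch of how the signs actually carved can be separated from the editorial expansion is shown below; the function names are our own, and only the first words of the transcription are used as sample input.

```python
import re

# First words of the transcription given above.
TRANSCRIPTION = "IMP(ERATORI) · CAES(ARI) · FL(AVIO) · CONSTANTINO"


def split_token(token):
    """Return (carved, expanded) for one word of the transcription."""
    carved = re.sub(r"\([^)]*\)", "", token)  # drop parenthesised expansions
    expanded = token.replace("(", "").replace(")", "")  # keep them, unmarked
    return carved, expanded


def parse(transcription):
    """Split on interpuncts and analyse each non-empty word."""
    tokens = [part.strip() for part in transcription.split("·")]
    return [split_token(token) for token in tokens if token]


words = parse(TRANSCRIPTION)
for carved, expanded in words:
    print(carved, "->", expanded)
```

Keeping both forms side by side makes it possible to describe the physical signs and the scholar's reading of them as distinct pieces of information.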
From the CIDOC CRM point of view, the arch is an archaeological object (i.e., an

CRMtex modelling of the inscription on the South side of the Arch of Constantine.
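The situation described in this section, one monument bearing two identical inscriptions that convey the same message, can be approximated with a few plain tuples. The identifiers and the predicates `bears` and `conveys` are hypothetical stand-ins chosen for readability; they are not the CIDOC CRM or CRMtex classes and properties used in the figure.

```python
# Hypothetical statements about the arch and its two inscriptions.
triples = [
    ("monument:arch", "bears", "text:south"),
    ("monument:arch", "bears", "text:north"),
    ("text:south", "conveys", "message:dedication"),
    ("text:north", "conveys", "message:dedication"),
]


def texts_on(monument, graph):
    """List the texts that a monument bears."""
    return [o for s, p, o in graph if s == monument and p == "bears"]


def shared_messages(monument, graph):
    """Return the messages conveyed by every text on the monument."""
    per_text = [
        {o for s, p, o in graph if s == text and p == "conveys"}
        for text in texts_on(monument, graph)
    ]
    return set.intersection(*per_text) if per_text else set()


inscriptions = texts_on("monument:arch", triples)
common = shared_messages("monument:arch", triples)
print(inscriptions, common)
```

Modelling the two carved texts as separate nodes that point to one message keeps their physical histories distinct while recording that they express the same content.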
CRMtex can be used to describe the inscriptions appearing on the arch and relate them to the monument via the
An instance of the
The linguistic message conveyed by the inscriptions (
Over the centuries, the Arch of Constantine has been investigated thousands of times by scholars from all over the world and also reproduced by famous illustrators such as Giovan Battista Piranesi. In addition, the inscriptions have been studied and transcribed several times in order to understand their nature, clarify the meaning of each section and improve their historical comprehension, so as to put them in direct relation with the events that determined their creation.
For this type of activity, aimed at studying and processing the inscribed text, CRMtex provides specific classes and properties. The transcription of the text(s) present in
The
The
CRMtex was developed by adopting established modelling principles of the ontological world and the fundamental paradigms of linguistic research: this makes it a tool capable of conferring ontological value on textual entities, with benefits for research in many humanistic disciplines. The possibility of representing cultural data on the Semantic Web, publishing them in standard formats (such as LOD) and making them easily available, interoperable and reusable in a wide range of contexts is one of the most relevant features of the model.
The native ability of CRMtex to describe relationships between text and artefacts, by placing the text in the context of the life and history of ancient objects, also makes it well suited for use in projects like ARIADNEplus or in initiatives like Epigraphy.info. Its compatibility with EPNet, the model used by some ARIADNEplus partners to codify epigraphic information, will help CRMtex become part of the Application Profile for epigraphy being defined within this project.
Nevertheless, a lot of work still remains to be done for the ontology to reach its maturity.
In 2018 CRMtex was accepted as part of the CIDOC CRM family [4], thus becoming a new tile of the CIDOC CRM mosaic of models. A process of fine tuning to make CRMtex perfectly integrated and consistent with the other extensions of this ecosystem is already under way. In particular, we will need to plan harmonisation with CRMinf [10], the importance of which we have already stressed for the interpretation of the text (see Section 5), and with FRBRoo [25], a CIDOC CRM compatible model aimed at representing the semantics of bibliographic information. Many FRBRoo classes (such as the
Despite being a relatively new model that is still under development, CRMtex is already used in many contexts where the definition of textual entities from the ancient world is fundamental, especially in the Cultural Heritage field, and it has been adopted by various initiatives of national and international scope. CRMtex has been selected as the ontological model for the project “Aggressive magic in the ancient world: lexicon and formulae of Greek texts” of the University of Florence (Italy) [38], focussing on the study of Greek curse tablets and magical papyri. In the ARIADNEplus framework, it has been chosen as a candidate for the encoding and integration of inscription and graffiti data in the semantic infrastructure that the project is building. CRMtex has also been selected among the basic models for the ontology being defined by the community of epigraphists in the framework of the Epigraphy.info initiative for the interoperability of epigraphic data.
The model is constantly expanding and is oriented towards a deeper treatment of linguistic aspects that could enhance its capabilities and foster its use in other disciplines. Thus, among future activities, we aim to investigate the close correlation of graphemes with the linguistic units (such as phonemes) of which they are conceptual representations and the way in which, through phonemes, the thought of the speaker (and therefore of the writer) materialises in the form of linguistic expressions to become text. We shall then extend CRMtex with new entities that are suitable to describe such complex linguistic phenomena.
