Abstract
Terahertz quantum cascade lasers (QCLs) are semiconductor laser devices that operate in the far infrared range (frequency range from about 100 GHz to 10 THz). QCL properties can be categorized as follows: design of the laser (heterostructure properties capturing the various materials used in the laser structure and the various laser design types) and the laser optoelectronic properties (laser performance behavior as a result of injection of current into the laser device). Maintaining ontologies with this information is useful in supporting data mining activities that seek to retrieve useful information on the various QCL designs and their respective performance, together with provenance information. This provides a platform to share and interact with QCL data by both machines and humans in a Findable, Accessible, Interoperable, and Reusable manner. The existing ontologies in the material design domain do not capture this crucial information. This is due to a lack of formal definitions for the QCL property concepts. In this paper, we address the issue of formal representation of the specified QCL properties and the relationships among them. We propose a semantically enriched ontological model of properties in the QCL domain. We evaluate the ability of the ontological representation to model the QCL properties using an inheritance richness metric-based evaluation and the ontology validation technique. Experimental evaluation indicates the consistency of the ontology, its ability to answer 100% of the competency questions posed by QCL domain experts, and an inheritance richness metric of 0.133, indicating a detailed level of the ontology in capturing the domain requirements.
Introduction
The quantum cascade laser (QCL) is a type of semiconductor laser device that was first proposed in 1994 at Bell Laboratories (Faist et al., 1994). A semiconductor laser is an optoelectronic device made of several materials, at least one of which is a semiconductor. Unlike typical interband semiconductor lasers, whose laser radiation emission is based on the recombination of electron–hole pairs across the material band gap, the QCL is unipolar in nature, and the electromagnetic radiation is emitted through intersubband transitions in a repeated stack of semiconductor multiple quantum well heterostructures (Kazarinov, 1971). The devices are carefully designed by material scientists, keeping in mind a given range of performance capabilities based on the design features.
One of the promising implementations of the QCL is the terahertz quantum cascade semiconductor laser. These lasers have been utilized in various applications, for instance, in screening various types of abnormal tissues (Vafapour et al., 2020) and in configuring high-speed networks in the electronics field (Kanno et al., 2015). The design of a QCL device with optimal performance parameters is therefore highly desired in order to maximize its application potential in various domains.
The QCL design structure is made up of complex heterostructures. In most cases, the properties of the laser are defined by the growth sheet, which gives information on the heterostructure thickness, the materials combined, and their respective combination order. The QCL properties can be broadly divided into two categories. The first is the design, which includes the heterostructure, that is, the material design properties capturing the various material combinations used in constructing the semiconductor laser device, together with the specification of the layer sequence. The second is the optoelectronic characteristics, that is, the laser performance behavior as a result of injecting current into the laser, for instance, the working temperature, power, and lasing frequency. Some of the laser properties depend on other properties and on the working mode of the laser. For instance, the working temperature of the semiconductor laser device may vary based on whether the device is working in continuous or pulsed mode.
Structured information capturing the various QCL device designs and their corresponding performance characteristics is crucial in deciphering the complex structure of the laser. This is useful, for instance, in understanding the laser structure in relation to its performance. Information on the QCL device properties exists in varied sources. Well-structured knowledge on semiconductor laser designs and performance is important in optimizing the design process of the lasers, because it makes answers to various QCL device design queries readily available. This will also address issues related to the Findable, Accessible, Interoperable, and Reusable (FAIR) principles, which will enable automatic sharing and use of data in the QCL domain by machines and humans (Wilkinson et al., 2016). Further, materials data access, acquisition, representation, and sharing are also identified as critical tasks for the materials science community (Billinge et al., 2006; Hunt, 2006). There have also been attempts at developing methods for extracting structured data on QCL properties from the scientific literature (Kerre et al., 2023). This signifies potential interest in the design of optimal QCL devices based on the understanding of the various QCL design features and their impact on the device performance.
The existing ontologies could not be readily instantiated and used to capture this information. This is attributed to the lack of formal definitions of concepts capturing the QCL performance and design properties. Examples of properties not formalized include the heterostructure properties, such as the layer sequence, heterostructure materials, and the design types. There is also no formal representation of the various relationships between the QCL properties, for instance, between the working modes, optoelectronic properties, and the various QCL design types. There is no formalization for capturing the metadata and links to the provenance for the QCL properties. The specific nature of the existing ontologies also renders them unsuitable for extension to capture properties in the QCL domain. Moreover, existing ontologies related to semiconductor devices are either quite general and cannot precisely describe the complexity of the semiconductor design at the microscopic scale or are only suited to the formal description of a material (Li et al., 2024). For instance, the growth sheets or the waveguide types are features that are not captured by existing semiconductor ontologies.
There is therefore a need for an enriched, formal representation of the QCL device properties that capture the physical properties and the various designs in terms of the heterostructure properties and the laser design types. This will allow the existence of shared vocabularies for QCL properties, which will enable interoperable access to QCL properties data. This will make it possible for the community to structurally access and analyze QCL properties data from heterogeneous sources in order to understand the relations between them.
In this paper, we address the task of formal representation of the QCL properties and the relationships between them by presenting a semantically enriched ontological model of properties in the QCL domain. This is to allow a standardized access and analysis of QCL data from the various heterogeneous sources. The main focus of the ontology is to formalize the relationship between laser designs and the performance characteristics. The relation between the QCL working properties and other laser working modes is also captured. We also validate the consistency of the ontological representation with a logical reasoner and sample data mined from scientific articles using an approach proposed in Kerre et al. (2023). The main contributions of this paper are therefore as follows:
A comprehensive review of the state-of-the-art on ontologies and vocabularies in material design, in relation to our domain of interest.
A semantically enriched ontological modeling of properties in the QCL domain.
A comprehensive evaluation of the ontology based on a data-driven approach.
The rest of the paper is organized as follows: we briefly explain the motivation scenario for the ontological model in Section 2, an overview of ontologies and standards in material design in relation to the QCL domain in Section 3, the development approach of the QCL properties ontological representation, the concepts, relations, and the axiomatization in Section 4, the evaluation approach in Section 5, results in Section 6, technical specifications of the ontology in Section 7, and lastly the conclusion in Section 8.
The motivation for an ontological model of properties in the QCL domain arises from the need for structured access to QCL properties data captured in various heterogeneous sources for analysis. There is a need for a standard way to access and analyze the data for insights regarding the fabrication of a QCL device with desired performance characteristics. We present a scenario where a semiconductor laser engineer intends to fabricate a QCL device having a heterostructure with desired optoelectronic properties, such as working temperature and optical power for an optimized operation. Also, for the existing QCL devices, a design expert may want to quickly understand the relationship between the various designs and performance parameters in order to get insights for future semiconductor fabrication processes. The design expert may also be interested in understanding trends in semiconductor laser fabrication over a given period of time. This creates the need for valid references to various sources documenting the laser device design and performance characteristics. In the process of undertaking these tasks, the following issues emerge:
The QCL design and optoelectronic characteristics data exist in dispersed sources such as lab notebooks, manuals, and scientific articles reporting the various semiconductor devices.
Decisions regarding designing QCL heterostructures with target properties are usually carried out on an experimental basis involving manual analysis of experimental data, which takes time, hence delaying the design process.
With these issues in mind, there is therefore a need for a solution that enables a structured representation of design and optoelectronic characteristics for the semiconductor laser domain and that also provides provenance information for the various properties. This will serve to provide a platform for the organization of experimental data from various sources, together with respective links to the sources of this data. This will also provide a standard way of exploring the data and understanding the inherent relationships in a quicker way, hence enhancing efficiency in the semiconductor laser fabrication process. Semantic enrichment will also provide links to other related data, such as formal definitions of QCL properties and their corresponding units on the web. The ontology model will allow FAIR access to QCL data captured in different sources, hence allowing semantic interoperability when analyzing this data by the community. Lastly, the formal representation can also be used to define a schema that can be used to organize data in order to generate a knowledge graph (KG) for the QCL data. The KG can be used to represent massive experimental data to allow a quicker analysis of the inherent relationships within the data.
In this section, we give a detailed overview of ontologies and standards in materials science. The guiding intentions in the analysis are as follows:
To analyze the ability of the ontologies to represent the relationship between design and performance properties in the QCL domain.
To analyze the ability of the ontologies to be instantiated with sample data on quantum cascade semiconductor laser properties and provide answers to queries regarding the laser properties.
In order to achieve this, we use search services such as MatPortal 1, BioPortal 2, and Linked Open Vocabularies 3, as well as search engines such as Google.
In the materials science domain, there is progress in the use of semantic technologies for various applications, such as the representation of complex domain knowledge. This enables the sharing and utilization of complex information in an open and agreeable way by both machines and humans. Ontologies are one of the technologies being widely adopted, with the focus being on representing specific subdomains and general material domain knowledge.
Several general ontologies that represent general materials domain knowledge have been proposed. These include Chemical Entities of Biological Interest (ChEBI; Degtyarenko et al., 2008), a freely available dataset of molecular entities (for example, atom and molecular ion), the basic formal ontology (BFO; Smith, 2012), the descriptive ontology for linguistic and cognitive engineering (DOLCE; Gangemi et al., 2002), the general formal ontology (GFO; Herre, 2010), and the elementary multiperspective material ontology (EMMO; Ghedini et al., 2020).
Other ontologies have also been developed with specific domains or particular interests in mind. The application scenarios range from giving a general representation of concepts in a domain of interest to activities such as standardization of data curation and sharing in the material design databases. We present examples of the ontologies in Table 1.
Some Ontologies in Material Science.
MDO: materials design ontology; MMOY: metallic materials ontology; MAMBO: materials and molecules basic ontology.
MatOnto ontology (Cheung et al., 2008), based on DOLCE, is used for representing oxygen ion conducting materials for the fuel cell domain, Materials Ontology (Ashino, 2010) for data exchange among thermal property databases, and materials design ontology (MDO; Li et al., 2020) for the materials design field, representing the domain knowledge specifically related to solid-state physics and computational materials science. An ontology for a Polymer Nanocomposite Community Data Resource is also proposed in Rawte et al. (2017). The NanoParticle Ontology (Thomas et al., 2011), based on BFO, gives a presentation of nanoparticles’ properties to design new nanoparticles, while the eNanoMapper Ontology (Hastings et al., 2015) gives an assessment of risks related to the use of nanoparticles. Also, an ontology for design patterns for modeling material transformation for the sustainable construction domain is presented in Vardeman II et al. (2017), and an ontology for representing knowledge on simulation, modeling, and optimization in molecular engineering sciences is presented in Horsch et al. (2020). MatML (Kaufman & Begley, 2003) is an extensible markup language for exchanging materials information. MatOWL (Zhang et al., 2009), based on MatXML Schema, is used to facilitate ontology-based data access. The metallic materials ontology (MMOY; Zhang et al., 2016) is used to represent materials knowledge from YAGO (Tanon et al., 2020), a knowledge base capturing many topics including material properties. The Dislocation Ontology (Ihsan et al., 2021) reuses some concepts from MDO to represent knowledge on crystalline materials. There are also platforms that function as a prototype to describe materials science experiments, for instance, the MaterialDigital Ontology (Alam et al., 2021). 
The materials and molecules basic ontology (MAMBO; Piane et al., 2021) integrates EMMO, ChEBI, and MDO to represent concepts and relations emerging on materials, with a focus on the relationships between molecular aggregation and properties of the system, and lastly, the ELSSI-EMD ontology (European Committee for Standardization, 2010) provides guidelines for material testing standardization. Ontologies have also been proposed for the additive manufacturing process domain. Examples include laser and thermal metamodels (Roh et al., 2016), laser powder bed fusion (Li et al., 2022), and metal additive manufacturing (Gouttebroze et al., 2023; Roh et al., 2021; Sanfilippo et al., 2019). Lastly, a recent ontology for the semiconductor domain, dubbed SemicONTO, has been proposed in Li et al. (2024).
One of the key issues that arises is whether the existing ontologies can model the complex relationships between the design and performance characteristics in our domain of interest. The existing ontologies cannot be readily used to provide a formal representation of properties in the QCL domain for several reasons. The general ontologies, such as ChEBI, DOLCE, BFO, GFO, and EMMO, define terms so broadly that they do not give a clear definition of the properties in our domain. The specific domain ontologies, such as MatOnto, materials ontology, MDO, nanoparticle ontology, eNanoMapper, MatOWL, MMOY, dislocation ontology, and material digital ontology, are restricted to specific domain concepts that do not fit in our scope. For the ontologies in the laser domain, such as laser and thermal metamodels (Roh et al., 2016), laser powder bed fusion (Li et al., 2022), and metal additive manufacturing (Gouttebroze et al., 2023; Roh et al., 2021; Sanfilippo et al., 2019), concepts such as the laser design type and heterostructure properties such as the layer sequence are not captured. The relationships between these properties are also not captured. This is also the case with ontologies in the semiconductor domain, where formal definitions of the heterostructure growth sheet are not captured. There is also a need to formalize provenance information for the QCL properties. The focus of this work is on the representation of “wafer fabrication” or heterostructure properties, which is a critical step in QCL device development, and their relation to the performance characteristics of the laser devices.
In this section, we give an overview of the development methodology for the QCL properties ontological representation and the description of the ontological representation.
Development of the QCL Properties Ontological Representation
In the development phase of the QCL properties ontological model, we adopt the NeOn ontology engineering methodology (Suarez-Figueroa et al., 2011). This methodology consists of a list of scenarios mapped from a set of common ontology development activities in the ontology engineering life cycle. These scenarios capture the various activities that are carried out in the development process of an ontology. Examples of these activities include specifying the user requirements (URs), reusing and re-engineering ontological resources, and reusing nonontological resources. The scenarios adopted in a particular ontology design process depend on the activities to be carried out during the ontology engineering process, and therefore, in some cases, not all of them are adopted. Based on our URs, we apply only Scenario i (from specification to implementation), Scenario iii (reusing ontological resources), Scenario iv (reusing and re-engineering ontological resources), and Scenario viii (restructuring ontological resources).
We use the Web Ontology Language (OWL 2) 4 as the representation language of the ontological representation using the RDF/XML syntax. The choice of OWL 2 DL was critical so as to enable reasoning and consistency checks on the ontological representation using the available standard reasoners. We utilize two tools in the development of the ontology: Repairing Ontological Structure Environment (RepOSE; Lambrix & Ivanova, 2013), which allows ontology debugging and the proposal of additional knowledge to the ontology, and the OntOlogy Pitfall Scanner (OOPS!; Poveda-Villalon et al., 2014), for detecting some of the common pitfalls encountered during ontology development. Throughout the development process, input from domain experts in semiconductor heterostructure laser fabrication and knowledge engineers is considered. In the remaining part of this section, we detail the key aspects in the development process of the QCL properties ontological model.
Requirements Analysis
In this step, we clarify the requirements of the ontological representation in relation to Scenario i of the NeOn methodology for ontology engineering. This involves proposing use cases, that is, the URs, competency questions (CQs), and additional restrictions on the knowledge representation schema.
The URs for our proposed QCL ontological model are identified through literature and from discussions with domain experts in the field of QCLs and are as follows:
UR1: The ontology model will be used to represent knowledge about the various QCL designs (in the form of the heterostructure/material design) and the optoelectronic characteristics (such as output power, working temperature, and lasing frequency) based on the various QCL designs.
UR2: The ontology model will be used for representing, in addition to the QCL designs and properties, the various working modes at which the properties are achieved based on the designs.
UR3: The ontology model will be used to maintain provenance information about the various QCL designs and performance characteristics. This will enable tracking of data on QCL development with information such as the developers/authors, the year in which the design was proposed, and useful, permanent links to the resources such as the DOI.
The CQs are also agreed upon based on discussions with domain experts. We propose a list of 10 CQs for the QCL ontology model. The questions are as follows:
CQ1: What are the material composition and layer sequence of a heterostructure with a particular design type?
CQ2: For a particular design type, what are the possible layer sequences and material compositions?
CQ3: What is the material composition of a particular heterostructure with a particular layer sequence?
CQ4: What are the resultant performance characteristics of a QCL working in a particular working mode?
CQ5: For a particular heterostructure (as described by the sequence and material composition), what are the resultant laser performance characteristics?
CQ6: For a particular laser performance characteristic, what are the possible design properties (i.e., layer sequence and material composition)?
CQ7: What is the layer sequence of a heterostructure with a particular material composition?
CQ8: Who are the authors of the particular laser device having certain properties?
CQ9: When was the laser device proposed?
CQ10: Where is the information published?
We also provide a list of additional restrictions to help define the concepts as below:
A heterostructure corresponds to one particular design type.
A property can relate to the working mode of the laser.
A property corresponds to only one laser working mode.
A property can also relate to the heterostructure.
A layer sequence corresponds to one material combination formula.
The full list of additional restrictions can be found at the GitHub repository 5.
We adopt a pattern related to provenance information in the repository of Ontology Design Patterns that leads to the reuse of entities from the PROV-O ontology. The ontology is also developed in a modular way, where the development is based on the categories of information to be represented, that is, the design, provenance, working mode, and performance properties. Our proposed ontological representation reuses some terms and concepts from well-established ontologies, such as the concept “Material” from EMMO and the term “Atom” from ChEBI. We also reuse the concept “Agent” from the PROV-O ontology to represent provenance information (Lebo et al., 2013) and the term “Property” from the MDO in order to represent information on the various QCL properties such as working temperature and power.
In order to represent the units, we reuse the terms “Quantity,” “Quantity Value,” “QuantityKind,” and “Unit” from the Quantities, Units, Dimensions and Data Types Ontologies (QUDT; Haas et al., 2014). We use the term “AcademicArticle” from the BIBO vocabulary 6 to represent an academic journal documenting the QCL properties. We also use the metadata terms from the Dublin Core Metadata Initiative 7 to represent the metadata of the ontological representation.
The ontological representation of properties in the QCL domain contains a total of 15 concepts, 23 relations, and 11 instances. The information captured by the ontological representation can be categorized as follows:
The

Upper Concepts in the Ontology.

Description Logic Axioms for the Upper Concepts in the Ontology.
In order to represent knowledge on the QCL performance/optoelectronic properties, we use the following concepts:

Ontology Section for Quantum Cascade Laser Optoelectronic Characteristics.

Description Logic Axioms for the Laser Physical Properties.
The QCL design information is captured using the following concepts:
The laser heterostructure represents the laser layer design comprising the various semiconductor materials. A laser heterostructure has a design type (D1), heterostructure materials (D2), the layer sequence (D3), and relates to a Material (D4). The design type of the laser refers to the geometrical arrangement of materials in the laser design, while the heterostructure materials represent the various materials included in the heterostructure and their respective ratio of combination. The heterostructure materials are composed of atoms (D5) and have a matFormula in a string which captures the chemical elements and their ratio of combination. The layer sequence is based on the heterostructure materials (D6), has a unit (D7), and has the sequence in a string. The laser design type has two instances, that is, BoundToContinum and LOPhonon depopulation, describing the laser design types. Figure 5 shows the ontology section describing the QCL design, and Figure 6 shows the description logic axioms for this description.

Ontology Section for the Heterostructure Design Properties.

Description Logic Axioms for the Laser Heterostructure Design Properties.
The laser working mode information is represented using the

Ontological Representation for the Quantum Cascade Laser Working Mode.
For provenance information, we use the terms

Ontology Section for Provenance Information.

Description Logic Axioms for Provenance Information.
Figure 10 shows an overview of the entire ontological representation of properties in the QCL domain with all the concepts interconnected together to model the complete relationship between the laser design and performance properties.

Overview of the Quantum Cascade Laser Properties Ontological Representation.
The evaluation of the ontological representation is done based on (i) consistency of the ontology, (ii) evaluating the success of the ontology in modeling the domain of interest (formative evaluation), and (iii) the richness of the ontology based on an ontology quality evaluation metric.
Consistency
The ontology consistency is defined as follows:
A given ontology definition is consistent if there is no contradiction in the interpretation of the formal definition with respect to the real world (Gómez-Pérez, 2001).
This implies that there should be no contradictory conclusions derived from the meaning of all the definitions and axioms in the defined ontology, and the ontologies included in this ontology.
In order to evaluate the consistency of the ontological modeling, we use the following ontology checking tools: the OOPS! pitfall scanner (Poveda-Villalon et al., 2014), the HermiT Reasoner (Glimm et al., 2014), and the Pellet Reasoner (Sirin et al., 2007) embedded in the Protégé software. 10 Any inconsistencies detected by the HermiT and Pellet reasoners were identified and resolved. For the OOPS! scanner, critical and important pitfalls were also considered and rectified.
Under formative evaluation, we evaluate the success of the ontology in modeling the domain of interest. This is done by carrying out ontology validation (de Almeida Falbo, 2014).
Ontology validation involves evaluating the ontology using sample test cases in order to verify its applicability to the intended problem. The validation is deemed successful if the ontology passes the test cases (de Almeida Falbo, 2014).
The role of this step is to verify the ability of the developed ontology to meet its intended purpose of representing QCL properties data extracted from scientific text and be able to provide inferences for questions regarding QCL properties.
In order to design our experiments, we use data composed of sample semiconductor QCL design and optoelectronic properties. The properties are the laser heterostructure (material composition and sequence), working temperature, lasing frequency, optical power, laser working mode, and laser design type, together with the respective units. We also include data on provenance, such as article DOI and publication URL. The individual properties are mined from a sample of scientific articles using an approach proposed in Kerre et al. (2023), except for the laser design type, working mode, and URL, which are included using the human-in-the-loop approach. This constitutes a total of 181 QCL property instances documenting 15 QCL devices.
The units are linked to specific URIs in the QUDT ontology to provide a reference to their description. For data preprocessing, we semantically enrich the data with URIs to the respective resources describing the data elements, for instance, the working mode, laser design type, and the units. Table 2 shows the statistics of the test data.
We define data mapping rules for mapping the data to the ontology schema using the RDFlib library. 11 We generate a sample KG containing a total of 831 triples, on which we run the validation scripts in the form of CQs as detailed in ontology validation in Section 5.2.2.
Ontology Validation
A set of CQs is defined by QCL design experts from the use case scenarios defined in Section 4. The CQs are set such that they capture all the information represented in the use cases and are therefore used as functional requirement specifications for the ontology. The validation process is performed with the sole aim of ensuring that the ontology fully conforms to these requirements and is therefore able to answer all the CQs correctly. The CQs are represented in SPARQL, the formal RDF query language.
Summary Statistics for the Test Data.
Table 3 shows the general classes of CQs used in this paper.
W, X, Y, and Z are placeholders for any suitable values for the laser design and performance properties. For the CQ classes in Table 3, we run several queries (in Table 4) on the KG, capturing the various combinations of information needed by the CQs. We compare the responses returned by the SPARQL queries with the desired outputs for each query in order to determine the precision in query answering.
In this evaluation phase, we assess the richness/quality of the ontology schema using the evaluation model adopted in Tartir et al. (2005). This approach evaluates the quality of the ontology based on schema and instance metrics, which provide information for assessing different kinds of richness within an ontology. In this study, we adopt the inheritance richness (IR) assessment metric to assess the quality of the ontology model. This metric describes the distribution of information across different levels of the ontology inheritance tree. Formally, the IR metric for a class
Competence Questions (CQs).
Ontology Validation
For ontology validation, we run a total of 12 queries on the KG, one for each of 12 specific CQs derived from the CQ classes specified in Section 5.2.2. This results in a total of 150 records fetched by the queries. The CQs for each of the CQ classes are as follows: CQ1 (two CQs), CQ2 (one CQ), CQ3 (two CQs), CQ4 (two CQs), and CQ5 (three CQs). The queries are designed to retrieve relevant information as per the defined CQ classes. The specific questions for the query classes and the corresponding classes and properties answering the CQs are presented in Table 4.
We compare the queries’ responses with the actual values in the data to determine the precision of query answering. All the 150 records returned by the queries match the expected data values for the specific relations of interest, hence resulting in a precision of 1. This demonstrates the ability of the ontology model to capture the domain relationships.
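The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation script used in the paper: the function name `query_precision` and the sample records are hypothetical, and precision is assumed to mean the fraction of returned records that match the expected data values.

```python
def query_precision(returned, expected):
    """Fraction of distinct returned records that also appear in the expected set."""
    returned = set(returned)  # duplicates in the query output are counted once
    if not returned:
        raise ValueError("no records returned; precision is undefined")
    return len(returned & set(expected)) / len(returned)


# Hypothetical records: (device identifier, operating temperature in K).
expected = {("qcl_a", 10), ("qcl_b", 45), ("qcl_c", 77)}
returned = [("qcl_a", 10), ("qcl_b", 45), ("qcl_c", 77)]

precision = query_precision(returned, expected)
print(precision)
```

A precision of 1 then corresponds to the case in which every record fetched by a query is present in the reference data.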
The CQs show the ability of the ontology model to model various laser fabrication scenarios. For instance, it is possible to analyze various trends such as ranges of semiconductor laser working temperature in various working modes, possible layer sequences for certain heterostructure materials, and their corresponding lasing capabilities, such as the lasing frequency. It is also possible to access references to documents for specific semiconductor laser properties, for instance, the DOI of a scientific document with specific properties. This presents a good step in providing a structured way of accessing all this information. The complete list of queries for the CQs and their results is published together with the ontology as specified in Section 7. Figure 11 shows the query for CQ4.2 (What are the operating temperatures of semiconductor laser devices working in a continuous wave operation?) and Figure 12 shows the result for the query.

Query for Competency Question 4.2.

Result for the Query for Competency Question 4.2.
The ontology model achieves an IR metric of 0.133. Following Tartir et al. (2005), a low IR value characterizes a deep (vertical) ontology that covers a specific domain in detail, which reflects the level of detail of the ontology model in capturing the domain requirements. The ontology model therefore does not capture a lot of general information, although it still captures definitions of high-level concepts. It is suitable for representing concepts in the domain at the data level, which enables efficient exploration of the data to derive useful inferences regarding semiconductor wafer fabrication with target properties. The relationships correspond to the data population requirements, making it easier to populate and analyze the data. This is crucial in optimizing the design process of the lasers.
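To make the metric concrete, the following sketch computes a schema-level IR for a toy class hierarchy. It assumes the definition given in Tartir et al. (2005), where IR is the average number of direct subclasses per class; whether the paper uses exactly this variant is an assumption. The class names and hierarchy below are hypothetical and are not taken from the proposed ontology.

```python
from collections import defaultdict


def inheritance_richness(classes, subclass_of):
    """Average number of direct subclasses per class (schema-level IR)."""
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)
    total_subclasses = sum(len(children[c]) for c in classes)
    return total_subclasses / len(classes)


# Hypothetical toy schema: five classes and two subclass links.
classes = ["Laser", "SemiconductorLaser", "QCL", "Heterostructure", "Material"]
subclass_of = [
    ("SemiconductorLaser", "Laser"),  # SemiconductorLaser is a subclass of Laser
    ("QCL", "SemiconductorLaser"),    # QCL is a subclass of SemiconductorLaser
]

ir = inheritance_richness(classes, subclass_of)
print(round(ir, 3))
```

Under this definition, a schema with few subclass links relative to its number of classes yields a low IR value.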
Technical Specifications
A formal representation of the QCL properties is highly desirable in order to formalize the relationships between the various design and working properties. It provides formal definitions of the QCL properties and the relationships among them, and hence a platform that enables a structured exploration of these properties. It can also accelerate access to and sharing of QCL property information in a FAIR manner, enabling interoperable access to and analysis of QCL data from heterogeneous sources by the community. The existing ontologies in the materials science and laser domains do not capture this important information. This is attributed to the lack of formal definitions for the QCL properties and the relationships among them.
In this paper, we address the issue of formally representing the QCL properties by proposing a concise ontological model of properties in the quantum cascade semiconductor laser domain. The ontology model aims to formalize the representation of the relationship between the design (semiconductor heterostructure) and the optoelectronic characteristics of semiconductor laser devices. The formal representation is semantically enriched with information from relevant sources on the web. The NeOn methodology is adopted for the ontology design in order to capture the various ontology development scenarios, and the FAIR data principles are adopted for the publication of the ontology model.
The proposed ontology model is evaluated on the basis of three strategies: (i) consistency, (ii) ontology validation, and (iii) the richness of the ontology based on a richness evaluation metric. We check the ontology for consistency and for pitfalls using ontology checking tools. For ontology validation, we generate a sample KG from sample QCL properties data, on which we run queries for the CQs. This is important in evaluating the ability of the ontology to capture the domain requirements. We compare the queries’ output with the actual data for the specific relations of interest. All the CQs are answered correctly. The ontology model richness evaluation metric also shows its ability to capture the properties of interest in detail. This provides a structured way of exploring the properties for optimizing the heterostructure fabrication process, especially when there is a need for heterostructures with target performance properties. The concepts in the proposed ontology model can also be reused or extended to represent knowledge on other related semiconductor device properties. The main limitation of this work is that it does not capture the layer sequence, doping, and barrier properties. The analysis is also based on the specific QCL properties outlined, and we endeavor to extend this in the future.
Future work will involve extending the ontology model with new concepts and relationships, for instance, the heterostructure material and layer sequence, doping, and barrier properties. Other interesting research perspectives include the exploration of large language models for generating a KG of the QCL properties. We also aim to define a schema, based on an extension of the ontological representation, for generating KGs that support large-scale exploration of semiconductor laser properties and the relationships among them.
Acknowledgments
The authors would also like to thank Strathmore University, School of Computing and Engineering Sciences, and the Doctoral Academy for creating an opportunity for this work to be produced, and lastly, Dr. Gaoussou SANOU (Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France) for the insightful discussions.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the French Embassy in Kenya (Scientific and Academic Cooperation Department), Strathmore University (Doctoral Academy), and the CNRS (under the framework “Dispositif de Soutien aux Collaborations avec l’Afrique sub-saharienne”).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
