Abstract
Background
Content outlines for medical school curricula commonly rely on hierarchically structured learning objectives (LOs) at program, course, module and lecture level. At the most fine-grained level, these LOs contain specific biomedical terminology. The biomedical terms can be classified and augmented with semantic and relational information via the Unified Medical Language System (UMLS).
Methods
We analyzed the LOs in the preclinical years of a spiraled MD curriculum, using natural language processing (NLP) and the UMLS database to add semantic information, in order to quantify the progression of analytical complexity and to assess the spiral curriculum design. The complete set of lecture-level LOs for the 2 years of preclinical teaching comprised 6086 unique LOs with 6612 sentences. To analyze progression over time, the LOs were grouped by teaching module in temporal order of delivery.
Results
Six thousand one hundred eighty-nine action verbs were extracted and assigned numerical scores according to Bloom's taxonomy. Bloom scores per module showed the use of increasingly complex action verbs as the curriculum progresses. Matching the LOs against the UMLS database yielded 6454 unique biomedical concepts. Scoring each concept as novel only on first appearance showed that the proportion of novel concepts decreases over time. Using the UMLS semantic tags, the proportion of disease-related concepts increased as the curriculum progressed.
Conclusions
To our knowledge, this is the first systematic NLP analysis of a medical school curriculum, incorporating standardized medical language dictionaries. The results show a clear progression of increasingly complex analytical tasks, and increasing clinical content, in the curriculum over time. Concepts are revisited as indicated by the decreasing proportion of novel concepts, supporting the design goals of a spiral curriculum. Curriculum evaluations can gain objectivity and depth via systematic parsing of large bodies of natural language information, like the lecture-level LO content analyzed here, and can also provide evidence for accreditation.
Introduction
Current curriculum mapping methods do not include substantive quantitative analysis, which hinders systematic evaluation. While curriculum mapping has proven valuable for visualizing spatial relationships in medical curricula,1 there is still no widely used method for parametrically describing and quantitatively analyzing curricular content.2 In this publication we set out to demonstrate how systematic quantitative analysis of curricular maps can be achieved using natural language processing (NLP) and biomedical content analysis.
The mapping of curricular content is intended to show the links between content, teaching strategies, assessment, and the higher-level program objectives. Qualitative analysis tends to focus on specific themes and keywords, which may not reflect their weight in the overall curricular content. The approach presented here aims to capture the entire body of the curriculum outline, as described by detailed lecture-level learning objectives (LOs), to enable systematic quantitative analysis of LO design and content development throughout the curriculum. The present work focuses on measures of the progression of a medical school curriculum across the preclinical courses.
Ideally, from a curricular governance point of view, review and adaptation of a curriculum requires high-resolution insight into curricular content, which the proposed approach aims to deliver. The outcome objectives along with course and lecture objectives define key curriculum design goals and work well with natural language analysis. These methods are widely applicable across any medical curriculum that provides high resolution objectives in natural language format, and can also be applied to more granular curricular content information, such as lecture slides, assigned reading and small group materials.
This work describes new methods for the systematic, quantitative analysis of a medical school curriculum in natural language format. Three novel measures are used to evaluate alignment with the curriculum design goals:
1. Cognitive complexity: measured by the position of LO action verbs in Bloom's taxonomy
2. Novelty versus revisiting of topics: measured via the proportion of novel concepts per lecture
3. Clinical content: assessed via the proportion of disease-related concepts per lecture, as defined by semantic tags in the UMLS corpus
The core data used to derive the above outcomes included all LOs from didactic content (lectures, assigned reading activities, small groups). Additional data required for the 3 outcomes was added in a subsequent data augmentation step, described below.
Methods
Bloom's Taxonomy
To provide a quantification of cognitive complexity, the action verbs for each LO were assigned numerical scores according to Bloom's taxonomy.3,4 Bloom's Taxonomy is a widely recognized and influential framework in the field of education, as it classifies cognitive skills and LOs into hierarchical levels, arranged in ascending order of complexity. While Bloom's taxonomy is commonly used to design educational goals and structure curricula, most applications occur at the course or program level rather than at the level of individual educational activities,5 limiting granular assessment of cognitive progression.
However, the original and revised versions of the taxonomy focus on the broader classification of cognitive tasks and include only a few example verbs, making it difficult to apply the taxonomy to the classification of larger datasets, which may include many other verbs. The set of LOs analyzed in the present work includes 118 unique action verbs, most of which are not included in the example verbs listed in the original and revised lists.3,4 In addition, there are many versions of Bloom's taxonomy in use across higher education, with different rankings for specific action verbs. To increase verb coverage and reduce reliance on any single version of the list, we incorporated the more recent work of Newton et al, who compiled Bloom's ranking information on 401 verbs from 47 different lists, based on 35 textbooks and higher education publications.6 This allowed for complete coverage of all action verbs found in our dataset, and each verb was given a score based on the average across the 47 individual lists, adding robustness to the cognitive complexity estimate. Nevertheless, action verbs with high variance across the source lists may contribute some uncertainty to the Bloom's level scores used in this study.
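The averaging scheme can be sketched as follows. The verbs and per-list Bloom levels below are illustrative stand-ins for the compiled rankings of Newton et al, not the actual published values:

```python
from statistics import mean

# Hypothetical excerpt of compiled rankings: verb -> Bloom levels (1-6)
# assigned by different published lists. The real table covers 401 verbs
# across 47 lists; these values are illustrative only.
RANKINGS = {
    "describe":    [1, 2, 1, 2],
    "explain":     [2, 2, 3, 2],
    "distinguish": [4, 4, 3, 4],
    "evaluate":    [5, 6, 5, 6],
}

def bloom_score(verb):
    """Mean Bloom level of a verb across all lists that rank it."""
    levels = RANKINGS.get(verb.lower())
    return mean(levels) if levels else None

def score_learning_objective(lo_verbs):
    """Average Bloom score over the action verbs of one LO."""
    scored = [s for v in lo_verbs if (s := bloom_score(v)) is not None]
    return mean(scored) if scored else None
```

Averaging over many source lists smooths out the disagreement between individual versions of the taxonomy, at the cost of the residual uncertainty noted above for high-variance verbs.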
Medical Language Dictionaries
The Unified Medical Language System (UMLS) is a project started in 1986 by the National Library of Medicine, with the aim of providing a standard for medical terminology.7 It includes the metathesaurus, a large integrated thesaurus that includes concepts and terms from a variety of biomedical vocabularies, and the semantic network, a collection of over 100 high-level categories and relationships that organize the concepts.8

Each LO was submitted via the UMLS online interface to MetaMap, which provides NLP and recognition of biomedical entities (BMEs) matching entries in the metathesaurus. Specific settings included the 2023AB vocabularies, word sense disambiguation, and the lexicons MESH (MSH), UWDA (Digital Anatomist), SNOMED CT (SNOMEDCT_US), UMLS Metathesaurus (MTH), and ICD-10 (ICD10CM). BME recognition is returned as scored matches to concepts in selected vocabularies, including syntactic category (part of speech of the trigger term) and UMLS semantic category.9 We then filtered the returned results to retain only trigger words that were nouns or adjectives, because medically relevant concepts should only be contained in these syntactic categories. The base concept and semantic category for each BME were added to the concept list for each LO, and the semantic category “dsyn” (disease or syndrome) was used to identify disease-related concepts for Outcome 3. Each concept was scored as novel on first occurrence, and as repeated on any subsequent appearance later in the curriculum, which formed the basis of the novel-proportion scores used for Outcome 2.
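A minimal sketch of this post-processing step, assuming the MetaMap matches have already been parsed into simple records with hypothetical fields `concept`, `sem_type`, and `pos` (the real MetaMap output format is considerably richer):

```python
# Illustrative post-processing of MetaMap-style matches: keep noun/adjective
# triggers, then compute per-lecture novelty and disease-concept proportions.
# Record fields are simplified assumptions, not the MetaMap API.

def filter_matches(matches):
    """Keep only matches whose trigger term is a noun or adjective."""
    return [m for m in matches if m["pos"] in ("noun", "adjective")]

def score_lectures(lectures):
    """For lectures in temporal order of delivery, compute the per-lecture
    proportions of novel concepts (first appearance anywhere in the
    curriculum) and disease-related concepts (semantic type 'dsyn')."""
    seen = set()
    results = []
    for matches in lectures:
        kept = filter_matches(matches)
        concepts = [m["concept"] for m in kept]
        novel = [c for c in concepts if c not in seen]
        seen.update(concepts)
        n = len(concepts) or 1  # avoid division by zero for empty lectures
        results.append({
            "novel_prop": len(novel) / n,
            "disease_prop": sum(m["sem_type"] == "dsyn" for m in kept) / n,
        })
    return results
```

Because the `seen` set persists across lectures, a concept counts as novel only on its first appearance, exactly as described for Outcome 2.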
The 12 concepts with the highest frequency were removed because they are common words outside the biomedical domain (see file “concepts.csv” in the repository). Two incorrect matches were manually removed: a match of the term “images” to the concept “Intrauterine growth restriction, metaphyseal dysplasia, adrenal hypoplasia congenita, and genital anomaly syndrome,” and a match of “autosomal dominant” to concept “Autosomal dominant multiple pterygium syndrome.” However, this filtering step is not essential because of the large volume of correct concept recognitions. The results remain qualitatively the same even if such spurious concept matches are included.
Curriculum Design Cycles
To analyze curriculum progression over time, the didactic teaching events from 2 years of preclinical courses, which integrate discipline-specific content in an organ systems-based spiral curriculum, were grouped into 3 major teaching cycles. Spiral curricula, which periodically revisit material from previous blocks with increasing complexity, have been shown to support retention and integration of knowledge.10
Cycle 1 comprises around 35 weeks of teaching across 2 courses that colocalize gross anatomy, histology, biochemistry, genetics, nutrition, physiology, neuroscience, and behavioral sciences. Teaching is delivered in an organ systems-based approach, sequencing through foundations of medicine, the musculoskeletal system, cardiopulmonary and renal systems, endocrinology and reproduction, digestion and metabolism, and neuroscience and behavioral sciences. The primary focus in Cycle 1 is on normal form and function, with an introduction to the foundational knowledge for each system, supported by case-based clinical examples.
Cycle 2 lasts for 24 weeks, with a change of focus toward abnormal processes, disease/conditions and treatment, and the curriculum cycles back through the systems a second time. It incorporates the disciplines pathology, microbiology, immunology, pharmacology, pathophysiology and clinical skills.
Cycle 3 comprises the final 12 weeks of preclinical teaching, revisiting all organ systems with no new preclinical material, but providing an emphasis on case presentations and integration of multidisciplinary clinical scenarios, with a parallel track in symptom-based case presentations. The curriculum was designed to set foundational knowledge in Cycle 1 and then layer it with more advanced clinical knowledge with integration between disciplines and spaced repetition in the final 2 cycles.
Results
All LOs from all 792 learning activities in the 2-year basic sciences program were included in the analysis. The resulting dataset comprised 6086 unique LOs, containing 6612 fully formed sentences (some LOs contain more than one sentence). For the first outcome, cognitive complexity, the action verb was extracted from each sentence. A total of 6189 action verbs were identified, covering 94% of all sentences. The remaining 6% of sentences did not include an identifiable action verb, for example where list enumerations were used in multisentence LOs. The entire set included 118 unique action verbs, each of which was assigned a numerical Bloom's score as detailed in the "Bloom's Taxonomy" section above. The leftmost panel of Figure 1 shows the mean and 95% confidence interval of the Bloom's scores of all action verbs in each curriculum cycle. There was a highly significant progression toward greater complexity, with lower Bloom's scores in Cycle 1 relative to Cycle 2 (mean difference = 0.42, Wilcoxon rank sum test W = 2044567, P < .0001), and lower scores in Cycle 2 compared to Cycle 3 (mean difference = 0.84, W = 120920, P < .0001).

Figure 1. Bloom's score averages for all action verbs from the learning objectives within each curriculum cycle (left). The bar charts to the right list the top 10 action verbs used in each teaching cycle, as the proportion of all action verbs per cycle.
The 3 right-hand panels in Figure 1 show the top 10 action verbs in each cycle, ranked by their overall proportion of all action verbs in the cycle. This allows a more detailed and qualitative view of LO design. In Cycle 1, the verb “describe” dominates with a proportion of 40%, suggesting most LOs are geared toward acquiring and memorizing factual information, consistent with the curriculum design that designates Cycle 1 largely toward foundational knowledge acquisition. In Cycle 2, the verbs “explain,” “distinguish” and “discuss” become more prominent, suggesting more emphasis on explanations and connections between facts. Finally, a greater diversity of high-level action verbs becomes prominent in Cycle 3, again aligning with the curriculum design goal of connecting prior knowledge and using it for analytical tasks in this final teaching cycle.
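A cycle-to-cycle comparison of this kind can be sketched with SciPy's rank-sum test, which is analogous to the Wilcoxon rank sum test reported above; the per-LO Bloom scores below are illustrative, not the study data:

```python
# Minimal sketch of a cycle-vs-cycle Bloom score comparison using
# SciPy's rank-sum test. The score lists are illustrative only.
from scipy.stats import ranksums

cycle1_scores = [1.5, 2.0, 1.5, 2.25, 1.5, 2.0]
cycle2_scores = [2.25, 3.0, 2.5, 3.875, 2.25, 3.0]

# Two-sided test; a negative statistic indicates lower ranks (scores)
# in the first sample.
stat, p = ranksums(cycle1_scores, cycle2_scores)
print(f"rank-sum statistic = {stat:.2f}, p = {p:.4f}")
```

With the full per-verb score sets (thousands of observations per cycle), the same call yields the large W statistics and small P values reported in the text.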
The second outcome is the proportion of novel concepts per lecture, which is shown in Figure 2. The left-hand panel shows mean and 95% confidence intervals of novelty proportions per cycle, indicating a clear progression toward fewer novel concepts in the later teaching cycles. This reduction in novelty was also highly significant with Chi-square tests, with a reduction from 30% novel concepts per lecture in Cycle 1 to 18% in Cycle 2 (χ2 = 390, P < .0001), and a further reduction to just 8% novel concepts in Cycle 3 (χ2 = 181, P < .0001). The density plots in the right-hand panel illustrate that these mean differences arise from a significant shift in the distribution of novelty proportions for most lectures in each cycle, rather than outlier or floor/ceiling effects.

Figure 2. The proportion of novel concepts per lecture, grouped by curriculum cycle. Novelty proportions for each cycle are shown as means and 95% confidence intervals (binomial Wilson score intervals) in the left-hand plot, and the distribution densities are shown in the right-hand plot.
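The novelty comparison between two cycles can be sketched as a chi-square test on a 2x2 contingency table of novel versus repeated concept counts; the counts below are illustrative, not the study data:

```python
# Minimal sketch of the novelty chi-square comparison (SciPy).
# Counts are illustrative only.
from scipy.stats import chi2_contingency

# Rows: Cycle 1, Cycle 2; columns: novel concepts, repeated concepts.
table = [[300, 700],
         [180, 820]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```

The same construction applies to the Cycle 2 versus Cycle 3 comparison, and to the disease-related concept proportions in Outcome 3.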
The proportion of disease-related concepts per teaching activity, as a proxy for clinical content, is shown in Figure 3. The proportions are again summarized as means and confidence intervals in the left-hand plot, indicating a progression toward a higher fraction of clinical content as the curriculum progresses. Teaching activities in Cycle 1 contained only 4% disease-related concepts, which increased to 11% in Cycle 2 (χ2 = 295, P < .0001), and increased further to 16% in Cycle 3 (χ2 = 140, P < .0001).

Figure 3. The proportion of disease-related concepts per lecture, grouped by curriculum cycle. Proportions for each cycle are shown as means and 95% confidence intervals (binomial Wilson score intervals) in the left-hand plot, and the distributions are shown via density plots on the right.
Discussion
The methods detailed above provide several key advantages over traditional curriculum mapping. The action verb Bloom's level quantification demonstrates progression of cognitive complexity in learning tasks, the decreasing proportion of novel concepts over time shows that concepts are being revisited, and the increasing proportion of specific disease terms indicates increasing clinical focus over time. These findings align with recent implementations of spiral curricula in medical education, which demonstrate improved knowledge retention through spaced repetition and iterative revisiting of content at increasing levels of complexity.11 Furthermore, faculty authoring the LOs were not formally trained on Bloom's taxonomy verb selection, which suggests the action verb complexity gradient is an emergent property of the curriculum's content-based structure, reflecting increasingly complex content over time.
Limitations
Several limitations should be acknowledged. First, this study analyzes curriculum data from a single institution; multi-institutional studies would strengthen generalizability. Second, LOs represent only one layer of curriculum documentation and may not fully capture the breadth of content delivered in lectures, readings, and small-group activities. Furthermore, action verb Bloom's levels are just one source of cognitive complexity in the LOs; the actual content is probably a more significant contributor. Nevertheless, it is reassuring that curriculum design goals are already detected reliably at the level of LOs. Third, BME recognition using MetaMap produces some spurious matches requiring manual verification, and recognition accuracy may vary with writing style and terminology choices. Accurate BME recognition is an evolving field, and we have purposely focused on outcomes that either do not require this step (the action verb analysis), or only use it in a very limited sense (novelty detection, disease term counts). Detailed BME recognition, indexing of relationships between related terms, and semantic vector similarity scoring are more complex methods that future research should focus on, but this is hindered at present by the lack of reliable and standardized BME detection methods.
BME Recognition Methods
MetaMap provided initial quality filtering through part-of-speech tagging, identifying nouns and adjectives as BME candidates while excluding common stopwords. There were only 2 cases of an incorrect match to a syndrome (see the medical language dictionaries section), which is a very low error rate considering that more than 97,000 words were scanned. The UMLS metathesaurus also contains some rather generic terms, like “Two” and “Use of,” which lead to somewhat overzealous concept identifications with questionable biomedical relevance. Such generic terms tend to be quite frequent in the body of the analyzed text, allowing for targeted checking after ordering recognized terms by frequency, and inspecting the high-frequency items closely. We removed the top 12 high-frequency concepts, because they were common words outside the biomedical domain. The specific concept list, including frequency counts, can be found in file concepts.csv in the supplementary repository.
For generalizability across institutions, we suggest a BME recognition quality control approach with a 3-tier verification system: a blacklist of terms known to produce spurious matches (see the medical language dictionaries section), a whitelist of validated terms that reliably represent biomedical concepts (via the UMLS corpus, or building on the concept list identified in the current work), and human verification for terms appearing in neither list. This semiautomated pipeline balances computational efficiency with recognition accuracy. We recommend that such exclusion and inclusion lists be developed as shared resources across institutions, promoting standardization and enabling cumulative improvement in BME recognition quality for curriculum analysis applications. To this end, a repository of the underlying computer code and key data tables (blacklists, whitelists, action verbs) has been made available at this github URL. Mixed methods combining automated recognition with human verification may represent an optimal approach for well-defined terminology domains such as biomedical language at this time.
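The proposed 3-tier triage can be sketched as follows; the list contents are illustrative placeholders, not the curated lists from the repository:

```python
# Minimal sketch of the 3-tier BME verification: blacklisted terms are
# rejected, whitelisted terms accepted, and everything else is queued
# for human review. List contents are illustrative only.

BLACKLIST = {"two", "use of"}               # known spurious matches
WHITELIST = {"myocardium", "hypertension"}  # validated biomedical terms

def triage(term):
    """Route one recognized term to its verification tier."""
    t = term.lower()
    if t in BLACKLIST:
        return "reject"
    if t in WHITELIST:
        return "accept"
    return "human_review"

def triage_all(terms):
    """Partition a list of recognized terms into the three tiers."""
    buckets = {"accept": [], "reject": [], "human_review": []}
    for term in terms:
        buckets[triage(term)].append(term)
    return buckets
```

Only the `human_review` bucket requires manual effort; as shared blacklists and whitelists grow across institutions, that bucket shrinks, which is the intended cumulative-improvement mechanism.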
Generalizability Across Institutions
The methods presented here should generalize to learning materials at other institutions, provided they are written in full-sentence format amenable to NLP. Most medical schools develop LOs based on professional body content outlines—for example, the American Physiological Society content outline for physiology—suggesting reasonable structural similarity across institutions. While curriculum-specific data from other medical schools was not available for this study, the methods pipeline presented here provides a reproducible framework that other institutions can apply to their own curricula to verify alignment with design goals. To aid adoption, a flowchart of the main processing steps from raw LOs to the outcomes is shown in Figure 4.

Figure 4. Schematic overview of the data processing pipeline from raw learning objectives to the final outcomes.
Application to Different Types of Curriculum Information
LOs represent an intermediate level of curriculum description, providing more granularity than course titles but less detail than lecture slides or assigned materials. While they may vary in consistency across instructors, they remain valuable for systematic analysis due to their standardized format and role in student navigation of learning goals. Extending these methods to more fine-grained materials presents additional challenges: lecture slides and bullet-point content often lack the syntactic structure required for traditional grammar-based NLP. Medically focused small language models trained on biomedical literature (eg, BioBERT, PubMedBERT) offer potential solutions, but are limited by research-focused training vocabularies (PubMed abstracts) that differ from educational content (more textbook-oriented). Large language models strike a better balance in recognition coverage, but do not map recognized terms back to standardized base forms (eg, UMLS concepts) reliably. Methods in this area are evolving rapidly; the key principle is clarity about mapping goals and validation approaches, rather than commitment to any single technical implementation. The new curriculum outcome measures presented in this work (action verb analysis, novelty counts, use of specific disease terms) are independent of the underlying BME recognition model, as long as it produces consistent matches of reasonable quality.
An advantage of the current approach is the breadth of curriculum information that can be fed into the analysis. The present work relies on lecture-level LOs, which average about 5 per lecture, hence providing a concise but reasonably detailed narrative of the teaching material. However, the information in lecture slides, associated reading materials and group activities is far more granular. To capture biomedical concepts and their relationships across the curriculum with better resolution, such detailed data sources can be easily incorporated. Future work will aim to include lecture content and reading materials, enabling significantly more granular BME recognition and richly connected concept maps. The growing integration of artificial intelligence and large language models in medical education12,13 suggests that future iterations of this approach could leverage advanced NLP techniques for even more sophisticated curriculum analysis and personalization.
Avenues for More Detailed Biomedical Concept Tracing in the Curriculum
Beyond these broad outcomes across the curriculum, the toolset can also be applied to many other types of specific concept analysis, for example, quantitative tracing of specific topics/concepts through the curriculum. The BMEs recognized in each LO can be used as tags, with matching tags across different LOs implying a semantic link. The resulting connectivity matrix can be used to visualize related didactic sessions and LOs across the curriculum via structured graph methods, providing an interactive tool for tracing topics and how they are addressed from different angles over time.
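Such a connectivity matrix can be sketched as a shared-tag count between every pair of LOs; the concept tags here are hypothetical:

```python
# Minimal sketch of a concept-overlap connectivity matrix: two LOs are
# linked when they share at least one recognized concept tag, with the
# edge weight equal to the number of shared tags.

def connectivity_matrix(lo_tags):
    """lo_tags: list of concept-tag sets, one per LO.
    Returns a symmetric n x n matrix of shared-tag counts
    (diagonal left at zero)."""
    n = len(lo_tags)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(lo_tags[i] & lo_tags[j])
            matrix[i][j] = matrix[j][i] = shared
    return matrix
```

The matrix can then be treated as a weighted adjacency matrix and passed to standard graph-layout or community-detection tools to visualize how topics recur across the curriculum.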
The approach and results presented here are the initial steps of applying NLP and biomedical terminology analysis methods to curriculum information, based on LOs. Future work should expand both the breadth and depth of data sources, as well as the methods for analysis and semantic connectivity visualization. The resulting interactive visualization tools could also be used to support student learning, for example by identifying prerequisite concept relationships, anticipating the revisitation of foundational concepts in later cycles, and helping students orient themselves within the broader curriculum structure. As educational data mining continues to evolve with applications in curriculum development and learning personalization,14 the methods described here provide a foundation for more comprehensive, data-driven curriculum management and continuous improvement.
Conclusions
This study demonstrates the feasibility of systematic, quantitative analysis of medical curriculum content using NLP and biomedical terminology databases. Analysis of over 6000 LOs revealed clear progression in cognitive complexity, decreasing novelty of concepts, and increasing clinical content across curriculum cycles—findings consistent with spiral curriculum design principles. These methods enable objective, scalable verification of curriculum design goals that complements traditional qualitative mapping approaches. The analytical framework presented here can be applied by other institutions to their own curricula and extended to more granular content sources. As methods for biomedical language processing continue to advance, curriculum analytics offers promising opportunities for evidence-based curriculum development, accreditation support, enhanced transparency and personalized learning support in medical education.
Acknowledgments
This work was carried out using several tools and datasets from the NIH UMLS site via a registered account.
Ethical Approval
Not applicable. This study involved analysis of institutional curriculum documents and did not involve human subjects research.
Consent
Not applicable.
Author Contributions
Author 1: conceptualization, methodology, data curation, data analysis, and writing. Author 2: conceptualization, writing, review, and editing.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
