Abstract
This study recognizes the international need for a broadly applicable lifecycle model to facilitate efficient and systematic digital curation. Consequently, it has developed a generic digital curation lifecycle model, titled the d-KISTI model. This model was developed by applying content analysis and thematic coding to data collected through a two-year review of relevant literature, existing conceptual lifecycle models, and empirical investigations of KISTI’s digital curation practice. It was then refined further through consultations with many international digital curation experts. The d-KISTI model presents actions and their relationships with one another that have gone previously unacknowledged in the DCC curation lifecycle model and other existing curation models. These actions and relationships, which are articulated at length within the study, reflect the rapidly changing nature of the global digital curation landscape and offer more representative curation activities to information organizations. Moreover, through its investigation and analysis of KISTI’s digital curation practices, this study contributes to existing literature on digital curation in Korea. Ultimately, the d-KISTI model seeks to optimize digital curation strategies and practices, both within Korea and internationally, and, moreover, hopes to serve as a foundational touchstone for future studies on digital curation.
Introduction
Digital curation and lifecycle models
As the scale and diversity of data being produced have increased exponentially, digital curation, a field of information management, has become increasingly necessary. In defining digital curation, much literature focuses on its acts of digital preservation, data curation, and the management of data and other digital assets throughout their lifecycles (Lee and Tibbo, 2007; Poole, 2016; Yakel, 2007), to protect their integrity, ensure their authenticity, enhance their value, and make them usable in the future (Bhaskar, 2016; Pennock, 2007; Rhee, 2020). Additionally, digital curation involves appraising, selecting, and preserving data and other digital assets for their reproducibility and reuse over their lifecycles (Lee and Tibbo, 2007). Notably, most attempts at a definition assume a lifecycle approach. As such, within this study, digital curation can be thought of as encompassing practices that involve appraising, managing, preserving, adding value to, and making use of data throughout its lifecycle.
Indeed, this lifecycle approach not only is embraced as a central concept of digital curation (Poole, 2015), but also benefits digital curation practices, by representing the workflow, activities, relationships, processes, and stages of the main components of information management systems (Higgins, 2007; Humphrey, 2006), and facilitating data and information services in organizations (Cox and Tam, 2018; Humphrey, 2006; Pennock, 2007). Lifecycle models can also provide frameworks for curation planning, as well as checklists for the development and implementation of digital curation strategies (Harvey, 2010; Higgins, 2007; Humphrey, 2006).
The DCC curation lifecycle model (hereafter “the DCC model”), developed by the UK’s Digital Curation Centre (hereafter “the DCC”), is arguably one of the most influential curation lifecycle models, and is cited as the foundation of other notable models in Choudhury et al. (2020b), Cox and Tam (2018), and Huang et al. (2020). Since its official presentation in 2008, the DCC model’s usefulness has been evidenced by its impact on subsequent lifecycle models. However, given how rapidly the digital curation landscape has changed since the DCC model’s initial presentation, its present-day applicability is open to question. Higgins (2008: 136), who worked to develop the model and has published two articles and a book chapter on it, unambiguously asserts that the DCC model is “not definitive and will undoubtedly evolve”. Notably, the necessity of updating the DCC model appears to be a contemporary concern, with the idea being presented at the International Digital Curation Conference (see Choudhury et al., 2020a) and published in a journal (see Choudhury et al., 2020b) in 2020. Such an update has become necessary due to the ever-growing “scale and complexity of data” and rapidly changing “context of data production and use, and the associated impact on data archives and repositories” (Choudhury et al., 2020b; Huang et al., 2020: 2).
Research background, purpose, and significance
Despite its widespread adoption elsewhere, digital curation remains an underdeveloped discipline in South Korea. Digital curation within the nation is led by the Korea Institute of Science and Technology Information (KISTI), a government-funded research institute that collects, manages, and distributes scientific and technical data for both the general public and government bodies. Notably, Korea’s first independent department for curation was only established by KISTI in 2018, with the aim of imposing order on KISTI’s previously disorganized curation practices. Rather ironically, it was erroneously named the Content Curation Center (hereafter the “CCC”) for three years, until finally being corrected to the Digital Curation Center in 2021. In its first year, the CCC investigated KISTI’s curation practices and utilized these findings to develop the KISTI curation lifecycle model (hereafter the “former KISTI model”), which was, significantly, intended for use only within KISTI, rather than being more broadly applicable.
In 2019, CCC staff began to note limitations within the former KISTI model: not only was the model’s inception informed by the curation practices of the CCC alone, rather than KISTI’s other departments, but it also lacked sufficient actions and processes. As such, by 2019, KISTI was in need of a new digital curation lifecycle model to not only represent but inform a more systematic and efficient digital curation practice. Furthermore, the COVID-19 outbreak prompted the Korean government to accelerate its digital transformation and placed even greater emphasis on the importance of data management. This was embodied within the Korean Digital New Deal’s emphasis on effective data management and digital curation in June 2020. Consequently, digital curation has become increasingly relevant to a wider range of Korean information practitioners.
Both Korea’s and KISTI’s need for improved digital curation motivated this study, which intends to develop a generic digital curation lifecycle model that is able to represent the actions and processes required for efficient and systematic digital curation. This model is termed “the d-KISTI digital curation lifecycle model,” hereafter the “d-KISTI model.” Here, the lowercase “d” references both “digital curation” and “data,” and has been adopted both to allow for easy differentiation from the former KISTI model and to prevent misunderstanding regarding the subject of the model by foregrounding not KISTI as an organization, but rather “data” and “digital curation”. Indeed, this model is intended for use not only within KISTI, but also by domestic and international researchers and practitioners.
Further, by developing the d-KISTI model in the specifically Korean context of KISTI, this study is able to present new knowledge about Korean digital curation. Moreover, KISTI collaborates with diverse national and international organizations, which increases the d-KISTI model’s ability to attract national and international attention; this study aims to leverage that attention to further the development of the digital curation field, both within Korea and internationally. On a national level, it will seek not only to inform digital curation practices, but also to draw domestic attention to the practical benefits of digital curation. More broadly, this study will contribute to global society by providing an internationally applicable model for digital curation that can not only be tailored to a diverse array of information organizations (e.g., libraries, archives, data centers), but also encourage those organizations to become more aware of the actions and processes required for effective and efficient digital curation.
Additionally, while the DCC model remains a foundational touchstone within the global digital curation landscape, this study seeks to expand upon it, and to learn from the criticisms leveled at it, by utilizing it as the d-KISTI model’s conceptual foundation. Specifically, studies have articulated that the DCC model’s actions and stages were not fully described within Higgins’s (2008) original study (Cox and Tam, 2018). Although Higgins (2012) did later provide descriptions of some of the DCC model’s actions, much still remains unknown about this model’s development: only superficial information is given about the research methods adopted, which casts doubt on their scientific rigor. This study will, therefore, provide a more detailed description of both the d-KISTI model’s inception and how various research processes contributed to the development of its digital curation actions. This study assembled representative descriptions of its actions by drawing upon both existing literature and investigations into KISTI’s digital curation practices, utilizing both scientific research methods and empirical investigation in doing so.
Literature Review
Lifecycle approach to data and information management
Many researchers and practitioners across a number of disciplines have studied the lifecycle approach, applied it to data and information management, and proclaimed its advantages, predominantly in facilitating data and information planning and management. Specifically, it serves as a framework for planning and managing all aspects of data, including its treatment in specific domains, throughout its lifecycle (Carlson, 2014; Higgins, 2007; Sinaeepourfard et al., 2016a). Accordingly, the lifecycle approach can help to delineate the necessary stages of data curation and their most appropriate order (Choudhury et al., 2020b; Sinaeepourfard et al., 2016a, 2016b).
More precisely, the lifecycle approach can provide a visual representation of the processes, operations, activities, and relationships that must be conducted at each life stage (Ball, 2012; Faundeen et al., 2013; Kowalczyk, 2017). This facilitates the proper sequencing of all required stages and definitions of the relationships between stages (Choudhury et al., 2020b; Higgins, 2008; Pennock, 2007). Moreover, such visualization can help digital curators to map out the tasks, requirements, challenges, and issues encountered in data management and curation (Pouchard, 2015; Sinaeepourfard et al., 2016a).
The implementation of a lifecycle approach yields positive results in a number of more specific fields and contexts. Firstly, it benefits the preservation of data and information by providing a useful framework for not only the concept of preservation, but also its causes and barriers (Beagrie, 2006; Kowalczyk, 2017), by accounting for the “digital preservation requirements at each stage in the lifecycle of a digital object” (Wheatley and Hole, 2009: [4]). Further, this approach enables calculations regarding the cost of preservation (Davies, 2008).
Secondly, the lifecycle approach provides specific advantages to data and information services. It facilitates, for instance, the planning, maintenance, explanation, and support of services in organizations (Cox and Tam, 2018; Humphrey, 2006; Pennock, 2007). Additionally, it can be applied as a framework to “contextualize and communicate what kinds of data services could be provided to whom and when” (Carlson, 2014: 63).
Thirdly, the lifecycle approach can strengthen relationships and enable deeper collaboration between the stakeholders of data and information by identifying stakeholders involved at different stages and the significance and nature of relationships between those stakeholders. This approach can also function as a framework for providing strategic guidance to stakeholders at different stages of the lifecycle and, thus, as a means of determining which tasks stakeholders should complete (Beagrie, 2004, 2006; Beagrie and Greenstein, 1998; Pouchard, 2015).
In essence, the approach comprehensively ensures the continuity, authenticity, reliability, integrity, accessibility, and usability of data (Higgins, 2007, 2008; Pennock, 2007).
Despite these advantages, several researchers and practitioners have argued that developing and managing data and information lifecycles presents a range of challenges, namely the creation of additional institutional structures and actions that may place unwelcome strain on already limited resources. A lifecycle approach, for example, necessitates considerations regarding what data should be preserved and for how long that many organizations might otherwise have sidelined to focus more on the practical realities of determining staff duties, resource allocation, and funding (Lynch, 2008). Similarly, organizations might not be used to documenting decisions regarding data at each stage of its lifecycle (Borgman, 2007; Higgins, 2008; Kowalczyk, 2017).
Further, the effective and streamlined application of a lifecycle model will often require organizations to provide a broader institutional framework for their curatorial staff and educational resources on the specialized knowledge, methods, tools, standards, and skills, including the capability to transform data and add checksums to datasets, all of which comes with an associated additional workload (Cox and Pinfield, 2014; Faundeen et al., 2013; Heidorn, 2011). Lifecycle models necessitate considerable input and buy-in from stakeholders, including data creators and curators, who must take responsibility for data at different stages of the lifecycle, which again may be too difficult for certain organizations to realistically achieve (Pennock, 2007).
Although researchers and practitioners may debate the specific advantages of and drawbacks to the lifecycle approach, the simple fact that the approach has nevertheless been popularly implemented in data and information management suggests that its advantages outweigh its disadvantages (e.g., Faundeen and Hutchison, 2017; InterPARES 2 Project, 2007; UCL and British Library, [2021]). Many cases of the successful application of the lifecycle approach appear in literature pertaining to data and information science (e.g., Faundeen and Hutchison, 2017; Pouchard, 2015; UCL and British Library, [2021]).
Curation lifecycle model
A globally popular and influential curation lifecycle model is the DCC model, which provides a graphical high-level overview of the actions and stages required for successful digital curation, showing the connections and relationships among stages, actions, and objects of digital curation (see Figure 1). The model is composed of multiple rings around a core circle that represents the data itself. The rings that surround this data represent three sets of actions: full lifecycle actions, sequential actions, and occasional actions (Higgins, 2007, 2008, 2012).

Figure 1. DCC Curation Lifecycle Model.
Beyond what is self-evident in the DCC’s visualization of the model (see Figure 1), the DCC model notably places its emphasis “not on the life of data from the researcher’s point of view, but on its preservation,” (Cox and Tam, 2018: 150) with a disproportionately large number of actions pertaining to “preservation”. More specifically, although curation includes or necessitates preservation, the DCC model foregrounds the role of preservation in the title of the curate and preserve action, rather than opting for a simpler title, such as curate. Similarly, the title preservation planning indicates that this model sees its core goal as “preservation” rather than “curation”. This skew stems from the fact that the model was heavily influenced by the Open Archival Information System (OAIS) Reference Model (hereafter the “OAIS reference model”), which explicitly foregrounds “Long Term digital information preservation and access” over curation (ISO, 2012: 1).
Additionally, the factors influencing digital curation (particularly the development of technology, data science, and the data-intensive environment) have continued to change and evolve rapidly since the DCC model’s inception in 2008. Notably, just one year after the DCC model was published, Constantopoulos et al. (2009) proposed an extended digital curation lifecycle model, the DCC&U curation lifecycle model (hereafter the “DCC&U model”), which combines the DCC model with the digital curation processes of the Digital Curation Unit (hereafter the “DCU”) of Athena Research Centre (see Figure 2).

Figure 2. DCC&U Curation Lifecycle Model.
Although the DCC&U model still has three actions relevant to “preservation”, it has added a user experience sequential action that explicitly foregrounds the role of the user. This reflects the DCC&U model’s acknowledgment of tools that track user experience (e.g., Web 2.0, social tags) and the necessity of understanding how user experience informs data access and use. Moreover, the DCC&U model refined and stratified the curate and preserve action of the DCC model within its new preservation, curation, and knowledge enhancement actions. Perhaps the most novel of these three actions, knowledge enhancement involves generating “new knowledge about the real-world entities, situations and events represented by digital resources” by employing Semantic Web technologies (Constantopoulos et al., 2009: 39). This action derives from the fact that the DCU’s digital curation processes consider “contextual information resources as an object of curation” (Constantopoulos et al., 2009: 39) and, therefore, is likely less relevant to other models curating different sorts of objects. Moreover, this same action often occurs implicitly within other curation actions such as the description and representation information and transform actions and is, consequently, rarely found in other models.
When developing the former KISTI model, which was intended to monitor and reflect KISTI’s curation practices contemporary to the creation of the model in 2018, its project team was influenced heavily by both the DCC and the DCC&U models (see Figure 3). Indeed, the former KISTI model was more influenced by the DCC&U model than the DCC model, indicating the DCC&U model’s iterative improvements to the foundation provided by the DCC model.

Figure 3. [Former] KISTI Curation Lifecycle Model.
The sequential actions of the former KISTI model are the same as those of the DCC&U model, with the former KISTI model representing the knowledge enhancement action of the DCC&U model as enhancement. More significantly, this model expands upon the description and representation information action of both the DCC and DCC&U models through its acronym “SEMANTIC”, which integrates the semantics of the DCC&U model within a specialized completive descriptive action tailored to the specific curatorial needs of KISTI.
In contrast to its predecessors, the former KISTI model reflects a newfound focus on “curation” rather than “preservation” by retitling the preservation planning action as curation planning. Additionally, it expands the remit of the original community watch and participation action within its stakeholder observation action, which encompasses all stakeholders relevant to KISTI’s curatorial practices.
Despite the significant modifications made to previous models, the former KISTI model is also inherently limited. Upon the completion of the model, KISTI staff members retroactively discovered that some actions conducted contemporary to the creation of the model, including migrate and reappraise, had been missed. Further, the model was intended for use only within KISTI and based on the organization’s then inconsistent and not fully investigated digital curation practices. Consequently, KISTI quickly found itself in need of a new digital curation lifecycle model that would be able to guide optimal digital curation practices, both within the organization and Korea as a whole.
More broadly and on an international level, given that most modern curation models take their inspiration from the DCC model and, in turn, the OAIS reference model, they tend to include a disproportionately large number of actions pertaining to “preservation” despite being presented as “curation” models. As is implicit within its name, a curation model ought to foreground the act of “curation” as much as, if not more than, that of “preservation.” Much existing literature seeks to achieve this by establishing “preservation” as an inherent subsection of “curation”. Yakel (2007), for instance, writes that digital curation is “the active involvement of information professionals in the management, including the preservation, of digital data for future use.” Notably, “preservation” is here classified as a subsection of “curation.” Yakel goes on to make this relationship explicit, presenting “digital curation [as] an umbrella concept that includes digital preservation, data curation, electronic records management, and digital asset management” (2007: 335).
Yakel is by no means alone in conceptualizing digital curation and preservation as such. Lee and Tibbo (2007) state that digital curation involves “a commitment to long-term preservation.” Further, Poole defines digital curation as embracing “digital preservation, data curation, and the management of assets over their lifecycle” (2016: 962). Abbott (2008) similarly defines digital curation as “the management and preservation of digital data over the long-term.” Notably, Constantopoulos and Dallas define “digital preservation” as a “necessary condition for achieving the objectives of digital curation,” and include “preservation” as one of digital curation’s processes (2016: 2). The National Research Council (2015) also states that digital preservation is just one of many aspects of digital curation. Indeed, the California Digital Library even went so far as to change the name of its Digital Preservation Program to the UC Curation Center (hereafter the “UC3”), in recognition of its mission to address the “broader terms of digital curation, rather than preservation” (2010: 1). In simple terms, the fact that the DCC’s Digital Curation model includes acts of “preservation” indicates that “curation” is a larger concept than that of “preservation”. The persistent conceptualizing of digital curation as an “umbrella term” that encompasses preservation indicates a dominant trend within the field that consequently ought to be represented in models that focus on digital curation rather than preservation.
Research Methodology
Data collection
This study utilized both conceptual and empirical approaches. Conceptually, it reviewed three curation lifecycle models and fifty-four lifecycle models, which were discovered by searching for “curation lifecycle,” “data lifecycle,” and “information lifecycle” on Library and Information Science Abstracts, Library & Information Science Source, and Google Scholar. After conducting this conceptual research, this study ultimately selected the DCC model, despite the shortcomings listed above, as the conceptual foundation of the d-KISTI model for the following reasons. Firstly, as evidenced in the “Curation lifecycle model” section, the DCC model has significantly impacted both the theoretical and practical development of digital curation (Choudhury et al., 2020b; Cox and Tam, 2018; Huang et al., 2020). Further, many organizations around the world have adopted, adapted, or applied the DCC model (e.g., Constantopoulos et al., 2009; HDML model, [n.d.]; O’Donoghue and van Hemert, 2009). Beyond using the DCC model as its foundation, this study consulted other relevant models including the OAIS reference model. Despite not being a curation model and having informed the predominance of “preservation” in the DCC model, the OAIS model remains an international standard (originally ISO 14721:2003; now revised to ISO 14721:2012), which accounts for its inclusion in this data collection phase. Empirically, this study implemented document analysis, interviews, and participant observation to investigate digital curation practices at KISTI.
This study chose to focus on KISTI’s practices for the following three main reasons: firstly, KISTI leads the information and data field in Korea; secondly, this study is conducted and authored by a KISTI staff member, who was able to facilitate data collection and analysis within the organization; finally, KISTI is the only Korean organization with an independent department for digital curation, which is partially why its digital curation practices seem more developed than those of other Korean information organizations.
Given the amount of data curated by KISTI each year, this study was unable to thoroughly investigate every data type. Given that Korea’s Regulations on the Management of National Research and Development (R&D) Projects have assigned KISTI as the only manager and distributor of national R&D data, data produced from national R&D projects is crucial to both KISTI and the nation and was, therefore, selected as the data investigated throughout this study.
To deepen its understanding of KISTI’s digital curation practices, this study collected manuals, reports, presentation slides, project proposals, information system diagrams, and guidelines produced by KISTI from 2019 to 2020. Through analysis of these resources, 12 interviewees were selected, realistic and customized interview protocols were created, and questions were tailored to the interviewee’s curation role and tasks.
The 12 interviewees selected were all KISTI staff members involved in the curation of national R&D data from 2019 to 2020, many of whom had been interviewed in early data collection efforts for the development of the former KISTI model in 2018. Given that KISTI improved its digital curation practices following the former KISTI model’s launch in 2019, this study sought to build upon pre-existing interview data from 2018 by conducting further interviews in 2019 and 2020, both to identify developments in KISTI’s digital curation practices and to confirm findings from the 2018 interview data whenever further insights were needed. Nine of the interviewees worked within the Content Curation Center, two within the National Science & Technology Information Service Center, and one within the Convergence Service Center. They were all first interviewed in 2019, with some later being asked further questions in more informal contexts to supplement the interview data in 2019 and 2020.
Alongside and outside of these interviews, this study also observed participants and KISTI’s digital curation practices to further determine patterns of curation activities and contextual factors that might impact interview responses. Through these observations, which were recorded in the form of observation notes in 2019 and 2020, this study was able to gain further insights into changes in KISTI’s digital curation practices over time, as well as relevant interactions among KISTI staff members.
Data analysis
Data collection and data analysis were conducted concurrently throughout this study, with collected data being analyzed in an iterative coding process throughout the data collection timeline. The data analysis in this study involved content analysis and thematic coding with the assistance of NVivo12, software that helps researchers “develop a model from a tentative conceptual framework” (Smyth, 2006: 137).
In this context, the DCC model became the tentative framework for initial coding; within NVivo12, components representing curation actions formed preliminary categories. Within the coding structure of NVivo12, these categories were represented as tree nodes, with each having four child nodes (literature review, observation, document, interview).
After creating the preliminary structure, this study compared each lifecycle structure gathered from the literature review and KISTI’s data in turn to the nodes of the structure in an iterative coding process via NVivo12, which facilitated the incorporation and importation of data. More specifically, relevant text on lifecycle models from the literature review, KISTI’s documents, observation notes, and interview transcripts was imported into the most relevant corresponding node of NVivo12. At the same time, node names in the preliminary structure were changed accordingly. For instance, a part of an interview transcript about KISTI’s description practice was imported into the “interview” node under the “description, identification, and linkage” node, the name and role of which had changed from the DCC model’s original “description and representation information” node.
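The coding structure described above can be pictured with a minimal Python sketch. This is an illustration only, under stated assumptions: the class `CodingStructure` and its methods are hypothetical names and are not part of NVivo, and the excerpt text is a placeholder rather than data from this study. The sketch shows one category node with the four source-type child nodes, a rename that keeps previously coded material, and an excerpt being filed under the "interview" child node.

```python
# The four source types used as child nodes under every category (from the text).
SOURCE_TYPES = ("literature review", "observation", "document", "interview")


class CodingStructure:
    """Hypothetical stand-in for a tree-node coding structure; not NVivo's API."""

    def __init__(self, categories):
        # category name -> source type -> list of coded excerpts
        self.nodes = {
            name: {source: [] for source in SOURCE_TYPES} for name in categories
        }

    def code(self, category, source_type, excerpt):
        """File an excerpt under the matching child node of a category."""
        if source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {source_type}")
        self.nodes[category][source_type].append(excerpt)

    def rename(self, old, new):
        """Rename a category while keeping everything already coded under it."""
        self.nodes[new] = self.nodes.pop(old)


# Start from a DCC model component, then rename it as the coding evolves.
structure = CodingStructure(["description and representation information"])
structure.rename(
    "description and representation information",
    "description, identification, and linkage",
)
structure.code(
    "description, identification, and linkage",
    "interview",
    "placeholder excerpt about description practice",  # not real study data
)
```

Keeping the source type as a child of each category, rather than as a separate top-level grouping, makes it straightforward to see which kinds of evidence support a given action.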
Significant changes in the names and structures of nodes include the deletion of DCC model components that were not supported by other lifecycle models and empirical data, alongside the introduction of new components found in other lifecycle models or empirical data. Whenever a digital curation action was given multiple names across various lifecycle models, the most popular name was selected as the corresponding node name. The final nodes formed the components of the d-KISTI model. Given that the coding structure evolved iteratively before stabilizing, several versions of this model were produced over the course of the study. This paper refers to these variations as “development versions,” and uses “the d-KISTI model” to refer only to the final version of the model.
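The naming rule for nodes (choosing the most popular name when lifecycle models label the same action differently) can be illustrated with a short Python sketch. The function name `most_popular_name` and the example labels are hypothetical and are not drawn from the study's data. The tie-breaking behavior, in which the first name encountered wins, is an assumption of this sketch, since the study does not state how ties were resolved.

```python
from collections import Counter


def most_popular_name(names_across_models):
    """Return the label used most often for one action across lifecycle models.

    Assumption of this sketch: ties go to the label encountered first.
    """
    counts = Counter(names_across_models)
    name, _ = counts.most_common(1)[0]
    return name


# Hypothetical labels that three models might use for the same action.
labels = ["ingest", "receive", "ingest"]
chosen = most_popular_name(labels)  # "ingest" appears twice, "receive" once
```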
Both the development versions and the final version of the d-KISTI model were refined through a variety of means over more than two years. They were presented at meetings, workshops, and conferences both domestically and internationally, and reviewed in person and via email by approximately 70 digital curation experts worldwide, including both researchers and practitioners, from Australia, Canada, Korea, Poland, South Africa, the UK, and the US, among other countries. More specifically, in 2019 at a KISTI-UK DCC workshop, two UK DCC members gave feedback on a development version of the d-KISTI model; in 2021, a broad range of experts was consulted and provided positive feedback, prompting this model’s finalization.
Further, this study took several steps to limit potential bias and its impact on findings. Firstly, the author of this study chose to collect and analyze data by herself rather than co-working with other KISTI staff or researchers to avoid being influenced by the institutional and personal desirability of participants of this study, an element of bias that she had observed when working alongside the project team that developed the former KISTI model. Equally, as a KISTI staff member herself, she remained aware that her role could potentially influence her collection and analysis of data. To support her objectivity, she employed triangulation (interviews, participant observation, and document analysis). Moreover, to prevent KISTI’s curation actions from having a disproportionate impact on the final components of the d-KISTI model, she consulted and analyzed existing lifecycle models and curation actions by reviewing relevant literature alongside her work with KISTI’s data. Additionally, given that this study was conducted within a single institution in a single nation, the author was also aware of the potential for cultural bias. Consequently, to counter such potential bias and enhance the validity of this study, she received feedback on both the development and final versions of the d-KISTI model from experts based in many different countries.
Research Results
This study presents a new digital curation lifecycle model, the d-KISTI model, which is a high-level graphical overview of the actions required for efficient and systematic digital curation, and their relationships to one another (see Figure 4). These actions fall into three categories: full lifecycle actions, sequential actions, and occasional actions. “Full lifecycle actions” last for the full lifecycle of the data and are represented in the model by five concentric ovals around the occasional and sequential actions, with blue arrows pointing inwards to indicate that the activities of these full lifecycle actions inform all other actions. These actions are presented in no specific order, with the shade of blue used in each differing only to facilitate visual differentiation. “Sequential actions” are conducted in a specific order during the lifecycle of the data and are represented by green block arrows. “Occasional actions” are taken only under specific conditions and are represented by rectangles connected to sequential actions by dotted line arrows. The actions present in the d-KISTI model comprehensively represent the processes and stages of digital curation.

Figure 4. d-KISTI Digital Curation Lifecycle Model.
The following subsections describe each digital curation action in the d-KISTI model in detail, encompassing the concept and scope of each action and how such actions are supported by existing models, reviewed literature, and empirical data from KISTI. These descriptions are supported by comparative tables that present the similarities and differences in the approaches of the DCC, DCC&U, former KISTI, and d-KISTI models to curatorial activities (see Appendix 1). Parts of KISTI’s digital curation practices on national R&D data are briefly described within these subsections to provide examples of how each action is conducted in the field. Significantly, these are not intended to limit the applicability of the model, but rather to enable a deeper understanding of the concept and scope of each action. Given that this study aims solely to develop a new lifecycle model, rather than to comprehensively discuss and appraise KISTI’s digital curation practices, the following subsections are not exhaustive, nor do they present supporting text from the literature review, KISTI documents, observation notes, or interview transcripts. A more thorough description of KISTI’s curation of national R&D data will follow in a subsequent paper.
Full lifecycle actions
“Full lifecycle actions” are actions that organizations must conduct consistently throughout the lifecycle of data. They include curation planning and management; description, identification, and linkage; stakeholder observation and collaboration; user and use investigation; and technology watch. These actions are detailed in the subsections that follow.
Curation planning and management
Curation planning and management encompasses not only planning the comprehensive curation, administration, and management of data throughout the curation lifecycle, but also reviewing, modifying, and carrying out these plans. It encompasses and expands upon the concepts of preservation planning and curate and preserve from the DCC model. The preservation planning action included within the DCC model is widely considered a core action in digital management, curation, and preservation (e.g., California Digital Library, 2010; Higgins, 2008; Strodl et al., 2007). Moreover, many data lifecycle models include planning (e.g., Ball, 2012; DataONE, 2019; Pouchard, 2015). The d-KISTI model borrows from the former KISTI model in reframing preservation planning as curation planning, to better center the d-KISTI model’s core goal of curation, of which preservation is seen as a component activity.
The d-KISTI model deviates from the foundations of the DCC model by including “curation management” within this action rather than separating it out into a distinct action, as was the case with the DCC’s curate and preserve action. This change was informed by KISTI’s empirical data, which indicated that the administrative tasks of planning and management were deeply interrelated within KISTI. The organization generated just one plan that collectively identified, planned for, and managed staff members, budgets, and information systems before it embarked upon the curation of national R&D reports and journal articles. It also continuously monitored and updated the plan, performing administrative and managerial tasks accordingly.
This empirical data, coupled with reviewed literature and existing lifecycle models, has informed this action’s core activities:
gaining organizational agreement on the need for digital curation
identifying and preparing human resources, budget, and facilities necessary
assigning roles and responsibilities to staff members
developing a holistic digital curation plan
implementing, monitoring, and updating this curation plan
carrying out the administrative and managerial work necessary to accomplish this curation plan.
Description, identification, and linkage
Description ensures appropriate control of data during its lifespan. Similarly, identification is facilitated by description and provides a unique key, which allows data to be found and linked to other related data through acts of linkage. In conjunction with one another, these three activities enable an organization to manage its data over time, create new datasets, and curate content by, for instance, linking different sets of data regarding a specific event to each other for clarity and ease of access. This action is, therefore, related to the transform action (see “Transform” subsection). This relationship is supported by the OAIS reference model where the “Representation Information” plays an important role in “Transformations” and additional “Associated Description” sometimes needs to be provided for data and datasets that are transformed (CCSDS, 2012). Similarly, the UC3 micro-services include the “Transformation” service which provides “a means to transcode digital object representations (that is, sets of files) from existing forms to newly required forms” (California Digital Library, 2010: 16).
Furthermore, the d-KISTI model is influenced by the DCC&U model, in which the importance of linkage is emphasized, with the DCU stating that “linking documents to other documents that support or contradict them” is “knowledge enhancement” (Constantopoulos et al., 2009: 40).
The activities included within these models are supported by empirical data from KISTI, which indicates that the three components of the description, identification, and linkage action are directly related to each other, and that this relationship also applies to national R&D data. KISTI has made many efforts to accurately and effectively describe, identify, and link its data. As a result, KISTI’s description, identification, and linkage activities and processes are relatively complex and well-developed compared to those of other Korean information organizations. Although the former KISTI model characterized these actions within its “SEMANTIC” acronym, this was deemed too specific to KISTI’s curatorial needs. The d-KISTI model therefore adopts a broader term for this action, one that encompasses the former KISTI model’s SEMANTIC description while also providing space for different information organizations to adopt the description, identification, and linkage practices that best suit them.
This empirical data, coupled with examinations of the DCC model and reviewed literature, has informed this action’s core activities:
developing, documenting, and implementing policies for description, identification, and linkage
identifying what information about data should be provided in order to enable future users to understand the data
creating descriptive, administrative, structural, and preservation metadata with appropriate standards
determining and using identifiers that will be utilized in an organization
determining how data will be linked to other information resources (e.g., datasets, journal articles, books)
automating the processes as much as possible.
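The relationship between description, identification, and linkage can be illustrated with a minimal sketch. It is not drawn from KISTI's systems: the `CuratedRecord` class, its field names, the use of UUID-based identifiers, and the `isDerivedFrom` relation label are all assumptions made for illustration. The sketch shows only how a described record receives a unique key and how that key allows one record to be linked to another.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class CuratedRecord:
    """Hypothetical record: descriptive fields, a unique key, and typed links."""

    # Description: a deliberately small set of descriptive metadata.
    title: str
    creator: str
    # Identification: a unique key assigned once, when the record is created.
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    # Linkage: identifiers of related resources, grouped by relation type.
    links: Dict[str, List[str]] = field(default_factory=dict)

    def link_to(self, relation: str, other: "CuratedRecord") -> None:
        """Record a typed link from this record to another record."""
        self.links.setdefault(relation, []).append(other.identifier)


report = CuratedRecord(title="Example R&D report", creator="Example Lab")
article = CuratedRecord(title="Example journal article", creator="Example Lab")
article.link_to("isDerivedFrom", report)
```

In practice, an organization would substitute its own metadata standard and identifier scheme for the placeholder fields used here.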
Stakeholder observation and collaboration
Stakeholder observation and collaboration signifies an information organization’s observations and identifications of its stakeholders (e.g., upper-level organization, publisher, funding organization), their needs, and the relations between those stakeholders and its own curation actions. Many studies indicate that stakeholder collaboration is necessary and useful for digital curation and digital preservation (e.g., Day, 2008; Latham and Poe, 2012; Macdonald and Martinez-Uribe, 2010). Collaboration is “one of the keys to effective curation” and “in fact, firmly embedded in digital curation practice” (Oliver and Harvey, 2016: 96). Ultimately, stakeholder observation and collaboration in the d-KISTI model extends the concept of community watch and participation from the DCC model, in acknowledgment of the fact that some communities, such as the digital preservation community, serve as both stakeholders and participants of digital curation projects.
The d-KISTI model acknowledges that some users are also stakeholders by paralleling the structure of both the stakeholder observation and collaboration action and user and use investigation action, which will be discussed below. In KISTI’s case, for example, the organization values close collaboration with government agencies, scholarly societies, and researchers, all of whom function as both stakeholders and users. As such, both this action and the user and use investigation action below comprise activities of observing, receiving, and responding to user feedback.
This empirical data, coupled with examinations of the DCC model and reviewed literature, has informed this action’s core activities:
identifying stakeholders relevant to digital curation and relations with them
collecting information about stakeholders’ digital curation activities
sharing digital curation knowledge with stakeholders
observing and collaborating with stakeholders to maintain up-to-date digital curation practices by staying abreast of rapidly developing technologies
determining the specific nature of each stakeholder’s role in any given collaboration
collaborating with digital curation communities to develop standards, tools, and technologies for digital curation.
User and use investigation
User and use investigation encompasses activities that investigate and monitor the interactions between users and organizational resources (e.g., data, curators, information systems), and it extends to include reflecting upon and applying the results of these investigations to enhance digital curation. Many studies have stressed the necessity of information organizations investigating users and uses to create effective digital curation and data lifecycle models (e.g., ANDS, 2013; Beaujardière, 2016). Case studies from the Institute of Museum and Library Services, for instance, show that effective digital curation requires an analysis and understanding of user needs and behaviors (see Lee et al., 2016). Moreover, the OAIS reference model, which lists consumer interaction as an entity in the OAIS environment, notes that there are a variety of interactions between the consumer and the OAIS reference model. In addition, its preservation planning entity includes a monitor designated community function (CCSDS, 2012), all of which points to the importance of an ongoing investigation of multiple user interactions.
The user and use investigation action expands upon the concept of user experience present within the DCC&U model. In this earlier model, user experience is defined as “the interaction between users and resources, as well as the effects of this interaction” (Constantopoulos et al., 2009: 40). There, it is presented as a sequential action positioned after access, use, and reuse, on the assumption that an organization captures user experience data after users access, use, and reuse resources.
In contrast, KISTI’s empirical data indicates that organizations consider and investigate not only actual users after they have accessed, used, and reused resources, but also potential users and their information needs. KISTI investigates the users of its national R&D reports and journal articles by conducting user studies, including surveys and focus group meetings. More significantly, it also uses the results of these investigations to improve user satisfaction, its user-centered services, and its own information systems across the lifespan of its data. In recognition of this empirical evidence, the d-KISTI model situates this action as a full lifecycle action.
Though the DCC model does not include user and use investigation, in the UK DCC’s review of a development version of the d-KISTI model, it welcomed the addition of user and use investigation to the d-KISTI model and is keen to explore whether the activity could be integrated into its own model (DCC, 2019).
KISTI’s empirical data, coupled with examinations of the DCC model and reviewed literature, has informed this action’s core activities:
preparing methods, tools, and channels that can investigate an organization’s own users and their use of data
identifying both current and potential users
investigating and monitoring users’ information needs, information-seeking behavior, information use, and user feedback, through annotation, social tags, Web 2.0, user studies, etc.
applying the results collected from investigating users and use to digital curation practices.
Technology watch
Technology watch encompasses activities that monitor changes in technology and that involve adaptations to keep up with such changes. Many studies assert the necessity of monitoring technologies, given their significant impact on digital curation, archiving, and preservation (e.g., Lee and Tibbo, 2007; Rosenbaum, 2011; Yakel et al., 2011). The Generic LIFE Preservation Model contains technology watch as a main cost element of digital preservation costing (Davies, 2008; McLeod et al., 2006). The preservation planning entity of the OAIS reference model includes a monitor[ing] technology function to track emerging technologies (CCSDS, 2012).
In acknowledgment of the fact that technological development is rapid, ongoing and directly influences digital curation, the d-KISTI model includes technology watch as a full lifecycle action. Moreover, empirical data from KISTI suggests that, in order to remain attentive to rapidly developing technologies, the institute engages in many activities that bring it into close contact with its stakeholders, including attending workshops and conferences, and communicating and collaborating with experts (e.g., information technologists) and communities (e.g., digital preservation communities) from relevant fields (e.g., science and engineering). Hence, technology watch is related to and can consequently foster stakeholder observation and collaboration.
Similarly, such activities will also introduce organizations to new file formats that may offer improvements to existing file formats being utilized. As such, technology watch is often leveraged to provide insights into emergent file formats that either are becoming the industry standard or provide improved safety or stability of data. Consequently, technology watch informs the preserve action and can trigger the transform and migrate actions.
This empirical data, coupled with examinations of the DCC model and reviewed literature, has informed this action’s core activities:
monitoring rapidly developing technologies
preparing methods to advance the adoption of new technologies within the organization
accepting and implementing new technologies to enhance digital curation practice
collaborating with other organizations and stakeholders to track and adopt new technologies.
Sequential actions
Sequential actions are conducted in the following specific order during the lifecycle: conceptualize; create and/or collect; appraise and select; ingest; preserve; store; access, use, and reuse; and transform. If organizations have already completed the activities contained within earlier sequential actions in their independent digital curation practices, they can still adopt the model by starting with the most relevant sequential action.
These actions and their titles were derived from the synthesis of common or shared components from several lifecycle models and are accordingly based on the usual functions and processes of information organizations (e.g., DataONE, 2019; Higgins, 2008).
Conceptualize
Conceptualize encompasses the conception and planning of the creation and/or collection of data, with curation processes and outcomes in mind. The remit of this action is expanded from the DCC model’s conceptualise, in order to facilitate digital curation actions at later stages of a curation lifecycle model.
Although Higgins’s paper (2008) on the DCC model only briefly mentions the conceptualize action in an abstract capacity, KISTI’s empirical data shows the material need within an organization to conduct diverse conceptualization activities. More specifically, KISTI delineated the concepts of national R&D reports and journal articles, and planned methods and processes by which they might be collected from relevant stakeholders.
Additionally, KISTI planned and developed two information systems to collect curation-friendly file formats of reports and journal articles from national R&D projects. This indicates that conceptualize may, in turn, rely on information gleaned within the ongoing technology watch full lifecycle action to establish a plan or structure for future best-practice transform or migrate activities.
This empirical data, coupled with examinations of the DCC model and reviewed literature, indicates that activities within this action should include:
determining the most effective standards and methods for both creating and collecting data
establishing safe and secure storage options
both developing new information systems and establishing the functionality of existing information systems, to be used by data creators, curators, and users
delineating curation-friendly file formats
calculating costs for creating and/or collecting new data.
Create and/or collect
Create encompasses the production of new data and its associated administrative, descriptive, structural, and technical metadata. Collect encompasses the acquisition of data, in accordance with legal regulations and institutional collection policies, from data creators (e.g., researchers, scholarly societies, data centers), which is referred to within some lifecycle models under different names, such as “acquire” and “obtain” (e.g., Faundeen et al., 2013; FGDC, 2010; Pouchard, 2015). Create and/or collect shifts the focus from passive to active data collection activities by foregrounding “collecting” rather than “receiving” data. In doing so, it proposes that curators may both create and collect data.
At this stage of the lifecycle, the appropriate metadata is assigned to the collected data. In both cases, data should be either collected or created within standardized file formats and types determined by the organization during the conceptualize action, to best prepare for the ingest, transform and, potentially, migrate actions.
Although KISTI’s empirical data indicates that KISTI creates and collects national R&D reports and journal articles, KISTI does not conduct other activities encompassed within this action, such as developing and documenting policies on creating and collecting data. In addition, KISTI staff members independently create national R&D reports and journal articles, although these are processed through the same channels utilized by external researchers, and therefore require no additional collect activities.
Although KISTI’s create and/or collect needs may not be representative of most information organizations, when viewed in conjunction with reviewed literature, they still indicate that core activities that should be conducted at this stage include:
establishing, documenting, and applying policies on creating and collecting data
creating and collecting data and digital objects in standardized formats and file types
establishing and implementing standards for content, syntax, and structure of data
determining and conducting the procedure of collecting data
defining and documenting the ownership of new data copyright when it is created and collected
establishing a data quality assurance process.
Appraise and select
Appraise and select comprises the evaluation and selection of data to be curated in the long term. The act of determining which data will be curated by the organization must be carried out in line with the organization’s documented policies, criteria, or legal requirements. In comparison to the DCC model’s appraise and select, which relates only to preservation action via reappraise, the d-KISTI model’s appraise and select is related to both preserve and access, use, and reuse via reappraise.
According to KISTI’s empirical data, KISTI is legally required to collect all reports and journal articles from national R&D projects. In this specific instance, it does not need to engage with appraisal and selection activities or document the policies and criteria employed within the appraise and select action. Despite this, other data both within KISTI and in other information organizations will still need to be appraised and selected.
Furthermore, reviewed literature and the DCC model all stress the growing necessity for appraisal and selection, given the exponential growth of data being produced (e.g. Bhaskar, 2016; Niu, 2014; Whyte and Wilson, 2010). As such, the d-KISTI model contains the appraise and select action, which encompasses the following core activities:
identifying what data should be held and for whom
developing, documenting, and applying policies and criteria for data appraisal and selection
determining whether to hold data by evaluating it against appraisal criteria
determining the retention period of selected data.
Ingest
Ingest comprises the activities involved in preparing and then adding data to the curating organization’s database, information system, or digital archive, in line with the organization’s policies, procedures, and processes. Ingesting data into an information organization’s managed environment is a prerequisite for effective curation (Harvey, 2010); accordingly, ingest is one of the six functional entities in the OAIS reference model. Moreover, many data and information lifecycle models include ingest activities, although some represent these activities using the term “processing” (e.g., Beaujardière, 2016; USGS, 2014).
During ingestion, KISTI converts collected MS Word and Hangul Word Processor (HWP) files into PDF and XML formats, according to KISTI’s rule on normalizing file formats. This activity provides a practical example of the file format conversions that were proposed in the ingest functional entity within the OAIS reference model (CCSDS, 2012). This supports Oliver and Harvey’s claim that transformation activities may occur during the ingestion process (2016: 153). As such, within the d-KISTI model, the sequential actions of ingest and transform are viewed as tightly connected to one another and with the occasional action, migrate, which is reflected in the language surrounding the discussion of their activities (see “Transform” and “Migrate” subsections). In contrast, the DCC model does not contain this connection.
This empirical data, coupled with examinations of reviewed literature, has informed the main activities contained within the d-KISTI model’s ingest action:
listing data to be transferred
assuring data quality
cleaning data
checking whether data is in the required format and medium
checking whether the metadata records are accurate
assessing whether the transform or migrate actions must take place.
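The format check and the final assessment step in the list above can be sketched in a few lines. The sketch below is illustrative only: the accepted and convertible extensions are assumptions loosely modelled on the MS Word and HWP to PDF and XML normalization described for KISTI, and the function name `assess_ingest` is hypothetical. It also inspects only the file extension, whereas a production workflow would identify formats from file content.

```python
from pathlib import PurePath

# Hypothetical normalization rule: formats accepted as-is, and the target
# format for source formats that must be converted during ingest.
ACCEPTED_FORMATS = {".pdf", ".xml"}
CONVERSION_TARGETS = {".doc": ".pdf", ".docx": ".pdf", ".hwp": ".pdf"}


def assess_ingest(filename: str) -> str:
    """Return 'accept', 'transform', or 'reject' for a submitted file."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in ACCEPTED_FORMATS:
        return "accept"
    if suffix in CONVERSION_TARGETS:
        # The file is in a known source format that the rule can normalize.
        return "transform"
    return "reject"
```

Returning a decision rather than converting the file directly keeps the assessment separate from the transform action itself, which mirrors the way the model treats the two actions as connected but distinct.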
Preserve
Preserve represents processes that ensure the long-term preservation and retention of data. The DCC model features both preservation action as a sequential action, and preserve as a full lifecycle action. In contrast, the d-KISTI model chose to situate preserve as a sequential action only, which is broadly parallel to the sequential preservation action of the DCC model. The d-KISTI model chose to not include a preserve full lifecycle action, as it instead relocates many of the activities contained within this action in the DCC model to its curation planning and management full lifecycle action. This change was made to place more focus on the act of curation than preservation, with preservation being conceptualized as a component of healthy curation.
Empirical data collected from KISTI indicates that the organization does not have preservation policies, assign preservation metadata, or determine or document the preservation period of data. Although KISTI does not have established structures of preservation in place, it does clean and validate data and ensure acceptable data structures or file formats. This lack of established preservation activities within KISTI may be attributed to a lack of national legislation on this front: the Korean government’s Regulations on the Management of National R&D Projects does not specify how long KISTI or any other information organization must preserve national R&D reports and journal articles. That being said, in interviews, KISTI staff members stated their belief that KISTI had to keep these documents continuously, and that they had not thought about the duration of this preservation.
Core activities of the preserve action in the d-KISTI model are largely derived from a comparative analysis of existing models, empirical data collected from KISTI, and reviewed literature. They are designed to ensure that the ingested data remains authentic, reliable, and usable, and simultaneously maintain its integrity, and include:
developing, documenting, and applying preservation policies
ensuring acceptable data structures or file formats
updating existing preservation metadata
adding high-quality new preservation metadata
assessing and managing risk.
Store
Store encompasses activities that securely save data by adhering to relevant standards. While some data and information lifecycle models combine store and preserve into only one action (e.g., DataONE, 2019; USGS, 2014), many, including the DCC model, contain both. The d-KISTI model follows the lead of the DCC model given that, as has been evidenced in much literature, the store and preserve actions encompass different activities, and notably have significantly different definitions (e.g., Merriam-Webster, 2022a, 2022b; Oxford University Press, 2022a, 2022b). Despite this, within the d-KISTI model, the store and preserve actions remain highly interrelated, with activities within the store action potentially triggering a return to the preserve action.
Additionally, KISTI’s empirical data indicates that the organization mainly conducts store activities and lacks developed preserve activities. These store activities are mainly conducted by KISTI’s Department of Information System Management, which ensures systemic and physical security, and maintains technical infrastructure and data recovery procedures. Copies of national R&D reports and journal articles and their associated descriptive information are securely stored, in line with relevant standards, in two geographically distributed backup systems away from KISTI’s headquarters.
This empirical data, the DCC model, and reviewed literature inform the core activities contained within the d-KISTI model’s store action:
regularly checking the integrity of stored data and its description
documenting data storage and storage location
backing up data
monitoring instances, such as file corruption, that might trigger a return to the preserve action
developing and implementing, if necessary, a data recovery procedure and manual in preparation for disasters (e.g., earthquake, flood, fire).
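One common way to implement the integrity check listed above is a checksum comparison, sometimes called a fixity check. The source does not state which method KISTI uses, so the following is a minimal sketch under assumptions: SHA-256 is chosen as the digest, the manifest is a plain mapping from relative path to expected digest, and the function names are hypothetical. A non-empty result corresponds to the kind of instance, such as file corruption, that might trigger a return to the preserve action.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or whose digest has changed."""
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

The manifest would be written when data is first stored and the check rerun on a schedule, with the same check applied to each backup copy.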
Access, use, and reuse
Access, use, and reuse encompasses activities that ensure data is accessible to authorized customers for use and reuse. Although both the DCC and DCC&U models adopt the same title for this action, similar activities may also be encountered in data and information lifecycle models under actions variously termed “distribute,” “share,” and “discover” (e.g., Christopherson et al., 2020; EPA et al., 2011; Structural Reform Group, 2004).
KISTI’s empirical data indicates that KISTI works to make national R&D reports and journal articles both searchable and accessible via not only first-party information service platforms but also third-party commercial search engines and academic databases. KISTI also helps its customers access, use, and reuse its data by providing customer services and by instructing them, formally and informally, on how to use its first-party information service platforms.
This empirical data, coupled with examinations of the DCC model and reviewed literature, has informed the main activities contained within the d-KISTI model’s access, use, and reuse action:
applying standards that make data discoverable
providing sufficiently appropriate descriptive metadata to make data searchable
ensuring legal permissions to enable access to and use of data
implementing access controls
applying authentication procedures that allow only authorized users to access data
supporting customers’ access to, use and reuse of data.
Transform
Transform encompasses activities that create new data from original data, including the migration of data and the creation of new subsets of an original dataset. It also encompasses any activities conducted to convert the original format to the standard format used by a curating organization. In this sense, it is deeply interrelated with the ingest and migrate actions.
New data is created as an outcome of transform activities. This new data is fed into the create and/or collect action, leading to the generation of a new curation lifecycle. By creating new data from an existing dataset, researchers can create not only a transformed subset for their publications, but also a new version of the existing dataset through the process of reprocessing or correcting it, or by appending additional data to it. Linking these datasets can enhance awareness of relevant topics, events, and situations and, in doing so, provide valuable insights. As such, this action encompasses many of the same activities as the knowledge enhancement full lifecycle action of the DCC&U model, which is why the d-KISTI model does not include knowledge enhancement as a separate action.
As mentioned above, the transform action also occurs in the ingest stage, as evidenced by empirical data drawn from KISTI. The organization collects national R&D reports and journal articles in a range of formats, including MS Word, PDF, and HWP files, which are then all converted into PDF files. It then creates an XML file of each original file to extract its contents. For their final deposition, national R&D reports are stored as PDF/A files. Moreover, KISTI creates new datasets using existing datasets to provide valuable insights to its stakeholders and users.
KISTI’s empirical data and reviewed literature have informed the core activities of the transform action:
preparing software that can change file formats
identifying data to be transformed
determining a method—including emulation and migration—of transforming data
confirming that data, including annotations and metadata, is not damaged or lost when its format is changed
updating knowledge on new data formats informed by technology watch
creating a subset of a dataset by query or selection.
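The last activity in the list above, creating a subset by query or selection, can be sketched as follows. The sketch is an illustration under assumptions rather than a description of KISTI's practice: the dataset is represented as a list of dictionaries, and the function name `derive_subset` and the `derived_from` field are hypothetical. It shows how a derived dataset can carry a link back to its source, so that the new data entering a new curation lifecycle remains connected to the original.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def derive_subset(
    source_id: str,
    records: List[Record],
    predicate: Callable[[Record], bool],
    new_id: str,
) -> Dict[str, Any]:
    """Build a new dataset from the records that satisfy `predicate`."""
    # Copy each selected record so later edits do not alter the original data.
    selected = [dict(record) for record in records if predicate(record)]
    return {
        "identifier": new_id,
        "derived_from": source_id,  # linkage back to the original dataset
        "records": selected,
    }
```

For example, calling `derive_subset("dataset-001", records, lambda r: r["year"] >= 2021, "dataset-001-recent")` would select only the records from 2021 onward while leaving the source dataset untouched.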
Occasional actions
Occasional actions are only required within certain conditions and will not necessarily be conducted for all data. The d-KISTI model includes the occasional actions of migrate, reappraise, and dispose.
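With all three categories now introduced, the structure of the model can be summarized as a small data structure. This is a sketch for orientation only: the action names are taken from the text, while the `Category` enum, the `ACTIONS` mapping, and the helper function are hypothetical constructs that do not capture the relationships drawn between actions in the figure. The helper reflects the point that an organization may adopt the model by starting from the most relevant sequential action.

```python
from enum import Enum
from typing import Tuple


class Category(Enum):
    FULL_LIFECYCLE = "full lifecycle"
    SEQUENTIAL = "sequential"
    OCCASIONAL = "occasional"


ACTIONS = {
    # Presented in no specific order in the model.
    Category.FULL_LIFECYCLE: (
        "curation planning and management",
        "description, identification, and linkage",
        "stakeholder observation and collaboration",
        "user and use investigation",
        "technology watch",
    ),
    # Order matters: these are conducted in sequence.
    Category.SEQUENTIAL: (
        "conceptualize",
        "create and/or collect",
        "appraise and select",
        "ingest",
        "preserve",
        "store",
        "access, use, and reuse",
        "transform",
    ),
    # Taken only under specific conditions.
    Category.OCCASIONAL: ("migrate", "reappraise", "dispose"),
}


def remaining_sequential_actions(start: str) -> Tuple[str, ...]:
    """Return the sequential actions from `start` onward, in model order."""
    sequence = ACTIONS[Category.SEQUENTIAL]
    return sequence[sequence.index(start):]
```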
Migrate
Migrate represents the process of transferring data from one storage system to another, one hardware or software configuration to another, or one format to another, to preserve the intellectual content of data and enable it to be retrieved and used. Migration is required for digital preservation as technologies develop and in response to the needs of Designated Communities (Beagrie and Jones, 2001; CCSDS, 2012).
Although the migrate action links only the preservation action and the transform action within the DCC model, the ingest subsection of this study presents empirical evidence from KISTI of transformation activities occurring by migration during the ingest process, to conform to the data formatting standards of the curating organization. This, in turn, affirms the claims of the Consultative Committee for Space Data Systems (hereafter the “CCSDS”) (2012) and Oliver and Harvey (2016). Moreover, migrated objects must be ingested, which is not only referenced in the OAIS reference model (see CCSDS, 2012) but also represented as “re-ingest” in the LIFE model v2 (see McLeod et al., 2006). As such, although not universally essential, migrate is a common component of the ingest action.
On the other hand, data in the preserve stage is migrated only occasionally. Migration activities are motivated by improved cost-effectiveness due to hardware and software evolution, customers’ new needs, and media decay (CCSDS, 2012). Consequently, migration often necessitates transformation activities to ensure that data structures or file formats are more recoverable in the future, to counter obsolescence, and to conform to standardized data formats (Higgins, 2012). Indeed, this relationship between transformation and migration is present in the OAIS reference model, where transformation is presented as a migration type (CCSDS, 2012: 5-6). Similarly, the transformation service can be deployed in the context of migration in the UC3’s micro-services (California Digital Library, 2010). Migration “changes and transforms the digital object so that it can be used with new hardware or software” (Oliver and Harvey, 2016: 162). In short, it is an essential component of digital preservation and often encompasses activities related to, leading to, or contained within the transform action. Consequently, within the d-KISTI model, the migrate action links the ingest and transform actions, as well as the preserve and transform actions.
KISTI’s empirical data shows that, although KISTI has occasionally migrated national R&D report and journal article files during the ingest process, migration has not yet had to be conducted as part of the preserve process within the organization. Despite this, KISTI interviewees reported that if migration were necessary during the preserve process in the future, KISTI was prepared and equipped to act accordingly.
KISTI’s empirical data and reviewed literature indicate that the core activities of the migrate action include:
determining which data will be migrated and the formats to which the data will be migrated
developing, documenting, and implementing policies about migration
documenting migration methods and processes
monitoring obsolescence of file formats and, if necessary, transforming them into new file formats
checking the accuracy and authenticity of the migrated data.
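The last activity above, checking the accuracy of migrated data, is commonly supported by fixity checks. The following Python sketch is illustrative only and does not describe KISTI's tooling. It assumes a storage-to-storage migration in which files are copied unchanged. A format migration alters the byte stream, so a checksum comparison alone would not apply there. The function names and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_migration(source_dir: Path, target_dir: Path) -> list[str]:
    """List relative paths that are missing or altered in the target."""
    problems = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        if not target.is_file() or sha256_of(source) != sha256_of(target):
            problems.append(relative.as_posix())
    return problems
```

An empty list from `verify_migration` would indicate that every source file has a byte-identical copy in the target location. Authenticity in the fuller sense discussed in the literature also depends on documented provenance, which a checksum cannot establish.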
Reappraise
Reappraise encompasses activities undertaken to re-evaluate the long-term curation value of data, identify data no longer of value for curation, and make decisions that may trigger the dispose action. Reappraise is conducted at a later stage than appraise and select, often in response to a specific trigger, such as a sharp increase in backlogged data or changes to an organization’s data collection policies. Despite this, its process is either similar to or the same as the appraise and select process. As in the appraise and select stage, data that is not selected for curation in the reappraise stage may trigger the dispose action.
In contrast to the DCC model, where reappraisal only occurs in the preservation action stage, within the d-KISTI model, reappraise occurs in both the preserve stage and the access, use, and reuse stage because the value of data being used and reused changes over time. This is evidenced in the case discussed in the dispose subsection of this paper, in which KISTI reappraised articles from journals subject to predatory hijacking, ultimately triggering their disposal.
KISTI’s empirical data and reviewed literature indicate that the core activities of the reappraise action include:
developing, documenting, and implementing policies and criteria prompting reappraisal
documenting procedures and methods used to conduct reappraisal
evaluating data marked for reappraisal against the appraisal criteria.
Dispose
Dispose encompasses activities that transfer data to other archives, repositories, or custodians. It also involves destroying data, sometimes securely. Dispose is closely related not only to appraise and select, which is a sequential action, but also to another occasional action, reappraise. More specifically, the disposal of data is a possible outcome of decisions made during the appraisal, selection, and reappraisal processes (Higgins, 2012; van Bussel et al., 2015): data that is not selected for long-term curation at the appraise and select or reappraise stages is disposed of.
According to KISTI’s empirical data, KISTI rarely disposes of its national R&D reports and journal articles because it is legally mandated to preserve and curate them. There are, however, some exceptions: KISTI must destroy national R&D journal articles and their associated metadata upon the expiration or breaking of agreements with scholarly societies and journal publishers that allow KISTI to accumulate their articles in its database and distribute them. Similarly, in 2018, in a highly unusual case, KISTI destroyed the files of the journals of the World Academy of Science, Engineering, and Technology (WASET) and their associated metadata, following the predatory hijacking of these journals.
KISTI’s disposal activities and reviewed literature indicate that the core activities of the dispose action include:
developing, documenting, and implementing organizational policies, criteria, and processes for disposal
regularly reviewing disposal
determining the factors triggering disposal
checking and following the legal requirements related to the permanent disposal of data.
Relationships between actions
The d-KISTI model presents the idea that numerous digital curation actions and processes are intricately interrelated, which reinforces the notion that efficient and systematic digital curation requires the harmonious engagement of all curation actions. It proposes relationships between actions not only within the same category but also across different categories, perhaps leading to several actions occurring concurrently. More specifically, relationships between actions can occur both sequentially and occasionally, and also over the entire lifecycle of data.
All full lifecycle actions impact each other, as has been discussed within the subsections related to the stakeholder observation and collaboration and user and use investigation actions (see “Stakeholder observation and collaboration” subsection); and the technology watch and stakeholder observation and collaboration actions (see “Technology watch” subsection).
Additionally, the specific order of sequential actions was derived from a thorough review of existing models and the current global digital curation landscape, in order to best clarify the relationships between them, and was then made more nuanced by applying empirical data from KISTI. Consequently, sequential actions such as preserve and store are deeply interconnected, to the extent that activities within store can trigger a return to preserve, which runs counter to the standard progression of sequential actions (see “Store” subsection and Appendix 2). More specifically, data that have so far only been stored for archiving, or that are in active use within a curating organization, may come to need preservation activities because of technological obsolescence and declining accessibility over time.
Full lifecycle actions continue to influence sequential actions over the lifespan of data, as the two occur simultaneously. For instance, the metadata generated in description, identification, and linkage impacts all aspects of curation. Moreover, given that it encompasses examining patterns of customers’ access, use, and reuse, user and use investigation can occur concurrently with access, use, and reuse. Similarly, technology watch can influence transform (see “Technology watch” subsection and Appendix 3).
Further, full lifecycle actions continue to influence occasional actions over the lifespan of data, as the two occur simultaneously. Occasional actions like migrate are informed holistically by full lifecycle actions like technology watch (see “Technology watch” subsection and Appendix 4). This study also reinforces the DCC model’s finding that, although most occasional actions are not connected to one another, reappraise activities sometimes lead to dispose activities. That being said, the relationships between occasional actions and sequential or full lifecycle actions have been expanded considerably from the DCC model. The sequential action of ingest, for instance, is newly tied to the occasional action of migrate. Similarly, the sequential action of access, use, and reuse is now related to the occasional action of reappraise. Beyond these explicit relationships between occasional actions and sequential actions, which are delineated in the graphic of the d-KISTI model, occasional actions are also influenced more holistically by full lifecycle actions.
This subsection highlights just a few examples of a nexus of interrelations that exist within the d-KISTI model, but does not claim to be comprehensive. In reality, too many relationships exist between curation actions to be codified in the visual model of the d-KISTI model (Figure 4). In short, the d-KISTI model comprises multiple actions and processes, which have complex relationships with one another that can occur both simultaneously and sequentially, and impact one another.
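The relationships named in this subsection can also be written down as a small directed graph. The Python sketch below is illustrative only. It encodes just the links stated in the text, not the full model in Figure 4, and the edge directions, the dictionary layout, and the function name `influenced_by` are assumptions made for the example.

```python
# Each key is an action; each value lists the actions it can lead to or inform.
# Only links stated in the text are included; direction is an interpretation.
RELATIONSHIPS = {
    "ingest": ["migrate"],
    "preserve": ["migrate", "reappraise"],
    "migrate": ["transform"],
    "store": ["preserve"],
    "access, use, and reuse": ["reappraise"],
    "appraise and select": ["dispose"],
    "reappraise": ["dispose"],
    "technology watch": ["transform", "migrate"],
}


def influenced_by(action: str) -> list[str]:
    """Return, in sorted order, the actions that list `action` as a target."""
    return sorted(
        source for source, targets in RELATIONSHIPS.items() if action in targets
    )


print(influenced_by("migrate"))  # ['ingest', 'preserve', 'technology watch']
print(influenced_by("dispose"))  # ['appraise and select', 'reappraise']
```

A structure of this kind could make it easier to compare the links in one lifecycle model against those in another, although it necessarily flattens the more holistic influences described above.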
Discussion and Conclusion
At its core, this study presents a new and effective lifecycle model for digital curation, the d-KISTI model, in order to contribute globally to research and practice in the digital curation field. It employed a conceptual foundation, conceptual and empirical approaches, and consultation with worldwide digital curation experts to create the d-KISTI model, which is not a discipline- or content-specific model but rather a broadly applicable digital curation lifecycle model intended to enable diverse information organizations to operate in line with best practices. Equally, the similarities between the UK’s DCC model, Greece’s DCC&U model, and Korea’s d-KISTI model indicate that digital curation practices, and thus information organizations, around the world share commonalities, all of which can be addressed by the d-KISTI model. Although any model’s capacity for generalization is of course contingent on the broad variability of digital curation practices within an organization, this study anticipates that the d-KISTI model can be used effectively by a wide range of information organizations around the world.
The d-KISTI model differs most from previous models in its explicit focus on “curation”, rather than “preservation”, and its endeavor to balance the value and contributions of its various curatorial actions, rather than letting any single activity outweigh the others. In contrast, previous models, such as the DCC, DCC&U, and the former KISTI models, placed a noticeably larger emphasis on “preservation” than on any other stage of the curatorial process, in part due to the influence of archival science and the OAIS reference model on the DCC model. Whereas these models contained references to “preservation” in three separate actions (preservation planning, curate and preserve, and preservation action), the d-KISTI model clarified, refined, and condensed the activities within these actions, resulting in just one action, the preserve sequential action. Within the d-KISTI model, curation planning and management encompasses many of the same preservation-focused activities contained within preservation planning and curate and preserve, yet reframes these activities as facilitating better standards of curation. In essence, the d-KISTI model envisions preservation as a means, rather than an end, in keeping with its identity as a “curation” model, rather than a “preservation” or “archival” model.
Beyond the d-KISTI model itself, this study has also built upon the foundation of the DCC model by not only verifying several of its pre-existing actions and relationships through supporting empirical data from KISTI, but also identifying and delineating new digital curation actions and relationships not included in the DCC model. Specifically, this study situated reappraise (occasional action) between appraise and select and access, use, and reuse (sequential actions), which highlights a relationship between these actions that has been discussed theoretically in previous studies (e.g., Greene, 1998; Huggard and Jackson, 2019; Rhee, 2011) and observed in practice within KISTI, yet was notably absent from the DCC model. These findings reaffirm the importance of using empirical data collected from contemporary digital curation practice to inform the generation of any new model, and the necessity for ongoing adaptation and evolution within all lifecycle models, including the DCC model, in light of new findings and the development of information organizations. Indeed, the d-KISTI model will no doubt similarly continue to evolve in response to changes in the digital curation landscape in the future.
Similarly, the new and complex interrelationships of digital curation actions established within this study indicate the scale of the challenge of developing any digital curation lifecycle model that accurately represents the often varied practical applications of digital curation methods. Gathering empirical data from KISTI allowed this study not only to become aware of the gap between previous conceptual models and practical reality, but also to work to bridge it through the introduction of more representative interrelations between actions. Conversely, this study also serves to connect assertions previously made in theoretical investigations of digital curation practices with the practical reality of an information organization. This is especially true of the newly established relationship between migrate (occasional action) and ingest and transform (sequential actions), which had been discussed theoretically by CCSDS (2012) and Oliver and Harvey (2016: 153), and is now supported by empirical evidence from KISTI.
Indeed, through a more thorough explanation of its research approaches, methods, and processes than many studies on pre-existing models, this study can provide insights to future researchers working to develop new lifecycle models or apply and adapt pre-existing ones. More specifically, in articulating how empirical data collected from KISTI’s digital curation practices, existing lifecycle models, reviewed literature, and feedback from global digital curation experts fed into the creation of the d-KISTI model, this study intends to help other researchers verify or re-build the d-KISTI model according to their own, differing contextual needs.
This study provides clear value to both researchers and information organizations. On a practical level, the d-KISTI model will allow information organizations to better cultivate best-practice digital curation. On a theoretical level, future research on such organizations can provide insights into developments within the digital curation landscape, which might ultimately be adapted into new lifecycle models. Indeed, all future studies and uses of the d-KISTI model will, in turn, help to verify and refine both the model itself and its applicability, allowing it to facilitate greater diversity and range within future lifecycle models and the field of digital curation more broadly.
Ultimately, this study hopes to serve as a foundational touchstone for future studies. Specifically, future studies could employ quantitative research methods to either further verify this model, or construct new lifecycle models on its foundations by, for instance, using components of the d-KISTI model to generate a survey questionnaire that would enable researchers to quantitatively investigate digital curation practices in a number of information organizations. Similarly, future qualitative studies will be able to use this study’s methodology to inform and support their own investigations of digital curation practices. This, in turn, will facilitate a better understanding of the practical reality of digital curation practices within information organizations and drive more representative future studies and lifecycle models.
Footnotes
Appendix 1
Acknowledgements
Though I cannot name all of them here, I deeply appreciate all those who helped me conduct this research. Above all else, I would like to recognize and thank my KISTI colleagues, who not only shared with me details regarding KISTI’s digital curation practices, but moreover offered me their invaluable support throughout the research process. I also express my sincere gratitude to the many other researchers and practitioners involved in this field, particularly, Joy Davidson, Sarah Jones, and Thordis Sveinsdottir, for their constructive feedback on the d-KISTI model. Finally, my thanks to the reviewers of this journal (Journal of Librarianship and Information Science) for their considerate comments which helped me to improve the quality of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Korea Institute of Science and Technology Information [grant numbers K-19-L01-C01-S01, K-22-L01-C01-S01].
