Abstract
This paper examines optical character recognition (OCR) through the lens of archival ethics as outlined in the Society of American Archivists (SAA) Core Values Statement and Code of Ethics, in light of current debates surrounding artificial intelligence (AI). A literature review highlights persistent challenges of authenticity and integrity, transparency and accountability, access and equity, and responsible stewardship and sustainability, as well as new concerns about bias, sustainability, and accountability raised by the use of large language models (LLMs). A case study describes the systematic testing of LLM, transformer model (TM), and neural network (NN) architectures and examines the challenges of creating a reliable, scalable in-house OCR tool named Opticolumn. The case study finds that NN approaches align better with archival ethics than LLM tools, which may generate fabricated text, but that the choice of OCR tool will depend on the capacities and preferences of individual institutions.
Keywords
