Abstract
Background:
Efficient management of consent information is essential for handling biobank resources ethically, legally, and in accordance with participant consent. However, many traditional biobanks rely on paper-based consent forms, which are often illegible and unsuitable for processing at scale. This study aimed to automate the reading and quality control of paper-based consent forms.
Methods:
We optimized a proprietary optical character recognition (OCR) model to recognize handwritten Korean characters in a standard paper-based consent template. We generated 1,000 synthetic consent documents for training. A test dataset of synthetic standard consent forms (n = 192) was used to estimate recognition accuracy. The model was then further trained on synthetic nonstandard consent forms (n = 1,000) to optimize it for unstructured consent forms. Finally, the resulting model was applied to the biobank's routine consent management process, and its performance was evaluated on 3,790 pages of consent forms.
Results:
The optimized OCR model achieved accuracies of 88.94% and 91.88% on the 192-page standard and 1,000-page nonstandard test datasets of paper-based consent forms, respectively. When applied to consent forms in the biobank's routine processes, it achieved an accuracy of 91.25% and an F1-score of 0.91, indicating high overall performance and strong generalization to routine consent forms.
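For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and an F1-score are conventionally computed from confusion-matrix counts. The abstract does not state the unit of evaluation (character, field, or page) or how positives were defined, so the `Counts` class, the function names, and the example numbers are hypothetical illustrations, not the study's data or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for a binary decision (hypothetical unit,
    e.g. 'this consent field was read correctly')."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Counts) -> float:
    """Fraction of all decisions that were correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Illustrative counts only; these are NOT taken from the study.
example = Counts(tp=90, fp=10, fn=8, tn=92)
print(f"accuracy = {accuracy(example):.4f}")
print(f"F1-score = {f1_score(example):.4f}")
```

Reporting both values is useful because accuracy alone can look high when one class dominates, whereas the F1-score is sensitive to false positives and false negatives.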
Conclusions:
We optimized a proprietary artificial intelligence-based OCR tool to develop a highly efficient and reliable consent management model for paper-based consent documents. This approach could contribute to the digital transformation of traditional biobanking processes that rely on paper-based consent forms.
