Sage Journals: Discover world-class research

Abstract

Background

When applying for teleconsultations, medical laboratory reports are usually photographed with a mobile phone, and the photographs are uploaded as teleconsultation application materials. Extracting the content of these image-based reports and storing it digitally is therefore valuable. OCR technology has already been applied to the recognition of medical text files, but no researchers have combined recognizing the format of a medical laboratory report and obtaining its content into a single serialized process that digitizes the image report. This article proposes a serialization method to digitize medical laboratory report images.

Materials and Methods

This article first collects 330 image-based medical laboratory reports and annotates their formats to form a training dataset for the layout analysis model. A pre-trained model is then trained on this dataset to obtain a layout analysis model that correctly recognizes the format of medical laboratory reports. Next, the layout of an input image-based report is analyzed, and the layout analysis results are used to call the text detection and text recognition models, which obtain the digital content of the image report. Finally, the layout of the digital content is adjusted and the content is stored as a docx file.

Results

After training the layout analysis model and integrating layout analysis, text detection, and text recognition, we obtained a serialization method that digitizes the content of image-based medical laboratory reports, restores the report format, shields sensitive and irrelevant content, and digitizes the report content of interest.

Conclusions

By digitizing image-based medical laboratory reports with the serialization method, we can correctly display the report content for teleconsultation while removing irrelevant content, such as patient names and examination equipment numbers.

Keywords

Telemedicine, digitization, deep learning, layout analysis, serialization

Introduction

Medical laboratory tests are widely used, and their importance is well known.¹ When applicants send a patient's required information during a teleconsultation between hospitals, the lack of system-level interconnection means that the original electronic medical laboratory report usually cannot be exported. Applicants therefore typically photograph patients' medical laboratory reports with mobile phones, store them as pictures, and upload these report images as attachments.²

Image reports submitted as picture attachments have several limitations. The digital content of the medical laboratory report cannot be obtained directly, and the image attachments cannot be stored in a structured manner.³ Because medical data involve personal privacy, they cannot be processed with existing online recognition tools. Prior studies have explored deep learning-based text recognition in medical imaging,⁴ but their approaches primarily focused on single-modality document processing and lacked adaptability to multi-format reports.^5–8 Real-world deployments of similar systems in clinical settings face significant challenges, including integration into existing Electronic Health Record workflows, handling diverse report formats, and ensuring privacy compliance.^9,10 Medical laboratory reports come in many formats, and the sizes of their layout elements are not uniform: reports are personalized, and different hospitals use different report templates.¹¹

Digitizing paper documents relies mainly on OCR (Optical Character Recognition) technologies, especially text detection and text recognition.¹² Both have been applied in healthcare and have received continuous attention as new application scenarios emerge, while high-end hardware has facilitated the development of deep learning-based algorithms.^13–16 OCR technologies achieve recognition rates higher than 99% on scanned documents.^17,18 Text detection and recognition are usually treated as a whole and addressed with computer vision and machine learning methods.¹⁹ In healthcare, deep neural networks have been used to improve the accuracy of OCR engines that transcribe scanned medical reports,²⁰ and OCR and NLP technologies have been used to structure and digitize medical records.²¹

PP-OCRv2 is a lightweight OCR system that balances accuracy and efficiency. It provides multiple pre-trained models and is optimized for lightweight CPU deployment. However, its existing models cannot accurately recognize the layout of medical laboratory reports.²²

This article builds on the PP-OCRv2 framework to establish a digital recovery application for medical laboratory reports. The independent processes of layout analysis, text detection, text recognition, and layout recovery are chained into a sequential series that digitizes image-based medical laboratory reports. This addresses the inability to digitize image-based reports, protects patient privacy, and adapts to different report formats.

Methods

The proposed method is shown in Figure 1. Existing layout analysis models do not meet the requirements of this task, and because of medical data security and other concerns, there are no publicly available medical laboratory reports. We therefore collected 330 medical laboratory reports and trained our own layout analysis model from a pre-trained model. For an input image-based medical laboratory report, the layout analysis model detects a list of regions, each with a label. For each labelled region, the text detection and text recognition models are called to obtain the position and content of the text. Finally, the results are written into a docx file.

Figure 1.

Sequential series for report recovery.

Training layout analysis model

PP-OCRv2 provides multiple pre-trained models, but they do not meet the requirements for layout analysis. Figure 2 shows the result of applying a pre-trained model: for an image-based medical laboratory report photographed with a mobile phone, it cannot accurately detect the bounding boxes of key content such as “diagnostic opinions” and “impressions”. In addition, for information security reasons, we do not want boxes containing content such as “patient name” and “examination equipment number” to be analyzed or recognized. We therefore trained a layout analysis model that meets these requirements.

Figure 2.

The result of using the pre-trained model: some boxes are wrongly detected, and “diagnostic opinions” is not labelled.

Data annotation

We collected 330 image-based medical laboratory reports photographed with mobile phones and annotated each image with LabelMe, a widely used annotation tool that allows users to draw polygons and construct training sets.²³ The reports were collected from the First Affiliated Hospital of Zhengzhou University in Henan, China, and cover five major report categories (including CT, MRI, pathology, and ultrasound reports) with distinct template formats. This diversity is intended to improve the model's adaptability to variations in report layout; however, to further validate the model's robustness, future work will collect data from multiple healthcare institutions with different documentation styles. After LabelMe is launched, the “Create Polygons” tool is used to outline each area to be recognized and assign it a label. Once a report has been labelled, a corresponding annotation file in JSON format is generated. These annotations were then converted to COCO format.²⁴ Figure 3 shows the LabelMe interface after launch.
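The LabelMe-to-COCO conversion can be sketched in Python as follows. This is a minimal illustration, not the authors' conversion script: it assumes each LabelMe file has already been parsed with `json.load`, relies on the standard LabelMe keys (`shapes`, `points`, `label`, `shape_type`, `imagePath`, `imageWidth`, `imageHeight`), takes the category names from the 10-class label file described under Dataset and data annotation, and, as an extra safeguard added here, skips any shape whose label is not in that list.

```python
import json
from typing import Dict, List

# Category names from the 10-class label file (CDLA classes) described in the text.
CATEGORIES = ["Text", "Title", "Figure", "Figure caption", "Table",
              "Table caption", "Header", "Footer", "Reference", "Equation"]


def polygon_to_bbox(points: List[List[float]]) -> List[float]:
    """Axis-aligned COCO bbox [x, y, width, height] enclosing a polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def polygon_area(points: List[List[float]]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def labelme_to_coco(labelme_docs: List[Dict]) -> Dict:
    """Merge parsed LabelMe documents into a single COCO-style dictionary."""
    cat_ids = {name: i + 1 for i, name in enumerate(CATEGORIES)}
    coco = {"images": [], "annotations": [],
            "categories": [{"id": cid, "name": n} for n, cid in cat_ids.items()]}
    ann_id = 1
    for img_id, doc in enumerate(labelme_docs, start=1):
        coco["images"].append({"id": img_id, "file_name": doc["imagePath"],
                               "width": doc["imageWidth"],
                               "height": doc["imageHeight"]})
        for shape in doc["shapes"]:
            # Keep only polygons whose label is one of the known categories.
            if shape.get("shape_type", "polygon") != "polygon":
                continue
            if shape["label"] not in cat_ids:
                continue
            pts = shape["points"]
            coco["annotations"].append({
                "id": ann_id, "image_id": img_id,
                "category_id": cat_ids[shape["label"]],
                "segmentation": [[c for p in pts for c in p]],  # flattened x, y
                "bbox": polygon_to_bbox(pts),
                "area": polygon_area(pts),
                "iscrowd": 0,
            })
            ann_id += 1
    return coco


def save_annotations(coco: Dict, path: str = "annotations.json") -> None:
    """Write the merged annotations; ensure_ascii=False keeps Chinese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(coco, f, ensure_ascii=False, indent=2)
```

In use, every per-image LabelMe file would be loaded, the list passed to `labelme_to_coco`, and the result written with `save_annotations`. The bounding box is derived from each polygon because box detectors such as PP-PicoDet train on boxes rather than polygons.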

Figure 3.

LabelMe interface.

Training layout analysis model

Layout analysis divides a document image into regions; document images are commonly stored as picture files (e.g., JPG) or as PDF files. The goal is to identify key regions such as text, titles, and figures.

PP-StructureV2 provides a pre-trained layout analysis model based on PP-PicoDet,^25,26 a lightweight detector that balances accuracy and efficiency well. As shown in Figure 4, PP-PicoDet uses Enhanced ShuffleNet (ESNet) as the backbone, with an SE (squeeze-and-excitation) module added to each ES Block for better channel weighting. The neck uses CSP-PAN for feature concatenation and fusion and reduces computational cost by using 1×1 convolutions.^27,28
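To make the two building blocks named above concrete, the NumPy sketch below shows squeeze-and-excitation (SE) channel reweighting and a 1×1 convolution on a single feature map of shape (C, H, W). It uses random weights and the sigmoid gate of the original SE design; the exact activations, reduction ratio, and layer layout inside ESNet and CSP-PAN may differ, so treat it as an illustration of the techniques rather than PP-PicoDet code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map x of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers.
    Returns the reweighted map and the per-channel weights in (0, 1).
    """
    squeezed = x.mean(axis=(1, 2))                # (C,) global average pooling
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)  # reduce channels + ReLU
    weights = sigmoid(w2 @ hidden + b2)           # expand back + sigmoid gate
    return x * weights[:, None, None], weights


def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel linear map over channels.

    x: (C_in, H, W), w: (C_out, C_in) -> output of shape (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", w, x)
```

A k×k convolution from C_in to C_out channels has k·k·C_in·C_out weights, so a 1×1 convolution needs nine times fewer weights than a 3×3 one with the same channel counts, which is why it is a cheap way to align or reduce channels before feature fusion.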

Figure 4.

PP-PicoDet architecture. The backbone is ESNet, which outputs 3 feature maps to CSP-PAN. CSP-PAN acts as the neck and outputs 4 feature maps. PP-PicoDet uses the SimOTA dynamic label assignment strategy to optimize training.

During training, images are resized to different resolutions using a randomly selected interpolation method. The SimOTA dynamic label assignment strategy optimizes the training process, and several loss functions (varifocal loss, GIoU loss, and Distribution Focal Loss) are used to improve model performance. In the head of PP-PicoDet, varifocal loss is computed to couple classification prediction with quality prediction, while GIoU loss and Distribution Focal Loss are used for box regression. The total loss is:

$$\mathrm{loss} = \mathrm{loss}_{\mathrm{vfl}} + 2 \cdot \mathrm{loss}_{\mathrm{giou}} + 0.25 \cdot \mathrm{loss}_{\mathrm{dfl}} \tag{1}$$
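Equation (1) can be illustrated with plain-Python versions of two of its terms and the weighted sum. This is a sketch for single boxes and scores, not PaddleDetection's batched implementation: `giou_loss` takes boxes as (x1, y1, x2, y2) with positive area, `varifocal_loss` uses the commonly cited defaults α = 0.75 and γ = 2.0 (an assumption here), and the DFL term is passed in as a precomputed number.

```python
import math
from typing import Sequence


def giou_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """1 - GIoU for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Intersection rectangle (zero width/height if the boxes do not overlap).
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both boxes.
    enclose = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def varifocal_loss(p: float, q: float, alpha: float = 0.75, gamma: float = 2.0) -> float:
    """Varifocal loss for one predicted score p against target quality q (an IoU)."""
    eps = 1e-12
    if q > 0:  # positive sample: binary cross-entropy weighted by the target q
        return -q * (q * math.log(p + eps) + (1 - q) * math.log(1 - p + eps))
    # Negative sample: easy negatives (small p) are down-weighted by p ** gamma.
    return -alpha * (p ** gamma) * math.log(1 - p + eps)


def picodet_total_loss(loss_vfl: float, loss_giou: float, loss_dfl: float) -> float:
    """Weighted sum of Eq. (1)."""
    return loss_vfl + 2.0 * loss_giou + 0.25 * loss_dfl
```

Because GIoU subtracts the empty fraction of the enclosing box, it still produces a useful signal for non-overlapping boxes, where plain IoU would be zero.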

Model export

Checkpoints saved during training store only the model parameters, whereas an inference model also requires the model structure. We therefore exported the layout analysis model with both its structure and its parameters saved in solidified files. This format performs better in prediction deployment, is flexible and convenient, and is suitable for integration into an actual system.

Sequential series to digitize image-based medical laboratory reports

Text detection and text recognition

The PP-OCRv2 framework supports bilingual recognition (Chinese/English) and common medical abbreviations (e.g., AD, CT, MRI) through its hybrid language model architecture.²² This capability enables accurate interpretation of both localized terminology and international standardized abbreviations.

For text detection, PP-OCRv2 uses Collaborative Mutual Learning (CML) among three models, in which a large teacher model guides smaller student models. Knowledge distillation of this kind is commonly used for deployment: guiding a small model with a large one further improves the small model's accuracy without changing its prediction time, which enhances the actual deployment experience. The structure of CML is shown in Figure 5. The training objective consists of three loss terms: GT loss (between each student's output and the ground truth), DML loss (between the two students), and Distill loss (between the teacher and the students). The two student models learn from each other following the deep mutual learning (DML) method.²⁹ The DML loss in CML is:

$$\mathrm{Loss}_{\mathrm{dml}} = \frac{\mathrm{KL}\left(S1_{\mathrm{pout}} \,\|\, S2_{\mathrm{pout}}\right) + \mathrm{KL}\left(S2_{\mathrm{pout}} \,\|\, S1_{\mathrm{pout}}\right)}{2} \tag{2}$$

Figure 5.

CML framework: the teacher model guides the student models.

The total loss function in CML is as follows.

$$\mathrm{Loss}_{\mathrm{total}} = \mathrm{Loss}_{\mathrm{gt}} + \mathrm{Loss}_{\mathrm{dml}} + \mathrm{Loss}_{\mathrm{distill}} \tag{3}$$
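A NumPy sketch of Eqs. (2) and (3) follows. It assumes each model output is a probability distribution along the last axis; for a per-pixel text probability map, as produced by a segmentation-based detector, the helper `bernoulli_to_categorical` (a name introduced here) stacks p and 1 - p to form that distribution. The GT and Distill terms are passed in as precomputed numbers, so this shows only how the terms combine, not PP-OCRv2's training code.

```python
import numpy as np


def bernoulli_to_categorical(prob_map):
    """Turn a per-pixel text probability map into a two-class distribution."""
    prob_map = np.asarray(prob_map, dtype=float)
    return np.stack([prob_map, 1.0 - prob_map], axis=-1)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the last axis, averaged over all remaining positions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


def dml_loss(s1_out, s2_out):
    """Eq. (2): symmetric KL divergence between the two students' outputs."""
    return 0.5 * (kl_divergence(s1_out, s2_out) + kl_divergence(s2_out, s1_out))


def cml_total_loss(loss_gt, loss_dml, loss_distill):
    """Eq. (3): unweighted sum of the three CML terms."""
    return loss_gt + loss_dml + loss_distill
```

Averaging the two KL directions makes the DML term symmetric, so neither student is privileged: each is pulled toward the other's predictions by the same amount.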

For text recognition, PP-OCRv2 uses the U-DML knowledge distillation method. Building on the standard DML strategy, U-DML adds a supervision signal on the intermediate feature maps in addition to the supervision on the final output layer. As shown in Figure 6, the teacher and student models have the same network structure but different initialization parameters. The feature-map supervision introduces a feature loss, computed as an L2 loss:

$$\mathrm{Loss}_{\mathrm{feat}} = \mathrm{L2}\left(S_{\mathrm{bout}}, T_{\mathrm{bout}}\right) \tag{4}$$

Figure 6.

U-DML framework: the teacher model has the same structure as the student model, and a feature loss is added.

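Finally, the total loss is as follows, where the CTC term is the connectionist temporal classification loss between the recognition output and the ground-truth text: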

$$\mathrm{Loss}_{\mathrm{total}} = \mathrm{Loss}_{\mathrm{ctc}} + \mathrm{Loss}_{\mathrm{dml}} + \mathrm{Loss}_{\mathrm{feat}} \tag{5}$$
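The recognition-side losses in Eqs. (4) and (5) can be sketched in NumPy as below. Two assumptions are made: the L2 feature loss is taken to be the mean squared difference between feature maps, and the CTC term is computed with the textbook forward algorithm in probability space for a single sequence, which is readable but would underflow on long sequences (production implementations work in log space). Class 0 is assumed to be the CTC blank.

```python
import numpy as np


def ctc_loss(probs, label, blank=0):
    """Negative log-likelihood of `label` under CTC via the forward algorithm.

    probs: (T, V) per-frame softmax outputs; label: class ids without blanks.
    """
    probs = np.asarray(probs, dtype=float)
    ext = [blank]                       # label with blanks interleaved
    for c in label:
        ext += [c, blank]
    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s >= 1:
                a += alpha[t - 1, s - 1]
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    total = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return float(-np.log(total))


def feature_l2_loss(student_feat, teacher_feat):
    """Eq. (4), taken here as the mean squared error between feature maps."""
    s = np.asarray(student_feat, dtype=float)
    t = np.asarray(teacher_feat, dtype=float)
    if s.shape != t.shape:
        raise ValueError("teacher and student share a structure, so shapes must match")
    return float(np.mean((s - t) ** 2))


def udml_total_loss(loss_ctc, loss_dml, loss_feat):
    """Eq. (5): CTC loss + output-level DML loss + feature loss."""
    return loss_ctc + loss_dml + loss_feat
```

With two frames of uniform probabilities over {blank, "1"}, three alignments ("1 1", "blank 1", "1 blank") collapse to "1", so the label probability is 0.75 and the CTC loss is -log 0.75.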

Converting image to docx format

This step converts the analysis results into a docx file. The result collection returned by the previous steps is iterated, and a docx display element is generated according to each result's category, for example a centered title. During development, results whose layout did not match the original image were refined to ensure that the content is laid out correctly. Finally, the results are combined and saved as a docx document.
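The conversion can be sketched with the python-docx package. The label-to-style table, the region dictionary layout, and the top-to-bottom, left-to-right reading order are illustrative assumptions, not the authors' code; python-docx is imported inside `write_docx` so that the ordering logic can be used and tested without it installed.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from layout label to (element kind, alignment).
STYLE_BY_LABEL = {
    "Title": ("heading", "center"),
    "Header": ("paragraph", "center"),
    "Text": ("paragraph", "left"),
    "Table caption": ("paragraph", "center"),
    "Figure caption": ("paragraph", "center"),
}


def plan_document(regions: List[Dict]) -> List[Tuple[str, str, str]]:
    """Order regions top-to-bottom, then left-to-right, and pick a style for each.

    Each region is {"label": str, "box": (x1, y1, x2, y2), "text": str}.
    Labels missing from the mapping become left-aligned paragraphs.
    """
    ordered = sorted(regions, key=lambda r: (r["box"][1], r["box"][0]))
    plan = []
    for r in ordered:
        kind, align = STYLE_BY_LABEL.get(r["label"], ("paragraph", "left"))
        plan.append((kind, align, r["text"]))
    return plan


def write_docx(plan: List[Tuple[str, str, str]], path: str) -> None:
    """Render the plan with python-docx (pip install python-docx)."""
    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH

    doc = Document()
    for kind, align, text in plan:
        if kind == "heading":
            para = doc.add_heading(text, level=1)
        else:
            para = doc.add_paragraph(text)
        if align == "center":
            para.alignment = WD_ALIGN_PARAGRAPH.CENTER
    doc.save(path)
```

Sorting on the top edge alone is a simplification: two regions on the same visual row whose top edges differ by a few pixels could swap order, and handling such cases is the kind of refinement the text describes.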

Results

Building this sequential series method took 8 months. Following the order of the pipeline, the process is divided into the following steps: collecting medical laboratory reports, annotating the collected reports, training the layout analysis model, performing text detection and text recognition, and converting the analysis results into docx documents. The experimental results of each stage are presented below.

Dataset and data annotation

We collected and annotated 330 photos. Only the medical content modules, such as ‘imaging performance’, ‘diagnostic opinions’, and ‘impressions’, were annotated; for information security reasons, information unrelated to medical treatment, such as ‘patient name’ and ‘examination equipment number’, was not annotated. Because the layout analysis model is trained from a model pre-trained on the Chinese CDLA dataset for Chinese paper scenarios,³⁰ we specified a label file with 10 categories: Text, Title, Figure, Figure caption, Table, Table caption, Header, Footer, Reference, and Equation. Figure 7 shows the labelling process.

Figure 7.

Labelling medical laboratory reports with LabelMe: only the required parts are labelled, and sensitive fields are desensitized.

As shown in Figure 8, the 330 reports were divided into training, validation, and test sets at a ratio of 9:1:1 (270, 30, and 30 reports, respectively), and the training and validation sets were labelled separately. To facilitate training, the annotations were converted to the COCO data format: each annotated image has a corresponding JSON annotation file, and after conversion all annotation files are merged into a single “annotations.json” file.
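The text does not say how reports were assigned to each set, so the split below is a minimal sketch assuming a seeded random shuffle; the seed value and the file names are hypothetical. With 330 items and a 9:1:1 ratio it yields 270, 30, and 30 reports.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], ratios: Tuple[int, int, int] = (9, 1, 1),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Deterministically shuffle items and split them by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # any rounding remainder goes to test
    return train, val, test
```

Fixing the seed keeps the test set stable across training runs, so evaluation results remain comparable when the model is retrained.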

Figure 8.

The process of data splitting, image labelling, and conversion to the “annotations.json” file.

Layout analysis model training results

Comparative experiments with Tesseract OCR 5.0 and ABBYY showed that our method performs better on complex medical reports (Table 1). In recognizing medical abbreviations from low-quality mobile images, the proposed system achieved an F1-score of 92.4%, compared with 78.1% for Tesseract and 85.3% for ABBYY, although Tesseract was faster and used less memory.

Table 1.

Performance comparison.

Metric	Proposed	Tesseract	ABBYY
F1-score	92.4%	78.1%	85.3%
Processing time (s/page)	2.3	1.8	4.7
Memory usage	1.2 GB	680 MB	2.1 GB

The model with the best evaluation result was exported as an inference model, which contains both the model parameters and the model structure. This simplifies integration of the layout analysis model and makes deployment easier. The process is shown in Figure 9.

Figure 9.

After training the layout analysis model, we export the best evaluation model as inference model.

Sequentially digitizing image-based medical laboratory reports

For an image-based medical laboratory report, the layout analysis, text detection, and text recognition are called in sequence to digitize the content of the report in image form, and finally convert the content of the report from an image to a docx file.

First, the layout analysis model is called on the image report and returns a list of “box coordinates, labels”. For each box, the text detection model is run to obtain the coordinates of the text areas within the box, and text recognition is then performed on each text area. Once every entry in the “box coordinates, labels” list has been processed, mismatched areas are adjusted programmatically and the result is converted into a docx file. The detailed process is shown in Figure 10.
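The loop described above can be sketched as a small orchestration function. The four callables (`layout_fn`, `detect_fn`, `recognize_fn`, `crop_fn`) are hypothetical stand-ins for the exported layout model, the PP-OCRv2 detection and recognition models, and an image-cropping helper; they are not PaddleOCR's API. Filtering by `keep_labels` is an added safeguard: in the described method, sensitive regions are never annotated, so the trained layout model should not return them in the first place.

```python
from typing import Callable, Collection, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def digitize_report(
    image: object,
    layout_fn: Callable[[object], List[Tuple[Box, str]]],
    detect_fn: Callable[[object], List[Box]],
    recognize_fn: Callable[[object], str],
    crop_fn: Callable[[object, Box], object],
    keep_labels: Collection[str],
) -> List[Dict]:
    """Layout analysis -> text detection -> text recognition, in sequence."""
    results = []
    for box, label in layout_fn(image):
        if label not in keep_labels:
            continue  # regions outside the kept classes are never read
        region = crop_fn(image, box)
        # Text-line boxes relative to the region, sorted into reading order.
        lines = sorted(detect_fn(region), key=lambda b: (b[1], b[0]))
        texts = [recognize_fn(crop_fn(region, line)) for line in lines]
        results.append({"label": label, "box": box, "text": "\n".join(texts)})
    return results
```

Passing the models in as functions keeps the sequencing logic independent of any particular inference backend and makes it easy to test with stubs; the returned list has the shape expected by a docx-writing step.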

Figure 10.

Workflow for sequentially digitizing an image-based medical laboratory report. Layout analysis returns a list of “box coordinates, labels”; the list is iterated, applying text detection and recognition to each box; finally, the results are converted into a docx file.

Discussion

This study presents a serialized approach for digitizing image-based medical reports, addressing three critical challenges in telemedicine: 1) Privacy-aware processing through selective annotation and identifier exclusion; 2) Cross-categorical adaptability via a hybrid model architecture; 3) Computational efficiency enabling CPU-based deployment. Compared with existing medical OCR systems that primarily focus on scanned documents,³¹ our method demonstrates superior performance in handling mobile-captured images with complex layouts (Table 1).

Table 2.

Error type analysis.

Error Type	Frequency	Mitigation Strategy
Text overlap	8.7%	Layout-aware OCR
Perspective distortion	12.3%	Homography correction
Low-contrast text	5.2%	Adaptive thresholding

Practical deployment considerations

The system achieves an average processing time of 2.3 s per report on standard CPU hardware (Intel Xeon E5-2680v4), meeting the real-time requirements of teleconsultation scenarios. This efficiency also supports a cloud-based batch-processing deployment, which handles 43 reports per minute on an NVIDIA T4 GPU.
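Converting the two reported figures into common units makes them easier to compare. This assumes the CPU figure refers to one report processed at a time with no parallelism, which the text does not state explicitly.

```latex
\begin{aligned}
\text{CPU:}\quad & \frac{60\ \text{s/min}}{2.3\ \text{s/report}} \approx 26.1\ \text{reports/min} \\
\text{T4 GPU:}\quad & \frac{60\ \text{s/min}}{43\ \text{reports/min}} \approx 1.40\ \text{s/report}
\end{aligned}
```

Under that assumption, the GPU batch deployment processes reports roughly 1.6 times faster than a single CPU worker.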

Error pattern analysis

The most common recognition errors occurred with perspective distortion (12.3% error rate), overlapping text (8.7%), and low-contrast text (5.2%), as shown in Table 2. These issues can be mitigated with image preprocessing techniques such as perspective correction and adaptive thresholding.
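Perspective correction of a photographed report can be sketched with a homography estimated by the direct linear transform (DLT) from four page corners, followed by an inverse warp. This NumPy version assumes the four page corners are already known (for example, detected or selected manually), applies Hartley normalization for numerical stability, and uses nearest-neighbour sampling; it is illustrative only, since the text does not specify how homography correction was implemented. OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` are the usual production route.

```python
import numpy as np


def apply_homography(H, points):
    """Map (N, 2) points through a 3x3 homography using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def _normalize(points):
    """Hartley normalization: centroid at the origin, mean distance sqrt(2)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return apply_homography(T, pts), T


def estimate_homography(src, dst):
    """DLT estimate of H with dst ~ H @ src from at least 4 point pairs."""
    src_n, t_src = _normalize(src)
    dst_n, t_dst = _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(src_n, dst_n):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The right singular vector with the smallest singular value holds H's 9 entries.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h_norm = vt[-1].reshape(3, 3)
    H = np.linalg.inv(t_dst) @ h_norm @ t_src  # undo the normalization
    return H / H[2, 2]


def warp_nearest(image, H, out_shape):
    """Rectify `image`: each output pixel samples the nearest source pixel via H^-1."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
    src = apply_homography(np.linalg.inv(H), grid)
    sx = np.clip(np.rint(src[:, 0]).astype(int), 0, image.shape[1] - 1)
    sy = np.clip(np.rint(src[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[sy, sx].reshape(h, w, *image.shape[2:])
```

Mapping each output pixel backwards through the inverse homography guarantees that every pixel of the rectified page receives a value, which a forward mapping would not.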

We evaluated these mitigation strategies empirically and obtained the following results:

Homography correction: Reduced perspective distortion errors by 47.5%.

Adaptive thresholding: Improved low-contrast text recognition accuracy by 30.1%.
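Adaptive thresholding compares each pixel with a statistic of its own neighbourhood instead of a single global cut-off, which matters for phone photos with uneven lighting. The sketch below implements the local-mean variant with a summed-area table; the text does not state which variant, block size, or offset was used, so `block=15` and `offset=5.0` are illustrative. OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` follows the same idea.

```python
import numpy as np


def adaptive_threshold_mean(gray, block=15, offset=5.0):
    """Mark pixels darker than (local mean - offset) as ink.

    gray: 2-D array where 0 is black. Returns a boolean mask, True for ink.
    """
    gray = np.asarray(gray, dtype=float)
    r = block // 2
    k = 2 * r + 1                                  # effective odd window size
    padded = np.pad(gray, r, mode="edge")          # replicate borders
    # Summed-area table with a leading zero row/column: any box sum is O(1).
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    box_sum = (sat[k:k + h, k:k + w] - sat[:h, k:k + w]
               - sat[k:k + h, :w] + sat[:h, :w])
    local_mean = box_sum / (k * k)
    return gray < local_mean - offset
```

In the test scenario of a left-to-right lighting gradient with a faint dark stroke, a global threshold of 128 would fail at both ends: the dim left background (about 81) would be marked as ink, and the stroke on the bright right side (about 163) would be missed, whereas the local threshold handles both.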

Limitations and future directions

The current model's performance decreases (72.1% AP) on reports from unseen hospital templates. This limitation stems from the single-institution dataset (330 reports from one hospital).³² Future work should:

Expand dataset diversity: Collect cross-institutional reports through multicenter collaboration

System integration: Implement HL7/FHIR interfaces for direct insertion into Electronic Health Records (EHRs), as proposed in the Conclusion

The proposed serialization method provides a foundation for privacy-preserving telemedicine infrastructure, particularly valuable in developing regions lacking standardized EHR systems.

Conclusion

This study presents a serialization method for digitizing image-based medical reports with three key advantages: 1) Privacy protection through selective annotation; 2) Cross-format adaptability via customized layout analysis; 3) CPU-friendly deployment. Future work will focus on: 1) Expanding multilingual support for global telemedicine applications; 2) Developing HL7/FHIR interfaces for hospital information system integration; 3) Implementing federated learning frameworks to enhance model generalizability across institutions.

Footnotes

Acknowledgements

We would like to thank all the participants for their valuable suggestions and support during the completion of the research.

ORCID iD

Xiaoyang Ren

Ethical considerations

All data and images included in this study were fully anonymized prior to analysis. No personally identifiable information (including patient names, ID numbers, or addresses) was retained in any materials. Therefore, informed consent from participants was not required for the use of these anonymized data, in accordance with guidelines on the use of de-identified medical records.

Author contributions/CRediT

All authors approved the final manuscript. Author contributions are as follows: Xiaoyang Ren: Conceptualization, Methodology, Software, Resources, Writing. Dongwei Dou: Validation, Data Curation, Supervision, Project administration, Editing. Xianying He: Validation, Editing. Fangfang Cui: Validation, Editing. Jie Zhao: Funding acquisition, Project administration.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Key Science and Technology Program in Henan Province and the Key Scientific Research Project of Colleges and Universities in Henan Province (grant numbers 201400210400 and 23A520018).

Conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Barnett. Medical significance of laboratory results. Am J Clin Pathol 1968; 50: 671–676.

2. Ren, Wang, et al. Design and implementation of a message-based regional telemedicine system to achieve high availability and scalability. Telemedicine and e-Health 2019; 25: 243–249.

3. Shafait. Document image analysis with OCRopus. 2009: IEEE.

4. Landolsi, Hlaoua, Ben Romdhane. Information extraction from electronic medical documents: state of the art and future research directions. Knowl Inf Syst 2023; 65: 463–516.

5. Majumder, Mahmud, Jahan, et al. Offline optical character recognition (OCR) method: An effective method for scanned documents. 2019: IEEE.

6. Afroge, Ahmed, Mahmud. Optical character recognition using back propagation neural network. 2016: IEEE.

7. Kishna NPT, Francis. Intelligent tool for Malayalam cursive handwritten character recognition using artificial neural network and Hidden Markov Model. 2017: IEEE.

8. Khaustov, Spitsyn, Maksimova. Algorithm for optical handwritten characters recognition based on structural components extraction. 2016: IEEE.

9. Wang, Tao. An intelligent whole-process medical system based on cloud platform. Appl Artif Intell 2023; 37: 2221507.

10. Diogo, Morais, Calisto, et al. Weakly-Supervised Diagnosis and Detection of Breast Cancer Using Deep Multiple Instance Learning. In: 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), 18–21 April 2023.

11. Pantanowitz, Henricks, Beckwith. Medical laboratory informatics. Clin Lab Med 2007; 27: 823–843.

12. Doermann. Text detection and recognition in imagery: a survey. IEEE Trans Pattern Anal Mach Intell 2014; 37: 1480–1500.

13. Tian, Huang, et al. Detecting text in natural image with connectionist text proposal network. In: 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016, pp. 56–72. Springer.

14. Liu, Anguelov, Erhan, et al. SSD: Single shot multibox detector. In: 14th European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 11–14 October 2016, pp. 21–37. Springer.

15. Liao, Shi, Bai, et al. TextBoxes: A fast text detector with a single deep neural network. 2017.

16. Ren, Girshick, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 2016; 39: 1137–1149.

17. Weinman, Learned-Miller, Hanson. Scene text recognition using similarity and a lexicon with sparse belief propagation. IEEE Trans Pattern Anal Mach Intell 2009; 31: 1733–1746.

18. Vinciarelli. A survey on off-line cursive word recognition. Pattern Recognit 2002; 35: 1433–1446.

19. Wang, Belongie. Word spotting in the wild. In: Computer Vision–ECCV 2010. Berlin, Heidelberg: Springer, 2010.

20. Karthikeyan, de Herrera AGS, Doctor, et al. An OCR post-correction approach using deep learning for processing medical reports. IEEE Trans Circuits Syst Video Technol 2021; 32: 2574–2581.

21. Liu, Chang, Zhao, et al. Information extraction of medical materials: an overview of the track of medical materials MedOCR. In: China Health Information Processing Conference, Singapore, October 2022, pp. 137–142. Singapore: Springer Nature Singapore.

22. Guo, et al. PP-OCRv2: Bag of tricks for ultra lightweight OCR system. arXiv preprint arXiv:2109.03144. 2021.

23. Russell, Torralba, Murphy, et al. LabelMe: a database and web-based tool for image annotation. Int J Comput Vision 2008; 77: 157–173.

24. Veit, Matera, Neumann, et al. COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140. 2016.

25. Guo, Zhou, et al. PP-StructureV2: A stronger document analysis system. arXiv preprint arXiv:2210.05391. 2022.

26. Chang, et al. PP-PicoDet: A better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902. 2021.

27. Han, Wang, Tian, et al. GhostNet: More features from cheap operations. 2020.

28. Liu, Qin, et al. Path aggregation network for instance segmentation. 2018.

29. Zhang, Xiang, Hospedales, et al. Deep mutual learning. 2018.

30. CDLA: A Chinese document layout analysis (CDLA) dataset. 2021. Available from: https://github.com/buptlihang/CDLA.

31. Abrantes, Silva, Meneses, et al. Filice RJEESoRV, Austria. External validation of a deep learning model for breast density classification. 2023.

32. Calisto. Human-Centered Design of Personalized Intelligent Agents in Medical Imaging Diagnosis. 2024.

A serialization method for digitizing the image-based medical laboratory report
