Abstract

Dear Editors,
We read with interest the article by Soydan et al, 1 which presented a deep learning (DL) model for lumbar disc degeneration classification. While the study demonstrated promising results, we identify several critical limitations and suggest future directions for improvement.
First, we noticed a discrepancy in the data presented in Table 4 of the study. 1 The total number of discs reported is 566, which is not a multiple of 5, the number of lumbar discs per patient (L1-L2 through L5-S1). If every patient contributed all five discs, 113 patients would yield 565 discs and 114 patients would yield 570. A total of 566 matches neither, which leaves one disc unaccounted for. This inconsistency raises concerns about data processing or inclusion criteria, may affect the reliability of the reported accuracy metrics, and warrants clarification.
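The divisibility argument above reduces to a single integer division. The Python sketch below is only illustrative. It assumes, as the argument does, that every patient contributes exactly five lumbar discs. The function name `check_disc_count` is hypothetical and not taken from Soydan et al.

```python
# Assumption (as in the argument above): each patient contributes exactly
# five lumbar discs, L1-L2, L2-L3, L3-L4, L4-L5 and L5-S1.
DISCS_PER_PATIENT = 5


def check_disc_count(total_discs: int, discs_per_patient: int = DISCS_PER_PATIENT):
    """Split a disc total into complete patients plus leftover discs.

    Returns (complete_patients, leftover_discs, is_consistent), where
    is_consistent is True only if total_discs is an exact multiple of
    discs_per_patient.
    """
    complete_patients, leftover = divmod(total_discs, discs_per_patient)
    return complete_patients, leftover, leftover == 0


if __name__ == "__main__":
    # 566 is the disc total reported in Table 4; 565 is shown for comparison.
    print(check_disc_count(566))
    print(check_disc_count(565))
```

For 566 the check reports 113 complete patients and one leftover disc, so the total is inconsistent. For 565 the check reports 113 patients and no leftover. A nonzero remainder does not identify the cause. It only shows that either some patients contributed fewer or more than five discs, or the table contains an error.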
Second, the study evaluates each disc independently and therefore ignores the correlation between degenerative changes at different levels within the same patient. A patient with multilevel degeneration, for example, may show interdependent changes across adjacent discs. Future research should incorporate patient-level analysis to capture these global patterns, which could improve model generalization and clinical relevance.
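One possible first step toward patient-level analysis is to regroup disc-level grades so that each patient becomes a single ordered sequence. This would replace five unrelated samples per patient. The Python sketch below assumes hypothetical records of the form `(patient_id, level, grade)` with grades on the 5-level Pfirrmann scale. It also assumes an illustrative cut-off of grade 3 for counting a level as "degenerated". None of these names, formats, or thresholds come from Soydan et al.

```python
from collections import defaultdict

# The five lumbar disc levels, ordered cranial to caudal.
LEVELS = ("L1-L2", "L2-L3", "L3-L4", "L4-L5", "L5-S1")


def patient_grade_vectors(records):
    """Group disc-level records into one ordered grade vector per patient.

    records: iterable of (patient_id, level, grade) tuples (hypothetical format).
    Levels with no record are filled with None, so incomplete patients stay
    visible instead of being silently dropped.
    """
    by_patient = defaultdict(dict)
    for patient_id, level, grade in records:
        by_patient[patient_id][level] = grade
    return {pid: [discs.get(lvl) for lvl in LEVELS] for pid, discs in by_patient.items()}


def count_degenerated_levels(vector, threshold=3):
    """Count levels graded at or above an illustrative threshold (assumed, not from the study)."""
    return sum(1 for grade in vector if grade is not None and grade >= threshold)


if __name__ == "__main__":
    # Toy records for illustration only; not data from Soydan et al.
    records = [
        ("P01", "L1-L2", 2), ("P01", "L2-L3", 2), ("P01", "L3-L4", 3),
        ("P01", "L4-L5", 4), ("P01", "L5-S1", 4),
        ("P02", "L4-L5", 2), ("P02", "L5-S1", 3),
    ]
    for pid, vector in patient_grade_vectors(records).items():
        print(pid, vector, count_degenerated_levels(vector))
```

A patient-level model could then use the whole vector, or the images of all five levels together, as its input or target, rather than scoring each disc in isolation. The `None` entries also show which patients have incomplete disc sets. This connects back to the count discrepancy raised in the first point.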
Third, the 5-level Pfirrmann grading system used in this study has inherent limitations that may reduce its reliability in clinical applications. 2 Specifically, the system shows high within-grade variability: discs assigned the same grade can differ substantially in their degenerative features. It also has relatively low discriminatory power, which makes adjacent grades difficult to distinguish, particularly when degenerative changes are subtle. These limitations have been noted in previous studies on automated disc diagnosis, most of which relied on the traditional 5-level Pfirrmann system.3-6
Recent work has introduced an 8-level classification system that offers a more nuanced representation of the spectrum of degenerative changes. 7 This finer scale is better suited to capturing subtle differences in disc degeneration and may therefore improve diagnostic precision. If future models adopt this grading system, they could achieve greater accuracy and clinical relevance, potentially leading to more effective patient management and treatment planning.
In conclusion, the DL model proposed by Soydan et al shows potential. However, addressing these three limitations (data consistency, patient-level analysis, and an updated classification system) would substantially strengthen future research. We commend the authors for their contribution and encourage further exploration in these areas.
