Abstract

We would like to sincerely thank the authors for taking the time to read our article and for providing thoughtful and constructive feedback. We greatly appreciate their interest and the opportunity to address their comments.
Regarding the Number of Discs in the Test Set
The number of discs in the test set does not have to be a multiple of the number of lumbar discs per patient (ie, five), because in our study each disc was treated as an independent unit for both classification and evaluation. The total number of test discs is therefore not expected to be a patient-based multiple.
However, the observation that the test set includes 566 discs rather than 10% of the total (ie, 568.5 discs from a pool of 5685) is valid. Because 10% of 5685 is not a whole number, the test set cannot match the specified proportion exactly. We attribute the remaining discrepancy to the train_test_split function of the scikit-learn library for Python, which includes a stratify parameter to maintain class distribution. The exact number of test samples is affected by this process, and the result can fall slightly below the specified proportion because of class-based rounding.
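The arithmetic behind class-based rounding can be sketched as follows. This is only an illustration under stated assumptions: the per-grade disc counts are hypothetical (the class distribution is not restated in this reply), and rounding each class down independently is an assumed rule chosen for clarity, not a reproduction of the internals of train_test_split.

```python
# Illustrative sketch only: shows how rounding each class separately can
# leave the test set slightly below the nominal proportion.

def per_class_test_counts(class_counts, test_percent):
    """Allocate test samples per class, rounding each class down.

    Integer arithmetic is used to avoid floating-point surprises.
    The floor rule is an assumption made for illustration.
    """
    return {grade: n * test_percent // 100 for grade, n in class_counts.items()}

# HYPOTHETICAL counts per Pfirrmann grade (not taken from the study);
# they are chosen only so that they sum to the reported pool of 5685 discs.
hypothetical_counts = {1: 455, 2: 1478, 3: 1627, 4: 1389, 5: 736}

total_discs = sum(hypothetical_counts.values())   # 5685
nominal_test_size = total_discs * 10 / 100        # 568.5, not a whole number

test_counts = per_class_test_counts(hypothetical_counts, 10)
test_total = sum(test_counts.values())            # 565 for these counts

print(f"Nominal 10% of {total_discs}: {nominal_test_size}")
print(f"Per-grade test counts: {test_counts}")
print(f"Total test discs after per-class rounding: {test_total}")
```

With these hypothetical counts the per-class totals add up to 565, a few discs short of the nominal 568.5. The exact shortfall depends on the real class distribution and on the rounding rule the library applies.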
On Evaluating Discs Independently
We fully acknowledge the interdependence of discs within the same patient. In this study, we intentionally chose to evaluate each disc separately to establish a focused and scalable framework for automated analysis. Nevertheless, we agree that patient-level analysis could provide valuable insights and would complement the current approach. This remains an important area for future research, and we plan to explore it further.
On the Use of the Pfirrmann Grading System
While the 5-level Pfirrmann grading system is currently the most widely cited and utilized standard in the literature for lumbar disc degeneration classification, we are aware of its limitations, including within-level variability and limited discriminatory power. We have acknowledged these issues in our article. To reduce their effect on our labels, we relied on consensus-based labeling from an experienced radiologist, a neurosurgeon, and two orthopaedic surgeons. Moreover, as a step toward addressing these limitations in the literature, we proposed a new grading system, referred to as the Soydan classification, in our study. 1
On Extended Classification Systems
We appreciate the mention of more detailed systems such as the 8-level classification. While these methods offer promising refinements, their overall adoption and validation remain limited. Some of these alternative approaches are still under evaluation, including in our own ongoing research.
In conclusion, we thank the authors again for their valuable comments. We agree that the points raised are meaningful, and we will carefully consider them in future studies.
