Abstract
Background
With the development of a consensus agreement on the essential clinical diagnostic criteria for Achilles tendinopathy (AT), there is scope to investigate the diagnostic accuracy of clinical tests, their relationship to structural changes observed on ultrasound imaging (USI), and the potential role of USI in the clinical diagnosis of AT. Objectives: To evaluate the relationship between clinical tests and tendon structure assessed with USI.
Methods
In this pilot cross-sectional study, 23 individuals (14 male and 9 female) with unilateral, symptomatic AT were recruited from physiotherapy clinics. Assessment included subjective measures and patient-reported outcome measures (pain with loading, stiffness, self-reported function, Victorian Institute of Sport Assessment-Achilles [VISA-A], Pain Catastrophizing Scale, and 12-Item Short Form Survey). Objective tests included palpation, arc sign, Royal London Hospital Test, single-leg heel raise, and hopping. USI subcategorized tendon structure as normal or abnormal. Diagnostic accuracy (positivity rates, sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]), receiver operating characteristic analysis with area under the curve (AUC), and effect sizes were calculated.
Results
Self-reported stiffness, stiffness with activity, and pain on palpation each demonstrated a high positivity rate (82.6%), as did pain with hopping (78.3%). Pain on palpation and pain during hopping both had high sensitivity (0.94) and PPV (0.84 and 0.89, respectively) for detecting structural abnormalities. The VISA-A and self-reported function had excellent AUC (0.92 and 0.89, respectively) for detecting structural change. No test distinguished the degree of structural change.
Conclusions
Pain on palpation and hopping pain are promising indicators of structural tendon pathology. Patient-reported outcome measures may aid in imaging decisions. Further studies are needed to validate these findings.
Level of Evidence
Level IV
Introduction
Achilles tendinopathy (AT) is defined as the clinical presentation of pain localized to the Achilles tendon and the associated loss of function with tendon loading activities.1-3 AT is primarily diagnosed with a combination of patient history and clinical tests. 4 However, there is a lack of consistency in the methods used for the clinical diagnosis of AT.5-7 Clinical tests are commonly compared with a reference standard, such as ultrasound imaging (USI), 6 with which the presence of structural change can be evaluated.8,9 Although USI has demonstrated both sensitivity and accuracy for detecting pathological structural tendon change,10-12 it does not always correlate with clinical symptoms and function,13-15 and structural change has been demonstrated in asymptomatic Achilles tendons. 15 However, pathological change identified on USI has been shown to be a risk factor for developing symptoms in asymptomatic Achilles tendons.16,17
A recent scoping review identified the most common methods for clinically diagnosing AT. 7 Similarly, a recent Delphi study provided consensus guidance for the clinical diagnosis of AT, including differential diagnosis. 6 Consensus for essential diagnostic criteria included pain location, pain during activity, tests that provoke pain, and palpation to assess pain. 6 The diagnostic domains of “pain onset,” “morning or arising pain or perceived stiffness,” and “self-reported function,” which were included in the scoping review, 7 reached consensus agreement but were considered not essential to achieve a clinical diagnosis of AT. 6 Notably, imaging of tendon pathology reached consensus, although it was not considered essential for diagnosing AT if the essential diagnostic criteria were met. 6 Of note, the expert panel debated the role of imaging, raising the stage of disease, the lack of a precise definition of normal imaging, the poor sensitivity of imaging modalities, and whether tendinopathy can be excluded in the presence of normal imaging. 6
To determine the validity of diagnostic tests, imaging is commonly used as the reference standard, 6 with USI the preferred method for assessing tendon structure.18,19 However, inconsistency in the application of USI methods has been suggested as a contributing factor to uncertainty regarding the relevance of USI in the diagnosis of AT.10,16,20 Standardized USI protocols have been shown to be reliable for assessing tendon structural change in AT,21,22 and may offer a more consistent reference standard for determining the diagnostic accuracy of clinical tests. Recently, a reliable method for staging AT based on USI features was developed, 21 subclassifying AT as “Normal,” “Reactive/Early Dysrepair” (R/ED), and “Late Dysrepair/Degenerative” (LD/D). This provides a unified method for describing the degree of structural change identified on USI. It has been proposed that identifying subclassifications of AT may allow clinicians to tailor treatment programs to the appropriate stage of tendinopathy and to monitor progress more accurately.1,2,13
With the development of a consensus agreement on the essential clinical criteria for diagnosing AT, 6 and a consistent method for describing the degree of structural change on USI, 21 there is scope to investigate the diagnostic accuracy of clinical tests, their relationship to structural change observed on USI, and the potential role of USI in the clinical diagnosis of AT. Thus, the primary aim of this pilot cross-sectional study was to explore methodological issues in assessing the diagnostic accuracy of clinical tests for AT previously identified in a scoping review, 7 using USI as the reference standard, in participants with symptomatic, clinically diagnosed AT. A secondary aim was to provide guidance for future, larger studies in developing a more uniform method for diagnosing AT and in defining the role of USI in both research and clinical practice.
Methods
Reporting of this pilot diagnostic accuracy study followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) checklist. 23
Participants
A convenience sample of participants was recruited between March 2022 and March 2023 from local physiotherapy clinics on the Gold Coast, Australia. All potential participants were seeking treatment for symptomatic AT and had been diagnosed with AT by their treating physiotherapist, who was not involved in this study. Potential participants were screened by the lead researcher (WM) to ensure they met the inclusion criteria (Table 1). They were provided with an explanatory statement and gave consent to participate in the study. The same physiotherapist (WM) completed both the clinical assessment and the USI assessment. The clinical assessment process is shown in Figure 1. Ethical approval for the study was provided by the Bond University Human Research Ethics Council (WM00028).
Participant Inclusion and Exclusion Criteria.

Clinical assessment process.
The index tests for this study were extracted from a recent scoping review. 7 Table 2 provides an overview of the subjective assessment, patient-reported outcome measures (PROMs), and objective index tests used.
Index Tests for Achilles Tendinopathy.
Abbreviations: PROMs, patient-reported outcome measures; AT, Achilles tendinopathy; VAS, Visual Analogue Scale; VISA-A, Victorian Institute of Sport Assessment—Achilles; PCS, Pain Catastrophizing Scale; SF-12, 12-Item Short Form Survey.
Subjective Assessment
The subjective assessment included questions on the location of pain, pain with tendon loading and tendon stiffness.
Patient-Reported Outcome Measures
Victorian Institute of Sport Assessment—Achilles (VISA-A)
The VISA-A is a valid and reliable tool that has been widely used to assess severity and disability for people with AT.7,24-27 The VISA-A contains eight questions covering the domains of pain (questions 1-3), function (questions 4-6), and activity (questions 7-8). 25 The maximum score is 100, with healthy subjects scoring a minimum of 96. 25
Pain Catastrophizing Scale
The PCS is a valid and reliable instrument for measuring catastrophic thinking related to pain. 28 It is one of the more common instruments used to assess a person’s pain experience with AT,7,24 and has been used to assess the psychological effect of AT. 29 Participants indicate the degree to which they have specific thoughts and feelings when experiencing pain, using a scale from 0 (not at all) to 4 (all the time). Item ratings are summed to give a total score (range 0 to 52), with a score of 30 or higher indicating a clinically relevant level of catastrophizing. 28
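The PCS total described above is a simple sum compared against a threshold. The following is a minimal sketch, assuming the standard 13-item version (13 items with a maximum of 4 points each gives the stated maximum of 52). The function name `score_pcs` and the example ratings are hypothetical and are not study data or the study's scoring software.

```python
from typing import Sequence, Tuple

PCS_ITEMS = 13      # assumed item count (13 x 4 = 52, the stated maximum)
PCS_THRESHOLD = 30  # total >= 30 treated as clinically relevant catastrophizing


def score_pcs(responses: Sequence[int]) -> Tuple[int, bool]:
    """Sum PCS item ratings (0-4 each) and flag clinically relevant catastrophizing."""
    if len(responses) != PCS_ITEMS:
        raise ValueError(f"expected {PCS_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each item must be rated 0 (not at all) to 4 (all the time)")
    total = sum(responses)
    return total, total >= PCS_THRESHOLD


# Hypothetical respondent rating every item as 2.
print(score_pcs([2] * 13))  # (26, False)
```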
12-Item Short Form Survey
The 12-Item Short Form Survey (SF-12) is a questionnaire commonly used to assess quality of life in people with AT.7,24,30 It contains 12 questions that assess eight key domains: physical limitations due to health difficulties, social limitations due to physical or emotional difficulties, limitations in activities in daily living due to physical difficulties, physical pain, general mental health, limitations in activities in daily living due to emotional difficulties, vitality, and general health perceptions. 31 Scoring was completed using version 1.0 of the SF-12. 32
Objective Assessment
Objective assessment of the Achilles tendon was conducted after USI using commonly reported measures.7,24,33-35 Tests were performed in a standardized order: palpation, the arc sign, and the Royal London Hospital Test (RLHT), followed by clinician demonstration and verbal instruction for the single-leg heel raise (SLHR) and hopping tests. Each test was completed on the non-affected limb first, with rest periods to allow symptoms to return to baseline before proceeding.
Palpation
Tendon palpation (pain, swelling, and thickness) is the most commonly utilized clinical test to diagnose AT. 7 Palpation has previously demonstrated moderate specificity (73%-85%) and sensitivity (58%-84%) when compared with USI in the diagnosis of AT.33,34 Palpation was performed as previously reported.7,33,34
The arc sign
The arc sign has shown low to moderate sensitivity (25%-53%) and high specificity (83%-100%) for assessing AT when compared with USI.33,34 The arc sign was performed as previously reported.7,33,34 If intratendinous swelling was not present, the thickest portion of the tendon was palpated.
Royal London Hospital Test
The RLHT has previously demonstrated moderate sensitivity (51%-54%) and high specificity (91%-93%) when compared with USI in assessing AT.33,34 The RLHT was performed as previously reported.7,33,34
Single-leg heel raise
The SLHR is commonly utilized in the diagnosis of AT to assess pain during tendon loading and functional capacity.7,24,33 It has demonstrated low sensitivity (22%) and high specificity (93%) compared with USI. 33 The test was performed as previously described,8,36 on both the affected and non-affected legs, and ceased when full range could no longer be achieved or pain prevented continuation. Pain was recorded using a Visual Analogue Scale (VAS), with clinicians documenting total repetitions, repetitions before pain onset, and pain at test completion. 7
Hopping
Hopping was utilized to assess tendon pain and functional capacity under high loads.7,24,33 It has previously demonstrated moderate sensitivity (43%) and high specificity (87%) compared with USI for AT diagnosis. 33 The test was performed as described by Silbernagel et al,36,37 with participants hopping rhythmically on the affected and then the non-affected leg. Testing ceased when consecutive hops were no longer possible because of pain, or after 50 repetitions. Previous studies stopped at 25 repetitions; 36 however, this limit was extended to 50 to account for variability in fitness. Pain was recorded using a VAS, and clinicians documented total repetitions, repetitions before pain onset, and pain at test completion.
Ultrasound Assessment
USI was utilized as the reference standard for assessing the accuracy of the index tests. USI was completed by the same researcher (WM) who completed the subjective assessment, PROMs, and objective assessment, using a Philips Lumify L12-4 Linear Array Transducer (Koninklijke Philips N.V., Amstelplein 2, 1096 BC Amsterdam, The Netherlands). All participants were assessed according to the European Society of Musculoskeletal Radiology guidelines for the ankle. 38 The ultrasound images captured during the examination were not graded at the time of capture; instead, they were graded after a 48-hour washout period to minimize clinician bias. As shown in Table 3, the images were graded according to previously developed criteria 21 and subgrouped as structurally normal or structurally abnormal (R/ED and LD/D).
Ultrasound Imaging Criteria for Diagnosing and Sub-Grouping Achilles Tendinopathy.
Statistical Analysis
General patient characteristics were reported using mean (SD) or median (interquartile range, IQR) for continuous variables, depending on data distribution. Normality was checked using a combination of histograms, normal Q-Q plots, and the Shapiro-Wilk test. Data were pooled for structurally abnormal tendons (R/ED and LD/D). Variables such as AT stiffness, AT loading stiffness, pain with SLHR, and pain with hopping were dichotomized as present or absent to facilitate statistical analysis. Positivity rates for these categorical variables were calculated with 95% confidence intervals (CIs) using the Clopper-Pearson exact method, and the exact binomial test was used to determine statistical significance. Diagnostic performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were derived from 2 × 2 contingency tables with 95% CIs calculated using Clopper-Pearson exact intervals; Fisher’s exact test was used to evaluate statistical significance for these comparisons. Continuous variables were analyzed using receiver operating characteristic (ROC) analysis, and the area under the curve (AUC) was estimated with bootstrapped 95% CIs to assess discriminative ability. Youden’s index was calculated to identify optimal cutoff values for continuous variables, with a value >0.80 considered clinically significant. 39 Effect sizes were reported as odds ratios (ORs) with 95% CIs and Fisher’s exact test for categorical variables, and as Hedges’ g with bootstrapped 95% CIs for continuous variables. Exact tests and bootstrapping were used because of the small sample size (N = 23) and the non-normal distribution of some variables, providing more reliable estimates and valid inference without relying on parametric assumptions. Results were considered statistically significant at P < .05. Statistical analysis was completed using R (Version 4.5.1) 40 and IBM SPSS Statistics (IBM Corp. Released 2021. IBM SPSS Statistics for Windows, Version 28.0. Armonk, New York: IBM Corp.).
Results
General characteristics are presented in Table 4. All 23 participants (14 male and 9 female) had unilateral AT. USI assessment determined that 6 participants had structurally normal AT and 17 had structurally abnormal AT (12 R/ED and 5 LD/D). There were some differences in the number of hopping repetitions between the affected and non-affected sides; however, these differences were not statistically significant. Overall, participants with structurally abnormal AT completed a lower median number of repetitions on the affected side (15.0, IQR = 5.0-40.0), with both the R/ED (18.5, IQR = 5.0-37.8) and LD/D (6.0, IQR = 3.0-50.0) subgroups completing fewer repetitions than on their non-affected limb (R/ED = 50.0 [IQR = 48.0-50.0]; LD/D = 48.0 [IQR = 40.0-50.0]).
General Characteristics of Included Participants.
Abbreviations: USI, ultrasound imaging; R/ED, reactive/early dysrepair; LD/D, late dysrepair/degenerative; n, number; %, percentage (percentages under USI groups are of the grand total for the variable); SD, standard deviation; AT, Achilles tendon.
Patient-Reported Outcome Measures
Overall, participants with clinically diagnosed AT recorded reduced self-reported function (mean 66.2 [SD = 25.3]) and VISA-A scores (61.9 [SD = 19.1]) (Table 5). Although the differences were not statistically significant, self-reported function was considerably higher in participants with structurally normal AT (88.3 [SD = 7.5]) than in those with structurally abnormal AT (58.4 [SD = 24.8]). Similarly, VISA-A scores were lower in participants with structurally abnormal AT (55.4 [SD = 16.2]) than in those with structurally normal AT (80.2 [SD = 14.8]).
Patient-Reported Outcome Measures.
Abbreviations: USI, ultrasound imaging; R/ED, reactive/early dysrepair; LD/D, late dysrepair/degenerative; n, number; SD, standard deviation.
Objective Assessment
The positivity rates for self-reported stiffness, stiffness with activity, pain on palpation, the arc sign, the RLHT, self-reported pain with SLHR, and self-reported pain with hopping were calculated. Four measures had statistically significant positivity rates (Figure 2): self-reported stiffness, stiffness with activity, and pain on palpation each had a positivity rate of 82.6% (95% CI = 0.61-0.95; P = .003), and 78.3% of participants reported pain with hopping (95% CI = 0.56-0.92; P = .011).
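Positivity rates and their significance can be recomputed from counts alone. The sketch below is a minimal pure-Python version, not the R or SPSS routines the study used. It computes a Clopper-Pearson interval by inverting the binomial tail probabilities with bisection. It also computes a two-sided exact binomial test against 0.5 that sums the probabilities of all outcomes no more likely than the observed one. The function names are illustrative.

```python
from math import comb
from typing import Callable, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p); this increases with p."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


def _solve_increasing(f: Callable[[float], float], target: float, tol: float = 1e-10) -> float:
    """Bisection on [0, 1] for f(p) == target, where f increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for the proportion x / n."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(lambda p: upper_tail(x, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X >= x + 1) equals 1 - alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: upper_tail(x + 1, n, p), 1 - alpha / 2
    )
    return lower, upper


def binom_test_two_sided(x: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no likelier than x."""
    p_obs = binom_pmf(x, n, p0)
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


# Example: 19 of 23 participants positive, as reported for pain on palpation.
low, high = clopper_pearson(19, 23)
p_value = binom_test_two_sided(19, 23)
print(f"{19 / 23:.1%} (95% CI {low:.2f}-{high:.2f}), P = {p_value:.3f}")
# 82.6% (95% CI 0.61-0.95), P = 0.003
```

For 19 of 23 positive participants, this gives an interval and P value in line with those reported for self-reported stiffness, stiffness with activity, and pain on palpation. Bisection was chosen so that the sketch needs only the standard library; a statistics package's beta quantile function would give the same bounds directly.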

Positivity rates of clinical tests for the presence of Achilles tendinopathy, with exact binomial 95% CIs; a rate is statistically significant if its 95% CI does not include 0.5.
Diagnostic Accuracy
Patient-reported outcome measures
The ROC analysis in Figure 3 evaluated the ability of PROMs to discriminate between the presence and absence of structural change on USI. The VISA-A demonstrated the highest discriminative ability, with an AUC of 0.92 (95% CI = 0.79-1.00), followed by self-reported function with an AUC of 0.89 (95% CI = 0.76-1.00). Youden’s index identified optimal thresholds of 60.5 for the VISA-A and 77.5 for self-reported function. The PCS demonstrated moderate accuracy in detecting the presence of structural change (AUC = 0.77, 95% CI = 0.50-1.00); however, no participant scored higher than 30, which would indicate a clinically relevant level of catastrophizing. 28
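The following is a minimal sketch of the two ROC quantities above. It uses hypothetical VISA-A-style scores, not study data, and assumes that lower scores indicate structural change. The AUC is computed in its Mann-Whitney form. Youden’s index J = sensitivity + specificity - 1 is maximized over candidate cutoffs placed midway between adjacent observed scores, which is one common way fractional cutoffs arise. The function names are illustrative, and the bootstrapped CIs reported in the study are not reproduced here.

```python
from typing import List, Sequence, Tuple


def auc_lower_scores_positive(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC when LOWER scores indicate the target condition (here, structural change).

    Mann-Whitney form: the proportion of (abnormal, normal) pairs in which the
    abnormal tendon has the lower score, counting ties as one half.
    """
    wins = 0.0
    for a in pos:
        for b in neg:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A score <= cutoff is treated as test-positive; candidate cutoffs lie midway
    between adjacent distinct observed scores.
    """
    values: List[float] = sorted(set(pos) | set(neg))
    best_cut, best_j = values[0], -1.0
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2
        sensitivity = sum(s <= cut for s in pos) / len(pos)
        specificity = sum(s > cut for s in neg) / len(neg)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical VISA-A-style scores (NOT study data).
abnormal = [40, 45, 52, 58, 60, 61, 66, 72]
normal = [70, 75, 82, 88, 90, 95]
print(round(auc_lower_scores_positive(abnormal, normal), 3))  # 0.979
print(youden_cutoff(abnormal, normal))  # (68.0, 0.875)
```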

Receiver operating characteristic (ROC) curves for patient-reported outcome measures differentiating between Achilles tendinopathy with and without structural change on ultrasound imaging.
Objective tests
Figure 4 shows the diagnostic accuracy of tests in differentiating between AT without structural change on USI (normal) and with structural change on USI (R/ED or LD/D). Of the 7 tests assessed, only hopping pain (P < .01) and pain on palpation (P = .04) were statistically significant. Hopping pain demonstrated high sensitivity (0.94; 95% CI = 0.71-1.00) and high PPV (0.89; 95% CI = 0.65-0.99), and pain on palpation also demonstrated high sensitivity (0.94; 95% CI = 0.71-1.00) and high PPV (0.84; 95% CI = 0.60-0.97). Self-reported stiffness and stiffness with activity demonstrated high sensitivity (stiffness = 0.88, 95% CI = 0.64-0.99; stiffness with activity = 0.82, 95% CI = 0.57-0.96); however, these results were not statistically significant. The arc sign, RLHT, and SLHR pain demonstrated high specificity (arc sign = 1.00, 95% CI = 0.54-1.00; RLHT = 0.83, 95% CI = 0.36-1.00; SLHR pain = 0.83, 95% CI = 0.36-1.00) and high PPV (arc sign = 1.00, 95% CI = 0.59-1.00; RLHT = 0.91, 95% CI = 0.59-1.00; SLHR pain = 0.92, 95% CI = 0.62-1.00); however, none were statistically significant.
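The sketch below makes the 2 × 2 calculations concrete by computing sensitivity, specificity, PPV, NPV, and a two-sided Fisher’s exact P value. The cell counts are an assumption, because the raw tables are not given in the text. They were back-calculated from the reported figures: 17 abnormal and 6 normal tendons; a sensitivity of 0.94, giving 16 of 17 true positives; and positivity rates of 82.6% and 78.3%, giving 19 and 18 of 23 positives. The function names are illustrative, and Clopper-Pearson CIs are omitted for brevity.

```python
from math import comb
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy metrics for a 2 x 2 table (index test vs USI reference standard)."""
    return {
        "sensitivity": tp / (tp + fn),  # abnormal tendons the test detects
        "specificity": tn / (tn + fp),  # normal tendons the test clears
        "ppv": tp / (tp + fp),          # positive tests with abnormal USI
        "npv": tn / (tn + fn),          # negative tests with normal USI
    }


def fisher_exact_two_sided(tp: int, fp: int, fn: int, tn: int) -> float:
    """Two-sided Fisher's exact test with the 2 x 2 margins held fixed.

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more likely than the observed table.
    """
    n = tp + fp + fn + tn
    test_pos = tp + fp  # row total: test positive
    abnormal = tp + fn  # column total: structurally abnormal on USI
    denom = comb(n, test_pos)

    def prob(a: int) -> float:  # P(exactly a of the test positives are abnormal)
        return comb(abnormal, a) * comb(n - abnormal, test_pos - a) / denom

    p_obs = prob(tp)
    lo, hi = max(0, test_pos + abnormal - n), min(test_pos, abnormal)
    return sum(prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs * (1 + 1e-7))


# Back-calculated counts (assumption), ordered as (TP, FP, FN, TN).
tables = {"pain on palpation": (16, 3, 1, 3), "hopping pain": (16, 2, 1, 4)}
for name, counts in tables.items():
    m = diagnostic_metrics(*counts)
    summary = ", ".join(f"{k} = {v:.2f}" for k, v in m.items())
    print(f"{name}: {summary}; P = {fisher_exact_two_sided(*counts):.3f}")
```

With these counts, the sketch returns the reported sensitivity (0.94), PPVs (0.84 and 0.89), and P values (.04 and < .01). The implied specificities (0.50 and 0.67) depend entirely on the back-calculation.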

Heatmap showing diagnostic accuracy metrics for 7 clinical tests to detect structural changes identified on ultrasound imaging.
Effect Size
Given the small sample size of this pilot study (N = 23), effect sizes were calculated to evaluate associations between the clinical tests, PROMs, and USI-identified structural change. Figure 5 summarizes the ORs for the categorical variables. Pain on palpation and hopping pain demonstrated the strongest associations with structural change and were statistically significant (P = .04 and P < .01, respectively). The RLHT and SLHR pain demonstrated elevated ORs, but these associations were not statistically significant. The arc sign produced an undefined OR because of sparse data. Confidence intervals were wide across all tests, reflecting the limited sample size, and should be interpreted with caution. All PROMs demonstrated medium to very large effect sizes (Figure 6).
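The following is a minimal sketch of the two effect-size measures, with illustrative function names. The ORs use counts back-calculated from the reported positivity rates and sensitivities. This is an assumption, since the raw 2 × 2 tables are not reported. Hedges’ g is computed from the PROM group means and SDs reported above (structurally normal, n = 6, vs structurally abnormal, n = 17), using the usual approximate small-sample correction.

```python
from math import sqrt
from typing import Optional


def odds_ratio(tp: int, fp: int, fn: int, tn: int) -> Optional[float]:
    """Sample odds ratio (tp * tn) / (fp * fn).

    Returns None when fp or fn is zero: the sample OR is then infinite or 0/0,
    i.e. undefined, as happens with sparse tables.
    """
    if fp == 0 or fn == 0:
        return None
    return (tp * tn) / (fp * fn)


def hedges_g(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Hedges' g: Cohen's d with a pooled SD, times the small-sample correction J."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate bias correction
    return d * j


# Back-calculated counts (assumption), ordered as (TP, FP, FN, TN).
print(odds_ratio(16, 3, 1, 3), odds_ratio(16, 2, 1, 4))  # 16.0 32.0
# A hypothetical test with no false positives: the sample OR is undefined.
print(odds_ratio(7, 0, 10, 6))  # None

# Reported group summaries: structurally normal (n = 6) vs abnormal (n = 17).
print(round(hedges_g(80.2, 14.8, 6, 55.4, 16.2, 17), 2))  # VISA-A
print(round(hedges_g(88.3, 7.5, 6, 58.4, 24.8, 17), 2))   # self-reported function
```

These are sample ORs. R’s `fisher.test` reports a conditional maximum-likelihood OR estimate instead, so the values in Figure 5 may differ. Likewise, g computed from rounded summary statistics is only a point estimate and does not reproduce the bootstrapped CIs shown in Figure 6.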

Effect size of clinical tests measured by odds ratios.

Effect size of patient-reported outcome measures assessed with Hedges’ g.
Discussion
This pilot study examined methodological challenges in evaluating the diagnostic accuracy of clinical tests for AT, specifically in distinguishing between AT with and without structural change. All participants were clinically diagnosed with AT by their treating physiotherapist without standardized criteria, reflecting the lack of consistency in AT diagnosis.5-7,33 Recent consensus identified essential diagnostic elements of AT, with pain location, pain during activity, tests that provoke pain, and palpation to assess pain considered essential, while measures such as pain onset, stiffness, and self-reported function were considered supportive but not essential. 6 A scoping review 7 further recommended incorporating PROMs addressing disability, psychological factors, and quality of life, aligned with the core health-related domains of tendon pathology. 24
The results of this pilot study largely align with recent consensus recommendations. 6 Pain during activity was assessed through self-reported pain during Achilles tendon loading tests (SLHR and hopping), although it can also be captured via the subjective history. 6 Pain on palpation demonstrated a high positivity rate (82.6%), followed by pain with hopping (78.3%), reinforcing the prevalence of pain with palpation and tendon loading in symptomatic AT. 6 In contrast, perceived stiffness reached consensus but was not deemed essential for diagnosis. 6 In our results, self-reported stiffness and stiffness during activity had the same positivity rate as palpation (82.6%), indicating that tendon stiffness may warrant consideration in the diagnosis of AT. The AT-specific tests, such as the arc sign and RLHT, which consensus excluded from the essential diagnostic criteria, 6 demonstrated the lowest positivity rates in this study, supporting their limited diagnostic value.
Although imaging reached consensus as supportive rather than essential for AT diagnosis, 6 debate persists regarding its role, particularly when imaging appears normal, given uncertainties surrounding disease staging, the definition of “normal” imaging, and the limited sensitivity of imaging modalities. 6 Inconsistent application of USI has been cited as a contributor to this uncertainty.10,16,20 However, standardized USI protocols have demonstrated reliability in assessing tendon structural change,21,22 and may offer a consistent reference standard for evaluating the diagnostic accuracy of clinical tests. Therefore, this study examined the diagnostic accuracy of clinical tests for detecting USI-confirmed structural change. Pain on palpation and pain during hopping both demonstrated high sensitivity, high PPV, and large ORs, suggesting that these tests not only serve as essential diagnostic criteria for AT 6 but may also indicate structural change. However, both tests were also positive in some participants without structural change, highlighting the need for additional assessments before recommending USI. Although not statistically significant, SLHR pain demonstrated high specificity and an elevated OR, indicating that pain during SLHR testing may warrant imaging. Self-reported stiffness, while not considered an essential diagnostic criterion, 6 also demonstrated high sensitivity for structural change and may be a useful adjunct when combined with pain-provoking tests, such as pain on palpation and pain with hopping. Overall, stiffness may help guide decisions on USI rather than serve as a standalone diagnostic marker.
Although PROMs are not considered essential for diagnosing AT, 6 they have been proposed as useful adjuncts for AT diagnosis and management.7,35 Self-reported function reached consensus agreement for the diagnosis of AT, but was not considered essential. 6 This study identified that participants with structurally normal AT reported higher mean scores for self-reported function (88.3) and VISA-A (80.2) compared with those with structural changes (58.4 and 55.4, respectively). Receiver operating characteristic analysis demonstrated clinically significant AUC values for both measures in detecting USI-confirmed structural change, with thresholds of 77.5 for self-reported function and 60.5 for the VISA-A. Effect size analysis further supports their potential as adjuncts for guiding imaging decisions. Incorporating PROMs alongside pain-provoking tests may help clinicians identify patients most likely to exhibit structural changes on USI, enabling more targeted imaging and efficient resource use.
The secondary aim of this study was to inform future research by promoting a more standardized approach to diagnosing AT and clarifying the role of USI in both research and clinical settings. While the essential diagnostic criteria established by consensus 6 align with these findings, this study adds insight into non-essential tests and PROMs that may help determine when USI is warranted (Figure 7). These results provide a foundation for developing a structured decision-making model that integrates pain-provoking tests, self-reported stiffness, and PROM thresholds to guide imaging decisions. Future research should validate this model through a larger, adequately powered diagnostic accuracy study, enabling refinement of test combinations, threshold values, and predictive algorithms.

Clinical decision flow chart to determine the role of ultrasound imaging in the diagnosis of Achilles tendinopathy.
Limitations
The main limitation of this study was the small sample size, which limited the ability of statistical tests to detect small but meaningful differences that may be present in larger samples; the results should therefore be interpreted with caution. Although prior research 41 suggests that a sample size of 65 is needed for adequate statistical power, this pilot study primarily aimed to evaluate the diagnostic utility of clinical tests for AT and to explore the role of USI to inform a larger study. Standardizing clinical test methods is essential for determining their relevance, yet considerable variability exists.5-7,33,34 Silbernagel et al 36 described the SLHR endurance and hopping protocols used in this study, demonstrating good to excellent reliability (intraclass correlation coefficient = 0.76-0.94). Standardization of SLHR testing parameters, including body position, height of raise, tempo, and termination criteria, has been recommended,42-44 but consensus on its purpose, optimal parameters, adequate outcome measures, and appropriate normative values remains lacking.45,46 Similarly, hopping has been proposed as a method for assessing muscle-tendon elastic function;46,47 however, methods vary, and objective measures of hopping height and distance have been recommended to quantify AT-related function. 46
Conclusion
Pain on palpation, pain during hopping, and self-reported stiffness exhibited the highest positivity rates among individuals with symptomatic AT. Notably, pain on palpation and pain with hopping may serve as useful clinical indicators of underlying structural change detectable via USI. Their diagnostic value may be enhanced when they are combined with self-reported stiffness, functional status, and VISA-A scores. To validate these preliminary findings, larger studies are warranted to evaluate the diagnostic accuracy of clinical tests and their association with tendon structural pathology in both symptomatic and asymptomatic populations. Future research should incorporate USI as the reference standard to clarify its role in the clinical diagnostic process and its utility in identifying potential differential diagnoses.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by an Australian Government Research Training Program Scholarship.
Ethical Approval and Informed Consents
Ethical approval for this study was provided by the Bond University Human Research Ethics Council (WM00028). Informed consent was obtained from all participants both verbally and in writing.
Trial Registration
Not applicable, because this article does not contain any clinical trials.
Data Availability Statement
All data produced in the present study are available upon reasonable request to the corresponding author.
