Abstract
Background:
Gastrocnemius tightness contributes to several foot and ankle pathologies. The Silfverskiöld test, traditionally performed in the supine or sitting position, differentiates isolated gastrocnemius tightness from combined gastrocnemius-soleus tightness. The senior author of this study has used a prone-position modification in clinical practice because of its simplicity, practicality, and ability to provide bilateral comparison. This diagnostic reliability study aims to evaluate the reliability of the prone-position modification of the Silfverskiöld test and to compare its dorsiflexion measurements with those of established techniques.
Methods:
The prone-position modification of the Silfverskiöld test is performed with the feet extending beyond the couch. Both feet are tested together, with passive dorsiflexion applied first with the knees in full extension and then with the knees flexed to 90 degrees. After initial evaluation in 5 healthy volunteers, the technique was compared with the supine, sitting, and lunge tests in 15 patients with diverse foot and ankle pathologies. A consultant foot and ankle surgeon and a musculoskeletal physiotherapist each performed the tests twice. Interobserver and intraobserver reliability were assessed using ICCs (2-way random effects model). Dorsiflexion differences among the prone, supine, and sitting positions were examined using repeated measures analysis of variance (ANOVA) with Tukey post hoc testing. Each patient was asked to rank the 4 tests by comfort.
Results:
The prone-position modification showed good to excellent reliability. Interobserver ICCs were 0.89 (95% CI 0.77-0.97) in full extension and 0.89 (95% CI 0.79-0.97) in 90 degrees of flexion. Intraobserver ICCs ranged from 0.94 to 0.97. ANOVA demonstrated no significant differences in dorsiflexion among the prone, supine, and sitting positions in full extension (P = .89) or 90 degrees of flexion (P = .80), and post hoc testing showed no significant pairwise differences. Most patients (13 of 15; 86.7%) ranked the prone test as the most comfortable.
Conclusion:
The prone-position modification of the Silfverskiöld test provides reliable, clinically comparable dorsiflexion measurements and is well tolerated. Although the findings support its utility, the modest sample size means that further validation in larger cohorts is warranted.
Level of Evidence:
Level II, diagnostic reliability study.
Keywords
Introduction
Accurate assessment of ankle dorsiflexion is fundamental in the diagnosis of a range of foot and ankle conditions, especially those precipitated by isolated gastrocnemius contracture. Limited dorsiflexion, known as ‘equinus’, contributes to abnormal forefoot loading and altered gait mechanics and is recognised as a causal factor in conditions such as plantar fasciitis, hallux valgus, Achilles tendinitis, and metatarsalgia.1-6
The Silfverskiöld test, originally described by Nils Silfverskiöld in the early 20th century, remains the gold standard for differentiating gastrocnemius contracture from soleal or Achilles tendon–related equinus.6-8 The test is traditionally performed in the supine or sitting position, with passive dorsiflexion measured first with the knee extended and then with the knee flexed. Despite its routine use, the test has notable limitations: multiple studies have reported only poor-to-moderate interobserver and intraobserver reliability in its clinical application.8-10 These findings mirror the broader dorsiflexion measurement literature, in which non–weight-bearing goniometric methods frequently show only fair to moderate interrater agreement for both passive and active ankle motion.11-13
The prone-position modification of the Silfverskiöld test has been proposed as a potential means of addressing several sources of measurement variability associated with traditional testing positions. Positioning the patient prone with the feet extending beyond the examination couch may facilitate consistent examiner access to the hindfoot, improve control of subtalar and forefoot motion, and allow bilateral assessment without repositioning. Although prone positioning may plausibly reduce voluntary resistance during passive dorsiflexion, the contribution of muscle relaxation to measurement consistency has not been established, and passive ankle range of motion is known to be influenced by multiple factors including examiner technique and perceptual end-point determination.14-16 Given these uncertainties, systematic evaluation of interobserver and intraobserver reliability is essential before the prone modification can be considered for broader clinical adoption.
This study aims (1) to assess the interobserver and intraobserver reliability of the prone test within routine clinical practice and (2) to compare dorsiflexion measurements derived from the novel prone-position modification of the Silfverskiöld test with those from the standard supine, sitting, and lunge test techniques. This work seeks to increase the reproducibility and clinical confidence of gastrocnemius contracture assessment by establishing the measurement characteristics of the prone position.
We hypothesise that the prone modification will demonstrate improved reliability, both interobserver and intraobserver, compared with traditional positions.
Materials and Methods
Study Design and Participants
A diagnostic reliability study was conducted at a tertiary foot and ankle clinic in a university teaching hospital by 2 independent examiners: a senior foot and ankle orthopaedic consultant and a musculoskeletal physiotherapist. The prone technique was first evaluated in 5 healthy volunteers, after which 15 adult patients presenting with various foot and ankle pathologies were enrolled. Inclusion criteria comprised adults (≥18 years) with non-neuromuscular foot and/or ankle conditions for which Silfverskiöld test evaluation was indicated. Exclusion criteria included acute fracture, soft tissue injury, fixed deformity of the ankle or previous ankle fusion, and neurologic disease. Both unilateral and bilateral pathologies were included, and each foot was analysed independently. Interobserver and intraobserver reliability were analysed in all 15 patients and 5 healthy controls (n = 40 feet). Differences in dorsiflexion angles among test positions were analysed in the 15 patients (n = 30 feet).
The study protocol was approved as a quality improvement project by the University Hospital Quality Improvement department, and all participants provided informed consent.
Silfverskiöld Test Techniques
Four Silfverskiöld test techniques (supine, sitting, weight-bearing lunge, and prone) were performed on each patient in random order to minimise measurement bias.
For all passive manoeuvres, the examiner applied dorsiflexion force until a firm but non-painful end-range resistance was perceived. No external force gauge or torque device was used; instead, examiners followed a predefined endpoint description: the first point at which clear resistance was felt, without heel lift, midfoot collapse, or patient guarding. Dorsiflexion was not forced beyond this point. All measurements were performed with the knee in full extension (0 degrees) and in 90 degrees of flexion.

Figure 1. Prone-position modification demonstrating the measurement of maximum passive dorsiflexion at 0 degrees of knee flexion using a goniometer.

Figure 2. Prone-position modification demonstrating the measurement of maximum passive dorsiflexion at 90 degrees of knee flexion using a goniometer.

Figure 3. Prone-position modification demonstrating that the test can be performed bilaterally and simultaneously.
Subtalar Joint Neutral Positioning
The subtalar joint was placed in neutral before each measurement using a standard clinical palpation method. The examiner palpated the talar head on both the medial and lateral aspects of the ankle and adjusted the foot until the talar head was felt equally on both sides, indicating neutral alignment (Figure 4). The calcaneus was then stabilised manually to maintain this position throughout dorsiflexion. Non-neutral subtalar alignment was identified by drawing a hindfoot axis line bisecting the posterior calcaneus and noting its deviation; Figures 5 and 6 show varus and valgus hindfoot positions, respectively.

Figure 4. Hindfoot axis line bisecting the posterior calcaneus demonstrating neutral position.

Figure 5. Hindfoot axis line bisecting the posterior calcaneus demonstrating subtalar joint varus alignment.

Figure 6. Hindfoot axis line bisecting the posterior calcaneus demonstrating subtalar joint valgus alignment.
Observers and Reliability Assessment
A consultant orthopaedic foot and ankle surgeon and an experienced musculoskeletal physiotherapist independently performed the prone-position modification of the Silfverskiöld test, twice each, on 5 healthy controls. They further performed each Silfverskiöld test variation twice on all patients. Each observer was masked to the other’s measurements.
Intraobserver reliability was assessed by having each observer perform the prone modification test twice, under an identical setup, in the 5 healthy controls and 15 patients. Both left and right sides were included, giving n = 40. Interobserver reliability was assessed by comparing the 2 assessors' dorsiflexion measurements for the prone modification test in the same 5 healthy controls and 15 patients; pooling both test occasions gave n = 80 (Figure 7).

Figure 7. Bland-Altman plots showing interobserver agreement for dorsiflexion. (A) Supine-knee extension. (B) Supine-knee flexion. (C) Sitting-knee extension. (D) Sitting-knee flexion. (E) Prone-knee extension. (F) Prone-knee flexion. (G) Lunge-knee extension. (H) Lunge-knee flexion.
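The Bland-Altman plots summarise agreement as the mean between-examiner difference (bias) and its 95% limits of agreement. A minimal sketch of how these quantities are conventionally derived, assuming paired limb-level readings from the 2 examiners and the usual bias ± 1.96 SD limits; the function name and the readings below are hypothetical, not study data:

```python
from statistics import mean, stdev

def bland_altman(examiner_a, examiner_b):
    """Return (bias, lower_loa, upper_loa) for paired readings in degrees.

    Bias is the mean of the paired differences (A - B); the 95% limits of
    agreement are bias +/- 1.96 x the sample SD of those differences.
    """
    if len(examiner_a) != len(examiner_b) or len(examiner_a) < 2:
        raise ValueError("need at least 2 paired readings")
    diffs = [a - b for a, b in zip(examiner_a, examiner_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical prone, knee-extension readings (degrees) for 6 limbs.
examiner_1 = [5, 2, 8, 4, 0, 6]
examiner_2 = [4, 3, 7, 5, 1, 5]
bias, lower, upper = bland_altman(examiner_1, examiner_2)

# Each plotted point is (mean of the pair, difference of the pair).
points = [((a + b) / 2, a - b) for a, b in zip(examiner_1, examiner_2)]
print(f"bias {bias:.2f} deg, 95% LoA {lower:.2f} to {upper:.2f} deg")
```

Horizontal lines at the bias and at the two limits, drawn over the (mean, difference) points, give one panel of a Bland-Altman plot.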
Measurement Tools
All angular measurements were obtained using universal plastic goniometers (66Fit, UK) with 1-degree graduation markings. Before data collection, both goniometers used in the study were checked for alignment by confirming 0 degrees against a flat reference surface and 90 degrees against a calibrated square. No drift or misalignment was detected.
Statistical Analysis
Interobserver and intraobserver reliability were quantified using the intraclass correlation coefficient (ICC), calculated using a 2-way random effects model of absolute agreement.18 The 95% CIs for ICC values were derived using 2000 bootstrap resamples. ICC values were interpreted per established guidelines (ICC <0.50, poor; 0.50-0.75, moderate; 0.75-0.90, good; >0.90, excellent).19
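The ICC and its bootstrap CI can be sketched as follows. This sketch assumes the single-measure, absolute-agreement form (McGraw and Wong's ICC(A,1), numerically the same as Shrout and Fleiss's ICC(2,1)), a percentile bootstrap, and limbs as the resampling unit; the text does not specify these details, and the function names and readings are hypothetical:

```python
import random

def icc_a1(ratings):
    """Single-measure, absolute-agreement ICC from a 2-way random effects model.

    ratings: one row per limb, one column per examiner (or per test occasion).
    ICC = (MSR - MSE) / (MSR + (k - 1) MSE + k (MSC - MSE) / n)
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between limbs
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between examiners
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom <= 1e-12:
        raise ValueError("ICC undefined: ratings show no variance")
    return (ms_rows - ms_error) / denom


def bootstrap_ci(ratings, n_boot=2000, alpha=0.05, seed=2024):
    """Percentile bootstrap CI for icc_a1, resampling limbs (rows) with replacement."""
    rng = random.Random(seed)
    n = len(ratings)
    estimates = []
    for _ in range(n_boot):
        sample = [ratings[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(icc_a1(sample))
        except ValueError:
            continue  # skip degenerate resamples (e.g. one limb drawn n times)
    estimates.sort()
    m = len(estimates)
    return estimates[int(alpha / 2 * m)], estimates[int((1 - alpha / 2) * m) - 1]


# Hypothetical dorsiflexion readings (degrees): [examiner 1, examiner 2] per limb.
readings = [[2, 3], [5, 5], [8, 7], [4, 4], [0, 1], [6, 6], [10, 9], [3, 4]]
estimate = icc_a1(readings)
ci_low, ci_high = bootstrap_ci(readings)
print(f"ICC {estimate:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

A constant offset between examiners lowers the absolute-agreement ICC even when their rankings agree perfectly, which is why this form is stricter than a consistency ICC.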
All analyses were performed with each eligible limb treated as an independent observational unit for both reliability and comparative analyses. Dorsiflexion range and gastrocnemius tightness commonly vary between sides, making the limb the appropriate clinical and analytic unit. Although this approach may introduce mild non-independence between limbs from the same participant, it preserves limb-specific variation that would be lost through averaging and reflects how these tests are used in practice. This approach may slightly underestimate SEs and produce modestly liberal P values, which was considered when interpreting statistical significance.
Differences in dorsiflexion angles among the passive test positions (sitting, supine, and prone) were assessed using repeated measures analysis of variance (ANOVA), with Tukey honestly significant difference (HSD) tests for post hoc pairwise comparisons because they control the family-wise type I error rate across multiple pairwise comparisons.20
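The comparative analysis can be sketched as a one-way repeated measures ANOVA on a limbs × positions table, with Tukey studentized-range statistics computed from the ANOVA error term (one common textbook way of running Tukey comparisons after a repeated measures design). This is a sketch under stated assumptions: it applies no sphericity correction, uses hypothetical names and readings, and stops at the test statistics. Converting them to P values requires the F and studentized range distributions, for example SciPy's scipy.stats.f.sf and scipy.stats.studentized_range.

```python
from itertools import combinations
from math import sqrt

def rm_anova_tukey(data):
    """One-way repeated measures ANOVA plus Tukey studentized-range statistics.

    data: one row per limb, one column per passive position
          (e.g. supine, sitting, prone).
    Returns (F, (df_positions, df_error), {(i, j): q}).
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    row_means = [sum(row) / k for row in data]

    ss_pos = n * sum((m - grand) ** 2 for m in col_means)
    ss_limbs = k * sum((m - grand) ** 2 for m in row_means)  # removed from error
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_pos - ss_limbs

    df_pos, df_err = k - 1, (n - 1) * (k - 1)
    ms_error = ss_error / df_err
    if ms_error <= 0:
        raise ValueError("no residual variance; F is undefined")
    f_stat = (ss_pos / df_pos) / ms_error

    # Tukey: q = |mean_i - mean_j| / sqrt(MS_error / n), to be referred to the
    # studentized range distribution with k groups and df_err degrees of freedom.
    se = sqrt(ms_error / n)
    q = {(i, j): abs(col_means[i] - col_means[j]) / se
         for i, j in combinations(range(k), 2)}
    return f_stat, (df_pos, df_err), q


# Hypothetical readings (degrees) for 3 limbs: [supine, sitting, prone].
toy = [[2, 4, 6], [3, 4, 8], [4, 7, 7]]
f_stat, dfs, q = rm_anova_tukey(toy)
print(f"F{dfs} = {f_stat:.2f}; q(supine vs prone) = {q[(0, 2)]:.2f}")
```

Removing the between-limb sum of squares from the error term is what distinguishes the repeated measures design from a one-way ANOVA on independent groups.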
The weight-bearing lunge test relies on patient-generated muscle activation and body-weight forces, so its biomechanical profile differs from that of the passive, examiner-controlled supine, sitting, and prone assessments. Because the lunge test does not share their force application and end-point criteria, including it in the repeated measures ANOVA would not compare like with like; it was therefore excluded from inferential analysis and analysed descriptively as a contextual measure of functional dorsiflexion.
Patient Comfort Ranking
After completing all 4 test positions, the 15 patients were asked to rank the positions in order of discomfort, from least to most uncomfortable. This provided a relative comparison of perceived tolerability between positions rather than a numerical pain or comfort score.
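Because the comfort data are ranks rather than scores, summarising them reduces to counting first-place rankings and, optionally, averaging ranks. A minimal sketch, assuming each patient's ranking is stored as an ordered list from least to most uncomfortable; ties such as "no preference" are not represented, and the function names and rankings are hypothetical:

```python
from collections import Counter

def least_uncomfortable_share(rankings):
    """Count how often each position is ranked first (least uncomfortable).

    rankings: one list per patient, ordered least to most uncomfortable.
    Returns {position: (count, percent rounded to 1 decimal place)}.
    """
    firsts = Counter(r[0] for r in rankings)
    n = len(rankings)
    return {pos: (c, round(100 * c / n, 1)) for pos, c in firsts.items()}

def mean_rank(rankings):
    """Mean rank per position (1 = least uncomfortable)."""
    totals = Counter()
    for r in rankings:
        for rank, pos in enumerate(r, start=1):
            totals[pos] += rank
    return {pos: t / len(rankings) for pos, t in totals.items()}

# Hypothetical rankings from 4 patients.
rankings = [
    ["prone", "supine", "sitting", "lunge"],
    ["prone", "sitting", "supine", "lunge"],
    ["supine", "prone", "sitting", "lunge"],
    ["prone", "supine", "lunge", "sitting"],
]
shares = least_uncomfortable_share(rankings)
ranks = mean_rank(rankings)
print(shares["prone"], ranks)
```

With 13 first-place rankings among 15 patients, the same arithmetic gives round(100 × 13 / 15, 1) = 86.7%, the figure reported for the prone position.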
No subgroup or interaction analyses were conducted, as all participants were assessed under identical conditions using a within-subject design focused on measurement reproducibility.
No missing data occurred, as all measurements were obtained prospectively and verified at the time of testing. Consequently, no imputation or data exclusion procedures were necessary.
All participants were recruited from outpatient clinics using a consecutive sampling approach. As this cross-sectional reliability study was conducted during a single testing session, no follow-up loss occurred. Analytical methods accounted for within-subject correlation through the use of repeated measures ANOVA, and no sampling-weight adjustments were required.
No formal sensitivity analyses were conducted. All participants completed the full testing protocol. No analytical assumptions required sensitivity testing.
Sample Size Justification
An a priori sample size calculation was performed for the primary reliability outcome. Reliability was to be assessed using a 2-way random effects intraclass correlation coefficient (ICC, absolute agreement). In accordance with the interpretive framework adopted in this study (ICC <0.50, poor; 0.50-0.75, moderate; 0.75-0.90, good; >0.90, excellent), an ICC of 0.75 was defined as the minimum acceptable level of reliability (transition between moderate and good), and an ICC of 0.90 as the anticipated level for the protocol.
Sample size was estimated using the method of Walter et al.21 for ICC reliability studies, with α = 0.05, 80% power, and 2 raters. This calculation indicated that approximately 18 participants were required; 20 were recruited to allow for attrition. The final sample (15 patients and 5 healthy controls) therefore slightly exceeded the a priori requirement.
Results
Patient Demographics and Baseline Characteristics
The study cohort comprised 15 patients (30 feet) and 5 healthy controls (10 feet). Pathologies varied among patients: foot and ankle arthritis (6 limbs), flatfoot deformity (4 limbs), metatarsalgia (3 limbs), foot and ankle tendinopathy (3 limbs), and plantar fasciitis (1 limb).
Dorsiflexion Angle (degrees) for Each Test Position (Mean ± SD)
Table 1 presents dorsiflexion angles for each test position. In every position, values were lower in full knee extension than in 90 degrees of knee flexion, and weight-bearing lunge values were higher than the corresponding passive values.
Table 1. Mean ± SD Dorsiflexion Angles (degrees) for Each Test Position (Pooled Limb-Level Data).
In full knee extension, dorsiflexion was 3.56 ± 4.89 degrees in supine, 3.30 ± 3.83 degrees in sitting, and 4.80 ± 4.11 degrees in prone. With the knee flexed to 90 degrees, dorsiflexion increased to 9.06 ± 4.75 degrees (supine), 8.18 ± 4.43 degrees (sitting), and 14.70 ± 4.73 degrees (prone). Weight-bearing lunge dorsiflexion measured 9.39 ± 3.33 degrees in knee extension and 15.39 ± 5.16 degrees in knee flexion.
Interobserver and Intraobserver Reliability of the Prone Test
The prone test demonstrated good to excellent agreement (Table 2). The interobserver ICC was 0.89 in both knee positions, and intraobserver ICCs ranged from 0.94 to 0.97.
Table 2. Intraclass Correlation Coefficients for the Prone Position Test.
Abbreviation: ICC, intraclass correlation coefficient.
Comparative Analysis of Dorsiflexion Angles
With the numbers available, a 1-way repeated measures ANOVA showed no significant differences among the supine, sitting, and prone positions in either full extension (P = .89) or 90 degrees of flexion (P = .80) (Table 3).
Table 3. Repeated Measures ANOVA for Dorsiflexion by Position.
Abbreviation: ANOVA, analysis of variance.
Post hoc Tukey HSD comparisons showed no significant differences between the prone test and the supine or sitting positions (Table 4).
Table 4. Pairwise Tukey Post Hoc Comparisons.
Patient Comfort Assessment
Patients ranked the test positions in order of discomfort, from least to most uncomfortable (Table 5). The prone position was most frequently ranked as the least uncomfortable, by 13 of the 15 patients (86.7%), whereas the weight-bearing lunge test was most often ranked as the most uncomfortable. The supine and sitting positions were typically ranked as intermediate. These rankings suggest differences in perceived tolerability between test positions.
Table 5. Participant-Reported Ranking of Test Positions According to Discomfort (Least to Most).
Discussion
This study evaluated the reliability and measurement characteristics of a prone-position modification of the Silfverskiöld test and compared its dorsiflexion values with those obtained in traditional supine and sitting positions.
Variability in dorsiflexion measurement has been attributed to inconsistent torque application, difficulty controlling forefoot motion, and patient guarding, factors that similarly affect performance of the Silfverskiöld test.8,9 Although torque-controlled instruments and digital measurement devices can reduce examiner-dependent error, such systems are not easily incorporated into routine outpatient practice.10-12 Subjectivity, lack of standardised force application, and variable anatomical landmarks further contribute to variability. For example, the weight-bearing lunge modification of the traditional Silfverskiöld technique generally yields higher dorsiflexion angles than non-weight-bearing methods, reflecting the additional force generated by body mass and the functional engagement of peri-ankle tissues.22 Consequently, there remains a need for a simple, reproducible clinical method for assessing ankle dorsiflexion and identifying gastrocnemius tightness.
With the sample size available, the prone-position modification of the Silfverskiöld test yielded dorsiflexion values that did not differ significantly from those of the traditional supine and sitting techniques in full extension (P = .89) or 90 degrees of flexion (P = .80). Pairwise comparisons likewise showed no significant differences between prone and supine (P = .91 extension; P = .90 flexion) or between prone and sitting (P = .97 extension; P = .64 flexion) measurements. These results suggest that the prone technique can generate clinically comparable dorsiflexion values within the context of this study.
For the prone test, interobserver reliability was assessed across all 15 patients and the 5 healthy controls, pooling both test occasions and both sides (n = 80 per angle). Agreement between the 2 examiners was good for both full extension (ICC = 0.89, 95% CI 0.77-0.97) and 90 degrees of flexion (ICC = 0.89, 95% CI 0.79-0.97) (Figure 7). Intraobserver reliability, assessed in the same manner with both sides combined (n = 40 per examiner), was consistently excellent. Examiner 1 achieved ICCs of 0.95 (95% CI 0.90-0.98) for full extension and 0.94 (95% CI 0.89-0.97) for 90 degrees of flexion, whereas examiner 2 achieved ICCs of 0.97 (95% CI 0.93-0.99) and 0.95 (95% CI 0.90-0.98), respectively.
These values exceed the moderate reliability often reported for manual Silfverskiöld assessments8-12 and approximate those achieved with instrumented devices. However, reliability was assessed only for the prone position, and no formal reliability estimates were generated for the supine or sitting techniques within this study. As such, these findings support feasibility rather than superiority and should not be assumed to generalise beyond similar clinical settings. In addition, the limited sample size and examiner profile mean that these reliability estimates should be interpreted as preliminary.
The mechanism underlying the observed reliability of the prone-position modification cannot be established from the present data, and the reliability should not be attributed to a single factor. Although prone positioning may plausibly reduce voluntary resistance or guarding during passive dorsiflexion, muscle activity was not directly measured. Existing evidence indicates that passive ankle range of motion largely reflects stretch tolerance and examiner-dependent end-point perception, suggesting that procedural factors such as stabilisation and control of compensatory motion may be more influential than muscle relaxation alone.15-17
Notably, 13 of the 15 patients (86.7%) ranked the prone test as causing the least discomfort of the 4 positions (Table 5); 2 patients had no preference, and all positions were well tolerated. This finding highlights differences in patient comfort that may be relevant when selecting a dorsiflexion assessment technique in routine clinical practice. However, the ranking approach does not quantify discomfort or allow statistical comparison, so conclusions about patient preference must remain cautious: within this small cohort, patients tended to view the prone test favourably relative to the other positions. This acceptability may facilitate efficient bilateral and repeated assessments without prolonged rest intervals.
Clinically, the modification offers practical advantages, including simultaneous bilateral testing without repositioning, simplified examiner alignment, and enhanced patient comfort.
Several important limitations must be acknowledged. The sample size was small, and reliability estimates derived from small cohorts may overestimate reproducibility. The study population included heterogeneous foot and ankle pathologies, which may influence passive dorsiflexion and limit applicability to specific diagnostic groups. Examiners were not masked to their own prior measurements, and all testing occurred within a single session, introducing the possibility of measurement bias or recall effects. Additionally, treating each limb as an independent observational unit introduces a risk of pseudo-replication. Although this approach reflects limb-specific clinical assessment and preserves side-to-side variability, it may slightly underestimate SEs and inflate precision estimates.
Both examiners were highly experienced clinicians (a consultant surgeon and a senior musculoskeletal physiotherapist). Reliability may be lower among trainees or less experienced clinicians, limiting generalisability to broader clinical practice. Goniometric measurement remains operator-dependent, and force application was not quantified.
Future research should focus on validating these findings in larger and pathology-specific cohorts, incorporating clinicians with varying levels of experience, quantifying force application, and examining associations with functional and clinical outcomes.
Conclusion
The prone-position modification of the Silfverskiöld test may be a valuable addition to the clinical assessment of gastrocnemius contracture and ankle dorsiflexion. It demonstrated high interobserver and intraobserver reliability, offers a practical middle ground between non-weight-bearing and functional (lunge) methods, and allows simultaneous bilateral testing without repositioning. These findings support the prone-position modification as a reliable and comparable alternative to the traditional supine and sitting techniques for assessing gastrocnemius tightness; given the modest sample size, further validation in larger cohorts with multiple examiners is warranted.
Supplemental Material
sj-pdf-1-fao-10.1177_24730114261417693 – Supplemental material for Prone-Position Modified Silfverskiöld Test: Feasibility and Reliability
Supplemental material, sj-pdf-1-fao-10.1177_24730114261417693 for Prone-Position Modified Silfverskiöld Test: Feasibility and Reliability by Maneesh Bhatia, Aditya Dhiran and Annette Jones in Foot & Ankle Orthopaedics
Footnotes
Ethical Considerations
The study protocol was approved as a quality improvement project by the University Hospital Quality Improvement department, and all participants provided informed consent.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Disclosure forms for all authors are available online.
References
