Abstract
INTRODUCTION
Breast cancer survivors (BCS) experience a high prevalence of arm and shoulder dysfunction.1,2 This prevalence increases with more invasive forms of treatment such as mastectomy and breast reconstruction. 3 Additionally, these procedures come with the risk of shoulder disability. 4 Performing arm and shoulder rehabilitation exercises can mitigate these morbidities for BCS.5,6
The need to deliver clinical services, such as patient monitoring and shoulder rehabilitation exercises, and to share educational information more conveniently and remotely following treatment for breast cancer has led to the increased incorporation of mobile device applications (apps) into healthcare settings.7,8 The widespread use of smartphones has created opportunities that combine these technologies with healthcare services as smartphones can collect many types of data. 9
Of the data that can be collected by smartphones, those that reflect changes in joint range of motion (ROM) can be particularly useful for addressing shoulder issues and guiding rehabilitation. 10 As restricted shoulder ROM is a common morbidity following breast cancer treatment, 11 ROM measures taken at the initial assessment and following various rehabilitation exercises can provide objective measures that can be used to gauge the efficacy of interventions. As the early post-operative stages are where large changes in shoulder ROM are expected, 12 objective documentation of ROM gains or detection of lower-than-expected ROM changes over time could be very useful to prompt remote clinical reassessment and timely intervention as needed. Thus, sensor data collected as a BCS performs a shoulder movement or rehabilitation exercise while holding a smartphone in their hand can provide practical information to improve healthcare service and delivery.
Many smartphone apps have been validated for measuring human ROM. In a systematic review by Longoni et al. (2019), 15 different apps developed for angle measurement were summarized. Four of these apps were used to specifically test the reliability and validity of shoulder joint angle measurements. Clinometer by Plaincode, GetMyROM by Interactive Medical Productions, and Goniometer Pro by Digiflex Labs are inclinometer-based applications, whereas DrGoniometer is a photo capture-based application. Although the shoulder movements tested and the methods applied varied between studies,13–16 intra- and interobserver reliability intraclass correlation coefficient (ICC) values ranged from 0.70 to 0.99, indicating good to excellent reliability. 17 Where validity was investigated, app measurements were compared to measurements taken by a universal goniometer: ICC values ranged from 0.60 to 0.97, also indicating good to excellent validity.13–15
A clinical limitation of the inclinometer-based apps mentioned above is that a clinician or trained person needs to be present to either place the smartphone on the upper extremity or record the data. Similarly, for the photo capture-based app an additional trained person must be present to take the photo, which is later used for joint measurement on the app. To our knowledge, there are very few apps currently available that can send and process independently collected sensor data about shoulder ROM from a patient's smartphone to a remote location (accessible to a clinician) using the internet. In a recent study by Soeters et al. (2023), a smartphone app that can be used independently without the assistance of a trained professional was tested for accuracy on healthy participants with full active ROM. When shoulder measures between the app and manual measurements via a goniometer were compared, excellent inter-rater reliability was found in all movements (ICCs 0.90–0.96). Intra-rater reliability ranged from good to excellent (ICC ≥ 0.75).18,19
ShApp (Avicenna™) is a novel, HIPAA- and GDPR-compliant, patient-centric health research app that was developed to remotely monitor shoulder function of BCS via questionnaires and to provide education. To also remotely monitor shoulder ROM following surgery on the same app via telehealth, our team developed an extension that uses the smartphone's gyroscope and accelerometer functions to collect data during shoulder movements. The data automatically upload to a secure web application on the internet, where they can be accessed by those with pre-assigned access and processed to provide a ROM output value.
Thus, the purposes of the study were to: (1) investigate the inter-rater and intra-rater reliability of shoulder ROM measures derived from the accelerometer and gyroscope time series data collected by ShApp on healthy cancer-free participants and (2) determine whether ShApp can measure shoulder ROM with enough precision to distinguish between three ROM levels (low, medium, and high), which would allow clinicians to monitor changes remotely over the course of recovery. Our hypothesis was that ShApp would be able to measure common active shoulder ROM with good inter- and intra-rater reliability and would be able to significantly distinguish changes in clinically useful threshold angles.
METHODS
Participants and study design
Ten healthy cancer-free participants were recruited from the community for this prospective observational study. Participants without any shoulder injuries or pathology that could affect shoulder ROM or pre-existing shoulder pain were included to allow for testing of the full available active shoulder ROM. The study protocol was approved by the university research ethics board (Bio-2327), and all participants provided written informed consent. Participant age, sex, height, and weight were recorded.
Testing protocol
All participants were tested at a university laboratory with two researchers present. Participants attended one session. At the start of the session, participants were given the option to either use a smartphone with the app already downloaded on the device or to download the app on their own device. The app can be used on an Android device or an iPhone. Sensor data collection on the device was activated to allow for recording. Next, for orientation and consistency of motions, participants were asked to watch short video clips on the app of an adult performing each shoulder movement correctly. The videos also highlight common compensation patterns and hand movements that could impact measurements and should be avoided.
Five different shoulder movements that are commonly assessed were tested: flexion, extension, abduction, and external and internal rotation. For each movement, a high-, medium-, and low-angle target value was predetermined (Table 1). Visual feedback on the target angles was provided for the participants through a poster. The height of the poster was adjusted for each participant's height. A full-length mirror was also made available to guide abduction and extension movements.
Table 1. Pre-determined high-, mid-, and low-ROM values for tested shoulder movements
Next, the participants were asked to stand next to the poster with their shoulder or elbow aligned with the marked point of rotation. For flexion, abduction, and extension the starting position was the arm by the side, while for internal and external rotation the starting position was the humerus raised to 90° abduction and the forearm parallel with the floor. While holding the smartphone with the app open, each participant performed three repetitions of each of the five shoulder movements at each of the three target ROM levels (45 movements per arm), with both the right and left arms, so that each movement and level combination was repeated six times. Participants were asked to match the target angles as closely as possible and to hold the smartphone with the wrist in a neutral position and with the screen oriented upward and facing away from the palm throughout the testing. If required, each participant was reoriented for each movement so that the participant's shoulder or elbow would align with the point of rotation marked on the poster before performing each movement. Figure 1 provides an illustration of left flexion movements.

Figure 1. Left flexion target angles. Low (A), mid (B), and high (C).
Sample size and data analysis
Sample size was calculated using an a priori power analysis for a dependent-means t-test, to correspond with the t-test analyses used to determine whether the app could distinguish between ROM levels. Findings from a recent study tracking breast cancer survivors’ ROM over time show significant differences between time points for abduction, flexion, and extension, with magnitudes of up to 58°. Based on these data, the effect size was set to 1.0, 20 alpha to .05, and power to 0.8, and a total sample size of 10 was determined.
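As a rough arithmetic check of the sample size above, the following sketch uses a normal approximation with a small-sample correction term (commonly attributed to Guenther) rather than the exact noncentral t calculation that dedicated power software performs. The function name and the two-sided alpha are assumptions, since the text does not state the software used or the sidedness of the test.

```python
import math
from statistics import NormalDist


def paired_t_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate n for a two-sided dependent-means (paired) t-test.

    Normal approximation plus a z^2/2 small-sample correction.
    This is an approximation, not an exact noncentral-t solution.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = ((z_alpha + z_beta) / effect_size) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


# Parameters as reported in the text: effect size 1.0, alpha .05, power 0.8.
n_required = paired_t_sample_size(effect_size=1.0, alpha=0.05, power=0.80)
print(n_required)
```

Under these assumptions the approximation returns 10 participants, in line with the reported sample size; a smaller assumed effect size would require a larger sample.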
Raw data produced by ShApp were input into a custom Python script that processed the time series data and output ROM angle values for separate repetitions of the movements. The smartphone's gyroscope generates angular velocity data, and the accelerometer generates acceleration data. Accelerometer data were smoothed with a Gaussian filter, and changes in acceleration over time were used to identify when each repetition of a shoulder movement began and ended by locating local peaks and valleys. Angular velocity over time was converted to angular position, in degrees over time. These two datasets were then used to find the difference in angular position between the beginning and end of each repetition, which is the ROM achieved for each repetition in degrees.
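The processing steps described above can be sketched as follows. This is not the authors' script, which is not reproduced in the article: the kernel width, the 50 Hz sampling rate, the single-axis treatment of both sensors, and the choice to treat each span between successive extrema as one movement phase are all assumptions made for illustration. The sketch uses plain Python so that it is self-contained; in practice, library routines such as `scipy.ndimage.gaussian_filter1d` and `scipy.signal.find_peaks` would be typical choices.

```python
import math


def gaussian_smooth(signal, sigma=5.0):
    """Smooth a 1-D signal with a truncated Gaussian kernel (sigma in samples)."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(signal)):
        total, norm = 0.0, 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(signal):  # renormalize the kernel near the edges
                total += w * signal[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed


def local_extrema(signal):
    """Indices where the slope changes sign (local peaks and valleys)."""
    return [
        i for i in range(1, len(signal) - 1)
        if (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0
    ]


def integrate_gyro(omega_deg_s, dt):
    """Trapezoidal integration of angular velocity (deg/s) into angle (deg)."""
    angle = [0.0]
    for prev, cur in zip(omega_deg_s, omega_deg_s[1:]):
        angle.append(angle[-1] + 0.5 * (prev + cur) * dt)
    return angle


def rom_per_segment(accel, omega_deg_s, dt, sigma=5.0):
    """ROM (deg) between successive extrema of the smoothed accelerometer trace."""
    marks = local_extrema(gaussian_smooth(accel, sigma))
    angle = integrate_gyro(omega_deg_s, dt)
    return [abs(angle[end] - angle[start]) for start, end in zip(marks, marks[1:])]


# Synthetic check (not study data): the arm swings smoothly between 0 and
# 90 degrees, and each one-way movement phase takes 2 s.
dt = 0.02  # assumed 50 Hz sampling
times = [1.0 + i * dt for i in range(401)]
theta = [45.0 * (1.0 - math.cos(math.pi * t / 2.0)) for t in times]
accel = [9.81 * math.cos(math.radians(a)) for a in theta]  # gravity along one axis
omega = [45.0 * (math.pi / 2.0) * math.sin(math.pi * t / 2.0) for t in times]

roms = rom_per_segment(accel, omega, dt)
print([round(r, 1) for r in roms])
```

On this noise-free synthetic signal each recovered value should be close to the true 90° excursion. Real recordings would additionally need handling of sensor drift, of the phone's three axes, and of noise that can create spurious extrema.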
Inter-rater reliability between ShApp and the pre-measured target values was assessed with ICCs and 95% confidence intervals (CI) based on a single-measures, absolute-agreement, 2-way random-effects model. A Bland–Altman analysis was also conducted to calculate the mean difference, standard deviation, and limits of agreement (LOA) between ShApp measurements and pre-determined target measurements. Similarly, the inter-rater reliability of ShApp measurements between participants was determined from ICCs and 95% CI based on a single-measures, consistency, 2-way random-effects model. The intra-rater reliability of ShApp measurements within each participant between right and left shoulders was also assessed with ICCs and 95% CI, with a single-measures, absolute-agreement, 2-way mixed-effects model. 19 ICCs and 95% CI were interpreted using the following criteria: <0.40 = poor; 0.40–0.59 = fair; 0.60–0.74 = good; >0.74 = excellent. 17 Finally, independent t-tests (α=.05) were conducted to determine the ability of ShApp to distinguish between high and low, high and mid, and mid and low ROM values. All analyses were performed using SPSS Statistics for Windows, version 27 (SPSS Inc, IBM, Chicago, IL, USA).
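For readers who want to see how the single-measures ICC point estimates relate to the underlying two-way ANOVA mean squares, a minimal sketch follows. The analyses in this study were run in SPSS; the function name and the example numbers below are hypothetical, and the 95% CIs are not computed here.

```python
def icc_single_measures(ratings):
    """Return (absolute-agreement, consistency) single-measures ICCs.

    `ratings` is a list of rows (subjects or trials), each holding the
    k measurements being compared. Point estimates only, no CIs.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: rows, columns, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement penalizes systematic offsets between columns.
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Consistency ignores systematic offsets between columns.
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return agreement, consistency


# Hypothetical [target, app] angle pairs in degrees; not study data.
pairs = [[30, 33], [60, 62], [90, 95], [30, 29], [60, 64], [90, 93]]
icc_agreement, icc_consistency = icc_single_measures(pairs)
print(round(icc_agreement, 3), round(icc_consistency, 3))
```

The two estimates differ only in whether a systematic offset between the columns (for example, an app that reads a few degrees high) lowers the coefficient, which is why absolute agreement is the stricter choice for comparing ShApp with the target angles.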
RESULTS
A total of 10 participants completed the study (4 females/6 males). The mean age (±SD) was 32 ± 10.9 years. Mean (±SD) height and weight were 172 (±6.7) cm and 71 (±10.4) kg, respectively.
ICCs and 95% CI for reliability between ShApp and the pre-measured target values are presented in Table 2. For flexion, external rotation, and internal rotation, reliability was excellent, while for abduction and extension, reliability was good. While the point estimates suggest good to excellent agreement, the CIs for abduction and flexion were wide, indicating potential variability in agreement.
Table 2. Inter-rater reliability of ShApp and pre-determined target angles for each shoulder ROM
ICC, intraclass correlation coefficient; CI, 95% confidence interval.
The mean difference, standard deviation, and upper and lower boundaries of LOA between the ShApp and target values based on the Bland–Altman analysis are presented in Table 3. In accordance with the inter-rater reliability between ShApp and pre-determined target angles, the mean difference was smallest for external rotation, followed by internal rotation; external rotation also had the narrowest LOA. This demonstrates that the highest levels of agreement occurred during the humeral rotation movements. In contrast, the mean difference was highest for abduction followed by flexion; abduction exhibited the widest LOA. This suggests that the poorest agreement between measurement systems occurred during the abduction movement, mirroring the results of the inter-rater reliability between ShApp and pre-determined target angles. For both abduction and flexion, the positive mean difference indicates that ShApp measurements were higher than the target measurements.
Table 3. Summary of Bland–Altman analysis assessing the agreement between ShApp and pre-determined target ROMs.
SD, standard deviation; LOA, limit of agreement.
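The quantities summarized in Table 3 follow the usual Bland–Altman construction, sketched below. The 1.96 multiplier for 95% limits, the function name, and the example values are assumptions for illustration; the numbers are not study data.

```python
from statistics import mean, stdev


def bland_altman(measured, reference):
    """Mean difference (bias), SD of differences, and 95% limits of agreement."""
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical app readings and target angles in degrees; not study data.
app_deg = [32.0, 61.0, 93.0, 28.0, 62.0, 95.0]
target_deg = [30.0, 60.0, 90.0, 30.0, 60.0, 90.0]

bias, sd, (lower, upper) = bland_altman(app_deg, target_deg)
print(f"bias={bias:.2f}, SD={sd:.2f}, LOA=({lower:.2f}, {upper:.2f})")
```

With differences taken as app minus target, a positive bias corresponds to app readings that exceed the target values, and a narrower LOA interval corresponds to closer agreement.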
The reliability of ShApp measurements between participants was excellent for all shoulder movements and is presented in Table 4.
Table 4. Reliability of ShApp measurements between participants for each shoulder movement and within participants between right and left sides for each shoulder movement (participants combined; ICCs and 95% CI).
ICC, intraclass correlation coefficient; CI, confidence interval.
The reliability of ShApp measurements within participants between their right and left sides was good to excellent for all shoulder movements except extension (Table 4).
ShApp significantly distinguished between high and low, high and mid, and mid and low ROM values for each movement (Table 5), demonstrating the ability of ShApp to differentiate between gross differences in shoulder ROM.
Table 5. Degrees of freedom and t-values from independent t-tests comparing mean ShApp measurements between high and low, high and mid, and mid and low values.
* Significant difference between means, P < .001
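The degrees of freedom and t-values reported in Table 5 are the two outputs of an independent-samples t-test. A minimal sketch of that calculation is given below, assuming the equal-variance (pooled) form; the article does not state whether the pooled or the unequal-variance form was used, and the example readings are hypothetical. The P value is not computed here because it requires the t distribution, which statistical software provides.

```python
import math
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Student's independent-samples t statistic and degrees of freedom.

    Assumes equal variances in the two groups (pooled estimate).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical app readings (degrees) at a high and a mid target; not study data.
high_level = [88.0, 92.0, 90.0, 91.0, 89.0]
mid_level = [60.0, 62.0, 58.0, 61.0, 59.0]

t_value, dof = independent_t(high_level, mid_level)
print(round(t_value, 2), dof)
```

A large t-value relative to the degrees of freedom indicates that the difference between the two target levels is large compared with the spread of the readings within each level.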
DISCUSSION
This study presents a method of measuring active shoulder ROM using a smartphone app, ShApp. It investigated ShApp's ability to distinguish between three target angles separated by more than the reported minimal clinically important differences (MCIDs) and assessed the inter- and intra-rater reliability of its measures. The results demonstrate good to excellent agreement between the ROM recorded by ShApp and the known target values. Reliability between participants was excellent for all shoulder movements, and reliability within participants was good to excellent for all movements except extension. In addition, the ROM values recorded by ShApp show potential for detecting clinically important changes in ROM over the course of a rehabilitation program, remotely and in telehealth settings.
Construct validity of ShApp's ROM measurement function was evaluated by comparing the ROM values generated by ShApp to predetermined target angles depicted on a poster next to which participants stood to guide their shoulder movements. In this study, we chose to use a poster because it could be a cost-effective visual tool that a patient could easily be sent home with to guide their exercises and could further help a clinician gauge shoulder movements during a remote assessment. While previous studies have utilized a universal goniometer to assess the validity of a mobile app, it is important to note that goniometer measurements can be affected by random errors. The degree of error is contingent upon the proficiency level of the user in employing goniometry techniques. 21
Variations in ICC values across different movements when agreement between ShApp and target measurements was assessed may have been influenced by how well the participant could see the target values. For instance, when the participant could directly see both the target line and the mobile device, ICC values were excellent (e.g., in flexion, external rotation, and internal rotation). However, when participants had to rely on the mirror to guide their movement to the target line, ICC values were lower, as observed in abduction and extension. Particularly for extension, there is the possibility that participants either could not easily visualize their arm position and had to rely on proprioceptive cues 22 or rotated their body slightly to visualize the target line, which may have shifted the axis of movement and influenced the accuracy. In future studies, increasing the amount of practice for abduction and extension might help increase the accuracy of matching the target angle.
Interestingly, the mean difference between the ShApp measurements and the pre-determined target values was highest for abduction and flexion. ShApp measurements were higher than the pre-measured target values for both movements. However, CIs for abduction and flexion were broad, suggesting differences may be due to the nature of measuring these larger movements with the arm. The discrepancy between the two measurements may also be due to the position or movement of the wrist. It is possible that the participants accurately reached the target angle by aligning their arms along the line, but radial deviation at the wrist could have impacted the ShApp measurements by introducing a systematic bias toward overestimation. Although the app provides training videos to make the participants aware of keeping their wrist in a neutral position, radial deviation at the wrist with shoulder abduction and flexion movements can occur fairly naturally. 23 Securing the mobile device above the wrist joint with a band or strap may limit this error but would require purchase of such an apparatus by each participant to use independently at home.
Although direct comparison of our findings to those of others is limited due to differences in methodology (i.e., healthy versus those with pathology, active or passive shoulder movements, standing or supine positioning of the participant) and instrumentation, our study shows inter- and intra-observer reliability similar to that of other smartphone application-based ROM measuring tools that assessed the shoulder.18,24,25
Novel to our study is the investigation of ShApp's ability to distinguish between clinically relevant shoulder ROM changes. For patients following shoulder surgery, the MCID has been reported to be 7°, 12°, and 3° for abduction, flexion, and extension, respectively. 26 When selecting the target ROM angles, we were mindful of choosing values that had differences above these MCIDs. Additionally, for repeated active shoulder ROM measured with a goniometer, the reported MCD for reliability ranged from 16° to 28° for those with shoulder pathology and 11° to 22° for those without. 27 Again, our target ROM angles were chosen to be within or above these MCD values. In our study, the mean ROM values for the high, mid, and low levels for all shoulder movements were significantly different, suggesting ShApp is proficient at distinguishing between gross and clinically meaningful differences in arm ROM (i.e., changes between 10° and 60°). This is an important finding, as it further validates the clinical utility of ShApp for remote monitoring. As breast cancer patients progress independently with their rehabilitation programs following surgery, therapists may be able to monitor for gross ROM improvements remotely at specific timepoints or identify patients who exhibit a lack of significant ROM changes that would be expected or desired at certain intervals post-breast cancer treatments, such as mastectomy or breast reconstruction. This may prompt the therapist to check in with the patient for an earlier-than-scheduled re-assessment.
ShApp's sensitivity to gross ROM changes may be particularly useful for monitoring shoulder movements remotely in the acute and early stages of recovery, where the greatest ROM loss and recovery are seen. In a study that investigated shoulder movement after breast cancer surgery in 65 women, participants presented with approximately 60°–75° loss of abduction ROM from pre-op to 5 days post-op. From post-op day 5 to 1-month post-op, women regained approximately 30°–50° of abduction ROM, and from 1-month post-op to 3 months post-op, women regained most of their ROM. 21 In another study of over 200 women, loss of flexion ROM from pre-op to 2 weeks post-op was greater than 40°, and from 2 weeks post-op to 1 month more than 20° was regained. 28 ShApp shows promise for measuring these clinically meaningful changes and for monitoring patient progress (or lack thereof) while removing potential barriers to consistent follow-ups and treatments.
This study had several limitations. First, our study only reported on healthy, cancer-free participants with no shoulder restrictions or pathology. Thus, generalization to a clinical population is limited, and future investigation on a breast cancer population would be beneficial to ensure accurate measurements. However, other studies have found no differences in the reliability between healthy participants and those with shoulder symptoms. 13 A healthy cohort was chosen to ensure that each ROM level of interest could be assessed for each person; this would likely not be attainable in a clinical population and is required for initial validation. While the sample size of this study may be perceived as small, it aligns with the a priori calculation that determined a sample size of 10 participants was appropriate. Additionally, similar shoulder instrument validation studies have been conducted with comparably small sample sizes.29,30 Furthermore, despite the training videos there still is the risk that errors in the ROM measures were introduced by slight rotations of the smartphone caused by wrist or elbow movements due to our analysis's reliance on rotational data to output ROM measures.
CONCLUSION
In conclusion, ShApp's ROM function shows promise as a reliable and valid tool to remotely monitor shoulder ROM. Its ability to distinguish between three levels of ROM (low, mid, and high) for all five shoulder movements investigated highlights its clinical potential, particularly in the acute and early phases of patient recovery from surgery. As data is directly transmitted in an encrypted form to a secure central database that is password protected and the entire system is HIPAA and GDPR compliant, it can provide a viable option for patients with limited access to rehabilitation and in-person follow-up assessments to receive shoulder rehabilitation and monitoring remotely. This, in turn, can significantly reduce travel time and expenses associated with in-person care, offering both convenience and cost savings for patients.
Footnotes
Acknowledgments
The authors acknowledge the financial support from the Royal University Hospital Foundation (PI: S. Y. Kim). Data will be made available upon reasonable request to the study authors.
Contributorship
Conceptualization: SK, AL, JP, NO; data curation: SK, AL, JP; formal analysis: SK, AL, JP, DB; funding acquisition: SK; investigation: SK, AL, JP; methodology: SK, AL, JP; project administration: SK, AL, JP, NO; resources: SK, NO, JP; software: JP, NO; supervision: SK, AL, NO; validation: SK, AL, NO, JP, DB; visualization: SK, AL, JP, DB; writing—original draft: SK, JP, AL, DB; writing—review and editing: SK, JP, AL, DB, NO.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics approval
Our research was conducted in accordance with the principles outlined in the Declaration of Helsinki. This study was approved by the University of Saskatchewan Research Ethics Board (Bio-2327). As per the ethics guidelines, the principal investigator will have access to the final data set which will be retained as per requirements.
Funding
Royal University Hospital Foundation, #R210210.
