Abstract

The mobile app rating scale (MARS) is a widely used instrument for evaluating smartphone app quality. We aimed to examine the reliability and validity of the Korean version of MARS (MARS-K). Two independent raters performed the assessment using the translated 23-item questionnaire. We applied intraclass correlation coefficient (ICC) analysis to examine inter-rater reliability, omega and item-total correlation to examine internal consistency, and Pearson’s r to examine test–retest reliability and the correlations between the subscales and the total score of MARS-K. Most items showed moderate to good ICC (0.447–1.000). The MARS-K showed excellent internal consistency, and all subscales exceeded the acceptable level of omega. Results indicated MARS-K to be a valid and reliable instrument for evaluating disease management apps offered in the Korean app store. However, upgrades are recommended to further improve MARS-K’s rating accuracy and reliability.

Keywords

validity, reliability, application, disease management, rating scale

Introduction

Mobile applications (apps) represent one of the fastest-growing technologies due to the high penetration rate of mobile phones worldwide.¹ According to a Korea Internet and Security Agency report, 82.5% of people in Korea over 5 years old are smartphone users.² Mobile technologies play a pivotal part in various aspects of our everyday lives. Advanced smartphone features, such as Bluetooth and location sensing, extend the usability of health applications that perform varied tasks such as providing reminders to track calorie consumption, self-management of specific health conditions, remote monitoring of a targeted disease, tracking physical activity, promoting self-care behaviors (taking medications as prescribed, maintaining a healthy diet and weight, maintaining good mental health habits), behavioral tracking, monitoring symptoms, and maintaining a dialogue with healthcare practitioners through secured text messages and video conferencing calls.^3–8

Health applications render promising future directions for disease care due to their accessibility, relatively low cost, and high capacity for information storage. Some applications have already proven to be effective healthcare intervention tools.⁹ Such applications may be particularly helpful for patients with chronic diseases as they offer multiple benefits. Individuals with chronic conditions can become overwhelmed due to complex treatment regimens. Poor adherence to disease management practices increases patients’ risks for complications leading to increased healthcare expenses. Health applications have clinical implications for supplementing traditional clinic-based treatment with real-time assessments, monitoring, and data collection.¹⁰

For this reason, several apps have been developed that specifically target populations with chronic conditions. The mobile health market has reported a 41% annual growth rate since 2015, the highest growth rate in the digital health sector.¹¹ Mobile health devices, including smartphone apps, were designed primarily for managing chronic conditions such as diabetes, hypertension, and depression. The rapid expansion of this market has generated interest in the effectiveness of the apps and a need for additional information regarding app quality. National public health agencies have begun incorporating mobile technologies to ensure quality in healthcare services.¹² Researchers have demonstrated that perceived usefulness is closely linked to quality, which may lead to an increase in available effective health apps.^13,14

In Korea, new free or low-cost health applications are released daily.¹¹ While apps provide consumers with a potential advantage by offering interactive tools that improve access to health information and support treatment adherence,¹⁵ they can also have downsides. For example, some apps provide incorrect or misleading information, and consumers may use this faulty information to make their own health-related decisions.¹⁶ Moreover, every user has a different app adoption process based on circumstantial factors such as society’s values, environment, and friends, and on individual characteristics such as age, gender, race, health history, and educational background.¹⁷ Consequently, some people adapt faster and gain significant benefits, while others may struggle to learn and use the apps, and users may receive and follow wrong information. Given their potential effects, good and bad, on individuals with chronic conditions, mobile apps should be required to meet standards that guarantee their quality. However, apps do not have to demonstrate their safety or disclose if they are not evidence-based.¹⁸ Therefore, there is a need for a thorough health application assessment process to optimize their effects on public health.

The Mobile Application Rating Scale (MARS) was recently developed by Stoyanov and colleagues.¹⁹ It uses 23 items covering four objective quality domains (engagement, functionality, aesthetics, and information) and one subjective quality domain to evaluate a health application’s quality. The MARS has been widely translated into other languages, including Italian,²⁰ German,²¹ Spanish,²² and Arabic,²³ and earlier studies have demonstrated its reliability. Previous studies also verified its applicability by using MARS to assess various kinds of apps, such as those that focus on smoking cessation, disease management for patients with epilepsy, anxiety, diabetes, obesity and associated disorders, health and fitness, and disease prevention.^20–25 In addition, a recent validation study using MARS rating scores of 1,299 mobile health apps confirmed the validity and reliability of MARS, concluding that MARS is a suitable tool to assess the quality of health apps. In particular, this study included a large number of specific target groups across various conditions, including anxiety, low back pain, cancer, depression, diet, elderly, gastrointestinal diseases, medication adherence, mindfulness, pain, physical activity, post-traumatic stress disorder, rheumatism, weight management, and internalizing disorder MHA for children and youth.²⁶

In Korea, most health applications have not been based on sound evidence and theoretical frameworks; they have been designed by various individuals or organizations rapidly developing multifarious health applications that are easy to market for public usage.²⁷ Concerns have been raised not only about these apps lacking any theoretical basis but also about how few studies have been done to provide systematic evaluations of their effectiveness.²⁸ There are no existing instruments for measuring the quality of mobile applications in Korean. Thus, we aimed to develop a Korean version of MARS and test its validity and reliability.

Methods

We used content analyses²⁹ to chart a range of health-related mobile applications currently being used in Korea. We first collected data from various types of publicly available sources to detect the presence of frequently used mobile applications. To counterbalance potential bias in the findings, we used investigator triangulation.³⁰ Each researcher independently examined the data. The results were compared and a level of common agreement was achieved.

Search strategy: Phase one

The analysis involved four phases. The first phase was identifying health-related mobile applications targeting the general public in Korea. We selected the Google Play Store and the Apple App Store, as their apps work with the mobile operating systems most used in Korea. The Google Play and Apple App Store were searched from March 1 to March 3 of 2020. The search keywords included 46 chronic conditions from a list of diseases provided by the Korea Institute for Health and Social Affairs.³¹

Selection criteria: Phase two

Since there are no strict regulations in Korea governing the safety and relevance of health apps that developers put on the market,³² three criteria from previous studies were applied^6,23,33 to assess the current availability and adequacy of health apps. Each mobile application sampled for the study was checked to verify that it was (1) written in Korean (if its description was available in Korean), (2) recently updated (in 2019 or 2020; a recent update verifies continuing technical support and fixes for software issues) and currently used (if the number of downloads was >10,000, or if a review rating existed and was >3.5), and (3) free to download (free versions are likely to be the most frequently downloaded; free apps with potential in-app purchase options were included). The second phase involved tabulating the phase one results³³ and determining each application’s name, language support, founder or sponsor, number of downloads, rating review values, recent updates, in-app purchases, primary users, and main purpose. A comprehensive audit of the features, functionality, and content of each app was then tabulated (Table 1).
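The screening criteria above can be read as a simple filter over app-store listings. The following is a minimal sketch under stated assumptions: the `AppListing` record and its field names are hypothetical and do not come from the study, and "currently used" is interpreted as either more than 10,000 downloads or an existing review rating above 3.5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppListing:
    """Hypothetical record for one app-store search result."""
    name: str
    korean_description: bool
    last_update_year: int
    downloads: Optional[int]        # None when the store does not report it
    review_rating: Optional[float]  # None when no rating exists
    free_to_download: bool          # in-app purchases are allowed


def is_eligible(app: AppListing) -> bool:
    """Apply the screening criteria as described in the text."""
    recently_updated = app.last_update_year in (2019, 2020)
    # "Currently used": more than 10,000 downloads, or an existing rating above 3.5.
    popular = app.downloads is not None and app.downloads > 10_000
    well_rated = app.review_rating is not None and app.review_rating > 3.5
    return (
        app.korean_description
        and recently_updated
        and (popular or well_rated)
        and app.free_to_download
    )
```

Under this reading, an app rated exactly 3.5 can still qualify through its download count, which is consistent with the two apps rated 3.5/5 in Table 1.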

Table 1.

List of applications tested.

Category	Apps	Provider	Review rating
Hypertension	App 1: Blood pressure recorder	Health & Fitness AI Lab	4.6/5
	App 2: Blood pressure recorder	Szymon Klimaszewski	4.1/5
	App 3: AVAX blood pressure recorder	AVAX App	4.5/5
	App 4: Smart blood pressure recorder	evolvemedsys	3.7/5
Heart disease	App 5: Arrhythmia test	SUNG DO KIM	4.1/5
Severe vision reduction	App 6: Eye training for astigmatism	Wonjong Yang	4.1/5
	App 7: Eye exercise to improve eyesight	andrew.brusentsov	4.5/5
	App 8: Eye exercise	healthcare4mobile	4.5/5
Diabetes mellitus	App 9: Diabetes food guide and diary	Korean Diabetes Association	3.5/5
	App 10: Blood sugar recorder and community support	Dr. Diary Inc.	4.4/5
	App 11: BST recorder, medication alarms and information	Tastylife Inc.	4.1/5
	App 12: Step counter, BST recorder medication alarms	Tastylife Inc.	4.2/5
	App 13: Step counter, diabetes recipes and BST recorder	BBB Inc.	4.1/5
	App 14: Blood sugar recorder	Qbee520	3.5/5
Obesity	App 15: Diet diary with nutrition and exercise information	Funnym Co. Ltd	4.5/5
	App 16: Diet diary with weight management	DROID INFINITY	4.2/5
	App 17: Home workout and diet plan	Healthy Everyday	4.8/5
	App 18: AI food diary	DoingLab Inc.	4.5/5
	App 19: Calorie counter with diet diary	YAZIO	4.5/5
	App 20: Exercise video, step counter and BMI calculator	Funnym Co. Ltd	4.2/5
	App 21: Meal plan, exercise, and BMI calculator	Appsfabrica	4.1/5
	App 22: Calorie counter	Baskaran Arunasalam	3.8/5
	App 23: Calorie counter and fasting schedule	CLECO soft	4/5
	App 24: Dance exercise diet with video contents	NQUBE	4.2/5
	App 25: BMI calculator and goal setting	fmouri	4.3/5
Prostatic hyperplasia	App 26: Pelvic floor exercise	Olson Applications Ltd	4.9/5
Depression	App 27: Community support and psychological test	Atommerce Inc.	4.6/5
	App 28: Psychological test and free consultation	Atommerce Inc.	4.5/5
	App 29: Mental health support programs and consultation	Minding Co.	4.4/5
	App 30: Self-examination and stress management program	Suwon Happiness Mental Health Welfare Center	4.7/5
Cerebral ischemia	App 31: Stroke hospital locations, signs of stroke and first aid	Yonsei stroke team	4.4/5
Dementia	App 32: Dementia therapy with addition game	Jeon Wonjoo	3.8/5
Sexual dysfunction	App 33: Pelvis balance exercise	Leap Fitness Group	4.8/5
Insomnia	App 34: Sleep cycle tracker	Apalon App	3.7/5
	App 35: Build regular and healthy sleep habits	Seekrtech	4.5/5
	App 36: Sleep sound with relaxing sound and white noise	Sound Sleep	4.7/5
	App 37: Sleep sound with relax music and white noise	Leap Fitness Group	4.7/5
	App 38: Sleepa: Relaxing sounds, sleep	Sound Sleep	4.7/5
	App 39: Sleep sound: ASMR natural sounds	RelaxSound	4.5/5
	App 40: MindWiz – Sleep, focus, ASMR	UBwin Inc	4.4/5
	App 41: Calm: Meditate, sleep, relax	Calm.com, Inc.	4.5/5
	App 42: Sleep monitor: Sleep cycle track and recorder	SM Health Team	4.3/5
Tobacco abuse	App 43: SmokeNote: Quit smoking notification messages	NXCARE	3.8/5
	App 44: QuitNow: Quit smoking tips and resist cravings	Fewlaps	4.6/5
	App 45: Smoking manager records smoking patterns	Cho bro	4.5/5
	App 46: Quit tracker: Diary and track cravings	DespDev	4.7/5

Instrument, translation and back translation: Phase three

The third phase involved using a Mobile Application Rating Scale (MARS) developed by Stoyanov et al.¹⁹ to validate the instrument with a sample of frequently used health-related mobile applications in Korea. This phase aimed at reviewing the quality of healthcare services offered by mobile applications and identifying patterns in the data. This resulted in a series of implications and recommendations being developed for validation of MARS in Korean.

The original version of the Mobile Application Rating Scale (MARS) consists of 23 items rated on a 5-point Likert scale, with scores ranging from one (poor) to five (excellent). An additional option, “not applicable,” exists for five items: 14–17 and 19. MARS has two parts, the objective and the subjective app quality assessments. The objective quality of mobile applications was evaluated across four subscales: engagement (five items), functionality (four items), aesthetics (three items), and information (seven items). Four items (items 20–23) were used for subjective quality assessment.
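As an illustration of how one rater's item ratings might be turned into subscale scores, the sketch below averages the rated items within each subscale and skips "not applicable" responses. Mean-based scoring is an assumption here (it is consistent with the 1 to 5 range of the subscale scores in Table 2), the item ranges follow the item numbering in Table 3, and the function name and the use of `None` for "not applicable" are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional

# Item ranges follow the item numbering used in Table 3.
SUBSCALES = {
    "engagement": range(1, 6),
    "functionality": range(6, 10),
    "aesthetics": range(10, 13),
    "information": range(13, 20),
    "subjective quality": range(20, 24),
}


def subscale_means(ratings: Dict[int, Optional[int]]) -> Dict[str, Optional[float]]:
    """Mean rating per subscale; None marks a "not applicable" item and is skipped."""
    scores: Dict[str, Optional[float]] = {}
    for name, items in SUBSCALES.items():
        rated = [ratings[i] for i in items if ratings.get(i) is not None]
        # A subscale with no rated items has no score.
        scores[name] = mean(rated) if rated else None
    return scores
```

For example, an app with items 18 and 19 marked "not applicable" would have its information score averaged over items 13–17 only.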

The Korean version of MARS was developed after obtaining permission from the developer of the original MARS. A forward–backward translation procedure was performed with the consensus of the developer. First, MARS was translated into Korean by the author. Then, a blind backward translation was performed by a bilingual translator who has been in an English-speaking country (the USA) for longer than 20 years. The developer of the original MARS reviewed the results and discrepancies were resolved.

Assessment of apps: Phase four

Based on previous studies,^19,20,34 the sample size was determined as a minimum of 41 apps for a two-rater assessment. The sample size for the intraclass correlation coefficient (ICC) was determined using Zou’s sample size calculation. This number was needed to obtain an agreement of at least 80% with the half width of a two-sided confidence interval remaining below 0.15 and an assurance probability of 0.90. Standard online training was made available for raters¹⁹ to ensure consistent interpretation of terminology regardless of the researchers’ background or country and to improve the quality of app evaluation using MARS. Following the training program, three reviewers (KH, SK, and YH) piloted MARS on apps not included in this study to verify the reliability of the results. Any conflicts regarding interpretation and subtle nuance were resolved through discussion to improve alignment. In the assessment process, two authors (KH and SK) acted as raters and independently evaluated all included apps using MARS-K. One of the two researchers was based in Australia and used a Samsung Galaxy A5 and an iPhone Pro Max, while the other was based in Korea and used a Samsung Galaxy S10 and an iPhone 8. Both researchers downloaded and tested each mobile application.

Statistical analysis

All statistical analyses were performed using SPSS statistics software, version 27.0 (SPSS Inc., Chicago, Illinois), and statistical significance was determined at p < 0.05. The rating scores of individual raters were pooled, and descriptive statistics such as the mean and standard deviation (SD) for total scores and subscale scores were produced.

Objectivity

The intraclass correlation coefficient (ICC) was calculated to examine the consistency between the ratings of the two raters. The results were interpreted according to the following criteria: excellent (ICCs above 0.90), good (ICCs between 0.76 and 0.89), moderate (ICC between 0.51 and 0.75), and poor (ICC below 0.50).³⁵
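To make the inter-rater calculation concrete, the sketch below computes one common ICC form from a two-way ANOVA decomposition and maps the result onto the interpretation bands quoted above. The ICC model is not stated in the text, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption, as is the treatment of values that fall between the quoted bands (for example, 0.755). The function names are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` has one row per app and one column per rater.
    """
    n = len(scores)       # number of apps
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: apps (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc: float) -> str:
    """Map an ICC onto the bands quoted in the text (boundary handling is assumed)."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.76:
        return "good"
    if icc >= 0.51:
        return "moderate"
    return "poor"
```

With these bands, the subscale ICCs reported in Table 3 would be labeled as follows: engagement (0.929) excellent, aesthetics (0.828) good, functionality (0.744) moderate.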

Reliability

Reliability was assessed using omega, as it is known to provide a less biased estimate of reliability than the widely used Cronbach’s alpha.^36,37,38 Omega values for the individual subscale scores and the total score of MARS-K were calculated to verify internal consistency. The omega coefficient was interpreted according to the following criteria: acceptable (0.70–0.79), good (0.80–0.89), and excellent (>0.90).³⁹ In addition, test–retest reliability was used to assess the stability of the measurement: for each rater, the Pearson correlation between scores at two time points was calculated. The two assessments were taken approximately 2 weeks apart.
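For readers unfamiliar with omega, the sketch below shows the textbook single-factor form of McDonald's omega computed from standardized factor loadings, together with the interpretation bands quoted above. This is an illustration only: the loadings in the usage note are made up, the study obtained omega from SPSS rather than from this formula, and the treatment of values between the quoted bands is an assumption.

```python
from typing import Sequence


def omega_total(loadings: Sequence[float]) -> float:
    """McDonald's omega for a one-factor model with standardized loadings.

    The uniqueness of each item is taken as 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - l ** 2 for l in loadings)
    return common / (common + unique)


def interpret_omega(omega: float) -> str:
    """Map omega onto the bands quoted in the text (boundary handling is assumed)."""
    if omega >= 0.90:
        return "excellent"
    if omega >= 0.80:
        return "good"
    if omega >= 0.70:
        return "acceptable"
    return "below acceptable"
```

As a usage example with hypothetical loadings, four items each loading 0.7 give an omega of about 0.79, which falls in the acceptable band.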

Validity

For item analysis, item-total correlation (ITC) was analyzed. Correlations were calculated to investigate whether the subscales measure unrelated constructs.⁴⁰ Pearson’s correlation coefficient analysis was used to obtain the r scores for the subscales and total scores of MARS-K in order to evaluate the construct of the measurement. A coefficient of 0.70 was used as the threshold for acceptability.
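The two correlation measures named above can be sketched in a few lines. The text does not say whether the item-total correlation was corrected (item removed from the total before correlating), so the `corrected` flag and its default are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(
    scores: Sequence[Sequence[float]], corrected: bool = True
) -> List[float]:
    """Correlate each item with the total score.

    `scores` has one row per app and one column per item. With
    corrected=True the item itself is left out of the total (an assumption;
    the text does not specify which variant was used).
    """
    n_items = len(scores[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        total = [sum(row) - (row[j] if corrected else 0) for row in scores]
        result.append(pearson_r(item, total))
    return result
```

The same `pearson_r` helper would serve for the test–retest and subscale correlations, since both are described as Pearson correlations.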

Results

A total of 284 apps were initially retrieved; then, the duplicates were removed (n = 17). Sixty apps were considered for downloading after applying the exclusion criteria stated above. Fourteen apps were further excluded because they were unavailable in Australia, had inadequate content, or had functionality problems. Forty-six apps were included in the MARS-K validation study (Figure 1). The mean scores and distribution of scores on the five subscales by rater are shown in Table 2. Except for the skewness score on the information subscale for Rater 2 (−1.026), all skewness values were within the ±1 range.
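The distribution check above relies on sample skewness and kurtosis. The sketch below uses the bias-adjusted sample formulas commonly reported by statistical packages; whether these match the exact estimators behind Table 2 is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def sample_skew_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Bias-adjusted sample skewness and excess kurtosis (needs at least 4 values)."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))  # sample SD
    z3 = sum(((v - m) / s) ** 3 for v in values)
    z4 = sum(((v - m) / s) ** 4 for v in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return skew, kurt
```

A perfectly symmetric set of scores has a skewness of zero; values beyond ±1 are the ones flagged in the text.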

Figure 1.

Flow chart for app selection.

Table 2.

Mean scores and distribution by rater and subscale.

Subscale	Mean (SD)		Skewness			Kurtosis
Subscale	Rater 1	Rater 2	Rater 1	Rater 2	Rater 1	Rater 2
Engagement	3.17 (0.93)	2.95 (0.82)	−0.230	−0.029	−0.664	−0.817
Functionality	3.84 (0.65)	3.93 (0.35)	−0.451	0.173	−0.110	2.046
Aesthetics	3.64 (1.04)	3.82 (0.67)	−0.443	−0.579	−0.636	0.175
Information^a	3.38 (0.65)	3.67 (0.43)	−0.289	−1.026	−1.050	0.428
Subjective quality	2.93 (0.89)	3.38 (0.88)	−0.075	−0.653	−0.813	−0.015

^aItems 18 and 19 were excluded from calculation because of lack of ratings.

Objectivity

Table 3 presents the results of two reliability analyses (inter-rater and test–retest) along with convergent validity assessed by item-total correlation. For inter-rater reliability, ICCs were calculated for the MARS-K total score, the subscale scores, and each item. The ICCs ranged from 0.447 (the lowest, item 9) to 1.000 (the highest, items 18 and 19). Among the subscales, engagement had the highest ICC (0.929) and information the lowest (0.667).

Table 3.

Inter-rater, test–retest reliability and item-total correlation of MARS-K.

		Inter-rater reliability		Test–retest reliability		Item-total correlation
	Item	ICC	P	Rater 1	Rater 2	Rater 1	Rater 2
Engagement	Mars1	0.854	<0.001	0.928	0.904	0.817	0.652
	Mars2	0.847	<0.001	0.913	0.899	0.814	0.638
	Mars3	0.817	<0.001	0.909	0.922	0.749	0.769
	Mars4	0.867	<0.001	0.921	0.949	0.765	0.731
	Mars5	0.731	<0.001	1.000	0.861	0.790	0.705
	Subscale 1	0.929	<0.001	0.970	0.985	—	—
Functionality	Mars6	0.607	0.001	0.906	1.000	0.681	0.425
	Mars7	0.726	0.009	0.909	1.000	0.609	0.297
	Mars8	0.741	<0.001	0.838	0.796	0.665	0.459
	Mars9	0.447	0.02	0.847	0.723	0.604	0.457
	Subscale 2	0.744	<0.001	0.963	0.933	—	—
Aesthetics	Mars10	0.551	0.004	0.977	0.789	0.773	0.566
	Mars11	0.791	<0.001	0.932	0.919	0.834	0.671
	Mars12	0.829	<0.001	0.968	0.914	0.836	0.748
	Subscale 3	0.828	<0.001	0.975	0.951	—	—
Information	Mars13	0.729	<0.001	1.000	0.805	0.748	0.561
	Mars14	0.623	0.001	1.000	0.905	0.560	0.604
	Mars15	0.571	0.02	0.965	0.808	0.675	0.512
	Mars16	0.575	0.02	1.000	0.859	0.738	0.423
	Mars17	0.660	0.001	1.000	1.000	0.646	0.406
	Mars18	1.000	—	1.000	1.000	−0.206	−0.229
	Mars19	1.000	—	1.000	1.000	0.081	0.028
	Subscale 4	0.667	<0.001	0.998	0.922	—	—
Subjective quality	Mars20	0.714	<0.001	0.957	0.958	0.828	0.850
	Mars21	0.690	<0.001	1.000	0.887	0.778	0.806
	Mars22	0.681	<0.001	0.871	0.873	0.736	0.873
	Mars23	0.630	0.001	1.000	0.879	0.763	0.764
	Subscale 5	0.786	<0.001	0.973	0.927	—	—
MARS-K		0.879	<0.001	0.992	0.987	—	—

Reliability

Test–retest reliability was estimated using Pearson’s correlation coefficient (r), which ranged from 0.871 to 1.000 for Rater 1 and from 0.789 to 1.000 for Rater 2. Internal consistency was evaluated by examining the item-total correlation, the values of which were acceptable and ranged between 0.560 and 0.828 (for Rater 1) and between 0.297 and 0.850 (for Rater 2), except for items 18 and 19, for which only two apps were rated. Omega was used as the reliability coefficient (Table 4), and the omega scores of MARS-K were 0.956 for Rater 1 and 0.924 for Rater 2.

Table 4.

Omega, by rater and subscale.

Subscale	Omega
Subscale	Rater 1	Rater 2
Engagement	0.932	0.857
Functionality	0.866	0.751
Aesthetics	0.926	0.856
Information^a	0.859	0.763
Subjective quality	0.901	0.940
MARS-K	0.956	0.924

^aItems 18 and 19 were excluded from calculation because of lack of ratings.

Validity

Table 5 shows the Pearson’s correlation coefficients between the objective quality subscales, subjective quality, and MARS-K total scores for each rater; all values were statistically significant. For Rater 1, r values between subscales ranged from 0.643 to 0.800. For Rater 2, only the correlation between the “engagement” subscale of objective quality and “subjective quality” exceeded 0.70 (r = 0.797).

Table 5.

Pearson’s correlation coefficient, by rater and subscale (Rater 1: upper right, Rater 2: lower left).

	Engagement	Functionality	Aesthetics	Information	Subjective quality	MARS-K total score
Engagement	1	0.564^a	0.777^a	0.710^a	0.790^a	0.903^a
Functionality	0.405^a	1	0.740^a	0.643^a	0.666^a	0.793^a
Aesthetics	0.646^a	0.436^a	1	0.658^a	0.800^a	0.901^a
Information	0.598^a	0.325^b	0.514^a	1	0.746^a	0.857^a
Subjective quality	0.797^a	0.473^a	0.683^a	0.599^a	1	0.918^a
MARS-K total score	0.915^a	0.572^a	0.798^a	0.747^a	0.919^a	1

^ap < .01.

^bp < .05.

Discussion

This study developed and evaluated the Korean version of MARS (MARS-K). A comprehensive search identified 46 health-related apps targeting the management of diseases that are prevalent in Korea. The MARS-K showed good objectivity and overall reliability and validity, indicating that MARS-K is suitable for quality evaluation of health applications in Korea. MARS-K could replace previous app ratings, including star ratings, whose reliability has been a constant concern for health apps.^41,42 Since end-users require information to gauge the reliability of an app, obtaining quality ratings from experts will help reassure end-users, leading to good adherence to the app.

Similar to previous studies,^20–23 relatively higher inter-rater reliability and internal consistency scores were obtained for subscale 1 (engagement) of MARS-K, which consists of content that can easily be judged objectively. The questions ascertained whether the app offers diverse features to enhance user engagement and included examples such as gamification, customized settings for individuals’ needs, the option to add feedback, alerts, reminders, and the ability to share gathered information with others. Raters were able to decide based on clear evidence of the apps having those features.

The “functionality” subscale showed relatively lower inter-rater reliability and internal consistency, which is in line with the results of previous studies.^20,22 Because some words were highly abstract, the low ICCs likely resulted from inconsistent interpretation of items. With respect to words such as “appropriate” and “uninterrupted” (item eight) and “intuitive” (item nine), raters might have had different standards, which could increase the subjectivity of the ratings. Since several objective items were unclear, the ratings could hardly be made objective. There were numerous aspects to measure, such as options for customization and complete tailoring to the individual’s characteristics/preferences. It might be beneficial to count features to clarify the level of each functionality and customization, for example ranging from 0 (none) to 10 or above (maximum). Incorporating additional explanations with examples into the measurement would potentially enhance consistency, transparency, and inter-rater reliability where necessary.

Findings indicated moderate to good correlation among subscales, while “functionality” showed relatively low correlation with the other subscales. These results support previous studies^20,22 and could be explained by commonly shared characteristics of the apps’ performance. Using MARS, apps were likely to obtain higher scores in functionality when they consisted of features that are simple and easy to navigate.²⁰ Indeed, both raters gave their highest average scores on the functionality subscale. Most apps included in this study were information and education focused rather than disease management focused; thus, theoretically, they are easy to use and uncomplicated for the target population, which includes elderly people with potential illness. The majority of health applications demonstrated good functionality by valuing simplicity and offering limited features for good usability, which also led to the separation of functionality from the other subscales.

It was hard to obtain matching results between raters for certain aspects of MARS-K, for example, subscale 5 (subjective quality). Although the raters reached a consensus on how to rate, there was an inevitable difference between their results due to the subjective nature of the items.¹⁹ In addition, unexpectedly, item six, which asks about the performance of apps, showed a low ICC. Studies have reported that the factors influencing app performance are the processor, RAM, storage, software, battery, and temperature.⁴³ The two raters used different smartphones within different network environments (Rater 1 was based in Australia, and Rater 2 was based in Korea), which may have caused the divergence of ratings regarding app performance. Indeed, Rater 2 used a newer smartphone than Rater 1. Generally, apps are updated to suit the latest smartphone models and operating systems, which may explain why Rater 2 scored apps’ performance higher than Rater 1. Also, the network environment was not identical, which might have affected the functionality of apps.

All apps provided a certain level of information detailing what they offered in the Google Play app store; some descriptions were comprehensive while others were deficient. The inter-rater reliability for the sufficiency of the app description, in both quality and quantity, showed considerable disagreement between raters. One explanation could be the different academic backgrounds of the raters. Unlike the other rater, the rater with a nursing background may be better able to judge whether the information provided is adequate for users’ understanding. For health apps, one could involve multiple stakeholders from different knowledge backgrounds, for example, healthcare professionals or technicians. Considering that MARS-K measures diverse aspects of an app’s quality, it is more important to involve individuals with relevant backgrounds than to obtain identical rating scores.

The following are some considerations for developing app rating scales. Patient experience has become a crucial part of the quality of any healthcare service, and the user-centered design method understands users’ experience and prioritizes the needs of end-users.^44,45 Furthermore, the user-centered design is an evidence-based approach and proven method of mHealth intervention that engages end-users to enable more appropriate app design.^46,47 The World Health Organization (WHO) recommends this approach and promotes this within the lifecycle of mHealth (i.e., app design) to ensure an effective outcome.⁴⁸ “User-centered,” “human-centered,” and “patient-centered” can be used interchangeably.

When apps fundamentally target disease management, it is important to embrace the entire user (patient) journey, emphasize human-centered design, and encompass holistic well-being.⁴⁹ A human-centered design brings users’ experience to the core of the design process. It uses techniques to communicate, interact, and empathize with the user, to obtain and understand the user’s desires and experiences, and potentially to find latent needs.⁵⁰ Therefore, the app rating scale should evaluate whether the app is well designed from a human-centered point of view and comprehensively consider improving patient experience.

With the current MARS-K, it is difficult to measure whether an app motivates users to use it, provides supportive and meaningful interaction, helps users effectively achieve their goals, changes behavior in a more positive direction, and ultimately improves their health and well-being. Furthermore, a previous validation study suggested the inclusion of a therapeutic alliance domain, as it could strengthen the quality of assessment when it comes to health apps.²⁶ The ENLIGHT instrument, for example, contains a section to assess therapeutic alliance,⁵¹ measuring the feasibility of health apps as an effective means of disease management.

Another concern is the possibility that individual items carry different weights within each dimension.²⁶ Although it was confirmed that the individual dimensions measure different aspects of app quality, the current calculation methods should be reconsidered. Presently, most studies use the sum score of each dimension, which poses a risk of faulty analysis. Thus, future studies should propose new calculation methods for weighting items to ensure the metric quality of the measurement.

While validating MARS-K, two items showed complete agreement between raters: items 18 and 19. The two items address professional credibility and evidence-based practice, checking whether the app has been tested, whether the result was positive, and whether it is verified by evidence. Notably, only two apps were available to rate on item 18 (developed by a reliable organization) and item 19 (evidence-based research), which raised concerns regarding the quality of health apps in Korea. A substantial number of new health-related apps are made available daily, and it is widely acknowledged that there are no specific regulations for registering a newly developed app on the app stores.⁵² Therefore, it is hard to determine which health-related apps truly work, especially among those targeted at disease management. Potentially, apps providing false information could aggravate a user’s health condition. Future work should include improvements to the MARS, particularly to meet the requirements for targeting disease management. Also, a system could be developed to register health-related apps.

In addition, the health apps store private and disease related data; hence, it is crucial to follow the rules and regulations to ensure public safety.⁴² The Food and Drug Administration (FDA) classifies health apps as medical devices with potential risk to users, therefore requiring a process of approval. Given that this approval focuses on safety aspects of the apps, the quality check using MARS, a quality evaluation tool, would provide end-users with sufficient information. As the first quality assessment tool, MARS-K would not only help improve the awareness of researchers and developers regarding the factors that influence the quality of apps but also help them understand what features and information can assist in improving the effectiveness of the app.

Conclusion

The MARS-K is a valid measurement for assessing the quality of health apps targeting disease management. However, further revisions of items would enhance the tool’s reliability. Additionally, the incorporation of user-centered factors and increasing the credibility of health apps would ensure their safety for clinical and personal use. Given the study’s findings of advantages and drawbacks, thorough upgrading of the tool might help improve the accuracy, reliability, and validity of ratings.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2019R1G1A1006737).

References

1. Ericsson. Ericsson Mobility Report. Stockholm, Sweden: Ericsson, 2018.

2. Cho, Lee, Lim, et al. Domestic location information industry trend survey report. Seoul: Korea Internet & Security Agency, 2018.

3. Ali, Luo, et al. Application of mobile health technologies aimed at salt reduction: systematic review. JMIR mHealth and uHealth 2019; 7(4): e13250.

4. Wilson, Hennessy, Falzon, et al. Effectiveness of interventions targeting self-regulation to improve adherence to chronic disease medications: a meta-review of meta-analyses. Health Psychol Rev 2020; 14(1): 66–85.

5. Wang, Min, Khuri, et al. Effectiveness of mobile health interventions on diabetes and obesity treatment and management: systematic review of systematic reviews. JMIR mHealth and uHealth 2020; 8(4): e15400.

6. Bardus, van Beurden, Smith, et al. A review and content analysis of engagement, functionality, aesthetics, information quality, and change techniques in the most popular commercial apps for weight management. Int J Behav Nutr Phys Activity 2016; 13: 35.

7. Terhorst, Messner E-M, Schultchen, et al. Systematic evaluation of content and quality of English and German pain apps in European app stores. Internet Interventions 2021; 24: 100376.

8. KHK, Dunn, Straker, et al. A comparative content analysis of digital channels for ventricular assist device patients, caregivers, and healthcare practitioners. ASAIO J 2019; 65: 855–863.

9. Dayer, Heldenbrand, Anderson, et al. Smartphone medication adherence apps: potential benefits to patients and providers. J Am Pharm Assoc 2013; 53(2): 172–181.

10. Lee J-A, Choi, Lee, et al. Effective behavioral intervention strategies using mobile health applications for chronic disease management: a systematic review. BMC Medical Informatics Decision Making 2018; 18(1): 12.

11. Global NIPA. Healthcare market. Korea: National IT Industry Promotion Agency, 2019, https://www.nipa.kr/main/selectBbsNttView.do?key=307&bbsNo=40&nttNo=6862&bbsTy=&searchCtgry=&searchCnd=all&searchKrwd=&pageIndex=2 (accessed 10 April 2020).

12. Krebs, Duncan. Health app use among US mobile phone owners: a national survey. JMIR mHealth and uHealth 2015; 3(4): e101.

13. de Veer AJE, Peeters, Brabers, et al. Determinants of the intention to use e-Health by community dwelling older people. BMC Health Services Research 2015; 15(1): 103.

14. Wang, Wei, et al. Smartphone interventions for long-term health management of chronic diseases: an integrative review. Telemed E-Health 2014; 20(6): 570–583.

15. Mcaskill. The benefits of mobile health strategies. mHealth Intelligence, 2015, https://mhealthintelligence.com/news/the-benefits-of-mobile-health-strategies (accessed 10 March 2020).

16. Lewis, Wyatt. mHealth and mobile medical apps: a framework to assess risk and promote safer use. J Med Internet Res 2014; 16(9): e210.

17. Eikey, Booth, Chen, et al. The use of general health apps among users with specific conditions: why college women with disordered eating adopt food diary apps. AMIA Annu Symp Proc 2018; 2018: 1243–1252.

18. Modave, Bian, Leavitt, et al. Low quality of free coaching apps with respect to the American College of Sports Medicine guidelines: a review of current mobile apps. JMIR mHealth and uHealth 2015; 3: e77.

19. Stoyanov, Hides, Kavanagh, et al. Mobile app rating scale: a new tool for assessing the quality of health mobile apps. JMIR mHealth and uHealth 2015; 3: e27.

20. Domnich, Arata, Amicizia, et al. Development and validation of the Italian version of the mobile application rating scale and its generalisability to apps targeting primary prevention. BMC Medical Informatics Decision Making 2016; 16(1): 83.

21. Messner E-M, Terhorst, Barke, et al. The German version of the mobile app rating scale (MARS-G): development and validation study. JMIR mHealth and uHealth 2020; 8(3): e14479.

22. Martin Payo, Fernandez Álvarez, Blanco Díaz, et al. Spanish adaptation and validation of the mobile application rating scale questionnaire. Int Journal Medical Informatics 2019; 129: 95–99.

23. Bardus, Awada, Ghandour, et al. The Arabic version of the mobile app rating scale: development and validation study. JMIR mHealth and uHealth 2020; 8(3): e16956.

24. Thornton, Quinn, Birrell, et al. Free smoking cessation mobile apps available in Australia: a quality review and content analysis. Aust New Zealand Journal Public Health 2017; 41(6): 625–630.

25. Escoffery, McGee, Bidwell, et al. A review of mobile apps for epilepsy self-management. Epilepsy Behav 2018; 81: 62–69.

26. Terhorst, Philippi, Sander, et al. Validation of the mobile application rating scale (MARS). PLoS ONE 2020; 15(11): e0241480.

27. Chun, Yoon, Han, et al. Digital health care programs for obesity management in South Korea: a systematic review. Korea Open Access J 2020; 40(1): 560–591.

28. Shin, Lee, Park, et al. The investigational study on health-related mobile application software and its improvement. Regul Res Food Drug Cosmet 2015; 10(1): 1–9.

29. Elo, Kyngäs. The qualitative content analysis process. J Adv Nurs 2008; 62(1): 107–115.

30. Begley. Using triangulation in nursing research. J Adv Nurs 1996; 24(1): 122–128.

31. Park. A Report on the Korea Health Panel Survey of 2017. Korea: Korea Institute for Health and Social Affairs, 2017, http://repository.kihasa.re.kr/handle/201002/34640 (2019, accessed 25 May 2020).

32. Ahn. A review of regulation for health apps worldwide. KHIDI Brief. Osong: Korea Health Industry Development Institute, 2021.

33. Santo, Richtering, Chalmers, et al. Mobile phone apps to improve medication adherence: a systematic stepwise process to identify high-quality apps. JMIR mHealth and uHealth 2016; 4(4): e132.

34. Zou. Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Stat Med 2012; 31(29): 3972–3981.

35. Feldt, Woodruff, Salih. Statistical inference for coefficient alpha. Appl Psychol Meas 1987; 11: 93–103.

36. Dunn, Baguley, Brunsden. From alpha to omega: a practical solution to the pervasive problem of internal consistency estimation. Br J Psychol 2014; 105: 399–412.

37. Revelle, Zinbarg. Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika 2009; 74: 145–154.

38. McNeish. Thanks coefficient alpha, we’ll take it from here. Psychol Methods 2018; 23: 412–433.

39. George, Mallery. SPSS for Windows step by step: A simple guide and reference. Boston: Allyn & Bacon, 2003.

40. DeVellis. Scale development: theory and applications. London: Sage Publications, 2021.

41. Powell, Landman, Bates. In search of a few good apps. JAMA 2014; 311(18): 1851–1852.

42. Torous, Lagan. To the editor: new approaches toward actionable mobile health evaluation. J Am Med Inform Assoc 2021; 28(10): 2306–2307.

43. Prakash, Wang, Mitra. Mobile application processors: Techniques for software power-performance optimization. IEEE Consumer Electro Mag 2020; 9(4): 67–76.

44. Farao, Malila, Conrad, et al. A user-centred design framework for mHealth. PLoS ONE 2020; 15(8): e0237910.

45. Mannonen, Kaipio, Nieminen. Patient-centred design of healthcare services: meaningful events as basis for patient experiences of families. Stud Health Technology Informatics 2017; 234: 206–210.

46. McCurdie, Taneva, Casselman, et al. mHealth consumer apps: the case for user-centered design. Biomed Instrumentation Tech 2012; 46(s2): 49–56.

47. Schnall, Rojas, Bakken, et al. A user-centered model for designing consumer mobile health (mHealth) applications (apps). J Biomed Inform 2016; 60: 243–251.

48. World Health Organization. mHealth: New horizons for health through mobile technologies: second global survey on eHealth. Switzerland: WHO, 2011, https://apps.who.int/iris/handle/10665/44607 (2011, accessed 2 August 2021).

49. Dunn, KHK, Nusem, et al. Building relationships and sustaining dialogue between patients, caregivers and healthcare practitioners: a design evaluation of digital platforms for ventricular assist device users. Des Res Soc 2018; 6: 2346–3236.

50. Giacomin. What is human centred design? The Des J 2014; 17(4): 606–623.

51. Baumel, Faber, Mathur, et al. Enlight: a comprehensive quality and therapeutic potential evaluation tool for mobile and web-based ehealth interventions. J Med Internet Res 2017; 19: e82.

52. Parker, Bero, Gillies, et al. The “Hot Potato” of mental health app regulation: a critical case study of the Australian policy arena. Int J Heal Poli Manag 2019; 8(3): 168–176.

Validation of a Korean version of mobile app rating scale (MARS) for apps targeting disease management