Abstract

Background:

Patient satisfaction surveys in Indonesian public health centers use a four-point Likert scale mandated by the Regulation of the Minister of State Apparatus Empowerment and Bureaucratic Reform (PERMENPAN-RB), which may restrict sensitivity and accuracy. Internationally, broader Likert scales, commonly five- or seven-point, are widely used to obtain more nuanced responses and have been shown to improve reliability, validity, and cross-cultural comparability of patient-reported outcomes. Aligning Indonesian practices with these standards enhances measurement precision and facilitates benchmarking against global indicators. Accurate measurement of patient satisfaction is critical for benchmarking health service quality worldwide. This study aimed to develop a predictive model that generates scores on a seven-point Likert scale based on responses originally measured using a four-point format.

Design and Method:

This cross-sectional study surveyed 200 outpatients from 2 Public Health Centers (Puskesmas) in Padang City, measuring satisfaction with four- and seven-point Likert scales simultaneously. Regression analysis developed a predictive model converting four-point scores into seven-point equivalents.

Results:

The seven-point scale produced higher satisfaction and performance scores. A strong correlation (r = 0.7573) and significant regression model (p = 0.0001) were found, with Y = 33.96 + 0.656X, explaining 57.3% of the variance.

Conclusion:

The seven-point scale provides a more accurate measure of patient satisfaction. The regression model enables conversion of existing four-point data, supporting more precise assessment of patient satisfaction and service quality, and informing evaluation and policy decisions.

Keywords

Likert scale, patient satisfaction, public health center, PERMENPAN-RB, regression model

Significance for public health

Service quality in public health centers (Puskesmas) is critical for healthcare utilization and public health outcomes, making accurate measurement of patient satisfaction essential for evidence-based decision-making. Indonesia’s current public service evaluation system mandates a four-point Likert scale, which has methodological limitations, including restricted response nuance, central tendency bias, and lower validity, sensitivity, and discriminatory power. These weaknesses may distort satisfaction scores and hinder effective quality improvement, particularly when results cluster near national performance thresholds. This study addresses these challenges by examining the relationship between patient satisfaction measured using four-point and seven-point Likert scales and by developing a linear regression-based conversion model. The use of a seven-point Likert scale allows for a more sensitive and representative assessment of patient perceptions. The proposed conversion model supports more accurate performance evaluation while remaining compatible with existing regulatory frameworks, thereby strengthening public health policy formulation, service management, and continuous quality improvement initiatives.

Introduction

Social research studies human interactions in social environments, including public health, often using questionnaires with Likert scales to measure attitudes.¹ Originally developed by Rensis Likert in 1932,² these scales have evolved from dichotomous and 1–5 options to multiple formats, with contemporary research favoring five- or seven-point scales.³

Data quality is significantly affected by the design of the Likert scale. Likert scale formats have been the subject of numerous studies, with a particular emphasis on the number of response categories.⁴ Response reliability is a crucial factor to consider when determining the number of response categories, and internal consistency is commonly used to identify the ideal number. The first viewpoint holds that the number of response categories has no bearing on internal consistency.^5,6 In contrast, the second viewpoint holds that more response categories can increase the scale’s internal consistency and improve usability.⁷ A third viewpoint argues that a measurement scale has an ideal number of response categories: for example, 3 categories,⁸ 4 categories,⁹ 5 categories,¹⁰ 7 categories,^3,11 or 8, 9, or even 10 categories.³

Similarly, concerning validity, three viewpoints exist. The first view posits that these two are unrelated.^3,12,13 The second view holds that they are related.^3,14,15 The third view asserts that increasing the number of response categories improves the validity of the responses.³ Some authors recommend using five or seven response categories.^4,14,15

The respondent’s preference for usability is the third factor. Although scales with fewer categories run the risk of not accurately capturing respondent nuance, scales with five, seven, and ten response points are subjectively easier for respondents to complete.^3,4 The fourth consideration is linearity and the fifth is the scale’s sensitivity. Interval (linear) data have advantages because they contain more information than ordinal scales.⁴ While some researchers have critiqued the measurement quality of psychological scales, paired comparisons can address these concerns by constructing models and providing a framework to validate the linearity of response scales.⁴

The mathematical quantification of Likert scales remains a subject of debate, as they are often viewed as ordinal data where the intervals between responses cannot be assumed equal.¹⁶ However, some researchers contend that when several items measuring a single construct are suitably aggregated, Likert scales can serve as interval data. These composite scores can be regarded as numerical data appropriate for factor modeling and predictive analysis if they produce rankable results with quantifiable intervals.^1,17 Nevertheless, this assumption may not hold in all contexts, and caution is warranted when interpreting the results.

In Indonesia, public service satisfaction surveys are mandated to use a four-point scale (Permenpan-RB No. 14/2017).¹⁸ According to this rule, public-service institutions must use a four-point rating system, with scores multiplied by 25 to reach a maximum of 100. However, this approach is limited because the use of only four response categories may restrict variability in satisfaction, quality, and performance scores. This problem is demonstrated by empirical research: on a four-point scale, average satisfaction scores were 2.96 (73.94%) in 2018 and 3.05 (76.36%) in 2019, falling short of the national threshold of 76.61%. In 2022, scores increased to 93.61% when a seven-point Likert scale was used, indicating improved sensitivity and accuracy.¹⁹

All Indonesian institutions, including Public Health Centers (Puskesmas), conduct public service satisfaction surveys as required by Permenpan-RB. For example, a number of Public Health Centers offer their clients a link to an online survey that is open to the public. When interpreting Likert-scale data, researchers and policymakers should be cautious because the number of response categories can significantly affect conclusions and managerial choices. Almost all respondents must choose the highest response option on the four-point Likert scale in order to meet the national satisfaction standard (≥76.61%) in the Indonesian healthcare context, which requires a mean score of 3.53. However, respondents typically cluster around the midpoint and steer clear of extreme categories, leading to satisfaction scores of about 75%. This measurement bias creates a misalignment with national performance indicators and poses significant challenges for Puskesmas in performance assessment, quality improvement planning, and public accountability.

Experts assert that the number of response points on a Likert scale affects reliability, validity, and discriminant power.³ Preston and Colman found that two-, three-, and four-point scales show lower reliability, validity, and discrimination than five-, six-, and seven-point scales, while stability improves up to 10 points but declines beyond that. Respondents prefer 5-, 7-, and 10-point scales for ease of use.³ Dawes (2008) reported that five- and seven-point scales produce similar means, higher than a 10-point scale.²⁰ Budiaji recommended a seven-point Likert scale for attitude measurement because of its superior validity, reliability, discriminant power, and respondent approval, while four-point scales are weak and limit nuanced responses.¹⁷

Several studies have examined the effects of different Likert scale lengths on survey responses. One study found that a seven-point scale may be optimal, as it balances granularity of responses with cognitive burden for respondents.²¹ However, if respondents need to be pushed toward one end of the scale (for instance, to avoid a neutral midpoint), a six-point scale might be more suitable. According to a different study, the distance between responses on an ordinal Likert scale cannot be taken to be measurable, and differences between response options are not always equal.²²

Accordingly, researchers should exercise caution when interpreting and comparing mean scores from Likert scales, as differences between response options may not be equidistant.^21–23 Accurate patient satisfaction measurement is essential for performance monitoring and ongoing quality improvement because public health centers (Puskesmas) play a crucial role in providing essential healthcare services. The measurement tools and analytical techniques employed are crucial to the validity of satisfaction surveys because methodological flaws can produce inaccurate results. By contrasting four-point and seven-point Likert scales, this study investigates the impact of expanding response categories on measurement sensitivity and data quality. This study creates a predictive model to modify patient satisfaction scores from the four-point scale required by the Ministry of Administrative and Bureaucratic Reform (Permenpan-RB), building on evidence that seven-point Likert scales offer higher sensitivity and reliability. The proposed model is intended to produce a more nuanced and representative assessment of patient perceptions while remaining compatible with existing regulatory requirements. Accurate measurement of patient satisfaction is critical for benchmarking health service quality worldwide. The aim of this study is to develop a predictive model that generates scores on a seven-point Likert scale based on responses originally measured using a four-point format.

Design and method

Study setting

Two public health centers (Puskesmas) in Padang City served as the study sites. These two centers were chosen because they are situated in heavily populated regions that are primarily home to migrants. The research was carried out from January to July of 2025.

Study design and variables

The research design used was a cross-sectional study. Respondents were asked the same questions simultaneously using a four- and seven-point Likert scale. The dependent variable was the patient satisfaction level measured using the seven-point Likert scale, while the independent variable was the patient satisfaction level measured using the four-point Likert scale.

Sampling and study participants

The sample size complied with Indonesian regulations, which mandate that each health center conduct satisfaction surveys every 3 months with a minimum of 50 respondents. Over 6 months, 200 respondents (100 from each center) participated in the study. Participants were companions or outpatients between the ages of 18 and 70 who had made at least three visits. Communication problems and unwillingness to participate were among the exclusion criteria. Quota sampling was used, presuming uniformity among the chosen centers.

Data collection and instrument

The instrument for measuring patient satisfaction in public service institutions follows PermenPAN-RB No. 14 of 2017, which serves as the gold standard for assessing service quality in public institutions. As this instrument is established by government regulation, it is assumed to be valid, and further validity or reliability testing was not performed. Every public service institution in Indonesia is required to measure customer/patient satisfaction periodically, at least once a year.¹⁸ According to Permenpan-RB No. 14/2017, community satisfaction is measured using nine indicators. In this study, each indicator was rated simultaneously with four response options (1 = Poor, 2 = Fair, 3 = Good, and 4 = Very Good) and seven response options (1 = Very Poor, 2 = Poor, 3 = Fairly Poor, 4 = Undecided, 5 = Somewhat Good, 6 = Good, and 7 = Very Good). The indicators are as follows:

(a) Requirements,

(b) System, Mechanism, and Procedure,

(c) Completion Time,

(d) Cost/Tariff,

(e) Product Specifications and Types of Services,

(f) Implementer Competence,

(g) Implementer Behavior,

(h) Handling of Complaints, Suggestions, and Feedback, and

(i) Facilities and Infrastructure

Data processing and analysis

Data were processed and analyzed using univariate and bivariate methods and presented in comparative tables showing satisfaction levels on four-point and seven-point Likert scales, including Mean Score, Respondents Satisfaction Index (RSI), Service Quality (SQ), and Service Unit Performance. Patient satisfaction scores were calculated following Permenpan-RB guidelines by dividing the total perception score by the number of questions (average), then multiplying by 25 (for the four-point Likert scale) to produce a satisfaction index with a maximum value of 100. These converted scores were interpreted using the Ministry’s indicator table (Table 1).

The multiplier for the seven-point Likert scale is 100/7, then adjusted to the Converted Interval Value (CIV) in Table 1. Univariate analysis was conducted to describe patient satisfaction levels based on two measurement methods (four-point or seven-point Likert Scale), while bivariate analysis examined the correlation between scores on the two scales.

Table 1.

Table of conversion values for respondent satisfaction index, service quality, and performance of public service providers.

Perception score	Interval value (IV)	Converted interval value (CIV)	Service quality (SQ)	Service unit performance
1	1.00–2.5996	25.00–64.99	D	Poor
2	2.60–3.064	65.00–76.60	C	Fair
3	3.0644–3.532	76.61–88.30	B	Good
4	3.5324–4.00	88.31–100.00	A	Very Good

Source: Permenpan-RB No. 14 of 2017.¹⁸
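
The index calculation and the Table 1 lookup described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the rule that an index belongs to the highest band whose lower bound it reaches is an assumption made here to handle values that fall between the printed interval limits.

```python
# Illustrative sketch of the satisfaction index and the Table 1 lookup.
# Function and variable names are hypothetical, not taken from the regulation.

def satisfaction_index(mean_score: float, scale_points: int) -> float:
    """Convert a mean perception score into an index with a maximum of 100.

    Four-point scale: mean * 25 (that is, 100/4).
    Seven-point scale: mean * 100/7.
    """
    if scale_points not in (4, 7):
        raise ValueError("only four- and seven-point scales are covered here")
    return mean_score * 100.0 / scale_points


# Lower bounds of the Converted Interval Value (CIV) bands in Table 1.
# Assumption: an index is assigned to the highest band whose lower bound it reaches.
CIV_BANDS = [
    (88.31, "A", "Very Good"),
    (76.61, "B", "Good"),
    (65.00, "C", "Fair"),
    (25.00, "D", "Poor"),
]


def classify(index: float) -> tuple[str, str]:
    """Return (service quality, service unit performance) for an index value."""
    for lower_bound, quality, performance in CIV_BANDS:
        if index >= lower_bound:
            return quality, performance
    raise ValueError("index is below the minimum possible value of 25")


print(classify(satisfaction_index(3.0, 4)))  # a mean of 3.0 gives an index of 75.0
```

Under these assumptions, a four-point mean of exactly 3 ("Good" on every item) gives an index of 75.0, which falls in band C, just below the 76.61 threshold for band B.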

Data were analyzed using correlation analysis to assess the relationship between patient responses measured on the four-point and seven-point Likert scales, followed by regression analysis to establish a linear relationship model. Correlation analysis was applied to convert the existing Permenpan-RB model, as inaccurate scoring may lead to confusing results and hinder management in implementing service quality improvements. Regression analysis was employed as a non-causal linking transformation for predictive purposes. The objectives of the predictive and scale development analyses were to calibrate previously reported results and to promote the adoption of a more sensitive seven-point scale.
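
As a rough illustration of this analysis step, the sketch below computes the Pearson correlation and a simple least-squares line for paired index values. It is not the authors' analysis script: the function name is hypothetical, and the paired values are made-up placeholders, not study data.

```python
from math import sqrt


def fit_linear_link(x, y):
    """Ordinary least-squares fit of y on x for paired observations.

    Returns (intercept, slope, r, r_squared), where r is the Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r, r * r


# Hypothetical paired indices (four-point, seven-point) for illustration only.
four_point_index = [70.0, 75.0, 80.0, 85.0]
seven_point_index = [78.0, 84.0, 86.0, 92.0]

intercept, slope, r, r_squared = fit_linear_link(four_point_index, seven_point_index)
print(f"Y = {intercept:.2f} + {slope:.3f}X, r = {r:.4f}, R^2 = {r_squared:.3f}")
```

In simple linear regression with one predictor, the coefficient of determination equals the square of the Pearson correlation, which is why the function returns both from the same sums.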

The conceptual framework of this study is grounded in Psychometric Linking and Scale Transformation theory.^24–26 The idea of linking, that is, connecting two distinct scales that measure the same underlying construct, comes from the literature on test equating and score linking in psychometrics. The fundamental idea is that, for the purposes of score conversion, calibration, or equating, the relationship between two tests or scales that measure the same construct and are given to the same or comparable populations can be empirically modeled (usually in a linear form). This theoretical framework provides the methodological rationale for connecting four-point and seven-point Likert scale scores via linear regression analysis and using the resulting model for score conversion, as shown in Figure 1.

Figure 1.

Conceptual framework.

Hypothesis

This study hypothesizes that responses measured using the four-point Likert scale are significantly associated with those measured using the seven-point Likert scale.

Result

Characteristic of respondents

The demographic and characteristic profiles of participants in the patient satisfaction survey at two Padang City public health centers are shown in Table 2. This study involved 200 respondents in total, 100 from Belimbing Public Health Center and 100 from Seberang Padang Public Health Center. Of the respondents, 30% were men and 70% were women. The majority of respondents were either between 41 and 60 (41%) or over 60 (39.5%). The majority of participants had completed either senior high school (44%) or junior high school (28%). In terms of occupation, housewives (37.5%) and traders, laborers, farmers, and fishermen (22.5%) made up the largest percentages of respondents.

Table 2.

Characteristics of respondents.

No	Characteristics of respondents	Frequency (f)	Percentage (%)
1	Gender
	Man	60	30%
	Woman	140	70%
2	Age
	Age 17–25 years	13	6.5%
	Age 26–40 years	26	13%
	Age 41–60 years	82	41%
	Age >60 years	79	39.5%
3	Education
	Junior high school/equivalent	56	28%
	Senior high school/equivalent	88	44%
	Diploma 1/2/3	12	6%
	Bachelor’s degree (S1/D4)	42	21%
	Master’s degree (S2)	1	0.5%
	Doctoral degree (S3)	1	0.5%
4	Occupation
	Government employee/retired	17	8.5%
	Private employee/retired	17	8.5%
	Trader/laborer/farmer/fisherman	45	22.5%
	Entrepreneur	12	6%
	Student	8	4%
	Housewife	75	37.5%
	Unemployed	25	12.5%
	Others	1	0.5%
	Total	200	100%

Source: Data processing results.

The survey results were categorized according to Permenpan-RB

The study highlights significant differences in patient satisfaction levels between the four-point and seven-point Likert scales. Table 3 details service quality and performance at public health centers (Puskesmas) in Padang City.

Table 3.

Satisfaction comparison: four versus seven-point scale at Puskesmas in Padang City.

No	Indicator	Four-point likert scale				Seven-point likert scale
No	Indicator	Mean	RSI	Quality	Performance	Mean	RSI	Quality	Performance
1	Service requirements for the types of services provided at this Public Health Center (Puskesmas)	3.13	78.13	B	Good	6.02	86.00	B	Good
2	Ease of service procedures at this Puskesmas	3.14	78.38	B	Good	5.88	84.00	B	Good
3	Speed/responsiveness in providing services at this Puskesmas	2.94	73.50	C	Fair	5.54	79.07	B	Good
4	Cost incurred to obtain services at this Puskesmas	3.28	81.88	B	Good	6.04	86.29	B	Good
5	Conformity of service products between the service standards and the actual results provided at this Puskesmas	3.07	76.75	B	Good	5.92	84.57	B	Good
6	Competence/ability of staff in providing services at this Puskesmas	3.23	80.75	B	Good	6.21	88.71	A	Very Good
7	Behavior of staff in providing services at this Puskesmas	3.16	78.88	B	Good	6.01	85.79	B	Good
8	Quality of facilities and infrastructure at this Puskesmas	2.93	73.25	C	Fair	5.87	83.79	B	Good
9	Handling of user complaints at this Puskesmas	2.90	72.38	C	Fair	5.78	82.57	B	Good
	Mean	3.08	77.10	B	Good	5.92	84.53	B	Good

Source: Data processing results.

Comparison of patient satisfaction shows a mean score of 77.10 on the four-point Likert scale and 84.53 on the seven-point scale, both classified as service quality “B” and performance “Good.” However, the seven-point scale reflects higher satisfaction and broader response range. On the four-point scale, six indicators were rated “B” and three “C,” with none rated “A.” In contrast, the seven-point scale produced eight “B” ratings and one “A,” indicating greater differentiation in responses.

Comparison of service quality by Likert scale type

Table 4 shows that on the four-point Likert scale, most respondents rated service quality as “C” (51%), followed by “B” (43.5%). In contrast, on the seven-point scale, the majority rated “B” (75%) and “A” (19%).

Table 4.

Comparison of service quality by Likert scale type.

No	Service quality (SQ)	Four-point Likert scale		Seven-point Likert scale
No	Service quality (SQ)	Frequency (f)	Percentage (%)	Frequency (f)	Percentage (%)
1	A	7	3.5%	38	19%
2	B	87	43.5%	150	75%
3	C	102	51%	12	6%
4	D	4	2%	0	0%
	Total	200	100%	200	100%

Source: Data processing results.

Link between satisfaction scores on different Likert scales

Table 5 shows a strong positive correlation (r = 0.7573) between the four-point and seven-point Likert scales, indicating that higher scores on one scale correspond to higher scores on the other. The relationship is statistically significant (p = 0.000).

Table 5.

Correlation and regression analysis between the four- and seven-point Likert scale.

Variable	r	R ²	Equation of the line	p-value
Respondent satisfaction index value (RSI)	0.7573	0.573	Seven-point Likert scale = 33.96 + 0.656 × 4-point Likert scale	0.000

Source: Data processing results.

The regression model explains 57.3% of the variation in seven-point satisfaction scores (R² = 0.573) and is statistically significant (p = 0.000). The equation is: Y = 33.96 + 0.656X, where Y is the respondent satisfaction index (RSI) on the seven-point scale and X is the RSI on the four-point scale. Each one-point increase in the four-point index is associated with a 0.656-point increase in the predicted seven-point index.
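
A minimal sketch of how the reported equation could be applied to an existing four-point index is given below. The coefficients are those in Table 5. The function name and the input check against the 25 to 100 range of the four-point index are assumptions added for illustration.

```python
# Coefficients reported in Table 5 (both X and Y are satisfaction index values).
INTERCEPT = 33.96
SLOPE = 0.656


def predict_seven_point_index(four_point_index: float) -> float:
    """Predict the seven-point satisfaction index from a four-point index.

    Assumption: inputs outside 25-100 (the possible range of the four-point
    index) are rejected rather than extrapolated.
    """
    if not 25.0 <= four_point_index <= 100.0:
        raise ValueError("four-point index must lie between 25 and 100")
    return INTERCEPT + SLOPE * four_point_index


# Overall four-point index from Table 3.
print(round(predict_seven_point_index(77.10), 2))  # 84.54
```

For the overall four-point index of 77.10 in Table 3, the equation gives about 84.54, which agrees closely with the observed seven-point index of 84.53.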

Predicting the outcome variable (seven-point Likert scale) based on predictor variable (four-point Likert scale)

Figure 2 shows the prediction model for seven-point Likert scores based on four-point scores, following the regression line pattern. The equation is Y = 33.96 + 0.656X, explaining 57.3% of the variation in satisfaction levels. The outcome variable (seven-point Likert scale) can be predicted from this regression equation using the predictor variable (four-point Likert scale).

Figure 2.

Prediction model of satisfaction level on a seven-point scale based on a four-point Likert scale.

Discussion

Respondent characteristics

Age and gender are the primary determinants of healthcare utilization in Indonesia.²⁷ The majority of the patients we surveyed at the community health centers were women 40 years of age and older, as Table 2 illustrates. According to Indonesian social norms, women are more likely to interact with healthcare professionals at community health centers and play a significant role in managing family health. A number of factors, including need, the high prevalence of non-communicable diseases, improved health-seeking behavior, and easier access to healthcare services in the age of national health insurance, contribute to the rising demand for healthcare services among people over 40.²⁷ Demographically, the largest group in Indonesia has a high school education or equivalent. This is reflected in the respondents, most of whom had a high school education or equivalent. The unemployed are generally more likely to be covered by the National Health Insurance (JKN) contribution assistance (PBI) scheme, which increases healthcare utilization among the unemployed population.²⁷ JKN subsidy recipients utilize public healthcare facilities more frequently than formal workers, who tend to be more selective in choosing other healthcare facilities.²⁷

Comparison of patient satisfaction levels using four- and seven-point Likert scale

Table 3 shows that the percentage of patient satisfaction scores (on a 100-point scale), service quality, and service performance levels are higher when using the seven-point Likert scale than the four-point Likert scale. The use of a seven-point Likert scale increases the likelihood of obtaining higher scores because respondents have a wider range of response options.

The Regulation of the Minister for Administrative Reform and Bureaucratic Reform (PermenpanRB), which mandates the use of a four-point Likert scale, has generated debate in the literature. Based on a number of factors, this study suggests using a seven-point Likert scale. First, the question of whether the Likert scale belongs in the ordinal or interval scale category is still up for debate. Because the Likert scale is made up of ranked categories, it is regarded as an ordinal scale; however, it cannot be strictly considered an interval scale because the distances between categories are not equal. However, there has been a long-standing academic dispute because some experts treat the Likert scale as an interval scale.¹⁶

Second, depending on whether a midpoint in the form of a neutral response is present, the Likert scale used in questionnaire items may be considered either ordinal data or interval data. The data are treated as ordinal data and cannot be analyzed in the same manner as interval data when a Likert scale uses an even number of response categories (e.g. 4, 6, etc.) without a neutral option. However, the data may be handled and examined as interval data if a question has a neutral response option, which is an odd number of response categories (e.g. 5, 7, etc.).²⁸ This is because the presence of a neutral response is considered to represent a psychological distance (interval) between categories, particularly in seven-point scales.²⁹ The four-point Likert scale used in PermenPAN-RB is considered ordinal when analyzed per item, whereas the proposed seven-point Likert scale can be treated as an interval scale.

In this study, the Likert scale is treated from an interval perspective because the data are derived from each patient’s average satisfaction score across the nine indicators, collected simultaneously using both four-point and seven-point scales. When multiple responses from a Likert scale are combined to measure a single construct, they can be ranked and treated as having measurable intervals. Some experts consider the Likert scale to function as an interval scale when measurements are quantified from a set of questions, allowing the data to be processed numerically.³⁰ In this case, the Likert scale can be used for factor modeling and predictive purposes.¹⁷

Third, the Regulation of the Minister for Administrative and Bureaucratic Reform (PermenPANRB), which uses a four-point Likert scale to assess patient satisfaction levels, is deemed less equitable because it uses an unduly large constant. An interval scale with a maximum score of 100 is created by multiplying the mean response score on the four-point Likert scale by a constant of 25. Because the constant value is too high (100/4), this is regarded as unfair. In this instance, the constant rather than the patient’s evaluation usually determines the degree of patient satisfaction. On the other hand, with a seven-point Likert scale, the constant value is minimized (100/7) which results in the level of patient satisfaction being determined more by their actual answers. This indicates that more response options can improve the accuracy of satisfaction measurement, enabling management to implement service quality improvements more effectively.

Fourth, when a four-point Likert scale is used for measurements, respondents typically choose middle categories (satisfied = 3) rather than extreme ones (very satisfied = 4), which results in mean satisfaction levels that are close to 75%. The avoidance of extreme responses and central tendency bias are two phenomena that have been extensively documented in the literature on healthcare surveys.^31,32 In the context of healthcare services, this condition rarely occurs due to the inherently multidimensional and integrated nature of healthcare delivery, which encompasses administrative, clinical, and financing aspects. Consequently, patients’ perceptions are not solely determined by the quality of medical interventions. In the era of the National Health Insurance scheme (BPJS), factors such as waiting time, referral procedures, and administrative restrictions also shape healthcare utilization and patient experiences,^19,27 even when clinical services are considered adequate. On the other hand, the minimum satisfaction threshold is set at or above 76.61% according to the national quality indicator. Consequently, some health centers fail to exceed this threshold not because of poor service quality, but due to errors in the measurement instrument. Based on the reasons outlined and in line with literature highlighting the advantages of the seven-point Likert scale, the researcher recommends using a seven-point Likert scale to obtain a more representative measure of patient satisfaction.

Correlation between four-point and seven-point and regression modeling

Likert scales that are treated as interval data are analyzed in the same manner as interval-level data using parametric statistical techniques, both for descriptive statistical analyses (such as mean, standard deviation, frequency, percentage, and others) and for inferential statistical analyses (including correlation and regression).^33,34 Regression analysis (to determine the relationship model through a linear equation) and correlation analysis (to link patient responses between four-point and seven-point Likert scales) were used to analyze the data in this study. Because it was assumed that patients’ perceptions of satisfaction were synchronized and consistent, responses on the four-point Likert scale were correlated with those on the seven-point scale. This refers to the psychometric theory of linking and scale transformation. There are three requirements for applying this theory: (1) both items (four-point and seven-point scales) measure the same construct, namely patient satisfaction; (2) they are administered simultaneously to the same respondents; (3) the relationship between the two items (four-point and seven-point scales) can be empirically modeled through conversion using simple linear regression. Correlation and regression analysis are appropriate for two variables that are psychometrically linked.^24–26 Patient perceptions of healthcare service experiences are inherently unified and internally consistent. When a patient reports satisfaction using a four-point Likert scale, this response is expected to correlate with the corresponding response selected on a seven-point scale for the same item.

Regression analysis was used to build a linear model linking the two scales. This method was selected to encourage the use of a more sensitive seven-point scale while still allowing conversion of patient satisfaction scores previously measured on the four-point scale required by PermenPAN-RB. The conversion was based on actual questionnaire responses on both scales, which were then used for correlation analysis and regression model development. Purely mathematical modeling was not used, because that approach measures satisfaction on the four-point scale only and then predicts seven-point responses through interval grouping or a fixed linear transformation. Such modeling relies on fixed formulas that do not account for empirical variation in respondents’ answers, and it therefore yields a perfect correlation (r = 1) by construction.

Instead of statistically comparing the validity of the two scales, a topic already covered by earlier research, this study concentrates on creating a conversion model from the four-point Likert scale to the seven-point Likert scale for evaluating patient satisfaction. Scales with four response options are linked to lower levels of validity, reliability, and discriminant power, according to the literature. On the other hand, seven-point scales are thought to be more sensitive, valid, and reliable than both even-numbered and other odd-numbered scales. They also show better discriminant capacity.^3,4,17

The results of the correlation and regression analysis show a significant relationship between the outcome variable (seven-point Likert scale) and the predictor variable (four-point scale). This relationship can be expressed as follows: seven-point Likert scale = 33.96 + 0.656 × four-point Likert scale. The four-point scale does not cause the seven-point scale in this situation because there is no causal relationship between the two variables. Patient satisfaction is the latent construct that is measured by both scales. The four-point scale is the independent variable (predictor) and the seven-point scale is the dependent variable (outcome), and the terms “independent” and “dependent” are only used operationally to interpret the regression analysis.

With a statistically significant outcome (p = 0.0001), the regression equation offers a good fit, accounting for 57.3% of the variation in seven-point scores (R² = 0.573). In social science research, an R-squared value between 0.50 and 0.99 is deemed appropriate, especially when the explanatory variables show statistical significance.³⁵ The remaining 42.7% is attributed to unmeasured factors, such as differences in responses from the same respondent brought on by cognitive bias, mood bias when responding to questions, random factors (noise), and other influences.³⁶ These factors can lead to errors in responses even when using the same scale. Due to the complexity of human behavior and other social influences that are hard to quantify, social science research frequently finds such residual variation.
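For a regression with a single predictor, the coefficient of determination equals the square of the Pearson correlation, so the reported figures can be cross-checked with a short sketch. The variable names are illustrative, and the comparison is only approximate because the published values are rounded.

```python
# Values reported in the study.
r = 0.7573                  # Pearson correlation between the two scales
r_squared_reported = 0.573  # reported share of explained variance

# With one predictor, R² is the square of r.
r_squared_from_r = r ** 2

# Share of variance left to unmeasured factors and noise.
unexplained_share = 1 - r_squared_reported

print(f"r squared from r: {r_squared_from_r:.4f}")
print(f"unexplained share: {unexplained_share:.3f}")
```

The squared correlation agrees with the reported R² to within rounding, and the complement corresponds to the 42.7% attributed to unmeasured factors.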

This study recommends using a seven-point Likert scale to address the misalignment between PermenPAN-RB survey guidelines and healthcare performance evaluation standards. Such discrepancies create challenges for public health centers in assessing service quality and accountability. Evidence suggests that scales with more response options yield more sensitive and valid data, enabling better identification of areas for improvement.^3,20 Using a seven-point Likert scale can enhance patient engagement in providing feedback, thereby improving service quality. This study also highlights that the number of response options significantly affects the quality and reliability of collected data.¹ A deeper understanding of patient satisfaction enables management to identify areas for improvement and implement strategies to enhance the patient experience. Prior research also shows that more sensitive measurements can improve patient responses and overall satisfaction.^3,20

The combination of clinical audits, user feedback, and insights derived from patient safety complaints underscores the need for valid and reliable measurement instruments to accurately capture patients’ perceptions of service quality. Enhancing service quality largely relies on the active engagement of frontline healthcare personnel and the meaningful involvement of patients in the evaluation process.³⁷ Patients can be important sources of information when assessing a Puskesmas through surveys that receive representative responses. Assessing patient satisfaction is essential to raising the standard of healthcare. Likert scales facilitate international comparisons, support cross-cultural use, and improve accuracy and dependability. To guarantee accurate and reliable results, these tools must be continuously improved and validated.³⁸

Study limitations

This study has several limitations. First, the research was conducted in only two public health centers (Puskesmas). Generalizing the findings to the Indonesian population would require study sites that are more representative of the broader demographic and service contexts. Second, the research instrument used in this study is a nationally standardized tool established by the government to measure patient satisfaction across all population groups in Indonesia. For this reason, the researchers did not perform a Rasch analysis. A Rasch analysis can ensure that each item functions fairly across all respondent subgroups (person reliability). Differential Item Functioning (DIF) assesses whether any questionnaire items are biased toward certain groups. Prior studies have identified potential sources of DIF in patient satisfaction instruments, including³⁹: (a) gender, (b) age, (c) education, (d) socioeconomic status, (e) clinical condition, (f) length of care experience, (g) language or cultural background, (h) type of service received, and (i) response style. DIF is designed to compare item functioning across different respondent groups, whereas this study compares satisfaction scores from the same respondents, measured using different Likert scale formats.

Respondents who receive healthcare services of similar quality and under comparable conditions may still express different perceptions (responses). For example, in the item “completion time,” older respondents may have lower expectations and greater tolerance for waiting time, leading them to provide higher ratings. Conversely, younger respondents who are accustomed to fast and efficient digital services—and who generally have higher expectations—may assign lower ratings.

Differences in response styles may also lead to DIF, for example when some respondents tend to select high values (such as six or seven on a seven-point Likert scale), while other groups rarely choose such extreme categories. However, the presence of extreme responding is not an issue in this study because the same respondents demonstrated consistent response patterns when assessed using different Likert scales (four-point and seven-point). The tendency to choose extreme categories is a stable characteristic within individuals; therefore, this response style persists even when the number of scale categories is changed.⁴⁰

Practical implications

The number of response options on a Likert scale can substantially affect how the data are interpreted and analyzed. The study’s findings indicate that a seven-point Likert scale yields more accurate and representative data when used to gauge patient satisfaction. The practical implication is that Puskesmas can use this scale to design more effective service improvement initiatives. Given the shortcomings of the existing standard, there is a compelling case for updating the PERMENPAN-RB framework. The proposed model can bridge the gap between current data and better measurement standards in national surveys by acting as a complementary or transitional tool. However, practical application requires caution, as interpreting Likert data at the interval level remains debated. The study was also limited to two Puskesmas in Padang City; further research could be conducted on a broader scale, representing all districts and cities in Indonesia. Additionally, the questionnaire was based on the regulations of the Ministry of Administrative and Bureaucratic Reform (PermenPAN-RB), which are regarded as legitimate and trustworthy. Consequently, no additional validity or reliability testing, including Rasch analysis, was performed in this study. Studies on patient satisfaction that use different instruments and social contexts may conduct such analyses to ensure person reliability.

Regression analysis may produce biased estimates when applied to non-homogeneous data, particularly in the context of Indonesia’s highly diverse sociodemographic characteristics. Puskesmas must make sure that converted scores accurately represent patient experiences and are free from statistical artifacts. Notwithstanding these obstacles, the model provides a workable, empirically supported way to improve satisfaction assessment in Indonesia’s public health system. Practically, public health centers (Puskesmas) can convert existing four-point Likert data to seven-point equivalents using the regression model, avoiding the need for new data collection. Puskesmas and survey agencies can redesign instruments with seven-point scales to improve data granularity and representativeness. The converted scores allow for more precise performance evaluations and better alignment with patient perceptions. Moreover, the model provides a foundation for data-driven policy formulation, ensuring decisions are responsive to community needs and based on valid satisfaction metrics.

Conclusion and recommendations

Using a seven-point Likert scale tends to produce higher average satisfaction scores than the four-point scale mandated by Permenpan-RB, as respondents have a wider range of choices. This study concludes that there is a significant relationship between the four-point and seven-point Likert scales in measuring patient satisfaction (p = 0.0001). However, rather than being causal, this relationship shows psychometric linking. The four-point scale accounts for 57.3% of the variability in the seven-point scale, which is regarded as a strong proportion in social research. The remaining variance is probably due to random error, mood bias, and cognitive bias. To align current measures, this study offers a conversion formula (Patient Satisfaction in seven-point Likert Scale = 33.96 + 0.656 × four-point Likert Scale). However, the results are only applicable to two public health centers (Puskesmas), which limits their generalizability. Broader samples should be used in future studies, and when using the model, policymakers need to make sure that linearity assumptions are met.

Supplemental Material

Supplemental material, sj-docx-1-phj-10.1177_22799036261441329 for Beyond the standard: Rethinking Likert scale use in measuring patient satisfaction at public health center in Indonesia by Adila Kasni Astiena, Yudiantri Asdi and Rika Ampuh Hadiguna in Journal of Public Health Research

Footnotes

Acknowledgements

The authors would like to express their gratitude to the enumerators, namely the clinical clerkship students of Public Health from the Faculty of Medicine, Andalas University. Appreciation is also extended to the Heads of Belimbing Public Health Center and Seberang Padang Public Health Center, as well as the technical staff who were involved throughout the research process.

ORCID iDs

Adila Kasni Astiena

Yudiantri Asdi

Rika Ampuh Hadiguna

Ethical considerations

This study was approved by the Research Ethics Committee of the Faculty of Medicine, Universitas Andalas, Indonesia, under Certificate No. 102 a/UN.16.2/KEP-FK/2025.

Consent to participate

This is an observational study that did not involve any direct intervention or risk to participants. Participants were fully informed about the aims and procedures of the study and gave their written consent prior to participation. Anonymity was maintained, and permission was granted by the relevant authorities prior to data collection.

Consent for publication

All relevant stakeholders have been consulted and have agreed to the publication of this study.

Author contributions

AKA conceived and designed the study, collected and analyzed the data, and drafted the manuscript. YA performed the statistical analysis and conducted the literature review. RAH analyzed the data and contributed to the discussion of the study findings. All authors reviewed and approved the final version of the manuscript prior to submission.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data will be made available upon reasonable request to the corresponding author.

Supplemental material

Supplemental material for this article is available online.

References

1. Bishop, Herron RL. Use and misuse of the Likert item responses and other ordinal measures. Int J Exerc Sci 2015; 8: 297–302. https://doi.org/10.70252/LANZ1453

2. Likert. A technique for the measurement of attitudes. Arch Psychol 1932; 22: 1932–1953.

3. Preston, Colman AM. Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychol 2000; 104: 1–15. https://doi.org/10.1016/s0001-6918(99)00050-5

4. Hofmans, Theuns, Mairesse. Impact of the number of response categories on linearity and sensitivity of self-anchoring scales: a functional measurement approach. Methodol Eur J 2007; 3: 160–169. https://doi.org/10.1027/1614-2241.3.4.160

5. Bendig AW. The reliability of self-ratings as a function of the amount of verbal anchoring and of the number of categories on the scale. J Appl Psychol 1953; 37: 38–41. https://doi.org/10.1037/h0057911

6. Matell, Jacoby. Is there an optimal number of alternatives for Likert scale items? Study I: reliability and validity. Educ Psychol Meas 1971; 31: 657–674. https://doi.org/10.1177/001316447103100307

7. Churchill, Peter JP. Research design effects on the reliability of rating scales: a meta-analysis. J Mark Res 1984; 21: 360–375. https://doi.org/10.1177/002224378402100402

8. Bendig AW. Reliability and the number of rating-scale categories. J Appl Psychol 1954; 38: 38–40. https://doi.org/10.1037/H0055647

9. Bendig AW. Reliability of short rating scales and the heterogeneity of the rated stimuli. J Appl Psychol 1954; 38: 167–170. https://doi.org/10.1037/h0059072

10. Jenkins, Taber TD. A Monte Carlo study of factors affecting three indices of composite scale reliability. J Appl Psychol 1977; 62: 392–398. https://doi.org/10.1037/0021-9010.62.4.392

11. Ramsay JO. The effect of number of categories in rating scales on precision of estimation of scale values. Psychometrika 1973; 38: 513–532. https://doi.org/10.1007/BF02291492

12. Chang. A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and validity. Appl Psychol Meas 1994; 18: 205–215. https://doi.org/10.1177/014662169401800302

13. McCallum, Keith, Wiebe DJ. Comparison of response formats for multidimensional health locus of control scales: six levels versus two levels. J Pers Assess 1988; 52: 732–736. https://doi.org/10.1207/s15327752jpa5204_12

14. Muñiz, García-Cueto, Lozano LM. Item format and the psychometric properties of the Eysenck personality questionnaire. Pers Individ Dif 2005; 38: 61–69. https://doi.org/10.1016/j.paid.2004.03.021

15. Velicer, Stevenson JF. The relation between item format and the structure of the Eysenck personality inventory. Appl Psychol Meas 1978; 2: 293–304. https://doi.org/10.1177/014662167800200210

16. Jamieson. Likert scales: how to (ab)use them. Med Educ 2004; 38: 1217–1218. https://doi.org/10.1111/j.1365-2929.2004.02012.x

17. Budiaji. Measurement scale and number of responses on the likert scale [Skala Pengukuran Dan Jumlah Respon Skala Likert] in Indonesia. J Ilmu Pertan dan Perikan December 2013; 2: 125–131. https://doi.org/10.31227/osf.io/k7bgy

18. MENPAN-RB. Peraturan Menteri Pendayagunaan Aparatur Negara Dan Reformasi Birokrasi (Permenpan-RB) Republik Indonesia Nomor 14 Tahun 2017 Tentang Pedoman Penyusunan Survei Kepuasan Masyarakat Unit Penyelenggara Pelayanan Publik. Jakarta, 2017. https://peraturan.bpk.go.id/Details/132600/permen-pan-rb-no-14-tahun-2017

19. Astiena, Lipoeto, Azmi, et al. Survey report on minimum service standards for waiting time and customer satisfaction at RSUD Rasidin 2022 [Laporan Survei Standar Pelayanan Minimum Waktu Tunggu dan Kepuasan Pelanggan RSUD Rasidin Tahun 2022] in Indonesia. 2022. https://drive.google.Com/file/d/1V8xCrnqI79IwlqeaBquib2Ij5x2W01os/view?usp=sharing

20. Dawes. Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. Int J Mark Res 2008; 50: 61–104. https://doi.org/10.1177/147078530805000106

21. Taherdoost. What is the best response scale for survey and questionnaire design; review of different lengths of rating scale/attitude scale/Likert scale. Int J Acad Res Manag 2019; 8: 2296–1747. https://ssrn.com/abstract=3588604

22. Sullivan, Artino AR. Analyzing and interpreting data from Likert-type scales. J Grad Med Educ 2013; 5: 541–542. https://doi.org/10.4300/JGME-5-4-18

23. Chiang, Jhangiani, Price. Constructing survey questionnaires. In: Research methods in psychology, 2nd Canadian ed, https://ecampusontario.pressbooks.pub/researchmethods/chapter/constructing-survey-questionnaires/ (2015, accessed 18 October 2025).

24. Dorans, Holland PW. Population invariance and the equitability of tests: basic theory and the linear case. J Educ Meas 2000; 37: 281–306.

25. Kolen, Brennan (eds). Test equating, scaling, and linking: methods and practices. 2014, vol. xxi, p.566.

26. Kleffe, Gnanadesikan, Gupta, et al. Handbook of statistics contents of previous volumes reduction of dimensionality. 1980, vol. 1, pp.1143–1169.

27. Cheng, Fattah, Susilo, et al. Determinants of healthcare utilization under the Indonesian national health insurance system - a cross-sectional study. BMC Health Serv Res 2025; 25: 48. https://doi.org/10.1186/s12913-024-11951-8

28. Imam, Dyana, Riski MH. International journal of educational methodology number of response options, reliability, validity, and potential bias in the use of the Likert scale education and social science research. Lit Rev 2022; 8: 625–637. https://doi.org/10.12973/ijem.8.4.625

29. Sirganci, Uyumaz. Determining the factors affecting the psychological distance between categories in the rating scale. Int J Contemp Educ Res 2021; 8: 178–190. https://doi.org/10.33200/ijcer.858599

30. Boone, Virginia. Analyzing Likert data Likert-type versus Likert scales. 2012; 50. Article no. 2TO2. https://doi.org/10.34068/joe.50.02.48

31. Sabolić, Samuelson MB. Mitigating central tendency and acquiescence biases in survey design: a methodological exploration. 2024; 115–138. https://doi.org/10.7494/human.2024.23.2.6706

32. Cefalu, Elliott, Hays RD. Adjustment of patient experience surveys for how people respond. Med Care 2021; 59: 202–205. https://doi.org/10.1097/MLR.0000000000001489

33. Chen. Methods to analyze Likert-type data in educational technology research. 2020; 13: 39–60. https://doi.org/10.18785/jetde.1302.04

34. Silan MA. When can we treat Likert type data as interval? Hal Open Science 2025: 1–11. https://doi.org/10.31234/osf.io/wvkyu

35. Ozili PK. The acceptable R-square in empirical modelling for social science research. SSRN Electr J 2023. https://mpra.ub.uni-muenchen.de/115769/; https://doi.org/10.2139/ssrn.4128165.

36. Andrade. Understanding statistical noise in research: 3. Noise in regression analysis. Indian J Psychol Med 2023; 45: 310–311. https://doi.org/10.1177/02537176231164651

37. O’Shea, O’Donovan, Sheehan, et al. Implementing evidence-based quality improvement in health care quality and patient safety and clinical research programs. Qual Manag Health Care 2025; 35: 94–103. https://doi.org/10.1097/QMH.0000000000000520

38. Moret, Nguyen, Pillet, et al. Improvement of psychometric properties of a scale measuring inpatient satisfaction with care: a better response rate and a reduction of the ceiling effect. BMC Health Serv Res 2007; 7: 197. https://doi.org/10.1186/1472-6963-7-197

39. Analysis of differential item functioning. Australian Council for Educational Research, 2006.

40. Greenleaf EA. Measuring extreme response style. Public Opin Q Res 1992; 56: 328–351.
