Abstract
The traditional paper-based Q method was introduced to the social sciences in the 1930s. However, despite its unique capability for measuring people's subjective opinions, the method has not been broadly employed by researchers. Moreover, recent efforts to administer Q method surveys via the Internet have been mostly limited to textual prompts, and there is no concrete evidence on the usability and reliability of online Q method research tools or on their agreement with the traditional paper-based Q method. The visual Q method online research tool was therefore developed to resolve these deficiencies and to add new dimensions (audio and video) to Q method research. The development of the visual Q method platform was a three-stage process that commenced with a usability test on 31 visitors to a local museum. Second, a test–retest reliability analysis with a convenience sample of 37 students yielded a reliability coefficient superior to that of the traditional paper-based Q method. Third, an agreement analysis between the visual Q method and the traditional paper-based Q method on a sample of 10 students revealed substantial similarities between their generated (V)Q sorts. Overall, the study findings provide substantial evidence on the usability and rigor of the visual Q method online research tool.
Introduction
Q methodology was invented by William Stephenson in the mid-1930s to scientifically measure self-referent viewpoints, mindsets, and attitudes (Brown, 1993; Good, 2010). Q methodology is both a constructivist and a social constructionist research approach, depending on whether it focuses on the individual or the social construction of beliefs, knowledge, or discourse about a subject (Watts and Stenner, 2005). The method has received increasing attention in the social sciences because of its strong statistical component for the study of human subjectivity (Ellingsen et al., 2010). Further, it can be used to cluster like-minded people about a non-cognitive topic through by-person factor analysis, in contrast with the more commonly used by-variable factor analysis. Each factor represents a unique viewpoint about the subject of investigation.
The conduct of Q method research encapsulates eight stages: design, preparation of survey material, delivery, data elicitation, data recording, analysis, interpretation, and write-up. The design stage entails the formation of the universe of opinions surrounding the topic, that is, the concourse of communication (COC); reducing the COC to a representative sample (the Q sample); selecting the research participants (the P set); designing a graded spectrum of cells (the grid) onto which the Q sample items (henceforth Q items) are sorted (Q sorting); and outlining the condition of instruction. The preparation stage entails printing the grid, the condition of instruction, the score sheet, and the Q items on pieces of paper. The delivery stage encompasses arranging the survey material on a table at a time and place convenient to both researcher and participants.
In the Q method, securing the diversity of perspectives among the population of interest is more relevant than recruiting the large, representative samples prevalent in quantitative research methods (Gabor, 2013; Watts and Stenner, 2005). However, in the traditional paper-based Q method (TPQM), face-to-face interaction with participants and the use of paper-based tools intensify its operational difficulties and resource intensiveness. Even alternative approaches, such as mail delivery of the Q survey materials, have been unwieldy and confusing for participants (Klooster et al., 2008).
Nonetheless, some researchers believe that interaction between researcher and participant is advantageous for comprehending the participants' perspectives. In other words, the researcher develops a "feeling for the organism" with participants, which is only attainable through the researcher's interaction and cumulative acquaintance with participants (Brown, 1989; Previte et al., 2007). Moreover, in contrast to Internet-based surveys, in the TPQM researchers can augment participants' involvement, commitment, and contemplativeness through one-to-one interactions. In this vein, some studies report low intellectual commitment, reduced cognitive engagement, and poorer data quality with online surveys (Couper et al., 2001).
While this interaction may assist the researcher in the interpretation and analysis of findings, it also reinforces the researcher's influence on the participants (Cross, 2005; Scott et al., 2014). Simply put, the researcher's close interaction forges a sense of being observed among participants, leading to "satisficing" responses and "failing to differentiate among a set of diverse objects in ratings" (Cross, 2005; Krosnick, 1991: 213). Also, in sensitive topics such as sexual abuse, crime, or political views, participants might believe that disclosing their true perspectives would potentially threaten their status or image and thereby attempt to impress the researcher by complying with well-accepted notions and satisficing behavior (Scott et al., 2014).
Moreover, the influence of the researcher may undermine the reproducibility of findings, such that other researchers may not reach the same conclusions if they follow the same procedures (Goodman et al., 2016; Guillemin and Gillam, 2004). Whereas reproducibility is the degree of measurement variation under unstable conditions, such as a change of researcher, reliability is the capacity of a measurement tool to differentiate between respondents under controlled conditions (Berchtold, 2016). Nonetheless, Watts and Stenner (2005) assert that reliability in the TPQM reflects the stability of opinions over time more than the stability of the instrument itself.
Scholars assert that the range of perspectives on any topic is stable (Brown, 1971; Nicholas, 2011). Therefore, the range of perspectives can be used as a benchmark for testing the reliability of an instrument by assessing its stability over repeated measures (taking a Q survey twice). Nonetheless, no scholar has examined whether the individuals who hold certain viewpoints (henceforth viewholders) would continue to hold them over repeated measures. In other words, will they share the same viewpoints with the same people about a topic over repeated measures? This understanding is fundamental, as in Q methodology the determination of viewholders is as crucial as the identification of the viewpoints.
The agreement between two methods (methods agreement) is the extent to which identical findings can be obtained using two different measurement tools (Bland and Altman, 2010). In contrast to reliability, methods agreement intends to identify systematic errors rather than random errors. In this vein, perfect agreement means similar results, whereas perfect reliability means high correlation (Thomas, 2017). Methods agreement analysis is mainly conducted to reveal whether a new method can replace an established, gold-standard method and whether a single method is rigorous enough to detect changes in participants' opinions over time (Berchtold, 2016). However, if the random error is larger than the change over time, the measurement tool will be unable to detect longitudinal changes (Berchtold, 2016). Hence, reliability underpins, and should precede, the methods agreement analysis.
Several new tools, such as the Q-assessor (Reber et al., 2000), FlashQ (Hackert and Braehler, 2007), and the Q-sortouch (Pruneddu, 2018), have recently been developed to assist researchers in standardizing the research process and overcoming previous limitations. Nonetheless, none of them has undergone methods agreement analysis with the TPQM, nor is there any identified robust evaluation of their reliability.
The visual Q method (VQMethod) online research tool was developed to diminish the resource intensiveness of the TPQM and to enhance its reliability and parsimony (Nazariadli, 2017). Moreover, the VQMethod has made Q method research accessible and usable to a broad spectrum of researchers by adding new dimensions to the world of Q methodology research: video and audio (as Q items). Accordingly, this study set out to evaluate the usability and psychometric properties of the VQMethod research tool in contrast to the TPQM.
Therefore, in a first stage, the usability (ease of use without the need for any technical guidance or training) of the VQ method survey (henceforth VQ survey) interface was tested on real users, using indicators of efficiency, effectiveness, and user satisfaction. In a second stage, the VQ survey interface underwent a test–retest reliability analysis with three measurement criteria: perspective overlaps, viewpoint stability, and viewholder stability. Also in this stage, the effect of the time taken to complete a VQ survey on the reliability of VQ sorts was investigated. In a third stage, a methods agreement analysis was performed between the VQMethod and the TPQM to determine whether there is any systematic bias (difference) between the two methods.
Hence, the following research questions and hypotheses guided this study:
Stage one
Q1: To what extent is the VQ survey interface usable for average users?
Stage two
H1. In VQMethod, the viewpoints remain stable over repeated measures.
H2. In VQMethod, the viewholders remain stable over repeated measures.
H3. In VQMethod, the time taken for completing a VQ survey significantly predicts the reliability of VQ sorts.
Stage three
Q2: To what extent do the TPQM and the VQMethod generate identical (V)Q sorts?
Stage one: examining the usability of the VQMethod research tool
Usability studies are mainly conducted to assess how well a new method or system can be used by people with no formal training (Holzinger, 2005). Usability measures the effectiveness and efficiency (overall performance) of a product and the degree of end-user satisfaction (ISO 9241-11). Effectiveness addresses the complete accomplishment of the given tasks, efficiency refers to the time and resources required, and satisfaction refers to both overall and task-specific ease of use.
Usability laboratory tests and post-test questionnaire on the VQMethod
In a usability laboratory test, potential end-users are invited to a laboratory where they perform specific tasks without realizing that they are being observed. The method can be conducted through unobtrusive one-way mirror observation, that is, the observation room is separated from the experiment room (Kaikkonen et al., 2005). This strategy keeps the researcher from influencing participants' performances (Lin et al., 1997). Post-test questionnaires or interviews can complement usability tests of a new system by directly asking users about their difficulties with the system and potential resolutions.
Sample and data collection
The usability laboratory and questionnaire tests were conducted at the Visual Investigate Lab in the Museum of Natural Sciences in Raleigh, North Carolina. The aim was to observe how potential survey-takers operate the VQ survey interface and thereby identify problems firsthand. This test was then followed by a post-test questionnaire to measure the participants' task-specific level of satisfaction, uncover their difficulties, and understand how they would prefer to solve the problems they encountered.
The museum visitors were intercepted in front of Visual Investigate Lab and informed about the research topic, estimated time of completion, required tasks, and the incentive (US$15 Amazon gift certificate). Among 35 people who were approached, 31 agreed to participate. The participants were purposefully selected to represent different age groups (21–73) and races (African American, Asian, Hispanic, and Caucasian). Nonetheless, the sample resulted in a somewhat unbalanced gender split (64.3% female and 35.7% male).
The usability tests at the museum were conducted in five sessions, each lasting between 3 and 6 hours. A VQ survey was set up on a desktop computer connected to the Internet (with an Intel i3 processor, Microsoft Windows 10 Professional, 4 GB of RAM, and a GeForce GTX 1050 graphics card). The computer was equipped with two vertically cascaded 19 in. monitors that enabled the researcher to observe the participants' interaction with the VQ survey via the upper monitor. Resembling a one-way mirror, the experiment room was set up in a fishbowl laboratory setting surrounded by glass walls (Kaikkonen et al., 2005).
Next, participants were directed to the lab, seated behind the monitors, instructed about the study, and briefed about the post-test questionnaire. The researcher stood behind the glass wall to monitor the participants' operations unobtrusively and to promote a normal working condition. To this end, he frequently alternated between positions inside and outside the lab to keep his presence unobtrusive to participants (Figure 1) (Holzinger, 2005). The researcher instructed the participants to wave their hands if they were puzzled or encountered any errors at any time during the test.

The unobtrusive observation in a fishbowl laboratory setting.
In the body of the VQ survey, first, the participants affirmed their consent to taking the survey. Second, they sorted a stack of 40 images into three boxes, that is, most like, neutral, and most unlike their views about rural life. Third, they dragged and dropped the images from the three boxes into a graded grid with a forced distribution of −4(3), −3(3), −2(5), −1(6), 0(6), +1(6), +2(5), +3(3), and +4(3) from most like to most unlike their expectations of rural life. Fourth, they reasoned about their placements of images in the −4 and +4 columns, and fifth, they responded to some general demographic questions.
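This forced distribution can be represented as a simple mapping from column values to cell counts. The snippet below is a minimal sketch of how a completed sort could be checked against that distribution; the is_valid_sort helper and the placement structure are hypothetical illustrations, not part of the VQMethod platform.

```python
from collections import Counter

# Forced distribution used in the VQ survey: column value -> number of grid cells (40 in total).
DISTRIBUTION = {-4: 3, -3: 3, -2: 5, -1: 6, 0: 6, 1: 6, 2: 5, 3: 3, 4: 3}


def is_valid_sort(placements):
    """Return True if a finished sort (image id -> column value) fills the grid exactly.

    `placements` is a hypothetical structure used here for illustration only.
    """
    return (len(placements) == sum(DISTRIBUTION.values())
            and dict(Counter(placements.values())) == DISTRIBUTION)


# Illustrative usage with dummy image ids 1-40.
dummy_sort, image_id = {}, 1
for column, n_cells in DISTRIBUTION.items():
    for _ in range(n_cells):
        dummy_sort[image_id] = column
        image_id += 1
print(is_valid_sort(dummy_sort))  # True
```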
After completing the online survey, each participant was handed the post-test questionnaire, which contained printed screenshots of the five steps of the VQ survey. It asked how difficult or easy it was to accomplish each task, using a five-point Likert-type Single Ease Question (SEQ) (Sauro and Dumas, 2009). The participants were then asked to draw or write on the screenshots and propose solutions to the problems they had encountered, if any. The solutions further fostered the participatory design of the final product (Spinuzzi, 2005), that is, the VQ survey interface. In addition, the success rate in completing the survey (a measure of effectiveness) was recorded, system errors were captured, and field notes were taken.
Data analysis and findings
Participants mostly expressed their uneasiness with the forced distribution grid, the small font size, and the lack of initial familiarity with the Q items (images) before Q sorting. They also expressed uneasiness with the inability to move images between the grid and the three boxes, as well as with the researcher's sole interest in questioning them about specific images placed at the edges of the grid. In addition, they both illustrated and stated that the use of thumbnails makes image recall easier, saves space, and allows faster comparisons between images (Figure 2). Finally, they requested a status tracker to display their progress through the survey.

Participants’ drawings on the post-test questionnaire.
Furthermore, the post-test questionnaire also revealed that on a scale of one (easy) to five (difficult) the participants found Q sorting more difficult (M = 2.05, SD = .99) than pre-sorting (M = 1.47, SD = .69), reflecting about images (M = 1.35, SD = .83), and answering the complementary questionnaire (M = 1.05, SD = .12).
VQ survey interface improvements
Overall, the findings were weighed against the rules and logic of Q methodology, and accordingly the most feasible solutions were devised and implemented before the reliability and methods agreement analyses of the subsequent stages. Figures 3 to 5 showcase some of the main improvements to the VQ survey interface.

Familiarization with Q items and the topic.

Dragging and dropping images into the three boxes.

Q sorting.
Among other improvements, a familiarization step was added right after the initial consent to help the participants develop an overall understanding of the topic and the Q items. A progress bar, an image countdown, and guiding messages were added to the body of the survey to highlight the survey status and prevent the disorientation of participants. Clickable thumbnails replaced the large pictures so that more pictures could be displayed together and their comparative judgment made easier. The drag and drop of Q items between and within the three boxes and grid cells was improved to simulate sorting on a sheet of paper. The option of including video rather than textual instructions in the body of the survey was added to moderate the complete detachment of the researcher, giving him or her an opportunity to better convey the expected tasks.
Stage two: reliability analysis
The purpose of this stage was to examine the reliability of the VQMethod online research tool through a test–retest analysis. The Q method research seeks to identify and distinguish between extant viewpoints surrounding a topic. Nonetheless, this recognition will be invalidated if the individual Q sorts are unreliable.
The composite reliability of the Q sorts affects the factor reliability (r_xx, where p is the number of Q sorts significantly loading on a factor), which in turn forms the basis for calculating the standard error of factor scores (SE_fs, where s_x is the standard deviation (SD) of the Q sorts). Next, the SE_fs of each factor is used to calculate the standard error of differences (SED_x−y) between two factors. The SED_x−y, multiplied by the critical value of the significance level, that is, 1.96 (p < .05) or 2.58 (p < .01), and rounded to the nearest whole number, determines the Q items that distinguish one viewpoint from the others (Figure 6) (McKeown and Thomas, 2013).

A series of dependent formulas to identify distinguishing Q items.
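The dependency chain in Figure 6 corresponds to the standard Q-methodology formulas r_xx = (0.80 × p)/(1 + (p − 1) × 0.80), SE_fs = s_x × √(1 − r_xx), and SED_x−y = √(SE_fs_x² + SE_fs_y²) (Brown, 1980; McKeown and Thomas, 2013). The sketch below works through this chain; the numbers of defining sorts in the example are hypothetical.

```python
import math


def factor_reliability(p, r=0.80):
    """Factor reliability r_xx = (r * p) / (1 + (p - 1) * r), where p is the number of
    defining sorts and r the average reliability coefficient of individual Q sorts
    (conventionally assumed to be .80)."""
    return (r * p) / (1 + (p - 1) * r)


def se_factor_scores(p, r=0.80, s_x=1.0):
    """Standard error of factor scores: SE_fs = s_x * sqrt(1 - r_xx)."""
    return s_x * math.sqrt(1 - factor_reliability(p, r))


def sed(p_x, p_y, r=0.80):
    """Standard error of differences: SED_x-y = sqrt(SE_fs_x**2 + SE_fs_y**2)."""
    return math.sqrt(se_factor_scores(p_x, r) ** 2 + se_factor_scores(p_y, r) ** 2)


# Hypothetical example: two factors with 12 and 9 defining sorts.
threshold = 1.96 * sed(12, 9)   # a Q item whose factor scores differ by more than this
print(round(threshold, 2))      # value distinguishes the two viewpoints at p < .05
```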
Sample and data collection
To evaluate the reliability of the VQMethod online research tool, a convenience sample of 37 undergraduate students was recruited on a voluntary basis. It was stressed that the research would be anonymous and voluntary and would not affect their grades or the student–professor relationship. The students were aged between 18 and 24 and comprised 19 females, 16 males, and two students who did not specify their gender. The sample included 3 African Americans, 1 Asian, 30 Caucasians, 2 Hispanics, and 1 Palestinian. They took two identical VQ surveys in two phases 1 week apart (Times A and B) in a classroom setting, using either their own laptops (mostly MacBooks) or university laptops (HP 14-inch Chromebooks).
The VQMethod automatically saved the total and stepwise time for each survey. The relative efficiency of the VQ survey was calculated as the ratio of the time taken by the users who completed the task to the total time taken by all users (Seffah et al., 2006). The effectiveness of the VQMethod was measured as the ratio of completed surveys to the total number of surveys, multiplied by 100 (success rate).
The participants' overall satisfaction with the VQ survey interface was measured with the 10-item, 5-point Likert-type System Usability Scale (SUS), with item wording adapted from the US Patent and Trademark Office version, distributed to the students after they completed the VQ survey. The SUS, developed by Brooke (1996), is a reliable and valid measure of user satisfaction that facilitates the comparison of the usability of different systems by generating a 0–100 score (Kortum and Bangor, 2013).
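As a rough illustration of how these three indicators can be computed, the snippet below implements the success rate, the relative efficiency ratio described above, and the conventional SUS scoring rule (odd-numbered items contribute the response minus one, even-numbered items five minus the response, and the sum is multiplied by 2.5; Brooke, 1996). The function names and example responses are illustrative.

```python
def effectiveness(completed, total):
    """Success rate: completed surveys over all surveys, as a percentage."""
    return 100.0 * completed / total


def relative_efficiency(time_by_completers, time_by_all_users):
    """Time spent by users who completed the task over total time spent by all users."""
    return 100.0 * time_by_completers / time_by_all_users


def sus_score(responses):
    """Convert ten 1-5 SUS responses into the 0-100 SUS score."""
    assert len(responses) == 10
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)   # items 1, 3, 5, ... are positively worded
                     for i, r in enumerate(responses)]
    return sum(contributions) * 2.5


# Illustrative example: all 37 surveys completed, plus one hypothetical respondent's answers.
print(effectiveness(37, 37))                          # 100.0
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))      # 75.0
```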
Data analysis
Three measures were considered for assessing the reliability of the VQMethod: the composite reliability of the VQ sorts, the stability of the number of factors (viewpoints), and the stability of participants' factor membership (viewholders).
Each Q item had an identification number, and each item's placement in the grid at Times A and B was recorded for every participant and exported to an Excel sheet by the VQMethod platform (Table 1). Next, all the VQ sort data at Times A and B were combined and entered into the KenQ online data analysis software. A series of Pearson product–moment correlations was performed, and a matrix of inter- and intra-participant correlations was created, cross-tabulating the data from Times A and B.
The Q scores for each participant at phases A and B.
Subsequently, these correlations formed a 74 × 74 matrix: (total number of VQ sorts at Times A and B) × (total number of VQ sorts at Times A and B). Next, the mean correlations between all different pairs within Time A, within Time B, and across Times A and B were computed from this matrix.
The test–retest correlation matrix.
Pa1: participant #1 at phase A; Pb1: participant #1 at phase B.
The correlations between the same participants at the crossed diagonals were not used for the calculation of the composite reliability of VQ sorts.
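A cross-correlation matrix of this kind can be assembled directly from the exported Q scores. The sketch below is illustrative only: it uses dummy data and assumes one column per VQ sort, labelled Pa1–Pa37 and Pb1–Pb37 as in the table note, rather than reproducing the KenQ software.

```python
import numpy as np
import pandas as pd

# Dummy Q scores: one row per Q item, one column per VQ sort (Pa1-Pa37, Pb1-Pb37).
rng = np.random.default_rng(0)
columns = [f"Pa{i}" for i in range(1, 38)] + [f"Pb{i}" for i in range(1, 38)]
sorts = pd.DataFrame(rng.integers(-4, 5, size=(40, 74)), columns=columns)

corr = sorts.corr(method="pearson")   # 74 x 74 inter- and intra-participant matrix

a_cols = [c for c in columns if c.startswith("Pa")]
b_cols = [c for c in columns if c.startswith("Pb")]


def mean_off_diagonal(block):
    """Average correlation between all different pairs (diagonal excluded)."""
    values = block.to_numpy()
    mask = ~np.eye(len(values), dtype=bool)
    return values[mask].mean()


mean_a = mean_off_diagonal(corr.loc[a_cols, a_cols])    # within Time A
mean_b = mean_off_diagonal(corr.loc[b_cols, b_cols])    # within Time B
mean_ab = mean_off_diagonal(corr.loc[a_cols, b_cols])   # across times; same-person pairs
                                                        # on the crossed diagonal excluded
print(round(mean_a, 2), round(mean_b, 2), round(mean_ab, 2))
```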
In testing hypothesis one, factor stability was measured by comparing the number of factors (viewpoints) extracted from the factor analysis of VQ sorts. For this, the VQ sort data at Times A and B were examined both combined and separately. In testing hypothesis two, the patterns of defining sorts (participants' VQ sorts significantly correlating with a factor) at Times A and B were compared using McNemar's test of significance. Finally, in testing hypothesis three, a linear regression analysis was conducted to examine whether the time spent on VQ surveys predicts the reliability of the resultant VQ sorts.
Hence, two separate factor analyses were performed at Times A and B, using principal component analysis as the method of factor extraction followed by varimax rotation (Stephenson, 1935). Eight factors with eigenvalues above one were retained for further consideration (Shinebourne, 2009). Factors with two or more defining sorts were kept for further analysis (Brown, 1980).
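A bare-bones sketch of this extraction step (by-person principal component analysis with a plain numpy varimax rotation and the eigenvalue-above-one criterion) is given below; it is a generic illustration under those assumptions, not the KenQ implementation. Each row of the returned matrix holds one person's loadings on the retained, rotated factors.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loading matrix with the standard varimax criterion."""
    p, k = loadings.shape
    rotation, total = np.eye(k), 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        rotation = u @ vt
        previous, total = total, s.sum()
        if previous != 0 and total / previous < 1 + tol:
            break
    return loadings @ rotation


def extract_factors(sorts):
    """By-person PCA: `sorts` holds one column per person (Q sort), one row per Q item."""
    corr = np.corrcoef(sorts, rowvar=False)      # person-by-person correlation matrix
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    keep = eigenvalues > 1.0                     # Kaiser criterion: eigenvalues above one
    loadings = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return varimax(loadings)                     # persons x retained factors
```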
Findings
Composite reliability of VQ sorts
The composite reliability of the VQ sorts was calculated via Spearman's (1913) formula for the correlation of sums or differences (p = number of sorts at Time A, q = number of sorts at Time B) (Brown, 1971). The correlations between all different pairs averaged .42, .41, and .38 within Time A, within Time B, and across Times A and B, respectively; these mean correlations were entered into the formula to obtain the composite reliability estimate.
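As a hedged sketch, the correlation-of-sums calculation for standardized sorts can be written as below; the assignment of the three reported mean correlations to the formula's terms, and therefore the printed value, are illustrative assumptions rather than the article's reported estimate.

```python
import math


def composite_reliability(p, q, r_aa, r_bb, r_ab):
    """Correlation between the sum of p Time-A sorts and the sum of q Time-B sorts,
    given the mean within-time correlations (r_aa, r_bb) and the mean cross-time
    correlation (r_ab) of standardized Q sorts."""
    numerator = p * q * r_ab
    variance_a = p + p * (p - 1) * r_aa
    variance_b = q + q * (q - 1) * r_bb
    return numerator / math.sqrt(variance_a * variance_b)


# Illustrative only: plugging the reported mean correlations into this formula;
# the article's own estimate may have been computed differently.
print(round(composite_reliability(p=37, q=37, r_aa=0.42, r_bb=0.41, r_ab=0.38), 2))
```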
Factor stability
Both the combined and the separate factor analyses of the VQ sorts at Times A and B generated a similar number of factors (three), with composite factor reliabilities of over 95%. A factor should have at least five defining sorts, an eigenvalue above one, and seem theoretically sound (Brown, 2002). The VQ sort factor loadings that exceeded the significance threshold were flagged as defining sorts.
Generated factors at Times A and B.
Comparing the pattern of defining sorts, separate and combined.
The blue and copper colored cells identify VQ sorts loaded under only one factor and under different factors, respectively. The red cells identify VQ sorts loaded under the same factor. The red and green arrows mark times below the first and above the third quartiles, respectively. The yellow circles identify times between the first and third quartiles.
In analyzing the pattern of defining VQ sorts, the extracted factors and constituent defining sorts were positioned side by side at Times A and B. Each row represented a person’s correlation with the factor (factor loading). The visual comparison of the pattern of defining sorts (highlighted by red cells) revealed marked discrepancies between Times A and B when VQ sorts were separately analyzed. However, the pattern of defining VQ sorts after the factor analysis of combined VQ sorts revealed high similarity (Table 4).
For the combined model, McNemar's test of significance for the within-subject binary dependent variable (1 = loaded under the same factor and 0 = loaded under a different factor) was used to examine the difference in the pattern of participants' factor membership (Table 5). The results indicate that the participants' factor membership did not change significantly over the 1-week period between the two tests (p = .80). Hence, hypothesis two was conditionally accepted: the viewholders remain the same only if the VQ sorts are combined and then factor analyzed.
McNemar’s test of difference between combined VQ sorts.
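McNemar's test on paired binary memberships of this kind can be reproduced with statsmodels, as sketched below. The data are hypothetical, and the coding shown (whether each participant's VQ sort was a defining sort at Time A versus Time B) is one plausible reading of the binary variable described above.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired binary outcomes, one entry per participant:
# 1 = the participant's VQ sort was a defining sort for its factor, 0 = it was not.
time_a = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1])
time_b = np.array([1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])

# 2 x 2 table of (Time A outcome) x (Time B outcome).
table = np.array([
    [np.sum((time_a == 1) & (time_b == 1)), np.sum((time_a == 1) & (time_b == 0))],
    [np.sum((time_a == 0) & (time_b == 1)), np.sum((time_a == 0) & (time_b == 0))],
])

result = mcnemar(table, exact=True)   # exact binomial version, suitable for small samples
print(result.statistic, result.pvalue)
```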
Effect of time on reliability of VQ sorts
In testing hypothesis three, a linear regression was run to examine the effect of the average time taken on the surveys (independent variable) on the correlation of VQ sorts at Times A and B (dependent variable) as a measure of reliability. On average, participants needed roughly twice as much time (in seconds) to complete the VQ survey at Time A (M = 779.78, SD = 158.45) as at Time B (M = 412.19, SD = 78.47). On average, participants spent the most time on Step 3 (M = 222.95, SD = 83.34), followed by Step 2 (M = 160.22, SD = 47.36), Step 4 (M = 149.87, SD = 138.28), and Step 5 (M = 47.41, SD = 20.21). Time for Step 1 was not recorded.
To assess linearity, several preconditions were checked. The scatterplot of the average total time taken on the surveys against the correlation of VQ sorts, with a superimposed regression line, indicated a positive linear relationship. Also, there was no observable pattern in the residual-by-predicted plot; the spread of residuals around zero was balanced (Figure 7). Dividing skewness by its standard error for the correlations (−.752/.398 = −1.89) and for mean time (−.300/.398 = −.75) yielded values within ±1.96 (p < .05), indicating an acceptable normal distribution (Table 6) (Mertler and Reinhart, 2016). The box plot inspection of the correlations (dependent variable) showed the presence of two outliers (rp1 = .13 and rp8 = .20), which were removed from the analysis. Furthermore, the box plot inspection of mean time (independent variable) suggested the removal of P36 as an outlier in Steps 1 and 2 (Figure 8).

The identification of outliers.
Descriptive statistics of the dependent and independent variables.

Residual by predicted plot (left). Average time–correlation scatterplot (right).
The results show that the average total time taken on the two surveys did not significantly predict the correlation of VQ sorts (reliability estimate), F(1, 35) = 1.772, p > .05. The average time accounted for 5.1% of the variation in VQ sorts' correlations, with adjusted R2 = 2.2% (Tables 7 and 8), which according to Cohen (1988) is a small effect size (hypothesis three rejected).
ANOVA.
Simple regression model for correlation coefficient of VQ sorts (n = 37).
B: unstandardized beta coefficient; SE: standard error; β: standardized beta coefficient; t: t-test statistic; Sig.: significance value.
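The simple regression reported above can be reproduced along the lines sketched below; the data frame, column names, and values are hypothetical, and the skewness standard error is approximated as √(6/n), which is roughly the .398 reported in Table 6 for n = 37.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import skew

rng = np.random.default_rng(1)
# Hypothetical per-participant data: mean completion time (s) and test-retest correlation.
data = pd.DataFrame({"mean_time": rng.normal(600, 120, 37),
                     "retest_r": rng.uniform(0.2, 0.8, 37)})

# Rough normality screen: skewness divided by its (approximate) standard error.
se_skew = (6 / len(data)) ** 0.5
print(skew(data["retest_r"]) / se_skew)    # within +/- 1.96 suggests acceptable normality

# Simple OLS of reliability on average completion time.
X = sm.add_constant(data["mean_time"])
model = sm.OLS(data["retest_r"], X).fit()
print(model.fvalue, model.rsquared, model.rsquared_adj)   # F statistic, R^2, adjusted R^2
```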
Overall satisfaction, effectiveness, and relative efficiency
In this test, all the students were able to finish the survey; therefore, 100% relative efficiency and effectiveness were recorded for the VQ survey. The results of the SUS questionnaire on the VQ survey interface revealed an above-average score (M = 69.31, SD = 10.65), exceeding the cutoff of 68 used for comparing the usability of different systems (Alomar et al., 2016). This finding is notable given that Kortum and Bangor (2013), in their evaluation of 14 everyday products, reported average SUS scores of 56.5 for Excel, 74.6 for PowerPoint, 78.5 for the iPhone, and 81.8 for Amazon.
The item with the greatest impact on the participants' overall satisfaction was their self-sufficiency in completing the VQ survey (item 4) (M = 4.53, SD = .68). Indeed, this finding is highly promising for an online research tool whose survey-takers cannot interact with the researcher or a technician for assistance. In contrast, they gave the lowest scores to the first item (M = 2.88, SD = .79), which can potentially be attributed to their disinterest and/or sense of burden in retaking a survey within a short period (Table 9).
The SUS scores.
Stage three: methods agreement analysis
The purpose of this section was to uncover if the VQMethod can substitute the TPQM, that is, to provide identical results to the ones from TPQM. For this, a methods agreement analysis was performed to contrast the patterns of Q sorts and VQ sorts of 10 participants over two identical surveys and within a time frame of 1 week.
The methods agreement analysis sheds light on the extent to which one method systematically overestimates or underestimates the results of another method. Although the accuracy of beliefs (validity) is irrelevant to Q method research (Kampen and Tamás, 2014), the agreement between the methods clarifies whether Q method researchers can alternate between the VQMethod and the TPQM, given knowledge of the possible differences in their Q sort patterns (Berchtold, 2016).
Sample and data collection
A convenience sample of 10 students aged 18–24 was recruited to take a Q survey and a VQ survey in two phases. Five randomly assigned participants started with the Q survey and five with the VQ survey. The Q survey was completed in a controlled private room on a rectangular table (Figure 9) and the VQ survey at a location of their choice with access to the Internet. For each participant, the two rounds were completed within a 1-week to 10-day time frame. The participants taking the Q surveys were given printed step-by-step instructions and a consent form identical to those of the VQ survey. The printed grid, with a distribution of −4(3), −3(3), −2(5), −1(6), 0(6), +1(6), +2(5), +3(3), and +4(3) and the three boxes positioned above it, was sized as a paper sheet of 28 × 32 in. The Q items were evenly sized at 4 × 4 in.

Student taking the paper-based survey.
Data analysis
The Bland–Altman (BA) diagram has been extensively used for the analysis of agreement between two measurement methods (Bunce, 2009). The resulting graph is a scatterplot in which the Y-axis shows the difference and the X-axis represents the average of the paired measurements. The BA diagram suggests that 95% of the differences should lie within approximately ±2 SD of the mean difference. However, the BA diagram does not determine whether the agreement is sufficient. Nonetheless, the difference can be said to be significant if the line of equality (Y = 0) does not lie within the confidence interval of the mean difference.
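A minimal BA sketch with matplotlib, assuming the paired Q scores of the two methods are available as arrays, is shown below; the data and variable names are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Placeholder paired Q scores (one value per participant x Q item) from the two methods.
tpqm = rng.integers(-4, 5, 400).astype(float)
vqm = np.clip(tpqm + rng.normal(0, 0.6, 400), -4, 4)

differences = vqm - tpqm
means = (vqm + tpqm) / 2
bias = differences.mean()
limit = 1.96 * differences.std(ddof=1)     # 95% limits of agreement around the bias

plt.scatter(means, differences, alpha=0.4)
plt.axhline(bias, color="black")           # mean difference (bias)
plt.axhline(bias + limit, linestyle="--")
plt.axhline(bias - limit, linestyle="--")
plt.xlabel("Mean of the two methods")
plt.ylabel("Difference (VQMethod - TPQM)")
plt.show()
```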
Findings
The positioning of Q items in the grid (henceforth Q score) was recorded, and the differences and the means of the paired Q scores were computed (Table 10). As a foundational prerequisite for the methods agreement analysis, the scatterplot of the mean and SD of Q scores at repeated measures was visually checked for each method. The inspection did not reveal any pattern of relationship (Figure 10).
The schematic data entry for the Bland–Altman diagram.
P1Q1: participant number one’s scores for Q item 1.

The mean-SD plot for TPQM (left) and VQMethod (right).
Next, the means and the differences of the paired Q scores were plotted on the X and Y axes, respectively, shaping the BA diagram (Figure 11). The diagram shows that the distribution of differences between the Q scores of the TPQM and the VQMethod is normal. Moreover, given that the sum of Q scores in a symmetrical distribution is always zero, both the differences (M = −.0025, SD = 2.17) and the means (M = .01, SD = 1.97) approximated zero.

The Bland–Altman diagram.
The SD of the differences was computed to generate the 95% limits of agreement between the two methods: mean difference ±1.96 × (SD of differences) = 0 ± 1.96 × .6 = ±1.18. This means that, on average, 95% of the Q score differences between the two methods lie within ±1.18, with a 95% confidence interval (CI) of [−1.58, −.88] for the lower limit and [.89, 1.58] for the upper limit. The apparent trend in the diagram, with more data points in the middle than at the sides, reflects the symmetrical bell-shaped distribution of the grid and is not an indicator of systematic bias.
Nonetheless, because the Q scores of both methods are restricted to the range −4 to +4, it is unlikely that the correlation between the Q scores of the two methods is distorted by differences in data range. Hence, an ordinary least-squares linear regression was performed to assess the convergence of the (V)Q scores between the two methods. The 45° diagonal line in Figure 12 marks perfect equality between the two variables. The closeness of this line to the regression line (red line) suggests a strong linear relationship between the TPQM and the VQMethod. The analysis showed that the slope did not differ significantly from one (b = 0.9287, t(399) = −1.06, p = .29). Equation: VQMethod = 0.00357 + 0.9287 × TPQM (Table 11).

The line of equality between two methods.
Ordinary least-squares fit.
H0: α = 0 and β = 1; the intercept is equal to 0 and the slope is equal to 1.
H1: α ≠ 0 or β ≠ 1; the intercept is not equal to 0 or the slope is not equal to 1.
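The hypotheses above (intercept equal to 0 and slope equal to 1) can be tested with ordinary least squares and Wald-type t tests in statsmodels, as sketched below with placeholder data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
tpqm = rng.integers(-4, 5, 400).astype(float)          # placeholder paired Q scores
vqm = np.clip(tpqm + rng.normal(0, 0.6, 400), -4, 4)

X = sm.add_constant(tpqm)                              # exog columns named 'const' and 'x1'
fit = sm.OLS(vqm, X).fit()

# t tests of the two null hypotheses: intercept equal to 0 and slope equal to 1.
print(fit.t_test("const = 0"))
print(fit.t_test("x1 = 1"))
print(fit.params)                                      # fitted intercept and slope
```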
Discussion and conclusion
The purpose of this study was to evaluate the usability and psychometric properties of the VQMethod research tool in contrast with the TPQM. This study was conducted in three stages: (1) a usability test informing the improvement of VQMethod's survey interface, (2) a test–retest reliability analysis, and (3) a methods agreement analysis.
In stage one, through observation of 31 potential survey-takers at a local museum of natural sciences, several problems were identified. Moreover, the participants sketched several design ideas and explained the degree of difficulty they experienced at each stage. While the survey-takers' high level of satisfaction was substantial, the authors attribute it in part to the game-like and visual elements and to the lively and interactive nature of VQ surveys (Sleep and Puleston, 2011). Likewise, some participants even compared it with solitaire games that are built upon drag and drop features. This finding is similar to that of Guin, Baker, Mechling, and Ruyle (2012), who found that visually functional and gamified treatments produce more satisfaction among research participants. However, it should be noted that an overly rich media design may in turn increase respondents' confusion and lead them to opt out (Malinoff, 2010).
Traditional surveys reminded the participants of series of repetitious questions leading to boredom, a sense of burden, and impatience (Crawford et al., 2001). In contrast, the participants were surprised to see how a survey could request minimal information and even engage them in a fun, game-like manner. After the usability tests were conducted, participants' opinions on how to improve the interface reached a level of saturation. Subsequently, the VQ survey interface was improved, and the reliability and methods agreement tests were carried out.
In stage two, the test–retest analysis of the VQ surveys revealed that the composite reliability estimate of VQ sorts is higher than that of the TPQM. Simply put, participants reproduce their VQ sort patterns more precisely with the VQMethod than with the TPQM. Moreover, the number of viewpoints and the pattern of defining sorts over repeated measures, as indicators of reliability, remained stable. Nonetheless, the latter held only when the VQ sort data at Times A and B were combined rather than analyzed separately (hypothesis one fully and hypothesis two partially accepted). This combined method of analysis would be highly suitable when contrasting the viewpoints of two populations, that is, blending the Q sort data of the two samples rather than juxtaposing them. Otherwise, the interpretation of differences and similarities will be arbitrary and even faulty.
Nonetheless, some scholars have proposed second-order factor analysis as a method of comparison between two groups (Dennis, 1990; Rhoads and Brown, 2002). For example, the factor arrays from group A (constituting factors 1, 2, and 3) and group B (constituting factors 4, 5, and 6) are combined and factor analyzed (refactored) to determine whether the factors within the two separate groups are comparable, that is, whether second-order factor K contains factors 1 and 4, second-order factor L contains factors 2 and 5, and second-order factor M contains factors 3 and 6. However, this method eventually leads back to first-order factor analysis and side-by-side interpretation of the data, against which this study cautions.
The average time spent taking the VQ survey did not significantly predict the reliability of VQ sorts (hypothesis three was rejected). This finding is substantial given that online surveys have a high potential to decrease respondents' cognitive engagement and to increase their boredom and random responding (Guin et al., 2012). In stage three, the methods agreement analysis showed that, on average, 95% of the differences between the VQ scores and the Q scores fall within ±1.18, that is, somewhat larger than a one-column difference. However, it remains unclear how significantly this difference would affect data analysis and interpretation in Q method research. Nonetheless, the close match of the regression line with the 45° line of equality demonstrates the high correlation between the VQ scores and the Q scores.
Furthermore, the visual analysis of the BA diagram illuminated its potential for diagnosing the Q items (images in this study). For example, the reliability of the Q items can be investigated by asking a number of participants to take a Q survey twice, then calculating the mean and SD of the Q scores to inform the BA diagram. If any Q item falls outside the 95% limits of agreement, something may be spoiling that Q item, such as poor wording or an unclear or confusing image. The Q item can then be revised or replaced. Moreover, the BA diagram can potentially be used in longitudinal research to discover participants' changing attitudes toward the Q items. For this, a unimodal distribution of the grid can better demonstrate visually the increase or decrease in participants' Q scores over time, since the shape of the BA diagram is considerably influenced by the shape of the grid.
In sum, the VQMethod online research tool proved to be highly usable and rigorous in the conduct of visual Q method research, providing reliable data that converge closely with those of the TPQM.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
