
Abstract

The primary purpose of this paper is to advocate the use of multidimensional scaling (MDS) preference plot to study relationships among variables and individual differences in these variables. MDS preference plot is not a new visual technique; nevertheless, its application to visualize individual differences in variables for high-dimensional data is rare, particularly in education and social sciences. We illustrated its application using a real example in an educational setting. The results indicate that the MDS preference plot is a viable visualization technique for data mining and analytics. Traditional statistical methods, such as the analysis of variance, can be used to further support the visual analysis results.

Keywords

Machine learning, visual analytics, MDS preference plot, data mining, biplot

Visual analytics is the science of analytical reasoning via the visual interface (Thomas and Cook, 2005). Vieira et al. (2018) define visual learning analytics as the use of visualization techniques to understand behavioral phenomena. Visual analytics integrates visual representations of data and data analysis in a coherent fashion to enhance our understanding of complex data, because it leverages our perceptual abilities to detect patterns in a large volume of data more effectively and efficiently (Fekete et al., 2008). Carney and Levin (2002) indicate that individuals understand and remember information better when it is presented in a visual format than when it is presented in words or sentences. As the volume and variety of data grow, data visualization becomes an increasingly important part of data analytics in educational and social science research. Advances in computer technologies and the availability of graphic software have made visual analytics more accessible, and it has received more attention in recent years. For example, the common goals in educational research have been modeling student behavior, improving assessment, predicting learning outcomes, enhancing student emotional well-being and social skills, and reducing dropout or delinquency. Various visualization techniques have been used to meet these goals in addition to more conventional methods such as clustering, regression, and analysis of variance (e.g. Davis et al., 2016; Hsiao et al., 2017; Martinez-Maldonado et al., 2016; Xing et al., 2015).

Many researchers have discussed the advantages of visualization (e.g. Pastore et al., 2017; Tay et al., 2016). For example, Otten et al. (2015) state that graphs improve communication between researchers and the public because sophisticated statistical methods can be understood relatively quickly when the information is presented graphically. Information in graphics can also help formulate models and serve as an essential first step to guide further analytic decisions (Butner et al., 2015). Studies from neuroscience indicate that human visual processing can encode information in a tenth of a second, and nearly half of the brain is devoted to visual processing (Abbott et al., 2012; Otten et al., 2015; Semetko and Scammell, 2012). Some researchers recommend that a graphic accompany every statistical model (e.g. Fife, 2020; Fife and Rodgers, 2022; Tay et al., 2016).

Regarding visualization techniques, Vieira et al. (2018) found in their literature review that the most used visualization techniques included the bar chart or histogram, line plot, scatter plot, bubble plot, radar plot, timeline, pie chart, concept map, correlation plot or heat map, word cloud, and word tree. More novel visualization methods included the interaction matrix, glyph, geomap, spiral timeline, circular graph, adjacency matrix, and social network graph (Ertug et al., 2018; Hsiao et al., 2017; Vieira et al., 2018). However, most of these visualization methods are used to show relations between variables (e.g. variable distributions, trends, and relationships) or among individuals. For instance, the correlation plot shows relationships among variables using different shades of color, with a dark color showing a strong relationship and a light color showing a weak relationship. The word cloud shows groupings of words according to usage frequency, and the social network graph shows connections among individuals. How about simultaneously visualizing the clustering of variables and individuals on the same graph? How should we visually investigate relationships among variables and individual differences in “preference” for the behaviors assessed by these variables? The visualization methods mentioned above fall short since they cannot easily classify or group variables and individuals according to specific characteristics (Heer et al., 2010). For example, a correlation or adjacency matrix can show relationships between variables or between individuals but not both. When we deal with large volumes of data, simultaneously grouping many variables and individuals visually can enhance the quality of the data analysis by quickly determining the focus of the analysis.

In this paper, we suggest the use of the multidimensional scaling (MDS) preference plot (also known as perceptual mapping) as an additional graphic representation that can simultaneously capture the clustering of variables and individuals on the same graph and show the relationships between clustered variables and individuals (Carroll, 1972). As a visualization technique, the MDS preference plot is a clustering plot representing complex and nonlinear relationships among variables and individuals. Equation-based statistical methods (e.g. analysis of variance or regression modeling) can then be used for further analyses based on information derived from the plot. In other words, we can follow the MDS preference plot with statistical analysis that tests the patterns seen in the visual display. This paper aims to illustrate MDS preference visual analysis, accompanied by a statistical model for numeric testing, in conducting research. We will use an example with real data to highlight this process.

The framework of the MDS preference plot

This paper illustrates the innovative use of the MDS preference plot in research, particularly in education and social sciences. The MDS preference plot visualizes proximity data among a group of variables and individuals in a low-dimensional space (Carroll, 1972). With the advent of big data and data analytics, MDS is considered part of visual analytics and unsupervised machine learning and works to embed high-dimensional data points in some low-dimensional space (Buja et al., 2008). It may be one of the most powerful visualization methods for analyzing high-dimensional data, and it can project individual preferences embedded in high-dimensional data into a low-dimensional space, typically a two-dimensional space for easy visualization. More technically, given high-dimensional data $v_1, \ldots, v_N \in \mathbb{R}^k$, the MDS preference plot attempts to find the configuration of variables and people in a low-dimensional Euclidean space by embedding variables $v = 1, \ldots, V$ and people $p = 1, \ldots, P$ and plotting the coordinates of both on the same plot, allowing us to examine visually the relational patterns embedded in the plot.

Since its introduction over 40 years ago, the MDS preference plot has been underutilized as a visualization technique in educational research, particularly as a visual analytic tool for high-dimensional data, as evidenced by the few studies that have used it. However, recent years have seen growth in applying artificial intelligence in education (Chen et al., 2020; Roll and Wylie, 2016), such as intelligent tutoring (Hwang et al., 2020) and machine learning as a classification and predictive system (Musso and Cascallar, 2009; Musso et al., 2020). As a visual analytics method, the MDS preference plot¹ can simultaneously depict relationships among variables, relationships among individuals, and individual differences concerning these variables within an unsupervised machine learning framework.

The MDS preference plot is not a new visual technique and is closely related to a biplot (Gabriel, 1971; Gower et al., 2011), a joint display of rows and columns in a two-dimensional space. In a biplot, the row coordinates (i.e. individuals) are plotted as points, and the column coordinates (i.e. variables) are plotted as vectors. It is a visual display of multivariate data, and its underlying methodology can be used for a novel approach to data analysis and decision making (Roux and Gardner, 2006). Gower et al. (2011) consider a biplot a multivariate extension of an ordinary scatter plot. There are different variants of the biplot, such as the principal component analysis (PCA) biplot and the MDS preference plot. The MDS preference plot is basically a PCA biplot on the transposed input data matrix. In applying MDS preference plot analysis, we need to consider the data as a matrix of individuals’ preference ratings on a set of variables. A high score indicates that individuals are more likely to endorse the behavior assessed by that variable. For example, if a person gives a rating of 5 for the item “Because there is no support available to me, I would not do it now” on a scale of 1–5, with 5 indicating a high level of endorsement of this item, we can consider this rating to represent his/her behavior preference. This type of data is common when we use surveys or questionnaires. We could use the MDS preference plot to visualize this preference, with variables represented by points and subjects represented by vectors, showing individual differences in preferences.
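To make the relationship to the PCA biplot concrete, the following is a minimal sketch, not the authors' implementation, of how two-dimensional coordinates could be obtained from a small ratings matrix. It assumes that each individual's ratings are centered across variables and that the two leading principal components are kept; the function names (`preference_plot_coords`, `top_eigenpairs`) and the toy ratings are hypothetical, and a plain power iteration stands in for a library eigen-solver.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center_columns(rows):
    """Subtract each column's mean so that every column sums to zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_eigenpairs(sym, k=2, iters=500):
    """Leading k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (adequate for small examples)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 / (i + 1) for i in range(n)]   # arbitrary start vector
        norm = math.sqrt(dot(vec, vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            nxt = [dot(row, vec) for row in work]
            norm = math.sqrt(dot(nxt, nxt))
            if norm < 1e-12:                      # nothing left to extract
                break
            vec = [x / norm for x in nxt]
        value = dot(vec, [dot(row, vec) for row in work])
        pairs.append((value, vec))
        # Deflate: remove the component just found before seeking the next.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return pairs

def preference_plot_coords(ratings):
    """ratings[p][v] is the rating individual p gave to variable v.
    Returns (points, vectors): one 2-D point per variable and one 2-D
    vector per individual (assumed centering and scaling, see lead-in)."""
    transposed = [list(col) for col in zip(*ratings)]   # variables x individuals
    centered = center_columns(transposed)               # center each individual
    n_people = len(ratings)
    cross = [[sum(row[a] * row[b] for row in centered) for b in range(n_people)]
             for a in range(n_people)]
    (_, w1), (_, w2) = top_eigenpairs(cross, k=2)
    points = [(dot(row, w1), dot(row, w2)) for row in centered]  # component scores
    vectors = list(zip(w1, w2))                                  # loadings
    return points, vectors

# Hypothetical ratings: 4 individuals (rows) by 5 items (columns), scale 1-5.
ratings = [
    [5, 5, 1, 1, 3],
    [5, 4, 1, 2, 3],
    [1, 1, 5, 5, 3],
    [2, 1, 5, 4, 3],
]
points, vectors = preference_plot_coords(ratings)
```

Under these assumptions, the inner product of a variable's point with an individual's vector approximates that individual's centered rating of the variable, which is why a vector pointing toward a cluster of points can be read as a preference for those items.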

Visual analysis of MDS preference plot

The analysis of the MDS preference plot is based on patterns identified in the graph. First, the MDS preference visual analysis displays the clustering of variables, and variables that show similarities are typically located together. Second, the visual analysis shows the direction and clustering of individuals in the graph. Individuals whose vectors share the same direction and approximate location are similar to one another. Since vectors represent the individuals, a vector’s length suggests the strength of that individual’s preference, and a short vector suggests that the individual shows little preference for any of the variables.

Third and more importantly, the plot shows the relationships between individuals and variables. Since individuals and variables are jointly depicted in the same graph, we can examine how the clustering of individuals is related to the clustering of variables. We may detect individual differences by looking at the direction of vectors and how individuals are clustered.
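The three readings above (vector length, vector direction, and the relation of a vector to a variable point) can be expressed numerically. The sketch below is illustrative only: the coordinates are hypothetical, and reading the projection of a point onto a vector as the degree of preference is the usual vector-model interpretation, which is assumed here rather than taken from the analysis reported in this paper.

```python
import math

def vector_length(vec):
    """Length of an individual's 2-D vector; short vectors suggest weak preference."""
    return math.hypot(vec[0], vec[1])

def angle_between(u, v):
    """Angle in degrees between two individuals' vectors (both must be non-zero).
    A small angle suggests similar preference directions."""
    cos_uv = (u[0] * v[0] + u[1] * v[1]) / (vector_length(u) * vector_length(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_uv))))

def projection_on(point, vec):
    """Signed length of a variable point's projection onto an individual's vector.
    Larger values mean the point lies further along the vector's direction."""
    return (point[0] * vec[0] + point[1] * vec[1]) / vector_length(vec)

# Hypothetical coordinates for two individuals and two items.
person_a, person_b = (0.9, 0.1), (0.05, -0.02)
item_x, item_y = (1.5, 0.2), (-1.2, 0.4)

print(vector_length(person_a), vector_length(person_b))  # b is much shorter than a
print(angle_between(person_a, person_b))                 # similarity of direction
print(projection_on(item_x, person_a) > projection_on(item_y, person_a))
```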

Figure 1 shows an example of an MDS preference plot of decision-making behaviors among 30 individuals. Participants rated their decision-making styles on a scale of 1 (strongly disagree) to 4 (strongly agree) for a set of nine items (e.g. “I always do things calmly,” “Because there is no support available to me, I would not do it now,” “I know to act differently under different conditions”). The MDS preference plot analysis was conducted regarding their decision-making style preference. As shown in Figure 1, the red points or dots represent variables or items of decision-making behaviors, and vectors indicate individuals. First, items 14, 15, and 16 are located close together on the opposite side of the other items, while items 1, 5, and 21 are close, and items 10, 19, and 22 are somewhat close together. Second, individuals’ preferences are mainly located on the right side of the plot. Four individuals (as indicated by v5, v7, v12, and v30) are away from the rest. Interestingly, cases 7 and 23 have a very short vector length, suggesting that these two persons do not prefer any items and are different from the others. Third, concerning individual differences in decision-making behaviors, no individuals seem to endorse or prefer items 14, 15, and 16. Some individuals prefer items 1, 5, and 21, while others prefer items 10, 19, and 22. However, their preference for all these items is not strong, as indicated by the relatively short vector length.

Figure 1.

Example of MDS preference visual analysis of decision-making behaviors.

We could perform further statistical analyses to test different hypotheses based on these visual findings. For instance, we could hypothesize individual differences in decision-making behaviors by location in the biplot. Specifically, we could classify individuals into different groups according to their coordinates. Then we can conduct the analysis of variance to examine the individual differences. Below we present a real example of an MDS preference plot to examine school differences in science literacy.
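A minimal sketch of this follow-up step is given below, assuming individuals are grouped by the quadrant in which their plot coordinates fall and that a one-way analysis of variance is then run on some outcome score. The quadrant numbering, the function names, and the example numbers are hypothetical; only the F statistic and its degrees of freedom are computed, and a p-value would require an F distribution from a statistics package.

```python
from statistics import mean

def quadrant(x, y):
    """Quadrant of a coordinate pair, using the conventional numbering
    (1: upper right, 2: upper left, 3: lower left, 4: lower right).
    Points on an axis are assigned to the non-negative side."""
    if x >= 0:
        return 1 if y >= 0 else 4
    return 2 if y >= 0 else 3

def group_by_quadrant(coords, scores):
    """Collect each individual's score under the quadrant of its coordinates."""
    groups = {}
    for (x, y), score in zip(coords, scores):
        groups.setdefault(quadrant(x, y), []).append(score)
    return groups

def one_way_anova_f(groups):
    """F statistic with between- and within-group degrees of freedom."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical coordinates and outcome scores for six individuals.
coords = [(0.8, 0.3), (0.6, 0.5), (0.7, 0.1), (-0.4, 0.6), (-0.5, 0.2), (-0.3, 0.4)]
scores = [1, 2, 3, 4, 5, 6]
groups = group_by_quadrant(coords, scores)
print(one_way_anova_f(list(groups.values())))
```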

A real example

Science literacy has become a fundamental skill in an increasingly complex world, and enhancing students’ science literacy is a core focus in public education (Anderson et al., 2007). Success in science, technology, engineering, and mathematics (STEM) is a national focus across the globe, and many efforts and resources have been dedicated to student achievement in this area (Young et al., 2018). For example, in the United States, schools are pressured to improve science achievement in all students (Every Student Succeeds Act, 2015), and schools are expected to conduct scientifically-based teaching practices. Schoolwide approaches to enhancing student performance are being explored, particularly in STEM education (Bybee, 2018).

Because school science literacy outcome is a function of various characteristics of school context, examining the relationships between school science literacy level and school context as assessed by various factors related to students, teachers, and administrators can reveal how these variables are associated and operate as a system that helps schools achieve better outcomes. Although assessment outcomes may not be the best indicators of school achievement level (Chingos and West, 2015), schools still attempt to improve these outcomes to enhance public perceptions of school quality (Dee, 1998; Heck, 2000). From the pragmatic perspective of school policy and practice, school administrators and teachers may be interested in knowing the associations between factors in the school context and school science literacy achievement. Although such data do not account for student differences in science learning outcomes, they can be used to examine school differences concerning school science literacy outcomes. Because many variables are involved in school success, we used the MDS preference plot as an unsupervised machine learning tool for pattern classification and prediction for school science literacy outcomes. We are interested in the question, “In what way do the schools differ regarding their science literacy outcomes?” As a machine learning approach, we described the MDS preference plot analysis based on a basic approach for applying machine learning techniques suggested by Alyahyan and Dustegor (2020).

Data source

The data used in this example were drawn from the 2015 Programme for International Student Assessment (PISA). PISA operates on a 3-year cycle; several achievement areas are assessed in each cycle, but one specific area is emphasized each time. In 2015, science performance was the focus of the assessment. According to PISA, science literacy is “the ability to engage in reasoned discourse about science and technology, which requires the competencies to: explain phenomena scientifically, evaluate and design scientific enquiry, and interpret data and evidence scientifically” (OECD, 2017: 22). In this illustration, 2015 PISA data from 179 Chinese schools were used.

Initial preparation

The study included 48 variables. These variables were derived from the student, teacher, and principal questionnaires of PISA 2015. Variables derived from the school principal questionnaire included: school resources, school climate, school leadership, the quantity of teaching staff, school type, school responsibility, and school extra-curricular activities. Variables derived from the teacher questionnaire included: educational resources, job satisfaction, school leadership, teaching and assessment practices, and teaching and teacher collaboration. Variables derived from student questionnaires included: student disposition toward collaborative work, interest in science, science learning in school, student motivation, and science-related disposition. Table 1 lists and describes all the variables used in this study.

Table 1².

Variables used to assess context for school science literacy outcomes.

Variable (name used in plot)	Description and sample items	Reliability (α)
Principal-reported variable
The student-teacher ratio (Student teacher ratio)	The student-teacher ratio was obtained by dividing the number of enrolled students by the total number of teachers.	-
Class size (Class size)	The average class size is derived from one of nine possible categories in question, ranging from “15 students or fewer” to “More than 50 students.”	-
The proportion of fully certified teachers (Proportion fully certified teacher)	The proportion of fully certified teachers was computed by dividing the number of fully certified teachers by the total number of teachers.	-
The proportion of science teachers (Proportion science teacher)	The proportion of science teachers was computed by dividing the number of science teachers by the total number of teachers.	-
The proportion of fully certified science teachers (Proportion certified science teacher)	The proportion of fully certified science teachers was computed by dividing the number of fully certified science teachers by the total number of teachers.	-
The proportion of science teachers with an ISCED 5A qualification (Proportion science major teacher)	The proportion of science teachers with an ISCED 5A qualification and a major in science was calculated by dividing the number of these teachers by the total number of science teachers.	-
Creative extra-curricular activities (Creative extra activity)	The index of creative extra-curricular activities at school was computed as the total number of the activities that occurred at school such as art club or art activities.	-
Principal’s management of school practices (Management school)	“I use student performance results to develop the school’s educational goals.”	0.87
	“I ensure that teachers work according to the school’s educational goals.”	0.87
Shortage of educational material (Material shortage)	A lack of educational material (e.g. textbooks, IT equipment, library, or laboratory material).	0.82
Shortage of educational material (Material shortage)	Inadequate or poor-quality physical infrastructure (e.g. building, grounds, heating/cooling, lighting, and acoustic systems).	0.82
Shortage of educational staff (Staff shortage)	A lack of teaching staff. Inadequate or poorly qualified assisting staff.	0.87
Student-related factors affecting the school climate (Student hindering behavior)	Student truancy. Students intimidating or bullying other students.	0.75
Teacher-related factors affecting the school climate (Teacher hindering behavior)	Teachers not meeting individual students’ needs.	0.84
	Teachers not being well prepared for classes.	0.84
Professional development (Professional development)	Our school invites specialists to conduct in-service training for teachers.	0.72
The number of computers available at school (Available computer school)	The index of availability of computers is the ratio of computers available to 15-year-olds for educational purposes to the total number of students in the modal grade for 15-year-olds.	-
The number of computers available at school connected to the internet (Available computer internet)	The index was calculated as the ratio of the number of computers available to 15-year-olds for educational purposes to the number of these computers that were connected to the internet.	-
The schools’ science-specific resources (School science resources)	It was constructed by summing up the principals’ answers to the question: Compared to other departments, our school’s science department is well equipped. We have enough laboratory material that all courses can regularly use it.	0.75
Monitoring of the practices of teachers (Monitor teaching)	Principal or senior staff observations of lessons. Observation of classes by inspectors or other persons external to the school.	0.78
Use of teacher developed assessment (Teacher developed assessment)	To make decisions about students’ retention or promotion. To group students for instructional purposes.	0.81
Use of standardized assessment (Standardized assessment)	To guide students’ learning.	0.79
Use of standardized assessment (Standardized assessment)	To make judgments about teachers’ effectiveness.	0.79
Responsibility for curriculum (Curricula responsibility)	An index of the relative level of responsibility of school staff in issues relating to curriculum and assessment, namely “establishing student assessment policies,” “choosing which textbooks are used,” “determining course content,” and “deciding which courses are offered.”	0.83
Providing teacher support (Provide teacher support)	When a teacher has problems in his/her classroom, I take the initiative to discuss matters.	0.80
Providing teacher support (Provide teacher support)	When a teacher brings up a classroom problem, we solve the problem together.	0.80
Responsibility for resources (Resource responsibility)	Establishing student disciplinary policies. Deciding on budget allocations within the school.	0.77
Teacher participation (Teacher participation)	Formulating the school budget. Establishing student assessment policies.	0.74
School autonomy (School autonomy)	Choosing which textbooks are used. Determining course content.	0.71
Teacher-reported variable
Teaching and learning interaction in the classroom (Teacher reported inquiry teaching)	I explain scientific ideas. The class corrects homework together.	0.81
Emphasis on science approaches and processes (Science emphasis)	Emphasis is given to: “Knowing basic science facts/principles.”	0.76
	“Integrating science with other subjects.”	0.76
Instruction strategies (Instruction strategy)	Assign tailored tasks to the weakest as well as to the best students.	0.72
Instruction strategies (Instruction strategy)	Read state-of-the-art papers in my scientific discipline.	0.72
Assessment practice (Assessment practice)	I develop and administer my own assessment.	0.74
Assessment practice (Assessment practice)	I have individual students answer questions in front of the class.	0.74
Teacher satisfaction with the current job environment (Teacher satisfaction with school)	I enjoy working at this school.	0.85
	I am satisfied with my performance in this school.	0.85
Satisfaction with the teaching profession (Teacher job satisfaction)	If I could decide again, I would still choose to work as a teacher.	0.81
	I would recommend my school as a good place to work.	0.81
Educational material shortage from the teacher’s perspective (Teacher perceived material shortage)	A lack of educational material (e.g. textbooks, IT equipment, library or laboratory material).	0.78
	Inadequate or poor-quality physical infrastructure (e.g. building, grounds, heating/cooling, lighting and acoustic systems).	0.78
Staff shortage from the teacher’s perspective (Teacher perceived staff shortage)	A lack of teaching staff.	0.80
	A lack of assisting staff.	0.80
Science teacher collaboration (Teacher collaboration)	We discuss the achievement requirements for science when setting tests.	0.85
Science teacher collaboration (Teacher collaboration)	My fellow science teachers benefit from my specific skills and interests.	0.85
Self-efficacy related to teaching science content (Teaching self efficacy)	Design experiments and hands-on activities for inquiry-based learning.	0.79
	Facilitate a discussion among students on how to interpret experimental findings.	0.79
Self-efficacy related to science content (Science content self efficacy)	Explain a complex scientific concept to a fellow teacher.	0.76
	Read state-of-the-art papers in my scientific discipline.	0.76
Exchange and co-ordination for teaching (Teaching exchange)	It is natural for us to cooperate on what homework to give to our students.	0.82
Exchange and co-ordination for teaching (Teaching exchange)	We exchange tasks for lessons and homework that cover a range of different levels of difficulty.	0.82
Transformational leadership from the teacher’s perspective (Teacher perceived leadership)	The principal tries to achieve consensus with all staff when defining priorities and goals in school.	0.77
	The principal treats teaching staff as professionals.	0.77
Student-reported variable
Teacher-directed science instruction (Perceived directed teaching)	The teacher explains scientific ideas.	0.81
	The teacher demonstrates an idea.	0.81
Perceived feedback (Teacher feedback)	The teacher tells me how I am performing in this course.	0.79
Perceived feedback (Teacher feedback)	The teacher tells me in which areas I can still improve.	0.79
Achievement motivation (Achieve motivation)	I want top grades in most or all of my courses. I want to be the best, whatever I do.	0.81
Science self-efficacy (Science self efficacy)	Recognize the science question that underlies a newspaper report on a health issue.	0.85
Science self-efficacy (Science self efficacy)	Interpret the scientific information provided on the labeling of food items.	0.85
Enjoying Team work (Team work)	I prefer working as part of a team to working alone.	0.78
	I am a good listener.	0.78
	I take into account what others are interested in.	0.78
Teacher support in a science class (Teacher support)	The teacher shows an interest in every student’s learning.	0.74
Teacher support in a science class (Teacher support)	The teacher gives students an opportunity to express opinions.	0.74
Instrumental motivation (Instrumental motive)	Making an effort in my science subject(s) is worth it because this will help me in the work I want to do later on.	0.86
Instrumental motivation (Instrumental motive)	Studying my science subject(s) is worthwhile for me because what I learn will improve my career prospects.	0.86
Test anxiety (Test anxiety)	I often worry that it will be difficult for me taking a test.	0.82
Test anxiety (Test anxiety)	I get very tense when I study for a test.	0.82
Enjoyment of science (Enjoyment science)	I generally have fun when I am learning science topics.	0.86
Enjoyment of science (Enjoyment science)	I am happy working on science topics.	0.86
Inquiry-based science teaching and learning practices (Perceived inquiry teaching)	The teacher explains how a science idea can be applied to a number of different phenomena (e.g. the movement of objects, substances with similar properties).	0.81
	Students are allowed to design their own experiments.	0.81
Individualized instruction (Individualized teaching)	My teacher gives hints or offers strategies that help me to solve a task.	-
Individualized instruction (Individualized teaching)	My teacher adapts the content and method to my needs.	-
School science literacy outcome (School science literacy)	Student science literacy test scores aggregated to school level.	-

The term in parentheses indicates the variable name used in the MDS preference visual analysis.
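For readers unfamiliar with the reliability column, the sketch below shows how such a coefficient is commonly computed. It assumes that α in Table 1 denotes Cronbach's alpha, which the table does not state explicitly, and the item responses are hypothetical rather than PISA data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.
    responses[r][i] is respondent r's answer to item i."""
    k = len(responses[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*responses)]  # sample variance per item
    total_var = variance([sum(row) for row in responses])   # variance of the sum score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers of five respondents to a three-item scale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items of a scale vary together, which is the sense in which the table's coefficients describe the internal consistency of each multi-item index.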

Data processing

We first aggregated the student and teacher data to the school level to create school-level variables since these variables are not nested with each other. These variables were used as a measure of the school learning context. We then checked for outliers in the data to ensure all variables’ values were within a reasonable range. No outliers were found in these variables. Next, all the continuous variables were standardized to a mean of 0 and a standard deviation of 1, as they were measured using different scales. Finally, all variables were scored so that a high score indicated a higher level of the characteristic assessed by that variable. For example, a high score for student-reported inquiry-based teaching indicated a higher average at the school level. A high score for job satisfaction reported by teachers indicated a higher average job satisfaction at the school level.
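The aggregation and standardization steps can be sketched as follows. This is an illustration under stated assumptions, not the processing script used for the PISA data: the school identifiers and values are hypothetical, school-level values are taken to be simple unweighted means, and the sample standard deviation is used for standardization.

```python
from collections import defaultdict
from statistics import mean, stdev

def aggregate_to_school(records):
    """records is a list of (school_id, value) pairs from students or teachers.
    Returns the unweighted mean value per school."""
    by_school = defaultdict(list)
    for school_id, value in records:
        by_school[school_id].append(value)
    return {school_id: mean(values) for school_id, values in by_school.items()}

def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical student-level scores nested in three schools.
records = [("S1", 2.0), ("S1", 4.0), ("S2", 5.0), ("S2", 5.0), ("S3", 7.0)]
school_means = aggregate_to_school(records)      # {"S1": 3.0, "S2": 5.0, "S3": 7.0}
z_scores = standardize(list(school_means.values()))
print(school_means, z_scores)
```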

Visual analysis and results

MDS preference plot analysis was conducted using R (R Core Team, 2013), and the subsequent analysis was conducted using SAS (SAS Institute Inc, 2013). In this illustrative example, the evaluation of the MDS preference plot was based on identifying patterns in the graph, as mentioned previously. First, the visual display of schools and variables was divided into four quadrants for discussing patterns according to each quadrant. Second, vector length indicated the degree of a school’s preference for an item, and the relatively longer vector length suggested more preference than the short vector length. Third, the angle of vectors (i.e. direction) indicated the degree of association among schools as well as item preference, with a smaller angle indicating a stronger association among schools. Fourth, schools whose vectors were in the direction of the variable indicated a preference for that variable, with a longer vector length suggesting a higher preference level. For example, if a school’s vector is pointed toward the student-reported variable “science self-efficacy,” it suggests a higher degree of student science self-efficacy at the school level.

Figure 2 shows the MDS preference plot. We could discuss the plot in terms of four quadrants. First, for easy visualization, the variable plot is shown in panel A of Figure 2. As can be seen, variables are scattered across the four quadrants. On the one hand, variables in the same quadrant tended to be more associated with one another than with those from other quadrants, notably the opposite quadrants. On the other hand, school science literacy level is more likely to be associated with variables inside the black circle than with those outside the circle.

Figure 2.

The MDS preference visual analysis. Panel (a) Variable plot and panel (b) MDS preference plot. Number labeling of the quadrant is arbitrary. The black circle in Panel A depicts the region where other variables are close to the “school science literacy” variable. We show the variable plot for easy reference when viewing the MDS preference plot. The black circle in Panel B depicts a 95% confidence interval of variable distribution.

Second, nearly all schools were located in quadrants 1 and 2, as shown in panel B of Figure 2. Schools in quadrant 1 are more likely to differ from schools in quadrant 2 for variables located above the solid black line in quadrant 1. This information indicated that these schools were more closely matched with features represented by variables in that quadrant region. Similarly, the schools in quadrant 2 are more likely to differ from schools in quadrant 1 for variables located below the solid black line in quadrant 2. Nevertheless, schools are less likely to differ for variables in the region between the two solid black lines. For instance, regarding the item “curricula responsibility,” schools in quadrant 1 showed a higher mean level than those in quadrant 2. This result is shown in panel A of Figure 3 via the mean plot with a 95% confidence interval from the analysis of variance. The finding supports the conclusion drawn from the visual analysis concerning “curricula responsibility.”

Figure 3.

Mean Plot of School Differences with 95% CI by quadrant. Panel (a) responsibility for curriculum design and assessment and panel (b) students perceived inquiry-based teaching. The mean score is in a standardized unit.

In contrast, for the item “perceived inquiry teaching,” schools in quadrant 2 showed a higher mean level than those in quadrant 1, as shown in panel B of Figure 3 via the mean plot with a 95% confidence interval from the analysis of variance. However, it is interesting that these schools do not differ concerning school science literacy achievement and that no school vectors point in the direction of that variable. Thus, we could not explicitly address the question, “In what way do schools differ regarding science literacy outcomes?” for these Chinese schools. One reason may be that the schools in this PISA sample were selected through purposeful sampling and do not differ in their science literacy level.
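The quantities displayed in a mean plot of this kind can be sketched as below. The example is a rough illustration only: the standardized scores are hypothetical, and the interval uses a normal approximation for each group separately, whereas intervals from an analysis of variance would typically rely on a t critical value and a pooled error term.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_with_ci(values, level=0.95):
    """Group mean with an approximate confidence interval (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)            # about 1.96 for 95%
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

# Hypothetical standardized scores for schools in two quadrants.
by_quadrant = {
    1: [0.4, 0.9, 0.1, 0.6, 0.5],
    2: [-0.3, -0.8, 0.0, -0.5, -0.4],
}
for quadrant_id, scores in by_quadrant.items():
    m, lower, upper = mean_with_ci(scores)
    print(quadrant_id, round(m, 2), round(lower, 2), round(upper, 2))
```

Non-overlapping intervals for two quadrants would be consistent with a group difference, although the formal test remains the analysis of variance.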

Discussion

This paper’s primary aim was to advocate using the MDS preference plot as an innovative visual method to study individual differences in outcomes for high-dimensional data. Although the MDS preference plot is not a new visualization technique, few studies in education or the social sciences have taken advantage of its visual capability for pattern discovery and prediction. Visual analytics is becoming an essential part of data analytics and machine learning. Applied to high-dimensional data, it can enhance the interpretability of analytic results and provides a convenient way to understand the relationships among variables and individual differences in these variables. In our example using PISA data, we could see that schools showed no differences in school science literacy outcomes, although they differed in other school-related aspects, such as the responsibility of school staff for curriculum and assessment.

As part of visual analytics and machine learning, the MDS preference plot identifies patterns among variables and individuals for classification and prediction, establishing connections between variables and individuals and their direct correlates. Its graphical representation captures key aspects of complex interactions between variables and displays the individual differences in behavioral preferences toward these variables. Although we tend to use mathematical equations first (e.g. hierarchical linear modeling) to study the complex relationships among variables, this process can be reversed with visual analytics (Butner et al., 2015). For instance, visual analytics can be the first step in statistical modeling, and the results or hypotheses derived from it can then be tested with traditional statistical modeling. In the example illustrated here, we started from the MDS preference plot and then used the analysis of variance to examine how schools differ in certain school features. The results of that analysis supported the expectations derived from the plot. Thus, visual analytics can aid in testing one’s expectations via visual cues.

Although we advocate using the MDS preference plot as a machine learning approach in research, the MDS preference visual analysis also has limitations. First, high-dimensional data are visualized in a low-dimensional space, which leads to information loss. However, the first two dimensions typically account for most of the variance in the high-dimensional data and are the most important visual representations of the key relationships. Second, analyzing an MDS preference plot involves a degree of subjectivity and is partly an art. For instance, we drew a circle around the “school science literacy” variable to assess its relationships with other variables in the variable plot, and deciding how big this circle should be is subjective. Similarly, we drew two solid black lines with a 50° angle in the MDS preference plot to decide school differences in the variables in these regions, although we could have drawn the lines at a different angle. These decisions were based on our own experience and our analysis of the visual results rather than on objective criteria. Thus, it is also important to stress the role of theory in developing complex visual representations.
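Both limitations can be examined numerically. The Python sketch below is illustrative only. `variance_accounted` assumes that the eigenvalues of the underlying dimensional solution are available. `region` assumes that the two solid lines pass through the origin and sit symmetrically about a reference axis, which is an assumption about the geometry rather than a description of the lines in Figure 2. The eigenvalues, coordinates, and function names are hypothetical.

```python
import math


def variance_accounted(eigenvalues, k=2):
    """Proportion of total variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def region(x, y, opening_deg=50.0, axis_deg=0.0):
    """Classify a point relative to two lines through the origin.

    The lines are assumed to lie at axis_deg +/- opening_deg / 2. Because
    lines extend in both directions, the wedge opposite the reference
    axis also counts as "between".
    """
    angle = math.degrees(math.atan2(y, x)) - axis_deg
    angle = (angle + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    half = opening_deg / 2.0
    if abs(angle) <= half or abs(angle) >= 180.0 - half:
        return "between"
    return "above" if angle > 0 else "below"


# Hypothetical eigenvalues of a four-dimensional solution
eigenvalues = [4.0, 2.0, 1.0, 1.0]
share = variance_accounted(eigenvalues)
print(f"First two dimensions account for {share:.0%} of the variance")

# Hypothetical variable coordinates in the two-dimensional plot
point = (1.0, 0.4)
label_50 = region(*point, opening_deg=50.0)
label_30 = region(*point, opening_deg=30.0)
print(f"50 degree lines: {label_50}; 30 degree lines: {label_30}")
```

In this made-up case, the same variable falls between the lines when the opening is 50° but above the upper line when it is 30°, which shows how the choice of angle can change which variables are read as differentiating the schools.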

Despite these limitations, using the MDS preference plot as an unsupervised machine learning approach effectively communicates research results via a visual technique. Methodologically, it is a novel application of MDS modeling based on high-dimensional data.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Cody Ding

Author biographies

Yan Zhang is a lecturer of applied psychology in the College of Education at Shenyang Normal University. She received her M.Ed. in applied psychology from Shenyang Normal University and is pursuing a Ph.D. from Liaoning Technical University. Her research interests include adolescent psychological adaptation and learning, machine learning, and data visualization.

Cody Ding is a professor of measurement psychology in the Department of Education Sciences & Professional Programs at the University of Missouri-St. Louis. He received his Ph.D. in educational psychology from the University of Minnesota. His research interests include multidimensional scaling models for change and preference, adolescent psychosocial adaptation, learning, neuroeducation, machine learning, and data visualization.

References

Abbott, Byrne (2012) Diminished subjective wellbeing in Schizotypy is more than just negative affect. Personality and Individual Differences 52(8): 914–918.

Alyahyan, Düstegor (2020) Predicting academic success in higher education: Literature review and best practices. International Journal of Educational Technology in Higher Education 17: 1–21.

Anderson, Lin H-S, Treagust, et al. (2007) Using large-scale assessment datasets for research in science and mathematics education: Programme for international student assessment (PISA). International Journal of Science and Mathematics Education 5: 591–614.

Buja, Swayne, Littman, et al. (2008) Data visualization with multidimensional scaling. Journal of Computational and Graphical Statistics 17(2): 444–472.

Butner, Gagnon, Geuss, et al. (2015) Utilizing topology to generate and test theories of change. Psychological Methods 20: 1–25.

Bybee (2018) STEM Education Now More Than Ever. Arlington, VA: National Science Teaching Association Press.

Carney, Levin (2002) Pictorial illustrations still improve students’ learning from text. Educational Psychology Review 14: 5–26.

Carroll (1972) Individual differences and multidimensional scaling. In: Shepard, Romney, Nerlove (eds) Multidimensional Scaling: Theory and Applications in the Behavioral Sciences (Volume 1): Theory. New York, NY: Seminar Press, pp.105–155.

Chen, Zou, Cheng, et al. (2020) Detecting latent topics and trends in educational technologies over four decades using structural topic modeling: A retrospective of all volumes of Computers & Education. Computers & Education 151: 103855.

Chingos, West (2015) The uneven performance of Arizona’s charter schools. Educational Evaluation and Policy Analysis 37(1_suppl): 120S–134S.

Davis, Chen, Hauff, et al. (2016) Gauging MOOC learners’ adherence to the designed learning path. In: Barnes, Chi, Feng (eds) Proceedings of the 9th International Conference on Educational Data Mining. Raleigh, NC: International Educational Data Mining Society, pp.54–61. Available at: http://www.educationaldatamining.org/EDM2016/proceedings/paper_63.pdf

Dee (1998) Competition and the quality of public schools. Economics of Education Review 17: 419–427.

Ertug, Gruber, Nyberg, et al. (2018) From the Editors—A brief primer on data visualization opportunities in management research. Academy of Management Journal 61(5): 1613–1625.

Every Student Succeeds Act (2015) U.S.C. Available at: https://www.congress.gov/bill/114th-congress/senate-bill/1177 (accessed 10 August 2022).

Fekete, Van Wijk, Stasko, et al. (2008) The value of information visualization. In: Kerren, Stasko, Fekete, et al. (eds) Information Visualization: Human-Centered Issues and Perspectives. Berlin, Heidelberg: Springer, pp.1–18.

Fife (2020) The eight steps of data analysis: A graphical framework to promote sound statistical analysis. Perspectives on Psychological Science 15(4): 1054–1075.

Fife, Rodgers (2022) Understanding the exploratory/confirmatory data analysis continuum: Moving beyond the “replication crisis”. American Psychologist 77: 453–466.

Gabriel (1971) The biplot graphic display of matrices with application to principal component analysis. Biometrika 58(3): 453–467.

Gower, Lubbe, Roux (2011) Understanding Biplots. Chichester: John Wiley & Sons.

Heck (2000) Examining the impact of school quality on school outcomes and improvement: A value-added approach. Educational Administration Quarterly 36(4): 513–552.

Heer, Bostock, Ogievetsky (2010) A tour through the visualization zoo. Communications of the ACM 53: 59–67.

Hsiao IYT, Lan Y-J, Kao C-L, et al. (2017) Visualization analytics for second language vocabulary learning in virtual worlds. Educational Technology & Society 20(2): 161–175.

Hwang, Xie, Wah, et al. (2020) Vision, challenges, roles, and research issues of artificial intelligence in education. Computers and Education: Artificial Intelligence 1: 100001.

Martinez-Maldonado, Pardo, Mirriahi, et al. (2016) Latux: An iterative workflow for designing, validating and deploying learning analytics visualisations. Journal of Learning Analytics 2(3): 9–39.

Musso, Cascallar (2009) Predictive systems using artificial neural networks: An introduction to concepts and applications in education and social sciences. In: Richaud, Moreno (eds) Research in Behavioral Sciences, vol. 1. Buenos Aires, Argentina: CIIPME/CONICET, pp.433–459.

Musso, Cascallar, Bostani, et al. (2020) Identifying reliable predictors of educational outcomes through machine-learning predictive modeling. Frontiers in Education 5: Article 00104.

OECD (2017) PISA 2015 Assessment and Analytical Framework: Science, Reading, Mathematics, Financial Literacy and Collaborative Problem Solving. Paris: OECD Publishing.

Otten, Cheng, Drewnowski (2015) Infographics and public policy: Using data visualization to convey complex information. Health Affairs 34(11): 1901–1907.

Pastore, Lionetti, Altoè (2017) When one shape does not fit all: A commentary essay on the use of graphs in psychological research. Frontiers in Psychology 8: 1666.

R Core Team (2013) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Roll, Wylie (2016) Evolution and revolution in artificial intelligence in education. International Journal of Artificial Intelligence in Education 26: 582–599.

Roux, Gardner (2006) Analysing your multivariate data as a pictorial: A case for applying biplot methodology? International Statistical Review/Revue Internationale de Statistique 73(3): 365–387.

SAS Institute Inc (2013) SAS/STAT® 9.4 User’s Guide. Cary, NC: SAS Institute Inc.

Semetko, Scammell (2012) The SAGE Handbook of Political Communication. Los Angeles, CA: SAGE.

Tay, Parrigon, Huang, et al. (2016) Graphical descriptives: A way to improve data transparency and methodological rigor in psychology. Perspectives on Psychological Science 11(5): 692–701.

Thomas, Cook (2005) Illuminating the Path: The Research and Development Agenda for Visual Analytics. Los Alamitos, CA: IEEE Computer Society.

Vieira, Parsons, Byrd (2018) Visual learning analytics of educational data: A systematic literature review and research agenda. Computers & Education 122: 119–135.

Xing, Wadholm, Petakovic, et al. (2015) Group learning assessment: Developing a theory-informed analytics. Educational Technology & Society 18(2): 110–128.

Young, Capraro, et al. (2018) Every student can’t succeed if every voice is not heard: Equity perspectives from STEM educators. Teachers College Record 120(13): 1–26. Available at: https://www.tcrecord.org (accessed 20 July 2021).

Using MDS preference plot as visual analytics of data: A machine learning approach