Abstract
Introduction
There is an imperative need to address accessibility for students with visual impairment or blindness in higher education.
Statement of the Problem
Statistics is a common challenge for some psychology and social science students because of its complexity, its abstract nature, and the need to model human behaviour mathematically, yet few efforts have been made to teach statistics specifically to students with visual impairment or blindness.
Literature Review
Research on the topic usually recommends accessible teaching materials or strategies, and some R packages are accessible for this student population.
Teaching Implications
The Mubsereen package (‘those who have insight’) comprises 10 functions that compute, analyze, and interpret descriptive statistics and perform mean comparisons, correlation, regression, chi-square tests, and logistic regression. To facilitate screen reader accessibility, an alternative output method is pre-programmed into the functions and can be modified as needed.
Conclusion
The package is not a stand-alone tool. We provide instructor preparations and recommendations (such as tactile printouts) for a successful lecture. The package has not yet been published on CRAN, as we intend to beta-test it in classrooms like yours before uploading it.
Students with visual impairment or blindness attend higher education at higher rates than ever before but are not achieving academic outcomes comparable to those of peers without visual disabilities (Lintangsari & Emaliana, 2020; Marcone & Penteado, 2013). Although there is a growing emphasis on inclusivity and accessibility in higher education, particularly in psychology, the experiences of students with visual impairment or blindness have received limited attention (Lourens & Swartz, 2016). Understanding the challenges and needs of these students is crucial for developing inclusive learning environments and ensuring their academic and professional success (Amponsah & Bekele, 2022; Reed & Curtis, 2012).
Common in-classroom accommodations for students with visual impairment or blindness in higher education include accessible materials, assistive technology, extended time, preferential seating, adaptive lab equipment, and verbal descriptions of visual aids (Gray & Wilkins, 2005; Silva & Motti, 2024). In addition, screen readers such as Job Access With Speech (JAWS), audio recorders, and Braille embossers have emerged as powerful tools for students with visual impairment or blindness, enabling students to access digital content through speech or even Braille output (Kirboyun, 2021). Assistive technologies facilitate converting visual information into auditory or tactile formats, thereby enhancing the accessibility of software, online resources, and electronic documents for these students.
Statistics is a particularly relevant subject in Psychology and other social sciences, as it provides the analytical foundation for conducting research and interpreting data, which is essential for academic success and future professional endeavours (Dahlstrom-Hakki & Wallace, 2022; Carrillo-García et al., 2021). Because statistics is abstract, traditional teaching relies heavily on visual representations, which create significant barriers for students with visual impairment or blindness (Godfrey & Loots, 2015; Marcone & Penteado, 2013). Limited access to educational resources and accessible instructional content impedes the learning process, widens academic disparities, and hinders the academic development of statistics students with visual impairment or blindness (Tsinajinie, 2021).
This underscores the importance of addressing the unique learning requirements of this student population, particularly in disciplines like psychology, where statistics is a fundamental component of research and practice, and in graduate programs, where students are expected to publish throughout their training (Hatch & Skipper, 2016). The challenges that students with visual impairments or blindness face in learning statistics are often compounded by the limitations and complexities of proprietary statistical software. These programs are expensive and often have poor compatibility with screen readers, which significantly hinders students’ independent learning and statistical computation (Godfrey & Loots, 2014). Together, high costs and accessibility issues create substantial barriers to these students’ academic progress and engagement with statistical analysis (Marcone & Penteado, 2013; Stone et al., 2019). Consequently, students with visual impairments or blindness need statistical processing tools that are easily available, scalable, and customizable.
R has been widely recognized as the most accessible statistical software for students, despite a steep learning curve even for sighted individuals (Godfrey, 2013). Building on R's flexibility in text input and output, as well as its compatibility with screen readers, research suggests that an R package that minimizes text input while maximizing efficiency would be most beneficial for students with visual impairments or blindness (Godfrey, 2013; Godfrey & Loots, 2015). Seminal literature on teaching this student population highlights the R packages BrailleR and tcltk, along with Braille embossers, as learning materials for these students (Erhardt & Shuman, 2015; Godfrey & Loots, 2014). However, researchers acknowledge that little progress has been made in developing tools specifically designed to perform statistical tests in an accessible manner for teaching these students (Godfrey & Loots, 2015; Stone et al., 2019; Wongkia & Poonpaiboonpipat, 2022).
Promoting inclusivity and academic equity in psychological higher education and professional practice requires improving the statistical learning experience for students with visual impairments or blindness. To address this need, we developed Mubsereen, a package that aids students in performing and interpreting hypothesis tests. The Mubsereen package tackles the unique challenges faced by students with visual impairments or blindness in learning statistics by employing a minimum input—maximum output format, utilizing an open-access statistical processor, and incorporating inclusive teaching strategies. Mubsereen distinguishes itself from other accessible resources by adhering to an Open Educational Resources (OER) framework. This approach ensures that the package is not confined to classroom use but can be utilized in various settings. Furthermore, Mubsereen has the potential to evolve into a scalable and sustainable tool for learning and applying statistics, making it an asset for these students throughout their academic and professional journeys (Koohang & Harman, 2007).
After over a year of development and alpha testing, we wish to incorporate other students and experts in the field to begin beta testing Mubsereen before publishing it on CRAN. Please contact us if you are interested.
Using the Mubsereen Package
Instructor Preparations
Setting a Working Directory
The package requires a single folder that stores all the datasets used during the lecture; we suggest a desktop folder for easy access. The path to this folder must be set as the working directory with the setwd('/…') function.
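A minimal sketch of this step follows. The folder name stats_course is hypothetical, and so that the example runs on any machine it creates the folder under R's temporary directory instead of on the desktop; in class, the instructor would pass the desktop folder's path to setwd() instead.

```r
# Hypothetical course folder; in class this would be the desktop folder
# that holds every dataset used in the lecture
course_dir <- file.path(tempdir(), "stats_course")
dir.create(course_dir, showWarnings = FALSE)

# Point R at the folder so datasets can be read by file name alone
setwd(course_dir)
getwd()        # prints the current folder, which the screen reader can announce
list.files()   # lists the datasets currently saved in the folder
```

Checking getwd() right after setwd() gives screen-reader users an audible confirmation that the path was typed correctly.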
Renaming Variables and Importing Variables into the R Environment
Given the popularity of IBM's SPSS, the package is currently programmed to import .sav datasets, but it can easily be changed to import .csv, .xlsx, .sas7bdat, or .dta files. The instructor's groundwork consists of shortening the variable names in the .sav (or other) data file so that they are easy to memorize, which shortens and simplifies the input process for students (e.g., renaming 'STAI_PRE_TEST' to 'st1' or 'stai1'). The instructor should also ensure that the data values are correctly specified in SPSS or the program used. Save the dataset in the folder previously set as the working directory.
The .sav file is imported into R using haven::read_sav('…'). We have found it easier to import the .sav dataset into a three-letter R object, as debugging and more complex variable specification are easier (e.g., iris$Sepal.Length[iris$Species == 'setosa'] vs. len[spe == 1]). To avoid using the $ operator to access variables in a dataset, the package uses the list2env(dataframe, globalenv()) function to copy each variable column into the R environment as a separate object. This way, students only have to type the shortened variable names into the functions without specifying a data frame. Select the entire script and run it to load the data frame, variable values, and functions. When a new dataset is needed in the course, we suggest the instructor follow the same steps for the new dataset and save the R script with the changes.
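The list2env() step can be sketched as follows. The read_sav() call is left as a comment because it needs the haven package and a real .sav file; the small data frame and its column names (st1, st2, grp) are hypothetical stand-ins for an instructor's dataset with already-shortened names.

```r
# In class the data would come from the instructor's SPSS file (needs haven):
# dat <- haven::read_sav("module1.sav")
# Hypothetical stand-in data frame with shortened variable names
dat <- data.frame(
  st1 = c(42, 38, 51, 47, 45, 40),  # e.g., STAI_PRE_TEST shortened to st1
  st2 = c(39, 35, 44, 41, 43, 37),  # hypothetical post-test column
  grp = c(1, 1, 1, 2, 2, 2)         # hypothetical group code
)

# Copy every column into the global environment as its own vector,
# so students can type st1 instead of dat$st1
list2env(dat, envir = globalenv())

mean(st1)        # same result as mean(dat$st1)
st1[grp == 1]    # same result as dat$st1[dat$grp == 1]
```

One consequence of this design is that the copied vectors are independent of dat: editing the data frame afterwards does not change st1, so list2env() has to be run again whenever the dataset changes.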
JAWS
The JAWS screen reader had trouble identifying and reading the console in some versions of RStudio (2023.03.1 'Cherry Blossom' and older). We therefore added an optional sink function that redirects the output to a Notepad text (.txt) file on the desktop, which we named 'output'. This option can be disabled by adding a pound sign before the sink command in each function (i.e., #sink('file path', append = F)), in which case the output is printed in the console. Because the output is not appended, the file is overwritten each time a function runs, so closing and reopening the document shows only the latest output, which is easier than scrolling through previous outputs.
Once RStudio is open with the Mubsereen script, blind users press Ctrl + 1 to move the cursor from the console to the script, Ctrl + A to select all, Ctrl + Enter to run and load the script, Ctrl + 2 to move the cursor from the script back to the console, and finally Ctrl + L to clear the console in preparation for analysis (on a Mac, use Command ⌘ instead of Control for select all and run; the cursor-moving and clear-console shortcuts use Control on both platforms). If the sink function is enabled, the analysis will not print the output to the console, so the user should switch from RStudio to the output text document for JAWS to read the output. Once the output is read, reset the process by closing the output file, returning to RStudio, and clearing the console.
For users of JAWS and other screen readers, the output announces where each function's output starts and ends, as well as where each group-specific section starts and ends.
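The redirect-and-announce pattern described above can be sketched in base R as follows. The helper run_to_file() and its start and end messages are hypothetical illustrations, not Mubsereen code, and the file goes to R's temporary directory instead of the desktop.

```r
# Hypothetical helper illustrating the sink() pattern; not Mubsereen code
run_to_file <- function(analysis, file) {
  sink(file, append = FALSE)   # overwrite, so the file holds only the latest output
  on.exit(sink(), add = TRUE)  # always send output back to the console, even on error
  cat("Output starts.\n")      # audible landmark for the screen reader
  analysis()
  cat("Output ends.\n")
  invisible(file)
}

# tempdir() stands in for the desktop 'output' file
out_file <- file.path(tempdir(), "output.txt")
run_to_file(function() cat("The mean is", mean(c(2, 4, 6)), "\n"), out_file)
readLines(out_file)
```

Using on.exit() means a failed analysis cannot leave the console silently diverted. Note that sink() with its default type only captures printed output; warnings and messages still appear in the console.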
Printing
To complement the package, we recommend including tactile graphics. We used the PIAF® tactile image maker, swell touch paper, and a laser printer. The instructor could prepare boxplots, scatterplots, distributions, and other tactile representations of statistical concepts in advance (e.g., the distance between a point and its distribution's mean, the distance between two sample means, the distances between three sample distributions and a grand mean, overlapping confidence intervals, etc.). We found that the BrailleKiama font at 24 points on 13.33 by 7.5-inch slides worked best.
Functions
qhe(y): Exploration Function
This function analyzes the univariate distribution of a variable (i.e., y). It provides the mean, standard deviation, variance, standard error of the mean, 95% confidence interval, and minimum and maximum values. It also reports and interprets the Shapiro-Wilk test statistic, p-value, skewness, and kurtosis to determine whether the examined variable is normally distributed, approximately normally distributed, or not normally distributed. The cut-off values are |2| for skewness and |6| for kurtosis, as suggested by Kim (2013).
The qhe function incorporates the describe function from the psych package and requests type = 2 for skewness and kurtosis so that the values match SPSS output. It also uses the MeanCI function from DescTools for the 95% confidence interval and the shapiro_test function from rstatix for the Shapiro-Wilk test statistic and p-value. See the qhe function script in the OSF materials for the code (Fuentes-Balderrama, 2024).
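As a rough base-R illustration of what qhe() computes, the sketch below writes out the "type 2" (SPSS-style) skewness and kurtosis formulas instead of calling psych::describe, and uses stats::shapiro.test in place of rstatix::shapiro_test. The three-way verdict rule (Shapiro-Wilk first, then the |2| and |6| cut-offs) is our assumption about how the checks are combined; the function name explore() is hypothetical.

```r
# Hypothetical base-R sketch of the quantities qhe() reports
explore <- function(y, skew_cut = 2, kurt_cut = 6) {
  y <- y[!is.na(y)]
  n <- length(y)
  m <- mean(y)
  s <- sd(y)
  se <- s / sqrt(n)
  ci <- m + c(-1, 1) * qt(0.975, df = n - 1) * se   # 95% CI of the mean

  # Central sample moments
  m2 <- mean((y - m)^2)
  m3 <- mean((y - m)^3)
  m4 <- mean((y - m)^4)
  # Type 2 (SPSS-style) skewness and excess kurtosis; kurtosis needs n >= 4
  skew <- (m3 / m2^1.5) * sqrt(n * (n - 1)) / (n - 2)
  kurt <- ((n + 1) * (m4 / m2^2 - 3) + 6) * (n - 1) / ((n - 2) * (n - 3))

  sw <- shapiro.test(y)   # base R; accepts 3 to 5000 values
  # Assumed decision rule: normal if Shapiro-Wilk is non-significant,
  # otherwise approximately normal if both cut-offs are respected
  verdict <- if (sw$p.value > 0.05) {
    "normally distributed"
  } else if (abs(skew) < skew_cut && abs(kurt) < kurt_cut) {
    "approximately normally distributed"
  } else {
    "not normally distributed"
  }
  list(n = n, mean = m, sd = s, variance = s^2, se = se, ci95 = ci,
       min = min(y), max = max(y), W = unname(sw$statistic),
       p = sw$p.value, skewness = skew, kurtosis = kurt, verdict = verdict)
}

res <- explore(c(1, 2, 3, 4, 5))
res$kurtosis   # about -1.2, the kurtosis SPSS reports for the values 1 to 5
```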
qhmge(y, g): Multiple Group Exploration Function
This function provides output similar to that of the exploration function but reports the descriptive statistics of a variable (i.e., y) by group (i.e., g). In addition to the commands above, this function uses the val_label command from the labelled package to name the groups, and it changes the last three lines of each group's output to state whether that group is normally distributed, approximately normally distributed, or not normally distributed.
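A minimal sketch of the by-group idea follows, assuming hypothetical data and using a plain named character vector for the group names in place of labelled::val_label(). Each group's block is framed by spoken-style "starts" and "ends" lines, mirroring the landmarks described in the JAWS section.

```r
# Hypothetical sketch of per-group exploration; names come from a named
# vector instead of the value labels the package reads with labelled
explore_by_group <- function(y, g, labels = NULL) {
  groups <- sort(unique(g))
  out <- lapply(groups, function(lev) {
    yy <- y[g == lev]
    name <- if (is.null(labels)) as.character(lev) else labels[[as.character(lev)]]
    p <- shapiro.test(yy)$p.value
    cat("Group", name, "starts.\n")
    cat("  Mean:", round(mean(yy), 2), " SD:", round(sd(yy), 2), "\n")
    cat("  Shapiro-Wilk p value:", round(p, 3), "\n")
    cat("Group", name, "ends.\n")
    data.frame(group = name, n = length(yy), mean = mean(yy),
               sd = sd(yy), sw_p = p, stringsAsFactors = FALSE)
  })
  invisible(do.call(rbind, out))
}

y <- c(12, 15, 11, 14, 13, 20, 22, 19, 21, 23)
g <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
tab <- explore_by_group(y, g, labels = c("1" = "Control", "2" = "Treatment"))
```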
qhmc(y, g): Mean Comparisons Function
Based on the number of groups specified (i.e., g), the function determines whether the appropriate hypothesis test is a t-test or a one-way ANOVA. Then, based on Levene's test, it determines whether a Welch t-test or a Student t-test should be performed. Similarly, for three or more groups, it determines whether a Fisher's ANOVA with Scheffé's post-hoc test or a Brown-Forsythe ANOVA with Dunnett's T3 post-hoc test should be performed.
The output begins by telling the user which family of hypothesis tests was performed, based on the number of groups found in the g variable. It then states whether the homogeneity of variances assumption is fulfilled and whether the classic test or its heteroskedasticity-robust counterpart was performed. Next, the output reports the test statistic, degrees of freedom, and resulting p-value, which is interpreted for the user. The post-hoc module also provides test statistics and multiple-comparison-adjusted p values for pairwise comparisons.
The t-test module of the mean comparisons function uses the leveneTest command from the car package and the base R t.test() command, setting var.equal = TRUE or var.equal = FALSE to choose between the Student and Welch t-tests. The ANOVA module uses base R's lm(), the Anova function from car, and the ScheffeTest function from DescTools, while the Brown-Forsythe ANOVA uses the bf.test function from the onewaytests package and the dunn.test function from the dunn.test package.
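The decision rule can be sketched in base R as follows. The Levene test is written out as an ANOVA on absolute deviations from group medians, which is what car::leveneTest() computes by default. For three or more groups with unequal variances, the sketch swaps in Welch's ANOVA (oneway.test with var.equal = FALSE) because base R has no Brown-Forsythe ANOVA; the package itself uses onewaytests::bf.test(). Post-hoc tests are omitted, and the function names are hypothetical.

```r
# Median-centred Levene (Brown-Forsythe) test, matching car::leveneTest()'s
# default centring, written in base R so no package is needed
levene_p <- function(y, g) {
  g <- factor(g)
  dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(dev ~ g))[["Pr(>F)"]][1]
}

# Hypothetical sketch of the qhmc() decision rule (post-hoc tests omitted)
compare_means <- function(y, g, alpha = 0.05) {
  g <- factor(g)
  equal_var <- levene_p(y, g) > alpha
  if (nlevels(g) == 2) {
    res <- t.test(y ~ g, var.equal = equal_var)   # Student or Welch t-test
    family <- "t-test"
  } else {
    # Swapped-in technique: Welch's ANOVA, not the Brown-Forsythe ANOVA
    res <- oneway.test(y ~ g, var.equal = equal_var)
    family <- "ANOVA"
  }
  list(family = family, equal_variances = equal_var,
       statistic = unname(res$statistic), df = unname(res$parameter),
       p = res$p.value)
}

two <- compare_means(c(5.1, 4.9, 5.4, 5.0, 6.2, 6.6, 5.9, 6.1), rep(1:2, each = 4))
three <- compare_means(iris$Sepal.Length, iris$Species)
```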
qhrm(ypre, ypost, id, tx = NULL): Repeated Measures Function
The repeated measures function provides a mean comparison for two observations (i.e., ypre and ypost) of the same participant (i.e., id). The user can optionally add a condition (i.e., tx), which is tested both as a between-subjects factor and in interaction with time, the within-subjects factor. The function works in two steps. First, it restructures the data from wide to long format, which requires an id variable. Second, it performs either a paired samples t-test or a factorial repeated measures ANOVA. The package currently supports measurements at two time points and a single between-subjects factor.
The output begins by telling the user which test was performed, based on whether the tx argument (i.e., a between-subjects factor such as treatment group) is present. The paired t-test output starts with descriptive statistics at both measurements and their correlation. It then provides the arithmetic difference between the means (i.e., pre minus post), the t statistic, degrees of freedom, p-value, and an interpretation of the p-value. In contrast, the factorial repeated measures ANOVA output starts with the F ratio for the within-subjects effect of time, its degrees of freedom, p-value, and interpretation. The second part of the output reports the F ratio for the between-subjects effect of the factor, degrees of freedom, p-value, and interpretation. The third part reports the F ratio for the time × tx interaction, degrees of freedom, p-value, and interpretation. Finally, the output reports the pre- and post-test descriptive statistics for each tx group.
For the paired t-test, the first step is achieved with the pivot_longer function from the tidyr package; the function then uses base R's t.test() with paired = TRUE, and cor() to report the association between pre- and post-test observations. The factorial repeated measures ANOVA also begins with the pivot_longer() command, labels the groups using the val_labels function from labelled, and then performs the hypothesis test using anova_test from rstatix.
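The two steps can be sketched in base R with hypothetical data. The wide-to-long restructuring is done by hand instead of with tidyr::pivot_longer(), and the mixed time × tx ANOVA uses aov() with an Error() term as a stand-in for rstatix::anova_test(). For two time points these give the same F tests, but this is not the package's code.

```r
# Hypothetical wide-format data: one row per participant
wide <- data.frame(
  id   = 1:6,
  pre  = c(20, 24, 19, 30, 27, 22),
  post = c(18, 20, 19, 25, 21, 20),
  tx   = c(1, 1, 1, 2, 2, 2)
)

# Step 1: wide to long format (two rows per participant, keyed by id)
long <- data.frame(
  id   = factor(rep(wide$id, times = 2)),
  tx   = factor(rep(wide$tx, times = 2)),
  time = factor(rep(c("pre", "post"), each = nrow(wide)), levels = c("pre", "post")),
  y    = c(wide$pre, wide$post)
)

# Step 2a (no tx): paired t-test on pre minus post, plus the pre-post correlation
pt <- t.test(wide$pre, wide$post, paired = TRUE)
r_pre_post <- cor(wide$pre, wide$post)

# Step 2b (with tx): mixed ANOVA with time within and tx between participants
mixed <- aov(y ~ time * tx + Error(id / time), data = long)
summary(mixed)
```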
qhfa(y, x1, x2): Factorial ANOVA Function
The factorial ANOVA function tests for mean differences in a dependent variable (i.e., y) within the factorial design that results from crossing two independent variables (i.e., x1 and x2). The function provides output for both independent variables as main effects and output for the interaction term. Column, row, and cell means are then compared with Scheffé's post-hoc test. No new commands are introduced in this function.
The output of this function begins by telling the user which variables are used in the analysis and briefly describing the factorial design that results from crossing both independent variables. The second part of the output provides the F ratio for the first main effect, degrees of freedom, p-value, interpretation of the p-value, and, if applicable, Scheffé's post-hoc test between groups. The third section provides the same output for the second main effect. The final section provides the F ratio, degrees of freedom, p-value, interpretation of the p-value, and, if applicable, Scheffé's post-hoc test between the cells of the factorial design.
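A small base-R sketch of the underlying model, using a hypothetical balanced 2 × 2 dataset, is shown below. aov() reports sequential sums of squares, which coincide with the car::Anova() results only in balanced designs such as this one; the Scheffé post-hoc step (DescTools::ScheffeTest in the package) is left out.

```r
# Hypothetical balanced 2 x 2 design with 3 observations per cell
y  <- c(10, 12, 11, 14, 15, 13, 9, 8, 10, 20, 19, 21)
x1 <- factor(rep(c("A1", "A2"), each = 6))
x2 <- factor(rep(rep(c("B1", "B2"), each = 3), times = 2))

# Main effects of x1 and x2 plus their interaction
fit <- aov(y ~ x1 * x2)
summary(fit)

# Cell means of the factorial design, which the interaction term compares
cell_means <- tapply(y, list(x1, x2), mean)
cell_means
```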
qhanc(y, x1, x2 = NULL, c): One-Way and Factorial ANCOVA Function
The one-way or factorial ANCOVA function tests for covariate-adjusted (i.e., c) mean differences across groups in a single-factor design or in a factorial design with two factors (i.e., x1, or x1 and x2). The function detects the number of independent variables or factors and performs either a one-way or a factorial ANCOVA. Because c follows the optional x2 argument, a one-way call must name the covariate explicitly; we have found it useful to always specify which variable is the covariate by writing c = before it, whether or not a second factor is provided. This function uses base R's aov(), the emmeans() function from the emmeans package, and the dplyr pipe operator (i.e., %>%).
The output of the qhanc function begins by telling the user whether a one-way or factorial ANCOVA is being performed and lists the factor (or factors) and the covariate being used. For a one-way ANCOVA, the output then reports the F ratio for the main effect, its degrees of freedom, corresponding p-value, and interpretation. If the F ratio is significant, the package states the number of significant pairwise comparisons of estimated marginal means in the model, identifies each contrast, and provides its t statistic and Sidak-adjusted p-value. The final section of the output reports the F ratio for the covariate, degrees of freedom, p-value, and interpretation.
The factorial ANCOVA output is very similar. The first section identifies the variables used and briefly describes the factorial design (similar to the qhfa() output). The following sections provide the F ratios for each main effect, their interaction, and the covariate, along with their degrees of freedom, corresponding p values, and interpretations. If the F ratios for main effects or the interaction term are statistically significant, group contrasts are reported along with t values and Sidak-adjusted p values.
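A minimal one-way ANCOVA sketch in base R follows, with hypothetical data and a covariate named pretest (so the example does not shadow base R's c()). The covariate is entered first because aov() uses sequential sums of squares. Instead of calling emmeans, the adjusted means are computed as model predictions at the covariate's mean, which is what estimated marginal means reduce to in a one-factor ANCOVA. Sidak-adjusted contrasts are not shown.

```r
# Hypothetical data: outcome y, three-level factor x1, covariate pretest
y       <- c(14, 16, 15, 18, 20, 19, 22, 25, 23)
x1      <- factor(rep(c("g1", "g2", "g3"), each = 3))
pretest <- c(10, 12, 11, 11, 13, 12, 12, 14, 13)

# Covariate first, so the group effect is tested after adjusting for it
fit <- aov(y ~ pretest + x1)
summary(fit)

# Covariate-adjusted group means: predictions at the mean of the covariate
grid <- data.frame(x1 = factor(levels(x1), levels = levels(x1)),
                   pretest = mean(pretest))
adj_means <- setNames(predict(fit, newdata = grid), levels(x1))
adj_means
```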
qhcor(y1, y2, g = NULL): Bivariate Correlation Function
This function performs Pearson's product-moment correlation on two variables (i.e., y1, y2). Optionally, the user can add a grouping variable (i.e., g), in which case the function reports the overall correlation followed by the correlation within each group. This function uses base R's cor.test() command.
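The overall-then-by-group pattern can be sketched with cor.test() alone. The function name cor_by_group() is hypothetical, and mtcars is used only because it ships with R.

```r
# Hypothetical sketch of qhcor(): overall correlation, then one per group
cor_by_group <- function(y1, y2, g = NULL) {
  overall <- cor.test(y1, y2)   # Pearson correlation by default
  cat("Overall r =", round(overall$estimate, 2),
      ", p =", format.pval(overall$p.value, digits = 3), "\n")
  if (!is.null(g)) {
    for (lev in sort(unique(g))) {
      ct <- cor.test(y1[g == lev], y2[g == lev])   # needs 3+ cases per group
      cat("Group", lev, ": r =", round(ct$estimate, 2),
          ", p =", format.pval(ct$p.value, digits = 3), "\n")
    }
  }
  invisible(unname(overall$estimate))
}

r <- cor_by_group(mtcars$mpg, mtcars$wt, g = mtcars$am)
```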
qhreg(y, x1, x2 = NULL): Simple and Multiple Regression Function
The regression function can perform simple and multiple linear regression with at most two predictors, using base R's lm() and qf() and the lm.beta::lm.beta() command. The output begins by stating whether a simple or multiple regression is being performed. The simple regression output first identifies the dependent and independent variables and then reports and interprets the ANOVA. The second section describes the intercept of the regression equation and provides its t-test and p-value. The third section reports the standardized and unstandardized regression coefficients, their associated t-test, and a raw-score interpretation of the unstandardized coefficient. The last part of the output reports the model's R².
The multiple regression output begins by checking the correlations among the three variables. For each pair, the package provides the correlation coefficient, resulting p-value, and interpretation. The first correlation reported is between the dependent variable and the first independent variable, the second is between the dependent variable and the second independent variable, and the third is between the two independent variables. The package automatically prints a warning about multicollinearity, that is, a high correlation between the two predictors. VIF and tolerance are not reported. The second section of the output tells the user that a multiple regression was calculated, identifies the variables, and reports and interprets the ANOVA.
Like the output for simple regression, the next section describes the intercept and provides its t-test, p-value, and interpretation. The following two sections describe the standardized and unstandardized regression coefficients, their t-tests, p values, and interpretations, while the last section of the output reports the model's total R².
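The regression quantities can be sketched in base R as follows. Standardized coefficients are computed as b × sd(x) / sd(y), which is what lm.beta reports; qf() gives the critical F at α = .05. The function name reg_summary() is hypothetical. A useful check of the standardization: in simple regression, the standardized slope equals Pearson's r.

```r
# Hypothetical base-R sketch of the quantities qhreg() reports
reg_summary <- function(y, x1, x2 = NULL) {
  fit <- if (is.null(x2)) lm(y ~ x1) else lm(y ~ x1 + x2)
  s <- summary(fit)
  fstat <- s$fstatistic                              # value, numdf, dendf
  f_crit <- qf(0.95, fstat[["numdf"]], fstat[["dendf"]])
  b <- coef(fit)[-1]                                 # unstandardized slopes
  preds <- if (is.null(x2)) cbind(x1) else cbind(x1, x2)
  beta <- b * apply(preds, 2, sd) / sd(y)            # standardized slopes
  list(F = fstat[["value"]], F_crit = f_crit,
       p = pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE),
       coefficients = s$coefficients,                # estimates, SEs, t, p
       beta = beta,
       predictor_r = if (is.null(x2)) NA else cor(x1, x2),  # multicollinearity check
       r_squared = s$r.squared)
}

simple   <- reg_summary(mtcars$mpg, mtcars$wt)
multiple <- reg_summary(mtcars$mpg, mtcars$wt, mtcars$hp)
```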
qhchi(x1, x2): Chi-Square Function
This function performs the chi-square test of independence between two categorical (nominal) variables. It uses the to_factor command from the labelled package as well as base R's chisq.test() command. The output begins by telling the user the number of categories in each variable and the total number of cells in the contingency table. The second portion of the output reports the test statistic, degrees of freedom, resulting p-value, and interpretation. The final portion presents the contingency table itself.
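A short base-R sketch with hypothetical data shows the quantities the output reports: the category and cell counts from the contingency table, and the test statistic, degrees of freedom, and p-value from chisq.test(). For 2 × 2 tables, chisq.test() applies Yates' continuity correction by default.

```r
# Hypothetical categorical data: 40 participants in each of two groups
x1 <- rep(c("A", "B"), each = 40)
x2 <- c(rep(c("yes", "no"), times = c(30, 10)),
        rep(c("yes", "no"), times = c(15, 25)))

tab <- table(x1, x2)   # contingency table
cat("x1 has", nrow(tab), "categories, x2 has", ncol(tab),
    "categories, giving", length(tab), "cells.\n")

res <- chisq.test(tab)   # Yates' correction is applied to 2 x 2 tables by default
res$statistic
res$parameter
res$p.value
tab
```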
qhlog(y, x1, x2 = NULL): Logistic Regression Function
This function can perform simple and multiple logistic regressions with up to two predictors. It uses the to_factor command from labelled, the NagelkerkeR2 function from fmsb, and base R's glm(), exp(), and pchisq() commands. The output begins by stating whether a simple or multiple logistic regression is being performed, based on the number of predictors. It then continues with the chi-square omnibus test and its interpretation, as well as the intercept in log odds and its p-value.
The next part of the output concerns the regression coefficients. First, the coefficients are presented in log odds (unstandardized) format, alongside their p values and interpretation. Then, regardless of statistical significance, the coefficients are exponentiated and reported and interpreted as odds ratios. The final part of the output reports the model's Nagelkerke pseudo R². A note for JAWS users: we found that writing ‘are’ instead of ‘R’ improves the screen reader's pronunciation.
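The logistic regression quantities can be sketched in base R as follows. The omnibus chi-square is the drop in deviance from the null model, and Nagelkerke's R² is computed by hand from the deviances (Cox-Snell R² divided by its maximum), which is valid for ungrouped 0/1 outcomes; the package itself calls fmsb::NagelkerkeR2. The function name logit_summary() is hypothetical.

```r
# Hypothetical base-R sketch of the quantities qhlog() reports
logit_summary <- function(y, x1, x2 = NULL) {
  f <- if (is.null(x2)) y ~ x1 else y ~ x1 + x2
  fit <- glm(f, family = binomial)
  n <- length(fit$y)
  # Omnibus (likelihood-ratio) chi-square: null deviance minus model deviance
  chi <- fit$null.deviance - fit$deviance
  df <- fit$df.null - fit$df.residual
  p <- pchisq(chi, df, lower.tail = FALSE)
  # Cox-Snell R^2 rescaled by its maximum (Nagelkerke), for 0/1 outcomes
  r2_cs <- 1 - exp((fit$deviance - fit$null.deviance) / n)
  r2_n <- r2_cs / (1 - exp(-fit$null.deviance / n))
  list(chi_sq = chi, df = df, p = p,
       log_odds = coef(summary(fit)),   # estimates, SEs, z and p values
       odds_ratios = exp(coef(fit)),    # exponentiated coefficients
       nagelkerke_r2 = r2_n)
}

res <- logit_summary(mtcars$am, mtcars$wt)
```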
Discussion
This report aims to share an accessible statistical tool for students with visual impairment or blindness. The broader purpose of this resource is not only to address classroom accommodation and accessibility but also to provide this student population with a sustainable tool they can use on their own on their path to academic and professional success (Amponsah & Bekele, 2022; Lourens & Swartz, 2016). In line with the philosophy behind Open Educational Resources (OER), we are sharing a scalable and potentially sustainable resource that is readily available, can be tailored to the specific needs of students and lecturers alike, and can be used outside a classroom setting (Koohang & Harman, 2007; Moon & Park, 2021).
For optimum performance, we suggest that teachers structure the course in modules and introduce the functions in the order in which we present them here, as the earlier functions test the statistical assumptions of the later ones. Our experience was greatly enriched by student-teacher meetings before the semester to become familiar with JAWS, R, and Mubsereen, and to set up the output WordPad document if needed. Variable names in datasets should be as short as possible, in line with the minimum-input, maximum-output philosophy. We found it best for the instructor to email the dataset for each module ahead of time and to prepare the data for import into the package before the lecture if needed.
Office hours are strongly advised to ingrain the steps for performing an analysis and for the student to learn the code for each function through repetition. The most effective strategy in office hours and in-class activities was to have the student verbalize a hypothesis (e.g., Group one will be higher), turn it into a statistical hypothesis (e.g., There will be a statistical difference between the groups), identify the hypothesis test and its assumptions (if any), and then identify the corresponding function and perform the hypothesis test. The instructor helps verify the student's code and JAWS functioning and gives the student feedback based on the output.
The main limitations for students using Mubsereen are cognitive overload and overreliance on technology. Although typing input is significantly reduced, users who are blind still have to 1) plan the hypothesis test, 2) program the functions, 3) open the WordPad document for the output, 4) close the document, 5) go back to RStudio, and 6) clear the console. Similarly, some functions (e.g., ANOVA with post-hoc tests, multiple group normality exploration) produce long outputs that are difficult to follow. One advantage of R is that the output text of every function can be shortened or modified.
The main limitation for instructors, which is not exclusive to the proposed R package, is the amount of out-of-class preparation and individualized instruction involved, which often results in burnout. Balancing in-class time between students with visual impairment or blindness and other students can be challenging. One advantage of a mixed classroom was that certain examples could be preloaded into the student's R environment as vectors, so that the student could provide the Mubsereen output to aid the rest of the class in calculations (e.g., everyone in the classroom listens to the Mubsereen output, and then the other students calculate the mean and variance of a small dataset by hand).
Conclusion
The Mubsereen package has pedagogical implications for statistics lecturers, particularly those in Psychology and other social sciences. Being programmed in R makes it a powerful, flexible, and customizable statistical package that invites collaboration between lecturers and students with visual impairment or blindness. The package's development will benefit from the experience and expertise of students and lecturers who wish to incorporate it into their academic journeys.
Footnotes
Acknowledgement
The authors declare no conflict of interest and would like to acknowledge the invaluable support of Vanessa Ayala, the Assistive Technology Education Center at The University of Texas at Austin, the brilliant Pei-Chiang Lee for his engagement and insightful alpha testing, Yessenia Castro for making sure we all crossed paths, as well as Aaron Richmond and Nazanin Heydarian.
Function vignettes and sample code are accessible upon request.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
