Abstract
Individuals with severe mental illnesses (SMI) often have difficulty performing daily activities that require intact executive functions, such as grocery shopping. Performance-based evaluations are valuable but do not capture the subjects’ viewpoints during task performance. This study aims to combine performance-based observation and cognitive science methods to provide insights regarding real-life behavior and problem-solving in populations with SMI. In this correlational study, 42 participants (10 in the SMI group and 32 in the control group) performed the Test of Grocery Shopping Skills (TOGSS) while wearing an eye-tracking device. We hypothesized that patterns in task planning, task-time use, and attention allocation to written information relevant to the task would differ between the groups during the task. The results showed between-group differences in both TOGSS efficiency outcomes (time and redundancy) and in the duration and number of fixations. An eye-tracking pattern analysis identified between-group differences in scanning patterns of the grocery list but similarities in task planning. The selection process was found to be significantly more accurate and efficient for the control group than for the SMI group. Our findings suggest that a combination of perspectives allows us to better understand the behavior of individuals with SMI in a regular daily task.
Introduction
For people dealing with severe mental illnesses (SMI), executive functions (EF) are widely reported to be an essential mediator of completing real-world tasks.1,2 The EF components are often assessed through performance-based assessments, which serve to observe EF components in everyday life tasks, 3 such as cooking meals,4,5 managing finances, 6 scheduling meetings, 7 and grocery shopping.8,9
Using performance-based evaluations is valuable but does not capture the subjects’ viewpoints during task performance. In contrast to previous studies that used task recording 5 or virtual tasks, 10 here we employed a translational research approach, aiming to gain the advantages of technologies that enable recording behavioral and physiological performance, combined with an ecological setup of a naturalistic environment.
For this purpose, we stepped out of the lab and conducted the Test of Grocery Shopping Skills (TOGSS 11) in a local supermarket using an eye-tracker device. In recent years, studies of the TOGSS in the SMI population9,12–14 have established relationships between TOGSS outcome measures and EF components. Zayat et al. (2011) noted that working memory and problem-solving had significant links to the TOGSS outcomes. However, relationships with planning have been found only for the TOGSS efficiency outcome measures and not for accuracy. Adding an eye tracker to the study protocol enabled us to extend the understanding of these relationships.
For more than four decades, eye gaze has been studied in schizophrenia 15 and other mental disorders. 16 We used eye fixation as a physiological measurement to provide information about cognitive processing and EF efficiency throughout the task. Stable fixations result from complex motor actions, whereby sensory information is retrieved from the retina to signal eye muscles to maintain eye position. 17 Fixations longer than 100 ms are considered to indicate a meaningful cognitive process. 18
The benefit of measuring eye movements during task performance is the eye tracker's noninvasiveness and ability to distinguish among different behaviors. 19 In this study, participants wore the eye-tracker device while performing the TOGSS. Expected behaviors were related to task planning according to the list provided, task performance while detecting and selecting items, and task remonitoring by checking the grocery list.
Planning has received wide attention in research and intervention programs for the SMI population.3,6,20 People with schizophrenia spend proportionally less time planning and more time executing tasks relative to control groups. Understanding how people with cognitive deficits perform tasks is considered as important as knowing whether they performed them accurately. 5
The purpose of this study was to combine performance-based observation and cognitive-science methods to provide insights regarding the SMI population's problem-solving in real-life situations. This is the first article to analyze differences between SMI and control groups relative to both EF in the TOGSS and eye-movement outcomes. Therefore, based on previous studies, we hypothesized that we would find: (a) significant between-group differences in TOGSS time and redundancy in a way that would show less efficient behavior by the SMI group, (b) significant correlations between time and redundancy within each group, and (c) group differences in eye fixation for list-screening patterns related to the TOGSS task demands and for gathering written information during item selection (assuming these performance outcomes showed significant group differences).
Materials and methods
Naturalistic grocery shopping
The TOGSS 12 is an ecological performance-based evaluation of grocery shopping for individuals with schizophrenia and schizoaffective disorders. It examines the selection of grocery items in an unfamiliar grocery store. At the beginning of the evaluation, participants receive a list of 10 grocery items to purchase while the examiner measures and documents their performance. Our study examined two aspects of the TOGSS outcomes—efficiency and accuracy. Efficiency is measured by two scores: time (total minutes:seconds to complete the shopping task) and redundant entries to aisles (how often participants return to a previously visited aisle). Accuracy is measured by three scores: the number of correct items selected, whether the items were correct in size, and whether the selected items had the lowest price. The maximum total score is 30 points. 2
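The scoring described above can be illustrated with a minimal sketch. It assumes one point per accuracy criterion per item (which is consistent with 10 items and a 30-point maximum, but the exact scoring rules are not spelled out here) and assumes that an examiner's record is available as an ordered list of aisle labels. The class, field, and function names are hypothetical and are not part of the TOGSS materials.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    """Outcome for one grocery-list item (field names are illustrative)."""
    correct_item: bool
    correct_size: bool
    lowest_price: bool


def accuracy_score(items):
    """Assumed rule: one point per criterion met, so 10 items give at most 30."""
    return sum(i.correct_item + i.correct_size + i.lowest_price for i in items)


def redundant_entries(aisle_sequence):
    """Count entries into an aisle that had already been visited earlier."""
    visited = set()
    redundant = 0
    previous = None
    for aisle in aisle_sequence:
        if aisle == previous:
            continue  # still in the same aisle, so this is not a new entry
        if aisle in visited:
            redundant += 1
        visited.add(aisle)
        previous = aisle
    return redundant


def format_time(total_seconds):
    """Render task time as minutes:seconds."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"
```

For example, the aisle record `[1, 2, 2, 3, 1, 4, 2]` contains two redundant entries under this definition: the return to aisle 1 and the later return to aisle 2.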
Eye tracking
In addition to the behavioral indices measured by the TOGSS, eye tracking was carried out throughout the task to extract eye gaze. SensoMotoric Instruments developed the eye-tracking device, designed to resemble ordinary glasses (weighing 68 g) to minimize disturbance to the participant. 21 The system contains a video-based eye tracker that records the visual field through the participant's perspective. It tracks eye position at a rate of 60 Hz throughout the performance. The system provides reliable binocular eye-tracking data in real-time. The glasses are attached to a designated smartphone used for calibration and temporary data storage (raw fixation data, pupil measurements, etc.). In this study, fixations were defined as gaze samples clustered around a given point for a minimum of 100 ms, along with a dispersion threshold of 0.5°–1° of visual field. This duration can be interpreted in several ways. Some researchers claim that the fixation-duration proportion indicates readers’ meaningful comprehension of what they read. 22 Other studies addressed fixation duration as an indicator of the cognitive effort being put into the processing. 23 For fixation amount, the overall number of fixations is believed to be negatively related to search efficiency. 24
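To make the fixation definition concrete, the following is a minimal sketch of a generic dispersion-threshold identification (I-DT) procedure. It is not the event-detection routine of the commercial software used in this study, whose internals are not described here. It assumes gaze samples are given as (time in ms, x, y) tuples with x and y already expressed in degrees of visual angle; the function names and the default threshold of 1° are illustrative choices within the 0.5°–1° range mentioned above.

```python
def dispersion(points):
    """Dispersion of a set of (x, y) points: x-range plus y-range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, min_duration_ms=100.0, max_dispersion_deg=1.0):
    """Generic I-DT sketch.

    samples: list of (t_ms, x_deg, y_deg), sorted by time.
    Returns a list of (start_ms, end_ms, centroid_x, centroid_y).
    """
    fixations = []
    n = len(samples)
    start = 0
    while start < n:
        # Grow the window until it spans at least the minimum duration.
        end = start
        while end < n and samples[end][0] - samples[start][0] < min_duration_ms:
            end += 1
        if end >= n:
            break  # remaining samples cannot span the minimum duration
        window = [(x, y) for _, x, y in samples[start:end + 1]]
        if dispersion(window) <= max_dispersion_deg:
            # Extend the window while the samples stay tightly clustered.
            while end + 1 < n:
                candidate = window + [samples[end + 1][1:]]
                if dispersion(candidate) > max_dispersion_deg:
                    break
                window = candidate
                end += 1
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((samples[start][0], samples[end][0], cx, cy))
            start = end + 1
        else:
            start += 1  # slide the window forward by one sample
    return fixations
```

At a 60 Hz sampling rate, a 100 ms minimum corresponds to a window of about seven consecutive samples, so shorter clusters are never reported as fixations.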

Figure 1. Test of Grocery Shopping Skills (TOGSS) predicted sequence: (1) Start: Explanations and starting point. Plan how to perform the task by scanning the items on the list. (2) Item detection: Walk around the supermarket and look for the 10 items. Screening, categorization, and using signs are expected. If needed, ask for help. (3) Item selection (event): Screen the options on the shelf and select the correct item in the right size at the lowest price. This involves screening, gathering information from the item package and signs, comparing prices, making a decision, and marking the item on the list. (4) Check and end: Check the selected items. Make sure that 10 items were selected and go to the cashier.
Participants
Forty-two participants completed the study. After the University of Haifa Ethics Committee and the Israeli Ministry of Health approved the study protocol, a presentation led by the first author was held with participants and staff from more than 20 mental health support organizations. The sample size was calculated with the G*Power statistical program based on a moderate effect size (0.0625), α = .05, and power of 0.80. For two groups and four variables, the calculation yielded 65 participants in each group; for one group under the same conditions, it yielded 37 participants. Due to COVID-19 restrictions on the data-collection process, the study had to be stopped in March 2020, before reaching its planned sample size. The recruitment that was completed is described below.
The SMI group consisted of stable outpatients who were eligible to receive public services and met the Israeli criterion for psychiatric disability, defined as severe enough to compromise at least 40% of their functioning. This diagnosis was determined by a medical committee, which included a psychiatrist recognized by the Israeli National Insurance regulations. 7 All participants could understand simple reading tasks in Hebrew, scored at least 7 on the Jaeger Chart for visual acuity, 25 and were accustomed to grocery shopping as part of their daily routine. Exclusion criteria included severe chronic physical diseases, drug and alcohol abuse, or psychiatric hospitalization of over 24 h during the month preceding the intervention onset.
Both groups were measured for EF by the Behavior Rating Inventory of Executive Function-adult version (BRIEF-A) and, except for four participants (two from the SMI group and two from the control group), all subjects had standard scores. Moreover, differences of general pupil size were found to be nonsignificant between the groups (Z = –.68, p = .48).
Procedure
All 42 participants performed the TOGSS in an unfamiliar supermarket while wearing the eye-tracker device. The acquired eye-tracking data were analyzed using designated software (SMI BeGaze, version 3.6.57) and in-house MathWorks MATLAB® scripts. We analyzed the data by (1) categorizing list-screening patterns, (2) separating the TOGSS time index between item-selection and item-detection phases, and (3) defining four item selections and coding eye fixations according to 16 areas of interest (AOIs, see Table 1).
Table 1. Area of interest (AOI) coding.
The event onsets were manually determined for each item selection by using consistent criteria, such as detecting the first fixation while the participant physically stood in front of the shelf. All 10 items of the TOGSS were calculated for performance time; the sum was calculated as item-selection time for each participant. From these 10 items, four were selected for further analysis: can of corn, dish soap bottle, paprika bag, and spaghetti bag. These items appeared on supermarket shelves in all examination sessions and, therefore, were measured and analyzed for AOI fixations. Sixteen AOIs were assigned to each event based on the information collected when fixating on this area. To direct the selection, the brand symbol (AOI #7) would be informative to the subject, but the price (AOI #8) would be more valuable to making a decision.
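The separation of task time into item-selection and item-detection phases can be sketched as simple arithmetic. This sketch assumes that each item-selection event is stored as an (onset, offset) pair in seconds and that item-detection time is whatever remains of the total task time after the selection events are summed. That second assumption is ours for illustration; the exact handling of the start and check phases is not specified here. The function name is hypothetical.

```python
def split_task_time(total_task_s, selection_events):
    """Split total task time into selection and detection components.

    selection_events: list of (onset_s, offset_s), one per item selection.
    Assumption: detection time is the remainder of the total task time.
    """
    selection_s = sum(offset - onset for onset, offset in selection_events)
    detection_s = total_task_s - selection_s
    return {
        "selection_s": selection_s,
        "detection_s": detection_s,
        "selection_pct": 100.0 * selection_s / total_task_s,
    }
```

With a 600 s task and two selection events lasting 30 s and 50 s, the sketch attributes 80 s to selection and 520 s to detection.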
Because each participant had his or her own baseline eye-gaze performance, some data are presented herein as percentages. In addition, we used fixation-by-fixation analysis to determine screening patterns of the grocery list. After the fixations were coded to each AOI, a MATLAB script was executed to exclude fixations below the cutoff of 100 ms.
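The exclusion step can be illustrated in a few lines. This is a Python sketch rather than the in-house MATLAB script, and it assumes each coded fixation is a dictionary with a `duration_ms` field and an `aoi` field (both names are hypothetical). Fixations at or above the 100 ms cutoff are kept, and the number excluded is returned so it can be compared between groups.

```python
def apply_duration_cutoff(fixations, cutoff_ms=100.0):
    """Keep fixations lasting at least cutoff_ms; report how many were dropped."""
    kept = [f for f in fixations if f["duration_ms"] >= cutoff_ms]
    excluded = len(fixations) - len(kept)
    return kept, excluded


# Illustrative records only; these are not study data.
example = [
    {"aoi": 8, "duration_ms": 240.0},
    {"aoi": 7, "duration_ms": 60.0},
    {"aoi": 3, "duration_ms": 100.0},
]
kept, excluded = apply_duration_cutoff(example)
```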
Statistical analysis
Demographic characteristics and descriptive information for the whole experiment were described and compared between the groups using chi-square and Mann-Whitney U tests due to the small sample size. For TOGSS outcomes, group differences were assessed using Mann-Whitney U tests. Spearman's tests were applied to explore correlations between the TOGSS outcomes for each group and correlations between TOGSS and eye-gaze outcomes. For eye-gaze patterns of grocery-list screening and for the four item-choice events, group differences were assessed using Mann-Whitney U tests. Differences among the eye-tracker indices within each group for the four events were tested using Friedman tests. All statistical analyses were carried out using SPSS software (version 25) with statistical significance set at p ≤ .05.
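For readers unfamiliar with the rank-based tests named above, the following dependency-free sketch shows how the Mann-Whitney U statistic and Spearman's rho are computed from average ranks. It is an illustration of the statistics only, not a reproduction of the SPSS analysis: it returns no p-values, and the Friedman test is omitted. In practice, equivalent routines such as `scipy.stats.mannwhitneyu`, `scipy.stats.spearmanr`, and `scipy.stats.friedmanchisquare` would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the smaller value is the usual test statistic."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


def spearman_rho(x, y):
    """Pearson correlation of the ranks (ties handled via average ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because both statistics depend only on ranks, they do not require normally distributed data, which is why they suit small and unequal groups such as those in this study.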
Results
Twenty-two individuals with SMI were recruited and signed informed consent to participate in the study. However, 12 did not carry out the supermarket task due to technical reasons, difficulty setting a meeting, and reluctance during the actual meeting date. Therefore, 10 participants with SMI completed the entire set of tests (50% female; age M = 32.5 years, SD = 5.93). Thirty-two participants of matched age and gender (53% female; age M = 32.06 years, SD = 6.51) were recruited to the control group (Figure 2).

Figure 2. Flow diagram of participants.
Demographics and sample characteristics
Descriptive data (means, standard deviations, and range) for demographic variables are reported in Table 2. There were no significant differences in age or gender. The groups differed significantly only in education, with the SMI group having fewer years of education. No group differences were found for eye gaze in either the amount of excluded fixations or the percentage of total fixations throughout the experiment.
Table 2. Descriptive statistics by group.
Note. SMI = Severe Mental Illness; TOGSS = Test of Grocery Shopping Skills. Efficiency is represented as the time and redundancy. *p < 0.05, **p < 0.01.
Performance data
The TOGSS efficiency outcomes (time and redundancy) were significantly higher in the SMI group than in the matched control group (p < .01). The SMI group spent more time performing the task and entered more aisles than required (Table 2). As expected, no correlation was found between the TOGSS efficiency outcomes and accuracy. The duration for item detection (time) was found to be negatively correlated with TOGSS accuracy (r = –.77, p < .01) in the SMI group, whereas in the control group, it was positively correlated with redundancy (r = .68, p < .01; see Table 3).
Table 3. Correlations between Test of Grocery Shopping Skills (TOGSS) sub-outcomes and eye-gaze data in severe mental illness (SMI) and control groups.
Note. Values above the diagonal represent correlations among the SMI group (n = 10), whereas values below the diagonal represent correlation coefficients among the control group (n = 32). *p ≤ .05, **p < .01.
Eye-tracking measurements of grocery-list screening
Of the 42 participants, 36 screened the list in a way that enabled its coding for further analysis (8/10 in the SMI group, and 28/32 in the control group). There were no between-group differences in mean fixation duration (which we assumed to be nonsignificant due to the low sample size). However, the eye-tracking measures indicated differences in grocery-list scanning patterns. Most participants in the SMI group scanned the list before they started the task (only one scanned it shortly after task initiation). In the control group, 17 participants scanned the list before starting the actual task, and nine upon starting. Rescanning the list at the end of the task was detected for 20% of the SMI group and for about 60% of the control group. Types of scanning strategies noted were (a) full and wide range scan, (b) F-shaped pattern, (c) spotted pattern, and (d) partial list scanning (examples in Figure 3). The SMI group participants tended to gather less information in the planning phase using mainly a spotted or partial pattern.

Figure 3. Heat maps and fixation representation from the BeGaze software for list screening. Each heat map shows a type of scanning strategy with its range of average fixation duration, in this order: (a) full and wide range scan (range: 100–1600 ms); (b) F-shaped pattern (range: 20–420 ms), where the F is flipped because Hebrew is written from right to left; (c) spotted pattern (range: 100–1000 ms); (d) partial scanning of the list (range: 20–400 ms).
Eye-tracking measures relative to TOGSS outcomes
A significant correlation was found between the total number of fixations throughout the task and TOGSS efficiency in the control group (time: r = .67, p < .01; redundancy: r = .37, p < .05), suggesting a negative relationship between performance efficiency and fixation production (since increased shopping time and redundancy translate into lower efficiency scores). The SMI group also showed positive correlations between total number of fixations and total TOGSS time (r = .67, p < .05) and item-selection time (r = .85, p < .01) but not for detecting the items.
We further analyzed fixation duration relative to information sources during the four item-selection tasks. The number of fixations for each event was separated into written and general visual information and converted into percentages from the total number of fixations. All item-selection tasks showed significant between-group differences for the percentage of fixations on written information relevant for selection of the correct item (Table 4).
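The conversion to percentages can be sketched as follows. The sketch assumes that the fixations of one item-selection event are available as a list of AOI codes and that a known subset of AOIs carries written information relevant to selection. The set used here, `{7, 8}`, is a placeholder for illustration only; the actual classification of the 16 AOIs follows the study's AOI coding table and is not reproduced here.

```python
# Placeholder set of AOI codes treated as written information (an assumption).
WRITTEN_AOIS = frozenset({7, 8})


def written_info_percentage(aoi_codes, written_aois=WRITTEN_AOIS):
    """Percentage of an event's fixations that landed on written-information AOIs.

    Returns None when the event has no fixations, so that empty events are
    not mistaken for events with 0% written-information fixations.
    """
    if not aoi_codes:
        return None
    written = sum(1 for code in aoi_codes if code in written_aois)
    return 100.0 * written / len(aoi_codes)
```

Expressing the count as a share of each participant's own fixations for the event keeps the measure comparable across participants with different baseline fixation rates.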
Table 4. Percentages of fixations towards written information relevant to task.
Note. Areas of written information relevant to selection that requires higher information processing (e.g. brand logo). *p ≤ 0.05, **p < 0.01.
Discussion
This study aimed to describe eye-gaze patterns relative to a naturalistic task for an SMI sample and a matched control group. We combined two research methods based on two frames of knowledge. Here we discuss the findings in a graded manner, starting with results that validate findings of prior studies and continuing to the new insights this study provides.
The TOGSS outcome is the basis upon which the other variables build. Consistent with previous studies,2,13 we expected and found no significant correlation between TOGSS accuracy and efficiency. As in previous studies,13,26 we also found between-group differences for the TOGSS efficiency outcomes but none for accuracy. As such, our findings validate the use of the TOGSS as it was translated and adapted to the Israeli supermarket and SMI population.
After assuring the fit of the TOGSS and its representation, we dived deeper using the eye-tracker data to explain possible underlying EF mechanisms needed for task performance, especially the EF planning and monitoring components. Here we focus on the planning component, as reflected by scanning the grocery list, since the eye tracker enabled us to accurately observe it through the participants’ eyes. Unlike other studies that observed text scanning, here the text was not the main task; instead, it was an assistive tool for successful shopping. Planning using the list was much more common than was remonitoring. As also described in previous studies,4,5 the control group conducted more self-monitoring (revisiting the list) than did the SMI group. We found the scanning-strategy types to be similar to web-scanning patterns that Palani et al. 27 described and classified by shape. Although in our study we did not analyze fixation duration for this part, heat maps suggested there are also attention-allocation patterns. Future studies might build on this information for deeper understanding of the phenomenon.
After planning the task by gathering and organizing information with the list, participants sought out the items in the supermarket according to the instructions. We split the time of fulfilling the task according to how it was spent during the task. This new presentation of the data allowed us to link outcomes to the exact performance type. For example, we noticed that the SMI group spent more time trying to detect the items. The correlation between time and redundancy may be explained by the observation that these participants spent longer time durations searching along paths they had already taken. This redundancy might be explained by the lack of prior planning or working memory during the task. 2 Moreover, the correlation between item-detection time and accuracy for the SMI group suggests that spending time dwelling on item choices decreased correct performance scores. Redundantly wandering the supermarket and spending more time detecting items than gathering and processing relevant information may have decreased accuracy and performance, as Rempfer et al. 14 noted.
We move from discussing observed components from the examiner's viewpoint and analysis of the eye-tracker device's recorded output to observations from the participants’ viewpoints, analyzing the eye gaze produced throughout the task. Unlike prior studies, the environment in our study included dynamic and noisy characteristics. 28 Therefore, we chose to focus on simple methods and more stable indices and situations.
For a broad view of task performance, we address the fixation duration since it is not affected by the total time of task performance, as is the number of fixations. For many years, studies have related fixation duration to cognitive and information processing. Longer fixation was considered to indicate not only heightened interest in the stimuli but also greater difficulty in processing visual information.29,30 Because of these inconsistent interpretations about what a fixed period of time means, it is not surprising that we found no significant between-group differences in the average fixation duration of each condition. That is, both task difficulties for the SMI group and higher attention for the control group can be present during the performance phase and affect average fixation duration. However, the SMI group exhibited longer fixation-duration averages for some item-selection tasks, suggesting these items required more complex selection abilities.
Based on the finding that time duration for the four item-selection events did not differ between the groups, we examined the number of fixations. We found them to be consistently different between groups in percentage of fixations to relevant written information, with the SMI group producing more fixations. This finding may coincide with the difference in planning before and during the task, suggesting that SMI individuals carried out the selection process mainly while standing in front of the shelf, whereas the control group arrived more prepared to carry out the task at hand (Figure 4). Furthermore, the number of fixations is considered to be negatively correlated with search efficiency,24,31 consistent with the TOGSS efficiency outcomes presented earlier.

Figure 4. Heat maps and fixation representation from the BeGaze software for corn selection. All heat maps show the corn selection with the range of average fixation duration (in ms), in this order: (a) Subject 108, severe mental illness (SMI) group (range: 200–2000 ms). (b) Subject 8, control group (range: 200–2000 ms). (c) Subject 110, SMI group (range: 100–1700 ms). (d) Subject 10, control group (range: 100–1387 ms).
Study limitations
First, despite the many advantages of real-world assessment, stepping out of the laboratory might increase the likelihood of confounding variables. Changes in surrounding brightness, simultaneous mental processes, and environmental changes, such as item shortages or customer density, may affect the obtained eye-gaze measurements. Second, recruiting the SMI group sample was rather difficult; more than half of the individuals who signed informed consent forms discontinued the study. Most did not want to take part because the study required them to perform a task in an outside, unfamiliar environment. This in itself is an interesting finding that should be studied in future research. However, for the current study, it meant the SMI sample represented only a subsample and results cannot be fully generalized. Finally, due to COVID-19 restrictions, we were required to cease data collection before all expected participants completed the tasks.
Conclusions
This study attempted to step out of the laboratory and carry out an eye-tracking protocol for a performance-based task in a naturalistic environment. We describe eye gaze for the task components and strengthen previous findings on the TOGSS efficiency outcomes that were the basis for the process. It would be helpful to use these preliminary results in future research to show trends in scanning patterns with a larger sample size. Moreover, other tasks should be considered in future research, especially those tasks in which the SMI population would be likely to participate. We address the sequence of planning, task performance, and task monitoring—information that is captured in the eye-tracking record and can be robustly quantified. We find this method useful and recommend it in the future for other processes and analysis of functional behaviors.
Footnotes
Author contributions
S.R. and N.J. conceptualized the study, and S.R., N.J., and A.M. developed the methodology. S.R. recruited the participants and carried out the study, wrote the initial manuscript draft, and analyzed the data with the assistance of N.J. and A.M. All authors read and agreed upon the manuscript's final version.
Data availability statement
The research findings presented in this study are included in the article and supplementary materials, and further enquiries can be directed to the author.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by The National Insurance Institute of Israel.
All data is available upon request.
Author biographies
Sivan Regev's research topics include the evaluation of Executive Functions and strategy use in people with Severe Mental Illnesses (SMI), in the domains of Daily Living. In addition, Sivan consults for the Israeli Ministry of Health committee for mental health services in the community. She is involved in a mental health initiative, “Nefashot”, which promotes events in the cultural arena addressing and promoting mental health issues.
Naomi Josman is Professor Emerita of Occupational Therapy in the Department of Occupational Therapy, the Faculty of Social Welfare & Health Sciences at the University of Haifa, Israel. She is currently an academic advisor for Horizon-Europe, Research authority. She previously served as the associate faculty dean for research and as Director of the department's Ph.D. program. Dr. Josman is an internationally recognized leader, scholar and educator in the area of cognitive rehabilitation. Her research investigates cognition, metacognition, executive functions and their influence on everyday life. Her work is based on ecologically valid, performance-based assessment of cognitive disabilities, utilizing innovative methods and tools, inter alia Virtual Reality, for evaluation and intervention. Her research activities extend to the study of a wide range of populations, including people with Central Nervous System deficits, individuals diagnosed with schizophrenia, children with neurodevelopmental deficits, and adults with neurological dysfunction. She has authored over 100 research articles and 14 chapters in edited books.
Avi Mendelsohn obtained his PhD from the Neurobiology Department at the Weizmann Institute of Science in Israel, where he concentrated on the neural mechanisms involved in episodic memory consolidation and retrieval. After completing a post-doctoral fellowship at Mount Sinai Hospital in New York City, he founded a laboratory at the Sagol Department of Neurobiology at the University of Haifa in Israel. His team focuses on investigating the factors that facilitate the development of long-term declarative memories, utilizing behavioral, physiological, and neuroimaging techniques. They place particular emphasis on creating naturalistic experimental paradigms that improve ecological validity and shed light on memory formation processes related to real-life situations.
