
Abstract

The high data density on electronic medical record screens is touted as a major usability issue. However, it may not be a problem if the data is relevant and well-organized. Our objective was to test this assumption using a comprehensive set of measures that assess the three pillars of usability: efficiency (both physical and cognitive), effectiveness, and satisfaction. Physicians were asked to go through a series of tasks using two versions of the same electronic medical record: one where all the display items were separated into tabs (the original display), and one where important display items were grouped logically in one tab (the redesigned display). Results supported the hypothesis that combining relevant data in an organized fashion into a smaller location would improve usability. The findings highlight the role of good display organization in mitigating the effects of high data density, as well as the importance of assessing cognitive load as part of usability studies.

Keywords

cognitive load, electronic medical records, eye tracking, GOMS, usability

Introduction and motivation

In general, the introduction and adoption of electronic medical records (EMRs) has been touted as a means of providing better patient medical care.¹ However, the reality is that EMRs have solved only some problems, such as providing remote access and medication management.^2,3 At the same time, EMRs have introduced a host of other novel problems.^4,5

The root of many of the problems encountered, whether related to frustration, medical errors, or delays, can be traced to the usability of the EMRs.⁶ Usability is defined by the International Organization for Standardization (ISO) as “the effectiveness, efficiency and satisfaction with which specified users achieve specified goals in particular environments.”⁷ Good EMR usability can improve the quality of patient care and enhance hospital operations.⁸ Poor EMR usability, however, can prevent medical personnel from carrying out their tasks in a timely and correct fashion.^9–11

The problem of data overload

EMR usability straddles a number of different issues, including color variation, readability, preservation of context, and others.¹² In particular, one of the major problems within the broad umbrella of EMR usability is that of data density—often referred to as clutter or data overload.¹³ This refers to the amount of data within a certain screen area.^14,15

In the medical domain, data overload has figured most prominently in the EMRs used by physicians.^4,10,16–19 EMR data overload can take on many forms, such as a high density of alerts¹⁸ or simply a large amount of poorly organized, irrelevant medical information.^17,20,21 Data overload has been shown to prevent physicians from quickly and accurately extracting EMR information, which can compromise both efficiency and safety in the hospital.^18,22,23

However, combining data into one common location does have its advantages. It can greatly help physicians find the information they are looking for and improve system navigation.¹² Poor navigation was one of the four major EMR usability issues identified by Clarke et al.²⁴ A recent review about the problems with EMR navigation highlighted how inefficient navigation and a large amount of switching between screens can lead to errors and user fatigue.²⁵ Moreover, data spread over fewer areas allows physicians to obtain the big picture of the patient they are supposed to be diagnosing.²⁶ Higher data density clearly has both advantages and disadvantages, and the debate is then whether higher data density would lead to better or worse usability in the context of EMRs.

The key factors in answering this question are organization and task relevance.²⁷ In their literature review of the topic, Moacdieh and Sarter²⁸ defined clutter as “performance and attentional costs that result from the interaction between high data density, poor display organization, and abundance of irrelevant information.” Despite the worry that high data density will lead to less EMR usability, it is possible that good organization and task relevance might play a critical role in mitigating these effects. There is a need for more EMR usability evaluation studies that target this issue in a systematic and controlled fashion.

Evaluating EMR usability

Frequently used usability evaluation methods in the literature include interviews,^29,30 surveys,³¹ observations,³² heuristic walkthroughs,³³ and different types of analyses such as task analysis, user analysis, representational analysis, and functional analysis.³⁴

However, it is important to supplement the subjective data obtained from such techniques with more objective measures that are not dependent on the opinions of medical personnel, as highlighted in a recent review of EMR usability.³⁵ In addition, it is important to assess the different pillars of usability: efficiency, effectiveness, and satisfaction.¹² Otherwise, the impression of usability that is obtained would be incomplete.

Efficiency

Efficiency is related to the amount of resources that must be expended to complete a task. In general, efficiency could have both an overt, physical component related to the speed needed to finish a task and a mental or cognitive component related to the mental resources needed for a task. From an overt physical perspective, measures that can be used include response time, key presses, mouse clicks, scrolling activity, screen visits, and back button uses.¹² Response time is the prevalent performance measure when it comes to evaluating the amount of data on a screen and how that relates to usability.^36,37

From a cognitive perspective, the mental resource that a person must devote to complete a certain task within a certain time frame is referred to as cognitive load.³⁸ A complex construct, cognitive load is mainly linked to limited memory and the limited attentional resources that people have.^39,40 Given the importance of this construct, there are several approaches that have been used to measure cognitive load.

One approach makes use of a well-established questionnaire for mental workload, the NASA-Task Load Index,⁴¹ which asks respondents to rate their impression of cognitive load on several scales. Another less-used approach is the Keystroke Level Model (KLM), together with GOMS (goals, operators, methods, and selection rules).⁴² KLM predicts the time it will take for a user to carry out a certain computer task and has been heavily used in the human–computer interaction literature as a means of determining response time.⁴³ GOMS, however, is a way to perform cognitive task analysis (CTA), that is, to break down all the mental and physical tasks of a certain interface interaction into their constituent parts.⁴⁴ Once GOMS has been performed, it then becomes possible to apply KLM to determine how much time each task needs. KLM includes one mental operator that represents the mental workload that a user needs to go through. By separating this mental operator from the other physical ones, one can get an impression of the percentage of time that the user will spend doing mental processing.⁴⁵ This would then be another measure of cognitive load.
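As a rough illustration of this last idea, the percentage of predicted time spent on the mental operator can be computed from a KLM operator sequence. The sketch below is only an assumption-laden example: the operator times are commonly cited textbook values rather than the values used in this study, and the operator sequence and the function name `percent_mental_load` are hypothetical.

```python
# Illustrative KLM operator times in seconds. These are commonly cited
# textbook values and are an assumption, NOT the values used in the study.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # move hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def percent_mental_load(sequence):
    """Return the percentage of predicted task time spent on the M operator."""
    if not sequence:
        raise ValueError("operator sequence must not be empty")
    total = sum(OPERATOR_SECONDS[op] for op in sequence)
    mental = sum(OPERATOR_SECONDS[op] for op in sequence if op == "M")
    return 100.0 * mental / total


# Hypothetical task: think, point at a tab, click, think, point at a field, click.
example = ["M", "P", "K", "M", "P", "K"]
print(round(percent_mental_load(example), 2))  # 49.45
```

A design that removes a tab switch would drop both physical and mental operators from the sequence, so the percentage can move in either direction depending on which operators are eliminated.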

Finally, yet another approach is the use of eye-tracking measures. Eye tracking is a non-invasive, infrared-based technique that is used to trace where a person is looking on a screen.⁴⁶ The eye-tracking coordinates are used to determine eye fixations, or when a person looks at something for a minimum period of time,⁴⁷ and saccades, which are rapid jumps between fixations.⁴⁸ Fixations and saccades then form the building blocks for several eye-tracking metrics that can be used to assess cognitive load, among other constructs.

The most frequently used eye-tracking metrics for workload evaluation are pupil size or diameter^49,50 and eye blink frequency.⁵¹ These measures have been shown to be significantly affected by high workload; however, they are also extremely sensitive to other factors, such as the amount of light.⁵² Alternatives to pupil size include mean fixation duration and mean saccade length,⁵³ as well as the nearest neighbor index (NNI),⁵⁴ which can be used to determine how spread out or concentrated fixations are (see Table 1). NNI has been successfully used in airplane pilot studies and has also been shown to be sensitive to workload changes in a medical scenario.²⁸ In addition, convex hull area⁵⁸ can be used to see whether higher workload causes users' fixations to be more dispersed. A higher value of each of the mentioned metrics would indicate higher cognitive load.

Table 1.

Eye-tracking metrics that can be used to assess cognitive load.

Name	Description
Convex hull area (pixels²)⁵⁵	Minimum convex area which contains the fixation points. A larger convex hull area indicates more spread of gaze points and larger cognitive load as the user attempts to sample all the information available within the display.⁵⁴ However, note that there are opposing views related to workload and the visual field of view.⁵⁶
Nearest Neighbor Index⁵⁴	The ratio between (1) the average of the observed minimum distance between points and (2) the mean random distance expected if the distribution were random. An NNI larger than 1 indicates high randomness and spread of gaze points, whereas values lower than 1 indicate grouping and narrowing of attention. Similar to convex hull area, a larger NNI would indicate larger cognitive load.
Mean saccade length (pixels)	Average of all saccade lengths. Higher mean saccade length indicates less efficiency in scanning and points to higher cognitive load.⁵⁶
Mean fixation duration (ms)	Mean duration of all fixations within a defined period. A higher mean fixation duration can indicate more difficulty processing and discriminating information, and points to higher cognitive load.⁵⁷

NNI: nearest neighbor index.
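The metrics in Table 1 can be sketched in a few lines of code. The example below is a minimal illustration and not the analysis pipeline used in this study: it assumes fixations are given as (x, y) pixel coordinates, approximates saccade length as the straight-line distance between consecutive fixations, and uses the standard expected nearest-neighbor distance for a random pattern, 0.5·√(A/n), with the reference area A supplied by the caller (here the convex hull area, which is an assumption). All function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance over the random expectation."""
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 * math.sqrt(area / n)
    return observed / expected


def mean_saccade_length(points):
    """Mean distance between consecutive fixations (a proxy for saccade length)."""
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(steps) / len(steps)


# Hypothetical fixation sequence and durations (ms), for illustration only.
fixations = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]
durations_ms = [120, 95, 110, 130, 105]

hull_area = polygon_area(convex_hull(fixations))
print(hull_area)                                      # 10000.0
print(nearest_neighbor_index(fixations, hull_area))   # about 3.162
print(mean_saccade_length(fixations))                 # about 92.678
print(sum(durations_ms) / len(durations_ms))          # 112.0
```

The choice of reference area strongly affects the NNI value, so comparisons between displays are only meaningful when the same convention is used for both.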

Effectiveness

While efficiency is related to speed, effectiveness is related to accuracy and correctness.¹² The most obvious measure of effectiveness is error rate. More complex and detailed measures of effectiveness do exist, however, such as Failure Modes and Effects Analysis (FMEA), where errors and risks at each stage of a product’s use are evaluated.⁵⁹

Satisfaction

Surveys and questionnaires are the most commonly used approaches when it comes to evaluating physicians’ satisfaction. For example, a survey was used to determine whether practitioners thought there were too many medical alerts in the EMR.¹⁸ In other cases, interviews with physicians were carried out to obtain their insight on data overload.¹⁹

The present study

Despite the body of work on EMR usability so far, there are gaps when it comes to assessing data density. As seen from the recent literature, high data density is discouraged given the risk of clutter or data overload, although, for medical personnel, high data density might help solve issues of poor navigation and assimilation of patient information. The overall research question in this study was whether higher EMR data density would necessarily lead to poorer EMR usability. Our hypothesis was that adding more information into one display space would not inevitably lead to poorer usability if the information is task-relevant and well-organized.

To test this hypothesis, a controlled experiment was conducted where the usability of the two EMR designs was assessed. In one version, each set of relevant items was placed in a separate tab (Display 1: low data density), whereas in the second version, all display elements were combined into one page (Display 2: high data density). This approach of using two versions of the exact same EMR is a unique feature of this study. A review of the literature on EMR usability highlighted the paucity of papers that have used this approach.³⁵ One notable exception is the work of Ahmed et al.,¹⁶ where two EMR versions were compared that purportedly have different levels of data; however, the EMR versions were significantly different in various aspects, making it difficult to attribute any benefits specifically to the quantity or distribution of data.

Moreover, in order to assess the usability of the two versions, we comprehensively assessed each of the different dimensions of usability, another approach that is not commonly encountered in the literature. The measures that we used in this experiment were the following:

Efficiency:

○ Physical: response time.

○ Cognitive: NASA-TLX, KLM-GOMS, and the eye-tracking metrics of Table 1.

Effectiveness: error rate.

Satisfaction: user preference.

Methods

Participants

The participants were 13 residents (6 males and 7 females) from the Family Medicine (FM) Department at the American University of Beirut (AUB) medical center. Their average age was 29 years (standard deviation (SD) = 2.68) with an average of 2.46 (SD = 1.00) years of experience in the FM Department. Participants had self-reported normal or corrected to normal vision. The eye-tracking data of one participant was not collected due to problems with the eye tracker software, resulting in a count of 12 residents for the eye-tracking results. Participants’ ratings of their proficiency with the EMR had a mean of 4.0 (SD = 0.39) on a scale of 1 (poor) to 5 (excellent), confirming that they were all proficient with the EMR.

Participants signed an informed consent document at the beginning of the experiment. As an incentive, participants were awarded a $25 gift card to a restaurant. This study was approved by the AUB Institutional Review Board.

EMR versions

Two different versions of the same EMR were used for Display 1 (data separated into tabs) and Display 2 (data combined into one location). The original (current) EMR being used in the FM Department was used for Display 1. This EMR contains several functions, including writing notes using the “Progress Notes” tab and reviewing patients’ medical history using the ROS tab. Each different function is separated into a dedicated tab. The complete set of tabs can be found in Table 2.

Table 2.

EMR tabs and their uses.

Tab	Usage
Front Desk	Open patient encounter for a new visit.
Triage	Enter current patient’s vitals, enter diagnosis, and check if he or she requires special services.
Preventive Services	Review all patient’s vitals, check his or her vaccination reports, and check if he or she requires any special services.
ROS	Review and update patient disease history.
Progress Notes	Review and enter notes related to the patient, check if the patient requires special services, check his or her current vitals, and enter diagnoses.
Dx	Review and update the diagnosis of the patient.
Referrals	Request referral to specialty physicians outside the FM Department.
X-Ray	Request X-ray images.
Lab	Request lab tests.
WebLab	Check the results of previous lab tests.
Sick Leaves	Give official sick leaves for hospital personnel.
Rx	Renew medications.
Med Hx	Review previous medications.
ER visits	Check and review the patient’s visits to the emergency room.

EMR: electronic medical record; FM: Family Medicine.

The EMR used for Display 2 was labeled the redesigned EMR. This EMR combines the data from the ROS, Progress Notes, and Preventive Services tabs into one tab, labeled the Physician tab. In an earlier preliminary study, these three tabs were found to be the ones that were (1) most commonly used by physicians and (2) required searching for certain information (as opposed to requesting services). The nature of the actual display elements (i.e. text boxes, drop down menus, etc.) was not changed between Display 1 and Display 2 (see Figure 1). However, the EMR elements in Display 2 were logically grouped and organized, following the principles of proximity compatibility.⁶⁰ Both EMRs were populated with real patient data that had been completely de-identified. The names of the patients were fictitious.

Figure 1.

(a) Current EMR display (Display 1) showing the Progress Notes tab and (b) redesigned EMR display (Display 2).

Experiment setup

This study used a Tobii T120 eye tracker, which is embedded in a monitor of size 17 in (resolution: 1280 × 1024 pixels). Participants were seated at a distance of 60 cm from the eye tracker.

Experiment design

The independent variable in this study was the type of display (original or Display 1 vs redesigned or Display 2), which was varied within subjects. Participants were asked to carry out four types of tasks on each of the EMR displays. These tasks were selected to be representative of those that physicians perform using their EMR, based on earlier interviews and observations conducted with the FM Department physicians. The tasks were presented to participants in sets of four that revolved around one fictitious patient, who was portrayed as presenting to the physician with a certain complaint. This set of four tasks was always connected to the same patient and was labeled a trial. All of the tasks involved the search for a given target item that is related to that patient’s diagnosis or treatment. An example of each of the four tasks can be found in Table 3. Each task took a maximum of 2 min, and participants were always given some background information about the patient before being given their task. The type of task was treated as a blocking variable:

Task 1. Review patient disease history: physicians used the ROS tab to review a patient’s disease history and identify whether or not the patient had a given disease/condition within a particular time frame.

Task 2. Read previous notes: physicians used the Progress Notes tab to go through the patient’s physician notes and find certain information.

Task 3. Check preventive services: physicians used the preventive services tab to check the blood pressure, weight, and height of the patients.

Task 4. Check vaccination: physicians used the preventive services tab to check whether a particular vaccine was given to the patients.

Table 3.

Tasks for trial 1.

Task 1: Review patient disease history	Before you ask the patient to come into the room, you want to find his chronic diseases. List his problems.
Task 2: Read previous notes	Your patient is presenting for dysuria; when you started to prescribe an antibiotic; he reports that few months ago, around May-June, he had a similar episode and the doctor gave him medication that caused itching. He did not report it to the clinic. He is afraid to take it again. Please find the antibiotics he was prescribed.
Task 3: Check preventive services	The patient is worried about high blood pressure today. Check his previous readings in the clinic. Give examples of previous blood pressure readings and their dates.
Task 4: Check vaccination	It is winter time and you want to give him flu vaccine and pneumococcal vaccine. He does not recall if he ever took them. Please check his vaccinations status.

There were six trials of each task (i.e. each task was repeated for six different patient scenarios), making for a total of 24 tasks for each experimental condition. In other words, there were 24 tasks (six trials) that were done using the current EMR display (Display 1) and the same 24 tasks were done using the redesigned display (Display 2). Four different fictitious patients were created for the six trials, meaning that three of the trials shared the same patient. Each trial was always associated with the same patient for both displays. The order of presentation of the trials was counterbalanced across participants using a Latin square approach. Participants first did all of the trials for one EMR design before moving on to the next set of trials on the other EMR design; the order with which participants used the displays was also counterbalanced.
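One way such a counterbalancing scheme could be generated is sketched below. The paper does not specify which Latin square was used or how display order was alternated, so the cyclic square, the alternation rule, and the function names are all assumptions made for illustration.

```python
def cyclic_latin_square(n):
    """Build an n x n cyclic Latin square with entries 1..n."""
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def assign_orders(num_participants, num_trials=6):
    """Assign each participant a trial order and a display order.

    Assumption: participants cycle through the rows of the square, and the
    starting display alternates between consecutive participants.
    """
    square = cyclic_latin_square(num_trials)
    plan = []
    for p in range(num_participants):
        trial_order = square[p % num_trials]
        if p % 2 == 0:
            display_order = ("Display 1", "Display 2")
        else:
            display_order = ("Display 2", "Display 1")
        plan.append((trial_order, display_order))
    return plan


for participant, (trials, displays) in enumerate(assign_orders(13), start=1):
    print(participant, trials, displays)
```

In a cyclic square every trial appears exactly once in each serial position across six consecutive participants, which balances position effects but not carry-over effects between specific pairs of trials.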

The dependent measures in this study were response time, error rate recorded while performing the different tasks, the various eye-tracking metrics listed in Table 1, NASA-TLX measures, KLM-GOMS measures, and subjective preferences. Participants were asked to think aloud and give their answers orally during the experiment. The time between the start of a task and when participants gave their complete answer was manually measured as the response time. The raw eye-tracking data were obtained through the Tobii Studio eye-tracking software. The eye-tracking metrics of Table 1 were then calculated using MATLAB.

Experiment procedure

When the participants arrived at the lab to perform the experiment, they were first given an overview of the experiment and asked to read and sign the consent form. Next, participants were shown the redesigned EMR page with all the locations of the features and allowed to practice using it. They were then given the instructions for the experiment and told what the experiment tasks would entail. Each participant was next asked to do a practice scenario with both the original and new EMR pages to become familiar with the experiment process. After this step, the eye tracker was set up and calibrated using a nine-point grid. This first part of the experiment took around 10 minutes.

Participants then logged in to the EMR using their credentials and carried out 24 tasks that corresponded to one of the two displays. When all the tasks on that one display were done, participants were given a 5-minute break and then continued the experiment with the other display. At the end of the experiment, participants were given a questionnaire in which they were asked to provide NASA-TLX ratings of workload for each EMR display, as well as their preference for the version of the EMR.

Results

Efficiency

Both physical and cognitive efficiencies were assessed in this study. For physical efficiency, response time was calculated. We elected not to measure mouse clicks as the redesigned display was deliberately designed to have fewer mouse clicks. Response time data was averaged across the six trials for each participant and each task. Response time started after participants had read their tasks, so it was not affected by the length of the task description. Response time was analyzed using a repeated-measures analysis of variance (ANOVA). The assumptions for the ANOVA procedure were tested for each response measure and each task. There were some measures where the normality assumption did not hold, as evidenced using a Shapiro–Wilk test (p < 0.05) and a normality plot. The data of these measures were transformed using either the inverse or logarithmic transform.

The results of the ANOVA can be seen in Table 4. For Tasks 2, 3, and 4, there was no significant difference in response time between the original and redesigned displays. For Task 1, however, the redesigned display had significantly lower response time as compared to the original display.

Table 4.

Response time ANOVA results.

Response time (s)	Original display, mean (SD)	Redesigned display, mean (SD)	Effect of task level
Task 1	6.39 (1.69)	4.76 (2.24)	F(1, 12) = 5.546, p = 0.038, $η_{p}^{2} = 0.313$
Task 2^a	24.48 (5.51)	26.37 (10.65)	F(1, 12) = 0.292, not significant (p = 0.559), $η_{p}^{2} = 0.024$
Task 3^a	12.06 (3.94)	12.53 (7.65)	F(1, 12) = 0.923, not significant (p = 0.356), $η_{p}^{2} = 0.071$
Task 4^b	6.15 (1.96)	5.22 (1.82)	F(1, 12) = 2.598, not significant (p = 0.133), $η_{p}^{2} = 0.178$

ANOVA: analysis of variance; SD: standard deviation.

^a Logarithmic transform was applied.

^b Inverse transform was applied.
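For readers who want to check the effect sizes in Table 4, partial eta squared can be recovered from the reported F statistic and its degrees of freedom as F·df_effect / (F·df_effect + df_error). The sketch below also shows that, for a within-subjects factor with only two levels, the repeated-measures F equals the square of the paired t statistic. The function names and the small data set are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def two_level_rm_anova(cond_a, cond_b):
    """Repeated-measures ANOVA for two within-subject levels.

    With two levels, F(1, n - 1) is the squared paired t statistic.
    Returns (F, df_error, partial eta squared).
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    f_value = t_value ** 2
    df_error = n - 1
    return f_value, df_error, partial_eta_squared(f_value, 1, df_error)


# Check against F values reported in Table 4 (Tasks 4 and 2).
print(round(partial_eta_squared(2.598, 1, 12), 3))  # 0.178
print(round(partial_eta_squared(0.292, 1, 12), 3))  # 0.024

# Hypothetical per-participant mean response times (s), for illustration only.
original = [6.0, 7.0, 5.0, 8.0]
redesigned = [5.0, 5.0, 4.0, 7.0]
f_value, df_error, eta = two_level_rm_anova(original, redesigned)
print(round(f_value, 3), df_error, round(eta, 3))  # 25.0 3 0.893
```

This sketch omits the normality checks and data transforms described above, which would be applied before computing F.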

Cognitive efficiency was measured using a combination of subjective and objective approaches: NASA-TLX (subjective), KLM-GOMS (objective), and several eye-tracking metrics (objective). For NASA-TLX, only one dimension, mental workload, was used with a scale from 1 (very low workload) to 5 (very high workload). The results of the NASA-TLX ratings were analyzed using a Wilcoxon Exact sign test for ordinal data. Results showed that there was a significant decrease in the mental workload (Z = −2.04, p = 0.041), from means of 3.3 (SD = 1.3) for the original EMR to 2.46 (SD = 0.877) for the redesigned EMR.

In addition, GOMS analysis that was based on a CTA showed that the percentage of mental workload for each of the tasks was lower, on average, for the redesigned display as opposed to the original display (see Table 5).

Table 5.

Percent mental load (compared to the total time) in each of the different tasks (based on KLM-GOMS analysis).

	Percent mental load (%)
	Original display, mean (SD)	Redesigned display, mean (SD)
Task 1	55.86	52.94
Task 2	57.39	55.03
Task 3	57.15	54.02
Task 4	57.45	54.36

SD: standard deviation.

The final measures of cognitive load are the eye tracking metrics of Table 1. As with the performance measures, the data was analyzed using a repeated-measures ANOVA with list-wise deletion. After averaging the trials for each task and each participant, the assumptions of the ANOVA procedure were checked. In the case of violations of normality, as evidenced using a Shapiro–Wilk test (p < 0.05) and a normality plot, the data was transformed using logarithmic or inverse functions. A summary of the eye tracking results across all the tasks can be seen in Table 6. Across all tasks, several metrics indicated a significant decrease in mental workload for the redesigned display as opposed to the original display.

Table 6.

Summary of eye-tracking results. In all the metrics, a higher value indicates a higher cognitive load.

Eye-tracking metrics	Original display, mean (SD)	Redesigned display, mean (SD)	Effect of task level
Task 1
Convex hull area (pixels²)^a	4.97 (0.25)	4.86 (0.3)	F(1, 11) = 1.644, not significant (p = 0.226), $η_{p}^{2} = 0.13$
NNI	0.3 (0.13)	0.24 (0.1)	F(1, 11) = 1.457, not significant (p = 0.253), $η_{p}^{2} = 0.117$
Mean saccade length (pixels)	63.34 (27.988)	61.76 (26.30)	F(1, 11) = 0.062, not significant (p = 0.807), $η_{p}^{2} = 0.006$
Mean fixation duration (ms)*	126.57 (68.73)	96.66 (37.87)	F(1, 11) = 0.447, p = 0.025, $η_{p}^{2} = 0.378$
Task 2
Convex hull area (pixels²)*	280,855.43 (47,459.28)	226,138.60 (66,561.86)	F(1, 11) = 5.266, p = 0.042, $η_{p}^{2} = 0.051$
NNI	0.83 (0.34)	0.76 (0.48)	F(1,11) = 0.027, not significant (p = 0.873), $η_{p}^{2} = 0.002$
Mean saccade length (pixels)*	58.37 (18.90)	50.75 (13.70)	F(1, 11) = 5.263, p = 0.042, $η_{p}^{2} = 0.324$
Mean fixation duration (ms)	114.80 (67.45)	104.64 (58.57)	F(1, 11) = 1.035, not significant (p = 0.331), $η_{p}^{2} = 0.086$
Task 3
Convex hull area (pixels²)	221,684.02 (40,994.41)	196,213.71 (45,196.38)	F(1, 11) = 1.672, not significant (p = 0.223), $η_{p}^{2} = 0.132$
NNI²	4.15 (1.42)	4.035 (1.41)	F(1, 11) = 0.063, not significant (p = 0.806), $η_{p}^{2} = 0.006$
Mean saccade length (pixels)*	70.86 (21.24)	63.17 (20.13)	F(1, 11) = 5.887, p = 0.034, $η_{p}^{2} = 0.349$
Mean fixation duration (ms)	129.48 (66.27)	107.91 (54.23)	F(1, 11) = 5.713, not significant (p = 0.906), $η_{p}^{2} = 0.342$
Task 4
Convex hull area (pixels²)*	204,511.69 (20,855.20)	183,652.55 (25,531.58)	F(1,11) = 5.1, p = 0.045, $η_{p}^{2} = 0.317$
NNI^b	4.78 (0.80)	4.45 (1.32)	F(1, 11) = 0.681, not significant (p = 0.427), $η_{p}^{2} = 0.058$
Mean saccade length (pixels)*	71.65 (21.96)	65.31 (17.63)	F(1,11) = 4.744, p = 0.049, $η_{p}^{2} = 0.301$
Mean fixation duration (ms)	105.59 (55.01)	104.28 (49.8)	F(1, 11) = 0.02, not significant (p = 0.889), $η_{p}^{2} = 0.002$

SD: standard deviation; NNI: nearest neighbor index.

Asterisks indicate significant differences.

^a Logarithmic transform was applied.

^b Inverse transform was applied.

Effectiveness

Effectiveness was assessed using the error rate. However, the error rate was zero across all trials, with all participants carrying out their tasks correctly.

Satisfaction

Participants were asked to point out which EMR display was easier to use. All except two participants indicated that the redesigned display was easier to work with. Of the remaining two, one indicated that the original display is easier and the other indicated that they were both the same. Table 7 summarizes the findings from all of the different metrics.

Table 7.

Summary of results.

Metrics	Original display (data separated)	Redesigned display (data combined)
	Efficiency
Response time	Similar
NASA-TLX		Lower workload ratings
KLM-GOMS		Lower percent mental load
Eye tracking		Lower values of the eye tracking metrics
	Effectiveness
Error rate	N/A
	Satisfaction
Preferences		Physicians’ preference

Discussion

The overall research question that we were trying to address in this research study was whether higher EMR data density would necessarily lead to poorer EMR usability, as is commonly assumed. Given that usability is defined in terms of efficiency (both physical and cognitive), effectiveness, and satisfaction, several subjective and objective metrics were calculated that fall under these brackets. Our hypothesis was that combining information into one display space would not inevitably lead to poorer usability if the information is task-relevant and well-organized.

The study provided interesting results. From a user satisfaction perspective, the participant subjective feedback showed a clear preference for the redesigned display. In terms of physical efficiency, the response time suggested that the two displays are largely similar. In terms of cognitive efficiency, however, all three of the NASA-TLX ratings, KLM-GOMS, and the eye tracking metrics indicated that the redesigned display led to lower cognitive load. This result was obtained with just one change—combining the same elements into one page.

In summary, physical efficiency indicated that the two displays were similar, whereas user satisfaction and cognitive efficiency showed a clear preference for the redesigned display. Comparisons with the literature are difficult given that few studies have made a systematic analysis of the kind done here. The most similar study is Ahmed et al.,¹⁶ where, in contrast to this study, they found that higher data density was associated with higher response time and higher cognitive load ratings. However, that study examined two different versions of the EMR.

In contrast, the results of this study lead to three main conclusions. The first, as hypothesized and reiterated, is that good display organization is crucial to mitigating the effects of high data density. The second is that usability needs to be evaluated by more than just effectiveness and the physical efficiency measures of response time. While undoubtedly useful, there is so much more to usability than these measures, as evidenced in this study. The focus has understandably been on medical errors in the past, but as EMR systems become better established, the focus might need to shift more to user satisfaction and cognitive load. These factors can eventually lead to user fatigue, dissatisfaction with the work environment, and frustration, all of which contribute to the usability of a display. Usability professionals would thus be encouraged to include user satisfaction and especially cognitive load or cognitive efficiency as part of their usability analyses.

Third, and following from the last point, usability professionals and researchers are also encouraged to diversify the metrics that they use to measure cognitive load in EMRs. NASA-TLX is very useful, but we showed here that KLM-GOMS is also a very beneficial tool. Moreover, to our knowledge, no other study has used KLM-GOMS to compare the cognitive load of two EMRs, although previously it has been successfully used to evaluate a single display.⁴⁵ KLM-GOMS also has the added advantage that it does not involve users and is just based on the CTA, which is typically carried out in any usability study anyway. In other words, from a purely step-by-step inspection of the processes, the redesigned display was found to require less mental processing, which could include steps such as comparing information or searching for an item. Eye-tracking metrics are also a valuable addition to any usability study. Although they require more setup and analysis than other measures, they more than make up for that in the detailed insight they provide about the display. Such insight allows designers to decide where to focus their efforts: if there is a discrimination problem, as evidenced by mean fixation duration, perhaps the font size should be bigger. If there is too much spread of gaze points, as evidenced by convex hull area, then better grouping of information needs to be applied.

Thus, the metrics not only indicate whether cognitive load is high or low, but also suggest where the display can be improved. In addition, the use of eye tracking for EMR usability evaluation could form the basis of an adaptive, intelligent EMR display, where information is updated in real time based on user needs, as identified using eye tracking. Several other eye-tracking metrics could be further explored as real-time indicators of EMR cognitive load.

Similarly, for the eye-tracking metrics, only one other study has evaluated these metrics on two EMR versions. Moacdieh and Sarter²⁸ also found significantly lower mean saccade length for the higher-data EMR display, although there was a significant interaction effect with stress. However, they found significantly higher convex hull area, which is the opposite of what we found here. They also found significantly higher NNI. The fact that NNI was not significantly affected in this study can be attributed to the very short duration of most tasks, as this metric is considerably more involved than the others on this list.⁵⁴ Crucially, the difference between the two studies is that the higher-data display was not well-organized in Moacdieh and Sarter.²⁸ The medical history display used in that study had very little grouping of relevant information, and physicians typically had to serially scan the display in order to find their target. In this study, the lower convex hull area and mean saccade length indicate that good organization can significantly decrease the spread of eye movements, helping users to focus their gaze on what is important. Significantly lower mean fixation duration also suggests that there was a discrimination issue in the case of the current EMR. Although that was not replicated in Moacdieh and Sarter,²⁸ it is consistent with low-workload conditions across the literature.^61,62
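For readers less familiar with these measures, the sketch below shows one way the four metrics discussed above could be computed from a list of fixations. It is an illustration only: the input format, the function names, and the choice of convex hull area as the reference area for NNI are assumptions, not the processing pipeline used in this study.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def gaze_metrics(fixations):
    """fixations: list of (x, y, duration_ms) tuples in temporal order."""
    xy = [(x, y) for x, y, _ in fixations]
    area = polygon_area(convex_hull(xy))
    if area == 0:
        raise ValueError("need at least three non-collinear fixations")
    # Saccade length approximated as distance between consecutive fixations.
    saccades = [math.dist(a, b) for a, b in zip(xy, xy[1:])]
    # Distance from each fixation to its closest other fixation.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(xy) if j != i)
        for i, p in enumerate(xy)
    ]
    # Expected mean nearest-neighbor distance for a random pattern,
    # using the convex hull area as the reference area (an assumption).
    expected = 0.5 * math.sqrt(area / len(xy))
    return {
        "convex_hull_area": area,
        "mean_fixation_ms": sum(d for _, _, d in fixations) / len(fixations),
        "mean_saccade_len": sum(saccades) / len(saccades),
        "nni": (sum(nearest) / len(nearest)) / expected,
    }
```

In this formulation, a smaller convex hull area and shorter mean saccade length correspond to a tighter spread of gaze, while NNI compares the observed spacing of fixations with what a random scan pattern would produce.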

This study presented a unique approach to evaluating the usability of EMRs; however, there are some limitations. The tasks were kept relatively simple so that they could be highly controlled and structured, as was done in other studies.^16,28,63 The zero error rate, the low response time, and the low NASA-TLX ratings can be attributed to this simplicity. Simplicity was also favored in order to reduce the variability between residents with different levels of experience in the AUB FM Department. The downside was the difficulty in truly assessing effectiveness. However, more complex tasks would have meant a lower degree of control over what physicians do with their display, particularly as physicians tend to have vastly differing, yet still correct, approaches to diagnosing patients. Also, physicians typically interact with the patient while using the EMR, which was not captured here. Future work will consider a wider variety of tasks and more complex tasks.

In conclusion, this study provided support for the notion that combining EMR information into a compact, smaller location that minimizes navigation to different tabs leads to a more usable display, if the combined items are relevant and well-organized. The evidence presented here shows that this approach leads to lower cognitive load and higher satisfaction, two important aspects of usability that should not be neglected. This research can help inform EMR designers in their creation of the next generation of EMRs, while the methods used in this study can provide a framework for EMR evaluation. The idea of examining the three aspects of usability using a combination of subjective and objective measures proved to be very beneficial and can help better structure future usability studies in the healthcare domain.

Footnotes

Author’s note

Maher Al Ghalayini was earlier affiliated with American University of Beirut, Lebanon.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the American University of Beirut University Research Board (URB).

ORCID iD

Maher Al Ghalayini

References

1. Hillestad, Bigelow, Bower, et al. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Aff 2005; 24(5): 1103–1117.

2. James JT. A new, evidence-based estimate of patient harms associated with hospital care. J Patient Saf 2013; 9(3): 122–128.

3. Alagiakrishnan, Wilson, Sadowski, et al. Physicians’ use of computerized clinical decision supports to improve medication management in the elderly–the seniors medication alert and review technology intervention. Clin Interv Aging 2016; 11: 73.

4. Karsh B-T, Weinger, Abbott, et al. Health information technology: fallacies and sober realities. J Am Med Inform Assoc 2010; 17(6): 617–623.

5. Holden RJ. Cognitive performance-altering effects of electronic medical records: an application of the human factors paradigm for patient safety. Cogn Technol Work 2011; 13(1): 11–29.

6. King, Patel, Jamoom, et al. Clinical benefits of electronic health record use: national findings. Health Serv Res 2014; 49(1 Pt 2): 392–404.

7. ISO 9241-161:2016. Ergonomics of human-system interaction – Part 161: Guidance on visual user-interface elements.

8. Middleton, Bloomrosen, Dente, et al. Enhancing patient safety and quality of care by improving the usability of electronic health record systems: recommendations from AMIA. J Am Med Inform Assoc 2013; 20(e1): e2–e8.

9. Clarke, Belden, Kim. What learnability issues do primary care physicians experience when using CPOE? In: Proceedings of the international conference on human-computer interaction, Los Angeles, CA, 2–7 August 2015. New York: Springer, pp. 373–383.

10. Farley, Baumlin, Hamedani, et al. Quality and safety implications of emergency department information systems. Ann Emerg Med 2013; 62(4): 399–407.

11. Tasa, Ozcan, Yantac, et al. A case study on better iconographic design in electronic medical records’ user interface. Inform Health Soc Care 2008; 33(2): 125–138.

12. Belden, Grayson, Barnes. Defining and testing EMR usability: principles and proposed methods of EMR usability evaluation and rating. Chicago, IL: Healthcare Information and Management Systems Society (HIMSS), 2009.

13. Hall, Walton. Information overload within the health care system: a literature review. Health Info Libr J 2004; 21(2): 102–108.

14. Kroft, Wickens CD. Displaying multi-domain graphical databases: an evaluation of scanning clutter, display size, and user activity. Inform Design J 2002; 11: 44–52.

15. Mack, Oliva. Computational estimation of visual complexity. In: Proceedings of the 12th annual object, perception, attention, and memory conference, Minneapolis, MN: ACM, 18 November 2004.

16. Ahmed, Chandra, Herasevich, et al. The effect of two different electronic health record user interfaces on intensive care provider task load, errors of cognition, and performance. Crit Care Med 2011; 39(7): 1626–1634.

17. Hammond, Efthimiadis, Laundry. Efficient de-identification of electronic patient records for user cognitive testing. In: Proceedings of the 45th Hawaii international conference on system science (HICSS), Maui, HI, 4–7 January 2012, pp. 2771–2778. New York: IEEE.

18. Singh, Spitzmueller, Petersen, et al. Information overload and missed test results in electronic health record–based settings. JAMA Intern Med 2013; 173(8): 702–704.

19. Van Vleck, Stein, Stetson, et al. Assessing data relevance for automated generation of a clinical summary. AMIA Annu Symp Proc 2007; 2007: 761–765.

20. Bobillo, Delgado, Gómez-Romero. Representation of context-dependant knowledge in ontologies: a model and an application. Expert Syst Appl 2008; 35: 1899–1908.

21. Zhang, Pakhomov, McInnes, et al. Evaluating measures of redundancy in clinical texts. AMIA Annu Symp Proc 2011; 2011: 1612–1620.

22. Moacdieh, Sarter NB. Eye tracking metrics: a toolbox for assessing the effects of clutter on attention allocation. Proc Human Fact Ergon Soc 2012; 56: 1366–1370.

23. Zhu, Cao, et al. Influence of information overload on operator’s user experience of human–machine interface in LED manufacturing systems. Cogn Technol Work 2016; 18: 161–173.

24. Clarke, Belden, Koopman, et al. Information needs and information-seeking behaviour analysis of primary care physicians and nurses: a literature review. Health Info Libr J 2013; 30(3): 178–190.

25. Roman, Ancker, Johnson, et al. Navigation in the electronic health record: a review of the safety and usability literature. J Biomed Inform 2017; 67: 69–79.

26. Rose, Schnipper, Park, et al. Using qualitative studies to improve the usability of an EMR. J Biomed Inform 2005; 38(1): 51–60.

27. Doyon-Poulin, Robert, Ouellette. Review of visual clutter and its effects on pilot performance: a new look at past research. In: Proceedings of the IEEE/AIAA 31st digital avionics systems conference (DASC), Williamsburg, VA, 14–18 October 2012. New York: IEEE.

28. Moacdieh, Sarter. Clutter in electronic medical records: examining its performance and attentional costs using eye tracking. Hum Factors 2015; 57(4): 591–606.

29. Schumacher, Berkowitz, Abramson, et al. Electronic health records: physician’s perspective on usability. Proc Human Fact Ergon Soc 2010; 54: 816–820.

30. Walji, Kalenderian, Piotrowski, et al. Are three methods better than one? A comparative assessment of usability evaluation methods in an EHR. Int J Med Inform 2014; 83(5): 361–367.

31. Hollin, Griffin, Kachnowski. How will we know if it’s working? A multi-faceted approach to measuring usability of a specialty-specific electronic medical record. Health Informatics J 2012; 18(3): 219–232.

32. Horsky, McColgan, Pang, et al. Complementary methods of system usability evaluation: surveys and observations during software design and development cycles. J Biomed Inform 2010; 43(5): 782–790.

33. Edwards, Moloney, Jacko, et al. Evaluating usability of a commercial electronic health record: a case study. Int J Hum Comput Stud 2008; 66: 718–728.

34. Harrington, Kennerly, Johnson. Safety issues related to the electronic medical record (EMR): synthesis of the literature from the last decade, 2000-2009. J Healthc Manag 2011; 56(1): 31–43.

35. Zahabi, Kaber, Swangnetr. Usability and safety in electronic medical records interface design: a review of recent literature and guideline formulation. Hum Factors 2015; 57(5): 805–834.

36. Grahame, Laberge, Scialfa CT. Age differences in search of web pages: the effects of link size, link number, and clutter. Hum Factors 2004; 46(3): 385–398.

37. Chen, Barnes MJ. Supervisory control of multiple robots: effects of imperfect automation and individual differences. Hum Factors 2012; 54(2): 157–174.

38. Oviatt. Human-centered design meets cognitive load theory: designing interfaces that help people think. In: Proceedings of the 14th ACM international conference on multimedia, Santa Barbara, CA, 23–27 October 2006, pp. 871–880. New York: ACM.

39. Baddeley. Working memory: looking back and looking forward. Nat Rev Neurosci 2003; 4(10): 829.

40. Vergauwe, Barrouillet, Camos. Do mental processes share a domain-general resource? Psychol Sci 2010; 21(3): 384–390.

41. Hart, Staveland LE. Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Adv Psychol 1988; 52: 139–183.

42. Card, Moran, Newell. The keystroke-level model for user performance time with interactive systems. Commun ACM 1980; 23: 396–410.

43. John, Prevas, Salvucci, et al. Predictive human performance modeling made easy. In: Proceedings of the SIGCHI conference on human factors in computing systems, Vienna, Austria, 24–29 April 2004, pp. 455–462. New York: ACM.

44. Diaper, Stanton. The handbook of task analysis for human-computer interaction. Boca Raton, FL: CRC Press, 2003.

45. Saitwal, Feng, Walji, et al. Assessing performance of an electronic health record (EHR) using cognitive task analysis. Int J Med Inform 2010; 79(7): 501–506.

46. Poole, Ball LJ. Eye tracking in HCI and usability research. Enc Hum Comput Interact 2006; 1: 211–219.

47. Munn, Stefano, Pelz JB. Fixation-identification in dynamic scenes: comparing an automated algorithm to manual coding. In: Proceedings of the 5th symposium on applied perception in graphics and visualization, Los Angeles, CA, 9–10 August 2008, pp. 33–42. New York: ACM.

48. Findlay JM. Eye scanning and visual search. In: Henderson, Ferreira (eds.) The interface of language, vision, and action: eye movements and the visual world. New York: Psychology Press, 2004, pp. 134–159.

49. Hampson, Opris, Deadwyler SA. Neural correlates of fast pupil dilation in nonhuman primates: relation to behavioral performance and cognitive workload. Behav Brain Res 2010; 212(1): 1–11.

50. Hogervorst, Brouwer, van Erp JB. Combining and comparing EEG, peripheral physiology and eye-related measures for the assessment of mental workload. Front Neurosci 2014; 8: 322.

51. Veltman, Gaillard AW. Physiological indices of workload in a simulated flight task. Biol Psychol 1996; 42(3): 323–342.

52. Monfort, Sibley, Coyne JT. Using machine learning and real-time workload assessment in a high-fidelity UAV simulation environment. Proc SPIE 2016; 9851: 1–10.

53. Di Stasi, Marchitto, Antolí, et al. Saccadic peak velocity as an alternative index of operator attention: a short review. Eur Rev Appl Psychol 2013; 63: 335–343.

54. Di Nocera, Camilli, Terenzi. A random glance at the flight deck: Pilots’ scanning strategies and the real-time assessment of mental workload. J Cogn Eng Decis Making 2007; 1: 271–285.

55. Hegarty, Smallman, Stull. Decoupling of intuitions and performance in the use of complex visual displays. In: Proceedings of the cognitive science society, Washington, DC: Cognitive Science Society, 23–26 August 2008.

56. Coral MP. Analyzing cognitive workload through eye-related measurements: a meta-analysis, 2016, https://corescholar.libraries.wright.edu/cgi/viewcontent.cgi?referer=https://www.google.co.in/&httpsredir=1&article=2647&context=etd_all

57. Niezgoda, Tarnowski, Kruszewski, et al. Towards testing auditory–vocal interfaces and detecting distraction while driving: A comparison of eye-movement measures in the assessment of cognitive workload. Transport Res F: Traf 2015; 32: 23–34.

58. Goldberg, Kotval XP. Computer interface evaluation using eye movements: methods and constructs. Int J Ind Ergonom 1999; 24: 631–645.

59. Singh, Servoss, Kalsman, et al. Estimating impacts on safety caused by the introduction of electronic medical records in primary care. Inform Prim Care 2004; 12(4): 235–242.

60. Wickens, Carswell CM. The proximity compatibility principle: its psychological foundation and relevance to display design. Hum Factors 1995; 37: 473–494.

61. Reyes, Lee JD. Effects of cognitive load presence and duration on driver eye movements and event detection performance. Transport Res F: Traf 2008; 11: 391–402.

62. Wang, Yang, Liu, et al. An eye-tracking study of website complexity from cognitive load perspective. Decis Support Syst 2014; 62: 1–10.

63. Hoyt, Adler, Ziesemer, et al. Evaluating the usability of a free electronic health record for training. Perspect Health Inf Manag 2013; 10: PMC3692322.

Too much or too little? Investigating the usability of high and low data displays of the same electronic medical record
