Abstract
Museum conservation guidelines restrict illuminance for sensitive artwork to levels that can cause objects to be perceived as less colourful, a phenomenon known as the Hunt effect. Previous colour rendering research identified red saturating gamuts that consistently increased perceived saturation and personal preference. A study was conducted to evaluate the visual experience of fine art illuminated by a red saturating gamut family constrained to be uniquely identified by their TM-30 gamut scores (denoted as
1. Introduction
To limit photochemical degradation, many museums restrict illuminance to as low as 50 lx. 1 Such low light levels reduce the perceived colourfulness of objects, a phenomenon known as the Hunt effect.2,3 Studies utilizing real artwork have demonstrated that illuminances of at least 200 lx are needed to create a rich viewing experience.4–6
Early work from Jerome 7 and Thornton 8 identified colour gamut as a predictor of colour preference, and Harper 9 explicitly identified red as having special importance. Lee et al. 10 found that observers prefer foods to appear with increased saturation, where the preference for increased saturation in red strawberries was in the direction opposite to the colour change a strawberry undergoes during decomposition. Recent colour rendering studies designed to probe quantities derived from TM-3011–13 demonstrated that a particular family of red saturating gamuts was consistently associated with higher preference and perceived saturation.14–19
Studies investigating the Hunt effect suggest that red saturating gamuts may be able to compensate for some of the perceived desaturation associated with the Hunt effect.20–22 This study utilized a similar family of gamut shapes constrained such that a particular TM-30 gamut score (Rg) corresponded to a reasonably unique spectrum with a gamut shape distinct from other members of the gamut family. We denote this constrained version of the TM-30 gamut score as
Studies of preferred chromaticity suggest a red tint or negative Duv (chromaticity below the blackbody locus 23 ) is preferred.24–28 Spectra with lower Duv are more likely to exhibit larger gamuts, 29 indicating a potential confound that can be controlled, but is ignored by many studies that treat correlated colour temperature (CCT) as a proxy for chromaticity without considering Duv.
This study was designed to characterize the effects of varying Duv and
A priori hypotheses are stated below without embellishment, first in narrative form, then in the form of statistically tested hypotheses. Section 3.3 characterizes the experimental variables. Section 3.5 describes the statistical models utilized to test these hypotheses.
Values for
The stepwise reduced RSM model of preference will include significant
The response surface for preference will contain a local maximum.
Perceived saturation is directly correlated with
The stepwise reduced RSM model of saturation for both paintings will include a significant
Participants will not be able to consistently judge the naturalness of abstract paintings.
The naturalness model will exhibit poor fit statistics and marginal effect sizes.
Participants will evaluate preference, perceived saturation, and perceived naturalness similarly for both paintings.
The painting interaction terms in the mixed-effects model will be insignificant.
Red hues and objects will have the most influence on participant evaluations.
For participant rankings of the influence of various painting components, red hues and objects will have the highest mean influence rank.
2. Background
First quantified by Hunt in 1950, the Hunt effect is a perceived reduction in colourfulness as light levels decrease, describing a widely acknowledged aspect of our daily experience of colour.2,3,30 Decreasing the optical power of the visual stimulus is the primary physiological driver of the Hunt effect. Reduced light levels lead to reduced stimulus at the retina, which reduces the strength of colour perceptions. Colours become less distinct and more difficult to discriminate, compressing the range of distinct colours, an effect that is present at both photopic and mesopic light levels.2,31
Until recently, most electric light in museums and galleries was provided by halogen sources. 32 Spectral considerations mostly dealt with CCT and filtering of ultraviolet radiation to minimize photochemical degradation. Colour rendering was less relevant, given that most light sources were nearly blackbody radiators exhibiting high fidelity with little variation in their colour rendering qualities.33–35 Current multichannel light-emitting diode (LED) sources are capable of matching or potentially exceeding the visual qualities of halogen sources.
2.1 Colour rendering
Five recent studies15–19 evaluated colour perception under spectra at systematically mapped combinations of Illuminating Engineering Society (IES) TM-30 fidelity index (Rf), gamut index (Rg) and gamut shape as defined by the colour vector graphic (CVG). The CVG is composed of 16 hue angle bins, with colour rendering for each hue characterized by a vector with a radial component representing chroma (or saturation) shift relative to a same-CCT reference source (Rcs,hj) and an angular component representing hue shift relative to a same-CCT reference source (Rhs,hj).11,36 Local chroma shift in a red hue bin (Rcs,h1 or Rcs,h16) may be used as a proxy for gamut shape, with positive shifts generally associated with red saturating gamuts, and negative shift generally associated with green saturating gamuts. All five studies evaluated various lighting conditions using a preference scale and illuminance levels between 200 lx and 600 lx. Three of the studies additionally made evaluations using saturation and naturalness scales.16,17,19
Four of the five studies found Rg was a stronger predictor of preference than Rf.15–17,19 The study that found Rf better predicted preference than Rg employed the widest range of gamut shapes. 18 Four of the five studies found that Rcs,h1 or Rcs,h16 was as good a predictor of preference and saturation as was Rg, underscoring the importance of gamut shape, specifically with respect to red preference.16–19 Follow-up questions indicated that red objects were the most relevant to participants’ evaluations.
Each study also suggested a plateau in preference for increasingly large values of Rg and Rcs,h16. The most comprehensive study among those reviewed provided evidence at five chromaticities that preference decreased for especially high values of Rg and Rcs,h16. 17 Averaging the boundaries of the optimal regions suggests preference may be maximized in the region of 106 < Rg < 117 and 4% < Rcs,h16 < 17% for illuminance levels above 200 lx. This range is similar to but more restrictive than the recommendations proposed in IES TM-30-20 Annex E, 12 which references the same studies.15–19 This range is also compatible with other efforts to identify specification criteria for colour preference using measures from TM-30.18,37
Three studies investigated red saturating gamut families to determine if they could be used to compensate for the perceived desaturation associated with the Hunt effect. Each study suggested preference for larger gamuts at lower light levels.20–22 The most robust study of the group found an inverse linear relationship between preferred Rg and log-illuminance. Those results indicate a preferred region near Rg = 114 at 50 lx. The lighting conditions in that study did not carefully control Duv, which varied noticeably over a range of −0.0078 < Duv < 0.0018. 22 The two studies evaluating artwork only utilized a single red-dominated oil still life.20,22 Previous studies on the perception of lighting for fine art have identified differences in evaluations between paintings.5,6,38–40 To investigate the generalizability of previous results, this study evaluated a pair of paintings across systematically varied levels of Duv.
2.2 Chromaticity
Systematically quantifying position above or below the blackbody with Duv is relatively new within the lighting literature. 23 Along with CCT, Duv provides the second dimension necessary to uniquely specify chromaticity for sources near the Planckian locus. 23 Duv is salient within the context of colour rendering, where most metrics are calculated in comparison to a reference source at the same CCT, but not necessarily the same Duv.
Dikel et al. 25 found that Duv as low as −0.014 was preferred when participants were allowed to freely adjust chromaticity. Ohno and Fein 28 found the most natural Duv fell consistently below the blackbody locus (−0.02 < Duv < −0.01) for CCTs between 2700 K and 6500 K. Neither study attempted to control for colour rendering, prompting Wei and Houser 29 to demonstrate that Duv is inversely correlated with measures of gamut area, suggesting low Duv spectra are more likely to be associated with large gamut areas, both of which are correlated with increased preference. Ohno and Oh repeated an earlier experiment while controlling for gamut. The findings were consistent with their previous study, providing further evidence of a correlation between preference and negative Duv. 41 Zhai et al. attempted to characterize the effect of Duv on the visual experience of fine art. Their study did not attempt to control gamut and was conducted at only two levels of Duv (−0.02 and 0.0). They found only marginal evidence of preference for Duv = −0.02. 24 These previous results informed the Duv range for this study, which was selected to be −0.02 < Duv < 0.0.
3. Method
3.1 Materials
The experimental chamber was designed to imitate an art gallery, with wall-mounted fine art illuminated by a ceiling-mounted directional luminaire as illustrated in Figure 1. The walls of the chamber were painted with spectrally neutral Munsell N8 matte paint (GTI, Graphic Technology Inc., Newburgh, NY, USA).

Configuration of the experimental chamber shown during a trial run of the experimental protocol. The painting on the left was not used in this present study; the paintings utilized in the current study are shown in Figure 2.
The watercolour and acrylic paintings shown in Figure 2 were commissioned to include colour from each of the 16 TM-30 hue angle bins, with an emphasis on hue angle bins 1 and 16 (red hues). Both paintings were 40 cm by 50 cm and mounted in matching frames, allowing the paintings to be arranged symmetrically.

The watercolour still life (left) and the acrylic landscape (right) shown in their experimental arrangement. Though a glass sheet covering the watercolour is pictured, to minimize veiling reflections both paintings were viewed directly without glass covers.
The selected frames were matte black with a low profile to minimize shadowing. The watercolour still life was painted with Dr. Ph. Martin’s Radiant concentrated watercolours (providing highly saturated colours) on Arches 140 lb. cold-pressed watercolour paper. The acrylic landscape was painted with Grumbacher Academy Acrylic on an Artist’s Loft gallery wrapped heavy duty canvas.
Participants were seated in an immobile chair. The centre of the chair seat was positioned 120 cm from the wall and 40 cm above the floor. Since the average sitting eye height is 75 cm above the chair seat, 42 paintings were positioned with their vertical centres located 115 cm above the floor.
A single LEDCube-11 (Thousand Lights Lighting, Changzhou, Jiangsu, China) produced the experimental spectra. A Unistrut mounting frame was constructed with articulating joints to allow for careful aiming and complete rigidity after aiming was finalized. Theatrical barn doors were fastened to the front of the frame to prevent direct viewing, restrict waste light and reduce interreflections within the experimental chamber. The centre of the emitting face of the LEDCube-11 was positioned 80 cm from the wall and 230 cm above the floor, adequate to remove the LEDCube-11 from participants’ field of view.
A CL-500A illuminance spectrophotometer (Konica Minolta Sensing Americas Inc., Ramsey, NJ, USA) was purchased and factory calibrated for this experiment. The CL-500A was mounted on the wall between the paintings with the receiving sensor located 120 cm above the floor. This position for the CL-500A was utilized for all spectral measurements including the dimming profiles for the spectral optimizer, live spectral feedback and measurement during the experiment and the final reported experimental spectra.
Illuminance on the paintings was measured using a T-10MA illuminance meter (Konica Minolta Sensing Americas Inc., Ramsey, NJ, USA). Average illuminance (Eavg) across each painting was estimated by averaging a set of four measurements taken at each corner of the painting. Once the light level corresponding to Eavg = 50 lx was established, a point illuminance measurement was taken between the paintings by the CL-500A for use as the target illuminance (Etarget). Values for Etarget are necessarily higher than Eavg because the CL-500A was centred in the beam of the luminaire.
For Eavg = 50 lx, the top-inside corner of each painting was illuminated to approximately 64 lx, whereas the bottom-outside corner was illuminated to approximately 37 lx. Luminance distributions were readily reproducible and exhibited little difference between lighting conditions at the same Eavg.
A workstation was placed just outside of the experimental chamber to allow the experimenter to observe participants, trigger scenes on the LEDCube-11, take measurements using the CL-500A and make live adjustments in LEDNavigator v6.3.7, the LEDCube-11’s native software. As the only non-experimental light source active in the laboratory, workstation displays were dimmed and confirmed to have no effect on the lighting conditions within the experimental chamber measurable by the CL-500A, and they were outside of a participant’s field of view.
3.2 Participants
Ethics approval for the experiment was provided by The Pennsylvania State University Institutional Review Board (study #10205). Each participant was paid $20.
The study included 31 naïve participants living in or near State College, Pennsylvania. The average age was 22 years with a range of 18 years to 43 years. Thirteen (42%) participants wore glasses and three (10%) participants wore contacts; they were instructed to continue wearing them for the duration of the experiment. No participants reported cataract surgery or abnormal colour vision, and all participants passed a six-plate Ishihara test for colour-vision abnormalities. 43 No participants reported previous participation in colour-related experiments.
3.3 Variables
The experimental scenes illustrated in Figure 3 were systematically varied according to RSM as described in Section 3.5 along axes corresponding to Duv and
RSM point types and scene names for each lighting condition are presented in the first two columns. Coded coordinates for the geometric relationships based on a central composite RSM design are provided alongside the target values for Rg* and Duv.

TM-30 CVGs with associated SPDs superimposed in the centre for each experimental scene. Presented values are the average of all spectral measurements recorded as each participant evaluated each scene. The RSM coordinate geometry is shown behind the CVGs with target
Perceptual response variables and their semantic scales shown in Figure 4 were adopted from previous studies to allow the results to be more readily compared.16–19 Participants were provided a 6-point scale with no middle value to restrict centring bias.44,45 The scales represent the perceptual attributes of preference, naturalness and saturation, respectively. The goal of the gamut family was to enhance preference, making it the primary response variable. Recording saturation allows the presence of the Hunt effect to be directly investigated. Naturalness was included to make overall models more comparable to existing studies15–19 and because it did not markedly influence an observer’s time commitment. The naturalness scale was included with a cautionary mindset informed by a priori reservations about the veracity of attributing participant responses to lighting conditions when evaluating impressionist paintings (e.g. see Hypothesis 3 in Section 1).

Semantic scales used by the participants.
3.3.1 Gamut family
This research considers the complete gamut shape characterized by the IES TM-30 CVG. To allow for RSM, it is necessary to adopt a single-number proxy value that identifies a reasonably unique gamut shape while also quantifying the relationship between distinct spectra along a linear scale. TM-30 provides a linear scale by quantifying relative gamut area, but Rg does not uniquely identify a spectrum or gamut shape. Additional constraints must be paired with Rg to restrict experimental spectra to a progression of related gamut shapes such that a particular TM-30 gamut score corresponds to a reasonably unique spectrum with a gamut shape distinct from other members of the gamut family. The resulting proxy measure is denoted as
Rcs,hj = (Rg – 100) ± 5 for hue angle bins 1, 8, 9 and 16
ΔRcs,hj < 7% between any two of hue angle bins 1, 8, 9 and 16
Minimize Rhs,hj averaged across all hue angle bins
Rcs,hj > −1% for all hue angle bins (defer to constraint 4 when Rg < 105)
A sample gamut family produced by these constraints is illustrated in Figure 5. Individual gamuts were identified by an optimizer written in Excel 2016 (Microsoft Corporation, Redmond, WA, USA) by progressively increasing Rg with commensurate increases in Rcs in hue bins 1, 8, 9 and 16 while minimizing the average Rhs and desaturation in other hue bins.
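The constraints listed above can be sketched as a simple screening function. This is only an illustration, not the Excel optimizer used in the study: the function names, the dictionary layout, the reading of "± 5" as a tolerance in percentage points, and the use of the mean absolute hue shift as the minimization objective are all assumptions. The parenthetical exception for Rg < 105 is not modelled.

```python
from itertools import combinations

# Hue angle bins named in the constraints (TM-30 bins are numbered 1..16).
ANCHOR_BINS = (1, 8, 9, 16)


def check_gamut_family(rg, rcs, tol=5.0, max_spread=7.0, floor=-1.0):
    """Screen one candidate spectrum against the three pass/fail constraints.

    rg  : TM-30 gamut index of the candidate spectrum
    rcs : dict mapping hue bin (1..16) -> local chroma shift in percent
    The default thresholds are copied from the constraint list; the argument
    names themselves are hypothetical.
    """
    target = rg - 100.0
    anchors = [rcs[j] for j in ANCHOR_BINS]
    return {
        # Rcs,hj = (Rg - 100) +/- 5 for bins 1, 8, 9 and 16
        "anchor_tracks_rg": all(abs(v - target) <= tol for v in anchors),
        # Delta Rcs,hj < 7% between any two of bins 1, 8, 9 and 16
        "anchor_spread_ok": all(
            abs(a - b) < max_spread for a, b in combinations(anchors, 2)
        ),
        # Rcs,hj > -1% for all hue angle bins
        "no_desaturation": all(v > floor for v in rcs.values()),
    }


def mean_abs_hue_shift(rhs):
    """Assumed objective for the 'minimize Rhs' constraint: mean |Rhs,hj|."""
    return sum(abs(v) for v in rhs.values()) / len(rhs)


# Made-up chroma shifts for a candidate with Rg = 115.
example_rcs = {j: 2.0 for j in range(1, 17)}
example_rcs.update({1: 14.0, 8: 12.0, 9: 13.0, 16: 16.0})
result = check_gamut_family(115, example_rcs)
```

An optimizer would keep only candidates for which every entry of `result` is true and, among those, prefer the one with the smallest `mean_abs_hue_shift`.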

Eleven members of one realization of the experimental gamut family over the range of 105 <
3.3.2 Spectral accuracy
Live spectral feedback was generated during the experiment by the CL-500A controlled by LEDNavigator v6.3.7. Etarget, CCT, Duv and TM-30 Rf and Rg reported in Table 2 were recorded from three measurement averages as each scene was shown to each participant. LEDNavigator v6.3.7 does not provide tenths for Rf and Rg on the readily available spectral summary, making it difficult to estimate the variability of these values. TM-30 Rg values without a reported interval were not measured to have any variation, though this does not rule out variation up to ±0.4 points. Reported illuminance values are averages of point measurements taken between the paintings. The higher number of replications (N) for the CEN scene is due to the RSM model utilized, which required five replications of the centre point for each participant.
Measured averages and associated standard errors for each experimental scene are reported alongside target values. Standard errors for Rg are not reported for several scenes because those scenes were not measured to have any variation from the target value. Duv values are rescaled for readability to reduce the number of leading zeros.
The average deviation in Etarget recorded during this study was 0.3% and the most extreme Etarget deviation was a 1.2% increase of 0.8 lx, both of which are more than an order of magnitude below the ±20% illuminance JND proposed by previous research for indirect comparisons of lighting conditions.47–49 The deviation in chromaticity is presented numerically in terms of CCT and Duv in Table 2. A chromaticity JND is presented graphically in terms of MacAdam ellipses centred on each cluster of points in Figure 6, which demonstrates that no lighting conditions measured during the experiment came close to exceeding the chromaticity JND, even for expert observers.30,50

Custom ellipses centred on the five distinct experimental chromaticities. A 4-step ANSI bin for commercial LEDs specified at 3000 K is provided alongside the 1-step MacAdam ellipses representing the expert chromaticity JND at each chromaticity.
3.4 Procedure
Experimenters read from a prepared script to reduce experimenter bias and ensure all participants received the same information and instructions. Each experimental session began with an explanation of the experimental scales, focused on the terms anchoring the scales to reduce the differences between participants’ interpretations of the scales. Participants were then twice shown the full range of experimental scenes in 3-second stepped intervals; participants were instructed to focus on the watercolour for the first run-through and to focus on the landscape for the second run-through. Participants were directed not to evaluate brightness, glare, shadow or the content of the paintings and to instead focus on the effects of each lighting scene on the colour of the paintings.
Each participant was informed that their first scene was practice, affording them a chance to become familiar with the procedure and questions. The practice scene was not one of the nine experimental scenes, and practice responses were not analysed.
A 1–2-second blackout was used when switching between scenes to prevent instantaneous comparisons. Participants were instructed to observe the paintings for 90 seconds before recording their evaluations to allow for chromatic adaptation. 51 A timer was utilized to notify participants of the conclusion of the observation period. The first 15 seconds of the observation period was also used by the experimenter to take a preliminary measurement and make small adjustments to LED channel levels in LEDNavigator to ensure the spectral accuracy presented in Table 2. Deviations were largely due to warmup-related LED colour shift in the 595 nm and 635 nm LED channels.
Participants were presented with the 13 lighting scenes that systematically varied in
After evaluating 13 experimental scenes, participants were asked to reflect on the entirety of the experiment while indicating which hues and elements within the painting most influenced their evaluations. The entire experiment took between 50 minutes and 60 minutes for each participant.
3.5 Statistics
Experimental lighting conditions were selected based on the distribution of a central composite design for RSM, which was the primary statistical analysis. RSM is designed to characterize interactions and second-order terms indicating curvature, which was expected for
Rank data for the influence of specific objects and hues were analysed with the Friedman test in conjunction with a multiple pairwise comparison test utilizing the Bonferroni correction. 53
3.5.1 Response surface methodology
RSM provides a framework for efficiently characterizing the relationship between a response variable and multiple continuous, quantitative predictor variables. RSM was originally developed for use in the chemical industry; while uncommon in human factors research, RSM has been used successfully in a variety of vision and colour experiments.54–56 Because each participant formed a complete replication, a participant blocking term that enables the model to account for inter-observer variation could be included. 57
The central composite design for RSM provides a geometric relationship between treatment levels, illustrated in Figure 3, that enhances the model’s statistical power for detecting curvature. Corner points form a two-level full factorial design effective for estimating first-order linear effects as well as interaction effects. Axial points allow estimation of second-order quadratic effects that characterize curvature. Replicating the centre point allows for a pure error estimate and therefore a lack-of-fit test, which is used to characterize the validity of the response surface. A complete replication for this study was composed of 13 runs, including nine unique experimental scenes and four additional replications of the centre scene.53,57
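The geometry described above can be illustrated by generating coded coordinates for a two-factor central composite design. This is a minimal sketch: the function name is hypothetical, and the axial distance of √2 (the usual rotatable choice for two factors) is an assumption, since the axial distance is not stated in this section.

```python
import math


def central_composite_2factor(alpha=math.sqrt(2), n_centre=5):
    """Return coded (x1, x2) coordinates for one complete replication.

    alpha    : axial distance in coded units (assumed, not taken from the paper)
    n_centre : total number of centre-point runs
    """
    # Corner points: two-level full factorial, used for linear and
    # interaction effects.
    corners = [(a, b) for a in (-1.0, 1.0) for b in (-1.0, 1.0)]
    # Axial points: allow estimation of the quadratic (curvature) terms.
    axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    # Replicated centre point: provides the pure-error estimate needed
    # for a lack-of-fit test.
    centre = [(0.0, 0.0)] * n_centre
    return corners + axial + centre


runs = central_composite_2factor()
unique_scenes = set(runs)
```

With five centre runs this gives 13 runs over nine unique coordinates, matching the run count stated above. In the experiment the two coded axes would map onto the gamut score and Duv ranges.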
Effect sizes characterize the magnitude of the effect of the experimental treatments (in this case, the lighting scenes) on the response variables. Effect sizes also provide somewhat standardized values for comparison between studies. To account for different sample sizes between the scenes for this research, effect size was calculated using Hedges’ g. This research considered reference values of <0.2 (no effect), 0.2–0.5 (small), 0.5–0.8 (moderate) and >0.8 (large). The significance of effect sizes was determined using Tukey’s test for multiple pairwise comparisons.53,58,59
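As an illustration of the effect size calculation, the sketch below computes the standardized mean difference with a pooled standard deviation and the common small-sample correction factor 1 − 3/(4(n1 + n2) − 9), then labels it using the reference values above. The function names are hypothetical, the handling of values that fall exactly on a threshold is an assumption, and the ratings are made up.

```python
import math
from statistics import mean, variance


def hedges_g(a, b):
    """Bias-corrected standardized mean difference between two samples."""
    n1, n2 = len(a), len(b)
    # Pooled standard deviation weights each sample variance by its
    # degrees of freedom, which accommodates unequal sample sizes.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    )
    d = (mean(a) - mean(b)) / pooled_sd
    # Approximate small-sample bias correction.
    correction = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return d * correction


def label_effect(g):
    """Map |g| onto the reference categories quoted in the text."""
    size = abs(g)
    if size < 0.2:
        return "no effect"
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "moderate"
    return "large"


# Made-up ratings on a 6-point scale for two hypothetical scenes.
scene_x = [4, 5, 5, 6]
scene_y = [2, 3, 3, 4]
g = hedges_g(scene_x, scene_y)
label = label_effect(g)
```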
Some prior colour rendering research utilized regression models based on mean response levels for each scene.16–19 Those models collapsed across participant, discarding the majority of the observer variance for the final model and inflating the R2 values for those models. It is nearly certain that participants would have had significantly different responses; therefore, participant should remain in the analysis of variance (ANOVA) model as a random term. This allows the model to randomize intercepts for each participant, smoothing out differences in the use of semantic scales and retaining inter-observer variation for p-values, R2 and lack-of-fit tests. 53
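The inflation of R2 caused by collapsing across participants can be seen in a small constructed example. The data below are synthetic and noise-free, chosen only so that the arithmetic is easy to follow: every hypothetical participant shares the same slope but uses a different part of the scale.

```python
def r_squared(x, y):
    """R2 of a simple least-squares line fitted to paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


scene_levels = [-1.0, 0.0, 1.0]   # coded predictor, one value per scene
offsets = [-1.0, 0.0, 1.0]        # per-participant shift in scale use (made up)

# Raw data: one rating per participant per scene, shared slope of 0.5.
raw_x, raw_y = [], []
for u in offsets:
    for x in scene_levels:
        raw_x.append(x)
        raw_y.append(3.5 + 0.5 * x + u)

# Collapsed data: one mean rating per scene, participants averaged out.
mean_y = [
    sum(3.5 + 0.5 * x + u for u in offsets) / len(offsets)
    for x in scene_levels
]

r2_raw = r_squared(raw_x, raw_y)               # observer variance retained
r2_collapsed = r_squared(scene_levels, mean_y) # observer variance discarded
```

Here the same underlying relationship yields an R2 of about 0.2 on the raw ratings and 1.0 on the scene means, which is why the two kinds of model should not be compared against the same thresholds.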
For this reason, rather than only comparing model fit against R2 values from recent work on colour rendering, this research also considers thresholds proposed by meta-analyses of preference studies that did not collapse observer variance.60,61 Reference values for the quality of correlation indicated by R2 include <30% (poor), 30%–50% (moderate), 50%–70% (substantial) and >70% (excellent).
3.5.2 Mixed-effects model
The mixed-effects model is a type of general linear model that allows for the inclusion of random variables in addition to the fixed and continuous variables available in RSM. By not estimating the lack-of-fit and the significance of the order of the regression equation, the mixed-effects model has additional degrees of freedom that enable inclusion of nuisance variables that would otherwise saturate an RSM model. A restricted mixed-effects model was utilized. Restricted maximum likelihood was used for estimating the variance components (random variables and interactions with random variables) and Kenward–Roger was used for estimating fixed effects (fixed and continuous variables); both methods include correction factors for small sample sizes. 53
Significant nuisance variables indicate a need to subdivide the respective RSM models. For example, a significant painting term indicates that separate response surfaces must be fit for each painting. Insignificant nuisance variables allow the RSM models to utilize larger data sets collapsed across the insignificant variable, which is then omitted from the respective RSM models. For this study, a term is considered significant when p < 0.05, marginally significant when 0.05 < p < 0.10 and insignificant otherwise.
3.5.3 Rank order data
Rank order data are discrete and ordinal, and evaluations are not independent. The nonparametric Friedman test is commonly used to determine whether the mean ranks of listed items are equal. The test is similar to a one-way ANOVA; significance indicates that mean ranks are not all equal. A Friedman test is typically paired with the Bonferroni multiple pairwise comparison test to produce 95% confidence intervals that demonstrate which mean ranks significantly differ from one another.53,62
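A minimal sketch of the Friedman statistic for untied rank data follows, together with the Bonferroni-adjusted significance level for all pairwise comparisons. The function names and the ranks are hypothetical. The p-value would normally be obtained from a statistics package by comparing the statistic with a chi-square distribution on k − 1 degrees of freedom.

```python
def friedman_statistic(ranks):
    """Friedman chi-square statistic and mean ranks for untied rankings.

    ranks : list of rows, one per participant, each a ranking 1..k of k items
    """
    n = len(ranks)        # number of participants (blocks)
    k = len(ranks[0])     # number of ranked items
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    mean_ranks = [r / n for r in rank_sums]
    return q, mean_ranks


def bonferroni_alpha(k, alpha=0.05):
    """Per-comparison significance level for all k(k-1)/2 pairwise tests."""
    return alpha / (k * (k - 1) / 2)


# Made-up rankings of three items by four participants.
example_ranks = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
q, mean_ranks = friedman_statistic(example_ranks)
per_test_alpha = bonferroni_alpha(3)
```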
4. Results
Statistical analysis was performed in Minitab 19.1 (Minitab Inc., State College, PA, USA) and Excel 2016, consistent with the principles suggested by Royer et al. 63 Though ordinal data were collected, parametric statistics were employed since they have been shown to be robust and acceptable for analysis of Likert scale data.64,65
4.1 Artwork perceptions
The mixed-effects model presented in Table 3 was utilized as a preliminary analysis to investigate interactions with the painting term and other potential nuisance variables. Painting was modelled as a fixed variable.
ANOVA was run separately for each response variable. Results in bold text indicate significance with p-value less than 0.05, and bold italic text indicates significance with p-value less than 0.10.
No interactions with the painting term reached significance, allowing the RSM models to span both paintings. The significant painting and participant terms in each model (p < 0.01 in all cases) indicate that the mean response level was distinct between paintings and between participants for each response variable. Due to this significance, the terms cannot be removed from the model. To allow the RSM models to account for these differences in opinion, a painting–participant blocking term was included, forming a total of 62 blocks. The blocking terms allow the RSM models presented in Table 4 to average across the distinct means for both participants and paintings. No valid RSM models were found that did not include the blocking terms. Terms without reported p-values were eliminated via stepwise model reduction with an inclusion threshold of 0.1. Some insignificant terms were included to maintain a hierarchical model. Collapsing the models across the paintings allows for a more straightforward understanding of the variable relationships at the expense of increased lack-of-fit. The preference and saturation models are acceptably well-fit with fair R2 values and insignificant or marginally significant lack-of-fit. The naturalness model is the least viable given the significant lack-of-fit.
An RSM ANOVA model was fit separately for each response variable, forming three distinct RSM models. Bold text indicates significance with p-value less than 0.05 and bold italic text indicates significance with p-value less than 0.10. Lack-of-fit is a test of the regression model fit; regular text indicates an acceptable fit, bold italic text indicates a marginal fit and bold text indicates a lack-of-fit. R2 coefficients of correlation are formatted according to the thresholds established in Section 3.5.1; bold text indicates good correlation, bold italic text indicates fair correlation and other text indicates poor correlation.
The preference model exhibits a linear

The preference model illustrated by a response surface and contour plot with the model presented at top. The contour plot is a plan view of the response surface with preference represented by colour contours that are quantified by the legend at top right. The locations of the experimental scenes are marked with black points on the contour plot.
Moderate-to-large effect sizes and good distinction between Tukey’s groupings in Figure 8 reinforce the validity and predictive power of the response surface in Figure 7. The most preferred scenes were A1 (

Scenes are ordered by mean preference rating. Hedges’ g was used to calculate effect sizes between each pair of scenes. Colour coding corresponds to effect size; green cells indicate large effects and yellow cells indicate small or no effect. A mixed-effects ANOVA model was used to calculate Tukey’s multiple pairwise comparisons for all scene pairs with significance marked by bold effect sizes.
The saturation model is composed of only a linear

Predicted saturation versus Rg*. Mean response levels for each experimental scene are marked by blue shapes. Error bars represent ±1 standard error.
Moderate-to-large effect sizes and good distinction between Tukey’s groupings in Figure 10 reinforce the validity and predictive power of the linear regression model. The scenes perceived as most saturated were A1 (

Scenes are ordered by mean saturation rating. Hedges’ g was used to calculate effect sizes between each pair of scenes. Colour coding corresponds to effect size; green cells indicate large effects and yellow cells indicate small or no effect. A mixed-effects ANOVA model was used to calculate Tukey’s multiple pairwise comparisons for all scene pairs with significance marked by bold effect sizes.
Naturalness exhibits marginal curvature in both

The naturalness model illustrated by single-factor plot models, which were fit separately for

Scenes are ordered by mean naturalness rating. Hedges’ g was used to calculate effect sizes between each pair of scenes. Colour coding corresponds to effect size; green cells indicate large effects and yellow cells indicate small or no effect. A mixed-effects ANOVA model was used to calculate Tukey’s multiple pairwise comparisons for all scene pairs with significance marked by bold effect sizes.
Participants were provided space below each set of semantic scales to share the reasoning behind their evaluations. They were not required to provide comments. Comment language mirrored the language of the semantic scales and of the descriptions provided before the start of each experimental run. Most comments could be grouped into one of three types, labelled in Table 5 with the most common language used by participants. For saturation-related comments, only comments indicating general under- or oversaturation were counted; comments about single colours were not included in the counts. The most common participant reasoning discussed colour contrast and detail, including colour diversity, depth and range.
Participant comment summary is presented with the left section of the table providing the scene names accompanied by the designed levels of their predictor variables for ease of reference. Comments are grouped by painting and then by comment type. Reported percentages are the percent of total possible comments, which includes blank comments. Empty cells indicate that no comments of that type were recorded for the scene and painting in question.
The difference in rates of perceived undersaturation and oversaturation reflects the saturation model means for each painting; the landscape painting had a higher mean saturation rating, and comments suggest it was perceived as more saturated. A small number of comments regarding overall warmth or coolness appear to be related to Duv. Scenes described as warm included C1 (6% of participants, Duv = 0.0) and A3 (6% of participants, Duv = 0.0036). Scenes described as cool or cold included A4 (11% of participants, Duv = −0.0211), Cen (4% of participants, Duv = −0.0088) and C4 (2% of participants, Duv = −0.0175).
4.2 Object and colour focus
Participants reported significant trends in their ranking of hues as shown in Figure 13, still life painting objects as shown in Figure 14, and landscape painting objects as shown in Figure 15. Red hues and objects consistently held the highest mean rank; orange, yellow, green and purple hues and objects held similar intermediate influence; blue and cyan hues and objects consistently ranked near the bottom but may have been underrepresented in the object lists. The object list for the landscape painting contained no red-dominated objects; the object with the highest mean rank was composed of mostly orange and red-orange hues. These results are in agreement with previous studies.16–19 Unqueried hues and compositional elements noted in participant comments include black, white or neutral hues (19% of participants) and colour contrast and/or blending (42% of participants).

Importance of different hues. Mean rank is denoted by bar height and 95% CIs by black error bars per values provided in the Supplemental Material.

Importance of different objects in the watercolour painting. Mean rank is denoted by bar height and 95% CIs by black error bars per values provided in the Supplemental Material. Colours for each bar were obtained from Adobe Photoshop CC 2015 by subdividing each object into 1–3 colour zones and averaging the hue values within each zone.

Importance of objects in the landscape painting. Mean rank is denoted by bar height and 95% CIs by black error bars per values provided in the Supplemental Material. Colours for each bar were obtained from Adobe Photoshop CC 2015 by subdividing each object into 1–3 colour zones and averaging the hue values within each zone.
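The mean ranks and 95% CIs plotted in Figures 13 to 15 could be computed along the following lines. This is only a sketch: the source does not say how the intervals were obtained, so a normal approximation is assumed here, and the rank scores, hue labels and the function name `mean_rank_ci` are hypothetical.

```python
import math
import statistics


def mean_rank_ci(ranks, level=0.95):
    """Mean rank with a normal-approximation confidence interval."""
    n = len(ranks)
    mean = statistics.mean(ranks)
    sem = statistics.stdev(ranks) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical rank scores from ten participants (higher = more influential).
hue_ranks = {
    "red": [7, 7, 6, 7, 7, 5, 7, 6, 7, 7],
    "blue": [2, 1, 3, 1, 2, 1, 2, 4, 1, 2],
}

summary = {hue: mean_rank_ci(ranks) for hue, ranks in hue_ranks.items()}
for hue, (mean, lower, upper) in summary.items():
    print(f"{hue}: mean rank {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With small participant counts a t-based interval would be somewhat wider than this normal approximation.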
5. Discussion
The results demonstrated the viability of RSM as a statistical tool for colour rendering studies by identifying a valid response surface for preference. The model fit was acceptable, with a moderate R² = 0.37 and a non-significant lack-of-fit statistic (p-value = 0.10). Previous TM-30 studies that also presented lighting conditions independently and provided ANOVA models for preference reported R² values of 0.94,16 0.61,17 and 0.86.19 Each of these studies collapsed its dataset across participants, eliminating substantial variance and markedly increasing the reported R² values. When this study’s preference model was recalculated using the same approach, R² increased to 0.97, indicating the model fit was at least as robust as those of previous studies. This reinforces the use of reference values for R² of <30% (poor), 30%–50% (moderate), 50%–70% (substantial) and >70% (excellent).60,61
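The effect of collapsing across participants on R² can be illustrated with a small simulation. The sketch below is not the study's model: it assumes a simple straight-line fit rather than a mixed-effects response surface, and the predictor levels, participant count, slope and noise levels are all made up for illustration.

```python
import random
import statistics


def r_squared(x, y):
    """R^2 of a simple least-squares line y = b0 + b1 * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


random.seed(1)
levels = [0, 1, 2, 3, 4]   # hypothetical coded predictor levels
n_participants = 30        # hypothetical sample size

x_all, y_all = [], []
for _ in range(n_participants):
    offset = random.gauss(0, 1.0)  # participant-specific rating bias
    for level in levels:
        x_all.append(level)
        y_all.append(3 + 0.5 * level + offset + random.gauss(0, 1.0))

# Collapse across participants: one mean rating per predictor level.
y_means = [
    statistics.mean([y for x, y in zip(x_all, y_all) if x == level])
    for level in levels
]

r2_individual = r_squared(x_all, y_all)
r2_collapsed = r_squared(levels, y_means)
print(f"R^2 on individual ratings: {r2_individual:.2f}")
print(f"R^2 on collapsed means:    {r2_collapsed:.2f}")
```

Averaging removes between-participant and trial-to-trial variance from the response, so the same underlying trend explains a much larger share of what remains. This is why R² values computed on collapsed and uncollapsed data are not directly comparable.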
Results were mixed with respect to the primary hypothesis (1). Significant curvature was identified for Duv but not for
The interaction between
No evidence was found of curvature in
Colour rendering studies conducted at illuminances below 200 lx indicate desaturation related to the Hunt effect likely drives preference for larger gamuts at lower illuminances.20–22 None of those Hunt effect studies suggested preference should continue increasing above 120 Rg, even at 20 lx. Kawashima and Ohno 20 likely saw preference decrease for large gamuts because they presented familiar objects at a minimum of 100 lx. Wei et al. and Bao and Wei likely found preferred Rg < 120 because their single painting was dominated by red hues subject to the largest hue and chroma shifts under the largest gamuts studied.21,22 All three of these studies utilized forced choice or free choice methodologies, allowing participants to directly compare spectra. In context, these results suggest that for this gamut family illuminating independently viewed high saturation abstract paintings at 50 lx, increasing
Hypothesis (2) was confirmed by the saturation model, which featured a significant (p < 0.001) direct linear correlation (R² = 0.40) with
Hypothesis (3) was confirmed by the naturalness model, which exhibited the expected lack-of-fit and featured only second-order terms with somewhat marginal significance and a moderate R² = 0.33 (0.57 when recalculated after collapsing across participants). No large effect sizes and no significant differences were seen between any of the presented scenes. Previous studies suggest perceived naturalness decreases at both high and low Rg,19,46 which is likely a factor in those studies’ ability to identify an upper bound for preferred Rg. Because the artwork was abstract, it was not possible for the participants to judge naturalness, which may be partially responsible for the lack of an identified upper bound for preferred Rg in this study.
Given the abstract and saturated composition of the paintings and the lack of an external reference for the paintings’ colours, naturalness was expected to be difficult to judge consistently. Participants evaluated the naturalness of the two paintings differently (mean naturalness rating for the landscape = 3.6 and mean naturalness rating for the still life = 4.1). It is possible that some participants responded not to the naturalness of the colours in the painting but to the naturalness of the painting itself. We intend to expand on this consideration with an additional painting that will be reported in a companion paper that is under development.
Hypothesis (4) was confirmed by the mixed-effects model presented in Table 3, where no painting interaction terms reached significance. This indicates that the effects of varying
Hypothesis (5) was confirmed by results presented in Section 4.2, where participants consistently focused on red hues and the reddest objects while making their evaluations. Differences in rank between objects of similar hue may have been driven by a combination of visual prominence within the painting, centring in the field of view and object lightness or brightness. For the landscape painting, the left-side (innermost) trees ranked significantly higher than the path despite being composed of the same hues. The left-side trees were both more centred in participants’ field of view and are foreground objects within the painting. For the watercolour painting, the less prominent and slightly darker strawberries were significantly lower ranked than the tomatoes and apple despite being painted with the same watercolour paint at different dilutions. Participant comments indicated that colour contrast and detail were central to their evaluations. These results and observations hint at the influence of the composition and content of the paintings.
6. Conclusions
Four of five a priori hypotheses (2–5) were confirmed by a robust empirical dataset in good agreement with the results of important prior research on colour rendering within the context of TM-30.15–19 Those findings support the general validity of these results, suggesting the decrease in preference for large values of
By identifying a valid response surface for preference, this study demonstrated the viability of RSM as a statistical tool for mapping complex interactions in colour rendering studies. The model fit was acceptable with R² similar to previous colour rendering studies when calculated using the same statistical methodology. The model predicts Duv = −0.013 was preferred at
To support the generalizability of these results, careful selection of more diverse and representative artwork is necessary. A Renaissance-style painting would provide an improved representation of artwork typically found in museums and galleries. Inclusion of less saturated colours and objects depicted in their natural hues would improve the diversity of colour provided for participants to evaluate and improve their ability to accurately judge colour appearance. The lack of expected curvature in
Having established an experimental protocol and statistical analysis suitable for investigating the complex effects of colour rendering for artwork at light levels subject to the Hunt effect, the next step is clear: investigation of the experimental gamut family across a range of conservation illuminance levels. The present study leaves open two key questions. Firstly, will increased preference remain for high values of
Supplemental Material
Supplemental material, sj-pdf-1-lrt-10.1177_14771535231172100 for Fine art under low illuminance: Gamut and tint by J Mundinger and K Houser in Lighting Research & Technology
Footnotes
Acknowledgements
Jim Mundinger is gratefully acknowledged for painting pro bono and providing the paintings used in this study.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Kevin Houser is a co-founder and shareholder of Lyralux, Inc. and inventor on U.S. Patent No. 11,064,583, Japanese Patent No. 7,198,212, U.S. Patent Application No. 2021/0345461 A1 and European Union Patent Application No. 18760286.7, which disclose intellectual property related to colour rendering under dimmed lighting conditions.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the United States Department of Education under Award No. P200A180031. Any opinions, findings and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the United States Department of Education. Funding for the experimental apparatus was provided by The Pennsylvania State University College of Engineering’s ENGineering for Innovation & ENtrepreneurship (ENGINE) grant. Funding for participant honoraria was provided by Project CANDLE.
Supplemental material
Supplemental material for this article is available online.
References
