Abstract
We studied whether human observers can estimate the illumination direction from
Introduction
We will consider texture due to the illumination of rough surfaces. The appearance of such three-dimensional (3D) textures is dependent on the illumination and on the viewing direction and can be characterized by bidirectional texture functions or BTFs (Curet, 1997; Dana, van Ginneken, Nayar, & Koenderink, 1997, 1999). Conversely, the texture appearance might provide us with cues about the illumination and viewing directions. Luminance distribution or histogram-based cues are, for instance, the width, average, and skewness of the luminance distribution (Ho, Landy, & Maloney, 2008; Kim & Anderson, 2010; Motoyoshi, Nishida, Sharan, & Adelson, 2007; Pont & Koenderink, 2005, 2008; Wijntjes & Pont, 2010). In addition to such relatively easily derivable cues, the spatial properties of 3D textures also provide cues about the illumination, material, and shape (Chantler, Schmidt, Petrou, & McGunnigle, 2002; Gerhard & Maloney, 2010; Karlsson, Pont, & Koenderink, 2008, 2009; Kim, Marlow, & Anderson, 2014; Knill, 1990; Koenderink, 2012; Pont & Koenderink, 2008; Shepard & Campbell, 1998; Varma & Zisserman, 2004). The second-order statistics of shaded 3D textures provide us with estimates of the azimuth of the average illumination orientation, that is, the direction modulo 180° (Koenderink & Pont, 2003; Koenderink, van Doorn, Kappers, te Pas, & Pont, 2003; Koenderink, van Doorn, & Pont, 2004, 2007). The 180° ambiguity is due to the convex-concave ambiguity. Furthermore, for arbitrary textures (i.e., statistically inhomogeneous sets of textures), the elevation cannot be estimated due to the bas-relief ambiguity (Belhumeur, Kriegman, & Yuille, 1997).
Illumination direction estimation is an important prerequisite for estimates of the light field, shape from shading, and material judgments. In this article, we investigate how well human observers are able to estimate the illumination direction from 3D textures, in connection to our interest in light field perception. The light field (Gershun, 1939), or plenoptic function (Adelson & Bergen, 1991), is defined as the irradiance as a function of position and direction and might serve as a radiometric framework for perception. Texture provides us with cues which are additional to shading. Note that Lambertian shading (Horn & Brooks, 1989) is dependent on the normal component of the local light vector, while texture due to surface roughness is also dependent on the tangential component of the local light vector. The ensembles of local illumination orientation estimates over rough 3D objects form patterns, the illuminance flow (Pont & Koenderink, 2003, 2004). The illuminance flow depends systematically on the light field and on the shape of the object (Karlsson et al., 2008, 2009; Pont & Koenderink, 2003, 2004, 2008; Pont, van Doorn, Wijntjes, & Koenderink, 2015) and provides cues about the light field and object shape (Koenderink, 2012; Pont & Koenderink, 2008).
In a previous study, we derived how second-order statistics based on the squared gradient and on the Hessian relate to the illumination direction (Koenderink & Pont, 2003). This relation suggests that the illumination orientation can be derived from 3D textures via responses of so-called

Figure 1. Textures of real materials from the CURET database at the top and rendered textures of Gaussian surfaces at the bottom. All textures were viewed frontally. The Gaussian textures were rendered for a single scale of the roughness, for illumination polar angles of about 0°, 30°, 50°, and 70° (from left to right), varying azimuth (depicted by the red arrows), and for three reliefs, increasing from above to below.
In still another study (Koenderink et al., 2004), we carried out a similar experiment for frontally viewed computer-generated Gaussian surfaces (see, for example, the bottom of Figure 1) and again found that human observers’ estimates were close to the fiducial orientation values (interquartile intervals of the deviations of the azimuthal estimates were below 14°). However, for the Gaussian textures in the shadowing regime, we found that observers were able to resolve the convex-concave ambiguity. The main difference between the textures in the shadowing and shading regimes was the presence of cast shadows. This suggests that observers make use of the difference between the boundaries of the cast shadows and the body shadows, the latter being much more gradual than the former.
Next, we tested illumination direction estimation for textures of frontally viewed Gaussian anisotropic rough surfaces (Koenderink, van Doorn, & Pont, 2007). For such textures, one expects systematic errors of the settings as a function of the anisotropy (Karlsson et al., 2008, 2009). Our expectations were fully borne out, in that the observers committed the predicted systematic errors. The results were precise enough to allow the inference that illumination direction detection is based on second-order statistics, that is, of edge detector (rather than line detector) activity.
Figure 1 shows examples of real materials and of rendered Gaussian surfaces. The Gaussian textures look somewhat artificial and as if photographed out-of-focus. The main reason might be that the surface roughness of these textures is restricted to a single scale, while in natural materials one typically finds undulations over a range of scales (Green, Padilla, Drbohlav, & Chantler, 2007; Kube & Pentland, 1988; Padilla, Drbohlav, Green, Spence, & Chantler, 2008; Wainwright & Simoncelli, 2000). Therefore, in this article, we tested whether a deviation from the theoretical assumption of Gaussian geometry (while the theoretical assumptions of Lambertian reflectance and uniform albedo were fulfilled) will systematically affect the estimates of human observers. We rendered images of height profiles resulting from linear superpositions of a range of Gaussian surfaces of different scales. Due to the effect that larger bumps might put smaller ones in cast shadow, such a “Brownian image texture” is not simply a linear superposition of the image textures of the composing Gaussian surfaces. Can human observers estimate the illumination orientation for these, more realistic surface profiles, containing roughness at a range of scales?
Methods
Stimuli
We generated 250 images of frontally viewed Brownian surfaces, see Figure 2 for examples. To create these images, we first constructed statistically independent random surfaces (i.e., surface height profiles) for each of them. The surfaces were generated by linear superposition of seven random Gaussian reliefs of different scales. Each Gaussian relief component was generated with normally distributed heights and an isotropic Gaussian correlation function (Longuet-Higgins, 1957). Hereafter, we set

Figure 2. Examples of the stimuli, with arrows depicting the true illumination direction. The cut-out was circular in order to prevent biased responses due to oriented contours.

A depiction of the power spectrum of the

The stimuli were prepared through a
The illumination directions of the 250 images were distributed randomly over the hemisphere of potential illumination directions (see the polar plot in Figure 4). All stimuli were presented in randomized order. The rendered images were shown in a circular mask in order to avoid a possible bias due to the square shape of the images. The test was conducted using a linearized monitor (unit gamma; implemented via software and checked with a gray scale and a Konica Minolta luminance meter).
Figure 4. The parameters of the 250 stimuli. The grid specifies 15° increments in azimuth and elevation, using equal-area projection. The convention for the specification of the azimuth (zero direction toward the right, increase in counter-clockwise direction) is used throughout the article. Elevation is measured by the polar angle, that is, the distance to the direction of normal incidence (at the center of the graph). The elevation and azimuth specify the direction toward the light source.
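The stimulus construction described in this subsection can be illustrated with a minimal sketch. It is not the authors' rendering pipeline: the image size, the spacing of the seven scales (`base_sigma`, `ratio`), and the amplitude weighting per scale are assumptions chosen only for illustration, and the renderer below computes Lambertian shading with body (attached) shadows only, whereas the actual stimuli also contained cast shadows. The function names are hypothetical.

```python
import numpy as np

def gaussian_relief(n, sigma, rng):
    """One random relief: white Gaussian noise smoothed by an isotropic Gaussian."""
    noise = rng.standard_normal((n, n))
    f = np.fft.fftfreq(n)                      # spatial frequencies in cycles/pixel
    f2 = f[None, :] ** 2 + f[:, None] ** 2
    # Fourier transform of a Gaussian with standard deviation sigma (in pixels)
    kernel = np.exp(-2.0 * (np.pi * sigma) ** 2 * f2)
    relief = np.real(np.fft.ifft2(np.fft.fft2(noise) * kernel))
    return relief / relief.std()               # normalize to unit height variance

def brownian_surface(n=128, n_scales=7, base_sigma=1.0, ratio=1.5, seed=0):
    """Linear superposition of n_scales Gaussian reliefs (scales are ASSUMED)."""
    rng = np.random.default_rng(seed)
    sigmas = base_sigma * ratio ** np.arange(n_scales)
    # ASSUMPTION: amplitude proportional to scale; the source weighting is not given here
    return sum(s * gaussian_relief(n, s, rng) for s in sigmas)

def render_lambertian(height, azimuth_deg, polar_deg):
    """Frontal view, uniform albedo, Lambertian shading; cast shadows NOT computed."""
    gy, gx = np.gradient(height)               # row index is treated as the y axis
    normal = np.dstack((-gx, -gy, np.ones_like(height)))
    normal /= np.linalg.norm(normal, axis=2, keepdims=True)
    az, pol = np.radians(azimuth_deg), np.radians(polar_deg)
    light = np.array([np.sin(pol) * np.cos(az),
                      np.sin(pol) * np.sin(az),
                      np.cos(pol)])            # unit vector toward the source
    return np.clip(normal @ light, 0.0, None)  # negative values are body shadow

surface = brownian_surface()
image = render_lambertian(surface, azimuth_deg=30.0, polar_deg=45.0)
```

Because array rows increase downward when an image is displayed, the azimuth in this sketch is counter-clockwise only if the image is shown with the row axis pointing up.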
Observers
Six observers, the authors and three naive observers, participated in the experiment. Authors S. P. and A. D. were naive with respect to the stimulus parameters. All six observers had normal or corrected-to-normal vision. The experiment was done in accordance with local ethical guidelines, Dutch Law, and with the Declaration of Helsinki.
Experimental Setup
The setup consisted of an Apple Macintosh G4 and a luminance-linearized 22″ LaCie Blue Electron monitor, running at 75 Hz and a resolution of 1600 × 1200 pixels. Participants were seated 83 cm from the screen with their heads fixed in a chinrest. Vision was binocular. The stimulus and probe (see next subsection) images each subtended a visual angle of 8.6° × 8.6°. The room was dark during the course of the experiment.
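As a small worked example, the physical size of the stimulus on the screen follows from the viewing distance and the visual angle given above. The helper name `subtended_size_cm` is hypothetical; only the 83 cm distance and the 8.6° angle come from the text.

```python
import math

def subtended_size_cm(visual_angle_deg, distance_cm):
    """Width of a frontoparallel patch that spans the given visual angle."""
    half_angle = math.radians(visual_angle_deg) / 2.0
    return 2.0 * distance_cm * math.tan(half_angle)

# Values from the setup description: 8.6 degrees at 83 cm
stimulus_width_cm = subtended_size_cm(8.6, 83.0)
print(f"{stimulus_width_cm:.2f} cm")  # roughly 12.5 cm per side
```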
Design and Procedure
We defined a “natural” interface in the form of a monochrome rendering of an illuminated hemispherical boss on a plane (see Figure 5). The boss and plane were rendered using (Lambertian) shading, with body and cast shadows, without reflexes. The observer could use the mouse to adjust the direction of the (simulated) source. The task was to make the illumination of the hemispherical boss appear the same as the illumination of the texture. This indeed proved to be an intuitive interface for all observers in our former and present studies. The median time for a judgment was less than 8 seconds.
Figure 5. The interface with a stimulus (left) and interaction panel (right). The observers could adjust the direction of the virtual light source that determined the rendering in the response panel. This rendering served to indicate both the elevation and the azimuth of the illumination.
Results
Figure 6 shows the settings (dots) of the azimuthal angles against the stimulus azimuths, per observer. The drawn lines represent the veridical values modulo 180°. Surprisingly, most settings lie close to the true values, whereas one would expect about half of them to be 180° off due to the convex-concave ambiguity. Figure 7 shows the polar histograms of the deviations of the azimuthal settings from the actual illumination azimuths, for all six observers. The deviations near 0° clearly outnumber those near 180°, confirming that most responses were clustered around the fiducial illumination orientation. Since this result contradicts naive expectations, we analyzed these data further. Figure 8 splits the data into three groups for three separate elevation ranges: polar angles of 0° to 30° (in which range shading dominates) in the left plot, 30° to 60° (in which range neither shading nor shadowing dominates) in the middle plot, and 60° to 90° (in which range shadowing dominates) in the right plot. It is clear from this figure that in the shading regime, following our expectations, about half of the data are 180° off. However, in the intermediate and shadowing regimes, almost all settings are close to the veridical illumination orientations. We calculated the ratios of the numbers of data points in the second plus third quadrants (about 180° off with respect to the stimulus values) to those in the first plus fourth quadrants (close to the stimulus values). We found that the ratios in the shading, intermediate, and shadowing regimes were 185/217 = 0.85, 156/594 = 0.26, and 41/307 = 0.13, respectively. Thus, the presence of shadows in the image seems to resolve the convex-concave ambiguity.
Figure 6. Scatter plots of the azimuth settings (vertical axes) against the ground truth (horizontal axes) for each observer. The drawn lines depict the lines of expectation (ground truth modulo 180°).

Figure 7. Polar histograms of the azimuthal errors per observer (thus each plot totals 250 trials). Notice that the veridical direction is toward the right.

Figure 8. Polar histograms of the azimuthal errors committed by all observers, split into three regimes: polar angles smaller than 30° (the shading regime, left plot, based on 402 settings), between 30° and 60° (both shading and shadowing happen, middle plot, based on 750 settings), and larger than 60° (shadowing dominates, right plot, based on 348 settings).
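The quadrant ratios reported above can be checked against the number of settings per regime. The sketch below reads each numerator as the count of settings about 180° off and each denominator as the count near the veridical azimuth; this reading is an interpretation based on the regime totals (402, 750, and 348 settings) and on the description of the histograms, not something stated explicitly with the numbers.

```python
# (flipped, near): settings about 180 degrees off vs. near the veridical azimuth
counts = {
    "shading": (185, 217),
    "intermediate": (156, 594),
    "shadowing": (41, 307),
}

summary = {}
for regime, (flipped, near) in counts.items():
    total = flipped + near
    summary[regime] = {
        "total": total,                                # settings in this regime
        "ratio": round(flipped / near, 2),             # ratio as reported in the text
        "flipped_fraction": round(flipped / total, 2)  # share of all settings that are flipped
    }
    print(regime, summary[regime])

# 6 observers x 250 stimuli = 1500 settings in total
grand_total = sum(entry["total"] for entry in summary.values())
```

Under this reading, about 46% of the settings are flipped in the shading regime, against about 21% and 12% in the intermediate and shadowing regimes.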


Figure 9 shows the settings (dots) of the polar angles against the stimulus values, per observer. Theoretically, the elevation cannot be estimated due to the bas-relief ambiguity (Belhumeur et al., 1997), so here we expected no clear relation of the settings with the veridical values. Because the data seem to show some correlation with the stimulus values, we did a regression on the data. The lines represent linear fits to the veridical polar angle θ, for which we found: 37° + 0.28 θ, 24° + 0.51 θ, 44° + 0.30 θ, 44° + 0.32 θ, 32° + 0.40 θ, and 17° + 0.48 θ.

Figure 9. Scatter plots of the polar angle settings (vertical axis) against the ground truth (horizontal axis) for each observer. We fitted the data linearly (drawn lines).
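To make the reported linear fits concrete, the following sketch evaluates them at a few polar angles and computes where each fitted line crosses the veridical line (setting = θ). The assumption that the six fits correspond one-to-one to the six observers, and the names `fits`, `predicted_settings`, and `crossings`, are illustrative only.

```python
# (intercept in degrees, slope) for the six reported linear fits
fits = [(37, 0.28), (24, 0.51), (44, 0.30), (44, 0.32), (32, 0.40), (17, 0.48)]

def predicted_settings(theta_deg):
    """Fitted polar angle setting of each fit for a veridical polar angle."""
    return [a + b * theta_deg for a, b in fits]

# Polar angle at which each fitted line crosses the veridical line: a + b*t = t
crossings = [a / (1.0 - b) for a, b in fits]

for theta in (0, 45, 90):
    print(theta, [round(s, 1) for s in predicted_settings(theta)])
print([round(c, 1) for c in crossings])
```

All slopes lie between 0 and 1, so each fitted line overestimates the polar angle for near-frontal illumination and underestimates it for grazing illumination.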
To understand this slight but significant correlation, we looked at the correlation between the data and a few possible effective cues that the observers might have used. Since the stimulus set is homogeneous in terms of statistics, observers might have used cues such as the average gray level, the contrast, or the shadowed fraction of the total area. Figure 10 shows the average polar angle settings (horizontal axis) of the observers against the shadow fraction, median intensity, and Michelson contrast (from 5% to 95% percentiles instead of the absolute minimum and maximum). It is clear that the settings correlate nicely with the shadow fraction. The median intensity also acts as a cue in the shading regime, not just in the shadow regime. For the contrast, we find a distinct picture. The contrast

Figure 10. Scatterplots of the average polar angle settings of all observers and the percentage of shadow-filled area (left), the average intensity (center), and the root mean square contrast (right). Notice that shadowing sets in at a polar angle of about 30°, and that the settings correlate nicely with the shadow fraction. The median intensity also acts as a cue in the shading regime, not just in the shadow regime.
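The three candidate cues named in the text can be computed from an intensity image along the following lines. This is a sketch under stated assumptions: shadowed pixels are taken to be those at or below a small threshold (`shadow_threshold` is a hypothetical parameter), and the contrast follows the percentile-based Michelson definition given in the running text rather than the root mean square contrast named in the figure caption.

```python
import numpy as np

def image_cues(image, shadow_threshold=1e-6):
    """Return (shadow fraction, median intensity, percentile-based Michelson contrast)."""
    image = np.asarray(image, dtype=float)
    # ASSUMPTION: a pixel counts as shadowed when its intensity is (near) zero
    shadow_fraction = float(np.mean(image <= shadow_threshold))
    median_intensity = float(np.median(image))
    low, high = np.percentile(image, [5, 95])  # robust stand-ins for min and max
    if high + low > 0:
        michelson = float((high - low) / (high + low))
    else:
        michelson = 0.0                        # fully dark image: contrast undefined
    return shadow_fraction, median_intensity, michelson

example = np.array([[0.0, 0.0], [0.5, 1.0]])
cues = image_cues(example)
```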
Correlation Coefficients for Comparison Between the Illumination Orientation Estimates (E) and the True Azimuths (T), Between the Illumination Orientation Estimates and the Observers’ Azimuthal Settings (E–O), and Between the Observers’ Azimuthal Settings and the True Azimuths (O–T). The illumination orientation estimates were computed at three different scales (second column). These correlations were computed separately for the shading, intermediate, and shadowing regimes (Columns 3–5).
These numbers confirm that observers could estimate the illumination orientations rather well (O–T correlations are quite high). The correlations of the observers’ settings and of the illumination orientation estimates with the true azimuths were consistently higher for the intermediate regime than for the shading or shadowing regimes. Moreover, the correlations for the intermediate regime were most robust under variation of the differentiating scale. The correlations for the shading regime show the largest decrease with increasing scale. The correlations of the estimated illumination orientations at the largest scale (E–O and E–T) were clearly lower than those of the observers (O–T). Summarizing, we find that the second-order statistics correlated well with the observers’ settings, especially at lower scales, and especially in the intermediate regime.
Conclusions and Discussion
The main conclusion from this study is that a deviation from the theoretical assumption of Gaussian geometry does not affect the estimates of human observers systematically. Human observers can estimate the illumination orientation for our more realistic surface profiles containing roughness at a range of scales. Moreover, the presence of a range of scales, instead of a single scale of the undulations, prevented complaints by the observers. In our previous work on random Gaussian surfaces, we found that observers “did not ‘like’ the samples because they appear somewhat ambiguous” (Koenderink et al., 2004). The stimuli in the current study are probably more pleasant to view because they do look sharp and they do offer a “hold” to the eye (as distinct from the Gaussian surfaces). The results from the current study confirm earlier results using texture images from the CURET database (Curet, 1997). Moreover, as in the case of our study using rendered random Gaussian surfaces (Koenderink et al., 2004), we found that observers were quite capable of estimating the elevation and the azimuthal direction (instead of only the orientation).
The observers’ sensitivity to light source elevation cannot be interpreted as an absolute sensitivity to the height of the light source. Such sensitivity is impossible in view of the
The azimuthal settings in the intermediate and shadowing regimes did not show a 180° modulus. Thus, observers were able to estimate the illumination direction, not just the orientation, if cast shadows were present. Probably the difference between cast and body shadows was used as a cue to the illumination direction, resolving the convexity-concavity-illumination-direction ambiguity, see Figure 11. Cast shadows have a sharp boundary, while body shadow boundaries are generally more gradual. Noncollimated lighting might thus influence how well the convexity-concavity-illumination-direction ambiguity can be resolved because the difference in sharpness of the cast and body shadows might become less clear. The transitions of light-to-dark and dark-to-light in the direction of the tangential component of the light vector are cast shadow edges and body shadow edges. The asymmetric shapes of the shadow and light patches might be another cue for this resolution. In the case of our more natural Brownian surfaces, these differences between cast and body shadows can be much less salient due to the nonlinear combination of such image effects on a range of scales. Figure 12 shows a visualization of how small-scale shadows may

Figure 11. Surface profiles with arrows representing illumination directions (the first row for each set). The second row for each set shows the images which correspond with the illumination directions represented above them. The first set shows images of a bump and the second set of a trough in the shading (left), intermediate (middle), and shadowing (right) regimes.

Figure 12. Surface profiles with arrows representing illumination directions (the first row for each set). The second row for each set shows the images which correspond with the illumination directions represented above them. The first set shows images of a bumpy bump and the second set of a bumpy trough in the shading (left), intermediate (middle), and shadowing (right) regimes.

In all regimes, we found that the observers’ estimates were accurate in terms of orientation, which suggests that they used shading as well as shadowing cues. The second-order statistics correlated well with the observers’ settings for stimuli in all three regimes, especially at lower scales. The decrease of the correlations for increasing scale suggests that shading and shadowing cues at smaller scales are needed to arrive at the observed accuracy of the observers’ settings and may be combined with large-scale cues—especially in the intermediate and shadowing regimes. It would be interesting to study this point in more detail in combination with eye tracking, to see whether observers look at specific locations in the image. A mechanism combining shading and shadowing cues at a range of scales is of course very convenient with regard to light field estimates in natural scenes.
Our results show that it is very plausible that ensembles of illuminance flow estimates are an important cue to the light field in natural scenes. Figure 13 shows the computational gradient-based illuminance flow estimates (for the algorithm, see Koenderink et al., 2003) for three photographs: a flat piece of plaster, a mountain area seen from above, and a mountain seen from the ground. The flow estimates are represented by ellipses, with the orientation of the major axis representing the estimated irradiation direction and the eccentricity representing the confidence. On the basis of our findings, we hypothesize that weighted combinations of such flow estimate ensembles at multiple scales are probably an important cue for the visual light field (Koenderink et al., 2007). Recent findings show that the visual light field is simplified in comparison to the physical light field and that observers are sensitive to converging, diverging, and uniform fields (Kartashova, Sekulovski, de Ridder, te Pas, & Pont, 2016; van Doorn, Koenderink, Todd, & Wagemans, 2012), which suggests that such relatively simple topologies might represent templates for (often more complicated) natural light fields. Also, it was shown that scene layout and object properties can influence illumination estimates (Schutt, Baier, & Fleming, 2016; Xia, Pont, & Heynderickx, 2016), which is to be expected if the visual light field is inferred from shading and shadowing patterns. In future studies, we will further study the extrapolation from textures to (perception of) illuminance flow over 3D objects and natural scenes, which is far from trivial due to, for instance, foreshortening and local occlusion effects.
Figure 13. Surface illuminance flow estimates for a flat plaster surface, a mountainous area viewed from above, and a mountain viewed from the side. The major axes of the ellipses represent the estimated illuminance flow orientations. The eccentricity represents the confidence level.
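A gradient-based orientation estimate of the kind referred to above can be sketched with the averaged squared-gradient (structure) tensor of the image. This is a generic illustration, not the algorithm of Koenderink et al. (2003): the function name `illumination_orientation`, the global averaging over the whole image, and the use of tensor coherence as a stand-in for the confidence shown by the ellipse eccentricity are all assumptions.

```python
import numpy as np

def illumination_orientation(image):
    """Estimate the dominant gradient orientation (modulo 180 degrees) of an image.

    Returns (orientation in degrees within [0, 180), coherence within [0, 1]).
    """
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # row index treated as y
    # Second-order statistics: averaged products of the gradient components
    jxx, jyy, jxy = np.mean(gx * gx), np.mean(gy * gy), np.mean(gx * gy)
    # Orientation of the eigenvector with the largest eigenvalue
    angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    orientation_deg = float(np.degrees(angle) % 180.0)
    trace = jxx + jyy
    # ASSUMPTION: coherence (0 = isotropic, 1 = purely one-dimensional) as confidence
    coherence = float(np.hypot(jxx - jyy, 2.0 * jxy) / trace) if trace > 0 else 0.0
    return orientation_deg, coherence

# A pattern that varies along x only should yield orientation 0 and coherence 1
stripes = np.tile(np.sin(np.linspace(0.0, 6.0, 32)), (32, 1))
print(illumination_orientation(stripes))
```

The estimate is an orientation rather than a direction, which matches the 180° ambiguity of second-order statistics discussed in the Introduction. Applying the function to local windows at several scales would give an ensemble of estimates comparable in spirit to the flow fields in Figure 13.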
Finally, these observations are of course related to material and shape perception, and not just light. Since we simultaneously infer higher dimensional material, shape, and illumination properties from two-dimensional images, it is to be expected that such inferences interact. Many studies have shown that material, shape, and illumination perception are indeed confounded (Gerhard & Maloney, 2010; Ho et al., 2008; Kim et al., 2014; Motoyoshi et al., 2007; Pont & te Pas, 2006; te Pas & Pont, 2005; Wijntjes, Doerschner, Kucukoglu, & Pont, 2012). Looking at our stimuli in the current study (see, for example, Figure 2), we saw that many of them did not look as if they were made of matte material. Many of them tend to look quite glossy or shiny. We studied this illusory gloss in depth in another article (Wijntjes & Pont, 2010), in which we tested gloss perception for Brownian surfaces as a function of the depth range and illumination direction. We found that an interpretation in the context of the bas-relief ambiguity (Belhumeur et al., 1997) could explain our gloss perception data; on average perceived gloss increased with increasing relief and decreased with decreasing source elevation.
Interreflections were ignored in that experiment, as well as in the current experiment. We expect that the addition of interreflections (or an
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by Delft University of Technology.
Author Biographies
