Abstract
Moon Mode, an algorithmic program pre-installed on Huawei’s flagship smartphone P30 Pro, intelligently detects and enhances images of the moon captured by the phone. A heated social media discussion was triggered after a Chinese tech critic interpreted Moon Mode as photoshopping/superimposing details onto the original shot. The controversy centered on the line between AI enhancement and superimposed alteration when black-boxed algorithms stand between the user/viewer and the world viewed. The controversy is analyzed, together with Huawei’s marketing materials. Drawing on MacKenzie and Munster’s idea of distributed invisuality, AI-enabled photography is examined as a multiplicative data-processing event that traverses hardware and software, eliding any singular, meta-observational position. The author argues that algorithmic photography can be understood as a dynamic event of algorithmic processuality, indicating a new form of human-nonhuman entanglement in meaning-making practices, which cannot be discussed under the rubric of indexical representation.
Keywords
Introduction
The Huawei P30 Pro smartphone, unveiled in March 2019, surpasses its counterparts with its periscope zoom lens, SuperSpectrum sensor, and—perhaps most importantly—its self-developed artificial intelligence (AI) system, which is designed to automatically recognize scenes and optimize camera settings accordingly (Li, 2019b). Moon photography is notoriously challenging to manage on a smartphone—results most often resemble featureless, celestial ping-pong balls. Moon Mode, part of what Huawei calls its Master AI,1 is a pre-installed algorithmic program that intelligently detects and enhances images of the moon captured by the phone. Its introduction sparked a heated debate.
This is where the controversy began: a posting by Chinese tech critic Wang (2019) on Weibo, the Chinese equivalent of Twitter, made quite a splash. In his post, Wang put forward a startling argument: Huawei’s Moon Mode, he claimed, actually photoshops moon images. He contended that, based on his self-conducted experiments, the system ‘paints in pre-existing imagery’ onto the photographs taken, reconstructing details that are not captured in the original shots. Huawei immediately refuted these claims, stressing that the Moon Mode system ‘operates on the same principle as other Master AI modes that recognize and optimize details within an image to help individuals take better photos. . .. The shot can still be taken without AI mode because of the periscope lens’ (Huawei, 2019, cited in Brown, 2019). Huawei’s (2019, cited in Brown, 2019) rebuttal emphasized that Moon Mode ‘does not in any way replace the original image’; it would not be able to do so, considering the unrealistic storage requirements necessary to fulfill such a function for its over 1300 recognizable scenarios.
This article examines the Huawei Moon Mode controversy discourse and theorizes how algorithmic intermediation recontextualizes photographic practices. I will conduct a close reading of how Huawei’s marketing materials defined algorithmic photography and how the Chinese public negotiated what constitutes AI enhancement as opposed to superimposed alteration in the context of Huawei’s P30 AI-enabled camera. Huawei’s P30 user guide and relevant promotional materials are analyzed, followed by an in-depth examination of the heated discussions about Huawei’s Moon Mode that took place on social media platforms Weibo and Zhihu. Rather than recapitulating concerns about the apparent loss of photography’s evidentiary value owing to its translation into ‘digital code’ (Manovich, 2001), I explore how the Chinese public tried to understand the operative relations of human vision and algorithms (or computer vision) in AI camera practices.
This article aims to contemplate the algorithmic turn in photography and the ways in which the Huawei Moon Mode controversy foregrounds the entanglement of human and machinic agencies in AI-enabled photographic practices. Previous conceptualizations of the photographic image as an indexical record of a past moment appear increasingly inadequate when confronted with photography’s diffusion into computation. The algorithmic image cannot be fully understood through the dissolution of indexicality, since its final appearance does not derive from the direct agency of light but instead is the result of algorithmic processes (Røssaak, 2011: 193). We now witness the reorientation of photography, from its historical status as a visual rhetoric whose meaning has to be anchored via an esthetic/interpretative practice, to a condition wherein the image increasingly operates as ‘a form of data regulated by statistical/algorithmic processes’ (McQuire, 2015: 125; emphasis in original). The taken-for-granted idea of photography as a practice dictated by solitary operators responsible for pressing the shutter release is clearly shattered (Palmer, 2013). Machine learning-based algorithms neither uniformly act upon images, nor simply single out pixels based on chrominance or luminance levels. Instead, they are programed to intelligently identify what those clusters of pixels signify, enabling detected regions (such as the moon) to be accentuated or processed differently in comparison to surrounding areas. The extent to which algorithms automate the photographic practice raises the question of the redistribution of agencies within the camera-operator assemblage: To what extent is the user still the creative agent responsible for creating a photograph when at the same time AI is undertaking pixel-by-pixel modification of the resulting image? Does this indicate a change of kind or merely a change of degree?
MacKenzie and Munster’s (2019) idea of distributed invisuality describes the operationalization of contemporary visual culture as a highly distributed architectural form of data practice. Drawing on this idea, I argue that algorithmic photography needs to be seen as a dynamic, multiplicative event of algorithmic processuality, indicating a new form of human-nonhuman entanglement in meaning-making practices, which cannot be unpacked under the rubric of indexical representation.
Photography as data assemblage
At the outset, as with other innovative technologies, smartphones tended to be experienced as disruptive. Users gradually became accustomed to them as tools for daily practice on a personal level. Existing scholarship on smartphone camera practices tends to focus on emplacement and locative media (Frith, 2015; Wilken and Goggin, 2014), self-presentation (Ito et al., 2006; Villi, 2010), and the social affordances and constraints of emerging forms of interactive visual communication (Chester, 2012; Hjorth and Pink, 2014; Peters and Allan, 2018; van Dijck, 2008). One problem recurs in this previous research: it is largely developed out of conceptions of images as representational and of humans as the primary perceivers within networked media flows. But the camera is no longer merely an optical device to inscribe light; rather, it is a sensing technology capable of carrying out multilayered forms of data processing, distribution, and integration (Hjorth and Hendry, 2015; Yoshida, 2019). An AI camera observes. It also partakes in the creation of new content, together with its human counterparts.
The development of AI has arrived at an intriguing point: while it has not yet reached the hypothetical point of technological singularity, where an intelligent agent enters an irreversible ‘self-improving circle’ (Vinge, 1993), it has indisputably encroached on previously human-dominated realms. On their face, AI-generated images have become increasingly indistinguishable from their human-authored counterparts. Ritchin (2009) has proposed that online databases are transforming photography into a ‘hypertextual medium’ that reworks the authorial image through an ongoing conversation among a ‘multiplicity of perspectives’ (p. 180). Similarly, Uricchio’s (2011) study of Microsoft’s Photosynth and image-recognition-based augmented-reality programs shows that these applications offer a robust alternative to the visual economies of the past. Our once ‘transcendental’ vision of viewing has been fundamentally replaced by an ‘algorithmic regime’ that structures the parameters of our engagement with image assemblages (p. 31). AI assists the human selection process by labeling and appropriating images from massive collections. The signification of (networked) images is not only sustained by their visual content but also includes information such as time labels and geographic tags.
Existing studies have examined how algorithmic media facilitate a collaborative process of meaning making after a photograph has been uploaded online (Arriagada and Ibáñez, 2020; Uricchio, 2011). However, how the public (not to mention a Chinese public) perceives and negotiates the technical recontextualization of the human-machine relation in the production of vernacular images (in ordinary photographs such as snapshots, typically taken by smartphones) is rarely discussed. Through a case study of Google’s Pixel 3 smartphone, Taffel (2021) investigated the ways in which computational photography alters the representational and social functions of photographic imaging. Aside from this, little work has yet been carried out on the theoretical reflection of the entanglement of human and machinic agencies in algorithmic photographic practices.
This article aims to address this scholarly gap, taking the Huawei Moon Mode controversy as a point of entry to rethink the human subject’s position in photography’s almost complete algorithmic conditionality. Heidegger’s (1977 [1938]) well-known essay, ‘The Age of the World Picture’, describes how a key characteristic of the modern age was inaugurated by two entwined events: the human becoming a subject, and the world being rendered as a picture. Representation, as per Heidegger (1977 [1938]), does not merely mean viewing a picture, but rather operates as a diagram that defines the contouring of the very possibilities through which the human subject is capable of objectifying the world (p. 127). If we follow Heidegger’s (1977 [1938]: 133) point about the Welt-bild—where the world as picture, ‘when understood essentially, does not mean a picture of the world but the world conceived and grasped as picture’—programs such as Moon Mode profoundly resituate our position as subjects vis-à-vis the world in a new AI age.
The algorithmic turn marks a departure from conventional conceptualizations of photography that frame photography’s defining character as its technical capacity to mechanically capture moments of time at the instant of shutter release (Palmer, 2015: 145; Taffel, 2021: 243). The epistemological connection between photography’s automatism and indexicality is elucidated succinctly in Bazin’s (1960) essay, ‘The Ontology of the Photographic Image’: ‘This production by automatic means has radically affected our psychology of the image. The objective nature of photography confers on it a quality of credibility absent from all other picture-making’ (pp. 7–8).
Likewise, Barthes (1981 [1980]: 76) described the medium as a ‘photographic referent’, which comes into being through its rhetorical life; it articulates an essential relation to an object/scene for its having-been-there. The algorithmic process does not simply undermine the oft-imagined indexicality of the photograph by decomposing the causality between representation and the scene represented.2 More importantly, with an AI-enabled camera-device, when the image becomes a kind of program, a process expressed in and through algorithmic software, photography is reinstituted into an open-ended (and largely unknown) processuality that operates on the raw data collected by the light-sensitive sensor in a camera-computer. In this vein, a digital camera can no longer be considered a passive recording device. It does not simply capture pictures; rather, it makes them (Hayes, 2008: 94). On this note, Berry (2011: 11) called algorithmic photography ‘instrumentalist’ and ‘computationalist’. The algorithmic image is now mathematically programed to produce visual effects that would otherwise have required a physically larger camera sensor or a professional photographer (Taffel, 2021). Considered computationally, the image never reaches a state of finitude. Rather, it operates in a constant state of deferral of data processing through which it is no longer clear where the image is even located and when it will be finalized (Rubinstein and Sluis, 2013: 27). Because the algorithmic image is now extracted out of an endless stream of data and then re-inserted into sequences of data processing, it becomes continuous and processual, and can no longer be discussed under the rubrics of indexical representation.
In the following sections, by tracing how the Chinese public negotiated the difficult reconciliation of machinic vision and human intention in smartphone photographic practices, I will suggest that the very notion of the photograph as an artifact that can be contaminated by algorithmic manipulation rests on a conceptual mistake: an erroneous understanding of the operativity of generative algorithms and of how machinic visuality is carried out.
Moon mode: algorithmic enhancement or photoshopped alteration?
Palmer (2012) has reminded us that the history of photography is also ‘a history of automation’ (p. 37; emphasis in original). With the release of devices such as the Olympus OM-D camera, the automatism of the camera as a prosthetic apparatus and the availability of computational filters have enabled an experiential form of image capture, whereby amateur photographers can immediately take part in the creation of the world in visual terms. As Palmer suggests, these experiential ways of framing photographic practices as a revitalization of individualist, world-making acts were constructed from a marketing point of view. They were part of an ongoing endeavor to elevate photography as a leisure activity that made investing in a dedicated consumer device worthwhile in order to pursue what Palmer (2012: 3) calls ‘photographic individualism’—a kind of ‘“I was there” “procession-based” ideology’ of claiming one’s visual agency, which stands in contrast to Sontag’s (1977) critique of the camera as ‘an act of non-intervention’, a ‘detached’ way of seeing the world.
The promotional rhetoric of the Huawei P30 Pro smartphone followed a similar strategy. By delineating the new Leica Quad camera system as a ‘smart’ extension of the user’s perceptions, Huawei’s (2019) advertisements highlighted how the model ‘rewrites the rules of photography’. More importantly, the material illustrates that the combination of the periscope telephoto lens, Huawei’s SuperSpectrum Sensor, and the AI-based Moon Mode can bring the splendor of the moon before the user’s eyes, symbolizing ‘a new peak of smartphone photography’. Theoretically, astrophotography is often carried out under the condition of exposure stability, meaning that the camera needs to be mounted on a tripod to allow maximized exposure (Schröder and Lüthen, 2009). But the combination of Huawei’s hardware and image-processing programs overcomes this limitation. Specifically, Huawei’s promotional rhetoric claims that the new model can: . . . zoom in to explore the mystery of the celestial at night . . . [and] capture the best things in the moment and create your vision for the future. . .. With up to 50× zoom, the Huawei P30 can capture the moon for you. Fall into the romantic moonlight and be amazed by the clarity. . .. Even in extreme darkness, you are able to discover a vast array of colors.
Indeed, Huawei’s marketing material frames smartphone photography as a practice that partakes in the discovery of the world. It is open-ended and at the same time potentially tailor-made for individual expression. Importantly, not only is the meaning of photographic activity reconfigured as a participatory experience, but the camera apparatus itself is also promoted as an enabling yet ‘transparent broker of the real world’.
However, such transparency has been called into question. On April 13, 2019, FView tech critic Wang (2019) claimed that the moon images were in fact photoshopped by algorithms, despite Huawei’s claim that the phone camera’s 50× zoom capabilities captured all the astonishing details. ‘To photoshop’ an image (most often abbreviated as to ‘P’ an image, or P图, in Chinese) is a Chinese colloquialism meaning to make such substantial modifications to an image after its creation that the modified image becomes less real. Yuekun Wang is a content creator at FView, an evaluation agency specializing in reviewing cutting-edge technological products. Peng (2019), the founder of FView, subsequently posted a 25-minute video on Weibo confirming Wang’s statements, in which he said ‘it is impossible for these details, added through the phone’s algorithms, to be captured under the model’s original image-taking capacities’. Peng called on Huawei to rectify its promotional material by clearly stating that the system was adding visual effects to the original shots. Huawei did not respond to Peng’s comments directly.
Wang’s claim against the model was predicated upon a series of self-conducted experiments. In one of his experiments, Wang superimposed FView’s transparent logo on a moon photograph captured by the phone. It turned out that even though the superimposed distraction did not in any way resemble the texture of the moon surface, the system would automatically rectify it, making it look something like a lunar crater. To further corroborate his statement, Wang uploaded a one-minute video on Weibo of himself taking pictures of the moon with a Huawei P30 Pro without a tripod. In the video, the phone camera grossly overexposed the moon at first. The captured moon appeared to be a blurry light bulb. But immediately after the algorithm detected the object, the edge of this ‘moon light bulb’ dramatically sharpened. The system even created thinner, less noticeable halos around the edges of the moon. Wang therefore concluded that Moon Mode automatically recognized the moon, applied autofocus, and clarified the outlines of the moon under different light conditions. That is, the system algorithmically superimposed an archetypal moon on the existing photographic image and yielded a result emulating the optical qualities of a painting or, in Wang’s words—photoshopping. The AI camera was thus no longer a transparent broker to master, but a computational apparatus that sensed similitude and constructed cookie-cutter products.3
What constitutes algorithmic enhancement as opposed to photoshopped alteration became the focal point of the public discourse. The top, most frequently asked question on Zhihu, a Chinese question-and-answer knowledge-sharing platform, was: What is your opinion on Yuekun Wang’s claim that the moon image taken by the Huawei P30 Pro is being photoshopped? Another tech columnist, Xiaocheng (2019), responded. His answer was the best received and became the spotlight article of Zhihu’s daily newsletter. Xiaocheng partially corroborated Wang’s claim that algorithms did indeed accentuate the visual details recognized as moon-like, but contended that the structural groundings of Moon Mode were much more complex than mere all-encompassing, prototype-superimposition techniques. According to Xiaocheng, the defining parameter distinguishing what had been photoshopped lay in whether external information had been introduced to the existing image, which Xiaocheng referred to as the most ‘commonsensical line of demarcation shared by the general public’. With respect to what constituted an ‘external element’, Xiaocheng further explained: If the enhancement is entirely built on existing information, we cannot call the image a ‘photoshopped’ piece. By ‘photoshopping’, it means you bring in information that is absent from the existing photographic image. If you optimize the color intensity for sky, water, and flower, this does not constitute ‘photoshop’. But if AI ‘thinks’ the sky should be cloudy and ‘paints in’ cloud for you, this constitutes an example of ‘photoshopped’ image.
To figure out whether Moon Mode introduced any external information into the photographic images, Xiaocheng prepared a modified moon image to test how the algorithms worked. He reversed the direction of the Aristarchus crater, attached two heart-shaped icons (‘noise’) onto the moon surface, and removed the traces of a crater chain that lay between the Mare Tranquillitatis and the Mare Serenitatis. He then applied a Gaussian blur to the image. Once he triggered the Moon Mode, the image was instantly repaired back to its untouched condition: not only was the contrast of brightness and shadow sharpened, but the heart-shaped ‘noise’ was eradicated, the direction of the Aristarchus crater was corrected, and the removed crater chain reappeared on the moon surface. For Xiaocheng, this resulting image compellingly demonstrated that Moon Mode was capable of ‘recovering deleted details, correcting reversed directions, and removing unwanted distractions’. As a prerequisite for such algorithmic manipulation, the system must know how the moon should look in the first place. In other words, algorithms have to be trained to semantically segment, recognize, and augment the moon as a pixelated data repository. Xiaocheng considered this to be the most technically challenging part of an AI program. He concluded that Moon Mode can be seen as a kind of photoshopping software and Huawei’s advertisement should have informed consumers more appropriately about the fact that it could algorithmically generate details that were not captured in the original take. To conclude his post, Xiaocheng encouraged his readers to contemplate the fact that something larger was at stake.
Algorithms (discriminately) ascribe value to different segments of the moon: some details are accentuated while some are displaced. After all, it is clear that the system does ‘add external information’ to the photograph taken by the camera phone. But whether such manipulation destroyed the image’s documentary feel, and whether the resulting image could still be considered as my work are complex questions, to which I cannot provide a solid answer.
Another way to recapitulate Xiaocheng’s questions would be: if AI cameras are programed to reproduce some kind of archetypical representations, does this technology still enable individual expression? Are human intention and hardware capability being rendered superfluous in the process? When photography transforms from a mode of visual rhetoric to numeric abstraction and pattern recognition, what does it even mean to take a photograph? For Chinese tech critics, what was at stake was no longer the illusion of the photograph as the factual manifestation of reality. Rather, it was the tension between human and algorithmic agencies in photography as a programmable apparatus for world making. If the moon photograph produced by this Huawei device exemplified what Flusser (2000) posited as a ‘technical image’—namely, a photograph that corresponded to certain functionalities of an apparatus contingent on automation—snapshots of the moon taken by the user would then belong to the realm of ‘redundant informative excess’ and carry no new information other than programed conventions. According to Flusser (2000: 26), the majority of photography is redundant, exhausting itself stylistically to reproduce clichés established by the apparatus. In that vein, Flusser (2011: 20) pessimistically held that ‘the photographer can only desire what the apparatus can do. . . [and] the intention of the photographer is a function of the apparatus’. Flusser’s critique of so-called creative photographs is not merely a critique of image, but a critique of vision and the world as seen through them.
Distributed seeing
Xiaocheng’s and Yuekun Wang’s arguments share the same assumption: there exists an ‘untouched’ image, created with the release of the electronic shutter, that awaits ‘corruption’ by algorithmic manipulation. The operative logics of most commercial software are proprietarily enclosed on the grounds of corporate intellectual property; hence, the public can only infer whether algorithms bring qualitative transformation to what a ‘raw photograph’ is ‘supposed to’ express. Their assumption resonates with Osborne’s (2010) idea of the constitutive disjunction between capture and production in photography. For Osborne, the ontological anxiety over the impending demise of the photograph’s privileged relation to ‘the real’, which arises at least partially from the rise of digital computers in the late twentieth century, is inherently misplaced. He contends that in order to understand the so-called ‘digital turn’ of photography, it is crucial to divide photography into two separate phases: the stage of image capture and the stage of post-capture manipulation. The photograph at the instant of shutter release still ‘retains both the causal and deictic aspects of photographic indexicality’ (Osborne, 2010: 63), whereas it is the subsequent processing phase that shatters the meaning of the photograph (Osborne, 2010: 64).
However, such a dualism of capture and manipulation was called into question in the debate. Among the top-rated answers on Zhihu, Xiaoxiong Lin’s post counters Xiaocheng’s view concerning the ways in which Moon Mode applies inappropriate alteration to the moon image by superimposing and reassembling details foreign to the original photograph. According to Lin, the so-called ‘original photograph’ does not even exist. To corroborate his point, Lin makes reference to a GAN-augmented image sourced from Twitter, which shows how generative algorithms reconstruct details of bark texture on a low-resolution image of a wood cabin. Image-specialized generative adversarial networks (GANs) are generative algorithmic architectures capable of filling in new, realistic details on low-resolution images. Generative algorithms work in contrast to discriminative algorithms, which are instructed to classify input data into correlations and assign classes/categories to given datasets (Nicholson, 2020). Generative algorithms such as GANs operate in a double feedback loop. On the one hand, one neural network, the generator, attempts to generate new data instances, hoping they will be deemed authentic even though they are purely constructed. On the other hand, another neural network, the discriminator, evaluates each instance for authenticity, predicting the relative realness of each constructed instance against actual, ground-truth datasets in analytical terms. The discriminator takes in both ground-truth and constructed instances and returns predictions of authenticity labels and their relative probabilities (Matcha, 2019; Nicholson, 2020). That is to say, unlike discriminative models such as taste recommendation algorithms, which learn and set up boundaries between classes, generative models like GANs model the distribution of individual classes themselves.
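The adversarial loop described above can be made concrete with a toy sketch. The code below is not Huawei’s system, Lin’s example, or any production GAN; it is a minimal, hand-rolled two-network game on one-dimensional data, in which a ‘generator’ (reduced here to a single learnable mean) tries to pass off constructed samples as draws from the real distribution, while a logistic ‘discriminator’ learns to tell them apart. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 'Real' data the generator must learn to imitate: samples from N(3, 1).
def sample_real(n):
    return rng.normal(3.0, 1.0, n)

mu = 0.0          # generator parameter: x_fake = mu + eps, eps ~ N(0, 1)
w, b = 0.0, 0.0   # discriminator: D(x) = sigmoid(w * x + b)
lr_d, lr_g, batch = 0.1, 0.02, 128

for step in range(3000):
    # Discriminator: push D(real) toward 1 and D(fake) toward 0
    # (several updates per generator step, a common stabilizing trick).
    for _ in range(5):
        real = sample_real(batch)
        fake = mu + rng.normal(size=batch)
        d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
        w -= lr_d * np.mean(-(1 - d_real) * real + d_fake * fake)
        b -= lr_d * np.mean(-(1 - d_real) + d_fake)
    # Generator: push D(fake) toward 1, i.e. fool the discriminator.
    fake = mu + rng.normal(size=batch)
    d_fake = sigmoid(w * fake + b)
    mu -= lr_g * np.mean(-(1 - d_fake) * w)

print(f"learned generator mean: {mu:.2f}")  # drifts from 0 toward the real mean of 3
```

Because the generator receives its learning signal only through the discriminator’s judgment, the ‘new information’ it produces is never copied from a stored original; it is induced from the statistics of the training distribution, which is precisely the point at issue in Lin’s comparison.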
Lin (2019) compared the operativity of Moon Mode to generative algorithms like GANs, attempting to show that we need to take AI-enabled photography as a set of dynamic mediations, rather than as finished objects. As Lin stressed, for programs like GANs, there is no ‘standard answer’ to work with because the augmented details are indeed traceless in their ‘original place’. If we follow Lin’s thread of thinking, it is unproductive to even draw the line between ‘enhancement’ and ‘alteration’ because algorithms have technically recontextualized the once discretely bounded, structured, and stable pictorial artifacts into a set of fluid, relativist probabilities of digital datasets. The intrusion of ‘new information’ becomes a new technical norm when the once disjunctive stages of capture and manipulation have merged into a set of dynamic, encompassing processes of mediation and generation that extends far beyond photography’s representational orientations.
Lin’s rebuttal to Xiaocheng’s answer did not cause much of a sensation because it lacked an accessible language to explain colloquially how GANs function and how they are comparable to Moon Mode. Nonetheless, Lin alludes to a productive way of thinking in and through so-called ‘machine vision’: a multiplicative matrix of mediative processes that is dynamic and whose operativity cannot be seen in any meta-observational display. According to Huawei’s (2019) patent document, the Moon Mode system operates primarily on the principles of pattern recognition and high dynamic range (HDR) methods. HDR imaging is a technique for accentuating the traces of shadow and light in an image by layering multiple captures of a scene taken at different exposures (Germen, 2013). The method is useful for photographing the moon because this real-world scene involves a very bright, shining moon, very faint nebulae, and the extreme shade of the midnight sky. Standard single-exposure techniques only allow for differentiation within a certain range of luminosity (Guthier et al., 2013). Outside that range, no ‘information’ is visible: the brighter areas appear pure white, while the darker areas are obscured in pure blackness. The moon image produced by the P30 Pro is a computational rendering derived from capturing and then combining several narrower-range exposures of the moon (Li, 2019a; Qingjia, 2019). To ensure that the pixel locations of the resulting image are perfectly aligned, Moon Mode algorithmically rearranges the radiance map and saturation levels. The hardware and the computational architecture of the smartphone work together to synchronize the camera sensor and the processing pipeline at the microsecond timescale (Wang and Zuidun, 2019).
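Huawei’s patent does not disclose its actual pipeline, so the following is only a schematic illustration of the general HDR principle just described: several exposures, each capturing a different slice of the scene’s luminosity range, are merged into a single radiance estimate. The scene values, exposure times, and hat-shaped weighting function are illustrative assumptions (the weighting follows the common practice of distrusting under- and overexposed pixels).

```python
import numpy as np

# True scene radiance spans a wide dynamic range: a bright 'moon' disc (100.0)
# against a faint sky (0.02), in arbitrary linear radiance units.
scene = np.full((8, 8), 0.02)
scene[2:5, 2:5] = 100.0

# A single exposure clips: pixel = clip(radiance * exposure_time, 0, 1).
def expose(radiance, t):
    return np.clip(radiance * t, 0.0, 1.0)

times = [0.001, 0.04, 1.6]          # short, medium, and long exposure times
shots = [expose(scene, t) for t in times]

# HDR merge: weight each pixel by a hat function that distrusts values near
# 0 (underexposure) and near 1 (clipping), then average the per-exposure
# radiance estimates pixel / t.
def merge(shots, times):
    num = np.zeros_like(shots[0])
    den = np.zeros_like(shots[0])
    for img, t in zip(shots, times):
        wgt = 1.0 - np.abs(2.0 * img - 1.0)   # peaks at mid-gray, 0 at the ends
        num += wgt * img / t
        den += wgt
    return num / np.maximum(den, 1e-12)

hdr = merge(shots, times)
```

Note how the bright ‘moon’ pixels are recovered solely from the shortest exposure (the longer ones clip to pure white), while the faint ‘sky’ leans on the longest exposure: no single shot contains the whole image.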
The Huawei Moon Mode controversy points to the difficulty of reconciling the ‘visions’ returned by the algorithms with the view of a human subject in smartphone photography. However, if we closely interrogate the operativity of image-processing algorithms and Huawei’s HDR method, the moon image in fact emerges out of a multiplicative matrix of data rendering across levels of hardware (i.e. camera sensor and CPU) and software platforms. The claim for an ‘autonomous’ machinic vision associated primarily with artificial intelligence is fundamentally delusive. MacKenzie and Munster’s (2019) idea of distributed invisuality, which describes the operationalization of contemporary platform-based visual culture, can be helpful here. Drawing on Bergson’s (1991) idea of image-matter as a ‘relational entity’, MacKenzie and Munster (2019: 1) investigate how transformations in the accumulation of image datasets as ensembles by platforms have a generative force for formatting the emergent sociotechnicality of platform cultural forms and ‘perception’. Platforms consist of levels, and an individual image has little significance for platforms. The meaning of an image derives from its relational positionality within data ensembles that flow across different levels of the platform (MacKenzie and Munster, 2019: 5). For example, AlphaGo’s accomplishment in the game of Go has little to do with abstract game intelligence. It was imagistically trained on an immeasurable archive of previous games of Go. AlphaGo’s deep convolutional neural networks learn to ‘see’ Go board positions as 19 × 19-pixel image snapshots (Silver et al., 2016). At its core, the (dis)placement of Go pieces is translated into quantifiable, local spatial correlations that can be associated with actions and the rewards of certain actions (MacKenzie and Munster, 2019; Mnih et al., 2015). That is to say, the way algorithms and platforms see fundamentally challenges the representationalist notion of ‘perception’.
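The imagistic ‘seeing’ attributed to AlphaGo can be illustrated with a small sketch. The encoding below is a simplified stand-in, since AlphaGo’s actual input stacked many more feature planes (liberties, move history, and so on), but the principle is the same: a board position is translated into image-like binary planes that a convolutional network consumes as local spatial correlations. The stone placements are hypothetical.

```python
import numpy as np

# A Go position as a 19x19 grid: 0 = empty, 1 = black stone, 2 = white stone.
board = np.zeros((19, 19), dtype=np.int8)
board[3, 3] = 1    # a hypothetical black stone
board[15, 15] = 2  # a hypothetical white stone

# Encode the position 'imagistically' as stacked binary feature planes, the
# snapshot-like input a convolutional network consumes.
def to_planes(board, to_play=1):
    own = (board == to_play).astype(np.float32)       # plane 0: own stones
    opp = (board == (3 - to_play)).astype(np.float32)  # plane 1: opponent stones
    empty = (board == 0).astype(np.float32)            # plane 2: empty points
    return np.stack([own, opp, empty])                 # shape (3, 19, 19)

planes = to_planes(board)
```

From the network’s perspective there are no ‘moves’ or ‘pieces’, only these pixel-like planes and the spatial correlations among them, which is what makes the translation of play into image data more than a metaphor.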
It is impossible to grasp the processes of image datasets from any holistic standpoint (MacKenzie and Munster, 2019: 15). In a similar vein, the ways in which Moon Mode ‘sees’ the moon—which Yuekun Wang and Xiaocheng attempt to decode—should be understood as distributed ‘observation’ events, unfolding throughout and across the user interface, hardware, human agents, and computational architectures, and eliding any singular coordinating position. The moon ‘photograph’ is an expression of data assemblage, something that cannot be discussed under the rubric of indexical representation. Thus, the answer to the Moon Mode debate lies not in an interpretation of image integrity in terms of how much the image has been altered, but in a reconfigured understanding of algorithmic photography as a kind of dynamic, multiscalar process of mediation, through which human and machinic agents are entangled, negotiating and claiming the visuality of the moon. The question we should be asking through this public discourse, then, is not ‘Does Huawei’s Moon Mode “alter” the photograph?’, but rather, ‘How does AI technically recontextualize photographic practice as a way to see and express the world?’
Not a conclusion: AI photography as an object of politics rather than progress
The Moon Mode discourse shows that the (tacit) rules underwriting appropriate algorithmic intermediation in photography are open to negotiation; governing norms are still consolidating. As the Chinese tech critics self-reflexively recognized in the public discourse, the thresholds that differentiate a photo from an image dataset are not yet clear. The moon image produced by the Huawei P30 Pro undergoes a series of mediations: the light reflected by the luminous celestial body is captured, converted to an electronic signal, and abstracted into a mathematical representation. It then transforms into the onscreen digital inscription that reaches our eyes.
Underlying this transformation of the camera from a light-inscription device to a fully fledged computational workstation is a shift of great significance. When we consent to allow algorithms to decide which features are or should be seen as moon-like, while discriminately weeding out atypical and extraneous details, the very notion of seeing and imaging the world has been technically recontextualized onto a new horizon. The visuality of generative AI unfolds as a multiplicative, mediative process of data processing: it distributes the activity of seeing transversely across camera lens, sensor, file, database, screen, and the human operator. In charting how the moon image is produced, we observe how image capturing and processing have become a site of sorting operations and predictive analysis. Algorithmic programs like Moon Mode substitute a mathematics of the moon for its visual stimuli, reducing its meaning to measurable, predictable criteria. But the system’s accuracy and performance still depend on the smartphone operator to choreograph the moon in a way that is recognizable for the algorithms. That is to say, the human operator remains an indispensable component in the distributed seeing events of AI.
In Nonhuman Photography, Zylinska (2017: 5) conceived of an expanded concept of the medium, one that extends beyond the limitations of representationalism and humanism to encompass images made neither by human actors, nor of human subjects, nor for human viewers. On one level this broadly resonates with what Paglen (2019) has recently termed ‘invisible visuality’, a new kind of machine-intelligent visuality constituted by a world of autonomous image-interpreting systems (e.g. QR codes, artificial imaging intelligence, satellite imaging, and more). This mode of visuality is, paradoxically, largely incommensurable to the human eye. To a certain extent, Zylinska’s redefinition of photography gives heed to nonhuman agents without counterposing human vision and a machinic one. Photography, as per Zylinska, expresses the possibility of understanding human-nonhuman entanglement as co-constitutive becoming. Moving away from the conventional wisdom on photography that purports to see the medium as an ‘agent of Death’ (Barthes, 1981 [1980]: 92), bearing testimony to human mortality, Zylinska (2017: 7) makes a case for the ‘ontological singularity’ of photography. Leaning on the Bergsonian/Deleuzian idea that ‘time, duration, and movement, stand precisely for life itself’, Zylinska (2017: 5) conceives of photographic practice as ‘a formative practice of life’ (emphasis in original). In this sense, the practice of photography is, then, ‘a form of cutting’ in the flow of time (Zylinska, 2017: 72). It is a way of ‘temporarily stabilizing matter into forms’ that articulates a non-anthropocentric creative impulse (Zylinska, 2017: 75).
If we really take cognizance of Zylinska’s idea that photography temporarily stabilizes the vital process of life, can the moon image be considered as an interruption that cuts into the multiscalar processes of data labeling and calculating that take place beyond the threshold of human perception? If so, what can the moon image tell us about what Zylinska calls the ‘human-nonhuman entanglement?’ By positioning AI-enabled smartphone photographs as a set of multiplicative processes of mediation rather than as finished objects, we can explore more explicitly the ways in which practices and affordances of smartphone devices spur each other on, and ponder how they become entwined in dynamics that constitute our particular ways of engaging the world. In other words, Huawei Moon Mode foregrounds the shift from technology-as-material-artifact to technology-as-infrastructural-interface, a new territory that extends beyond conventional image capturing to embrace instant pattern recognition—pixel manipulation through which human and enactive machinic agents interact in functional, sociocultural, and aesthetic terms (Magaudda and Piccioni, 2019).
The algorithmic turn in photography correlates with broader technocultural transformations regarding the increasing automation of communicative agents. The proliferation of AI-enabled technologies and people’s interaction with them—through automated chatbots, conversational virtual agents, and language-generation software—points to a new paradigm of human-machine communication (HMC) in which enactive machinic agents join forces with humans in sociocultural endeavors (Guzman, 2019; Natale and Cooke, 2021). Lomborg and Kapsch (2020) have shown that investigating the ways in which people think and feel about, and act on, algorithms can ignite concerns, ‘raising consciousness and public debate around algorithmic fairness and transparency, and by extension, the role and ethics of algorithms, data, and technology in broader sense in shaping our society’ (p. 748). In this vein, the Huawei Moon Mode controversy should be read neither as merely a criticism of a particular technology nor as an interrogation of whether Huawei has committed photographic chicanery. Instead, it needs to be interpreted as a (re)negotiation of a larger trend of human-nonhuman entanglement that data-oriented HMC brings into being. Algorithmic photography, or, more generally, HMC is about meaning-making (Guzman, 2018: 21). The Moon Mode controversy is an instantiation of an ongoing ontological and ethical-political negotiation that unfolds on multiple fronts: how the technology was received and mobilized by its users; how the autonomy of algorithmic operations was conceived in the public arena; and, perhaps more importantly, how the very nature of representation is shifting vis-à-vis the new forms of human-nonhuman entanglement expressed in HMC, a multiplicative, mediative event of computation whereby human beings still occupy an indispensable position.
Hence, what is important about the Moon Mode controversy is that it offers us a compelling heuristic point of entry to specify the questions at stake in the algorithmic turn of meaning-making practices. Our relationship to technology is dynamic, and the sociocultural practices that take shape in and through our interaction with computational devices are subject to ongoing negotiation (Lewis et al., 2019). The relational dynamics through which we make sense of and associate with algorithmic technologies, in turn, (re)define ourselves and our relationship with others and the world (Guzman and Lewis, 2020: 70; emphasis in original).
Kember (2014: 197) has called for treating algorithmic technologies, such as the face-recognition smart camera, as objects of politics rather than of progress. The algorithms of face-recognition technology reinforce and materialize a way of seeing/thinking that is fixed, essentialist, and ultimately discriminatory. The faces in face-recognition systems are sorted and recognized under an either/or logic—faces as female-male-Black-White-old-young—thereby both gendering and racializing the stereotypical appearance of terror. Huawei’s Moon Mode works in a parallel logic: it discriminately classifies pixel information into two categories, moon-like and non-moon-like. But the moon cannot be reduced to a fixed and finished object. Rather, it is a celestial phenomenon that is forever changing and is forever being perceived and recognized by different people with different faculties of sight and mind. In this light, what is at issue in the Huawei Moon Mode controversy is not whether it simply replaces a photograph with a virtual ‘sticker’ of the Moon, or even how to draw the line between AI enhancement and superimposed alteration. These questions about visual representation risk oversimplifying the real issues at stake. They miss an opportunity to engage with the ontological and ethical-political implications of the new forms of relationalities entailed in smartphone photography as a multiplicative process of human-nonhuman entanglement. As Deleuze (1994) proposed in Difference and Repetition, thinking does not comprise techniques and practices of problem solving. On the contrary, true thinking emerges in and through problem-posing—that is, proposing the right problems and thinking diagrammatically, without limiting oneself to predetermined, essentialist views (De Landa, 2000).
In light of this, we should instead ask: do existing AI-enabled image-processing programs function in the way that Deleuze and Guattari (1987) called an ‘abstract machine’, generative of energetic possibilities that are yet to come, instead of attempting to superimpose essentialist views on the genesis of forms? Can the moon image operate as a dynamic site of becoming that derives from the collaboration of human sight/mind and machinic visuality?
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
