Abstract
City digital twins (CDTs) are attracting considerable attention as a tool for the management of cities facing many kinds of crises. CDTs are three-dimensional, animated digital models of cities that display the big data generated by urban sensors in near-real-time. Most of the academic literature on CDTs consists of technical discussions of their design or implementation. This paper instead proposes a different context for understanding CDTs. It adopts an intersectional feminist and techno-cultural analysis of the visual form of power enacted by CDTs, as exemplary of a much wider turn in contemporary visual culture towards visualising the world digitally in three dimensions. Many critical accounts of this visual and volumetric regime are attentive to the forms of corporealised human life that emerge in relation to the picturing of various volumes. This paper examines how CDTs visibly co-constitute a number of digitally-mediated forms of human life, including their users and their human inhabitants. The paper argues that, while both the technological affordances and the cultural imaginary of CDTs perform the kind of powerful white masculinity that sees space as transparent and actionable, discussions of CDTs also persistently generate an excessive form of human life that is understood as untwinnable. The paper argues that masculinist anxieties around this excess erupt in another form of digitally-animated volumetric city facing disaster: the disaster movie. The paper thus makes an argument about the specific form of white masculinity co-constituted with CDTs and insists on the necessity of critical techno-cultural analyses of urban management technologies.
Introduction
City digital twins are a digital technique for urban management. City digital twins (CDTs) are ‘realistic digital representations of cities (including their assets, processes, and systems) that aid decision-making aimed at delivering city-level outcomes (urban planning, management, and associated services) and provide improved insights for decision-making’ (Nochta et al., 2021: 264). They render urban streets, buildings, vegetation, people and infrastructure as digital three-dimensional visual objects in geometric space, calculated using Cartesian coordinates, and they display real-time data feeds from urban sensors. The digital twin technique emerged in the manufacturing sector: three-dimensional, animated digital models of physical objects were designed to display real-time data about that object in order to monitor and manage its performance (Ketzler et al., 2020; Korenhof et al., 2023). By the mid-2010s the concept was also being applied to cities, and CDTs are currently being positioned as the next best practice in urban management by commercial vendors, city authorities, policy thinktanks and academic scholarship: as the World Economic Forum opines, ‘through the precise mapping, integration and interaction of real and virtual environments, digital twin technology presents the opportunity for safer, more efficient cities, more convenient and inclusive everyday services, and a reduced environmental footprint’ (2022: 4; see also Arup, 2019; Batty, 2018; Kitchin and Dawkins, 2024; Nochta et al., 2021). Like smart cities, CDTs aim to use big, real-time data to manage urban infrastructure and populations.
The design and deployment of CDTs displaying near-real-time urban data face considerable technical and other challenges. The availability and interoperability of data are major concerns in the literature, as is data privacy (Weil et al., 2023); moreover, many planners have neither the skills nor the resources to create and utilise a CDT (Kitchin et al., 2021). Nonetheless, several major software companies are marketing city digital twin software, university research centres are designing them, and so too is the market-leading vendor of Geographical Information Systems, Esri. While few cities have developed city-wide twins animated by the flows of near-real-time data, very many cities have small-scale twins, perhaps at a low level of visual detail, perhaps just for a neighbourhood, or for a particular purpose such as traffic management (Ferré-Bigorra et al., 2022) (Figure 1).

Figure 1. This graphic appears in White et al.'s (2021) paper discussing their work on a CDT for Dublin as an explanation of the different data ‘layers’ of a CDT and is often reproduced in CDT discussions.
Most of the academic literature on CDTs consists of technical discussions of their design or implementation. This paper places CDTs in a different context, however, in order to develop a different critique; it discusses them in relation to contemporary popular visual culture, in order to explore their specific visual qualities. CDTs require complex data integration and processing, and can also host simulation models. This paper, however, focusses on their power as digital animations. Animation is ‘the dominant medium of our time’ (Levitt, 2018: 1). As animated three-dimensional models of cities, CDTs are part of a much broader shift towards three-dimensional imagery:
efforts to re-create and connect physical objects and places inside the computer are incrementally superseding historically flat formats of maps and plans, across myriad domains: cartography (Google Earth, 3D GIS, LiDAR scans), industrial training and management (virtual factories), art and retail (3D collectibles), furniture specs (3D measurements), museums and real estate (360° virtual-reality tours), architecture, engineering, and construction (building information models), and ‘smart’ city planning (city information models) [i.e. city digital twins]. Even the planet has been made in model form for precise resource extraction (3D mining and geological simulations). (Ng, 2022: 298)
Screens offering animated views of (digitally-generated) three-dimensional images of the material world are ubiquitous. To this list could be added the continuing popularity of 3D and IMAX movies, 3D billboards, computer games, the post-production visual effects of all films and TV programmes, facial recognition devices, and the scanners of 3D image capture apps on smartphones: iPhones and iPads have had scanners and software to generate 3D volumetric models from their cameras since 2020, and Microsoft and Google have experimented with similar tools. So extensive is this three-dimensionality that Rocha and Snelting (2022b: 13) have described it as a ‘volumetric regime’ in which ‘proliferating technologies, infrastructures and techniques of 3D tracking, modelling and scanning are increasingly hard to escape’. As well as this volumetrisation, CDTs also enact the highly mobile point of view that sees three-dimensional images onscreen and which has been described as the ‘default value of digital vision’ (Elsaesser, 2013: 240).
CDT software and hardware are parts of a much wider turn to the animatic and volumetric in contemporary visual culture, and the paper will be attentive to the visual and spatial representations of cities enabled by the technological affordances of 3D computer graphics software. However, the paper also focusses on the hype about, the scholarship on and the advocacy of CDTs, which form a powerful technocultural imaginary through which the CDT is also defined. Indeed, the extensive discussion and promotion of CDTs overshadow the fact that there are very few fully-operational CDTs: recent reports by leading scholars actually emphasise their impossibility (Batty, 2024; Caldarelli et al., 2023). This paper thus assumes that CDTs must be understood as a fantasy as much as a feasible technique for urban management, as an imaginary as much as a technology. This raises the question: what else is going on in the enthusiasm for CDTs?
Critical scholarship on technologies pays particular attention to the forms of human subjectivity that are co-constituted with technologies and their imaginaries. This paper develops an explicitly feminist account of CDTs which examines what kind of humans emerge alongside CDTs. Suchman coins the term ‘affiliated humans’ to refer to ‘particular objects of attention and concern and [their] inseparable knowing subjects’ (2011: 134). The paper will argue that several affiliated humans emerge from CDTs, as well as a specific vision of the urban: the technocratic urban manager who uses a CDT; the algorithmically-generated images of humans who inhabit a CDT; and a more unruly form of human life which is understood as excessive to the CDT. The paper further argues that these CDT-affiliated humans should be understood as particular forms of white masculinity. In this, the paper is inspired by McKittrick's (2006, 2021) critique of the whiteness of transparent geographies, as well as Mirzoeff's (2023) ‘tactical mapping’ of ‘white sight’. Mirzoeff suggests that there are various forms of different-but-related (masculinist) white sight: for example, the cartographer, the naturalist and the connoisseur, which emerged in part as different ways of seeing the world mediated by various kinds of visual technologies in the context of empire and plantation slavery. This paper explores what happens if the masculinist visibility of CDTs is similarly conceptualised as multiple and enacted across contemporary visual culture.
In their critique of the masculinism of both the tech industry and digital theory, Bassett, Kember and O'Riordan describe techbros as a ‘set of demi-gods who remain stubbornly invested – psychologically and culturally – in habits of splitting and projection’ (Bassett et al., 2020: 19). This paper proposes that a stubborn investment in the fantasy of the CDT as both offering, yet failing to deliver, technocratic control over a city splits the CDTs’ affiliated demi-gods into the manager and his other. It proposes that this other becomes visible as the hero of the disaster movie genre of popular films. While this might seem an unlikely analytical manoeuvre, CDTs and disaster films are visual companion pieces in many ways. Both share the same volumetric organisation of urban space and offer the same mobile aerial viewpoint to their viewers. Both use digital visual effects to animate cities – indeed, some use the same computer graphics programmes. Both address cities as in crisis: many justifications for the adoption of CDTs argue that only a CDT can manage a city when its infrastructure is at breaking point and its climate at tipping point, while the films show cities actually overwhelmed by various geologic and meteorologic disasters. Both are attentive to the human bodies inhabiting these digital urban volumes. Most importantly, though, the paper will argue that there is also a relationality between disaster movies and CDTs because both make heroes of white men in relation to digitally-mediated cities, in contrasting but necessarily related ways. The paper thus contributes to critical urban scholarship that explores the constitutive role of imaginaries in contemporary urban technologies, by focussing specifically on the racialised and gendered power relations that are constituted in the volumetrisation of urban space (Kinsley, 2010; Kurgan, Brawley, House, et al., 2019; Rose, 2019; Wigley and Rose, 2020; Wilson, 2014).
The paper begins by flagging the extensive literature that identifies the racialisation and gendering of digital technologies since the 1960s as masculine and white and then tracks the symptoms of white masculinity in the figure of the user of a CDT. That user is not quite secure, however, and his admission that not all human life can be captured in a model motivates the paper's turn to the urban disaster movie and its repetitive storyline of (a particular kind of) human life surviving the spectacular, digitally-generated destruction of a city. Looking at different kinds of visual materials in this way entails a lack of detailed discussion of each – for example, the paper will not explore the possibility of ‘dimensional heterogeneity’ in CDTs (Bier, 2022: 681) or any individual film. But, as Gergan et al.'s (2020) discussion of apocalyptic imaginaries and the Anthropocene demonstrates, there is a need to risk generalisation in order to call out the powerful imaginaries which shape both popular and scientific discourses: in this paper, to evidence the enactment of race and gender in relation to visible, volumetric cities. The discussion of CDTs is based on an extensive review of academic and grey literature, as well as repeated viewings on websites and YouTube channels of digital animations produced by cities and software vendors illustrating the capabilities of their CDTs. The disaster movies were identified from searches on the Internet Movie Database and Wikipedia, as well as on various streaming platforms; I watched nineteen made in Hollywood between 1998 (Deep Impact and Armageddon) and 2022 (Moonfall), paying particular attention to their recurring narrative and visual themes.
White masculinities and digital technologies
There is an extensive literature that examines the association of sophisticated technology with whiteness and masculinity. In relation to digital technologies specifically, this association has been analysed in terms of a number of interrelated processes: the discursive racialisation and gendering of certain kinds of skills and technologies; labour market practices that overwhelmingly employ white men in certain roles; various kinds of masculinist workplace cultures which often alienate non-white and non-male employees; the design of technologies which assume, embed and reproduce gendered and racist outcomes; and a widespread imaginary of technology as somehow neutral (key references include Benjamin, 2019; Buolamwini and Gebru, 2018; Cave and Dihal, 2020; Ensmenger, 2015; Ford and Wajcman, 2017; Hicks, 2018; McIlwain, 2020; Nakamura, 2007; Noble, 2018). This paper will focus on the technological affordances of CDTs as well as their imaginary, and on how both co-constitute gendered and racialised human affiliates (Figure 2).

Figure 2. Two figures from a report on the digital twin of Helsinki's Kalasatama district (Helsinki City and kira-digi, 2019: 11).
For Rocha and Snelting (2022b), the white masculinism of the volumetric regime consists of a combination of software affordances, commercial imperatives and a violent desire to reduce difference and multiplicity to the same by applying quantified computational metrics to the world. All of these processes are at work in relation to CDTs. In terms of software, the circulation of volumetric forms and images of urban space is in part enabled by the extensive use of computer graphics software for (what were once) quite distinct kinds of images (Cardoso Llach, 2015; Gaboury, 2021). An example in relation to CDTs is Unreal Engine. Owned by Epic Games, Unreal Engine is a software programme originally designed to build and play computer games. However, its ability to render animated, complex images extremely quickly – and the fact that a free version is available – has meant that it is becoming more and more widely used for many kinds of urban imaging, including computer games, special effects in movies and television programmes, architectural renders and fly-throughs, virtual reality environments, adverts and demos, museum displays – and for city digital twins. Esri, the company behind the most widely-used GIS product ArcGIS, has called its city digital twin offer ArcGIS CityEngine, and has allowed its GIS maps to be imported into Unreal Engine as well as specifying the workflow to import twins built in ArcGIS CityEngine. Unreal Engine also enables the creation of virtual reality CDTs and attractive city branding imagery. There are also commercial imperatives at work in the promotion of CDTs. One is software companies’ need to create new markets for their products once their first customer-base becomes saturated (this dynamic is discussed in relation to smart cities by Luque-Ayala and Marvin [2020]). This is of course part of the broader privatisation of urban management to tech companies criticised by so much smart city scholarship (Figure 3).

Figure 3. A screenshot from Wellington's city digital twin demonstrating the display of near-real-time data, made by digital studio Buildmedia using Unreal Engine (https://buildmedia.com/work/Wellington-digital-twin).
Rather less attention has been paid in recent scholarship on digital urban technologies to the white masculinism that Rocha and Snelting also attribute to the volumetric regime. There is significant work by geographers and others who have discussed the extractive, surveillant and militaristic histories and technologies that have ‘secured the volume’ of urban, maritime, subterranean and aerial spaces (Adey, 2010; Elden, 2013; Jackman and Brickell, 2022; Lotfi-Jam, 2022; McNeill, 2020), but broadly speaking this tends to focus on the power of the state. However, some feminist contributions to this literature on volume are more attentive to the affiliated humans co-constituted with volumetrising technologies and emphasise that those affiliates are double or more. Jackman's (2023) discussion of the use of drones to identify ‘clandestine graves’, for example, argues that not only does the drone operator emerge during drone flights, so too do bodies dead and bereaved. Ensmenger (2015: 52) identifies a ‘spectrum of masculinity’ which emerged as computer programming was professionalised in the 1960s and 1970s, in particular a distinction between the computer nerd/hacker and the academic computer scientist. Ensmenger suggests that each of these figures was defined in contrast to the other: one a bearded intuitive genius, the other a besuited rigorous theoretician. This paper similarly argues that multiple affiliated humans emerge alongside CDTs. The paper argues that the anxious masculinism of urban computation is split and projected into different visual forms: the CDT and the disaster movie.
The use of a particular kind of geometry to compute volumes and render them visible onscreen is central to Gaboury's (2021) history of computer graphics software. Gaboury emphasises that volumetric digital objects are mathematical; they are geometries calculated computationally. Making such objects visible on a screen according to (culturally-specific) norms of human perception required an extended process of hardware and software innovation. The analytic separation of the visual and the spatial in Gaboury's account of computer graphics is a useful heuristic device and allows the next two sections to specify the enactment of white masculinity with CDTs.
The volumetric regime and the city as an object
According to Gaboury (2021: 12), computer graphics assume that ‘the world is understood as a relational system of objects capable of discrete forms of interaction’. Moreover, it is assumed in both academic and commercial descriptions of digital twins that the city that is to be twinned is itself a physical object. Terms frequently used in the literature to refer to the twinned city include the real (physical) world, existing urban structures, the physical fabric, the built physical realm, a real/physical system, a real-world physical counterpart, the physical city or, simply, the real world. The city to be twinned is ‘a real world (physical space)’ (Kikuchi et al., 2022: 838). It is not surprising then that CDTs convert city environments into volumetrically-defined objects, since this aligns with both software affordances and imaginary conceptualisations: both concur that geometric objects can accurately represent a city.
The mode of conversion speaks to Rocha and Snelting's emphasis on computational optimisation as part of what enables volumetric models to circulate. They point to ‘a particular way to arrange volumetrics in the interest of optimized computation, including drawing hyper-real surfaces on top of extremely simplified structures’ (Rocha and Snelting, 2022a: 221). This describes exactly the basic process of generating three-dimensional onscreen images, including CDTs, as also described by Gaboury and detailed in CDT scholarship (and also visualised in the thousands of breakdown videos showing the creation of digital visual effects in movies as a series of layers appearing over a massified version of the scene). A CDT is built from a 2D mesh of points, lines and polygons, generated from satellite data or aerial photography, or from existing datasets such as OpenStreetMap or national land surveys. Elevation data is then added (usually also generated from aerial photography, or from LiDAR scans), and this becomes a 3D wireframe model. Many CDTs do then drape ‘hyper-real surfaces’ over such simplified structures (see for example Helsinki City and kira-digi, 2019). Other objects are then added: buildings, roads, infrastructure and so on. Digital twins assemble cities such that each ‘is an aggregate of smaller-scale objects like buildings, streets, trains, buses, cars, each of which could feasibly have their own Digital Twin, each itself a complex aggregate of further sub-components’ (Dawkins et al., 2018: 2). Various techniques establish the CDTs thus composed as accurate representations of the real urban environment.
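The step from a flat mesh to a wireframe volume can be illustrated with a minimal sketch. It is offered only as an illustration of how 'extremely simplified' the underlying structure is, and not as the method of any particular CDT software: the function name extrude_footprint, its parameters and the example values are all hypothetical. A building is reduced here to a 2D footprint polygon, a ground elevation and a height, from which a prism of vertices and edges is generated.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
Edge = Tuple[int, int]  # pair of indices into the vertex list


def extrude_footprint(
    footprint: List[Point2D], ground: float, height: float
) -> Tuple[List[Point3D], List[Edge]]:
    """Turn a 2D footprint polygon into a 3D wireframe prism.

    Hypothetical helper for illustration only: 'ground' stands in for
    elevation data, 'height' for a building height attribute.
    """
    n = len(footprint)
    # Copy the footprint once at ground level and once at roof level.
    base = [(x, y, ground) for x, y in footprint]
    roof = [(x, y, ground + height) for x, y in footprint]
    vertices = base + roof

    edges: List[Edge] = []
    for i in range(n):
        j = (i + 1) % n                # next corner, wrapping around
        edges.append((i, j))           # edge of the base ring
        edges.append((n + i, n + j))   # edge of the roof ring
        edges.append((i, n + i))       # vertical edge joining base to roof
    return vertices, edges


# A 10 m x 10 m square footprint, 5 m above datum, 20 m tall.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
vertices, edges = extrude_footprint(square, ground=5.0, height=20.0)
```

In this sketch the whole building is eight coordinate triples and twelve index pairs; any surface imagery would be draped over this geometry afterwards.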
These volumes are tethered to the material world using latitude and longitude or a location identified by a Global Positioning System, which is also understood as making them locationally accurate. According to a user guide to CityGML (an open standardised format for CDTs), ‘coordinates always have to be given with respect to a coordinate reference system that relates them unambiguously with a specific position on the Earth’ (Heazel, 2021). This process is ‘the construction of a reliable algorithm for translating between representation and reality’ (Kurgan, 2013: 13), and it depends on and reiterates long histories of hegemonic cartographic practice: ‘geographic information has largely been defined and implemented in a manner that constitutes independent entities with intensive properties, contextually indexed by location in an absolute space, and known objectively through a geographic gaze’ (Bergmann, 2016: 973).
A consequence of the assumption of absolute space is that the space of the twin must be unified and singular: it must be complete. There are no ruptures, gaps or folds: a CDT shows the city as ‘a single fabric’ (Yossef Ravid and Aharon-Gutman, 2023: 1455). Much of the technical literature on twins is precisely about fixing gaps or discrepancies in the spatial organisation of a twin. Conflicts between different data sources must be resolved, either by manual checks or by automated edits (Argota Sánchez-Vaquerizo, 2022). All of the data must be consistently georeferenced, using the same system of coordinates. Photographic or LiDAR data about surface textures and qualities must be correctly located, often by using ground control points which must themselves be carefully calculated for locational accuracy. Any inconsistencies in geometries must be fixed: for example, inaccurate occlusions among objects must be mended (Kikuchi et al., 2022), gaps and overlaps between buildings removed (Scalas et al., 2022), conflicting descriptions of road junctions resolved (Argota Sánchez-Vaquerizo, 2022). Distinct objects must be correctly identified from LiDAR point clouds (Xue et al., 2020). This spatial organisation enables a complete view.
The human affiliates that are pictured as the inhabitants of a CDT are also constituted as digital objects located in the twin's digital space. They can thus be described as one ‘subasset’ among others: ‘a city can be thus considered as an asset that integrates different subassets such as buildings, utilities, transportation infrastructure, and people’ (Lu et al., 2020: 4). The CityGML user guide has a list of ‘prototypic objects’ which includes street furniture, traffic signals, vegetation objects, cars and ‘people and animals’ (Heazel, 2021). The human inhabitants of CDTs can be procedurally generated as visual content that follows rules related to 3D volumetric objects, so that for example feet have to stay on roads and pavements, and bodies can't pass through each other; an Esri CityEngine demo video shows a yellow line added to a road in order to generate images of cars, and another yellow line added to a pavement to generate pedestrians (CityEngine User Meeting, 2022). Some CDTs also host agent-based models, especially of the movement of crowds. In both cases, humans are ‘individual mobility objects’ (Lee et al., 2022: 2), and can be substituted by various modes of transport: ‘Although topography, buildings, and roads are the main components of urban spaces, the most important factor therein is people. Vehicles, bicycles, and motorcycles that people use daily for transportation are the main elements comprising a city’ (Lee et al., 2022: 3; Papyshev and Yarime, 2021). Humans pictured in CDTs are thus computationally-generated objects equivalent to all the other moving objects in a twin (Figure 4).

Figure 4. Two screenshots showing the generation of cars and pedestrians in CityEngine (CityEngine User Meeting, 2022).
CDTs are constituted as objective in a double sense, then. They turn cities and their inhabitants into objects, and their imaginary assumes that this is appropriate because cities are indeed objects. Attention is then focussed on the techniques understood as accurately converting those material objects into digital objects. Such claims to know the world by converting it into Cartesian coordinates in two or three dimensions have long been associated with the power of the modern Western state and capital; there are extensive accounts of the complex imbrication of cartographic geometric abstraction with colonialism. ‘Economy and state have been intricately intertwined historically in the production of abstract spaces (of commodities, of private property, of state administration, of judicial power, among others)’ (Pickles, 2003: 153; see also Bier, 2022). The next section turns to the visuality of CDTs to unpack their particular patriarchal power in more detail.
City digital twins and seeing like a god
Volumetric CDTs render the twinned city fully transparent: visually as well as spatially unitary and coherent. There are no visual blanks or ruptures in twins. This transparency is argued to offer a more comprehensive view of city locations that enables better decision-making: ‘digital reality allows the user to navigate the city and toggle between different perspectives (e.g. birds’-eye view, first/third-person view) as well as change their scale. The possibility of interacting with the digital twin city increases situational awareness and can lead to better data-driven discoveries’ (Ketzler et al., 2020: 562–563; and see Mohammadi and Taylor, 2017). ‘A 2D display is sometimes convenient, but does not often provide a complete picture of the observed area’ (Komadina and Mihajlović, 2022: 140). A similar claim is made by the outreach lead of the UK's National Digital Twin programme: ‘when you start to connect the datasets from these digital twins, you can build a bird's eye view of a city, which gives you better information about the consequences of your decisions’ (Frearson, 2021). If a twin risks being ‘a blurry mess’ (Pimentel, 2020), it is simplified. If draping terrains and buildings with photographic-like imagery is too computationally demanding to achieve – or perhaps deemed superfluous to the aim of a particular CDT – it is abandoned, the level of detail is reduced, and instead buildings are textured as simple volumetric blocks with little or no surface detail.
The user of a CDT is affiliated visually with this accurate way of seeing by being aligned with a highly mobile point of view. The point of view of the user in promotional animations is the same as the position of what the software terms a ‘virtual camera’ in the twin's geometric space, and the camera is itself situated in, and able to move through, the twin's three-dimensional computed space. The camera's position generates the viewpoint, and what is seen is rendered visible according to norms of geometric perspective. Almost without exception, city digital twins have a mobile aerial view as default. Proposals for new buildings can be seen from ground level but users can always then fly. Indeed, promotional materials for digital twins display relentless visual mobility; their point of view is constantly on the move. The virtual camera software that provides the point of view can pan across a model. Alternatively, the camera may remain steady while the user of a digital twin rotates the twin-object onscreen to look at it from different angles. Unlike the CDT inhabitants, chained to the twin's terrain, the CDT user is configured as ‘doing away with horizons, suspending vanishing points, seamlessly varying distance, unchaining the camera and transporting the observer’ (Elsaesser, 2013: 237) (a description written as part of a discussion of 3D cinema).
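How a camera position 'generates the viewpoint' can be sketched with the simplest form of geometric perspective, a pinhole projection. The sketch below is an assumption-laden illustration and not the rendering code of any CDT or game engine: the function project and its parameters are hypothetical, and the camera is fixed to look straight down, as in a default aerial view.

```python
from typing import Optional, Tuple

Point3D = Tuple[float, float, float]


def project(
    point: Point3D, eye: Point3D, focal_length: float = 1.0
) -> Optional[Tuple[float, float]]:
    """Project a point in the twin's space onto the image plane of a
    virtual camera at 'eye' that looks straight down (hypothetical,
    simplified aerial camera; no rotation, no clipping planes).
    """
    px, py, pz = point
    ex, ey, ez = eye
    depth = ez - pz  # vertical distance from camera down to the point
    if depth <= 0:
        return None  # the point is level with or above the camera
    # Perspective division: objects further away are drawn smaller.
    u = focal_length * (px - ex) / depth
    v = focal_length * (py - ey) / depth
    return (u, v)


corner = (10.0, 0.0, 0.0)                    # a point at ground level
high_view = project(corner, (0.0, 0.0, 100.0))
low_view = project(corner, (0.0, 0.0, 50.0))
```

Nothing in the twin changes between the two calls; only the three numbers giving the camera's position do. The unchained, flying point of view is in this sense a matter of updating a coordinate.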
That spatially mobile gaze can also zoom, and the zoom enables visual mobility through things that would be impossible in the material world. This zoom gives users of the CDT a visual perception that can penetrate an object's surface. A CDT is seen as ‘a holographic, high-definition, high-precision urban digital space, covering ground and underground, indoor and outdoor, and two-dimensional and three-dimensional structured entities’ (Deng et al., 2021: 127). Some city zooms pass through the exterior walls of buildings to picture interior rooms, or the infrastructure embedded within walls, ducts or tunnels; subterranean infrastructure is also rendered visible (particularly in CDTs that include Building Information Models). In some animations, the camera can look underneath digital twins which float in space, to be inspected from any direction. As a report by the consultancy Arup says, ‘we can now render our geometric designs virtually, in breathtaking fidelity, including previously unfeasible perspectival changes. We can move from a bird's-eye view of the entire building and its surroundings to zoom in on the smallest detail of a room’ (Arup, 2019: 26). This entails ‘the virtualization of the eye into a metastatic virtual camera able to view an object from any point of view whatsoever’ (Galloway, 2014: 66); it renders urban space entirely transparent to the CDT user's gaze.
This highly mobile and all-seeing gaze evokes Haraway's (1988) discussion of ‘the god-trick’. While it is important to neither over-generalise its prevalence nor exaggerate its power (Bier, 2022; Kurgan, Brawley and Kirkham-Lewitt, 2019; Pickles, 2003), it does appear that the CDT imaginary is precisely premised on ‘a conquering gaze from nowhere’:
This is the gaze that mythically inscribes all the marked bodies, that makes the unmarked category claim the power to see and not be seen, to represent while escaping representation. This gaze signifies the unmarked positions of Man and White, one of the many nasty tones of the word ‘objectivity’ to feminist ears in scientific and technological, late-industrial, militarized, racist, and male-dominant societies. (Haraway, 1988: 581)
This penetrating gaze gives urban knowledge from a CDT to its users, offering a totally comprehensive view of the city as an object. The power of this gaze in the context of CDTs needs further elaboration, however.
One aspect of this power, in the context of urban management, is the ability to act on the city. This move from transparency to action is interrogated in Söderström's (1996) short history of the forms of visibility embedded in professional planning practice as it emerged in the late nineteenth century:
This new space of representation simultaneously opened up a new space of action: urban space. Situated within the same simulated space, scaled down so as to be readily assimilable at a glance, forms that had hitherto belonged to incommensurable categories could now be apprehended by the mind and could therefore be manipulated as parts of a whole. (Söderström, 1996: 258)
Both the spatial and visual organisation of a CDT make the twinned city ‘a whole’, and Söderström's association between seeing the city at a glance and acting on the city is a core dynamic of the CDT and the professional vision of planners in which it is operationalised (Carlsson, 2022). The promotional materials that picture CDTs in use often show the twin as a glowing holographic model sitting on a table, with its users gathered at its edges managing the city by moving objects with a gesture, or by interacting with it via a tablet interface or a VR headset, or just looking at it benevolently. That is, the bodies of users are pictured as outside of the twin looking in, and therefore in a position to transform it from its outside. After all, ‘it is widely accepted that to improve the quality of life and sustainability of cities, urban planning involves manipulating their physical form’ (Batty, 2024: 195). The visual integrity and inspectability of objects are imperative if action is to follow from seeing a CDT (Figure 5).

Figure 5. This Shutterstock stock image is frequently reproduced on websites discussing city digital twins.
The association of transparent visibility and absolute space with being able to take action on what is thus visualised must be understood as part of the powerful masculine whiteness of the gaze mobilised with CDTs. CDTs are based on ‘the white spatial imaginary [which] idealizes “pure” and homogeneous spaces, controlled environments, and predictable patterns of design and behaviour’ (Lipsitz, 2011: 29). McKittrick argues: If we imagine that traditional geographies are upheld by their three-dimensionality, as well as a corresponding language of insides and outsides, borders and belongings, and inclusions and exclusions, we can expose domination as a visible spatial project that organizes, names, and sees social differences (such as black femininity) and determines where social order happens. (McKittrick, 2006: xiv)
As McKittrick remarks, volumetric spaces are constituted both from, and reiterate, a racial, sexual and economically dominant vantage point which allows other forms of life to be managed.
Further, the CDT imaginary universalises this way of seeing when it is claimed that CDTs simply offer ‘a very real, detailed, specific, and impactful visual experience’ (Deng et al., 2021: 128) which is seen in the same way by all CDT users. ‘An open and transparent visual experience helps everyone clearly see a scheme in context and understand how it will work’, according to VU.CITY's website page for local authorities (VU.CITY, 2023). For example, it is also argued that a CDT enables effective urban management because it shows the same data to all its users. (There is no critical discussion of the production of near-real-time urban data in the CDT literature.) Tkacz (2022) has observed that the multiple screens in the smart city control room actually generate considerable uncertainty for urban managers, who have to make sense of constantly changing data, distributed across different screens, in different formats. In contrast, the CDT shows its data in one interface – the 3D model – and it is argued by proponents of CDTs that this eliminates what Tkacz describes as the ‘doubt’ generated by control centre dashboards. Instead, a twin ‘orchestrates’ urban management by showing the same information to everyone: Orchestration is the harmonious organization of activities (good planning) that enables informed decisions and helps to avoid costly ad-hoc problem solving. Digitization helps planning of activities by keeping track of essentials, and by facilitating the inclusion of stakeholders, because everyone can be updated to have the same and the latest information. (Lehtola et al., 2022: 1)
This is claimed to offer both resource efficiency and democratic inclusion. Displaying the same data is claimed to generate ‘a universal experience’ (Buildmedia, 2021) or a ‘common referential’ (Dassault Systèmes, 2017). The effect is ‘to uniformize the vocabulary used among the very different profiles involved in the urban sciences, be they planners, architects, policymakers, citizens, lawyers, computer scientists, and so on’ (Meta et al., 2021: 13). 3D city modelling enables enrichment of digital city models with external data and presentation of more knowledge-based city scenarios… Consequently, more informative and realistic city scenarios enable elimination of emotional statements and opinions during the decision-making process. (Hämäläinen, 2021: 6)
In short, ‘3D city models and digital twin technology assist urban developers in objectifying and forming a shared vision and understanding of the city design subject matter’ (Hämäläinen, 2021: 6; see also Dembski et al., 2020; Kikuchi et al., 2022). These claims assume that removing differences among ‘very different profiles’ is an advantage of CDTs (as per Rocha and Snelting's [2022b] description of the volumetric regime).
Moreover, not only is it assumed that seeing the city in the same way results in better decisions about city management but it is also assumed that this specific mode of seeing is the only way to manage cities. This extends to a key claim made by the CDT imaginary, which is its efficacy as a tool for community participation. This is mentioned repeatedly in discussions of city digital twins, by researchers, commercial software vendors and local city authorities, which suggests that criticism of the lack of democracy in many versions of the smart city is acknowledged (see for example Cardullo and Kitchin, 2019; Greenfield, 2013; Kitchin, 2015; Sadowski and Pasquale, 2015). However, the example of such participation that is repeatedly used in the literature is the ability to show planning proposals to citizens and communities. 3D model allows for the easy removal and addition of newly proposed buildings… Any proposed building plans can easily be added to the digital twin using the BIM model. This model would then allow citizens and public officials to walk around the digital twin and see the effect that the new building would have on the skyline from a number of different locations. (White et al., 2021: 5)
Computer-generated, photo-realist images of planned new developments can be inserted into existing models so that everyone can see what they will look like when they are built; with GIS functionality, the sightlines and shadows cast by proposed buildings can also be calculated and visualised (Ketzler et al., 2020; Scalas et al., 2022; Schrotter and Hürzeler, 2020; and see Carlsson, 2022). In this further erasure of differences among the various users of CDTs, both professionals and ‘citizens’ are assumed to engage with the CDT's image of the city entirely visually. According to the World Economic Forum (2022: 14), CDTs allow residents to ‘visualize urban congestion in real time, allowing them to adjust their travel plans. They can also “go sightseeing” to any of the world's tourist destinations regardless of where they are in reality’. This persistent emphasis on the urban as what can be seen is underlined by the fact that when, very occasionally, urban humans are invited to contribute to a twin's database, this is usually a request for photographs in order to deepen the visual realism of a twin or to include more objects (Belcher, 2019; Dembski et al., 2020; Ham and Kim, 2020; Lu et al., 2020). Citizen feedback is sought only on the appearance of the city; managers act only on the basis of its various data displays, both numeric and visual. Thus, all the viewers of a CDT are assumed to see the twin in the same way, ‘sustained by the idea that vision is stable and shared, and that visual experiences are largely determined by physical form and objects, including buildings, trees and hills, that can be mapped, measured and compared using numerical coordinates’ (Carlsson, 2022: 228).
The assumptions that seeing the city as a spatially coherent and visually transparent animated object is the most effective way to manage it, and that everyone will see the city in that same rational way, constitute one version of the production of similarity that Rocha and Snelting (2022a) identify as central to the volumetric regime. The emphasis in the CDT imaginary on entirely rational decision-making is masculinist, as is its technocratic insistence that it offers the single, truthful, data-derived vision of the city (according to Siemens (2020), its Mindsphere CDT offers ‘a single source of truth for all’). This managerial affiliate of the CDT is given the power to act on the city by seeing it and manipulating it. The world of the CDT is therefore surely a contemporary version of that produced by colonial and capitalist visual techniques which also envisioned the world as abstract, transparent and actionable (Pickles, 2003).
Excessive affiliates
Previous sections have argued that CDTs configure powerful forms of human life, in the shape of their users. In both their technical affordances and in their users’ professional vision, CDTs are seen as objective onscreen models which enable rational decision-making. This way of seeing is also extended in the CDT imaginary beyond the professional vision of planners, managers and analysts to a universal form of experiencing cities: from a mobile point of view which can see the city from everywhere, but only as an accumulation of geometrically-organised objects. While many of the images of urban managers of various kinds using CDTs show those managers as diverse – women and people of colour now feature frequently in the stock imagery of and adverts for CDTs, even if they are always wearing the white, blue or black smart clothing that is the visual marker of professionalism – the visibility embedded in the software and discourses of CDTs also enacts the transparent space of white masculinity. ‘The goal is a diffuse omniscient gaze engulfing a precisely modeled object world’ (Galloway, 2014: 62).
And yet… there are many suggestions in the CDT imaginary of a form of human life that exceeds this rational user. For example, there is widespread enthusiasm in promotional and demo videos for Unreal Engine's ability to move a city twin from day to night and back again just using a slider control. Look, I can make it go dark! And light! And dark again! This is a more playful enactment of the power to see and act on a city, an example of the ‘causal pleasure’ of software programming, generated when a computer produces ‘an internally consistent if externally incomplete microworld’ (Chun, 2005: 39). Chun suggests that this pleasure is ‘an effect of programming languages, which offer the lure of visibility, readability, cause and effect’ (Chun, 2005: 39), and it is without doubt at play in discussions of CDTs. It does not quite align with the emotion-free decision-making process that CDTs apparently enable.
Moreover, some commentators remark that ‘contemporary approaches to creating digital twins are often surprisingly “materialistic” or “physicalistic,” typically based on measurement data about the functioning of buildings, streets and natural environments, whereas human, social and cultural activities that drive the sociodynamics of cities are often barely featured’ (Caldarelli et al., 2023: 375). In fact, CDT scholarship and corporate storytelling are also full of references to what CDTs cannot contain (cf Rose, 2017). This excess is a third kind of human affiliate of the CDT. It is a form of human life that cannot be modelled, according to CDT advocates, because it is unquantifiable: too unpredictable, irrational, collective, creative, pluralist, serendipitous, individual, random. As scholars plaintively remark, ‘humans are more complex than manufacturing processes… social conflicts, sociopolitical issues, social inequality and environmental sustainability cannot be explained by numbers’ (Botín-Sanabria et al., 2022: 12, 20). Given the importance of rendering the CDT entirely visible, it is symptomatic that these aspects of human life are described as invisible: ‘urban systems like many social systems are highly resistant to detailed observation and show a degree of invisibility that is much more problematic than in many physical systems where we are able to instrument most features of any relevance’ (Batty, 2021: 19). This is the point underlying recent comments that the fantasy of the CDT is unachievable. CDTs cannot manage cities because they cannot model this version of human life.
This persistent reference to forms of human life that exceed the CDT's ability to capture is intriguing. This paper proposes to treat this excess as a feature of CDTs, not a bug. While it ostensibly refers to the inadequacies of the procedurally generated humans in a CDT, it can also be read as an anxiety about the CDT user. After all, in a way, the managers of CDTs are also positioned as products of the CDT. While they can see and manipulate its content from whatever angle, the aim of the CDT is to reduce doubt and make decision-making straightforward. If a city's data is displayed so effectively that there is little doubt about the best form of action that must follow, well, there's a sense in which user-managers are simply carrying out the agency of the model, as much as the procedurally-generated or agent-based pedestrians that inhabit the CDT. In other words, not only does the CDT population lack a creative, irrational, emotive, embodied version of the human, so too may its managers. There are two specific anxieties that can be identified here. The first is the threat that, if no insight and creativity are involved, CDT management is little more than the rote carrying out of instructions, and thus risks becoming feminised (as was programming at the beginning of the computing industry [Ensmenger, 2015]). A second threat is that, in a historical moment when showing emotion is a strong imperative for many forms of white masculinity, being too rational risks being seen as inhuman (Greenblatt, 2016). Perhaps then, lurking in the confident advocacy of CDTs are anxieties not dissimilar to those identified in relation to artificial intelligence (Book, 2023): the all-seeing, powerful white masculinity enacted with cutting-edge digital technologies is nevertheless inadequate, at once overly rational and threatened by feminisation. 
Hence, perhaps, proposals for cage fights (BBC, 2023) or suggestions that an AI-generated beard makes Mark Zuckerberg look ‘less robotic, more alive’ (Huet, 2024): another kind of masculinity is required to ward off these anxieties, in order to enact full humanity.
Identifying this anxiety pushes this paper towards what it proposes is the twin of the twin: the urban disaster movie. This genre of movie remains hugely popular despite doing little more than repeating its formulaic structure with ever more elaborate digital visual effects (VFX) (Crogan, 2017). In these movies, humans are precisely the kinds that the CDT cannot see: visibly creative and unpredictable. But this is hardly a radical alternative to the white masculinity of the CDT. Rather, as the next section will argue, disaster movies picture that excessive form of human life as precisely what will survive the digitally-rendered disaster – in the form of a different kind of white masculine hero.
Disaster movies as heroic survival in the volumetric regime
This section approaches Hollywood disaster movies as a symptom of the anxiety that white rational masculinity may be too rational, or itself become automated: it is at risk from computation. As already noted, there are several convergences between disaster movies and CDTs. Both picture cities as volumetric; both show cities under threat; both use similar computer graphics software and other digital techniques to compile a range of different imagery into one coherent scene. Also, both mix claims to the real with elaborate visualisations only possible with software (Crogan, 2017). However, while the rationality of the protagonist of the CDT – the manager acting on the city for good – generates an emotional, unpredictable other, in the disaster movie, the hero of the narrative is all emotion and creativity, encountering the horrors of a VFX city in collapse and overcoming them using his strength and wits. These movies have been extensively discussed by film scholars, and this section examines just a few of their visual, spatial and narrative elements.
Disaster movies picture cities as sites of concern; indeed, not only of concern but of life-threatening disaster. The first films to make extensive use of digital visual effects to show a city being destroyed were Deep Impact and Armageddon in 1998. Both were about the threat of huge meteors hitting the planet, and both focussed mostly on New York. New York was also the setting for much of the hugely popular The Day After Tomorrow in 2004, and disaster movies continue to picture actual cities using 3D computer models in which ‘impressive, exaggerated scenes of destruction were followed by similar, impressive, exaggerated scenes of destruction’ (Keane, 2001: 99). Since then, watching cities like New York, San Francisco and Hong Kong as they are overwhelmed by meteors, earthquakes and tidal waves has become standard movie fare, in a huge number of films like Earthstorm (2006), Final Storm (2010), Into the Storm (2014), Geostorm (2017), 10.0 Earthquake (2014), San Andreas (2015), San Andreas Quake (2015), Destruction: Los Angeles (2017), Shockwave (2017), Cold Zone (2017), How It Ends (2018), Greenland (2020), Moonfall (2022), and so on and so on. The pleasures of these highly generic movies are debated by scholars and indeed their viewers, but their spectators’ dramatic immersion in apocalyptic scenarios is often visceral, edge-of-the-seat stuff – the mirror opposite, it might be suggested, of the calm look at the CDT as a table-top model. It is also rendered in large part by the spectacular transformation of actual cityscapes by digital software.
Such movies share the same volumetric geometry and photographic-like dressing as CDTs. ‘In digital cinema, all representations are mapped on a 3D digital environment, irrespective of whether the reality is a 3D real-world space or a 2D pictorial space: the default value of all digital representation, including the cinema, has become 3D computer graphics’ (Elsaesser, 2013: 37–38). The films show the cities with straight roads and tower block buildings that appear in so many CDTs: as distant panoramas, to set the scene and to show subsequent destruction; or from above as if from a drone or a satellite. They frequently use the same mobile aerial viewpoint as the CDT. This is somewhat more spectacular – and entertaining – than the volumetric CDT. It creates an ‘emphatic or hyper-architecture of space’ (Strutt, 2019: 124), which organises the movie's scenes through ‘extreme spatial coordinates’ (Whissel, 2014: 25). ‘The new verticality’ is more intense, more expansive, generates falls and leaps and swoops through apparently immense spaces (Whissel, 2014: 57). Nonetheless, all elements in a scene are created as digital objects which are then composited together in a single, coherent, onscreen volumetric space. Buildings form tunnels through which crowds panic, planes crash and water inundates. More recent films show entire tower blocks toppling like dominoes one into another along a city street, giving a strong sense of these city buildings as objects. These cities are volumetric objects of objects, and even as they shatter and topple, their spatial and temporal organisation is singular and consistent, like that of the CDT (Schoonover, 2021).
As many commentators remark (Whissel, 2014), disaster movie spectators are invited to marvel at both the horrors unleashed by environmental disasters and at the visual special effects that visualise them. They place both technological wizardry and material vulnerability on show. Their characters are thus seen in relation to both digital visuals and urban apocalypse, and they focus on corporeal bodies, driven by the desire to survive the digitally-rendered disaster. There are the main characters, whose efforts to escape the impending disaster are the main focus of the film. There are various secondary characters: neighbours left behind, fellow escapees encountered en route, and fellow workers. And there are the nameless and numberless crowds of victims of the disaster, who run distraught through city streets as destruction is unleashed. All of these humans are pictured as photo-real and most are filmed in live action (especially the star actors), but many crowds will be computer-generated, like the inhabitants and other urban assets of CDTs. It is in this sense that Elsaesser suggested that ‘Jurassic Park's predatory dinosaurs, for instance, are less retro-evolved from, say, reptiles or birds, and instead are animated mutants of pick-up trucks, motorcycles, or earth-moving vehicles’ (Elsaesser, 2013: 37).
The spectacularity of the digital special effects is perfectly integrated with human bodies in these movies (usually anyway – in a couple of low-budget films, the layering of for example animated cracks or flickering fires or meteor trails on top of filmed streets or city scenes or skies is obvious). However, these films also establish the existence of a human body as distinct from this digitally-rendered destruction. Human life is shown as able to understand and mitigate catastrophe by interpreting its signs, for example: every one of these highly generic movies has extensive scenes of TV newscasters explaining the situation and scientists staring at screens before announcing what must be done – just like CDT advertisements. Indeed, many movies show these scientists as female – a visualisation of that sense that being too data-driven risks femininity, perhaps. And of course, not everyone dies in these environmental/digital apocalypses. There are always survivors, almost always pictured together at the end of the movie as new life is promised and the credits roll.
The bodies that survive are particular, however. As several commentators have remarked, recent disaster movies centre on a male hero figure who can outrun the disaster with his family (Courcoux, 2015; Hanchey, 2022). Very nearly without exception, this is true of all of the movies thematised here. (One anomalous film is Shockwave, in which the geologist who understands the risk of cataclysm is a woman, and a mother: however, the film explicitly condemns the prioritisation of her professional life over caring for her daughter.) In most of these films, survival depends on physical prowess and technological skills and is consistently associated with rugged, straight white masculinity. With the exception of the main character in San Andreas – played by Dwayne Johnson, who self-identifies as Black and Samoan – all the hero figures in these movies are white, and very nearly all are men. While the more recent movies have secondary characters who are Black or Hispanic, a description of 2012's disasters remains applicable to them all: ‘a set of challenges that are prompt to testify, if not in the end, to naturalize the physical and moral pre-eminence of the white middle-class man’ (Courcoux, 2015: 135; see also Gergan et al., 2020).
In all these movies, ‘the physical and moral pre-eminence’ of the white man consists in risking death to ensure others will survive. Indeed, ensuring that human life will survive after the digitally-rendered apocalypse is what drives the narrative in all these films. Human life is pictured in these movies not only as the white male hero, therefore, but also as his white heterosexual family. ‘Heterosexuality is foregrounded in these films to such an extent that it develops an equivalent place within the narrative alongside the disaster itself’ (McKinnon, 2017: 506). Many of the male heroes are fathers and divorced or divorcing, but facing catastrophe allows them to see the error of their ways and to admit their continuing love for their (ex-)wife. Their reunion often happens as they work together to rescue their child(ren) from the disaster. The hero-father displays physical prowess and skill to overcome disaster, while both mending his marriage and fighting for his children also allow him to demonstrate emotion. This physicality and emotionality are precisely the qualities that the CDT imaginary repeatedly asserts its inhabitants and managers cannot embody.
So while these movies might explore ‘the end of the human subject and its replacement with nothing less than a superhuman subject who can survive the hardships of a dying world’ (Bjering, 2018: 830), that superhuman subject looks disconcertingly like the twin of the god-trick technocrat. The movies’ picturing of their physically dexterous, muscular masculine hero is a split projection generated by the anxieties of data-driven masculinity worried about being both too rational and insufficiently emotional. Instead, live-action filming of actors alongside their recognisability as straight, cis and white produces a figuration of human embodiment that guarantees the reproduction of a particular kind of corporeal, emotional, unquantifiable human life, fully aligned with white masculinity.
Conclusion
This paper has developed a feminist critique of the white masculinity enacted in two sites of digital visual culture: city digital twins and urban disaster movies. It has argued that as volumetric urban digital models are technically and discursively made, and as they are deployed in a range of imagery including that which aims to manage cities using digital data and to entertain with spectacular urban visual effects, they enact different-but-related forms of human life. Their shared sense that the city is in crisis generates a pair of human affiliates who offer two kinds of fixes: data-driven management; or digitally-enhanced heroism. The gendered and racialised corporealities of these fixe(r)s are particular. The CDT universalises a disembodied, rational, white masculine gaze that acts on cities, infrastructure and populations based on objective, and objectifying, data. There are hints that this affiliate is not entirely what he claims, however. Some of his pleasures do appear in discussions of CDTs, and at the margins of those discussions lurks the unavoidable but unmodellable other of this rational manager. This is the other which becomes visible in the Hollywood disaster movie. Disaster movies display a different kind of white masculinity in the crisis-ridden city, muscular and emotional, whose embodiment and feelings are entirely invested in reproducing his family as a means of surviving spectacular disaster. This displays, in another register, white hetero-patriarchal power in disastrous times.
There are complexities at play here that the paper has not sufficiently acknowledged in order to sustain its broader argument. It has not explored the details of CDT design, for example, nor how CDTs or disaster movies are experienced by their users, managers, participants and spectators. Certainly, manipulating a CDT and watching a disaster movie are not entirely similar. The CDT manager is actively managing the space he sees, while the spectator of a disaster movie pleasurably succumbs to its thrills, swept up and into its spectacular scenarios. This is work that needs to be done, particularly if the diverse inventions of contemporary urban digital visual culture are to be acknowledged (Rose, 2017).
Nevertheless, this paper offers a necessary form of critique. Arguing that the CDT manager and disaster movie hero must be understood in relation to each other is the paper's key critical manoeuvre. If the disaster movie hero displays the human life that is the other of the CDT user, the specificity of the human affiliates of the CDT becomes visible. No longer the view of everywhere from nowhere, invisibilised and universalised by its imaginary framing as simply the most realistic way of seeing cities, the gaze at the CDT becomes situated as particular. It abstracts, it rationalises, it calculates, it controls. All this becomes visible as the disaster movie hero feels, struggles, invents and survives. Both have agency in the digitally-visualised city; and the crushingly limited alternative they offer between brains or muscles – mind or body – makes the specificity of white masculinist ways of seeing and their violently limited ways of engaging with the world recognisable.
The paper is therefore also an argument for the necessity of work on the imaginaries, fantasies and discourses which profoundly shape contemporary approaches to digitalised urban management. Such work complements other scholarship that focusses on the political economy of platform urbanism as well as emerging work on AI in cities (e.g. Cugurullo et al., 2024). However, it extends that scholarship in its insistence that these technologies are not simply exploitive and extractive digital devices, though they are also that. They are also configured through fantasies about and promises of specific forms of human life. ‘Computational images are not pictures of the things they represent; they are pictures of the world that produced them, and they execute a theory of that world in the world’ (Gaboury, 2021: 9). The theorisation of the world pictured by CDTs and disaster movies might be summarised as follows: one way or the other, by data or by muscle, the white dudes will save the city. Making such fantastical, distributed theorisations recognisable is a critical necessity as volumetric digital mediations of the world proceed apace.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
