
Abstract

This study evaluates the spatial behavior of an intermodal transportation hub in Carapungo, one of the densest neighborhoods in Quito, Ecuador. This public infrastructure is deficient and lacks adequate equipment for the people who use, occupy, and transit within and around it, as well as for the numerous activities that occur, particularly at Carapungo’s Entry Park. Traditional methods for analyzing urban dynamics and land use are typically rigid and fail to grasp the complex and nonlinear nature of public spaces, especially in informal Global South cities. However, recent advancements in Artificial Intelligence and Machine Learning, combined with aerial drone videos, have enabled the modeling and prediction of urban dynamics beyond state regulations and formal planning. In this context, we developed a model using Computer Vision Technology and the YOLOv5 algorithm, incorporating Deep Learning training. The objective is twofold: firstly, to detect people, their movement and speed; and secondly, to produce “Occupancy” and “Count & Speed” cartographies that highlight commuters’ spatial patterns. These situated cartographies provide valuable insights into urban design, mobility, and interaction within the built environment of a conflicted public space. The generated data offer planners and policymakers quantitative spatial information to consider local practices and dynamics in urban planning, particularly in situations of informality and insufficient urban infrastructure.

Keywords

Artificial intelligence, unmanned aerial vehicles, image detection, urban occupation, YOLOv5, public space

Introduction

Carapungo as case study

Carapungo is a neighborhood of Calderon parish located on the northernmost border of Quito (Ecuador), inhabited mostly by low-income population sectors. In an area of 79.17 km² (Municipio de Quito, 2021), the parish has expanded from over 150,000 inhabitants in 2010 to almost 190,000 inhabitants in 2022 according to INEC’s (2022) projections. Located at 2610 m.a.s.l., Carapungo is the main northern entrance to Quito, through which the Pan-American Highway has passed since 1971, becoming a reference point for urban development and expansion. The construction of this highway coincided with the oil boom in Ecuador, which brought a significant flow of capital for construction, including the development of social housing in areas like Carapungo. The Pan-American Highway divides the district into north and south sides, but it is also the main infrastructure connecting Colombia and Peru. Therefore, the vehicular traffic is high and constant, with the continuous presence of trucks, buses, trailers, and vehicles. On a normal weekday, three of 10 inhabitants in Quito use private vehicles to move around the city, while in Calderon, about 110,000 people use public transport to reach the city center or to commute to other circuits (Definición de La Estrategia De Comunicación a Operadores y Usuarios Sobre La Implementación Del Sistema Integrado De Transporte y La Tarifa Integrada De Transporte En El Distrito Metropolitano De Quito, 2021; Transporte de pasajeros, 2015; Secretaría de Movilidad, 2017).

This situation is complex, chaotic, and insecure on a daily basis: commuters, informal vendors, buses, taxis, and alternative (informal) means of transport converge in high numbers on the Pan-American Highway, particularly at Carapungo’s Entry Park (Medina Gavilanes and Cano-Ciborro, 2022). Because of the informal urban character of the area, Carapungo has no formal intermodal transportation hub, nor even signs or panels that indicate the location, the circuit, or the arrival time of public transportation, an absence that is most evident during rush hours. Instead, a blurred word, “BUS,” written on the asphalt serves as a vague indicator. Along the sidewalk, people wait for buses in a long and winding line, without necessarily forming a proper queue. This urban scene is characterized by a bustling movement of vehicles, individuals, and goods. Buses sporadically halt in the middle of the highway, prompting people to run after them or even cross the highway in front of moving vehicles. Meanwhile, both formal and informal taxi drivers seek potential clients among those who missed the bus. Amidst this commotion, vendors attempt to entice passers-by with food or supplies (Figure 1).

Figure 1.

Left: Situation of Carapungo district and Pan-American Highway in Quito. Top Center: Cartography of Carapungo entry park emphasizing informal BUS lane. Bottom Center: Carapungo platform and bridge. Top Right: Drone view showing people running to catch three separate buses. Bottom Right: Aerial view of people struggling to enter through both the bus exit and entrance zones, which is prohibited. Authors, 2022–23.

The Carapungo case study is not just an eloquent example of small-scale urbanism or subaltern urbanism (Roy, 2011); it also shows how pertinent novel and contemporary tools are for making visible, analyzing, and seeking soft design¹ solutions in such complex urban contexts. If urbanism was mainly understood and represented in the 20th century through platonic forms and objects, the situation in Carapungo broadens those conceptions in favor of more bodily, dynamic, informal, and unexpected approaches, with wide repercussions for the urban fabric but also for people’s mobility and use of public space at the small scale. In such conditions, we propose this AI approach as an analytic, predictive, and decision-making tool to recognize and identify patterns of movement, occupation, and spatial relationships developed by passers-by, commuters, and traders.

State-of-the-art technologies in urban contexts: UAV and AI vision tools

Over the past decade, studies on city efficiency and urban flow have increasingly employed AI methods, utilizing recordings and images from both pedestrian and vehicular perspectives (Alahi et al., 2014). Our focus centers on people detection in urban scenarios, leveraging advanced algorithms like SSD (Liu et al., 2016) and YOLO, short for You Only Look Once (Redmon et al., 2016), which can detect people with high accuracy and speed compared to semantic segmentation methods. Nevertheless, effective spatial analysis demands systems that incorporate plan representation (Pacheco Junior and Costa do Amaral, 2021). While satellite images are valuable for large-scale urban planning, decision-making on a smaller scale demands a deeper understanding of human actions, interactions, and dynamic object occupation. In this context, drone recordings, as part of the field of remote sensing through Unmanned Aerial Vehicles (UAVs), have proven to be a precise alternative (Craye and Ardjoune, 2019; Tzelepi and Tefas, 2017), though they require specialized datasets and training models (ICG - DroneDataset, 2019).

Unlike satellite images, drone views often require monitoring, quantification, and individual detection rather than focusing on class proportions. YOLO object detection on UAV-captured video effectively addresses this, showing utility at micro-urban scales, such as people detection in crowded scenes (Gouiaa et al., 2021), monitoring physical exercises (Park and Ewing, 2017), mapping behavior in parks (Park, 2020), rescue operations (Półka et al., 2018), detection of illegal activities in urban areas (Tan et al., 2018), active target search in urban environments (Geyer, 2008), and even the formulation of public policies (Butilă and Boboc, 2022). More recently, Hosseini et al. (2023) used aerial imagery to map sidewalks, aiming to understand pedestrian behavior and improve urban streetscapes’ friendliness.

Furthermore, Deep Learning (DL) and Computer Vision Techniques (CVT) have been applied to quantify passenger movements in urban environments using overhead cameras in train stations (Velastin et al., 2020). This study accentuated the deficiency in widely available datasets capturing aerial images of people, thus underlining the persistent challenge of detecting individuals from aerial perspectives. In their comprehensive review, Xiao et al. (2021) delved into recent strides in DL for pedestrian detection, with a predominant focus on footage from vehicle cameras. While their analysis underscored the efficacy of modern DL models, they highlighted the ongoing need for enhancements in datasets and image annotations.

This underscores a persistent gap in the analysis of aerial imagery, which serves as a critical information source for urban planning. Moreover, there is a notable absence of studies focusing on human scale and human behaviors in contested urban contexts within South America and the Global South—where such studies could fully realize their potential and utility due to informal urbanism. Often, research in this domain prioritizes larger-scale scenarios for the development of road, green, or blue infrastructure policies, neglecting the scale of the human body.

Specific goal and methodology

We aim to make visible the informal occupations, unpredictable displacements, and frugal relations in the public space of Carapungo’s Entry Park. To achieve this goal, we propose a methodology to automatically localize, identify, and visualize the following features: people detection, people counting and speed representation, flux quantification, area of occupation, and spatial relationships. By harnessing AI vision tools, we will create a variety of situated cartographies, which will capture both occupancy patterns (“Occupancy Cartographies”) and user count and speed tracking (“Count & Speed Cartographies”). These cartographies will be automated to provide insights to designers because they show areas with different occupational characteristics or current fluxes and uses. Overlapping these maps shows designers, at a glance, a range of opportunities that reflect diversity and respect.

The methodology is based on a range of advanced DL algorithms. These include the Fully Convolutional Network (FCN), which converts an input image into a pixel-level segmentation (Long et al., 2015); SegNet, which is built on an encoder-decoder architecture and employs a “skip-layer” to combine the encoder’s spatial and semantic information (Badrinarayanan et al., 2017); and U-Net, a popular biomedical image segmentation architecture (Ronneberger et al., 2015). While we share some similarities with the FCN, SegNet, and U-Net architectures mentioned above, we have customized our networks to address the specific challenges of our project more effectively, such as object detection and semantic segmentation for small informal urban contexts. Once the objects (pedestrians) have been identified, a distance-based approach is used to identify individual paths for each person. After the paths have been identified, flows and velocities can be calculated. The following section provides a detailed explanation of the entire process.

Data and methods: UAV-sourced data, computer vision techniques, and deep learning

Data obtained by the drone flight

For this experiment, we used high-accuracy UAV images obtained from drone-based aerial recordings. The drone, a DJI model, was flown over the border of the Pan-American Highway and Carapungo Entry Park on February 9, 2022, between 6:25 and 8:21 a.m. The UAV held an altitude of 30 m during the entire recording, covering an area of 70 × 30 m at 4K resolution. We recorded 9 aerial videos of approximately 60 s each, and 6 videos showing the environment. For this article, we selected one aerial video that presents the spatial tensions among the sidewalk that serves as a bus stop, people trying to get on and off a bus, passers-by trying to avoid the current of passengers, and buses that pass and stop between the highway lanes rather than in the designated stopping area. These situations unfold within 60 s, portraying the complexities of such a space, which can be replicated and extrapolated to other similar situations.

For this project, we selected a one-minute video that captured the complexity of the area, notably highlighted by the arrival of several buses during rush hour. To process the data, we extracted 1452 frames, at a frame rate of 24 fps, from the original one-minute video (60.5 s). For the background removal process, we reserved 252 frames to train the background subtractor. The remaining 1200 frames, equivalent to 50 s, were utilized for people counting and velocity calculation. The original one-minute full-size aerial video (4096 × 2160) was also cropped into a modified version (4096 × 1640). This choice was made because the Pan-American Highway at this location features twelve traffic lanes, including high-speed lanes, which posed a challenge for the machine learning model in accurately recognizing people’s movements.
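The frame bookkeeping reported above can be checked with a few lines of arithmetic. This is only a sketch that restates the published figures; the variable names are illustrative.

```python
# Frame bookkeeping for the selected clip, using the figures reported in the text.
FPS = 24                 # frame rate of the extracted frames
TOTAL_FRAMES = 1452      # frames extracted from the one-minute video
BACKGROUND_FRAMES = 252  # frames reserved to train the background subtractor

analysis_frames = TOTAL_FRAMES - BACKGROUND_FRAMES  # frames left for counting and speed
clip_seconds = TOTAL_FRAMES / FPS                   # duration of the full clip
analysis_seconds = analysis_frames / FPS            # duration actually analyzed

print(analysis_frames, clip_seconds, analysis_seconds)
```

The three values agree with the 1200 frames, 60.5 s, and 50 s stated in the paragraph.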

In this one-minute drone video, captured at an altitude of 30 m, we employed two methods: Computer Vision Techniques (CVT) and a Deep Learning (DL) model based on YOLOv5 training. This choice was due to CVT being more efficient in counting and tracking moving individuals, while YOLO yielded better and more accurate results in detecting people when they were in stationary groups, thus, it was used for spatial occupancy analysis. In the following section, both methods will be outlined, utilizing two pipelines to illustrate the process of segmenting the moving actors in the urban scene, identifying them, and ultimately quantifying and visually representing them (Figure 2).

Figure 2.

“Computer Vision” and “Deep Learning” techniques from video to shape detection and data analysis. Authors, 2024.

Computer vision techniques

CVT were employed for individual identification in the video. The extraction of moving object shapes within video frames enabled real-time motion capture and analysis, eliminating the need for predictive model training (Bramich, 2008; Singla, 2014). This process was conducted using the OpenCV library.

First, a Gaussian blur filter was applied to optimize background subtractor performance. Subsequently, background subtraction yielded a binary image displaying only the moving objects. As has been shown in the study of Thakur et al. (2015), morphological operations can improve the segmentation of moving agents. Erosion reduced noise by shrinking foreground object boundaries, while dilation filled gaps and connected fragmented regions (Aslani and Mahdavi-Nasab, 2013).
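To illustrate what erosion and dilation do to a foreground mask, the following is a minimal pure-Python sketch on a toy binary grid. It is not the authors' code: the study used the OpenCV library, where `cv2.erode` and `cv2.dilate` would normally perform this step. The 3 × 3 structuring element and the treatment of the image border as background are assumptions made for the example.

```python
# Pure-Python sketch of erosion and dilation on a binary mask (1 = moving pixel).
# Assumed: a 3x3 square structuring element; pixels outside the image count as 0.

def _window(mask, r, c):
    """Values of the 3x3 window centred on (r, c); out-of-bounds cells count as 0."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
    ]

def erode(mask):
    """Keep a pixel only if its whole 3x3 window is foreground (removes specks)."""
    return [[int(all(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def dilate(mask):
    """Set a pixel if any pixel in its 3x3 window is foreground (fills gaps)."""
    return [[int(any(_window(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

# Toy mask: a 3x3 blob (a "person") plus one isolated noise pixel at the top right.
mask = [
    [0, 0, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = dilate(erode(mask))  # erosion followed by dilation
```

In this toy case the isolated pixel disappears while the 3 × 3 blob is restored to its original extent, which is the noise-reduction behavior described above.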

We utilized a contour detector to outline shapes with bounding rectangles, which enclose shapes with minimal additional space. By imposing specific size criteria on these rectangles, we isolated human shapes while excluding vehicles. Tracking humans involved computing the centroid of each retained rectangle, serving as a central reference point for individual locations. Finally, we stored the collected positional data for subsequent analysis of people’s behavior within the designated area.
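The size filter and centroid step can be sketched as follows. The rectangle format `(x, y, w, h)` matches what a contour bounding-box step typically returns, but the size limits `MIN_SIDE` and `MAX_SIDE` are hypothetical values chosen for illustration; the thresholds actually used are not reported.

```python
# Sketch: keep person-sized bounding rectangles and reduce each to its centroid.
# Rectangles are (x, y, w, h) in pixels.
# MIN_SIDE and MAX_SIDE are hypothetical limits, not values from the study.
MIN_SIDE, MAX_SIDE = 8, 60

def is_person_sized(rect):
    """True when both sides of the rectangle fall inside the assumed size range."""
    _, _, w, h = rect
    return MIN_SIDE <= w <= MAX_SIDE and MIN_SIDE <= h <= MAX_SIDE

def centroid(rect):
    """Centre point of the rectangle, used as the person's location."""
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)

# One person-sized shape, one bus-sized shape, one speck of noise.
rects = [(100, 200, 20, 30), (400, 50, 300, 120), (10, 10, 3, 2)]
people = [centroid(r) for r in rects if is_person_sized(r)]
```

Only the first rectangle passes the filter, so `people` holds a single centroid.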

Deep learning model

The experiment centered on processing and representing moving actors has various merits for activity measurement and circulation prediction. However, challenges arose in distinguishing between occupancy roles and accurately quantifying stationary individuals, who were often confused with the background surroundings. To address these challenges, detection mechanisms based on DL were adopted to identify both moving and static individuals. Consequently, a parallel approach was taken, utilizing the YOLOv5 Deep Learning model (Zhu et al., 2021; Ultralytics/Yolov5, 2020/2022).

We manually labeled specific frames using the YOLO format in Label Studio (Label Studio, 2023), resulting in a dataset of 523 images, each containing 1 to 20 individuals. Additionally, a validation set of 93 images with corresponding labels was prepared for model evaluation (Mokayed et al., 2021).

The images were standardized to a default width of 640 pixels. Data augmentation, including mirroring techniques, quadrupled the dataset to enhance the model’s generalization capabilities. A batch size of 16 was employed during training. The model achieved optimal loss convergence at epoch 200, with precision and recall stabilizing after epochs 50 and 100, respectively.
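As an illustration of mirroring applied to YOLO-format annotations, the sketch below flips normalized label coordinates. YOLO labels store `class x_center y_center width height` with coordinates normalized to [0, 1], so a horizontal mirror maps `x_center` to `1 - x_center`. Producing exactly four variants per image (original, horizontal flip, vertical flip, both) is an assumption about how the dataset was quadrupled; the text only states that mirroring was used. The corresponding image pixels would have to be flipped in the same way.

```python
# Sketch: mirror YOLO-format labels (class, x_center, y_center, width, height),
# all coordinates normalised to [0, 1].
# Assumed: four variants per image (original, horizontal, vertical, both flips).

def mirror_label(label, horizontal=False, vertical=False):
    """Return the label as it would appear after flipping the image."""
    cls, xc, yc, w, h = label
    if horizontal:
        xc = 1.0 - xc
    if vertical:
        yc = 1.0 - yc
    return (cls, xc, yc, w, h)

def augment(labels):
    """Return the four mirrored variants of one image's label list."""
    return [
        [mirror_label(lab, hz, vt) for lab in labels]
        for hz in (False, True)
        for vt in (False, True)
    ]

labels = [(0, 0.25, 0.25, 0.125, 0.25)]  # one person, class 0
variants = augment(labels)
```

Box width and height are unchanged by mirroring; only the centre coordinates move.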

We emphasize the metric “val/box_loss,” which offers insights into how effectively the object detection model can align bounding boxes with real objects in the validation images. Notably, “val/box_loss” decreased from 0.101 to 0.017, signifying improved object localization accuracy (Table 1).

Table 1.

val/box_loss metric values across epochs.

Post-training, the model accurately detected individuals in videos, providing bounding boxes with centroid coordinates for seamless integration with CVT for further analysis and tracking.

Discerning distinct paths of individuals in the video

Both CVT and DL provided identical formats for determining people’s positions: the centroid of the bounding box. Leveraging this data, our objective is to track individuals’ positions throughout the video. Initially, we calculated the distances between a single point in the current frame and all points in the subsequent frame using the Euclidean distance:

d = \sqrt{{(x_{2} - x_{1})}^{2} + {(y_{2} - y_{1})}^{2}}

In each frame, we compared the positions of points (x1, y1) and (x2, y2) to find the closest pair. If their distance was below a predefined threshold, we linked them, assuming they represented the same person, and assigned a unique identifier (ID). This process continued for all points in each frame, allowing us to track individuals consistently throughout the video sequence.
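The linking procedure described above can be sketched as a greedy nearest-point matcher. This is a simplified illustration rather than the authors' implementation: the threshold `MAX_JUMP` is a hypothetical value, and each point is matched greedily in the order it appears.

```python
import math
from itertools import count

# Sketch of distance-based linking between consecutive frames.
# Each frame is a list of centroids (x, y) in pixels.
# MAX_JUMP is a hypothetical threshold; the value used in the study is not reported.
MAX_JUMP = 25.0

def link_frames(frames, max_jump=MAX_JUMP):
    """Return one dict per frame mapping track ID -> centroid."""
    next_id = count(1)
    tracks = []      # result: one {id: (x, y)} dict per frame
    previous = {}    # IDs and positions from the previous frame
    for points in frames:
        current = {}
        unmatched = dict(previous)
        for p in points:
            # Closest still-unmatched point from the previous frame, if any.
            best = min(unmatched.items(),
                       key=lambda item: math.dist(item[1], p),
                       default=None)
            if best is not None and math.dist(best[1], p) < max_jump:
                current[best[0]] = p          # same person: keep the ID
                del unmatched[best[0]]
            else:
                current[next(next_id)] = p    # new person enters the scene
        tracks.append(current)
        previous = current
    return tracks

frames = [
    [(10.0, 10.0), (200.0, 50.0)],
    [(14.0, 13.0), (205.0, 50.0)],
    [(18.0, 16.0), (210.0, 50.0), (400.0, 300.0)],
]
tracks = link_frames(frames)
```

In this toy sequence the first two people keep IDs 1 and 2 across all frames, and the person who appears in the last frame receives a new ID.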

Situated cartography: Contextualizing Carapungo entry park

As a link between the method and the results, we produce a foundational cartography that serves to situate the research, which we term “situated cartography.” Building on the concept of “situated knowledge” (Haraway, 2013), we stress the significance of understanding the spatial context to attain a profound, realistic, and “situated” understanding of the dynamics of Carapungo Entry Park, which is divided into the following regions: Bus Stop L (left), Bus Stop R (right), Ramp Entrance, Cantina, Bridge, Platform, and Building Edge.

We chose an unsupervised learning method based on proximity and two-dimensional distances in X and Y coordinates, that is, the Euclidean distance. This method, known as “Nearest Neighbor” (Gutin et al., 2002), associates each individual with the subject at the shortest Euclidean distance from it. It is a recursive solution that makes it possible to identify different relationship schemes and areas of occupation, as well as the interaction trend between people.
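A minimal sketch of the nearest-neighbor association follows: every detected person in a frame is linked to the closest other person. The coordinates are invented for illustration, and the function name is hypothetical.

```python
import math

# Sketch: link every detected person to the closest other person in the same frame.
def nearest_neighbour_links(points):
    """Return {index: index of nearest other point} using Euclidean distance."""
    links = {}
    for i, p in enumerate(points):
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        if others:
            links[i] = min(others)[1]   # smallest distance wins
    return links

# Two pairs of people standing far apart from each other.
people = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.5)]
links = nearest_neighbour_links(people)
```

Here the links form two separate pairs, which is the kind of grouping the relational axes in the cartographies make visible.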

It is considered that, beyond architectural barriers (buildings, roads, lakes, etc.), the space occupied by each individual encompasses the maximum possible territory up to the space occupied by another individual and in the same proportion. Thus, the border that separates the territory occupied by a subject lies at half the distance that separates it from the closest subject. Following a Voronoi scheme, joining the subjects together and drawing the perpendicular bisectors of the joining segments forms a series of polygons in two-dimensional space, known as Thiessen polygons or Voronoi diagrams (Aurenhammer and Klein, 2000).
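The half-distance rule can be illustrated with a rasterized Voronoi partition, in which every grid cell is assigned to the closest person. This is a sketch under simplifying assumptions (a small integer grid, no architectural barriers, ties given to the lower index); exact polygons could instead be computed with a computational-geometry library such as `scipy.spatial.Voronoi`.

```python
import math

# Sketch: rasterised Voronoi partition. Every grid cell is assigned to the closest
# person, so the border between two people falls on the perpendicular bisector of
# the segment joining them. Grid size and coordinates are illustrative.
def voronoi_labels(people, width, height):
    """Return a height x width grid of indices into `people` (ties -> lower index)."""
    return [
        [min(range(len(people)), key=lambda i: math.dist(people[i], (x, y)))
         for x in range(width)]
        for y in range(height)
    ]

people = [(1.0, 2.0), (7.0, 2.0)]
labels = voronoi_labels(people, width=9, height=5)

# Territory size per person, in grid cells.
area = [sum(row.count(i) for row in labels) for i in range(len(people))]
```

With two people at x = 1 and x = 7, the border falls at x = 4, halfway between them.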

Cumulatively between each frame, and ignoring the territories established by minimum distances (subjects compared to themselves between frames or compared to other subjects that travel the same space), the algorithm applied to the detection model yields relevant planimetries that expose the trends of occupation of human groups (Figure 3).

Figure 3.

Top: Evolving occupation frontiers generated by Voronoi algorithmic people detection (pedestrian movement shown in yellow). Bottom: Highlighted areas of detection (hence, different design strategies should be considered in each area). Authors, 2024.

Results: Mapping occupancy and speed dynamics

With the application of CVT and DL methods, we create situated cartographies in Carapungo’s Entry Park to make visible the socio-spatial dynamics in a graphic, effective, and precise way. These cartographies not only detect human bodies in a complex informal public space but also identify their most common relationships and potential interactions in small-scale environments.

The results are divided into “Occupancy Cartographies,” where occupancy patterns are analyzed, and “Count & Speed Cartographies,” where user count and speed are tracked.

Occupancy cartographies

The situated cartography makes it possible to identify when an individual ceases to belong to one environment and begins to belong to another, based on the location of other subjects, and allows the identification of inflection or breaking points in the same track. As illustrated in Figure 4, a person with a comparable Euclidean distance to different groups ceases to belong to the platform environment and begins to be associated with the bus stop environment. This individual occupies distinct territories during different temporal windows (frames 15 and 16).

Figure 4.

Cartographies in relation to occupancy with the built environment. Authors, 2024.

Those areas untouched by relational axes indicate environments with a limited tendency for people interaction and, consequently, a low level of occupation (Figure 5). This is especially notable as Carapungo Entry Park is a transit area where individuals consistently seek the quickest or most visually accessible route towards the Pan-American Highway to check for their bus’s arrival. Acknowledging this observation, these spaces stand out as potential candidates for future projects. The minimal human activity and interaction in these zones suggest the prospect of introducing new initiatives or developments, offering opportunities for strategic planning and the optimal utilization of these currently underutilized spaces.

Figure 5.

Agglomerative models based on pedestrian nearest neighbor links show four low-occupational-rate areas (design opportunities). Authors, 2024.

Population counts & speed cartographies

We have made visible the “fluctuation in people count” and the “variations in their velocity” throughout the video in relation to the specific location through three different graphs, with the aim of critically comparing their strengths and weaknesses. The visualization methods used to present this information are as follows: “Timeline Graph,” “Strip-Diamonds Graph,” and “Stacked Area Graph” (Figure 6).

Figure 6.

“Timeline,” “Strip-Diamonds,” and “Stacked Area” graphs illustrating population count and speed of movements. Authors, 2024.
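The per-region counts that feed such graphs can be sketched as a point-in-region tally for each frame. The rectangles below are hypothetical placeholders in pixel coordinates; the actual region outlines come from the situated cartography and are not given numerically in the text.

```python
from collections import Counter

# Sketch: count people per region in one frame.
# REGIONS holds hypothetical rectangles (x_min, y_min, x_max, y_max) in pixels;
# they are placeholders, not the region boundaries used in the study.
REGIONS = {
    "Bus Stop L": (0, 1200, 2000, 1640),
    "Bus Stop R": (2000, 1200, 4096, 1640),
    "Platform": (1000, 400, 3000, 1200),
}

def region_of(point):
    """Name of the first region containing the point, or None if outside all."""
    x, y = point
    for name, (x0, y0, x1, y1) in REGIONS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None

def count_by_region(points):
    """Tally how many centroids fall inside each region."""
    return Counter(r for r in map(region_of, points) if r is not None)

frame_points = [(500, 1300), (2500, 1500), (2600, 1400), (1500, 800), (50, 50)]
counts = count_by_region(frame_points)
```

Repeating this tally for every frame yields the count-over-time series that a timeline or stacked-area plot would display.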

In the three graphs, we can see how Bus Stop R, Bus Stop L, and Platform are the regions that accommodate the highest number of people, evidencing how many people cross the bridge to reach the bus area and how people move incessantly along the entire Pan-American Highway without knowing exactly where their bus will stop. While the “Timeline Graph” shows a higher and constant number of people at both bus stops, both the “Strip-Diamonds” and the “Stacked Area Graph” emphasize that there is more population at the platform and at Bus Stop R. It is worth noting that the diamond-shaped symbols represent the average population count for each distinct region. Furthermore, there is no doubt that the region with the fewest people is the area near the cantina. However, the “Stacked Area Graph” depicts a significant crowd gathered at the cantina early in the video, suggesting that this diagram may have limited relevance for our objectives.

In terms of speed, we can observe that the graphs are even less similar, allowing us to analyze and compare which one is the most useful. In the “Timeline Graph,” we observe a constant movement in both bus regions, emphasizing the restless anticipation of people awaiting the bus’s arrival. We also see spikes in speed over time in the other regions, illustrating how people rush through those areas punctually to catch the bus. The bridge area and the cantina stand out, with the latter being unrepresented in terms of the number of people but now noticeable as some individuals rush through there towards the end of the video to catch their bus on time. In the “Strip-Diamonds” graph, we notice that it extrapolates the results and lacks the precision of the previous one. However, by looking at the diamond, it shows that the mean speed remains relatively consistent across all regions, except for the Cantina, where a more relaxed pace is observed. Similarly, the “Stacked Area Graph” provides extrapolated values, with the Entry Ramp area taking center stage, despite its almost negligible presence in the “Timeline Graph.”

In conclusion, the “Timeline Graph” offers the most specific and precise information for counting people and measuring speed. It allows us to pinpoint specific moments of activity, such as individuals rushing to catch the bus, rather than relying on extrapolations that may distort reality, as seen in the “Stacked Area Graph.”

In our final analysis, we develop a heat map designed to visually represent smoothed speed values within the zone captured in the video (Figure 7). In this cartography, the color scheme serves as a visual indicator: cooler colors, such as blues, represent lower speeds, while warmer colors, like reds and oranges, signify higher speeds. This heat map allows us to discern notable patterns in population movement. Particularly, we observe that the Bus Stop R region appears as the hottest spot on the map. This suggests a significant concentration of high-speed movement within the area.
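One way to obtain speed values for such a map is to convert each tracked displacement into metres per second and average the results per grid cell. The sketch below makes several assumptions that do not come from the study: the ground scale is approximated as 70 m across 4096 px, the cell size is 256 px, and no smoothing is applied.

```python
import math
from collections import defaultdict

# Sketch: convert tracked displacements to speeds and average them per grid cell.
# Assumed (not from the study): ground scale of 70 m across 4096 px,
# a 256 px cell size, and no smoothing step.
FPS = 24
METRES_PER_PIXEL = 70 / 4096
CELL = 256

def speed_grid(track):
    """track: (x, y) pixel centroids of one person in consecutive frames.
    Returns {(col, row): mean speed in m/s} for the cells the person entered."""
    samples = defaultdict(list)
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        metres = math.dist((x0, y0), (x1, y1)) * METRES_PER_PIXEL
        cell = (int(x1 // CELL), int(y1 // CELL))
        samples[cell].append(metres * FPS)   # metres per frame -> metres per second
    return {cell: sum(v) / len(v) for cell, v in samples.items()}

# A person moving 5 px per frame along a straight line.
track = [(100.0, 100.0), (103.0, 104.0), (106.0, 108.0)]
grid = speed_grid(track)
```

Under these assumptions, a displacement of 5 px per frame corresponds to roughly 2 m/s; mapping the per-cell means to a blue-to-red color scale would give a heat map of the kind described.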

Figure 7.

Heat Map of population speed in the video. Authors, 2024.

Discussion

The use of AI in the field of urban planning and urban design provides us with new models and tools to visualize situations and patterns in a more immediate way. Furthermore, this information can be used as an important parameter to consider in urban design, planning policy, and decision-making. Our case study shows a prevalent urban scenario in Global South cities, characterized by insufficient planning, outdated infrastructure, and pervasive informality. In this work, we explore two main topics: AI-generated cartographies and data production, aiming to inform future urban projects addressing informality and inadequate public infrastructure. The discussion of these results focuses on visible parameters to evaluate spatial implications.

Occupancy and count & speed cartographies: Making visible situations and patterns

The “Occupancy” and “Count & Speed” cartographies reveal varying interaction levels on the sidewalk, a makeshift public space at an intermodal bus hub. However, the sidewalk inadequately handles the load of people and transportation services. This urban element becomes a crowded, chaotic, and dangerous location for pedestrians, commuters, and drivers. On a sidewalk 2 m wide and 11 m long, parallel to the Pan-American Highway, 45 people trying to catch a bus within 30 s represent the complexity and danger of a precarious urban design that does not adapt to the current inhabitants’ needs. Moreover, this situation is quotidian: it happens every working day and lasts for the duration of the peak hour. Hence, the repetitiveness of this action directly affects the neighborhood.

The complexity of this scene may produce a feeling of insecurity among users, which can lead to a decrease in overall mobility. As for the spatial conditions of the area, Carapungo Entry Park is intended as a place where people gather and interact, though this regular conflict erodes social cohesion within the community. Because this clashing dynamic is so common, it has also brought negative perceptions of security and mobility to the area, further marginalizing the neighborhood. The data and cartographies underscore the imperative to address this hazardous congestion by either designing a bus station capable of accommodating the substantial flow or implementing a soft design approach to enhance the signage system. Additionally, the cartography identifies a potential waiting area within the park that could improve pedestrian flow towards buses, taxis, and private cars.

These cartographies indicate that the most used and occupied areas are the smallest in the framed space and, simultaneously, reveal the extent of the remaining spaces that are underused. This model provides a quantitative delineation of several relationships that are invisible to the human eye, as they are not physically constructed. Additionally, the cartographies capture a variety of micro-occupancies in a complex public space that would otherwise be difficult to record with simple first-person photographic or video analysis taken from ground level, which is one of the most common methods for in-situ urban analysis.

“Occupancy” and “Count & Speed” cartographies should be understood as predictive cartographies that summarize patterns of mobility in pedestrians, neighborhood-scale agglomerations, and the potential composition of amenities in such informal places.

Application and benefits of models: Infrastructural design through precise cartographic technique

Harnessing AI helps us understand how people navigate conflictive urban spaces, informing the design of more livable and efficient areas (Ibrahim et al., 2021), especially in informal public spaces. AI offers an automatic, pragmatic, fast, and objective method to map and count people in a specific urban area, under temporal and contested situations often overlooked by local governments, public institutions related to urban planning, design and mobility, and planners responsible for improving infrastructure and transportation issues. This understanding aids in setting policy priorities. However, challenges include the rendering time of DL models, access to powerful hardware, difficulty detecting people when they are not wearing bright clothing, when the ground colors are not cobbled, or if there is a lack of green surfaces, and the lack of precision of satellite images, making drone use essential if allowed. Drones can be used here as there are no restrictions yet.

This information can be used, firstly, to make visible the reality and its everyday problems and, secondly, to provide spatial evidence that can support a spatial solution to this problem (in this case, the lack of public infrastructure for mobility and public space) by using soft design (Alavi et al., 2019; Stojanovski et al., 2020). Additionally, the model could be used to identify areas where pedestrian infrastructure, such as crosswalks or sidewalks, may be deficient or unsafe relative to the intensity of flux, and to prioritize improvements in those areas through urban planning and policies (Ye et al., 2021; Yigitcanlar et al., 2021). We arrived at a tool that is relevant for possible project proposals, as it considers the particularity and identity of the locations, the use, area of occupancy, movement detection, movement representation, speed of pedestrians’ movements, flux, and the relationships among people and between people and the built environment.

Moreover, we have been able to assess people’s movement speed, which is a useful characteristic for distinguishing between places where people want to spend more time and places where they want to move quickly. With the ability to detect people, we can identify patterns of staying versus moving, as well as the type of movement, such as relaxed or rushed. By applying this approach to data gathered from people recognition technology, urban planners can make more informed and effective decisions, resulting in better urban development outcomes.

Conclusion: Unveiling the potential of AI in complex and overlooked public spaces

This paper has investigated the application of Artificial Intelligence to localize and depict patterns of behavior of the inhabitants of Carapungo in an informal bus station during the morning rush hour. We chose Carapungo’s Entry Park because its complexity, in terms of both spatial configuration (the most populated parish in Quito, the presence of the Pan-American Highway, the lack of adequate public infrastructure of mobility, commerce, public space) and social interactions (informal and formal practices, passengers and commuters waiting in a non-waiting area, people crossing the Pan-American Highway), demands that all these situations, which occur on a daily basis, be taken into account. The need to frame these circumstances led us to use AI tools to gain a wider perspective of this conflictive area.

This approach has been possible thanks to several drone recordings, which significantly reduce mapping time and provide an accurate visual depiction of people’s positions, relationships, and quantities in this small-scale urban situation. For our model development, we carefully selected a one-minute drone video captured at a 30-m altitude. Employing both CVT and a DL model based on YOLOv5 training, our dual approach aimed to segment moving actors within the urban scene, identify their flows, and quantitatively represent them through various cartographies designed to transform complex data into easily understandable visualizations for urban designers and policymakers.
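As a rough illustration of how per-frame detections could be turned into an occupancy cartography, the sketch below accumulates bounding-box centres into a coarse grid. It is an assumption-laden example rather than the pipeline described in this paper: the `(x1, y1, x2, y2)` pixel box format, the `occupancy_grid` name, and the cell size are hypothetical, and the detections are taken as given (for instance, exported from a person detector such as YOLOv5).

```python
from collections import Counter


def occupancy_grid(detections_per_frame, cell_px):
    """Count how often each grid cell is occupied across all frames.

    `detections_per_frame` is assumed to be a list with one entry per frame,
    each entry a list of person boxes (x1, y1, x2, y2) in pixels.
    `cell_px` is the side of a square grid cell in pixels.
    Returns a Counter mapping (column, row) -> number of detections.
    """
    counts = Counter()
    for boxes in detections_per_frame:
        for x1, y1, x2, y2 in boxes:
            # Use the box centre as the person's position in the image.
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            counts[(int(cx // cell_px), int(cy // cell_px))] += 1
    return counts


if __name__ == "__main__":
    frames = [
        [(0, 0, 10, 10)],
        [(2, 2, 12, 12), (100, 100, 110, 110)],
    ]
    for cell, n in sorted(occupancy_grid(frames, cell_px=50).items()):
        print(cell, n)
```

Cells with high counts correspond to places where people accumulate over the recording; rendering the counts as a heat map over the drone frame would give a simple occupancy map.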

AI holds transformative potential in urban design, as demonstrated by our Carapungo’s Entry Park case study. Our model enables planners and policymakers to evaluate resilience standards for informal public spaces, integrating local insights with experimental AI models. Through cartographies, our research explores relationships among people, the built environment, and movement, refining approaches to informal practices as urban design parameters. By visualizing these complex situations, the cartographies serve as clear documents that argue for the need for urban policies and planning processes. Additionally, it is possible to estimate population density and to analyze movement patterns and social interactions, which can suggest optimal layouts for public spaces. This methodology simulates various urban design scenarios to assess their impact on the community, the environment, and the overall functionality of the space in such conflictive and complex settings. The result is a deeper understanding that informs more effective planning strategies and facilitates the evaluation of urban plans for better solutions.

By exploring this and other ongoing case studies, we aim to present these findings to municipal authorities, fostering a connection between on-the-ground research, quantitative data, and the formulation of effective public space policies. A pivotal question arises: Can AI serve as a tool to enhance public spaces inclusively, or does it risk becoming another tool that inadvertently reinforces societal stratification? This inquiry is vital for ensuring that technological advancements contribute to equitable and accessible urban environments.

Footnotes

Acknowledgement

This article is the result of the project “Post-Public Space. Spatial practices based on informality” (ARQ.AMG.20.02), supported and funded by Universidad de Las Américas, Quito, Ecuador.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Universidad de Las Américas, Quito, Ecuador; ARQ.AMG.20.02.

ORCID iDs

Víctor Cano-Ciborro

Ana Medina

Daniel Diaz

Data availability statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Note

Victor Cano-Ciborro is an architect with a Master's and Ph.D. (cum laude and Outstanding Doctoral Thesis Prize 2021/2022) from the Polytechnic University of Madrid. His academic affiliations include CEPT University in Ahmedabad (India); Universidad de las Américas, Quito (Ecuador); ETSA Madrid; and the European University of the Canary Islands. He has been a pre-doctoral visiting researcher at the University of California, Berkeley, a postdoctoral researcher at The New School / UPM, and a Marie-Curie fellow at Brown University. His research focuses on counter-cartographies that unveil overlooked conflicts and resistances in contested territories through the agency of non-compliant bodies.

Ana Medina, Ph.D. Architect, is currently Founder and Director of the Master in Advanced Architectural Design, and Associate Professor at Universidad de Las Américas, Quito. She is also International Researcher at Keio University and at the Hypermedia research group in Madrid. Ana's research focuses on the critical examination of public space design strategies and practices in relation to privatization, informality, everyday human behavior and the creation of healthy spaces. Her work contributes to the field of architecture and urban design by combining methods such as artificial intelligence or geospatial analysis, along with ethnographic architecture or the production of dissident cartographies.

Alejandro Burgueño is a senior architect with more than 10 years of experience in urban planning, infrastructure and building projects under BIM-GIS digital management methodologies, virtual reality and parametric construction. He is also a data scientist with a master's degree in Artificial Intelligence and currently develops technological tools for LLYC in the area of innovation. During his career he has worked in renowned agencies in London, Marrakech, Shanghai and Madrid. Alejandro masters digital techniques focused on spatial processing, image and language. He is passionate about research on the Internet of Things and Artificial Intelligence.

Mario González-Rodríguez received the Ph.D. degree in computer science from the Autonomous University of Madrid (UAM) in 2012. He has been an Associate Professor with Universidad de las Américas, Quito, Ecuador, since 2016. His current research interests include artificial intelligence, complex systems, information processing using neural networks, the modeling of attractor networks for pattern retrieval, data analytics, and smart transportation systems.

Daniel Díaz Bedoya was born in Quito, Ecuador, in 1998. He obtained his B.Sc. degree in software engineering from Universidad de las Américas in 2021. Currently, he serves as a Research Assistant at the same institution, where he has been dedicated to exploring the fascinating realms of Artificial Intelligence, Computer Vision, Deep Learning, and Data Analytics.

María Rosa Zambrano holds a Ph.D. in Architecture (2023) and a Master in Analysis, History and Theory of Architecture (2015) from the Polytechnic University of Madrid, a Specialist degree in History of Art from Simon Bolivar Andean University (2013), and a BSc in Architecture from the Pontifical Catholic University of Ecuador (2011). She has been Assistant Professor at the School of Architecture, Universidad de las Americas, Quito, Ecuador, since 2015. Her research focuses on history of architecture, heritage, archives and culture media.

References

Alahi, Bierlaire, Vandergheynst (2014) Robust real-time pedestrians detection in urban environments with low-resolution cameras. Transportation Research Part C: Emerging Technologies 39: 113–128. DOI: 10.1016/j.trc.2013.11.019.

Alavi, Churchill, Wiberg, et al. (2019) Introduction to human-building interaction (HBI). ACM Transactions on Computer-Human Interaction 26(2): 1–10. DOI: 10.1145/3309714.

Aurenhammer, Klein (2000) Voronoi diagrams. In: Handbook of Computational Geometry. Amsterdam, The Netherlands: Elsevier, 201–290. DOI: 10.1016/B978-044482537-7/50006-1.

Badrinarayanan, Kendall, Cipolla (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12): 2481–2495. DOI: 10.1109/TPAMI.2016.2644615.

Bramich (2008) A new algorithm for difference image analysis. Monthly Notices of the Royal Astronomical Society: Letters 386(1): L77–L81. DOI: 10.1111/j.1745-3933.2008.00464.x.

Butilă, Boboc (2022) Urban traffic monitoring and analysis using unmanned aerial vehicles (UAVs): a systematic literature review. Remote Sensing 14(3): 620. DOI: 10.3390/rs14030620.

Craye, Ardjoune (2019) Spatio-temporal semantic segmentation for drone detection. In: 2019 16th IEEE international conference on advanced video and signal based surveillance (AVSS), Taipei, Taiwan, 18-21 September 2019, 1–5. DOI: 10.1109/AVSS.2019.8909854.

Definición de La Estrategia de Comunicación a Operadores y Usuarios Sobre La Implementación Del Sistema Integrado de Transporte y La Tarifa Integrada de Transporte En El Distrito Metropolitano de Quito (2021). Dirección Metropolitana de Políticas y Planeamiento de la Movilidad. Pub. L. No. EC-EPM METRO QUITO-208964. https://www.metrodequito.gob.ec/wp-content/uploads/TDR-DEFINICION-ESTRATEGIA-DE-COMUNICACION-SISTEMA-INTEGRADO-DE-TRANSPORTE.pdf

Geyer (2008) Active target search from UAVs in urban environments. In: IEEE international conference on robotics and automation, Pasadena, California, USA, 19-23 May 2008, 2366–2371. DOI: 10.1109/ROBOT.2008.4543567.

Gouiaa, Akhloufi, Shahbazi (2021) Advances in convolution neural networks based crowd counting and density estimation. Big Data and Cognitive Computing 5(4): 50. DOI: 10.3390/bdcc5040050.

Haraway (2013) Simians, Cyborgs, and Women. London, UK: Routledge.

Hosseini, Sevtsuk, Miranda, et al. (2023) Mapping the walk: a scalable computer vision approach for generating sidewalk network datasets from aerial imagery. Computers, Environment and Urban Systems 101: 101950. DOI: 10.1016/j.compenvurbsys.2023.101950.

Ibrahim, Haworth, Cheng (2021) URBAN-i: from urban scenes to mapping slums, transport modes, and pedestrians in cities using deep learning and computer vision. Environment and Planning B: Urban Analytics and City Science 48(1): 76–93. DOI: 10.1177/2399808319846517.

INEC (2022) Proyecciones Poblacionales Ecuador. Proyecciones Poblacionales.

Liu, Anguelov, Erhan, et al. (2016) SSD: Single Shot MultiBox Detector. Cham: Springer, 21–37.

Long, Shelhamer, Darrell (2015) Fully convolutional networks for semantic segmentation. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, 7 June 2015, 3431–3440. DOI: 10.1109/CVPR.2015.7298965.

Medina Gavilanes, Cano-Ciborro (2022) Cartographies of everyday conflicts in public spaces. Informal micro-activities on formal infrastructure. Carapungo Entry park, Quito. Revista INVI 37(106): 149–176. DOI: 10.5354/0718-8358.2022.67122.

Mokayed, Shivakumara, Woon, et al. (2021) A new DCT-PCM method for license plate number detection in drone images. Pattern Recognition Letters 148: 45–53. DOI: 10.1016/j.patrec.2021.05.002.

Municipio de Quito (2021) Administración Zonal Calderón. Quito. https://www.quito.gob.ec/administracion-zonales/administracion-calderon/

Pacheco Junior, Costa do Amaral (2021) Performance analysis in the segmentation of urban asphalted roads in RGB satellite images using K-Means++ and SegNet. Inteligencia Artificial 24(68): 89–103. DOI: 10.4114/intartif.vol24iss68pp89-103.

Park (2020) Park and neighborhood attributes associated with park use: an observational study using unmanned aerial vehicles. Environment and Behavior 52(5): 518–543. DOI: 10.1177/0013916518811418.

Park, Ewing (2017) The usability of unmanned aerial vehicles (UAVs) for measuring park-based physical activity. Landscape and Urban Planning 167: 157–164. DOI: 10.1016/j.landurbplan.2017.06.010.

Półka, Ptak, Kuziora, et al. (2018) The use of unmanned aerial vehicles by urban search and rescue groups. In: Drones - Applications. London, UK: InTech. DOI: 10.5772/intechopen.73320.

Redmon, Divvala, Girshick, et al. (2016) You only look once: unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, 8th June 2016, 779–788. DOI: 10.1109/CVPR.2016.91.

Ronneberger, Fischer, Brox (2015) U-net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science 9351: 234–241. DOI: 10.1007/978-3-319-24574-4_28.

Roy (2011) Slumdog cities: rethinking subaltern urbanism. International Journal of Urban and Regional Research 35(2): 223–238. DOI: 10.1111/j.1468-2427.2011.01051.x.

Secretaría de Movilidad Administración General Municipio de Quito (2017) Reestructuración de la Red de Transporte Público de Pasajeros del Distrito Metropolitano de Quito. Quito: Administración General Municipio del Distrito Metropolitano de Quito. Available at: https://www7.quito.gob.ec/mdmq_ordenanzas/Comisiones-del-Concejo/Movilidad/2017/2017-09-05/1.-Rutas-y-frecuencias/rutas-y-frecuencias/Informe-IV_-Reestructuracíon-de-la-Red-de-TP-del-DMQ_Resumen-ejecutivo.Agosto-1.pdf.

Singla (2014) Motion detection based on frame difference method. International Journal of Information & Computation Technology, 1559–1565. Available at: https://api.semanticscholar.org/CorpusID:16062556.

Stojanovski, Partanen, Samuels, et al. (2020) Viewpoint: city information modelling (CIM) and digitizing urban design practices. Built Environment 46(4): 637–646. DOI: 10.2148/benv.46.4.637.

Tan, Liu, Zhao, et al. (2018) UAV-image-based illegal activity detection for urban subway safety. In: Themistocleous, Hadjimitsis, Michaelides, et al. (eds) Sixth International conference on remote sensing and geoinformation of the environment (RSCy2018), Paphos, Cyprus, 6-29 March 2018, SPIE, 9. DOI: 10.1117/12.2323087.

Thakur, Pahwa, Gupta (2015) Brain tumor detection, segmentation using watershed segmentation and morphological operation. IJRET: International Journal of Research in Engineering and Technology. Available at: https://api.semanticscholar.org/CorpusID:88317235.

Transporte de pasajeros (2015) Nuevo servicio de transporte para Calderón beneficia a más de 110 mil personas. Pasajeros Quito. https://www.trolebus.gob.ec/index.php?option=com_content&view=article&id=230:nuevo-servicio-de-transporte-para-calderon-beneficia-a-mas-de-110-mil-personas&catid=84&Itemid=599

Tzelepi, Tefas (2017) Human crowd detection for drone flight safety using convolutional neural networks. In: 2017 25th European Signal processing conference (EUSIPCO), Kos Island, Greece, 28 August 2017, 743–747. DOI: 10.23919/EUSIPCO.2017.8081306.

Velastin, Fernández, Espinosa, et al. (2020) Detecting, tracking and counting people getting on/off a metropolitan train using a standard video camera. Sensors 20(21): 6251. DOI: 10.3390/s20216251.

Xiao, Zhou, Cui, et al. (2021) Deep learning for occluded and multi-scale pedestrian detection: a review. IET Image Processing 15(2): 286–301. DOI: 10.1049/ipr2.12042.

Ye, Wang, et al. (2021) Towards an AI-driven framework for multi-scale urban flood resilience planning and design. Computational Urban Science 1(1): 11. DOI: 10.1007/s43762-021-00011-0.

Yigitcanlar, Corchado, Mehmood, et al. (2021) Responsible urban innovation with local government artificial intelligence (AI): a conceptual framework and research agenda. Journal of Open Innovation: Technology, Market, and Complexity 7(1): 71. DOI: 10.3390/joitmc7010071.

Zhu

Lyu