Abstract
Technologies in the smart city, such as autonomous vehicles and delivery robots, promise to increase the mobility and freedom of people with disabilities. Yet these technologies have failed to “see” or comprehend wheelchair riders, people walking with service animals, and people walking with bicycles—all outliers to machine learning models. Big data and algorithms have been amply critiqued for their biases—harmful and systematic errors—but the harms that arise from AI's inherent inability to handle nuance, context, and exception have been largely overlooked. In this paper, I run two machine learning models across nine cities in the United States to attempt to fill a gap in data about the location of curb ramps. I find that while curb ramp prediction models may achieve up to 88% accuracy, accuracy varied across contexts in ways both predictable and unpredictable. I look closely at cases of unpredictable error (outlier bias) by triangulating with aerial and street view imagery. This sampling of cases shows that while it may be possible to conjecture about patterns in these errors, there is nothing clearly systematic. While more data and bigger models might improve accuracy somewhat, I propose that bias against outliers is fundamental to machine learning models, which gravitate to the mean and require complete, unbiased data. I conclude by arguing that universal design, or design for the outliers, is imperative for justice in the smart city, where algorithms and data are increasingly embedded as infrastructure.
Introduction
In 2018, three years after Uber started experimenting with autonomous vehicles (Metz and Conger, 2020), Elaine Herzberg was walking her bicycle across the street in Tempe, Arizona, when she was struck and killed by one of Uber's self-driving cars (Wakabayashi, 2018). The algorithms detected her but classified her as a vehicle, then a bicycle, then other, and on and on until it was too late. How did this happen? The algorithm trained to recognize pedestrians was not taught to look for jaywalkers and had never seen anyone walking a bicycle (Marshall and Davies, 2019; Whittaker et al., 2019). Delivery robots on city sidewalks have similarly failed to recognize pedestrians. Haben Girma, who is blind and walks with a service animal, was blocked on the sidewalk by a delivery robot (Girma, 2020). Emily Ackerman, a power wheelchair rider, had a similar experience with a robot in Pittsburgh. As she writes, “The robot was sitting motionless on the curb cut on the other side of Forbes Avenue. It wasn’t crossing with the rest of the pedestrians, and when I reached the curb, it didn’t move as the walk signal was ending. I found myself sitting in the street as the traffic light turned green, blocked by a non-sentient being incapable of understanding the consequences of its actions” (Ackerman, 2019). These examples point to artificial intelligence's (AI) potentially deadly inability to deal with context, nuance, and outliers, which leads to a particular harm or bias to people with disabilities. 1
These kinds of technologies, with unintended consequences ranging from the merely annoying to the fatal, often promise in their boosterism to increase the mobility, freedom, and inclusion of people with disabilities in urban environments. Such smart city rhetoric is often techno-ableist, meaning infused with an ableist view of “disability that at once talks about empowering disabled people through technologies while at the same time reinforcing ableist tropes about what body-minds are good to have and who counts as worthy” (Shew 2020, 43; see also Newman-Griffis et al. 2023). Disability justice scholars and activists acknowledge ambivalence toward technologies, which can reinforce ableist notions while also contributing to independent living (e.g., see references to the cyborg in Kafer, 2013). Still, we would be remiss to simply dismiss these developments in the smart city as techno-ableist. For example, autonomous vehicles, which use deep learning to sense the environment in real time (Li et al., 2020), might give people who cannot drive the ability to go wherever they like. In the United States, people with disabilities take fewer trips than those without (Brumbaugh, 2018). Many believe that self-driving cars would begin to bridge that long-standing gap (for example, Alexiou, 2021; Claypool et al., 2017; Fiol and Weng, 2022; Reardon, 2021; VentureBeat, 2020). The same AI sensing technology powers delivery robots, which could similarly improve quality of life by bringing prescriptions or groceries to people who would otherwise hire someone to run those errands (Hossain, 2022). 2 Other scholars have used AI tools to examine the accessibility of the transportation environment, including sidewalk assessment (Abbott et al., 2018; Bolten et al., 2015; Ku et al., 2022; Luo et al., 2019; Marangão et al., 2022), Americans with Disabilities Act (ADA) 3 compliance evaluation (Ai and Tsai, 2016; Goldchain, 2017; Saha et al., 2019), and crosswalk identification in aerial images (Ahmetovic et al., 2017; Berriel et al., 2017; Ghilardi et al., 2018; Li et al., 2023).
The case of transportation, artificial intelligence, and accessibility illuminates weaknesses and contradictions in the smart city. Disability justice scholarship and practices of universal or inclusive design may provide helpful avenues for addressing the harms of data and algorithmic bias. In this paper, I use the specific case of curb ramp classification across two algorithms, multiple data sources, and nine cities to understand the textures of outlier bias around an infrastructural feature. This analysis allows me to examine how error varies across contexts in ways that are not systematic and thus cannot simply be coded away. Drawing on disability justice and inclusive design, I propose a new frame for interpreting and working toward a more universal right to the smart city.
Bias in the smart city
The smart city is not something wholly new, but rather a new iteration of old ideas (Kitchin et al., 2023). It is also an empty signifier which “contains no inherent meaning on its own, but remains flexible in its applications potential” (Burns, Fast and Mackinnon 2023, 4). For the purposes of this research, I define the smart city as a city run on complex algorithms and ever bigger datasets. In this smart city, the data and algorithms are part of the infrastructure.
Data and algorithms are biased—the error is distributed unevenly. This uneven error often disproportionately harms certain populations (D’Ignazio and Klein, 2020; O’Neil, 2017). These biases emerge with the application of new algorithms to new arenas of life. For example, facial recognition software less accurately identifies female and non-white faces (Buolamwini and Gebru, 2018), people of color are more likely to be labeled criminals by policing algorithms (Selbst, 2017) and more likely to receive longer jail sentences (Angwin and Larson, 2016), and families in poverty are more likely to be targeted by algorithms for child welfare investigations (Eubanks, 2018). New iterations of generative AI are no better and have been observed amplifying racial and gender discrimination (Nicoletti and Bass, 2023; Srinivasan and Uchino, 2021).
Some systematic and harmful errors could have been caught before they did harm by simply looking for patterns in model error. Google's ethical AI team released a paper in 2019 that proposes a framework for doing just that. Their framework, “model cards for model reporting,” suggests conducting “benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups” (Mitchell et al., 2019). While the standardization of such ethical reporting is more complex in practice, the proposed solutions are conceptually straightforward.
However, there is another class of errors which are potentially more pernicious because they are not so cleanly systematic. I call these kinds of errors outlier bias because they systematically exclude or harm any person or event that falls outside of a social or algorithmic conception of “normal.” 4 People with disabilities are an expansive constituency group, and the kinds of algorithmic errors that negatively impact this group could not easily be coded out of an algorithm (Treviranus, 2018). An algorithm used for employment screening, for example, has been shown to disproportionately harm people with disabilities (Engler, 2019; Whittaker et al., 2019). Even recognizing this, weeding it out is almost impossible. As Shari Trewin, an accessibility researcher at IBM, states, “The way that AI judges people is with who it thinks they’re similar to—even when it may never have seen anybody similar to them—is a fundamental limitation in terms of fair treatment for people with disabilities” (Engler, 2019). The encounters with autonomous vehicles and robots outlined above fit this kind of non-systematic outlier error within machine learning.
Classification, emergent consequences, and a right to be an outlier in the smart city
Outlier bias illuminates avenues for bias and harm in the smart city that lack technical solutions. Not everything can be counted or turned into data (Bowker and Star, 2000; D’Ignazio and Klein, 2020; Onuoha, 2019). Yet algorithms, from simple linear regression to generative AI, assume data that are normally distributed, un-skewed, complete, unbiased, and free of outliers. Practically speaking, “no classification system, any more than any representation, may specify completely the wildness and complexity of what is represented” (Bowker and Star, 2000: 232). Thus, no algorithm that requires data can be unbiased, and that bias is likely to emerge in wild and unpredictable ways.
If the right to the city is the right to live in the city and be unalienated from the production of new forms of urban life (Attoh, 2011; Harvey, 2003; Lefebvre, 1992), what does this mean in the smart city? Disability justice scholars and practices of universal or inclusive design provide avenues for productive work. Specifically, design from and for the margins is most likely to benefit most people (Hamraie, 2017; Hamraie and Fritsch, 2019; Treviranus, 2021). The curb ramp itself is an example of universal design (see Figure 1). While a product of the disability rights movement, it benefits anyone on or accompanied by wheels (e.g., a parent pushing a stroller or a traveler pulling luggage).
Figure 1. Curb ramp (United States Department of Justice Civil Rights Division, 2012; author photo).
What would the universal right to the smart city look like in practice? In this paper, I explore that question through the case of curb ramp classification in nine urban environments across the United States using two machine learning algorithms. The curb ramp is an apt case for two reasons. First, the lack of data on curb ramp locations makes it harder for people on wheels to navigate the environment and stunts ADA compliance enforcement. Second, the curb ramp is readily identifiable and classifiable by humans, yet its characteristics vary enough that it is more difficult for machines and algorithms to identify. Using triangulation and qualitative methods, I explore algorithmic error to contribute to understanding outlier bias.
Methods
Study area and data
AI methods require training data. I created a training set based on open data about curb ramps for nine cities across the United States: Arlington, Boston, Denver, Indianapolis, Nashville, San Francisco, Seattle, Spokane, and Washington DC. Based on work assessing open data practices across U.S. municipalities (Deitz et al., 2021), these nine cities were the only ones with all of the following at the time of this analysis: adequate data on curb ramp locations (for labeling), LiDAR point cloud data that were open and spatially/temporally proximate to the curb ramp data, and aerial imagery. These data were sourced from the USGS (2020; see Table 1). The LiDAR point clouds had densities ranging from 4 to 45 points per meter. Data on the location of streets and landmarks came from the Census's TIGER/Line shapefile program (US Census Bureau, 2020). I classified ground points in the LiDAR data using the standard algorithm in ArcGIS. This algorithm has a tolerance for slope variation that allows it to capture gradual changes in topography (ESRI, 2020). With the street data, I created a point feature for intersections by finding all locations where two streets crossed.
Table 1. Data source and year of measurement.*
*All TIGER/line data are from 2020.
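To make the intersection step concrete, here is a minimal sketch using GeoPandas and Shapely rather than ArcGIS; the file name and the pairwise-intersection approach are illustrative assumptions, not the exact workflow used in this study.

```python
# Hypothetical sketch: deriving intersection points from street centerlines.
# A spatial index limits pairwise tests to streets whose bounding boxes overlap.
import geopandas as gpd

streets = gpd.read_file("tl_2020_roads.shp")  # assumed TIGER/Line roads layer
sindex = streets.sindex

points = []
for i, geom in enumerate(streets.geometry):
    for j in sindex.query(geom):
        if j <= i:  # skip self-pairs and duplicate pairs
            continue
        inter = geom.intersection(streets.geometry.iloc[j])
        if inter.is_empty:
            continue
        if inter.geom_type == "Point":
            points.append(inter)
        elif inter.geom_type == "MultiPoint":  # streets crossing more than once
            points.extend(inter.geoms)

intersections = gpd.GeoDataFrame(geometry=points, crs=streets.crs)
```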
I then created a tessellation of hexagons across each city, each hexagon with an approximate area of 50,000 square feet (see Figure 2). I selected one of these hexagons per city as my training sample, for a total of nine hexagons. This was done to reduce data size and to have a balanced sample 5 area from each city. I created a second sample of 10,000-square-foot hexagons as the validation set for each city.
Figure 2. Training and validation samples.
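For a regular hexagon, area A relates to circumradius r by A = (3√3/2)r², so a 50,000-square-foot cell has r ≈ 139 feet. The sketch below is illustrative (the study used GIS tooling): it builds a flat-topped hexagonal grid over a bounding box, and the same function with a smaller area would produce the finer tessellation described next.

```python
# Illustrative hexagon tessellation with Shapely (coordinates assumed to be
# in a feet-based projected CRS).
import math
from shapely.geometry import Polygon

def hexagon(cx, cy, r):
    """Regular flat-topped hexagon of circumradius r centered at (cx, cy)."""
    return Polygon([(cx + r * math.cos(math.radians(60 * k)),
                     cy + r * math.sin(math.radians(60 * k)))
                    for k in range(6)])

def hex_tessellation(xmin, ymin, xmax, ymax, area):
    r = math.sqrt(2 * area / (3 * math.sqrt(3)))  # invert A = (3*sqrt(3)/2)*r^2
    dx, dy = 1.5 * r, math.sqrt(3) * r            # flat-topped column/row spacing
    cells, col, x = [], 0, xmin
    while x <= xmax + r:
        y = ymin - (dy / 2 if col % 2 else 0)     # offset alternate columns
        while y <= ymax + r:
            cells.append(hexagon(x, y, r))
            y += dy
        x += dx
        col += 1
    return cells

training_cells = hex_tessellation(0, 0, 5000, 5000, 50_000)  # ~50,000 sq ft cells
```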
Within these nine hexagons, I created a second hexagon tessellation of equal areas of about 50 square feet. This was done for multiple reasons: to capture the areal nature of curb ramps in real life, to create a balanced sample of labeled points, and to add spatial contiguity to the balanced sample. While a curb ramp is not necessarily a discrete entity (but rather a sloped and blended area connecting paths), most curb ramps do not extend beyond 50 square feet. 6 These smaller hexagons were clipped to the LiDAR-labeled ground locations because a curb ramp is not going to be found, for example, on the top of a building or in a tree (see Figure 3).
Figure 3. Ground hexagon samples.
Next, I created a sample of the smaller hexagons with equal numbers of ramps and non-ramps across the training and validation areas of each city. In total, this resulted in 25,822 hexagons for training and 5,054 for validation. These hexagon locations were used to clip the point data from the LiDAR point cloud. The point dataset had 1,117,930 points for training and 251,131 for validation. For each point in the training and validation sets, I calculated the distance to the nearest street, intersection, and landmark. I also summarized the DEM and aerial image raster information at each point. Figure 4 summarizes these data cleaning processes.
Figure 4. Overview of data cleaning and methodological approach.
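A compressed sketch of the feature-engineering step in GeoPandas (layer and column names are assumptions; the study itself worked in ArcGIS). sjoin_nearest attaches the distance to the nearest feature in another layer and handles line geometries such as street centerlines:

```python
import geopandas as gpd

points = gpd.read_file("ground_points.gpkg")        # assumed LiDAR ground points
layers = {"street": gpd.read_file("streets.gpkg"),
          "intersection": gpd.read_file("intersections.gpkg"),
          "landmark": gpd.read_file("landmarks.gpkg")}

for name, layer in layers.items():
    points = gpd.sjoin_nearest(points, layer[["geometry"]], how="left",
                               distance_col=f"dist_{name}")
    points = points.drop(columns="index_right")     # drop the join index

# DEM elevation and the four aerial-image band values could then be sampled
# at each point with a raster library such as rasterio.
```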
Machine learning
I ran two machine learning algorithms to predict curb ramp locations and triangulate predicted results. The selected models were random forest classification and a point-based convolutional neural network (PointCNN; see Figure 4). The random forest method was used because it is one of the oldest machine learning methods and relatively less opaque. The PointCNN model is a cutting-edge deep learning model used for sensing environments, and its mechanisms for learning are hard to decipher. The models thus differ in interpretability, maturity, and spatial awareness.
Random forest classification
Random forest classification is one of the oldest machine learning methods. It was developed by Tin Kam Ho in 1995 and improved by Leo Breiman in the early 2000s (Breiman, 2001; Ho, 1995). The random forest method simply involves running multiple models, or decision trees, over a set of training and prediction data and allowing them to vote on the answer for each location. I ran the random forest using the scikit-learn library for Python in Google Colab. The variable to predict or classify was a binary ramp value, and the potential explanatory variables were distance to street, distance to intersection, distance to landmark, elevation from a digital elevation model (DEM), and raster value across all four aerial imagery bands. The random forest model is spatially unaware, in the sense that it does not account for or weigh neighboring features. The distance measures and raster values were added as predictors to manually give the model spatial information.
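A minimal sketch of this setup with scikit-learn, assuming train and valid DataFrames holding the per-point feature table built above; the hyperparameters and column names are illustrative, not the study's exact configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

features = ["dist_street", "dist_intersection", "dist_landmark",
            "elevation", "band_1", "band_2", "band_3", "band_4"]

rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(train[features], train["ramp"])              # binary label: 1 = ramp

pred = rf.predict(valid[features])
print("accuracy:", accuracy_score(valid["ramp"], pred))

# One common importance measure (mean decrease in impurity; cf. Table 5)
for name, importance in zip(features, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```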
PointCNN
PointCNN is a deep learning model using convolutional neural networks (Li, 2021; Li et al., 2018). Convolutional neural networks, or CNNs, are a promising advancement in object detection using AI and have been used extensively in autonomous vehicles. At a basic level, a CNN scans over the input data and breaks it into smaller parts, learning features from local neighborhoods. PointCNN is especially suited for LiDAR data because it accounts for the irregular and unordered nature of point clouds. I ran PointCNN using the ArcGIS deep learning frameworks and Jupyter notebooks. I set up my model with 100 epochs, early stopping 7 , and a one-cycle learning rate. The one-cycle, or cyclical, learning rate method lets the learning rate cycle between reasonable boundaries and has been shown to achieve improved classification accuracy over fewer iterations without a need for further tuning (Smith, 2017). Extra features (beyond x, y, and z) used for learning included LiDAR intensity, scan angle, and return number. The validation loss stopped decreasing at 11 epochs, at which point training halted.
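The one-cycle policy is easy to see in isolation. Below is an illustrative sketch using PyTorch's built-in scheduler; the study trained PointCNN through the ArcGIS deep learning tooling, so the network and step counts here are stand-ins.

```python
import torch

net = torch.nn.Linear(8, 2)                      # stand-in for the real network
opt = torch.optim.SGD(net.parameters(), lr=0.01)
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=0.01, epochs=100, steps_per_epoch=50)

for epoch in range(100):
    for batch in range(50):
        # ... forward pass and loss.backward() would go here ...
        opt.step()
        sched.step()   # LR warms up toward max_lr, then anneals back down
    # with early stopping, training breaks here once validation loss stops
    # improving -- in this study that happened at 11 epochs
```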
Error and outlier analysis
I analyzed the success of these models on the classification task according to classic accuracy metrics, contextual benchmarking, triangulation between models, and a qualitative process of ground-truthing with Google Street View imagery.
Performance metrics
As stated above, machine learning results are predominately evaluated in terms of overall accuracy and accuracy by class (see Table 2). Overall accuracy is simply the proportion of correct classifications over all possible classification locations. Precision is the proportion of positive identifications that were correctly classified. Recall accounts for false-negative classifications: it is the proportion of actual positives that were correctly identified. I calculated all three performance metrics for each model across the entire sample.
Table 2. Performance metrics.
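As a toy illustration of the three metrics (the labels and predictions here are made up):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]   # 1 = ramp, 0 = not ramp
y_pred = [1, 0, 0, 0, 0, 0]   # a cautious model: misses ramps, no false alarms

print(accuracy_score(y_true, y_pred))    # (TP + TN) / total -> 0.667
print(precision_score(y_true, y_pred))   # TP / (TP + FP)    -> 1.0
print(recall_score(y_true, y_pred))      # TP / (TP + FN)    -> 0.333
```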
I also looked at the rate of agreement between the models and the original training data. There were eight possible combinations of results (see Table 3). The first two were that all were correct (all 0, all 1, where 0 is not ramp and 1 is ramp). The next two were that all were incorrect (all false negative, all false positive). The remaining four were combinations of only one model or the other making a false classification (PointCNN (PCNN) false negative, PointCNN false positive, random forest (RF) false positive, and random forest false negative).
Table 3. Possible classification result combinations.
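In code, assigning each point to one of these eight outcomes is a small branching function. A sketch, assuming a results table with the training label and the two model predictions as columns:

```python
# `results` is assumed to be a pandas DataFrame with columns
# "ramp" (label), "rf_pred", and "pcnn_pred" (each 0/1).
def outcome(row):
    truth, rf, pcnn = row["ramp"], row["rf_pred"], row["pcnn_pred"]
    if rf == truth and pcnn == truth:
        return "all 1" if truth == 1 else "all 0"
    if rf != truth and pcnn != truth:
        return "all FN" if truth == 1 else "all FP"
    if rf != truth:                        # only the random forest is wrong
        return "rf FN" if truth == 1 else "rf FP"
    return "pcnn FN" if truth == 1 else "pcnn FP"

# results["outcome"] = results.apply(outcome, axis=1)
# results["outcome"].value_counts(normalize=True)    # cf. Table 6
```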
Contextual benchmarking
I then examined the error in a manner similar to that proposed in “model cards for model reporting,” that is, through “benchmarked evaluation in a variety of conditions” (Mitchell et al., 2019). In this case, my conditions were geographic area and model. I calculated the overall accuracy by city and model, as well as the agreement between models by city. The model comparison could also be called triangulation.
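Continuing the hypothetical results table from the previous sketch (and assuming it also carries a city column), the contextual benchmark reduces to grouped accuracy:

```python
# Accuracy by city for each model (cf. Table 7); the agreement categories by
# city (cf. Table 8) follow the same pattern using the "outcome" column.
acc_by_city = (
    results
    .assign(rf_ok=lambda d: d["rf_pred"] == d["ramp"],
            pcnn_ok=lambda d: d["pcnn_pred"] == d["ramp"])
    .groupby("city")[["rf_ok", "pcnn_ok"]]
    .mean()                      # mean of booleans = share correct = accuracy
)
```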
Qualitative analysis
Finally, I used street-level imagery (Google Street View) and cartographic visualizations to better understand the error for a random sample of locations and to look for potential non-systematic outliers. I mapped the model predictions on an aerial image and found the locations in Google Street View. Pairing the results, aerial imagery, and street view imagery allowed for thicker description of the performance measures and contextual benchmarking. This method also allowed for the identification of outliers and surprising trends. The act of showing rather than just telling and reporting opens the process to other interpretations and observations. The results of this qualitative work are presented in the discussion section below (see Figures 5–8).
Figure 5. Street design by city.
Figure 6. Denver ground truth. Each of the four quadrants (a, b, c, d) shows the hexagon data area, with colors representing the kinds of point prediction that occurred at that location (I aggregated similar predictions back to a polygon from a point). The base of the map is an aerial satellite image. The images labeled 1–5 are from Google Street View at that location.
Figure 7. Arlington ground truth. This is one intersection with five ramp locations (hexagons) and Google Street View imagery showing the view from the ground. Letters a–e connect the imagery to the hexagon locations.
Figure 8. Seattle ground truth. This visual is set up like Figure 7.
Classifying curb ramps
Performance metrics
Across all cities, the random forest model performed much better than the PointCNN in terms of accuracy—88% compared to 56% (see Table 4). The precision of the random forest model was slightly better than its recall because the number of false-positive classifications was slightly smaller than the number of false negatives (89.8% compared to 89.4%). The recall of the PointCNN model was much better than its precision (71.9% compared to 59.9%), meaning there were more false-positive than false-negative values.
Table 4. Classification accuracy.
The PointCNN included information about LiDAR intensity, scan angle, and return number, but there is no way of calculating the importance of each variable. Importance metrics in random forests are generated by counting how often each variable is responsible for a split, or decision, across the decision trees. In the random forest, distance to street intersection and distance to street (two closely related variables) were the most important variables (see Table 5). Distance to street intersection was responsible for a split 47% of the time and distance to street 17% of the time. Elevation was the next most important variable (9.6%), followed by distance to landmark (7.1%) and then the aerial image bands.
Table 5. Variable importance, random forest.
Contextual benchmarking
Triangulation (algorithmic context)
Across all the models, there were eight different combinations of possible results (including the initial ramp label used for training from the municipal open data). Just over half of the points (50.1%) were classified correctly by both models (12.9% all 0 and 37.2% all 1; see Table 6). A true-positive classification of a curb ramp was more common than a true negative. Only 6.1% were classified incorrectly by both models (1.8% all false negative and 4.3% all false positive). The PointCNN model falsely classified almost a quarter of the points as ramp (23.4%) and falsely classified 14.4% as non-ramp—these were points that the random forest model classified correctly. The random forest model classified 5.8% of the points incorrectly when the PointCNN classified them correctly (1.6% FP and 4.2% FN).
Table 6. Classification results across all models.
Spatial context
The classification accuracy varied widely across cities and models (see Table 7). The average accuracy for the random forest model was 88% across all cities. Arlington (91%), Denver (96%), San Francisco (90%), Seattle (94%), and Spokane (92%) performed better than average; Boston (87%), Indianapolis (74%), Nashville (85%), and Washington DC (85%) performed worse. The classification accuracy of the PointCNN model was higher than average in Boston (62%), Denver (67%), Nashville (58%), Spokane (75%), and Washington DC (60%). It was lowest in Arlington (41%), followed by Indianapolis (53%), San Francisco (45%), and Seattle (45%). Since the data are balanced between ramp and non-ramp classes, a model that predicted every point as ramp (or every point as non-ramp) would achieve 50% accuracy, better than the PointCNN managed in Arlington, San Francisco, and Seattle. In summary, Denver and Spokane performed better than average across both models, and Indianapolis performed worse.
Table 7. Accuracy across models and cities.
Spatial and algorithmic context
In the same way that accuracy varied by city, the agreement between algorithms also varied across cities (see Table 8). Arlington (38%), Boston (32%), Denver (48%), Indianapolis (30%), Nashville (48%), and San Francisco (42%) had above-average agreement between models on non-ramp classifications (all 0). The remaining cities—Seattle (39%), Spokane (70%), and Washington DC (47%)—had above-average model agreement on positive ramp classifications (all 1). All models falsely classified ramp locations as non-ramp (all false negative, or FN) more than average in Arlington (6%), Boston (4%), Indianapolis (10%), Nashville (6%), and San Francisco (6%). In Spokane (5%) and Washington DC (6%), a slightly higher proportion of points was falsely labeled as ramps across models (all false positive, or FP). The random forest model performed worse than average by falsely labeling ramp points as non-ramps (rf FN) in Washington DC (7%). On average, 2% of the points were falsely classified as ramps by only the random forest model (rf FP)—this was higher in Arlington (3%), Boston (5%), Indianapolis (15%), Nashville (7%), and San Francisco (4%). The performance of the PointCNN in false-negative and false-positive classifications varied widely across cities. Its rate of false-negative classifications was higher than average in Arlington (53%), Boston (28%), Denver (24%), Indianapolis (34%), Nashville (23%), and San Francisco (49%). The number of false-positive classifications by the PointCNN was nearly double the average in Seattle (49%) and slightly higher than average in Washington DC (26%).
Table 8. Classification results across models by city.
Discussion
The results of this study confirm that machine learning algorithms are generally good at learning classifications. The random forest model was more accurate than the deep learning PointCNN model at curb ramp classification in this context. 8 The random forest classified ramp locations with 88% accuracy, while the PointCNN achieved only 56% accuracy. The PointCNN model's classifications were only slightly better than assigning every point as a ramp. This is a surprising result because PointCNN is a spatially aware machine learning model and tends to be better suited to spatial applications such as this one. The results also show how machine learning algorithms exhibit bias, or systematic error. In this case, the system underlying the error is not immediately apparent.
These models likely differed in performance in part because they had different information to learn from. The random forest model had information about streets and intersections, which proved to be important variables for random forest decisions or splits. Distance to street intersection accounted for 47% of the random forest decisions, and distance to street accounted for 17%.
Predictable design
The importance of street design in what the random forest learned about classifying curb ramps is also apparent from maps of the cities that performed best compared to others. The classification accuracy in Denver was 96%—the city, or at least the portion of the city used for training and validation, is almost perfectly gridded (see Figure 5). The random forest also performed better than average in Seattle, Spokane, Arlington, and San Francisco. These cities are less perfectly gridded than Denver but certainly more so than Washington DC, Nashville, Boston, or Indianapolis.
While this result seems obvious—machine learning was better at prediction in municipalities with predictable or uniform design—the implications are important. If AI only works on cases that are similar, predictable, and dominant, this has serious social implications for human uniqueness, variability, and individuality. Disability scholars have long pointed to the harmful effects of categories like “normal” and “abnormal” (Goggin et al., 2017; Saltes, 2013; Shakespeare, 2007). AI further produces and reflects a normative vision of the world—where outliers are at best ignored and at worst disadvantaged.
Nuance
In Denver, the random forest produced 167 false-positive predictions and 62 false-negative predictions (2.7% and 1.0% of the city's validation points, respectively). Figure 6 shows a few of the outliers that were not correctly classified by the random forest model. In gray are areas where the random forest incorrectly classified points as ramp (a and c in Figure 6). In image a, it appears that the location is misaligned to where there is a ramp (see image a.1) and is instead located at the path from someone's house (see image a.2). This path has most of the characteristics of a curb ramp but no gradual slope, which means it is not functionally a curb ramp. It appears that the random forest classified the part that is like a ramp as a curb ramp and the edge where there is no transition as non-ramp (see the green area of the hexagon in quadrant a). A location that is almost a ramp in this way is missing the key function: a transition from a higher to a lower path. Location c points to a location where there is a ramp, but it appears to be misaligned in the data. Quadrant b in Figure 6 is a location with a ramp that was labeled by all models as non-ramp. This could be because it looks different from a typical ramp, with parallel dips running down the slope. Quadrant d is the most interesting for considering outliers. These locations were labeled falsely by all models as non-ramp. However, the area to the left is similar to quadrant a.2 in that it has all the characteristics of a ramp but no transition to the street—it is just a path to the curb. Like a.2, this location should not be classified as a ramp because it is functionally not adequate. The hexagon area to the right in quadrant d is functionally a curb ramp but transitions very gradually and ends in a kind of gravel surface.
In Arlington, the behavior of the PointCNN model is the most surprising and appears to have driven the above-average rate of agreement on non-ramp classifications and the above-average rate of all-false-negative predictions. Figure 7 shows five locations where points were labeled falsely as non-ramps (green hexagons). Locations a, b, and e are all pedestrian curb ramps with red warning material. These ramps feed into a crosswalk and appear to have gradual elevation changes from sidewalk into the street. A very small portion of location e was falsely labeled as non-ramp by the random forest (gray). In this case, the random forest was likely correct in the classification (recall that representing a ramp as a hexagon is not a perfect choice, but one made under the data and algorithm constraints). Locations d and c are slightly more difficult. Location d is a connector between streets and pedestrian pathways; however, it does not have an elevation change. The areas in purple mean that both models labeled the area as non-ramp. That the random forest labeled any points in that location as ramp is surprising; perhaps it picked up the warning material color or the proximity to the street intersection. Location c appears to be misaligned; however, like location d, the points that the random forest model labeled as ramp (green) do not make a lot of sense visually unless the model was picking up the brightness of the crosswalk. These results are interesting because they suggest a disconnection between what a pedestrian curb ramp is by definition and by function. Further, ramps a, b, e, and c all have similar design characteristics (red warning material, gradual slope change), so it is unclear why the random forest was correct about locations a, b, and e but mostly incorrect on c. This location is an outlier in some way that is not visually apparent.
In contrast to Arlington, in Seattle the rate of false-positive classifications from the PointCNN model alone (pcnn FP) was much higher than average (49% of points). In Figure 8, the orange areas (a, c, and d) mostly had agreement between models in correctly identifying ramp locations. Those ramps are slightly weathered but have distinct transitions from flares to landing. Location e was falsely labeled as a ramp by the PointCNN, and location b was falsely labeled by both models (pink). These locations are not ramps but narrow driveways that gradually change in elevation toward the street. Narrow driveways are a quite common feature of Seattle's design. These locations could functionally be used by people with disabilities as curb ramps but might be less safe because of the likelihood of cars entering and exiting. Recall that machine learning algorithms look for features that are most like each other. In this case, narrow driveways fit the “normal” characteristics of a curb ramp and are functionally like pedestrian curb ramps. At the same time, they could be more dangerous because they are shared with automobiles.
These results reveal the diversity of model error. It is more difficult to pinpoint the systematic nature of this error or bias. Machine learning is an excellent tool for classification and prediction. It can achieve high accuracy and process large datasets far better than any human. These tools also make systematic errors which are likely to disproportionately affect categories of people, and there is a rich body of work on addressing such bias (see, for example, Crawford 2017; Friedman and Nissenbaum 1996; Eubanks 2018). But another, equally pernicious kind of error cannot simply be calculated out or benchmarked. This kind of error, the kind that led to the death of Elaine Herzberg and the stalled mobility of Haben Girma and Emily Ackerman, cannot be solved with computation or code but requires the slow and messy human work of observation, description, and contextualization to unravel.
Conclusion
This case focused on classifying curb ramps, an infrastructural feature that is particularly important to the safe travel of people with disabilities and which has improved the lives of many others (e.g., people walking with children in strollers). The case illuminates much more: not only the risk that people with disabilities will be alienated from urban life (Attoh, 2011; Lefebvre, 1992), but the risk posed to any person or event that is an outlier. We might even go further and say that, in the smart city, data (i.e., people and events) that violate algorithmic norms will either be tamed or face exclusion (Bowker and Star, 2000).
Notions of normality have long been tied to power and dominance at the exclusion of people with disabilities. Machine learning is simply faster and more efficient at categorizing and identifying normality. This means that anyone who falls outside of the norm—who is an outlier in statistical terms based on how and what environmental features they require for access—is likely to experience unforeseen harm. As scholar Jutta Treviranus recounted of her tinkering with autonomous vehicle models, “When I presented a capture of my friend to the learning models, they all chose to run her over…I was told that the learning models were immature models that were not yet smart enough to recognize people in wheelchairs…When I came back to test out the smarter models they ran her over with greater confidence” (Whittaker et al., 2019: 12). Behind the scenes, the model was likely trained to recognize wheelchair riders as pedestrians, but even then, it would learn a “normal” or average profile of a wheelchair rider. What then of the safety of a wheelchair rider with glow sticks on their wheels and a cat in their lap? These kinds of outliers and exceptions cannot be coded in.
There is a glut of work and reporting revealing bias in AI and proposing solutions, but bias itself is socially constructed, contextual, and varying. Who gets to decide what is fair and who benefits? Even if we were to agree on what bias is, under what conditions does it exist? And as these results highlight, what happens when the bias is against a heterogeneous group such as people with disabilities? Importantly, these outliers represent lives at stake, not simply observations in a dataset. I propose that, to address this kind of error or bias, we need to look to qualitative methodologies. To rephrase: what AI needs is description and interpretation in context. In this paper, I have made a first attempt at this kind of description through ground-truthing model error.
In the context of these results, I return to the questions, what kind of world do we want to live in? And who has the right to the city? Failing to think critically about these questions means that dominant narratives or imaginaries win the day. As Jasanoff and Kim write, “[w]hat we ‘see’ in familiar surroundings looks right, epistemically as well as normatively. So the socially conditioned eye can take for granted that all-male orchestras or all-black passengers on the backseats of buses, or even scenes of filth and abject poverty simply represent the rightful order of things” (2015: 14). As Turnbull writes, “[i]n the long run, social and cultural complexity cannot be winnowed away; it's all there is” (2003: 227). Valuing outliers is not the domain of AI, but I argue that we can still mobilize the power of machine learning if we also work to value exception and outliers through qualitative approaches. What I am suggesting here is inclusive design for the smart city (Treviranus, 2018, 2021).
Inclusive design for the smart city would shake up our imaginary about what technology can do and what we would like it to do. It would ask different, more specific questions based on the needs and desires of the outer rim: where are the curb ramps? Where are the quiet places? What do the unhoused need? Imagine traffic algorithms that prioritized the needs of everyone except cars, or facial recognition software that failed only on white faces with goatees. Imagine policing algorithms that disproportionately arrested and locked up people wearing expensive ties. I am not advocating for those futures but rather for thought experiments that move the horizon of what is possible to the outer rim and the acceptable harms to those with power. This work is hard and slow. It requires open eyes, reflexivity, and wading through messes of big data, but to fail to try to contextualize and understand AI is to fail to take responsibility for shaping the world we live in.
Footnotes
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Institute for Transportation and Communities (grant number NITC 16 Dissertation Fellowships 2020, #1374).
