Not so ‘placeless’ after all. Understanding the spatial implications of the digital economy

Abstract

Empirical studies on the geography of digital economic activities are currently lacking. This is due to digital economic activities remaining largely undefined in official economic statistics. This paper introduces a novel empirical pipeline to examine the spatial characteristics of the digital economy while addressing the challenges of data missingness. Firstly, we identify digital economic activities by using commercial websites and Natural Language Processing (NLP). Secondly, we study the geographical distribution of digital firms, as well as the mechanisms causing this distribution, through considering firms as the realisation of a latent spatial Poisson Point Process. In doing so, this paper enhances our understanding of urban processes by combining data created through advanced NLP techniques with spatial econometrics. Focusing on the case of London, UK, we identify digital economic activities by applying contextualized weak supervision to text scraped from firms’ websites. By doing so, we showcase how websites can complement, or even substitute official economic statistics, leading to a more complete understanding of the digital economy. Using this data, we proceed to inferring the causes of firms’ locations in space: indeed, firms can locate in space for exogenous reasons, such as the presence of infrastructure and public incentives, or endogenous reasons, such as knowledge spillovers. We base our analysis on estimating the inhomogeneous K-function, quantifying the spatial dependency between firms. Our study reveals a tendency for digital firms to cluster due to the importance of in-person interactions that cannot be explained simply by exogenous factors such as the attractiveness of the location. To our knowledge, the paper provides the first empirical evidence uncovering the persisting relevance of space and face-to-face interactions in the digital economy, prompting reflections on the geographical footprints of digitality.

Keywords

urban dynamics urban analytics spatial data big data

Introduction

The digital economy is often portrayed by both mainstream media and senior technology figures as having the ability to empower traditionally marginalised entrepreneurial actors (Duffy and Hund 2015; Dy 2019). This is mostly due to longstanding assumptions about the neutrality and ‘placelessness’ of the digital, whereby the digital is seen as an affordable and universally accessible entrepreneurial setting (Dy et al., 2017; Leszczynski and Elwood 2015; Wahome and Graham 2020). Such ‘techno-optimistic’ claims found a recent revival during the COVID-19 pandemic, leading analysts to suggest that the rapid adoption of digital technologies, driven by their capacity to enable remote work, could bolster digital business models and reduce the significance of geographical proximity. While the clustering of economic activities is a topic of intense interest among academic circles and policy makers, the geographical distribution of digital economic activities remains surprisingly unclear. Indeed, the agglomeration of firms has been usually analysed for traditional manufacturing industries. There is one clear reason behind this enduring concentration on manufacturing industries: a substantial lack of data identifying digital firms (Nathan and Rosso 2015; Papagiannidis et al., 2018).

Gaining an understanding of whether the Internet and digitisation could counterbalance the urban ad-vantages associated with spatial proximity would play a crucial role for the economics literature, as well as for policymakers. As a matter of fact, inter-regional economic inequality in Europe has increased over the last decades (Iammarino et al., 2019). While most major cities are experiencing growth, many rural areas and smaller cities are grappling with economic stagnation or even decline. As a response to this trend, policymakers are advocating for investment in broadband infrastructure and digitisation, hoping that these measures will help offset the lack of agglomeration benefits in rural regions. The idea is that by reducing the significance of spatial proximity, digital technologies could potentially reduce inter-regional inequality. In light of these debates, discovering the spatial phenomena at work in the digital economy would provide a clear understanding of the role of cities versus the role of the Internet for entrepreneurial ecosystems. Such findings would also help demystifying discourses on the equalising power of the Internet, contributing to the ever growing literature on digital inequalities. As we previously noticed, providing such insights has been nearly impossible to date due to a substantial data missingness. Indeed, micro data on ‘digital firms’ – whatever their definition might be – is non-existent. Simply put, without better measures than current industrial classification taxonomies, researchers and policymakers lack the tools to fully understand the geography of the digital economy.

In recent years, increasing attention has been devoted to the drawbacks of existing industrial classification taxonomies for the improvement of business censuses (Dalziel et al., 2018). Most empirical proposals tapped into applying unsupervised Natural Language Processing (NLP) techniques to unstructured data sources such as commercial websites and firms’ social media profiles. Indeed, such new ways of classifying firms can provide us with a more precise identification of digital companies. This type of data can in turn be used to understand various characteristics of firms, between which they way they colocate (or not) in space. The availability of such micro-geographical data can therefore pave the way to the study of the distribution of firms and provide a quantitative base for industrial policy efforts. According to Haefner and Sternberg (2020, p. 10), this research gap ‘is a perfect arena for empirically skilled economic geographers’.

Given the above considerations, the main objective of this study is to move from theoretical intuitions about the benefits of clustering in the digital economy to a solid body of empirical evidence through the means of NLP. Specifically, we aim to answer the question: do digital firms exhibit patterns of true or apparent contagion? To achieve this, we proceed in three steps. First, we adopt a contextualized weak supervision framework to identify firms in the digital economy, addressing the data scarcity issue. In doing so, we leverage text extracted from websites as underutilized sources of urban data. Second, we explore the spatial features of the digital economy using extant methods from the spatial econometrics literature. Notably, we investigate whether digital economic activities exhibit clustering and how these patterns compare to those of ‘non-digital’ activities. Third, we contribute to discerning the motivation behind such clustering patterns. This is because firms could cluster due to both endogenous reasons, such as the need to interact closely, and exogenous reasons, such as to exploit favourable urban infrastructure. To do so, we apply an inhomogeneous transformation of Ripley’s K function allowing us to discern how digital urban production benefits from its urban context. Specifically, the distribution of firms in space can be modelled as a point pattern: the realisation of a stochastic point process within a bounded region (Diggle 2013). Within this framework, firms are considered clustered when their spatial concentration exceeds that of a benchmark distribution.

Our findings not only show that digital economic activities cluster similarly to non-digital ones but also reveal that this clustering stems from a genuine need for offline, face-to-face interaction. While these methods have been widely applied to micro-scale data in ecology, their use in analysing firm demography is more recent (Arbia et al., 2012; Duranton and Overman 2005; Marcon and Puech 2010). This analysis provides a novel perspective on economic spatial dynamics and offers data-driven policy insights, highlighting which factors may be less influential than previously assumed in driving digital economic growth and supporting entrepreneurial ecosystems. Methodologically, our approach to answer the research question, which relies on features extracted from web data, requires integrating two distinct fields: Natural Language Processing (NLP) and Spatial Econometrics. We aspire this multidisciplinary approach to encourage further expansion of the human geographers’ methodological toolkit (Fu 2024).

The remainder of the article is structured as follows. In Section 2, we provide an historical outlook on previous theoretical and empirical work on the implications of ICTs adoption for the clustering of firms in space. We highlight how, after dying down in the early 2000s, the COVID-19 pandemic revived these questions. Following the discussion of the literature, in Section 3 we describe in detail our dataset construction pipeline and the NLP methodology used for it. In Section 4, we examine the application of the inhomogeneous transformation of Ripley’s K-function to better understand the spatial dynamics of firms. Additionally, we explore how the proposed inferential framework for the inhomogeneous K-function can help disentangle whether the spatial distribution of firms is driven by exogenous or endogenous factors. In Section 5 we provide our empirical results showing that digital firms are still subjected to agglomeration forces and substantially providing the first empirical demystification of the long-standing narrative about the irrelevance of geography for digital enterprise. Section 6 summarises our results and lays out ideas for future work. In this same section we conclude with a discussion on the policy and methodological implications of our findings.

Industrial clustering and the digital economy

There is no unanimous definition of the digital economy. Among others, the most widely used concepts describing it are information economy, the knowledge economy and the weightless economy (Quah 1999). Within the literature, scholars in industrial production management and microeconomics have traditionally defined digital firms as those that trade only in intangible, electronic goods and services (Brynjolfsson and Kahin 2002). However, the digital economy has evolved significantly since then. With the pervasive integration of digital technologies across sectors, the distinction between digital and non-digital firms has become increasingly blurred. Today, digital inputs—ranging from cloud infrastructure to data analytics – are embedded in the operations of virtually all industries. As highlighted by Ash et al. (2018) and Capello and Lenzi (2021), it is now difficult to imagine any business that does not depend, in some form, on digital tools or infrastructures.

In this paper, we adopt a broader definition of the digital economy, one that emphasizes the centrality of intangibles (Corrado et al., 2022; Haskel and Westlake 2017). This perspective reflects a shift in the structure of production, where value creation increasingly relies on non-physical assets such as software, design, and intellectual property. While online retail of physical goods is often seen as a key component of the digital economy, identifying all businesses involved in e-commerce, whether directly or through intermediaries, falls beyond the scope of this study. Indeed, the main technical virtue of digital goods within our definition is their unlimited tradability and versatility, making them unaffected by transportation and reproduction and increasing supply-chain efficiency (Goldfarb and Tucker 2019). Discussions on the implications of the digitisation of capital on economic geography have a long history, mainly wondering whether distance still matters in an economy where transportation costs are near zero.

The spatial implications of digitalisation

Since the dawn of the Internet, there has been no shortage of predictions that the rise of digitisation would shape-shift global economic geography. Such predictions stem from the assumption that the digital trans-formation has the potential to expand opportunities, not only in regard to who can engage in innovation or entrepreneurship but also in terms of where individuals can make valuable contributions from (Nambisan et al., 2019). Arguments from more than 20 years ago, primarily led by the works of Sassen (1994), Cairncross (2002) and Castells (1996), emphasised the power of the Internet, arguing that we should expect ICT-related firms to locate without a significant spatial structure and consequently, we should prepare for the unavoidable disruption of the traditional economic role of cities. These predictions were based on the idea that, with the increased ability to communicate in real time with any location worldwide, the exchange of knowledge and information would rely less on physical flows within geographical spaces. Within these works lie media-worthy expressions such as ‘the end of geography’, and the ‘death of distance’. In such a cyber-utopianist wave, Negroponte (1995) even predicted the ‘end of the city’. Technology writer Gilder (1995) further claimed that cities were just ‘leftover baggage from the industrial era’ and that the.

Internet revolution would strongly implied moving towards Negroponte (1995)’s predictions. Perhaps even more relevantly for the purpose of this paper, Mitchell (1996) referred to the Internet as being ‘fundamentally and profoundly anti-spatial’. Similarly, through a more philosophical lens, Michell (2002) indirectly referred to the cyber-space as something inherently belonging to metaphysics.¹^,²

Concerning the geographical opportunities offered by the digital, if we neglect the unevenness of telecommunications access, informational goods and services may be regarded as Weber’s ubiquities (Weber 1965). These informational ubiquities encompass a diverse array of products and services that constitute a signifi-cant portion of the global economy. Such goods are readily accessible nearly anywhere on the planet, often even while travelling, as long as there is access to necessary computer equipment and broadband connectivity. Some examples include software, games, remote telephone consultations, marketing, various forms of digital design such as architectural or manufacturing projects, e-learning solutions, online retail, and a wide range of financial products. Given the ubiquity of digital economic products, it’s unsurprising that scholars have begun questioning whether those transforming and trading digitized information must also be ubiquitous in space. Building on this idea, many politicians now advocate for building better connectivity infrastructure, enhancing skills, and furthering digitization, believing these efforts could compensate for the lack of agglomeration externalities in rural areas. By reducing the importance of spatial proximity, the goal is to mitigate inter-regional inequality. In this context, Friedman (2002) argued that the expansion of trade, the internationalization of firms, and the rise of networking would create a ‘flat world’ – a level playing field where individuals are economically empowered regardless of their location.

The validity of all the arguments presented until this point was empirically disproven by a concomitant renewed interest in economic agglomeration, following the footprints of Marshall (1890). These studies demonstrated the existence of concentration phenomena in a wide range of sectors, such as information technologies (Saxenian 1996) and biotechnologies (Feldman, 2003). Echoing such findings, Florida (2002) wrote that ‘never has a myth been easier to deflate’, and that ‘not only do people remain highly concentrated, but the economy itself continues to concentrate in specific places’.

According to a consistent bulk of literature, one should expect digital firms to be spatially concentrated within big metropolitan areas. Such school of thought rejects the notion that rural regions will ‘level up’ through increased digitisation. Coherently with this line of scholarship, Rodr´ıguez-Pose & Crescenzi (2008, p. (2) stated that ‘numerous forces are coalescing in order to provoke the emergence of urban mountains where wealth, economic activity, and innovative capacity agglomerate. The interactions of these forces in the close geographical proximity of large urban areas give shape to a much more complex geography of the world economy’. Indeed, while digitisation has made it easier to transport goods and knowledge across long distances, the transmission of person-related and context-dependent tacit knowledge still relies heavily on face-to-face interaction (Morgan 2001). As a result, corporate learning predominantly takes place through trust-based personal encounters, often in locations characterised by a critical mass of human interaction, such as cities. Empirical observations seem to confirm this latter line of thought. For instance, notwithstanding the rising of the digitisation of society in the in the 1990s and 2000s, American ‘superstar cities’ like New York and San Francisco have also grew in population and economic importance (Gyourko et al., 2013). More recently, Balland et al. (2020) demonstrated that complex knowledge, and consequently complex economic activities, needs the richness of cities to thrive and do not transfer well through digital channels only. The co-occurence of centrifugal forces (which lead to reduced disparities) and centripetal forces (which result in increased disparities) in the realm of digitisation has been labelled as the ‘paradoxical geographies of the digital economy’ (Moriset and Malecki 2009).

The fact that what we could define as the ‘funeral of space’ never took place is not wildly surprising when considering existing theoretical literature on agglomeration spillovers, and more specifically knowledge ones. Indeed, knowledge spillovers are generated not only by physical space but also by what is referred to as the relational space (Sheppard 2002). Knowledge spillovers have long been recognized as crucial for studying innovation processes, with physical space playing a key role in fostering proximity, particularly through face-to-face contact. As noted by Storper and Venables (2004), this form of interaction offers several key advantages, between which enabling efficient communication, promoting socialization and fostering relationship building and knowledge sharing. Finally, it provides psychological motivation, as personal interactions often lead to greater commitment and engagement. While some of these benefits could also apply to modern video-calling technologies, face-to-face interactions have distinct advantages. They enable seamless communication through body language, and physical presence, making exchanges richer and more meaningful. As Storper and Venables (2004, p. 9) broadly describe it, face-to-face contact is akin to ‘being on stage and playing a role’.

The value of physical space has been explored in numerous empirical studies. In Audretsch and Vivarelli (1994), for instance, the effect of knowledge spillovers on innovation is measured through patents, revealing a significant effect of these spillovers on small-and medium-sized firms. The definition of spillovers used in this paper, however, is relatively narrow, focussing solely on physical proximity to universities or research centres. Autant-Bernard (2001) broaden the scope of spillovers to include proximity to a significant number of firms within the same sector. Their study also demonstrates a significant positive correlation between knowledge spillovers and the innovative performance of firms. While proximity to universities, research centres, and firms from both the same and different sectors is important, it is crucial to recognize the multifaceted nature of knowledge spillovers. The mere concentration of firms within a particular sector does not fully explain a region’s innovation. To comprehensively understand and leverage knowledge spillovers, it is essential to identify the channels through which these spillovers are transmitted and spread across a region. This is where the concept of relational space comes into play. Relational space includes all interactions among firms, institutions, and individuals, and is characterized by a strong sense of community and collaborative capabilities within the region (Capello and Faggian 2005). Despite these insights, some scholars remain skeptical about the necessity of face-to-face contact for knowledge spillovers. For example, Torre (2008, pg. 870) contended that ICTs enable ‘long-distance sharing or co-production of tacit knowledge’.

The scholarly evidence presented above might be enough make a reader skeptical about the ‘death of space’. Notwithstanding theoretical literature on the relational space, Florida’s arguments and Saxenian’s findings, however, the question of whether developments in Information Technology would lower the need for face-to-face interaction recently resurfaced during the COVID-19 pandemic (Budnitz and Tranos 2022; Chapple et al., 2022; Florida et al., 2023; Ramani and Bloom 2021). Indeed, the pandemic served as a massive, forced experiment in digital work, reviving predictions similar to those from the past. For example, a recent article from The Atlantic postulated ‘the decline of the coastal superstar cities’ and that ‘the next Silicon Valley is nowhere’ (Thompson 2021). The COVID-19 crisis has imposed additional challenges on businesses and their existing models. It was anticipated that this crisis would accelerate the adoption of digital technologies, primarily due to their ability to facilitate remote work. After the recovery from the pandemic, city planners are still not clear about whether urban centres are on track to return to their pre-Covid states or if they are in store for a permanent change (Ramani and Bloom 2021; Reades and Crookston 2021). Specifically, following (Florida et al., 2023, p. 1519) firms would make savings from ‘leasing less space and the efficiencies of employees using less time in meetings and chit-chat by the coffee machine or the water cooler’. Some individuals, in response to the necessity of avoiding public transportation, might be motivated to reside in neighbourhoods much closer to the city centre. This ironic outcome could potentially lead to the development of ‘live–work neighborhoods’, a concept that urban planners have advocated for decades (Florida et al., 2023).

Continuing on the same line as in Florida (2002), Florida et al. (2023) confirms that the clustering of talent and economic resources, along with face-to-face interactions, buzz, and a diverse environment found exclusively in cities, are essential for fostering innovation, creativity, and driving economic growth. In Florida et al. (2023)’s work, particular attention is devoted to the usual hyperbole of techno-optimists that emerged during the pandemic claiming that ‘everything will change’, or that everything will be better because of the latest technological advancements in video communications. Such claims relied on the assumption that ‘this time is different’, because this is the first pandemic to occur together with a widely available alternative to face-to-face work. As the authors notice, however, there is a long history of failed forecasts that ‘this time, distance is dead’. Indeed, over the past two centuries, urbanisation has surged in tandem with significant advancements in transportation and telecommunications capabilities (Leamer and Storper 2001). Yet, as summarised in Busch et al. (2021), urban areas persist in their ability to draw knowledge-intensive (digital) production by offering skilled labour, favourable ecosystems, and a consumer base. All in all, a thread pulls consistently in objections to the placelessness of the digital economy, that is, the increase in urbanisation.

Industrial clustering in the digital and previous empirical efforts

In the previous section, we discussed how (more or less utopistic) reflections over the ‘death of space’ managed to retain scholarly interest to this day. All these reflections are behind a growing body of scholarship, predominantly disagreeing with the space-flattening properties that continue to be attributed to the Internet. The gist of these works revolves around the concept of knowledge spillovers and emphasises the importance of physical proximity for economic development. Notwithstanding the importance of this literature, we note that it never specifically addresses the digital economy, or when it does, it tends to offer context-dependent definitions.

To our knowledge, one of the first empirical evidence of the relevance of spatial proximity in innovative high tech firm can be found in Anselin et al. (1997). While this work does not specifically focus on the digital economy per se, it still provides an account of the mechanisms through which agglomeration forces contribute to a higher equilibrium growth rate for both cities and regions, and it is often cited as a motivation for further work. The investigation found a significant causal relation between university research and R&D activity when geographical proximity is observed. Thus, the presence of universities is an endogenous factor causing the agglomeration of high technology firms. Furthermore, the authors contribute to the argument at the centre of this article over the difficulties posed by data missingness on innovative economic activities.

Looking at online content production, Sinai and Waldfogel (2004) found that densely populated areas contribute more to the Internet than their less populated counterparts. Furthermore, since people’s preferences are often regionally clustered and there is a higher likelihood of consuming local media, individuals in densely populated regions are more likely to access the Internet. On a related note, Blum and Goldfarb (2006) investigated the online behaviour of approximately 2,600 American Internet users on international websites. Their findings align with a well-established observation in trade literature: bilateral trade decreases with increasing distance (Disdier and Head, 2008; Overman et al., 2003). This principle extends to online behaviour, implying that, even for products or services with zero shipping costs people are more inclined to access websites from neighbouring countries rather than distant ones (Goldfarb and Prince 2008).

Focussing on firms classified at the 2-digit level, Caragliu et al. (2016) tested for the relevance of different types of externalities, namely spatial proximity, labour market pooling, specialisation, diversity and learning advantages for agglomeration. Their work finds that agglomeration externalities are stronger for technology intensive industry. For as much as their results are indicative of the persisting power of agglomeration externalities for potentially ‘spaceless’ industries, similarly to Anselin et al. (1997) the authors suggest that the availability of more granularly classified firm level microdata would benefit their findings.

Examining online crowdfunding within the music industry, Agrawal et al. (2015) presented additional evidence underscoring the significance of local social networks. The study demonstrated that musicians typically receive initial funding from local supporters whom they were acquainted with before joining the crowdfunding platform. Always relating to (urban) agglomeration externalities, Busch et al. (2021) investigated how production of digital goods in cities is supported by urban context factors. The study employs a qualitative interview approach on 10 firms in Germany, discovering that urban advantages are still crucial in the production of digital goods, eventually arguing that a decoupling between such firms and cities is ‘fairly unlikely’.

Concerning inner-city firm demography, and differently from all previously presented evidence, Ramani and Bloom (2021) recently showed how COVID-19 has induced substantial change to the organisation of work and economic activity. Specifically, the study reveals that in large U.S. metropolitan regions, the pandemic caused a noticeable shift in household preferences, business locations, and real estate demand away from central business districts towards lower-density suburbs and exurban areas. This pattern suggests that the largest metropolitan areas, which tend to be the most agglomerated, have experienced the most pronounced movement of economic activities out of their city centre. Ramani and Bloom (2021)’s findings are particularly interesting in the context of this paper, as a by-product of our findings will be the inner-city locational preferences of firms.

To our knowledge, the only study yet that has addressed the reasons behind the clustering of potentially intangible firms with a robust statistical methodology and using micro-data is Arbia et al. (2012). The paper investigates the spatial distribution of high-tech firms in Milan. While the methodology used in this study, particularly the inhomogeneous K-function, will form the basis of our analysis, we note that the authors categorize high-tech manufacturing firms as representative of potentially ‘placeless’ firms, a view with which we substantially disagree. Additionally, the study’s sample size is relatively limited. Nonetheless, akin to the findings of this paper, we anticipate that if the arguments about the death of distance were to hold true, companies should cluster due to exogenous factors rather than endogenous ones.

Dataset construction

For our empirical analysis, we build a dataset of geo-located digital firms in the Greater London area (UK) by classifying text from firms’ websites into two groups: digital and non-digital. We chose Greater London as an example for the study as this allows us to focus on where economic activity is most clearly concentrated in the UK. London is at the centre of the UK tech economy, and the largest and most dynamic agglomeration in the UK. To create our dataset, we rely on the FAME enterprise dataset (Van Dijk 2020). FAME is a proprietary data source widely used in business studies (see, for instance, its use in Afrifa et al., 2022; Rizov et al., 2022; Shapira et al., 2014), and it relies on information reported by the UK registrar of companies (Companies House), while also providing the URL to companies’ websites. While we acknowledge that FAME has been criticized for undersampling smaller firms, it remains the only academically accessible source that includes firm websites. To address these concerns, we provide more general information on FAME and comprehensive validations of the dataset within the Appendix.

Dataset preprocessing

For this experiment, we collect two datasets of companies registered in the NUTS2 regions of Inner and Outer London. The first includes all companies with a website and a Standard Industrial Classification (SIC) code indicating they are potentially digital. These SIC codes, listed in Table 1, were manually selected based on our definition of the digital economy and refined following OECD (2011) and Nathan and Rosso (2015). We focus on firms whose SIC codes suggest they produce digital goods or services as outputs, rather than merely using digital tools as inputs. This approach allows us to capture companies engaged in digital production, even when their classification may obscure that activity.

Table 1.

Digital economy activities selected for analysis list of 2007 SIC codes.

SIC	ICT services
58	Publishing activities, of which we use
58210	Publishing of computer games
58290	Other software publishing
61	Telecommunications, of which we use
61100	Wired telecommunications activities
61200	Wireless telecommunications activities
61300	Satellite telecommunications activities
61900	Other telecommunications activities
62	Computer programming, consultancy and related, of which we use
62011	Ready-made interactive leisure and entertainment software
62012	Business and domestic software development
62020	IT consultancy activities
62030	Computer facilities management activities
62090	Other information technology service activities
63	Information service activities, of which we use
63110	Data processing, hosting and related
63120	Web portals
	“Knowledge Economy”
64	Financial activities
65	Insurance activities
66	Scientific research
69	Legal, accounting and tax consultancy activities
70	Public relations and financial management
71	Architectural and engineering activities
72	Scientific research
73	Advertising and market research
74	Professional, scientific and technical activities, of which we use
74100	Specialised design activities
74203	Film processing
74300	Translation and interpretation activities
74901	Environmental consulting activities
74902	Quantity surveying activities
74909	Other professional, scientific and technical activities n.e.c
77	Renting and leasing activities
78	Employment agencies
79	Tourism activities
82	Business support services activities, of which we use
82110	Combined office administrative service activities
82200	Activities of call centres
82990	Other business support service activities n.e.c
85590	Other education n.e.c
85600	Educational support services
92000	Gambling and betting activities
96090	Other service activities n.e.c

The second dataset builds on the first by including all of its firms, plus those under SIC sections C: Manufacturing, F: Construction, G: Wholesale, and I: Accommodation and Food Service Activities. We exclude sections A (Agriculture, Forestry, and Fishing) and B (Mining and Quarrying), as these sectors are geographically constrained by natural resource availability. The complete list of digital and non-digital SIC codes is provided in the supplementary materials.

To ensure a one-to-one mapping between firms and websites, we exclude websites linked to multiple companies. Most of these are owned by legal entities – often subsidiaries sharing a single parent company and website – whose inclusion would artificially inflate firm concentration in a given area. We also remove all dormant, non-trading, or dissolved companies, as well as firms with a majority shareholder (>50%) that is not a natural person (e.g. legal entities). This helps ensure we capture firms genuinely operating in the UK, rather than shell entities registered for legal or tax purposes.

To study the spatial distribution of digital and non-digital firms, we focus on their trading addresses.

FAME provides both registration and trading addresses: the former is a legal requirement for incorporation, while the latter reflects where business activities are primarily conducted. However, as firms often list the same address for both fields, we validate trading addresses using location data extracted from firm websites. We scrape the HTML content of each firm’s landing page and extract postcodes using a regular expression based on the UK Government Data Standard³. If the postcode found on the website matches the one reported in FAME, we treat the trading address as valid. If it differs, we exclude the firm from the dataset. In cases where no postcode is found on the website, we assume the FAME entry is correct. Using this procedure, we identify and remove 6% of firms due to inconsistent postcodes.

We successively remove all postcodes associated with more than 100 firms, following the reasoning that it is highly unlikely for a single postcode to physically accommodate such a large number of companies (Stich et al., 2023). This results in the exclusion of 14 postcodes, covering a total of 2,245 firms.⁴ According to the ONS (Watkins 2020), this concentration may result from factors such as the rise of management and personal service companies, or firms registering addresses in prestigious locations. However, investigating the nature of companies in these postcodes lies beyond the scope of this study.

Websites’ classification

After this first round of data pruning, our next step is building a classifier in order to understand which, between the companies we selected as being potentially digital in nature, actually trades digital goods and services. To do so, we use a contextualised weak supervision framework (Mekala and Shang 2020). Said simply, weakly supervised learning aims at building predictive models by instilling domain knowledge on the data, in this case, through the use of seed words. Unlike previous research efforts that used fully unsupervised methods like topic modelling for firm classification (e.g. Bishop et al., 2022), weakly supervised learning offers the advantage of allowing the user to define both the number of classes and their semantics, rather than relying on the model to do so. In contrast to supervised techniques, weakly supervised classification techniques only require seeds in various forms, which alleviate the burden of human experts on annotating massive documents, especially in specific domains We find this classification method to be particularly suited for the industrial classification problem at hand, as researchers and policy makers alike might wish to define their own taxonomy of firms according to their research purposes. Moreover, while other studies might use self-collected datasets to train and test their web page classifiers, in the case at hand, a lack of a comprehensive test-bed makes it challenging to compare the accuracy of different web page classifiers (Hashemi 2020).

In this context, weak supervision is formulated by generating pseudo-labels for each class to predict, composed by indicative seed words. This process is complemented by contextualised embeddings, assigning to each token in the text a different interpretation according to its context. Despite individual word embeddings models such as word2vec (Mikolov et al., 2013) having been highly effective in previous applications, these word representations remain static, meaning that they lack contextual adaptability. Since 2018, Transformer-based text contextualization models, such as BERT, have revolutionized the NLP landscape. These models are highly effective for creating meaningful, computer-readable text representations, as they can disambiguate between different interpretations of the same word and provide a more holistic under-standing of text compared to traditional bag-of-words approaches (Devlin et al., 2019). We hereby present a simple example of contextualisation: suppose, in our experiment we want to differentiate between the word ‘cloud’ in ‘cloud computing infrastructure’ versus the word ‘cloud’ in ‘having your head in the clouds’.

In Table 2, we present an example of the algorithm at work for a firm offering software development services: at each iteration, the user-provided seedwords get contextualised and disambiguated. The classification based on initial labels is subsequently followed by seed-expansion, allowing us to identify words that are both discriminative and strongly indicative of a particular class, which can then be added to the set of seed words. A contextualised method allows us to have a higher level of confidence in determining which companies actually trade digital products, hence improving obsolete industrial classification taxonomies and allowing us to provide an analysis of an economy that was previously impossible to measure. Further, although the classification is inherently interpretable, given its foundation in human-readable tokens, the proposed approach further enhances the transparency and interpretability of the model’s output. We set up our problem as a binary classification task, juxtaposing digital versus non-digital companies. To evaluate model performance for the industrial classification task, both experiments were manually validated on a random 1% sample of the results (n = 866), obtaining an F₁ score of 0.77. As the classification procedure is not the core of the paper, we direct the reader to more information about the text classification procedure is provided in the supplementary materials.

Table 2.

Table showing the seedword expansion of the software.development.services class evolution through iterations.

	Seed words for class software.development.services
Iter	Starting seedwords	Expanded seedwords
0	Software, app, development, building, solutions
1		Software, app, development, building, solutions, support, application$0
2		Systems, solutions, oracle, management, technology, cloud$1, software, development, data$1, location$1
3		Systems, solutions, oracle, intelligence, management, data$0, compliance, risk, platform, technology, analytics, cloud$1, security, software, data$1

Following the extensive dataset construction procedures described above, composed of data collection, text scraping, postcode validation and text classification, we obtain a sample of 18,399 digital firms and a sample of 68,358 non-digital firms. Considering this as a further preprocessing steps, we check whether all the steps mentioned above introduced any industrial and geographical bias into our sample. The results of the validation are presented in the supplementary materials.

Methodological pipeline

Following the creation of the digital and non-digital firms microdata, we need to adopt an appropriate statistical framework in order to understand the distribution of firms in an inhomogeneous space. To account for this inhomogeneity, we rely on methods developed in the spatial point patterns literature. The analysis of point pattern has a long tradition of being used in ecology and epidemiology, explaining the spatial distributions of species and diseases (Baddeley et al., 2015; Bivand et al., 2008). The method we utilise here is based on the Inhomogeneous K function (Baddeley et al., 2000).

First of all, by inhomogeneous space we mean that the probability of observing a firm is not constant in space. This practically implies that the probability of a firm to locate at a given location is not constant, but instead varies systematically with both observable traits, such as land price or transport proximity, and unobservable traits, like ‘buzz’, indicating the chatter and interactions in offices, bars, and restaurants within urban areas serve as a form of information transmission, a group-based, self-generating ex-change of knowledge and information that occurs outside of formal collaboration.

Specifically, a spatially heterogeneous context gives rise to the phenomenon of apparent contagion, whereby firms will tend to cluster in a specific location. For instance, firms might cluster around useful infrastructure such as train stations, or around amenities such as restaurants and cafes. In other words, the contagion is only ‘apparent’ because firms are not influencing one another to make similar choices, that is, location preferences are not contagious. In the economics jargon, these can also be referred as ‘exogenous causes’. Statistically, spatial heterogeneity is captured through the first order intensity function of a spatial point pattern λ(x), where in this context x denotes the geographical coordinates of a point. As a consequence, λ(x)dx denotes the probability that an event locates in an infinitesimal region centred on x and with a surface area dx.

On the other hand, in a scenario of true contagion the presence of one specific point x in an area would be the main attraction for firms. An instance of this phenomenon is triggered by knowledge spillovers, and the relational space discussed in in paragraph 2.1. In the economics jargon, we call these ‘endogenous effects’. Statistically, true contagion is formalized by the second-order intensity function λ₂(x, y)dxdy, expressing the probability that two events locate in two infinitesimal regions centred in x and y and with surface areas dx and dy, respectively (Diggle et al., 2007). Hence, λ₂(x, y) serves as an indicator for the anticipated extra events situated in location y in comparison to a specific event situated in location x, thereby offering a measure of spatial dependence.

In its natural environment, the clustering of firms could arise from a combination of apparent and true contagion. As our research aim is to disentangle these two effects, we will need to adopt a method capable of achieving this distinction. In the next section, we discuss the inhomogeneous K functions and its usage in the analysis of firm demography.

Descriptive statistics

To the best of our knowledge, the Inhomogeneous K function has been rarely utilised in evaluating the spatial concentration of economic activities. The inhomogeneous K-function is a cumulative distance-based measure that adheres to most of the characteristics of a ‘good concentration index’, as defined by Marcon and Puech (2010, p. 2). Specifically, there are several advantages to adopting this function: it does not rely on arbitrary definitions of spatial units (such as provinces, regions, or states); it enables the assessment of statistical significance; it facilitates comparisons across different industries, and it assesses spatial heterogeneity.

The inhomogeneous K-function is a generalisation of Ripley’s K-function (Ripley 1976, 1977) to in-stances of non-stationary point processes. Practically, in a non-stationary point-process inhomogeneity is operationalised by replacing the constant intensity λ with an intensity function varying over space λ(x). This variation is used to assess the endogenous effects of the interaction among events while adjusting for the exogenous effects of the attributes of the study area. The function follows equation (1) below, where the term d_ij is calculated as a Euclidean distance⁵

{\hat{K}}_{I} (d, λ) = \frac{1}{| A |} \sum_{i = 1}^{n} \sum_{i \neq j} \frac{I_{d} (d_{i j} \leq d)}{λ (x_{i}) λ (x_{j})}

(1)

The functional form of the first-order intensity λ(x) is estimated non-parametrically through kernel smoothing, following equation (2) below. In equation (2), h representes the bandwidth, K(x) the Kernel function and x_j − x_i the Euclidean distance between each point. In practice, the values of the function were computed using the Gaussian kde function in SciPy (Virtanen et al., 2020)

{\hat{λ}}_{h} (x_{i}) = \frac{1}{2 h} \sum_{i \neq j} K (\frac{x_{j} - x_{i}}{h})

(2)

In order to disentangle the contribution due to spatial heterogeneity or spatial dependence, we tackle the problem following Arbia et al. (2012), Krugman (1991) and Duranton and Overman (2005). Specifically, we formally assess the validity of the empirically observed inhomogeneous K-function value via building the confidence intervals for the null hypothesis of absence of endogenous effects. We estimate λ(x), the first order intensity function, on a different point pattern in which the locations represent the heterogeneity of the study area, in this case the ‘non-digital’ firms described in Section 3. Hence, in the analysis of the spatial interaction between digital firms, we estimate λ(x) on the spatial locations of all selected firms (digital and non-digital). As in a case-control study, in this approach the location of all firms constitutes the control data, while the location of digital firms constitute the case data, used for the estimation of the inhomogeneous K function. Following the epidemiological literature (specifically, see Schulz and Grimes (2002)) the control group contains instances of cases as it should be representative of the population at risk of developing an illness and being cases, or, in this context, ‘at risk’ of being digital. This design will allow us to measure the extra-concentration of digital firms with respect to all firms.

To make associations within points more easier to inspect, especially in instances with high estimator variance, Baddeley et al. (2000) suggests transforming the K-function to the L-function, as proposed in Besag (1977). The transformation is shown in equation (3) below

L_{(r)} \sqrt{\frac{K (r)}{π}}

(3)

In order to evaluate the statistical significance of the values of K^ˆ_I (d) (or L^ˆ_I (d)), we follow the inferential framework proposed in Arbia et al. (2012) and Baddeley et al. (2014). Specifically, as we do not know the exact distribution of K^ˆ_I (d), we cannot theoretically evaluate its variance. Consequently, we construct the confidence interval for the null hypothesis of absence of spatial interactions based on Monte Carlo simulations. To adjust for potential edge effect, we get a bigger hull (overscan) for simulations following Haase (1995). We provide more technical detail on the computational implementation of Baddeley’s Inhomogeneous K function in the Technical Appendix.

Inference

To attain inferential evaluation of the findings, we follow the framework proposed in Arbia et al. (2012). In practice, we simulate 999 inhomogeneous Poisson processes with the same number of points as the observed pattern. These simulations follow a probability density function that is proportional to the first-order intensity, denoted as λ(x), which we estimate within the study area. We estimate the function λ(x) using the spatial locations of all non-digital firms. Under this approach, all firm locations are part of the control data, while the functionals denoted as K^ˆ_I (d), estimated on the identified digital firms, constitute the case data. In essence, similar to a case-control design, we evaluate the excess concentration of digital firms over the overall distribution of all firms. This methodology is commonly employed in spatial epidemiology, where cases represent points related to a specific disease, and controls reflect the spatial distribution of the population, as discussed in Diggle and Rowlingson (1994).

Subsequently, we calculate an L^ˆ_I (d) function for each simulation. This process allows us to derive ap-proximate n/(n + 1)×100% confidence envelopes based on the highest and lowest values of the LˆI (d) functions obtained from the 999 simulations under the null hypothesis. Hence, these envelopes correspond to 99.9% confidence under the null hypothesis. Consequently, if the empirical LˆI (d), at a particular distance d, falls outside these envelopes, it indicates a significant deviation from the null hypothesis.

We employ a thinning sampler, which involves accepting or rejecting a Bernoulli variable based on the probability density calculated from λ^ˆ(x)/λ₀, where λ₀ corresponds to the maximum value of the estimated first-order intensity function λ^ˆ(x). In Figure 1, we present a schematic view of our working pipeline combining our data collection, classification and analysis. For more information on how to implement a thinning sampler, please refer to the technical Appendix.

Figure 1.

Our proposed framework of data preprocessing, classification, and analysis. We start by collecting firms’ websites’ urls, scraping their landing page, cleaning their text and geo-referencing the text. We then classify firms in digital and non-digital ones through applying a weak supervision algorithm to contextualised embeddings of websites’ text. Once we obtained our two distinct datasets of firms, we analyse the data through the simulating companies location patterns.

Results and discussion

The Gaussian kernel density estimates for the point processes of digital and non-digital firms are shown in Figure 2. The Gaussian KDE is computed with SciPy using Scott’s Rule for optimal bandwidth estimation.⁶ This first descriptive set of results shows that firms intensities for digital and non-digital firms in London are almost equal, with a higher concentration in the central boroughs (Camden, City of Westminster, City of London). For both types of firms, we find pockets of significant concentration more peripheral boroughs such as Kingston and Richmond upon Thames (South-West), Harrow and Barnet (North-West), Enfield, Waltham Forest and Redbridge (North/North-East) and Croydon (South). Oppositely, we also observe lower densities in East London boroughs such as Havering and Barking and Dagenham. A reader who is informed about the socio-economics of London, will not find these results particularly surprising. Indeed, East London boroughs are some of the most deprived of the whole UK (Smith et al., 2015). Turning back to our research question, the observed densities are coherent with our literature informed theoretical expectations, in that we do not find striking differences when comparing the distribution in space of digital and non-digital companies. These results serve as a first indication of the exaggerated ‘space-flattening’ properties attributed to the digital: if space was so irrelevant to entrepreneurship in the digital, we would expect a smoother distribution across London.

Figure 2.

Non-parametric Gaussian kernel smoothing estimate of the intensity of our two datasets of firms within the boundaries of the NUTS 2 Greater London region. The bandwidth parameter is set to h = 1 km. (a) KDE of digital firms. (b) KDE of non-digital firms.

Figure 3 showcases our second set of results, and one of our main contributions to the literature. The figure displays the behaviour of the L^ˆ_I (d) function at various distances d for firms producing digital goods. Within the figures we also display as dashed lines the confidence envelopes referred to the null hypothesis of absence of interaction at a significance level α = 0.01. As explained while discussing our methodology in Section 4, the values of L^ˆ_I (d) transformation are essentially interpretable in graphical terms. This means that when the value of function L^ˆ_I at distance d is above the confidence envelopes, we observe a significant level of spatial concentration. Conversely, if the values assumed by L^ˆ_I were to be under the envelopes, we would observe significant spatial dispersion. The function clearly indicates spatial interaction for digital firms. Specifically, the results show the presence of non-statistically significant clustering at lower distances (<5 km), and of highly statistically significant clustering at higher distances (>10 km). In practice, this means that digital firms don’t really cluster along a street or in a single city block, but cluster within specific districts or areas, or in this case boroughs.

Figure 3.

Behaviour of the estimated inhomogeneous L-functions (solid line) and of the corresponding 99.9% confidence envelopes under the null hypothesis (dashed lines) of our identified digital firms.

Reflecting on the results presented in Figure 3 allows us to disentangle the whether the clustering of digital firms observed in Figure 2 is motivated by firms need for physical interaction or by specific characteristics of the area they are located in. As the L^ˆ_I function quantifies the amount of spatial concentration that cannot be explained by exogenous factors such as presence of infrastructure, cultural amenities or regulatory incentives, we can infer that digital firms cluster due to a genuine need to interact between each other. These findings are coherent with our working hypothesis, motivated by advancements in the literature demonstrating that the digital would have not meant the abandonment of physical interactions between firms. In other words, our empirical findings confirm that agglomeration externalities might still play a role in the distribution of potentially ‘placeless’ industries, suggesting that the Internet and digitisation are yet to counterbalance the urban advantages associated with spatial proximity.

Overall, our results provide a strong empirical contribution to the demystification of discourse portraying the digital economy as providing opportunities notwithstanding geography. In doing so, it empirically confirms narratives on the extant value of the ‘relational space’ in the digital, confirming the role of urban areas in the attraction of knowledge-intensive production (Busch et al., 2021; Capello and Faggian 2005; Florida et al., 2023). While our results are in line with recent findings by Balland et al. (2020), we wish to highlight how our study proposes a causal and more granular, within-city approach. Although our results highlight the importance of endogenous agglomeration externalities for digital industries, we cannot make overly strong claims about the phenomenon. This is because our data could have been processed at a higher level of granularity, that is, we are yet to test our claims for different types of digital firms. Indeed, different types of firms produce different typologies of knowledge, and might consequently exhibit different patterns of agglomeration (Asheim et al., 2007; Storper and Venables 2004). We hence consider this paper and its result as a starting point for researchers to analyse the clustering of different types of digital firms and further contribute to the literature on the ‘paradoxical geographies of the digital’ (Moriset and Malecki 2009). We believe that adopting this approach contributes to clarifying the need for cluster policy in digital industries, as well as demonstrating how spatial microdata can help with the analysis firm demography, their location, and the impact that this has on their survival in the longterm. Considering the TechCity cluster in Shoreditch as a case study, Nathan (2022) proposed a causal interpretation of whether cluster policy has an impact, highlighting that although such policies are popular among policymakers, we know surprisingly little about their effectiveness. The study finds that digital content firms experienced higher revenue and more high-growth activity in the post-policy period than firms in comparable areas. Taken together with our own findings, this suggests that clustering effects remain significant even in sectors characterized by intangible outputs.

Accordingly, we argue that targeted policies to support urban clusters – such as investment in co-working infrastructure, support for local knowledge-sharing ecosystems, and place-based innovation incentives – can be effective tools for fostering digital and knowledge-intensive industries. In addition, our methodological pipeline provides a scalable way to monitor and detect emerging clusters in near real-time, particularly where traditional economic statistics fall short. These findings offer policymakers both empirical grounding and actionable insights for designing more responsive, spatially aware strategies in the digital economy.

Finally, the results obtained in this experiment emphasize the value of adopting multidisciplinary research pipelines in geography. Indeed, while our experiments are heavily influenced by the work of Arbia et al. (2012), had we not adopted an NLP framework to identify digital firms, we would have not been able to provide an analysis of the digital economy effectively. Contrarily to our work, Arbia et al. (2012) equates placeless industries are equated to ‘high tech’ manufacturing ones. In proposing such multidisciplinary working framework we also answer Anselin et al. (1997)’s call advocating for the usage of micro-data in the investigation of economic clustering. Moreover, we believe to have practically confirmed Haefner and Sternberg (2020, p. 10)’s claims on how this field of research is ‘a perfect arena for empirically skilled economic geographers’. By doing so, our hope is to further motivate geographers to engage in problems spanning over diverse methodological areas.

Conclusions and suggestions for further research

In this paper, we tackled two main objectives. First, we clearly demonstrated that digital and non-digital firms in the Greater London area showcase similar patterns of agglomeration. Second, we found that such clustering is motivated by true contagion, that is, firms need to interact with each other. As discussed within the literature review, this might be motivated by the power of knowledge externalities. By investigating these phenomena, our aim was to further demystify digital transformation narrative suggesting that digital technologies will contribute to the spreading of firms regions. Further, this work contributes to governmental and wider policy work towards the understanding of the benefits of clustering policies for levelling up. Our findings strengthen the case for policies that support clustering as a driver of innovation and regional development, even in the digital economy. We situate our work in an extensive theoretical framework touching upon the main reasons causing industrial clustering. We notice how most of the previous empirical literature on the spatial footprints of the digital economy did not approach the problem with microdata, and when it did, it equated digital firms with tech-manufacturing ones. To solve this problem, we based our analysis on digital economic microdata identified through contextualised weak supervision applied to firms website data. By doing so, we demonstrate the potential of internet trace data (in this case, websites) can expand our understanding of firm demographies. To analyse such data, in this paper we have utilised a set of tools proposed in spatial statistical literature, ultimately allowing us to disentangle the motivations behind firms clustering in a city. These tools belong to the family of K-functions and fall within the domain of Point Pattern Analysis. They were first introduced in the epidemiological context in the seminal work of Diggle et al. (1995) and, to the best of our knowledge, this is the second time (after the work of Arbia et al., 2012) that they have been used in economic geography.

While the study is focused on London, leveraging the availability of rich administrative data and firm-level website information, the framework and methods we propose are broadly transferable to other urban contexts. Importantly, the proposed classification approach relying on firm website content and weak super-vision via seed words, is adaptable to different regional settings where similar data is available. However, we acknowledge that the patterns of digital firm formation and spatial clustering may vary with local institutional, economic, and infrastructural conditions. Specifically, our classification deliberately excludes firms that offer both digital and non-digital outputs, as these ‘mixed’ firms pose challenges both conceptually and empirically. Thus, this study should be seen as a first step, one that opens the way to deeper investigations into the structure, dynamics, and spatial footprints of the digital economy across geographies. Much remains to be uncovered about its full structure, dynamics, and spatial footprints, particularly across different geographic contexts. It is crucial to note that this work was conducted prior to the emergence of open-source, commercially available large language models. Since then, new methods have been proposed holding the promise to improve large-scale unsupervised text mining with minimal human effort (see, e.g. Wan et al., 2024). We believe that applying such approaches to firm-level text data has the potential to generate more accurate and granular taxonomies, thereby enabling a finer understanding of firm localization processes at the micro-scale. Such advances would also provide policymakers with more precise insights to design place-based industrial and innovation policies.

Supplemental Material

Supplemental Material - Not so ‘placeless’ after all. Understanding the spatial implications of the digital economy

Supplemental Material for Not so ‘placeless’ after all. Understanding the spatial implications of the digital economy by Giulia Occhini, Levi John Wolf, Emmanouil Tranos in Environment and Planning B: Urban Analytics and City Science.

Footnotes

ORCID iDs

Giulia Occhini

Levi John Wolf

Emmanouil Tranos

Funding

Giulia Occhini’s Doctoral Studentship is funded by The Alan Turing Institute via EPSRC grant EP/N510129/1.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Supplemental Material

Supplemental material for this article is available online.

Notes

Author biographies

Giulia Occhini is a PhD student at the University of Bristol, funded by The Alan Turing Institute.

Levi John Wolf is an Associate Professor of Spatial Analysis at the University of Bristol.

Emmanouil Tranos is a Professor of Quantitative Human Geography at the University of Bristol.

References

Afrifa

Amankwah-Amoah

Yamoah

, et al. (2022) ‘Regional development, innovation systems and service companies’ performance’. Technological Forecasting and Social Change 174: 121258.

Agrawal

Catalini

Goldfarb

(2015) ‘Crowdfunding: geography, social networks, and the timing of investment decisions’. Journal of Economics and Management Strategy 24(2): 253–274.

Anselin

Varga

Acs

(1997) ‘Local geographic spillovers between university research and high technology innovations’. Journal of Urban Economics 42(3): 422–448.

Arbia

Espa

Giuliani

, et al. (2012) ‘Clusters of firms in an inhomogeneous space: the high-tech industries in milan’,. Economic Modelling 29(1): 3–11.

Ash

Kitchin

Leszczynski

(2018) ‘Digital turn, digital geographies?’. Progress in Human Geography 42(1): 25–43.

Asheim

Coenen

Vang

(2007) ‘Face-to-face, buzz, and knowledge bases: sociospatial implications for learning, innovation, and innovation policy’. Environment and Planning C: Government and Policy 25(5): 655–670.

Audretsch

Vivarelli

(1994) ‘Small firms and r&d spillovers: evidence from italy’. Revue d’´economie industrielle 67(1): 225–237.

Autant-Bernard

(2001) ‘The geography of knowledge spillovers and technological proximity’. Economics of Innovation and New Technology 10(4): 237–254.

Baddeley

Møller

Waagepetersen

(2000) ‘Non-and semi-parametric estimation of interaction in inhomogeneous point patterns’. Statistica Neerlandica 54(3): 329–350.

10.

Baddeley

Diggle

Hardegen

, et al. (2014) ‘On tests of spatial pattern based on simulation envelopes’. Ecological Monographs 84(3): 477–489.

11.

Baddeley

Rubak

Turner

(2015) Spatial Point Patterns: Methodology and Applications with R. CRC Press.

12.

Balland

P-A

Jara-Figueroa

Petralia

, et al. (2020) ‘Complex economic activities concentrate in large cities’. Nature Human Behaviour 4(3): 248–254.

13.

Besag

(1977) ‘Discussion of the paper by ripley (1977)’. Journal of the Royal Statistical Society: Series B 39: 193–195.

14.

Bishop

Mateos-Garcia

Richardson

, et al. (2022) Using text data to improve industrial statistics in the uk, technical report, economic statistics centre of excellence (ESCoE).

15.

Bivand

Pebesma

G´omez-Rubio

, et al. (2008) Applied Spatial Data Analysis with R. Springer.

16.

Blum

Goldfarb

(2006) ‘Does the internet defy the law of gravity?’. Journal of International Economics 70(2): 384–405.

17.

Brynjolfsson

Kahin

(2002) Understanding the Digital Economy: Data, Tools, and Research. MIT press.

18.

Budnitz

Tranos

(2022) ‘Working from home and digital divides: resilience during the pandemic’. Annals of the Association of American Geographers 112(4): 893–913.

19.

Busch

H-C

Mu¨hl

Fuchs

, et al. (2021) ‘Digital urban production: how does industry 4.0 reconfigure productive value creation in urban contexts?’. Regional Studies 55(10-11): 1801–1815.

20.

Cairncross

(2002) ‘The death of distance’. RSA Journal 149(5502): 40–42.

21.

Capello

Faggian

(2005) ‘Collective learning and relational capital in local innovation processes’. Regional Studies 39(1): 75–87.

22.

Capello

Lenzi

(2021) The Regional Economics of Technological Transformations: Industry 4.0 and Servitisation in European Regions. Routledge.

23.

Caragliu

de Dominicis

de Groot , et al. (2016) ‘Both marshall and jacobs were right!’. Economic Geography 92(1): 87–111.

24.

Castells

(1996) ‘The space of flows’. The rise of the network society 1: 376–482.

25.

Chapple

Leong

Huang

, et al. (2022) ‘The death of down-town? pandemic recovery trajectories across 62 north american cities’.

26.

Corrado

Haskel

Jona-Lasinio

, et al. (2022) ‘Intangible capital and modern economies’. The Journal of Economic Perspectives 36(3): 3–28.

27.

Dalziel

Yang

Breslav

, et al. (2018) ‘Can we design an industry classification system that reflects industry architecture?’. Journal of Enterprise Transformation 8(1-2): 22–46.

28.

Devlin

Chang

M. W.

Lee

Toutanova

(2019). Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologiesand short papers), volume 1 (long and short papers) (pp. 4171-4186

29.

Diggle

(2013) Statistical Analysis of Spatial and spatio-temporal Point Patterns. CRC Press.

30.

Diggle

Rowlingson

(1994) ‘A conditional approach to point process modelling of elevated risk’. Journal of the Royal Statistical Society: Series A 157(3): 433–440.

31.

Diggle

Chetwynd

H¨aggkvist

, et al. (1995) ‘Second-order analysis of space-time clustering’. Statistical Methods in Medical Research 4(2): 124–136.

32.

Diggle

G´omez-Rubio

Brown

, et al. (2007) ‘Second-order analysis of inhomogeneous spatial point processes using case–control data’. Biometrics 63(2): 550–557.

33.

Disdier

A-C

Head

(2008) ‘The puzzling persistence of the distance effect on bilateral trade’. The Review of Economics and Statistics 90(1): 37–48.

34.

Duffy

Hund

(2015) “Having it all” on social media: entrepreneurial femininity and self-branding among fashion bloggers’. Social Media + Society 1(2): 2056305115604337.

35.

Duranton

Overman

(2005) ‘Testing for localization using micro-geographic data’. The Review of Economic Studies 72(4): 1077–1106.

36.

(2019) ‘Levelling the playing field? towards a critical-social perspective on digital entrepreneurship’. Futures 19(1): 114–124.

37.

Marlow

Martin

(2017) ‘A web of opportunity or the same old story? women digital entrepreneurs and intersectionality theory’. Human Relations 70(3): 286–311.

38.

Feldman

(2003) The locational dynamics of the US biotech industry: knowledge externalities and the anchor hypothesis. Industry and innovation 10(3): 311–329.

39.

Florida

(2002) ‘The Rise of the Creative Class: And How It’s Transforming Work, Leisure, Community and Everyday Life’. Hazard Press.

40.

Florida

Rodr´ıguez-Pose

Storper

(2023) ‘Critical commentary: cities in a post-covid world’. Urban Studies 60(8): 1509–1531.

41.

Friedman

(2002) The World is Flat: A Brief History of the twenty-first Century. Farrar, Straus and Giroux New York, Vol. 19.

42.

(2024) ‘Natural language processing in urban planning: a research agenda’. Journal of Planning Literature 08854122241229571.

43.

Gilder

(1995) Forbes ASAP. February 27. Quoted in Mitchell Moss, “Technology and Cities. Cityscape 3: 3.

44.

Goldfarb

Prince

(2008) ‘Internet adoption and usage patterns are different: implications for the digital divide’. Information Economics and Policy 20(1): 2–15.

45.

Goldfarb

Tucker

(2019) ‘Digital economics’. Journal of Economic Literature 57(1): 3–43.

46.

Gyourko

Mayer

Sinai

(2013) ‘Superstar cities’. American Economic Journal: Economic Policy 5(4): 167–199.

47.

Haase

(1995) ‘Spatial pattern analysis in ecology based on ripley’s k-function: introduction and methods of edge correction’. Journal of Vegetation Science 6(4): 575–582.

48.

Haefner

Sternberg

(2020) ‘Spatial implications of digitization: state of the field and research agenda’. Geography Compass 14(12): e12544.

49.

Hashemi

(2020) Web page classification: a survey of perspectives, gaps, and future directions. Multimedia Tools and Applications 79: 1–25.

50.

Haskel

Westlake

(2017) Capitalism Without Capital, in ‘Capitalism Without Capital’. Princeton University Press.

51.

Iammarino

Rodr´ıguez-Pose

Storper

(2019) ‘Regional inequality in europe: evidence, theory and policy implications’. Journal of Economic Geography 19(2): 273–298.

52.

Krugman

(1991) Geography and Trade. MIT Press.

53.

Leamer

Storper

(2001) ‘The economic geography of the internet age’. Journal of International Business Studies 32: 641–665.

54.

Leszczynski

Elwood

(2015) ‘Feminist geographies of new spatial media’. Canadian Geographies / Géographies canadiennes 59(1): 12–28.

55.

Marcon

Puech

(2010) ‘Measures of the geographic concentration of industries: improving distance-based methods’. Journal of Economic Geography 10(5): 745–762.

56.

Marshall

(1890) Principles of economics. Macmillan and Co.

57.

Mekala

Shang

(2020) Contextualized weak supervision for text classification. In: ‘Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics’, pp. 323–333. Association for Computational Linguistics.

58.

Michell

TWH

(2002) The Psychasthenia of Deep Space. Evaluating the Reassertion of Space in Critical Social Theory. University of London University College London.

59.

Mikolov

Chen

Corrado

Dean

(2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 .

60.

Mitchell

(1996) City of Bits: Space, Place, and the Infobahn. MIT press.

61.

Morgan

(2001) ‘The exaggerated death of geography: localised learning, innovation and uneven devel-opment’. Technology 20: 23.

62.

Moriset

Malecki

(2009) ‘Organization versus space: the paradoxical geographies of the digital economy’. Geography Compass 3(1): 256–274.

63.

Moss

(1998) ‘Technology and cities’. Cityscape: 107–127.

64.

Nambisan

Wright

Feldman

(2019) ‘The digital transformation of innovation and entrepreneur-ship: progress, challenges and key themes’. Research Policy 48(8): 103773.

65.

Nathan

(2022) ‘Does light touch cluster policy work? evaluating the tech city programme’. Research Policy 51(9): 104138.

66.

Nathan

Rosso

(2015) ‘Mapping digital businesses with big data: some early findings from the UK’. Research Policy 44(9): 1714–1733.

67.

Negroponte

(1995) Being digital. Alfred A. Knopf.

68.

OECD (2011) OECD Guide to Measuring the Information Society 2011. https://www.oecd-ilibrary.org/content/publication/9789264113541-en

69.

Overman

Redding

Venables

(2003) The economic geography of trade, production, and income: a survey of empirics. Handbook of international trade,: 350–387.

70.

Papagiannidis

See-To

Assimakopoulos

, et al. (2018) ‘Identifying industrial clusters with a novel big-data methodology: are SIC codes (not) fit for purpose in the internet age?’. Computers & Operations Research 98: 355–366.

71.

Quah

(1999) The weightless economy in economic development (Centre for Economic Performance Discussion Paper No. 417). London School of Economics and Political Science.

72.

Ramani

Bloom

(2021) The Donut effect of COVID-19 on cities (No. w28876). National Bureau of Economic Research.

73.

Reades

Crookston

(2021) Why face-to-face Still Matters: The Persistent Power of Cities in the Post-pandemic Era. Policy Press.

74.

Ripley

(1976) ‘The second-order analysis of stationary point processes’. Journal of Applied Probability 13(2): 255–266.

75.

Ripley

(1977) ‘Modelling spatial patterns’. Journal of the Royal Statistical Society - Series B: Statistical Methodology 39(2): 172–192.

76.

Rizov

Vecchi

Domenech

(2022) ‘Going online: forecasting the impact of websites on produc-tivity and market structure’. Technological Forecasting and Social Change 184: 121959.

77.

Rodr´ıguez-Pose

Crescenzi

(2008) ‘mountains in a flat world: why proximity still matters for the location of economic activity’, Cambridge journal of regions. Economy and Society 1(3): 371–388.

78.

Sassen

(1994) Cities in a World Economy. Thousand Oaks, CA: Pine forge’.

79.

Saxenian

(1996) Regional Advantage: Culture and Competition in Silicon Valley and Route 128, with a New Preface by the Author. Harvard University Press.

80.

Schulz

Grimes

(2002) ‘Case-control studies: research in reverse’. Lancet (London, England) 359(9304): 431–434.

81.

Shapira

G¨ok

Klochikhin

, et al. (2014) ‘Probing “green” industry enterprises in the uk: a new identification approach’. Technological Forecasting and Social Change 85: 93–104.

82.

Sheppard

(2002) ‘The spaces and times of globalization: place, scale, networks, and positionality’. Economic Geography 78(3): 307–330.

83.

Sinai

Waldfogel

(2004) ‘Geography and the internet: is the internet a substitute or a complement for cities?’. Journal of Urban Economics 56(1): 1–24.

84.

Smith

Noble

, et al. (2015) ‘The English Indices of Deprivation 2015’. Department for Communities and Local Government, 1–94.

85.

Stich

Tranos

Nathan

(2023) ‘Modeling clusters from the ground up: a web data approach’. Environment and Planning B: Urban Analytics and City Science 50(1): 244–267.

86.

Storper

Venables

(2004) ‘Buzz: face-to-face contact and the urban economy’. Journal of Economic Geography 4(4): 351–370.

87.

Thompson

(2021) ‘Superstar cities are in trouble’. https://www.theatlantic.com/ideas/archive/2021/02/remote-work-revolution/617842/

88.

Torre

(2008) ‘On the role played by temporary geographical proximity in knowledge transmission’. Regional Studies 42(6): 869–889.

89.

van Dijk

. (2020) ‘FAME — UK and Ireland Company Data’.

90.

Virtanen

Gommers

Oliphant

, et al. (2020) ‘SciPy 1.0: fundamental Algorithms for Scientific Computing in Python’. Nature Methods 17: 261–272.

91.

Wahome

Graham

(2020) ‘Spatially shaped imaginaries of the digital economy’, information. Information, Communication & Society 23(8): 1123–1138.

92.

Wan

Safavi

Jauhar

, et al. (2024) Tnt-llm: text mining at scale with large language models. In: ‘Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining’, pp. 5836–5847. Association for Computing Machinery.

93.

Watkins

(2020) ‘Multiple business registrations at a single postcode: 2020’. https://www.ons.gov.uk/businessindustryandtrade/business/activitysizeandlocation/methodologies/multiplebusinessr

94.

Weber

(1965) ‘Theory of the location of industries, 1909’. CSISS Classics. https://escholarship.org/uc/item/1k3927t6

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

1.83 MB