Abstract
The study of the built environment is evolving with digital advancements and the emergence of a big data era, opening up new possibilities for planning practice and research. However, the integration of digital tools in research and practice calls for consideration of methodological questions. In this article, we compare two studies. One is our own ‘small data’ case study research and the other is a ‘big data’ approach recently published in this journal. Both studies discuss housing space standards – internal floorspace – in the context of a deregulated planning policy known in England as ‘permitted development’ (PD) relating to office-to-residential conversion schemes. Not only do these studies differ methodologically but also in results: our own case studies found that the majority of PD housing units do not meet recommended space standards. This finding is consistent with other in-depth studies on the same issue but is contradicted by the big data study that yielded different results. By reference to example conversion schemes, we argue that to understand the space standards issue, a more in-depth small data approach is more reliable than relying solely on secondary data sets and a big data approach. This illustrates a need for wider debate as to when big data is beneficial and when it can be misleading, particularly if being utilised to criticise evidence from alternative, more detailed data approaches. We conclude that it is crucial that academic discussions on different methodological approaches are conducted with respect, openness and transparency regarding the suitability of different approaches.
Introduction: A time of change in data and research methodologies?
The study of the built environment and associated processes by human geographers and planning scholars is undergoing a time of change. Although planning practice has long involved quantitative data as part of the survey-analysis-plan process, planning scholars have tended to make greater use of qualitative approaches over recent decades following human geographers in the qualitative revolution.
This is, however, changing. In planning practice, a UK government policy proposal ‘white paper’ suggested a need to ‘take a radical, digital-first approach to modernise the planning process. This means moving from a process based on documents to a process driven by data’ (MHCLG, 2020: 21). It is argued that a digital revolution is under way and that ‘big data’ and digital technology will open up new possibilities to examine planning questions, with better data helping to unlock the potential of spatial planning (Batty and Wang, 2022).
In academia, university geography courses are increasingly including quantitative or data-led streams and, as Franklin (2022) has written, there has been a ‘rapid expansion’ of the range of methods in quantitative human geography with growth in ‘data types and provenance but also related to subtle changes in the types of research done in the field’ (p. 689). Meanwhile, Kandt and Batty (2021) argue that the analysis of big data is defining a new era in urban research and planning, governance and policy. This reflects a broader ‘computational turn’ in thought and research (Burkholder, 1992 in boyd and Crawford, 2012) with a wave of enthusiasm for big data sweeping through the social sciences. Indeed, a data revolution is underway with data science methods ‘ever more broadly adopted and deeply entrenched at universities’ (Tanweer et al., 2021: 2).
Although the emergence of the socio-technological phenomenon that is this big data era has been widely commented upon, we believe it remains important to question ‘the assumptions, values and biases of this new wave of research’ (boyd and Crawford, 2012: 675) and the implications of utilising at times decontextualised information as a form of data short-cut. This is particularly timely given calls for digitally enabled planning which say very little about data quality or the consequences for scholarship where there are areas of inherent uncertainty in the approaches involved.
In this article, we highlight why we think this is important by considering the case of space standards – the internal floorspace – of housing created through a deregulated planning route called ‘permitted development’ (PD) in England. In this, our own research (taking a ‘small data’ case study approach) has produced very different findings (that 70%–78% of conversions would fail to meet suggested space standards) to Chng et al. (2024; published in this journal, taking a ‘big data’ approach, and finding just 14%–34% of conversions would fail to meet suggested space standards). As such this article acts as a response to Chng et al. (2024), but also seeks to broaden the debate. First, we briefly consider the rise of big data and existing scholarship on the implications of this for research, before explaining the issue of PD housing space standards specifically. We then consider some of the key issues that we think are caused by understanding these built environment outcomes through a big data type analysis, before drawing some more general conclusions.
The rise of ‘big data’
The need for a renewed conversation about research methods to understand built environment processes and outcomes is driven by the rise of ‘big data’, something which becomes ever more pressing with the rapid development of Artificial Intelligence. It has been argued we are at the start of a big data ‘revolution’ (Batty, 2016). There are various definitions of big data. Batty suggested that it was ‘any data that cannot fit into an Excel spreadsheet’ (Batty, 2013: 274) while Kitchin talks about ‘data sets that are characterised by high volume, velocity, variety, exhaustivity, resolution and indexicality, relationality and flexibility’ (Kitchin, 2013: 262). Batty (2016) further notes that big data are often the product of automation, for example as a result of real-time streaming from fixed or mobile sensors, albeit having commented elsewhere that there are also many big data sets that are human generated (Batty, 2013).
The era involves not just data sets themselves but also their analysis: boyd and Crawford define big data as a ‘cultural, technological and scholarly phenomenon that rests on the interplay of technology, analysis, mythology’ (boyd and Crawford, 2012: 662) while Franklin (2023) highlights the importance of methodological change (big data are often analysed using ‘big code’) and the associated emergence of new forms of information. Built environment disciplines, such as real estate, where quantitative analyses of substantial datasets have historically been more typically integrated into research, also see the emergence of ‘big data’ as a significant regime shift (DeLisle et al., 2020), one which is extending into ‘PropTech’ (Braesemann and Baum, 2020), and other digital technologies, such as machine learning (Pérez-Rave et al., 2019). Big data has been harnessed to forecast market trends (Grybauskas et al., 2021), examine online listings (Lee and Lee, 2023) and evaluate valuation methods, such as automation (Kok et al., 2017). However, with valuation, the accuracy, reliability and form of data is key to avoiding the ‘garbage in, garbage out’ phenomenon (Jennifer, 2021), reinforcing the methodological importance of how urban big data can be used effectively for built environment research. Barkham et al. (2018) discuss how big data technologies can contribute towards improving urban quality of life and be socially useful, when data is collected and analysed effectively – with smart technologies inherently making ‘our cities cleaner, leaner and more efficient’ (p. 33).
With much of this big data being geographic in nature and either explicitly or implicitly spatially or temporally referenced, there are suggestions of new opportunities for analysis about our world including in relation to cities, urban systems and the built environment (Graham and Shelton, 2013; Kitchin, 2013). Davenport and Patil (2012: 72) go so far as to suggest that being a data scientist is the ‘sexiest job of the Twenty-First Century’ and it has been argued that big data is ‘certainly enriching our experiences of how cities function, . . . offering many new opportunities for social interaction and more informed decision-making with respect to our knowledge of how best to interact in cities’ (Batty, 2013: 277).
Yet it has also been argued that many of the practices of the ‘big data paradigm’ have resonances with ‘geography’s quantitative revolution that as a movement began in the 1950s’ (Barnes, 2013: 298) to which the qualitative revolution was at least in part a reaction. Over a decade ago, Kitchin (2013) argued that ‘big data are undoubtedly going to become a key part of geographic scholarship . . . [but] . . . Big data poses a number of epistemological, methodological and ethical questions’ (p. 266). More recently, Arribas-Bel and Bakens (2019: 868) raise the need to consider ‘when big data is useful and when it is misleading’. The rise of Artificial Intelligence is also relevant here. As Allam and Dhunny (2019) explain, big data can be analysed by using AI, while Sanchez et al. (2023) call for researchers, through their own work, to engage planning practitioners in the design and development of AI applications which may help encourage its further adoption. Yet more fundamental underlying questions about big data remain. We consider existing scholarship on some of these issues in the next section, before turning to our own example of space standards to suggest the continued relevance of debate about use of big data and pressing need for further methodological reflection in studies related to housing, planning and the built environment.
Big data: Big uncertainties, big issues?
A number of scholars see great potential in the big data revolution, including the ability to see new trends and patterns and to generate understanding that would not be possible with smaller data sets, with big urban data able to support the generation of new hypotheses and give new evidence (e.g. Kandt and Batty, 2021). As Kitchin (2013) argues ‘it is clear that academics from across a broad range of fields and disciplines are engaging with big data to make pronouncements about geographical processes and phenomena’ (p. 264) and thus it seems all the more important for human geographers and planning scholars to engage critically with big data and so avoid missing its opportunities.
Part of critical social science scholarship should be asking questions about data collection and analysis, including being able to understand the properties and limitations of data sets no matter their size (boyd and Crawford, 2012). Unfortunately, part of this big data revolution seems to involve echoes of the earlier quantitative revolution whereby computational techniques and an ‘avalanche of numbers become ends in themselves, disconnected from what is important. That is, techniques and numbers become fetishized, put on a pedestal, prized for what they are rather than for what they do’ (Barnes, 2013: 299). There is a need to avoid data as noise and as an end in itself, analysed just because they exist, or a conflation of data with knowledge.
One key issue seems to be the reductionist idea that ‘big data can speak for themselves and does not require contextual or domain-specific knowledge with regard to analysis and interpretation’ (Kitchin, 2013: 264). While Anderson’s suggestion (2008 in boyd and Crawford, 2012) that with bigger data the numbers can just speak for themselves has been widely critiqued, concern remains that ‘what is often lost is context . . . This was one of the warnings of humanistic geographers who attacked geography’s quantifiers during the 1970s’ (Barnes, 2013: 299). This can include a fetishisation of data over real-world importance and lack of interpretation as well as context. This may be particularly important where data are being used beyond their original intent, such as administrative data relating to tax being used as a proxy to understand property ownership or commercial data around mobile phones and credit cards to try to understand mobility or gentrification. There is also a continuing importance for theory to help explain causal relationships and underlying processes (Franklin, 2023) to address concerns that big data could otherwise ‘obscure, more than reveal, the complexity of social and spatial processes’ (Graham and Shelton, 2013: 255).
Similarly, and with reference to empirical data in a study examining the possibilities and the limitations of using online content to study informality in Shanghai’s housing market, Harten et al. argue: With the excitement of new data streams comes the need for greater critical thinking about this data . . . Big data proponents stake claims to greater objectivity, neutrality and accuracy. With enough volume, so the argument goes, data can speak for itself, transcending context, heralding the ‘end of theory’ (Anderson, 2008). But no data is value-free, no matter how great its volume and velocity. Social inequity and politics in the material world also shape the geography of the digital world. . . Our study of Shanghai’s group rental housing market also provides an opportunity to explore the limitations of scraped data. Through groundtruthing fieldwork, we discovered that the data we had scraped was systematically misrepresenting actual market conditions (Harten et al., 2021: 1832).
Alongside this are issues raised by Franklin (2022) about ‘representativeness, selection bias and other forms of missingness’ which can give uncertainty in all analyses but are perhaps more difficult to understand and measure here, so that rather than larger sample sizes obviating concern about uncertainty, instead ‘bigger data can entail bigger problems’ (p. 691). Data quality often concerns representativeness, and it is suggested there is need for particular care about ‘accidental data’, that is data being used in an analysis for something other than their original purpose (Arribas-Bel, 2014). Big data themselves, of course, are generated by social and political processes and structures which are also important to consider (Franklin, 2023).
Although these important issues have been discussed previously, we remain concerned about the extent to which the limits of data sets are considered or acknowledged in big data analyses and the extent to which appropriate methodological choices are made when the lure of a complex big data set presents itself. Kitchin (2013: 264) suggested that there was a need for careful thinking about the fundamental epistemological questions raised by big data, including over how useful, valid information can be extracted, analysed and understood from the ‘data deluge’ to produce meaningful outcomes. We believe there is still further need for reflection and conversation about this, as illustrated by the specific example of the space standards of PD housing in England.
Investigating space standards in permitted development housing
Permitted development (PD) is a form of planning deregulation in the UK under which certain types of development are permitted on a national basis rather than needing a specific locally granted planning permission. PD bypasses the need for consideration of the principle of development (such as whether it is appropriate in that location) and its design (including the size and mix of housing being provided). This became a particularly important issue in 2013 when, as part of a neoliberal infused agenda framing planning as a ‘barrier’ to the free market delivering more housing (Ferm et al., 2021), PD was expanded in England to include the conversion of office buildings into residential use, the first time that new housing could be created under PD since universal planning control was introduced in 1948. Instead of applying for full planning permission, the developer notifies the local planning authority of their intention to convert under PD and a more limited ‘prior approval’ process then happens.
This deregulation for office-to-residential change of use was then extended to retail-to-residential and light industrial-to-residential in 2015, but concern soon emerged over the quality of housing being created. Some of us investigated this through research published in 2018 (Clifford et al., 2018, also reported in Ferm et al., 2021 and Canelas et al., 2022), 2019 (Clifford et al., 2019) and all of us through a major review published in 2020 (Clifford et al., 2020). The 2018 and 2020 reports took an approach of selecting case study local authority areas on the basis of high rates of PD schemes but differing characteristics of authority. All schemes which potentially had been converted under PD in that authority were visited in person, and case study conversions then selected on the basis of providing a cross-section of typical implemented schemes in those locations, complemented by a detailed analysis of the associated planning documents. The 2019 report had a similar methodological approach of site visits and detailed planning documentation analysis but for schemes anywhere in England brought to the attention of a planning charitable organisation (the TCPA) through an online call, thereby potentially less representative of the types of PD conversion being seen. We would consider our 2020 study our most comprehensive work on the quality of PD housing to date, covering all office-, industrial- and retail-to-residential conversions in 11 local planning authority areas.
In all of our reports, a major issue was the quality of the housing created. With this no longer under the same level of scrutiny and control of the local state but essentially at the whim of the developer, it varied enormously. There were particular issues around space standards. The internal space available in housing has long been noted as being of particular importance to people’s quality of life. Lack of space can impact the ability, for example, to have a meaningful social life and provide sufficient privacy. It can negatively impact children’s development and, at the most extreme, can lead to overcrowding which can in turn promote the spread of infectious disease and harm physical health (Clifford and Ferm, 2021; Raphael, 2016). Space standards can be linked to issues as diverse as financialised urban development and value extraction, to carbon and energy efficiency, as well as to people’s everyday lives in the home (Hubbard, 2025).
In England, central government have produced the Nationally Described Space Standard (NDSS) for housing (DCLG, 2015). Standards depend on the number of bedrooms, intended occupiers and storeys, increasing from an absolute minimum of 37 m2. These standards can be applied to dwellings with full planning permission if they are part of the relevant local (development) plan. From 2013 to 2021 they could not be applied to PD schemes even if in the development plan and even though these were creating dwellings, since they were defined by central government as out of scope for this deregulated planning route. NDSS still do not apply if they are not in the development plan and only apply to ‘dwellings’ (so not, e.g. student housing, residential institutions or a House in Multiple Occupation).
Our 2018 report found that 70% of the office-to-residential PD housing units examined across five case study local authorities would fail to meet the NDSS. Our 2020 report found that 78% of the commercial-to-residential PD housing units examined across 11 case study local authorities would fail to meet the NDSS. As already noted, our findings on this issue are in stark contrast to those from Chng et al. (2024), who reported finding just 34% of large conversions and 14% of small conversions failing to meet the NDSS.
The methodological approaches taken by our respective studies are quite different. Chng et al. (2024) characterise our research as qualitative but perhaps a better description would be mixed methods. Our studies included stakeholder interviews and site visits alongside quantitative analysis of individual planning documentation including examining the submitted floorplans to determine space standards. As such, our approach was labour intensive, hence the case study approach. It is also worth highlighting that whilst Clifford et al. (2019) was focussed on poorer quality schemes, the other reports were not (despite the presentation as such in Chng et al. (2024)) and involved consideration of smaller as well as larger schemes.
In contrast, and noting the challenges of data availability on space standards (see also Hubbard et al., 2023), Chng et al. (2024) have performed an analysis at a larger scale, across all 33 local authorities of Greater London, comparing the Greater London Authority’s data on planning approvals with data from Energy Performance Certificates (EPCs) for these PD locations. Although the data here would fit in an Excel spreadsheet, we would characterise this as a ‘big data’ type approach. The EPC data set contains over 26 million individual records for domestic properties across England and Wales (DLUHC, 2024). Even though human rather than sensor generated, it is a large set. The analysis by Chng et al. (2024) also includes many of the features typical of a big data type analysis, such as using data for purposes other than what they were originally generated for (EPCs are about energy performance, not space standards albeit they do include floorspace data) and using coding based approaches. We believe this approach raises a number of issues of concern, which we now consider in turn.
Concerns about an EPC based approach to studying PD space standards
Issue 1: Dealing with complexity
Planning and associated regulation is an area of considerable complexity. This raises important questions about how any methodological approach is able to deal with such complexity. There are two key factors to consider with respect to understanding space standards in PD housing. The first concerns the way permitted development rights (PDR) operate alongside traditional full planning permission as a sort of shadow, parallel system of consent. Not only have there been, since 2013, fairly regular changes to the types of development activity which are covered by PD and the rules which apply to it (Callway et al., 2024), but there are further complex interactions between full planning permission and PD consents.
This is because developers often want to do things which are not allowed under the PD regulations as part of conversion schemes, for example making extensive exterior alterations to buildings, or adding extensions to create additional space (including, sometimes, additional floors on top of existing buildings). There may also be larger buildings (particularly office buildings) where different parts of the building are in different uses prior to conversion, but only some of these uses are eligible for approval to change to residential under PD. In these cases, developers could have chosen to just apply for a full planning permission for the whole scheme, but they often will not want to do so. Full planning permission costs considerably more than a PD prior approval in terms of the fees due to the local planning authority just for the consent but also in terms of planning gain charges. For example, ‘Section 106 agreements’ including affordable housing requirements and the ‘Community Infrastructure Levy’ charges are usually due on full planning permissions but not on prior approvals and these may amount to millions of pounds on a large office conversion scheme (see Clifford et al., 2020). There is also greater planning risk around full planning permission (uncertainty over getting the consent) given the principle of development can be considered by the local planning authority in a way that it is not possible in the prior approval process. Full planning permission allows greater scrutiny of the design of schemes and a greater ability for the local planning authority to impose a range of requirements on the scheme’s design and implementation, which may reduce developer profits.
For these reasons, it is not unusual to find conversions where there are both consents via the full planning permission and via the PD route. We can illustrate this through the example of Windsor House on London Road, Norbury in Croydon borough (Figure 1) – a 1970s purpose built office building. In November 2013, a developer applied for prior approval to convert the majority of the building under PDR into 56 one and two bedroom flats. However, as parts of two of the seven floors of the building had been in use for education and training purposes as opposed to in office use, these could not be converted under PD. The developer therefore applied for a full planning permission in March 2014 for these parts to be converted into 9 two and three bedroom flats. With planning permission secured alongside the original prior approval, Land Registry data shows the building was sold in April 2015 for £925,000. A different organisation then applied in June 2015 for prior approval to turn the same parts of the building as covered by the earlier prior approval into 140 studio flats and in September 2015 for planning permission to extend the fifth floor for an additional 9 flats (this was refused by the local authority but then allowed on appeal to central government’s Planning Inspectorate). The building was sold again in December 2016 for £25 million.

Figure 1. Windsor House, Norbury: an office-to-residential conversion in Croydon borough with a mix of planning permission and PD consents.
From our site visit in 2017, we knew the residential conversion had been implemented and that there were 149 flats there, corresponding to the second prior approval and original planning permission having been implemented but with the planning permission for extension not implemented. The planning database floorplans show the planning permission flats as being 70–116 m2 each (compliant with NDSS). Calculating from the scaled floorplans, the PD studio flats are 21–28 m2 each (not compliant with NDSS). The EPC data show the planning permission flats as 64–125 m2 each and the PD studio flats as 20–33 m2 each. Although the floorspaces differ between the two sources, both clearly show the planning permission flats complying with the NDSS and the PD flats as not compliant. This shows the influence of planning regulation on space standards. In addition, planners were able to use the planning permission to try to enforce a mix of units, so that not all flats were studios.
Although in this case the number of planning permission flats is quite small compared to the number of PD flats, nevertheless there is a mix here, and a complex interaction of permissions, that can only really be disentangled through detailed case study work. We are aware of other schemes where the proportion of planning permission units is higher. It is also sometimes the case that developers use PDR as a negotiating tool, using a prior approval to establish the principle of residential use in a previously non-residential building. It can also be a fallback scheme which they use as a basis to negotiate a later full planning permission for something involving both a change of use to residential and more extensive development (including in some cases demolition of the former commercial building and rebuilding of a new residential building under that full planning permission).
Chng et al. (2024) appear to have taken a list of implemented prior approvals from the Greater London Authority and then assumed that all residential addresses in that building were converted under PDR but that is not the case. If the goal is to understand the influence of planning regulation on issues like space standards, this is an important distinction. They have not acknowledged the potential interaction of prior approvals and full planning permissions at all, nor the temporalities of changes to PDR across their study period. One of us is currently leading a project trying to identify PD housing across England, but as part of this all addresses with prior approval for conversion from commercial to residential are also checked for full planning permissions at the same address and that permission is then checked to see if it is creating new residential units itself (e.g. through extensions, change of use for different parts of the building or demolish and rebuild) or not (e.g. just making exterior alterations to a building being converted under PDR). This is time consuming work, but thus far in Greater London 8.6% of buildings checked appear to have a mix of residential units created under PD and planning permission and for 9.3% of buildings we either cannot be sure or believe a prior approval was then superseded by a later planning permission and so are excluding them from our analysis.
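The checking logic described above can be summarised as a simple decision rule. The following Python sketch is only an illustration of that rule under stated assumptions: the record structure, field names and category labels are hypothetical and are not taken from any project database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuildingRecord:
    # All field names are hypothetical, for illustration only.
    has_prior_approval: bool
    # None: no full planning permission found at the address.
    # True/False: whether that permission itself creates new residential units.
    permission_creates_homes: Optional[bool] = None
    # True where a later permission appears to supersede the prior approval,
    # or where the records do not allow a confident judgement.
    superseded_or_uncertain: bool = False


def classify(record: BuildingRecord) -> str:
    """Assign a building to one of four illustrative categories."""
    if not record.has_prior_approval:
        return "not PD"
    if record.superseded_or_uncertain:
        return "exclude"
    if record.permission_creates_homes:
        return "mixed PD and planning permission"
    # No permission, or a permission covering only works such as exterior alterations.
    return "PD only"


print(classify(BuildingRecord(True, permission_creates_homes=True)))
print(classify(BuildingRecord(True, permission_creates_homes=False)))
```

Only buildings in the "PD only" category could safely have every residential address attributed to the PD route; the other categories need case-by-case checking or exclusion.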
The second important area of complexity is the space standards themselves. The space standards are gradated by the number of bedrooms provided in the home and whether these are envisaged as single or double occupancy, so the minimum floorspace is larger for a two bedroom flat than for a one bedroom flat, and larger again if that flat is designed to be occupied by four people rather than three, and so on (DCLG, 2015). This means that if you want to check compliance with the space standards, you need at least to understand the number of bedrooms. In our case study approach, this was done by reference to floorplans submitted as part of the prior approval process. If the intended number of occupiers was specified, this was used to determine the relevant size standard; if it was not, we always took the smaller size for that number of bedrooms as the applicable standard.
The EPC data does not show the number of bedrooms, and so Chng et al. (2024) have just applied the minimum standard for a one bedroom, one person residence to everything they have analysed. This may be easier to apply, but does not appear the best way to deal with this complexity. In our 2020 report we found 31.1% of homes created under PDR were two bedrooms or more, so applying the smaller standard here is especially problematic. This is harder to deal with when taking a big data approach than when looking at detailed individual case studies, but a possible alternative could have been to look at the number of habitable rooms listed in the EPC data and assume the number of bedrooms was one less than that where that was two or more; for example, a flat with three habitable rooms is assumed to be a two bedroom flat. The relevant space standard could then have been applied.
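The habitable-room alternative can be sketched in a few lines of Python. This is an illustrative sketch, not the method used in either study. The function names are hypothetical; the 37 m2 figure is the absolute minimum cited from DCLG (2015), while the two and three bedroom minima are assumed values (the smallest single-storey figure for each bedroom count) that should be checked against the published standard before any use.

```python
# Illustrative sketch only: function names and the 2/3 bedroom minima are assumptions.
# Smallest assumed single-storey minimum (m2) per bedroom count; verify against DCLG (2015).
ASSUMED_MIN_AREA_M2 = {1: 37.0, 2: 61.0, 3: 74.0}


def infer_bedrooms(habitable_rooms: int) -> int:
    """Assume one habitable room is a living room when there are two or more."""
    if habitable_rooms < 1:
        raise ValueError("habitable_rooms must be at least 1")
    return habitable_rooms - 1 if habitable_rooms >= 2 else 1


def meets_assumed_standard(floor_area_m2: float, habitable_rooms: int) -> bool:
    """Compare a floor area with the smallest assumed minimum for its inferred size."""
    bedrooms = infer_bedrooms(habitable_rooms)
    # Cap at the largest bedroom count present in the lookup table.
    bedrooms = min(bedrooms, max(ASSUMED_MIN_AREA_M2))
    return floor_area_m2 >= ASSUMED_MIN_AREA_M2[bedrooms]


# A 45 m2 flat passes when treated as a one bedroom home, but fails once
# three habitable rooms imply two bedrooms.
print(meets_assumed_standard(45.0, 2))  # True
print(meets_assumed_standard(45.0, 3))  # False
```

The example shows why applying the one bedroom, one person minimum to every home understates non-compliance: the same floor area can pass or fail depending on the inferred number of bedrooms.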
Issue 2: Dealing with data accuracy/measurement error
Analysis relying on secondary data sources inevitably depends on the accuracy of that data. There are a number of concerns with the accuracy of EPC data. Indeed, a government review in 2020 noted that only 3% of respondents to a consultation they had conducted thought the reliability of EPC data was good and proposed greater quality assurance in future (BEIS and MHCLG, 2020). Hardy and Glew (2019) find that a startling 27% of EPCs display some indication of inaccuracy and estimate the true error rate could be even higher. They note errors are more common for flats than houses (housing created under PDR is overwhelmingly flats). EPCs are created through human generated information and, in part, include inaccuracies because of human error. Whilst this matters for the intended purpose of EPCs, it can lead to more significant errors in analysis where ‘the uses for the EPC have been extended beyond their original function’ (Gledhill et al., 2023: no page number).
Inaccuracies in EPC data include specific concerns relating to the total floor area (TFA) of dwellings: Jenkins et al. note that ‘significant variation in TFA was apparent across many properties. Some of this variation is unlikely to be just a consequence of acceptable experimental error during a site visit’ (Jenkins et al., 2017: 483). Drawing on a detailed case study approach to checking the reliability of EPC data, they note variation of 13.7% in TFA data for the properties they reviewed. Nagarajah and Davis (2019) conducted their own research comparing EPCs with independent assessments of a sample of properties and found that in 25% of cases, the floor area reported on the EPC varied by at least 10% from their own independent measurements. They found that in 56% of cases the EPCs underestimated the actual size of the dwelling, and in 44% of cases overestimated, whilst also noting that there is no mandated standard approach to calculating the floorspace of a property which an assessor must use. From our own review of PD housing floorplans, we noted how some poor quality schemes had very strange internal layouts in order to squeeze in the maximum number of flats in deep floorplate office buildings with constrained windows. For such ‘dog leg’ shaped properties, it may be more challenging for energy performance assessors to calculate the TFA.
This issue can be illustrated by the example of the PD conversion at 3 Church Road, Croydon (Figure 2). A former Victorian shoe and boot factory from the 1890s, it had been in office use prior to conversion to residential use under a prior approval granted in 2015 (following previous prior approvals from 2013 and 2014, with each successive prior notification squeezing in more flats of smaller size). The EPC data shows that Flat 30 was assessed in September 2015 with a floorspace given as 18 m2. A new EPC issued following a second assessment in May 2021 then has Flat 30 as 24 m2, a difference of 28.6% when measured against the mean of the two figures.

Figure 2. 3 Church Road, Croydon: an office-to-residential conversion in Croydon borough illustrating some of the uncertainties around EPC data.
From our site visit and understanding, no building works to alter the size of the flat had been undertaken; this simply illustrates the variability in floorspace calculation by those undertaking energy performance assessments. Interestingly, Flats 11, 23, 24 and 30 in this building all have EPCs listing them as ‘top floor flats’, but the flat numbering, and the relationship between the floorspace sizes on the EPCs and those on the floorplans submitted to the local planning authority, suggest that in reality these are not all on the top floor; again this gives cause for concern about the reliability of EPC data.
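As an aside on the arithmetic, the 28.6% figure quoted for Flat 30 appears to correspond to the gap between the two EPC floor areas expressed relative to their mean; other common conventions give different percentages. The following Python sketch is purely illustrative (the function names are ours and are not part of any EPC tooling) and compares three conventions using the 18 m2 and 24 m2 figures reported for the flat.

```python
def pct_increase_on_first(first_m2, second_m2):
    """Change expressed relative to the earlier measurement."""
    return (second_m2 - first_m2) / first_m2 * 100


def pct_gap_on_second(first_m2, second_m2):
    """Change expressed relative to the later measurement."""
    return (second_m2 - first_m2) / second_m2 * 100


def symmetric_pct_difference(a_m2, b_m2):
    """Absolute gap expressed relative to the mean of the two measurements."""
    return abs(a_m2 - b_m2) / ((a_m2 + b_m2) / 2) * 100


# Floor areas for Flat 30 as reported on the 2015 and 2021 EPCs (from the text).
epc_2015_m2 = 18.0
epc_2021_m2 = 24.0

print(f"{pct_increase_on_first(epc_2015_m2, epc_2021_m2):.1f}%")     # 33.3%
print(f"{pct_gap_on_second(epc_2015_m2, epc_2021_m2):.1f}%")         # 25.0%
print(f"{symmetric_pct_difference(epc_2015_m2, epc_2021_m2):.1f}%")  # 28.6%
```

Whichever convention is preferred, the gap between the two assessments is far larger than the 10% variation discussed in the literature on EPC floor areas.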
This is not to say that submitted floorplans (on which we relied in our own research) will always perfectly match the ‘ground truth’, and in some cases we have had to measure and calculate space standards from submitted plans ourselves, introducing a possible source of error. We believe, however, that the discrepancy between planning floorplans and ground truth is likely to be much smaller than that found in the EPC data, given that the floorplans submitted for prior approval are often the architectural plans then used in the construction process.
Chng et al. (2024) openly acknowledge the issue with EPC data on floorspace, and cite Nagarajah and Davis (2019) on this. However, having noted that there can be both over- and under-estimation, they then attempt to control for this by only labelling homes with TFA on the EPC smaller than 35 m2 as being below NDSS minimal size, even though the NDSS specifies 37 m2. The existing literature, however, does not suggest consistent over-estimation of floorspace at all, but rather that EPCs both over- and under-estimate floorspaces (and, if anything, are slightly more likely to under-estimate). The result is that a flat that is actually 32 m2 could be listed as 35 m2 on its EPC, within what appears to be the everyday variability of this data, and would then not have been counted in Chng et al.’s (2024) analysis as being below minimum recommended size. Clearly at smaller sizes, every additional square metre of space can make a significant difference to someone’s quality of life within their housing, and so there could be a large amount of housing which is below NDSS recommendations and yet counted in Chng et al.’s (2024) analysis as complying with these standards.
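The effect of the 35 m2 cut-off can be sketched numerically. The short Python example below assumes, purely for illustration, a symmetric measurement error of up to 10% either way (loosely echoing the 10% variation reported by Nagarajah and Davis, 2019, but not a figure taken from Chng et al.); the function names are hypothetical and this is not a reconstruction of their method.

```python
NDSS_MINIMUM_M2 = 37.0     # minimum size the text attributes to the NDSS
ANALYSIS_CUTOFF_M2 = 35.0  # cut-off the text attributes to Chng et al. (2024)
ASSUMED_REL_ERROR = 0.10   # ASSUMPTION: illustrative +/-10% error on EPC floor area


def recorded_range(true_m2, rel_error=ASSUMED_REL_ERROR):
    """Lowest and highest floor area an EPC might record under the assumed error."""
    return true_m2 * (1 - rel_error), true_m2 * (1 + rel_error)


def could_be_counted_as_compliant(true_m2, rel_error=ASSUMED_REL_ERROR):
    """True if the flat is really below the NDSS minimum, yet its recorded
    floor area could reach the cut-off and so escape the 'below standard' label."""
    _, highest_recorded = recorded_range(true_m2, rel_error)
    return true_m2 < NDSS_MINIMUM_M2 and highest_recorded >= ANALYSIS_CUTOFF_M2


# Smallest true floor area that could still be recorded at the cut-off.
smallest_at_risk_m2 = ANALYSIS_CUTOFF_M2 / (1 + ASSUMED_REL_ERROR)

for true_m2 in (30.0, 32.0, 36.0, 38.0):
    print(true_m2, could_be_counted_as_compliant(true_m2))
```

Under this assumed error band, any flat with a true floor area from roughly 31.8 m2 up to just under 37 m2 could be recorded at or above the cut-off and so be treated as meeting the standard.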
Issue 3: Dealing with data availability
A final key concern that we can see through the example of analysing PD housing space standards is the availability of data in the first place. In this, it is worth noting that the use of PD rights has been particularly attractive to SME developers and sees huge variation in housing quality. Some very high quality residential conversions have been delivered under PD, but from our own previous research we understand there is a large tail of poorer quality housing, and some of the worst has been delivered by those at the ‘sharper’ end of practice in the built environment sector. We often found that the poorer quality schemes were in the Private Rental Sector (PRS) – many being unmortgageable and so unable to be individually sold – and note Spencer et al.’s (2020) research which has found shocking levels of unethical behaviour and even criminality amongst landlords in the PRS more generally, which could then apply to some landlords of PD housing in the PRS.
Against this context, it is perhaps unsurprising that there are some PD conversions which have been implemented but where EPCs have not been lodged for the housing created. One scheme which illustrates this is Newbury House, Ilford (Figure 3). An office building constructed following a planning permission granted in 1962, it was converted into 60 flats under a prior approval from August 2014. These are all studio flats and submitted floorplans clearly show these are 13–23.5 m2 each. From a site visit we know these to be occupied and have been for over 6 years now, even attracting adverse media attention (e.g. Jones, 2018). We cannot, however, find any EPCs for these flats at all.

Figure 3. Newbury House, Ilford: an office-to-residential conversion in Redbridge borough illustrating some of the issues around EPC availability.
In Clifford et al. (2020), we did look for EPCs for our case study conversion schemes in order to see what they said about the energy performance standards of PD housing. At the time we said we could not find EPCs for 24.6% of the properties which we knew from our site visits were definitely now in residential use.
Part of the issue here can just be around address data more generally. In the UK, such data are somewhat messy, with inconsistencies in the formatting of property addresses between different data sets being very common. Further, when a large office building is converted to residential use, it can then have a new postcode issued to it, if the Royal Mail are aware of the conversion (in the early years of these PDRs, we understand there were some issues because planning permission monitoring usually triggers local authorities to confirm new addresses and this was sometimes missed in the deregulated space of PD). The case of 3 Church Road, Croydon, illustrates this messiness: the planning database has the prior approvals listed against the postcode of CR0 1SG, which was correct for the office building before conversion. The official Ordnance Survey AddressBase dataset and the Valuation Office Agency’s Council Tax data show a new residential postcode of CR0 1FP, and there are EPCs for 4 of the 32 flats with that postcode but then all the other flats have EPCs against postcode CR0 1RQ (actually that of a neighbouring building).
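The practical consequence of this messiness for record linkage can be sketched in a few lines of Python. The example is a minimal illustration only: the three postcodes are those reported above for 3 Church Road, but the individual EPC records, their formatting and the function names are hypothetical, and we do not know how any particular study actually performed its matching.

```python
import re


def normalise_postcode(raw):
    """Upper-case a UK postcode and put a single space before the final
    three characters (the inward code), e.g. 'cr01fp' -> 'CR0 1FP'."""
    compact = re.sub(r"\s+", "", raw).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def find_epcs(records, postcodes):
    """Return the EPC records whose normalised postcode is in `postcodes`."""
    return [r for r in records if normalise_postcode(r["postcode"]) in postcodes]


# HYPOTHETICAL EPC records with inconsistent formatting, for illustration only.
epc_records = [
    {"address": "FLAT 11 3 CHURCH ROAD", "postcode": "CR01FP"},
    {"address": "Flat 30, 3 Church Road", "postcode": "cr0 1rq"},
]

planning_postcode = {"CR0 1SG"}                     # postcode on the planning database
all_known_postcodes = {"CR0 1SG", "CR0 1FP", "CR0 1RQ"}

print(len(find_epcs(epc_records, planning_postcode)))    # 0
print(len(find_epcs(epc_records, all_known_postcodes)))  # 2
```

A search keyed only on the postcode held in the planning database would find none of these records; they become visible only once the alternative postcodes are already known, which in our case came from detailed knowledge of the individual scheme.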
Another part of the issue has been delays in lodging EPCs and adding them to the publicly available dataset. The government’s own action plan (BEIS and MHCLG, 2020) also noted that a £200 fine for not having an EPC was probably insufficient to deter non-compliance. Looking again to our 2020 research, of the schemes we knew were in residential use but could not find a residential EPC for, 30% of these did have an EPC but all with new addresses we did not identify at the time. Forty-five percent of the schemes have had EPCs lodged in the years since our research even though from our site visits we knew they were in residential use – in some cases several years before those EPCs existed. For the remaining 15% of schemes, we still cannot find EPCs even though we found evidence of residential use during our site visits (and in some cases other sources such as Valuation Office Agency Council Tax data corroborate this). This illustrates not just the issue about missing data, but also the wider complexities inherent in such research whether through a big or small data approach.
Chng et al. (2024) have acknowledged the address data issue and apparently taken measures of using other locational data to try to find EPCs even where things like the postcode or street address have changed during conversion (although it is not clear from the paper how exactly this was done). Nevertheless, it is still possible to find schemes where there is evidence of residential use but no EPCs, including some where the addresses of flats are shown in other data sets such as those for local council tax and postal deliveries. Chng et al. (2024) do not acknowledge this issue. This is not to say that everybody will always comply with planning or legal regulations; however, the likelihood of enforcement action and the consequences of breaching planning regulations are perhaps more significant than those for not having an EPC when you should. Further, the methodological approach underpinning our research on this topic has involved both desk work utilising planning databases and site visits (in some cases involving counting things like door buzzers and mail boxes to understand the number of flats actually implemented at a particular location). PD has opened up a new deregulated space around the development of new housing, and some have certainly exploited this less tightly controlled space to deliver lower quality housing. Understanding just what has actually happened on the ground cannot always be done remotely using a big data analysis when in some cases relevant data simply does not exist in the first place.
Conclusions
A data revolution is underway impacting many aspects of our lives. In academic research on the built environment by human geographers and planning scholars, an era of the dominance of qualitative research and analysis may be ending with a resurgence and repositioning of quantitative analysis, particularly that based on the so-called big data regime shift. With the rise of digital planning agendas, big data may be increasingly important in planning practice in the future as well, albeit the new quantitative revolution risks data overwhelm (Wyly, 2014). Further, as Taylor argues:
research based on big data tends to acquire a policy-related diagnostic function, regardless of the authors’ intentions. It comes ready-made as policy advice because it singles out what can be quantified and observed at scale, and therefore what is accessible for shaping by technical interventions. ‘Paths, nodes, districts, edges and landmarks’ can be addressed cleanly using a technical problem-framing in ways that messy social life cannot. Moreover, research that makes visible possible targets of optimisation also plays a political role in determining what gets optimised . . . Given that datasets come with their own politics, this makes researchers’ choice of object and of methodological approach important in determining how urban governance and development are designed and executed (Taylor, 2021: 3200).
All of this requires renewed reflection on methodological questions. There can be shortcomings to smaller data mixed method or qualitative studies which may not always reveal the extent of particular issues and where, for example, there can be particular issues around case study selection (Yin, 2014). There are also very practical concerns about the resources and time taken to do deeper and richer research involving site visits or in-person interviews. As boyd and Crawford note, ‘historically . . . collecting data has been hard, time consuming and resource intensive. Much of the enthusiasm surrounding Big Data stems from the perception that it offers easy access to massive amounts of data’ (boyd and Crawford, 2012: 673). However, we believe that it is important that there is due caution and care taken in the current rush towards big data. Just because a data set is available does not mean that using it is always going to be the best approach to understanding a particular phenomenon, and apparent short cuts to replace hard, human efforts to understand phenomena may sometimes be best avoided.
As we have illustrated through the example of considering the space standards of housing created through planning deregulation, we believe a more intensive approach combining site visits and analysis of individual floorplans offers greater reliability than one based on quoted floorsize within a secondary data set created for other purposes. This is because of issues around the complexity of planning regulation, measurement error present in the energy performance secondary data set, and the availability of data for PD housing schemes in that data set. These issues are likely to be relevant beyond the specific issue of PD housing space standards. In the built environment, there is also the addition of the human factor: as a site of considerable potential capital accumulation, it is perhaps unsurprising that there may be examples of behaviour which do not comply with regulation or norms and so there is all the more need for caution around secondary data sets.
This is not to say that there is no merit in big data approaches, of course. This new era of big data has many opportunities and indeed at different times we ourselves have had involvement in studies utilising quantitative approaches. The findings of Chng et al. (2024) around affordability of PD housing and average levels of air pollution, for example, appear to add new perspectives on these issues. Taking an extensive view and looking at scale across the whole of Greater London can help us understand the full impacts of this type of planning deregulation.
We were, however, somewhat surprised to see Chng et al. (2024) claim that their study ‘confounded some existing assumptions’ (p. 975) around space standards; by implication, those of our previous work. Based on the points made in this response, we believe that our findings should be considered the result of a thorough study rather than characterised as assumptions. Most importantly, we stand by our earlier conclusions that the majority of PD housing (created before the relevant studies) does not meet NDSS. Our own findings are similar to those of others who have done detailed studies of space standards in PD housing based on knowledge of individual schemes: Remøy and Street (2018) draw on a local planning authority analysis of all office-to-residential prior approvals in central Croydon which found 83% did not comply with NDSS, whilst a similar analysis by a planner for Brent found 87% did not comply with NDSS (Dilley, 2018). We think it is important to further develop an intellectual debate and agenda around methodological approaches in built environment research, a debate that is transparent, open and serves to collectively advance our field of research.
In this case it is also important to look beyond scholarly debate, though. As previous studies have shown, and some recent pilot research focussed specifically on the link between PD housing and health has reaffirmed (Pineo et al., 2024), there is an important link between space standards and people’s health and wellbeing (see also Dunning et al., 2023). Reflecting more generally, Chng et al. (2024) themselves suggest PD housing may be exposing residents ‘to forms of slow violence’ (p. 976). But with a methodological approach which we believe incorrectly underplays the level of non-compliance with space standards in PD housing, there is a risk that a big data driven study plays into the hands of critics of planning regulation who would rather not have state regulation around such issues at all, whatever the consequences for communities.
Our 2020 study was an ‘independent review’ commissioned by government following media and campaign pressure around the problems of PD housing quality. Following receipt of our report in 2020, the government acted fairly quickly to require housing created under PDR to at least have adequate natural light into habitable rooms (we found evidence of some flats with no windows), but resisted introducing space standards for this housing until 2021 and only hours before facing a possible Parliamentary defeat over the matter. Some have criticised their introduction (Breach, 2020). Studies on this topic can therefore very readily have real world consequences.
A similar example here relates to studying housing eviction across the US. Critiquing a big data approach relying on web scraping of court administrative listings, Aiello et al. (2018) raise concerns that this method ignores differing legal contexts between states, leading to actively misleading claims such as that Oregon has fewer evictions than other states even though no-cause evictions are allowed there without any formal court filing which would show in the data set analysed. Nelson et al. (2021) highlight the importance of understanding differences in both legal processes and housing markets in place-specific ways to understand evictions. They also note that administrative record keeping can be inconsistent and overlook informal procedures and actions. Summers and Steil (2024), meanwhile, compare an intensive approach of obtaining paper court records and subjecting these to detailed analysis with web scraping of court administrative data relating to evictions, and find that the administrative data presents an inaccurate picture as court listings often result in agreements between landlords and tenants before a judge makes a decision on the case. In this example, trying to understand housing eviction through a big data approach sacrifices precision for scale, oversimplifies everyday complexity, potentially introduces bias, and can ultimately present a misleading picture about an issue of housing justice, for which campaigners and policy makers need a better understanding of what is actually happening.
There is, then, an ethical dimension to our methodological choices. This seems particularly so when considering the appropriateness of different approaches to understanding issues like deregulation in complex, human driven systems like built environment change. boyd and Crawford (2012) have argued that there was ‘an arrogant undercurrent in many Big Data debates where other forms of analysis are too easily sidelined’ (p. 666) and that there can be a continuing potential value of small data. In slight contrast, Wong suggests that the varied and incomplete coverage of big data and potential issues of bias and measurement error make a ‘mix of big data, traditional data and other independent data sources together’ the best way forward (2022 in Batty and Wang, 2022: 103). Tanweer et al. (2021) similarly argue for bringing an interpretivist lens to help produce better large-scale analyses, while – noting that ‘big data is not always better data; it is different data’ – Harten et al. (2020) call for greater interdisciplinary integration in urban research (p. 1841). Similarly, Howe (2021), under the guise of ‘thinking through people’, calls for more mixed-methods approaches that bridge between the frequent quantitative fixation with big data sets and the qualitative focus on particular sites of research and everyday experience, which can be challenging to generalise from.
As illustrated by the example of PD housing space standards, we believe there remains a pressing need for careful thought and reflection over big data driven approaches, opening-up of methodological black boxes around big data, and a more productive conversation between methodological tools to fully understand the built environment.
Acknowledgements
Although this paper is a commentary based response to another paper concentrating on methodological questions and so not directly the result of funded research, our understanding of these issues and earlier data collection has been supported variously by the RICS Research Trust, the Town & Country Planning Association, the Ministry for Housing, Communities and Local Government and Impact on Urban Health. One of us is currently undertaking further research on the issue of permitted development housing funded by the National Institute for Health and Care Research. We would like to acknowledge this various funding and the involvement of the various participants in these different projects. We are also very grateful to the editor and reviewers for their support in the production of this paper. In particular, we thank reviewer 1 for their supportive reaction to what they called an ‘ethnography of data’ and reviewer 2 for the suggestion relating to housing eviction data. As ever, the interpretation of all feedback received is the responsibility of the authors.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
This paper is based on secondary analysis and commentary only, which would be considered exempt from the requirements for ethical approval.
