Abstract
Simultaneous peak water flow (SPWF) is a fundamental parameter calculated during the design phase to ensure the proper sizing of water systems in buildings. Over the years, overestimation has been a persistent problem in SPWF studies, leading to inefficiencies in energy and water use. This research aims to address this problem by developing a flexible data-driven approach to estimate water fixture use probability (FUP), a key variable in SPWF calculations. The proposed method uses artificial neural networks to model the water demand based on the type and FUP of water fixtures in the building, drawing on the Wistort model and Water Demand Estimation Model (WDEM). Differential evolution was used to optimise the FUP values based on empirical water flow data. The model was applied to two non-residential buildings of different sizes and uses, where results showed that the models estimate the empirical SPWF more closely than building design codes do. Due to its use of adaptable parameters, flexible underlying models, and real-world data, the developed approach can be applied to buildings of any type and size. By reducing the risk of SPWF overestimation, the method contributes to more efficient water and energy use, ultimately supporting sustainability goals in the built environment.
Practical application
This study presents a new methodology for estimating SPWF for buildings. The methodology is flexible and can be applied to different types and sizes of buildings. The use of real-world data aims to provide a more practical model that will closely estimate the SPWF of the buildings, as compared to the design flow rates from the design codes, for more cost-efficient, energy-efficient, and water-efficient building water supply systems.
Introduction
Estimating simultaneous peak water flow (SPWF) is a crucial aspect of building design as it determines the sizing of the building’s water supply system. This involves determining the types and counts of water fixtures and estimating the number of building occupants, occupancy patterns, the probability of fixture use or fixture use probabilities (FUPs) at any given time, and the amount of water used during particular periods. SPWF represents the estimated maximum instantaneous water flow rate and informs the design flow, which then ensures that building occupants can simultaneously access water with satisfactory flow and pressure during peak usage periods. Consequently, this determines the reliability and quality of water delivery throughout the building. Obtaining a good estimate of SPWF also ensures that resources associated with the water system, such as construction materials and energy-for-water, are used efficiently. However, current estimation methods tend to overestimate the SPWF due to various factors, including changes in water usage behaviour, occupancy profiles, and the improved efficiency of water fixtures among others.1–4
The overestimation of SPWF in the built environment can be attributed to multiple factors. Addressing this problem requires a thorough examination of each component within the methodology. SPWF estimation models often consider the number of fixtures (n), fixture flow rate (q), and FUP (p) as inputs. While the number of fixtures and fixture flow rate are fixed variables that yield the total flow rate at a given instant, the FUP introduces a measure of uncertainty reflective of real-world conditions.
To address the problem of SPWF overestimation, this work aims to develop a novel method for estimating the FUP in buildings. The approach seeks to capture the complex, non-linear relationships and uncertainties associated with user behaviour and water demand by leveraging machine learning and neural networks. In this work, FUPs are taken to be constant but unknown values. The variability of water use over time is captured by our modelling technique by assuming that water use is a random variable whose distribution depends on the underlying FUPs. Thus, we closely tie the empirical variations of water use due to underlying assignable factors such as time of day and building occupancy levels to realisations from the underlying distributions, allowing for a more realistic representation of peak water demand. This makes the probabilistic aspect governed by FUPs a crucial element in SPWF estimation methods.
There is no known data-driven or machine learning-based method for determining the FUPs. Most FUP values used in existing models are generalisations or derivations from survey data taken from specific countries, periods, and building types, limiting their adaptability to diverse applications. This study therefore also introduces the use of neural networks for SPWF estimation. The proposed methodology offers a more adaptable approach, accommodating the continuous evolution of water fixture technologies and changing water use behaviour. By focusing on developing a model to derive FUPs, this study aims for applicability across building types and regions.
Background and literature review
Methods for estimating SPWF can be classified as having a top-down or bottom-up approach in terms of their structure. Top-down approaches take into consideration the aggregated water demand of a building, while bottom-up approaches focus on the water end-use. Most top-down approaches rely on occupancy estimations and FUPs to calculate the building’s SPWF. This approach is often applicable and more reliable for buildings with a large amount of high-resolution data collected over multiple years. 5 Examples of earlier methods include those developed by Hunter 6 and Wistort. 7 On the other hand, bottom-up approaches work at the micro-component scale typically using water demand pulses to model the overall water demand, and taking into account the frequency and duration of water use per fixture. 8 An example under this category is SIMDEUM, which uses water pulses based on the concept of the Poisson Rectangular Pulse.9–11
A fundamental part of calculating the SPWF is the estimation of the FUP for each fixture in the building. The first and most widely known method, Hunter’s method, estimated the probabilities by observing morning calls at a hotel and exit times from an apartment building. 6 These values were used to signify the peak occupancy of the buildings and were assumed to be directly associated with the probability that water fixtures in the building were being used.
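In the fixture units method, Hunter mathematically expressed the probability that exactly r out of n fixtures of the same type are in use at any instant of time as

$$P(X = r) = \binom{n}{r} p^{r} (1 - p)^{n - r}$$

The above formula for P(X = r) is the binomial probability mass function, where p is the FUP of a single fixture and the n fixtures are assumed to operate independently.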
While this methodology is coherent, with probabilities that are contingent on occupant presence, it lacks robustness due to its inflexible and overly specific foundation, which limits its applicability to other building types or occupancy profiles. The FUP also depends on the building type and the occupancy profiles; hence there is a need to develop a more flexible method than the surveying method implemented by Hunter. On applying Hunter’s method to a combination of fixtures in a building, it is found that the theoretical peak water use (expressed as a combination of quantiles of separate binomial distributions, one for each fixture) grossly overestimates the empirical/observed values. Thus, the need to reduce this gross overestimation has been recognised in the literature.2,3
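The gap between combining separate per-fixture quantiles and taking a quantile of the aggregate flow can be illustrated numerically. The sketch below is not Hunter's procedure itself: the fixture counts, flow rates and FUPs are hypothetical, and a plain sum of per-group 99th percentiles is used as a stand-in for the "combination of quantiles" described above.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical fixture groups: (count n, flow rate q in L/s, FUP p).
fixtures = {
    "wc": (10, 0.10, 0.05),
    "washbasin": (10, 0.08, 0.05),
    "sink": (2, 0.20, 0.03),
}
alpha = 0.99

# Per-group approach: 99th percentile of busy fixtures in each group,
# converted to flow and then simply added (an assumed combination rule).
sum_of_quantiles = sum(
    q * binom.ppf(alpha, n, p) for n, q, p in fixtures.values()
)

# Aggregate approach: simulate the total flow across all groups and take
# the 99th percentile of that total.
rng = np.random.default_rng(0)
trials = 200_000
total = np.zeros(trials)
for n, q, p in fixtures.values():
    total += q * rng.binomial(n, p, size=trials)
quantile_of_sum = float(np.quantile(total, alpha))

print(f"sum of per-group quantiles: {sum_of_quantiles:.3f} L/s")
print(f"quantile of aggregate flow: {quantile_of_sum:.3f} L/s")
```

Because the groups rarely peak at the same instant, the aggregate quantile sits below the sum of the separate quantiles in this example.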
Robert Wistort developed a modification of Hunter’s method in 1994 by applying a normal approximation to the binomial distribution. 7 The theoretical advantage of Wistort’s model is that water use can now be expressed as an analytic combination, via the normal distribution, of separate water uses by the different fixtures. Wistort’s model was designed to overcome the limitations of its predecessor by eliminating the reliance on fixture units and on separate quantiles of the different fixture flows. This enhancement makes Wistort’s model more flexible, even as fixture efficiency improves or user behaviour evolves. With the use of better-approximated usage probability values and appropriate fixture flow rates, the dimensionless design expression of Wistort’s model can effectively determine the SPWF of buildings with different types and sizes. 12
One limitation of Wistort’s model is that the normal approximation used for combining water use from different fixtures works well only when the number of fixtures is large. In reality, the number of fixtures in buildings generally does not extend beyond 30, the rule-of-thumb threshold used in the statistics literature for the validity of the normal approximation to the binomial. Thus, this paper also addresses the problem of how to combine the water use of the different fixtures when the number of fixtures remains small or moderate. This paper demonstrates that the use of artificial neural networks (ANNs) allows us to develop a methodology for aggregation without reverting to the normal approximation theory.
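A short numerical comparison may help show why the fixture count matters for the normal approximation. The sketch below compares the exact binomial 99th percentile of busy fixtures with its normal approximation; the FUP of 0.05 and the fixture counts are illustrative assumptions, not values from the study.

```python
from scipy.stats import binom, norm


def busy_fixtures_exact(n, p, alpha=0.99):
    """Exact alpha-quantile of the number of busy fixtures, X ~ Binomial(n, p)."""
    return int(binom.ppf(alpha, n, p))


def busy_fixtures_normal(n, p, alpha=0.99):
    """Normal approximation: mean + z_alpha * standard deviation."""
    return n * p + norm.ppf(alpha) * (n * p * (1.0 - p)) ** 0.5


p = 0.05  # assumed FUP, for illustration only
for n in (2, 5, 30, 300):
    exact = busy_fixtures_exact(n, p)
    approx = busy_fixtures_normal(n, p)
    print(f"n={n:>3}: exact={exact}, normal approx={approx:.2f}")
```

The exact quantile is an integer number of fixtures, while the approximation is continuous, so the relative gap between the two is most visible at small n.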
Currently, one of the most used guides for sizing plumbing systems is the Plumbing Engineering Services Design Guide. Published in 2002 by the Chartered Institute of Plumbing and Heating Engineering (CIPHE), it provides guidelines for the sizing of pipework systems in the United Kingdom and other countries. 13 It is also believed that the method produced by CIPHE was based on Hunter’s method. 2 The advantage of using the CIPHE Design Guide is that the different scales provided for frequency of use (low/medium/high) allow for appropriate categorisation of building types and the volume and frequency of occupancy which can avoid overestimation, especially for less dense buildings. As a widely used guidance on water supply sizing, the CIPHE design flows will be used for comparison, together with the BS 8558 14 which is also used in the United Kingdom.
Recent developments in SPWF models and fixture use probability estimation
In more recent developments in SPWF estimation, Omaghomi 15 focused on estimating the FUP during peak hours in residential buildings. High-resolution water-use data from around 1000 single-family homes and single-family households in multi-apartment buildings in the United States were analysed. The study found that for single-family homes, the FUPs ranged from 0.005 to 0.055 for typical household features (e.g. bathtub, clothes washer, dishwasher, faucet, shower and toilet) during peak hours. Meanwhile, in multi-apartment buildings, the FUPs tended to be lower and generally decreased as the number of units in the buildings increased. The study focused only on one peak hour of water use in residential buildings, identified as 8:00 AM–9:00 AM. The SPWF from the results of this study, calculated using the Water Demand Calculator, closely aligned with the observed values from the Uniform Plumbing Code which is based on Hunter’s curve. 15
Josey and Gong 16 developed a stochastic water demand model that can be used to determine the probability of use of water fixtures for multi-level residential buildings. This study emphasised the use of high-level statistical information taken from publicly available water-end use studies. The methodology can also be used to determine region-specific probability values. The formula developed considers the building size and the number of occupants and does not require the number of fixtures or the number of occupants in each household in the building. Validation was performed using three residential and one mixed-use building in Australia, where it was found that the computed design flow overestimated the observed water consumption values by up to 36%, reducing the overestimation by 250% compared to the peak demand flow from the Australian plumbing standard AS/NZS 3500.1:2021. 16
Meanwhile, Cortez-Lara 17 proposed a new methodology for estimating peak water demand through semi-direct methods and an Inverse Transform Method (ITM). The water consumption data were normalised using the standardised flow rates from Mexico’s plumbing codes, where the method was applied. The standardisation aimed to identify patterns of end-use for common household fixtures which include faucets, showers, toilets, and sinks. A discrete frequency distribution based on the standardised flow rates was then established to determine the probabilities for simultaneous use, using a binomial distribution. The methodology employed a Monte Carlo technique to generate different scenarios in the modelling, where the ITM was utilised based on cumulative distribution functions of the probabilities. A unique contribution of the study was the elimination of the human behaviour factor through the standardisation of flow rates. While this eliminates the uncertainties introduced by the human factor, it also overlooks the inherent nature of water consumption, which is largely influenced by human behaviour.
SPWF models for non-residential buildings
Most of the studies mentioned thus far were developed specifically for residential buildings. Non-residential buildings receive less focus in SPWF estimation due to several factors such as data acquisition and the wider range of building types and uses to account for. This study therefore aims to address this gap by focusing on non-residential buildings. In the literature, two prominent SPWF methodologies for non-residential buildings are Murakawa’s Simulation for Water Consumption (MSWC) and SIMDEUM.
In Murakawa’s MSWC model, 18 probability distributions are used to model various water usages, and the water demand is forecast chronologically using a Monte Carlo simulation. Murakawa used queuing theory to estimate the number of fixtures operating simultaneously. 19 Using this method, water demand can be calculated on a daily, hourly, or peak-time basis. 20 It can be used for both residential and non-residential buildings (including apartments, hotels, restaurants, office buildings, and train stations), and single-storey or multi-storey buildings. The basic characteristics of the building, such as the number of occupants, the ratio of male to female, and the number and types of fixtures are taken into account. Water temperature is also considered for hot water use. As is the case for the Japanese design standards, the MSWC model allows for computing water load based on two methods: personnel per area and the number of occupants in the room. Murakawa’s method made great improvements in water demand estimation - it allowed for two methods of computation, and for the calculation of hot water demand with consideration of temperature, which were issues not addressed before. The Monte Carlo simulation, which incorporates the dynamics of fixture use via stochastic processes, also increased the robustness of the model by allowing for the simulation of different scenarios.
Turning to SIMDEUM, this was initially used to model residential water demand, and later evolved to include a non-residential use case by implementing the model in a modular approach where rooms were classified according to user characteristics and appliances. 10 There were three types of non-residential buildings considered, namely office buildings, hotels, and nursing homes. Unlike the residential building model, the non-residential building water demand model only considers indoor water use. The end uses were all described by fixed input parameters instead of statistical distributions, which vary depending on the end use in the residential model. 10 The users were found to be an integral part of the methodology since they determine the diurnal pattern of water use, represented by a normal probability distribution.
Mohammed’s study 21 also proposed a model for non-residential buildings. The Water Demand Estimation Model (WDEM) is a stochastic model developed in 2022 at Heriot-Watt University. This model aims to yield design equations for estimating the SPWF by using the maximum occupancy in non-residential buildings. 21 The model uses a Monte Carlo simulation applied to the binomial probability distribution. This part of the methodology will be used in this study, and is explained further in the Methodology.
These studies show the different advancements in SPWF modelling and the different approaches employed to approximate the FUP values to reduce the overestimation of SPWF. Aside from using survey data or recommended values from the CIPHE guidelines, there has been limited advancement in the estimation of the FUP for non-residential buildings. In addition, no data-driven methodology had been developed for computing FUP, opening the opportunity to explore the use of machine learning for this study.
Methodology
The methodology for estimating the FUP involves two main stages: water demand estimation using ANN modelling, and FUP estimation through function optimisation. Figure 1 illustrates the processes involved, showing the parallel modelling process with two underlying models. The implementation was executed in Python 3.11.

Figure 1. Methodology framework.
Two computer models for peak water demand were created using ANNs, based on Wistort’s method and WDEM. The ANNs were used to approximate the relationship between the input FUPs and SPWF values, and learn other patterns that cannot be captured in probability functions or closed-form expressions. Global optimisation methods were then employed to estimate FUPs from empirical data, as discussed in the succeeding sections. Equation (3) represents the model

$$Q_\alpha = \eta(n, q, p) \qquad (3)$$

where Q denotes the SPWF, η is the computer model function that relates the fixed variables n and q non-linearly to the user behaviour-influenced FUPs (p), and α represents the quantile value, indicating the satisfactory performance level at which the water supply system is expected to provide adequate service to building occupants. The 99th percentile (Q0.99) was introduced by Hunter, 6 whose method was designed to calculate an SPWF that will enable the water supply system to deliver a satisfactory service 99% of the time. Succeeding studies and building codes use the same value in designing plumbing and water systems. For this study, α = 0.99 is likewise adopted to allow a systematic comparison against the established building codes.
Data analysis and dataset preparation for water demand estimation computer model
In the first part of the methodology, theoretical datasets that will be used to train the machine learning-based water demand models were generated. These are not based on any empirical or observed water use data, but on the theoretical models of water use (Hunter, Wistort and WDEM). The preparation of these theoretical datasets involved listing all the fixtures within the building, along with their count and flow rates. The theoretical dataset inputs were then created by generating a grid of all possible combinations of p for each fixture type using permutation
The two models were chosen for the following reasons:
• both methods can be verified using aggregated water consumption data, which is more straightforward to collect than end-use data,
• both methods have good degrees of flexibility such that they can be used for any type of fixture, and
• they can be applied to buildings with a high number of fixtures.
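In Wistort’s model, the number of busy fixtures (X) is a random variable with a binomial distribution with a mean and variance of

$$E[X] = np, \qquad \mathrm{Var}[X] = np(1 - p)$$

Using the normal approximation for the binomial distribution (NABD) for K independent fixture groups, Wistort defined the peak water demand (Q) to be the 99th percentile of the total demand:

$$Q_{0.99} = \sum_{k=1}^{K} n_k p_k q_k + z_{0.99} \sqrt{\sum_{k=1}^{K} n_k p_k (1 - p_k)\, q_k^{2}}$$

where $n_k$, $p_k$ and $q_k$ are the fixture count, FUP and flow rate of fixture group k, and $z_{0.99} \approx 2.326$ is the 99th percentile of the standard normal distribution. The above formula arises from the fact that the total demand $\sum_{k} q_k X_k$ is, under the NABD, approximately normally distributed with mean $\sum_{k} n_k p_k q_k$ and variance $\sum_{k} n_k p_k (1 - p_k) q_k^{2}$, these being the sums of the means and variances of the independent group demands $q_k X_k$.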
In the case of WDEM, the model calculates the SPWF using a binomial approximation model with a stochastic simulation. A Monte Carlo simulation was used to generate random scenarios within the statistical framework with 400,000 runs to capture the range of possible outcomes for a building of any size. In this method, the number of fixtures in use (X) per fixture type in a trial is given as

$$X_k \sim \mathrm{Binomial}(n_k, p_k), \qquad k = 1, \dots, K$$

where $n_k$ and $p_k$ are the count and FUP of fixture type k.
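The two underlying models can be sketched side by side. The code below is a minimal illustration under stated assumptions: `wistort_spwf` implements the standard normal-approximation expression for the 99th percentile of total demand, and `wdem_spwf` is a generic binomial Monte Carlo in the spirit of the description above, not the published WDEM implementation. Function names, the example building and the seed are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def wistort_spwf(n, q, p, alpha=0.99):
    """Normal-approximation quantile of total demand over K fixture groups."""
    n, q, p = (np.asarray(a, dtype=float) for a in (n, q, p))
    mean = np.sum(n * p * q)
    var = np.sum(n * p * (1.0 - p) * q**2)
    return float(mean + norm.ppf(alpha) * np.sqrt(var))


def wdem_spwf(n, q, p, alpha=0.99, runs=400_000, seed=0):
    """Monte Carlo quantile of total demand using binomial draws per group."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n, dtype=int)
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    # One row per trial, one column per fixture group.
    busy = rng.binomial(n, p, size=(runs, n.size))
    total_flow = busy @ q  # L/s drawn in each trial
    return float(np.quantile(total_flow, alpha))


# Hypothetical building: counts, flow rates (L/s) and FUPs per fixture group.
n = [28, 28, 4]
q = [0.10, 0.08, 0.20]
p = [0.03, 0.03, 0.05]

q_wistort = wistort_spwf(n, q, p)
q_wdem = wdem_spwf(n, q, p)
print(f"Wistort: {q_wistort:.3f} L/s, Monte Carlo: {q_wdem:.3f} L/s")
```

The Monte Carlo result is a quantile of a discrete distribution, so it moves in steps as the FUPs change, whereas the closed-form expression varies smoothly.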
Artificial neural networks as a function approximator
In the previous section, the 99th percentile is obtained as a deterministic function of the FUPs (p). However, $Q_\alpha = \eta(p)$ is not, in general, available as a reliable analytic function of the FUPs. For Wistort’s model, it has a closed-form expression, but the normal approximation, as mentioned earlier, is not valid when the number of fixtures is small or moderate. For WDEM, $Q_\alpha$ does not have a closed-form expression as a function of p, since the original binomial distributions are used to obtain the 99th percentile of the aggregate water use.
The use of neural networks aims to produce a computer model (η) for estimating the water demand through a regression-based estimation framework. The ANN-derived model $Q_\alpha = \eta(p)$ is incorporated in an estimation framework where the optimal p’s are those that minimise a quantile loss computed from the observed data and $Q_\alpha = \eta(p)$. The ANNs were trained using the grid of FUPs along with the theoretical SPWF values, as described in the previous section. ANNs were chosen for their ability to create a more general and flexible model that can approximate the SPWF while accounting for unquantifiable factors and uncertainties. Their capacity to model non-linear relationships between inputs and outputs allows them to effectively approximate user behaviour, such as the water usage patterns of building occupants. Although Wistort’s model for $Q_\alpha$ is already in closed form, we also demonstrate that ANNs are able to capture the relationship between $Q_\alpha$ and p effectively even in the case of the normal approximation. ANNs are used to overcome the limitations of the probability expressions.
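The idea of fitting a network η to a grid of FUPs and theoretical SPWF values can be sketched as follows. This is an illustration only: scikit-learn's `MLPRegressor` is swapped in for whatever framework the study used, the network size is arbitrary rather than the tuned configuration, and the building (fixture counts, flow rates, FUP range) is hypothetical. The Wistort expression is used to label the data.

```python
import itertools

import numpy as np
from scipy.stats import norm
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical building: fixture counts and flow rates (L/s) per fixture type.
n = np.array([10, 10, 2])
q = np.array([0.10, 0.08, 0.20])


def wistort_spwf(p, alpha=0.99):
    """Closed-form target used to label each row of the FUP grid."""
    mean = (n * p * q).sum(axis=-1)
    var = (n * p * (1.0 - p) * q**2).sum(axis=-1)
    return mean + norm.ppf(alpha) * np.sqrt(var)


# Grid of FUP combinations (inputs) and theoretical SPWF values (outputs).
levels = np.linspace(0.01, 0.10, 14)
P = np.array(list(itertools.product(levels, repeat=n.size)))
Q = wistort_spwf(P)

# Shuffle and hold out 20% of the rows for testing.
rng = np.random.default_rng(0)
order = rng.permutation(len(P))
split = len(P) * 8 // 10
train, test = order[:split], order[split:]

# Inputs and target are standardised so the optimiser's stopping tolerance
# is meaningful at these small magnitudes.
eta = TransformedTargetRegressor(
    regressor=make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                     tol=1e-6, random_state=0),
    ),
    transformer=StandardScaler(),
)
eta.fit(P[train], Q[train])
r2 = r2_score(Q[test], eta.predict(P[test]))
print(f"test R-squared: {r2:.4f}")
```

Once fitted, `eta.predict` plays the role of η(p) and can be called inside an optimisation loop in place of the underlying model.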
The ANN architecture was chosen for the computer models for peak water demand as it achieved the highest accuracy during a preliminary study, in comparison to Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Hybrid RNN-CNN architectures. 22 With 3 or more layers, the ANN architecture is considered a Deep Neural Network (DNN). Hyperparameters, including the number of hidden neurons, kernel initialisers, and activation layers, were optimised using grid search. The model’s goodness-of-fit was measured through its R-squared statistic, also referred to as the coefficient of determination. The details for the ANN methodology are provided in the subsequent sections.
Differential evolution for optimising fixture use probabilities
Once the function, η, was fitted using the training samples, function optimisation was employed to find the optimal estimates of the model inputs, specifically the FUPs, based on empirical data. The objective function was optimised based on independent and identically distributed (IID) observations of the empirical data during the working days of the measurement period. Selecting only the working days or weekdays ensures that the occupancy patterns are more regular, in comparison to the highly variable occupancy behaviours on weekends, especially for non-residential buildings.
The function optimisation technique implemented was the differential evolution (DE) method, which was developed in 1997 by Storn and Price. 23 This is a straightforward global optimisation technique designed to minimise non-linear and non-differentiable continuous space functions through a stochastic parallel direct search method. It is self-organising, which enables ease of use.
The basic DE comprises four phases: initialisation, mutation, crossover, and selection. 24 The initialisation phase occurs once, while the other phases are repeated until the termination criteria are satisfied. In the initialisation phase, initial solutions for the multi-dimensional optimisation problem are chosen. This can be done randomly, or initial values can be specified. Mutation in DE is a random perturbation executed on selected decision variables, where a mutant vector is established based on a given target vector. During the crossover phase, components of the mutant vector and the target vector are combined through a probabilistic process, producing the trial vector. The selection phase then determines whether the target vector or the trial vector survives into the next iteration of the search process. 24 DE aims to minimise the objective function by iteratively improving the candidate solutions through these evolutionary processes. 23
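The four phases can be made concrete with a compact implementation. The sketch below is one common variant (random base vector, one difference vector, binomial crossover); the function name, population size, scale factor F and crossover rate CR are illustrative choices, not the settings used in the study.

```python
import numpy as np


def de_minimise(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=200, seed=0):
    """Minimise f over box bounds with a basic differential evolution loop."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size

    # Initialisation: random candidate solutions inside the bounds.
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    cost = np.array([f(x) for x in pop])

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: perturb one member with the scaled difference of two others.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)

            # Crossover: mix mutant and target components with probability CR,
            # forcing at least one component to come from the mutant.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])

            # Selection: keep whichever of target and trial has the lower cost.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost

    best = int(np.argmin(cost))
    return pop[best], float(cost[best])


# Usage on a simple test function with a known minimum at x = 0.3.
best_x, best_cost = de_minimise(lambda x: float(np.sum((x - 0.3) ** 2)),
                                bounds=[(0.0, 1.0)] * 3)
print(best_x, best_cost)
```

The loop stops after a fixed number of generations here; a tolerance on the spread of population costs is a common alternative termination criterion.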
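The objective function used in this study was a quantile loss function ($L_\alpha$) expressed as:

$$L_\alpha(p) = \frac{1}{N} \sum_{i=1}^{N} \max\big(\alpha\,(y_i - \eta(p)),\ (\alpha - 1)\,(y_i - \eta(p))\big)$$

where $y_i$, i = 1, …, N, are the observed flow rates and η(p) is the model estimate of $Q_\alpha$.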
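A quantile loss can be sketched in a few lines. The code below uses the standard pinball form averaged over observations, which is an assumption about the exact normalisation; the synthetic gamma-distributed flows are placeholders for measured data. It also checks numerically that the constant minimising the loss lies close to the empirical 99th percentile.

```python
import numpy as np


def quantile_loss(y, q_hat, alpha=0.99):
    """Mean pinball loss of a single quantile estimate q_hat against data y."""
    u = np.asarray(y, dtype=float) - q_hat
    # Under-prediction (u > 0) is weighted by alpha, over-prediction by 1 - alpha.
    return float(np.mean(np.maximum(alpha * u, (alpha - 1.0) * u)))


# Synthetic flow observations in L/s (placeholder for measured data).
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=0.05, size=5000)

# Scan candidate estimates and locate the one with the smallest loss.
candidates = np.linspace(y.min(), y.max(), 2001)
losses = [quantile_loss(y, c) for c in candidates]
best = float(candidates[int(np.argmin(losses))])
empirical = float(np.quantile(y, 0.99))
print(f"loss minimiser: {best:.3f}, empirical 99th percentile: {empirical:.3f}")
```

With α = 0.99, under-prediction is penalised 99 times more heavily than over-prediction, which is what pulls the minimiser towards the upper tail.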
Case study buildings and data
Table 1. Case study building water fixtures.
a Similar to WC.
b Similar to Building A.
c Flow rate obtained from manual measurements. The toilets are assumed to have a 3 L effective flush volume based on a 5-credit toilet in BREEAM.
d Assumed to be in the middle range as there was not enough information.
e Similar to Cleaner’s Sink, as a fixture that is not often used compared to washbasins.
The theoretical datasets generated for the ANN modelling were composed of 75,000 rows, with the grid of FUPs as inputs and the corresponding Wistort or WDEM SPWF as outputs. Figure 2 presents the correlation coefficients of the theoretical FUPs and the SPWF (Q(L/s)), together with the fixture counts as annotated. Aside from the FUP and fixture counts, it is also important to note that the fixture flow rates affect the SPWF values in the calculation of the output, although only the FUPs were used as inputs for the ANN modelling. Based on the correlation coefficients, fixtures with higher counts tend to influence the SPWF more than those with only one or two fixtures of the same type. This observation is an important consideration in the FUP optimisation results analysis. In the ANN modelling, the datasets were split into 80% training, 10% validation, and 10% test datasets. The training dataset was used to build the model and find the optimal hyperparameters, while the validation set was used to further tune the model and check that it had not overfitted or underfitted the training data. Lastly, the test dataset was used to assess the efficiency and robustness of the model by exposing it to new values.

Figure 2. Data correlation of the theoretical dataset variables.
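The construction of a FUP grid and the 80/10/10 split described above can be sketched as follows. The fixture names, the number of grid levels and the FUP range are hypothetical, and a Cartesian product is assumed as the way the combinations are enumerated.

```python
import itertools

import numpy as np

# Hypothetical fixture types and candidate FUP levels per fixture type.
fixture_types = ["wc", "washbasin", "sink"]
levels = np.linspace(0.01, 0.10, 10)

# Every combination of one level per fixture type: 10**3 rows, 3 columns.
grid = np.array(list(itertools.product(levels, repeat=len(fixture_types))))

# Shuffle the row indices, then cut them into 80% / 10% / 10%.
rng = np.random.default_rng(0)
idx = rng.permutation(len(grid))
n_train = len(grid) * 8 // 10
n_val = len(grid) // 10
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])

print(grid.shape, len(train_idx), len(val_idx), len(test_idx))
```

The grid grows exponentially with the number of fixture types, so the number of levels per fixture has to be chosen with the target row count in mind.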
The empirical flow rate data, which were used for the FUP optimisation, were measured using a Portaflow 330 Portable Ultrasonic Flowmeter. It is a non-intrusive flowmeter that uses two clamp-on ultrasonic transducers to measure the transit time of ultrasound signals as they travel through the flowing water. The flow meter recorded the instantaneous flow rates at the chosen intervals; in this case, at 5-s and 10-s intervals. Measuring the flow rates at high resolutions therefore ensures that peak water use is captured.
The in-situ data were collected over a 12-day period for Building A, and a seven-day and 26-day period for Building B during term time at the university, reflecting high and regular occupancy patterns. A minimum of 2 weeks’ worth of data is considered sufficient for training, as this can capture significant patterns and behavioural trends in water consumption. This is particularly true for university buildings, where most students or staff have a regular weekly schedule. Weekends are excluded from the analysis as the university is largely unoccupied on those days and any occupancy is irregular, which would affect the distribution of the data.
The collected datasets were split into training and testing datasets. The training datasets were used for the FUP optimisation, while the test datasets were used for the evaluation of SPWF results. For Building A, only one set of five-second resolution data was collected from the WDEM study 21 ; therefore, the dataset was split into 90% training and 10% testing datasets through random sampling. Meanwhile, for Building B, two sets of data were collected. The first set of data was collected at a five-second interval for 7 days, and the second set of data was collected at a 10-s interval for 26 days. The second dataset was chosen as the training dataset since it is larger due to its longer duration, and should therefore be more representative of the building’s water use compared to 1 week’s worth of data. The measured or empirical 99th percentile of the peak water demand for Building A was 0.419 L/s, while for Building B this was 0.439 L/s.
The water flow data were subsequently resampled to hourly and five-minute resolutions to satisfy the IID requirements, and to further look into the importance of granularity in SPWF estimation and water consumption studies. During the aggregation, the maximum value for each period was selected, and the zero values were removed. This approach prioritises peak values, which are crucial features for SPWF estimation.
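The resampling step described above (maximum per period, zeros removed) can be sketched with pandas. The synthetic 5-second series and its timestamp are placeholders for the flowmeter log, and `resample_peaks` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic one-hour log at 5-second resolution (placeholder for measured data).
rng = np.random.default_rng(0)
index = pd.date_range("2024-03-04 08:00", periods=720, freq="5s")
flow = pd.Series(
    rng.choice([0.0, 0.1, 0.2, 0.35], size=len(index), p=[0.7, 0.15, 0.1, 0.05]),
    index=index,
    name="flow_l_s",
)


def resample_peaks(series, rule):
    """Keep the maximum flow in each period and drop periods with no flow."""
    peaks = series.resample(rule).max()
    return peaks[peaks > 0]


five_min = resample_peaks(flow, "5min")
hourly = resample_peaks(flow, "1h")
print(len(five_min), len(hourly))
```

Taking the maximum rather than the mean preserves the short-lived peaks that drive the upper quantiles.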
The collected data were analysed to determine whether they comply with the IID requirements. The water flow data were treated as independent due to the nature of water use, where draw-off time and flow rate are independent from each other, especially at lower resolutions. Meanwhile, to assess whether the data have identical distributions, the Kolmogorov-Smirnov test for two independent samples was conducted. The Kolmogorov-Smirnov test evaluates the null hypothesis that two independent samples come from the same distribution by comparing the cumulative frequency distributions of the samples. 25 It is a non-parametric method and is therefore flexible enough to evaluate a wide range of data types. The test was performed by comparing samples by day on a rolling basis. For example, for the first test, sample 1 is the data from March 1st and sample 2 is the data from March 2nd. For the second test, sample 1 is the data from March 2nd and sample 2 is the data from March 3rd, and so on. The average of the p-values, which are calculated to assess whether the null hypothesis should be rejected, was then obtained. Building A yielded an average p-value of 0.072 while for Building B, this was 0.486. As both exceed the significance threshold of 0.05, the null hypothesis was not rejected, and the data from both buildings were considered identically distributed for the purposes of the study’s IID requirements.
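The rolling day-by-day comparison can be sketched with SciPy's two-sample Kolmogorov-Smirnov test. The daily samples below are synthetic draws from one distribution and stand in for the measured daily flow records.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic daily samples of resampled peak flows (placeholder data).
rng = np.random.default_rng(0)
days = [rng.gamma(2.0, 0.05, size=40) for _ in range(5)]

# Compare each day with the next one: (day 1, day 2), (day 2, day 3), ...
p_values = [ks_2samp(a, b).pvalue for a, b in zip(days[:-1], days[1:])]
mean_p = float(np.mean(p_values))

print([round(p, 3) for p in p_values], round(mean_p, 3))
```

Each p-value refers to one pair of consecutive days; the average summarises the pairs but is not itself a p-value of a single test.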
The plots in Figure 3 illustrate the hourly and five-minute resolution training data. For Building A, the hourly resolution dataset contains 88 independent points, and the five-minute resolution has 210 points. Building B contains 268 points for the hourly resolution and 1307 points for the five-minute resolution. Furthermore, it can be observed that Building B has more regular water use patterns compared to Building A which has some occasional extreme peaks. While these can be considered outliers, the first analysis considers them as true peaks of water consumption.

Figure 3. Resampled water flow data.
Results
The neural network test results are shown in Figures 4 and 5, alongside the hyperparameters of the best model configuration. The deep neural network architectures were fully connected neural networks composed of three hidden layers for the WDEM-based model and two hidden layers for the Wistort-based model. The additional hidden layer for the WDEM-based model helped increase the accuracy of the models. The hyperparameters were determined using a grid search algorithm where all possible values within a chosen range were evaluated in all possible combinations, and the best model resulting from the configurations was chosen.

Figure 4. Neural network test results for Building A.
Figure 5. Neural network test results for Building B.

The neural network architectures produced computer models that accurately approximate the Q values for Wistort’s Model and WDEM, as reflected in their R-squared values that are all close to one. Figures 4 and 5 also show the regression line between the theoretical and the predicted values. An ideal regression line has a slope close to one with datapoints lying close to the line, which the models in the study generally satisfy. The percentiles from the WDEM method do not have a closed-form mathematical expression; thus, the analytical approximation of the ANNs to these values is noteworthy. Meanwhile, being based on a probabilistic model, the Wistort-based neural network models had R-squared values close to one. These results reinforce the efficacy of neural networks as computer models for SPWF estimation.
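The two diagnostics mentioned above, R-squared and the slope of the regression line between theoretical and predicted values, can be computed as in the sketch below. The arrays are synthetic stand-ins for the test-set values, with a small amount of noise added to mimic prediction error.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic theoretical SPWF values and noisy predictions (placeholder data).
rng = np.random.default_rng(0)
theoretical = rng.uniform(0.2, 1.0, size=500)
predicted = theoretical + rng.normal(0.0, 0.01, size=500)

r2 = r_squared(theoretical, predicted)
# Least-squares line predicted = slope * theoretical + intercept.
slope, intercept = np.polyfit(theoretical, predicted, deg=1)
print(f"R-squared={r2:.4f}, slope={slope:.3f}, intercept={intercept:.3f}")
```

A slope near one with an intercept near zero indicates no systematic over- or under-prediction across the range of SPWF values.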
The optimisation function for obtaining the optimal FUP values was executed for each model. Ten trials were conducted to check whether the algorithm produces stable values within a small range of deviation. The differential evolution (DE) algorithm was initialised with the CIPHE FUPs for each fixture, as found in Table 1, to guide the initialisation process towards historically used FUPs.
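The procedure above can be illustrated with a small, self-contained differential evolution loop. The sketch below makes several assumptions that are not taken from the study: it uses a plain DE/rand/1/bin scheme, it evaluates the Wistort model in its usual normal-approximation form directly rather than through a trained ANN, and the fixture counts `N`, flow rates `Q`, starting FUPs `P0`, and target flow `TARGET` are all hypothetical (they are not the CIPHE values from Table 1).

```python
import math
import random

Z99 = 2.326  # standard normal quantile for the 99th percentile


def wistort_q99(p, n, q):
    """99th percentile flow under the normal approximation (assumed form)."""
    mean = sum(nk * pk * qk for nk, pk, qk in zip(n, p, q))
    var = sum(nk * pk * (1 - pk) * qk ** 2 for nk, pk, qk in zip(n, p, q))
    return mean + Z99 * math.sqrt(var)


def differential_evolution(loss, x0, bounds, pop_size=20, gens=200,
                           f=0.7, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin with the population seeded around x0."""
    rng = random.Random(seed)
    lo, hi = bounds
    dim = len(x0)

    def clip(v):
        return min(hi, max(lo, v))

    # First member is the prior guess; the rest are jittered copies of it.
    pop = [list(x0)] + [
        [clip(v + rng.gauss(0, 0.02)) for v in x0] for _ in range(pop_size - 1)
    ]
    cost = [loss(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated component
            trial = [
                clip(pop[a][d] + f * (pop[b][d] - pop[c][d]))
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            trial_cost = loss(trial)
            if trial_cost <= cost[i]:  # greedy selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


# Hypothetical building: fixture counts, flow rates (L/s), starting FUPs.
N = [4, 4, 1]
Q = [0.13, 0.10, 0.20]
P0 = [0.05, 0.05, 0.05]
TARGET = 0.30  # hypothetical empirical 99th percentile flow (L/s)


def loss(p):
    return (wistort_q99(p, N, Q) - TARGET) ** 2


# Ten independent trials, one per random seed.
trials = [differential_evolution(loss, P0, (0.001, 0.5), seed=s)
          for s in range(10)]
```

Comparing the FUP vectors returned across the ten seeds gives the spread that the stability check examines. In this toy setting a single target flow does not determine the FUPs uniquely, so some spread between seeds is expected even when every trial matches the target closely.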
Results for the trials are presented in Figure 6(a) for Building A and Figure 6(b) for Building B. For Building A, the range of the FUPs was small, indicating a stable model. Apart from the WDEM-based model using five-minute resolution data, the models showed well-defined FUP values per fixture type (i.e. the mean FUPs are not clustered around the same values). The three models also exhibited the same trend, where the kitchen sink with separate taps, WC, and washbasin with mixer taps have the highest FUPs, while the rest of the fixtures are between 0.07 and 0.08. Referring back to the correlation of the variables in Figure 2, the three aforementioned fixtures also have the highest influence on the SPWF value. As a result, the optimisation algorithm was more sensitive to these variables and therefore produced better-defined values for them than for the other fixtures.
Figure 6. Fixture use probability results.
Meanwhile, the results from the different models for Building B exhibited similar trends to those for Building A, which can be attributed to more datapoints and higher flow rate values. For all models, the WC and the washbasin with mixer tap resulted in lower FUP values. This was expected because there are 28 of each of those fixture types in the building, so the probability of any one WC being used is lower when there are 28 to choose from. Meanwhile, the other FUPs range between 0.05 and 0.07, with the bottle filling station, coffee vending machine, and kitchen sink showing higher FUPs. It is also important to note that the WC and the washbasin in both buildings have FUP values close to each other, which is plausible because the washbasin is typically used after the WC.
The SPWF values found using the average probability values shown in Figure 6 are presented in Figure 7, and are compared with the design flow rates from the CIPHE Design Guide and the BS 8558,14 which are used in the United Kingdom. The solid vertical line indicates the empirical flow rate, with the overestimation percentages annotated at the bottom of the bars. The new SPWF values were also computed using the ANN models, and the original Wistort model and WDEM to demonstrate the accuracy of the developed ANN models.
Figure 7. Simultaneous peak water flow at 99th percentile.
For Building A, the results indicate that the overestimation of the SPWFs with the new set of FUPs was significantly reduced compared to the design flow rates from the building codes, with the Wistort-based model at the five-minute resolution providing the best estimation. The new models result in overestimations between 17% and 38%, which are significant improvements on the 141% and 150% overestimation of the CIPHE Design Guide and the BS 8558, respectively.
For Building B, the reduction in overestimation was even more significant, as the model performed better with more datapoints and higher flow rates compared to Building A. The Wistort-based model at the five-minute resolution once again performed best with a 3% overestimation, whereas its WDEM counterpart underestimated the empirical flow rate by 3%. The new models also showed significantly reduced overestimation against the CIPHE Design Guide and the BS 8558, which overestimated the flow rates by 327% and 393%, respectively.
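The overestimation percentages quoted above compare an estimated flow rate with the empirical 99th percentile flow. The sketch below shows one way to compute both quantities. The nearest-rank percentile definition is an assumption (the study may use an interpolating definition), and the flow values are hypothetical, normalised so that the empirical flow is 1.0.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (assumed definition, no interpolation)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def overestimation_pct(estimate, empirical):
    """Positive when the estimate exceeds the empirical flow, negative below."""
    return 100.0 * (estimate - empirical) / empirical


# Hypothetical flows normalised to an empirical 99th percentile of 1.0.
empirical = 1.0
over_model = overestimation_pct(1.03, empirical)   # slight overestimate
under_model = overestimation_pct(0.97, empirical)  # slight underestimate
over_code = overestimation_pct(4.27, empirical)    # large overestimate
```

With this sign convention, an underestimate such as the WDEM-based result for Building B appears as a negative percentage.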
The results were evaluated against the test dataset to confirm that the models had not overfitted the training data. In Figure 7, the dotted lines show how the 99th percentile flow rate from the test dataset compared to the new models, with the corresponding overestimation percentages. For Building A, the test flow rate was lower than the training flow rate, so the overestimation percentages increased to around 40%. Since the peaks in the test data were lower than in the training data, the results from the ANN models did not fit the test data properly. Nonetheless, as this study is more concerned with the peak in water flow rate, and for Building A the training data contained higher measured peaks, the test flow rates have less significance for this case. Meanwhile, for Building B, although the test flow rate was higher than the training flow rate, the SPWF from the new models still approximated the empirical test flow rate well, and the new SPWF values did not underestimate the test flow rate apart from the WDEM at five-minute intervals. The proximity of the 99th percentile flow rates between the training and testing datasets also signifies that the measurements were able to capture good representative peak flow rates.
Finally, an additional analysis was performed on Building A in which possible outliers were treated. Going back to the measured flow rate data shown in Figure 3, there are peaks in the data that are not as regular as in Building B. While these measurements can be considered true peaks, where building occupants may have used fixtures that are not regularly used, such as the dishwasher, they may also represent measurement errors and be considered outliers. Moreover, such irregular peaks can heavily influence the optimisation process and skew the learning process towards higher values.
The outlier analysis was performed using the Z-score, which measures how many standard deviations a data point lies from the mean. A threshold was then set to filter the outliers and replace them with the median value. Careful selection of the Z-score threshold is important so as not to remove the important peaks, but only the possible outliers. Figure 8 presents the SPWF results after processing outliers with a threshold of seven, which replaced eight datapoints with the median value. As shown in the results, the new models reduced the overestimation of the training and testing flow rates further to between 6% and 21%, which was around 10% less than using the data without outlier processing.
Figure 8. Simultaneous peak water flow at 99th percentile for Building A without outliers.
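The Z-score filtering step can be sketched as follows. The details are assumptions rather than the study's exact procedure: the sketch uses a global mean and population standard deviation over the whole series, and the flow series is hypothetical. The threshold of seven matches the value reported above.

```python
import statistics


def replace_outliers_with_median(values, threshold):
    """Replace points whose |Z-score| exceeds `threshold` with the median.

    Returns the cleaned series and the number of replaced points.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    median = statistics.median(values)
    if std == 0:
        return list(values), 0
    cleaned, replaced = [], 0
    for v in values:
        if abs(v - mean) / std > threshold:
            cleaned.append(median)
            replaced += 1
        else:
            cleaned.append(v)
    return cleaned, replaced


# Hypothetical flow series (L/s): regular use plus one extreme spike.
flows = [0.1, 0.2, 0.3] * 33 + [0.2, 5.0]
cleaned, n_replaced = replace_outliers_with_median(flows, threshold=7)
```

Note that with a population standard deviation, the largest Z-score attainable in a series of n points is the square root of n − 1, so a threshold of seven can only flag points in series with more than 50 points.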
While smoothing out the outliers provided a more robust model, the risk is that the model will only estimate up to the median values and may not capture the true peaks. Ultimately, a dataset measured for a longer duration will address this limitation and ensure that peaks are captured.
Conclusion
This study presents a novel data-driven approach for estimating FUP in SPWF calculations. The aim was to demonstrate the applicability and accuracy of data-driven methods and machine learning in peak water demand estimation. It provides a basis for a fundamental factor of SPWF estimation in buildings, tackling the problem of overestimation due to the different levels of uncertainty caused by more efficient water fixtures and changing water consumption patterns.
Previous and existing methods depended heavily on probabilistic computations, which often lack the flexibility and reliability needed, particularly when relying on occupancy surveys to estimate water use probability. This work extracts knowledge from the data available and flexibly adapts and verifies the existing methodologies for applicability to varying building types and water use behaviour. As demonstrated by the results, the proposed methodology produced models that can closely estimate the SPWF in two non-residential buildings. Accurately estimating SPWF through data-driven derivations of FUPs can significantly contribute to decarbonising water systems in the built environment by enabling more precise sizing of water supply systems. This study highlights the robustness and accuracy of the data-driven method in sizing water systems, offering a reliable alternative to design codes. In addition to enhancing water efficiency, more accurate estimations of the peak water demand can also inform the sizing of other equipment, such as boilers and pumps, thereby improving energy efficiency.
The goal was to produce a methodology that is generalisable, easily reproducible and can be reused for different buildings and regions, such as those with different water per capita consumption or different fixture types. There is potential to reuse the same models for buildings with the same size and type, although there is a need for more measured data to see the benefit of data-driven modelling.
For future work, incorporating occupancy data into the model and applying time-series analysis could further enhance the model and provide deeper insights into estimating the SPWF. A time-series analysis can also provide further insights for demand-response systems in water management. As the data used for this study was recorded during high occupancy periods, the methodology also needs to be tested on building data with longer durations to enable further analysis of the percentiles, so that it can be calibrated against varying occupancy levels and water use. Evaluating the methodology’s potential boundaries is also beneficial and can be achieved by applying the methodology to small residential properties and larger multi-storey buildings. However, the challenge of limited data availability persists in this area of research. Development and implementation of advanced metering of water consumption will help address these limitations, along with collaboration among stakeholders.
Footnotes
Acknowledgments
The authors would like to thank Dr. Sarwar Mohammed, Research Associate, School of Energy, Geoscience, Infrastructure and Society, Heriot-Watt University for sharing the water flow data from his research.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
