Abstract
This article presents a data modelling method and a semi-automatic data searching method to support cost estimation in the product development process, particularly for low-volume, high-complexity and long-life products typified by defence products and systems. The article covers a literature review of cost estimation in product development, the data sets needed to perform cost estimation, the method of modelling the data and the techniques for supporting cost data searching. The proposed method is intended to support cost estimation for product development decisions on defence electronic products. Compared with the traditional approach, the method demonstrates that creating a centralized environment, such as the databases, and using a data-driven approach makes the system more efficient by reducing the number of processes involved in carrying out cost estimation. This in turn provides more information for making an informed concept design decision during the product development process, based on the system’s capability for instant cost estimation feedback.
Introduction
Accurate and reliable cost estimates require historical data and information to be available. This applies to all types of manufactured products. As pointed out by Bode, 1 cost estimation at the early stage of product development has always been difficult because only limited attribute information is available. In addition, accurate and reliable cost estimates can only be obtained at a later stage of the development process, when more information and data are available. This is because the cost models and systems in use require a large amount of detailed data before a cost calculation can be made. 2
It is common knowledge that in each phase of the development of a product, a company spends money and also incurs future costs. For example, the process of developing new drugs is lengthy and costly, and a large percentage of the cost lies in research and development. According to DiMasi et al., 3 to produce an estimate of the expected cost of a marketed product, the drug development company must allocate the costs of unsuccessful projects to those that result in a marketed new product. On this basis, the estimated average out-of-pocket cost per new drug exceeded US$ 400 million in 2003. In the engineering industry, Keys 4 identified that although the design stage constitutes only 5% of the total product cost, its influence on the total life cycle cost is between 75% and 90%. Therefore, the final cost of a product under development is usually an important design attribute, as illustrated in Figure 1. It is thus essential to understand the value of this design attribute as early as possible in the design cycle, preferably at the conceptual design stage. The further the project has advanced, the more difficult it is to reduce the final cost because of the high costs of modification and change. 5

Cost estimation dilemma in product development. 1
Over the years, many researchers have made considerable efforts to tackle the problems of cost estimation at the earliest stage of product development. Typical approaches and techniques used to support cost estimating decisions during product development include trade-off analysis, parametric, product family, variant, analytical, learning curve, neural network and life cycle costing methods. 1,6–15 However, these methods are generally not applicable to low-volume, high-complexity (LVHC), long-life products, especially for a brand new design of this kind of product, where insufficient data can reduce the statistical significance of the estimate. A typical example is the multi-function display (MFD) unit that is usually mounted on the instrument binnacle of a fighter cockpit. Cost estimation for this type of product at the development stage is usually hindered by the lack of statistically significant data and the unsuitability of available methods.
This article therefore proposes a new approach to cost data modelling, coupled with a semi-automatic searching mechanism, to address cost estimation at the product development stage for LVHC, long-life products. The modelling method is distinctive in that dedicated modules capture the relevant data needed to support the initial product configuration process, so that a rapid cost estimate of a new design can be generated for LVHC, long-life products. The layout of this article is as follows: section ‘Literature review’ presents the literature review of data modelling and optimization; section ‘Type of data and development of the data library for cost estimation in low-volume products’ describes the data needed and the development of a generic digital data library to support the new approach; section ‘Data sets to support the data searching mechanism’ discusses the data sets and searching mechanism; section ‘Semi-automatic data searching approach’ discusses the methods of data searching; section ‘Case study’ describes a case study of a multi-function display unit used to verify the approach; and finally, section ‘Conclusion and future work’ presents the conclusion and further work.
Literature review
Research articles that address cost estimation at the early stage of product development using traditional cost estimation and optimization techniques are summarized as follows. Between 1990 and 1998, there was considerable interest in performing technology trade-off analysis specifically for electronic systems, 6,7 although these works focused on manufacturing cost for high-volume rather than low-volume products. Other authors such as Bode 1 compared the performance of neural networks against other conventional cost estimation methods at the early stage of product development. Neural networks are non-parametric techniques that can be trained to perform accurate estimation from limited attributes and to fit non-linear curves. However, the technique relies on the availability of good historical case data, where the predictions are repeatable. Tu et al. 10 presented a cost data index structure with two traditional cost estimation methods, namely, generative and variant cost estimation, for the development of a computer-aided cost estimate and control system in the mass customization of sheet metal products. For cost optimization, an algorithm for the selection of alternative operation routines and suppliers was developed using the dynamic programming technique. However, both methodologies relied on the availability of good historical knowledge and experience.
Kaufmann et al. 16 proposed an optimization framework to minimize the direct operating cost at a part level based on the cost/weight optimization of composite aircraft structures. The framework was implemented based on the parametric method. Shehab and Abdalla 17 developed an automatic cost estimation system using object-oriented and rule-based methods for decision-making at the early design stage. The cost estimation system was implemented using heuristic data and fuzzy logic techniques. The system allowed users to generate accurate cost estimates for design alternatives by exploring different materials and production processes. Deiab and Al-Ansary 18 developed a systematic multi-phase procedure to optimize the design and manufacturing parameters using the genetic algorithm method. The aim was to minimize the total manufacturing cost under dimensional, weight and machine power constraints. As with parametric methods, the approximations are based on past case data for which the cost is known. Shen et al. 19 developed a modelling technique for a priori cost and performance estimations for a mixed-signal system. The estimation method was developed based on Rent’s rule. 20
Scanlan et al. 21 describe how cost models cover a range of requirements for input detail, with the most accurate generative detailed models requiring the most effort to implement. There was an attempt to create a detailed, generative model using cost model libraries, but the attempt did not address the availability of sub-models using other cost modelling paradigms, or how they could be fitted together with expressions of uncertainty for the less well-defined models.
A review of commercial cost estimation systems was conducted by Newnes et al. 22 To understand the current capabilities of commercial cost estimation software, the authors revisited the literature on two cost estimation systems that are widely used in the defence industry, namely, the PRICE 23 and SEER 24 systems. These two software systems provide useful tools for cost estimating and use historical data to assist in the decision-making process. In particular, they provide accurate costing data for frequent/similar products at the detailed design level. In general, these tools are effective for repeated products that build on a base product model with incremental design changes where much of the detail is available.
All the research articles and commercial systems mentioned above are more applicable to high-volume products, where past and historical data are typically available to support the methodologies. In particular, the research mentioned above did not involve LVHC, long-life products. An additional point is that the parametric models in commercial systems are general purpose and derived from typical similar components. Therefore, if some of the initial design is more certain, for instance during a product refresh, cost models that are derived for a specific manufacturer or specific component are more accurate. 21 The next section of this article focuses on the data needed in the product development of LVHC electronic products and outlines the modular approach.
Type of data and development of the data library for cost estimation in low-volume products
The focus of this article is to propose a method of utilizing data and rule-based techniques to support the cost estimation of defence electronic systems at the early stages of the product development process. A modular approach of constructing a bill-of-materials (BoM) of a product from a database (as discussed below in ‘Development of the digital data library’) has been used.
Types of data
The BoM of the new product is made up of a set of elements with their associated cost models and cost data from the database. The decision on what elements the model will be constructed from is based on a number of questions, and the answers to these questions may be obtained by implementing, for example, a rule-based ‘production system’. To establish what rules should be used to implement such a system, two further primary questions need to be answered:
Is there an existing cost model or data in the library that could be used for the new product?
How relevant are the data in the library? (The degree of relevance will be used to determine the accuracy of the estimation.)
Only after this analysis will elements from the library be used, or be accepted, in a new product configuration. If the elements are not applicable, alternative strategies should be established, for example by
Constructing the product model from finer detail (e.g. sub-modules of a BoM) or
Choosing a close product family member.
This fundamental aspect is the basis of deriving the data sets that are used to support the data searching method in the cost estimation process.
Development of the digital data library
The steps in developing the Electronic and Mechanical Modular–based Library (EMMbL) 25 were to (1) outline the structure of the databases, (2) create the main directory to store all relevant data and information, (3) create subdirectories within the main directory and (4) create cost elements within the subdirectories. The initial process was to use the Unified Modelling Language (UML) 26 to define the generic data structure of the EMMbL, as shown in Figure 2. The main purpose of the EMMbL is to capture as much cost-related data and knowledge as possible and to make the data ready for reuse by design, manufacturing and cost engineers when estimating the cost of a new design at the conceptual stage. As shown in Figure 2, the structure of the EMMbL is classified into a set of domains under the terms of electronics and mechanical modules: electronics and mechanical components, product, process and resource.

UML representation of the EMMbL.
The second step was to create a main directory for the EMMbL, followed by defining the subdirectories. The final step was to create cost data within the subdirectories, as illustrated by the examples in Figure 3. Figure 3(a) and (b) represents the cost data of the video electronic module. Within this module, there is a set of pre-stored cost data sets for videos, such as Video_1, Video_2 and so on, which are directly linked to the performance of the unit. For example, the user can select a video unit based on the resolution requirement or simply select a suitable subsystem based on price.

EMMbL and example cost data.
The EMMbL includes two further modules, namely, process and resource. The process module consists of a set of pre-stored cost data sets on assembly and surface mount technology processes. Figure 3(c) and (d) illustrates example process cost data for the assembly module. The resource module contains quantitative cost information linked to the performance of manufacturing processes, for example, machine capability, factory and tooling cost data. The goal of the EMMbL is to enable designers to configure a product from existing module information. The modules allow users to define the relationships between the subsystems and components from the databases. The databases allow designers and cost estimators to explore alternative design concepts for a product in order to meet a customer’s required specification.
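As a rough illustration of how the pre-stored cost data sets in a module might be queried, the sketch below models a video module as a list of records and selects an entry either by a resolution requirement or by price. The class name `CostElement`, the field names, the selection policies and all numeric values are hypothetical and are not taken from the EMMbL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostElement:
    """One pre-stored cost data set in a library module (hypothetical fields)."""
    name: str
    domain: str            # e.g. "electronics", "mechanical", "process", "resource"
    resolution_lines: int  # performance attribute used for selection
    unit_cost: float       # illustrative cost in arbitrary currency units


# Illustrative contents of a video module; the values are invented.
VIDEO_MODULE = [
    CostElement("Video_1", "electronics", 480, 120.0),
    CostElement("Video_2", "electronics", 768, 210.0),
    CostElement("Video_3", "electronics", 1080, 390.0),
]


def select_by_resolution(module, min_lines):
    """Return the cheapest element that meets the resolution requirement, or None."""
    candidates = [e for e in module if e.resolution_lines >= min_lines]
    return min(candidates, key=lambda e: e.unit_cost) if candidates else None


def select_by_price(module, budget):
    """Return the highest-resolution element within the budget, or None."""
    candidates = [e for e in module if e.unit_cost <= budget]
    return max(candidates, key=lambda e: e.resolution_lines) if candidates else None
```

Both functions return `None` when nothing in the module qualifies, which in the approach described here would be the cue to fall back on another data set or a finer level of detail.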
Data sets to support the data searching mechanism
A study has been carried out on how industry utilizes data to perform cost estimation. Five types of data set have been identified: (1) commercial-off-the-shelf (COTS), (2) parametric, (3) variant, (4) detailed and (5) new and uncertain technology, each of which could influence the accuracy and confidence level of the estimated cost. A data searching mechanism (DSM) incorporating these five data sets has been developed using rule-based techniques, 27 as shown in Figure 4.

Rules for data searching mechanism. 25
Several artificial intelligence techniques were tested for the implementation of the DSM, namely, rule-based, case-based, Bayesian net and decision tree. 27 The rule-based reasoning technique was chosen to develop the DSM in this research due to its ease of use, low levels of uncertainty and ease of maintenance. Although the case-based reasoning technique was not selected, it can be represented by one of the ‘variant’ rules, since ‘variant’ can be used to derive new cost data from existing products, which is in effect case-based reasoning. The rules used in the DSM interact with the following five data sets:
COTS – fixed standard cost data from suppliers.
Parametric – cost outputs that are directly dependent on the characteristics or parameters of the product such as weight, volume, length and the number of inputs/outputs. For electronic products, examples include ‘material type’, ‘size of printed circuit board (PCB)’ and ‘number of components’ (resistors, integrated circuits (ICs) and so on).
Variant – in general, new variants of current products usually involve incremental changes rather than a novel/new design. Thus, variant design applies a ‘product family’ approach. Cost is derived from a mixture of current and non-current products.
Detailed – when parametric data are unavailable, or a more accurate result is required and a number of subsystems exist for further cost analysis, cost data can be obtained using a BoM approach in which the available BoM for the selected design enables historical data to be accessed.
New and uncertain technology – if new and uncertain technology is involved, the cost can be modelled using the following:
Monte Carlo simulation: used to analyse the uncertainty. This analysis is based on input values from a range of similar products in order to construct a probability distribution. Thus, a ‘New Technology against Cost’ characteristic can be obtained.
Technology forecasting method: the application of ‘Monte Carlo’ simulation depends on similar technologies and historical data. If there is no information or data available to support ‘Monte Carlo’ simulations, another approach is needed. As illustrated in Figure 4, this research has adapted the ‘technology forecasting’ method to evaluate the impact of this scenario. The technology forecasting technique depends on the ‘technology readiness level’ (TRL). 28 According to Moorhouse, 28 TRL consists of nine levels from ‘basic technology research’ to the technology ready to be ‘launched and operated’. In order to check the impact of emerging technologies, there are three techniques available:
If the emerging technology is in its infancy stage, that is, below level 3, the application of the ‘Delphi method’ 29 should be used to predict the growth pattern of new and uncertain technologies; thus, by ‘quantifying’ the prediction, the data can be used to support a cost model.
If the emerging technology is above level 3, two techniques, ‘S-curves’ and ‘trend exploration’, can be used.
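The Monte Carlo step and the TRL-based choice of forecasting technique described above can be sketched as follows. This is only an illustration under stated assumptions: the triangular distribution, the use of the mean of the similar-product costs as its mode, the function names and the handling of TRL 3 itself (which the text does not specify) are choices made for this sketch, not part of the DSM.

```python
import random
import statistics


def monte_carlo_cost(similar_costs, n_trials=10_000, seed=0):
    """Estimate cost percentiles by sampling a distribution built from similar products.

    Assumption: the minimum, mean and maximum of the observed costs define the
    low, mode and high of a triangular distribution. The article does not
    prescribe a distribution; triangular is used purely for illustration.
    """
    low, high = min(similar_costs), max(similar_costs)
    mode = statistics.mean(similar_costs)
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    samples = sorted(rng.triangular(low, high, mode) for _ in range(n_trials))
    return {
        "p10": samples[int(0.10 * n_trials)],
        "p50": samples[int(0.50 * n_trials)],
        "p90": samples[int(0.90 * n_trials)],
    }


def forecasting_technique(trl):
    """Choose a technology forecasting technique from the TRL (1 to 9)."""
    if not 1 <= trl <= 9:
        raise ValueError("TRL must be between 1 and 9")
    # The text covers 'below level 3' and 'above level 3'; treating level 3
    # itself like the higher levels is an assumption of this sketch.
    return "Delphi method" if trl < 3 else "S-curves or trend exploration"
```

The spread between the `p10` and `p90` values gives a simple indication of how uncertain the cost of the new technology is, given the range of costs seen in similar products.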
The method and techniques used to implement the DSM are discussed in the following section.
Semi-automatic data searching approach
Algorithm for cost data searching
Figure 5 illustrates an example of a modular approach to estimating the cost of an electronics system. The figure indicates that a system has a number of subsystems, that a subsystem can consist of different kinds of component and assembly operations, and that the cost data for each application can be different. It is important that an intelligent method is used to search for the available cost data so that the cost can be predicted as accurately as possible.

Example modular approach.
The developed data searching algorithm is shown in Figure 6, represented as ‘if/then’ statements. The algorithm is a series of steps for performing data searching across design alternatives, based on the set of rules in the DSM. For example, to find out what kind of data are needed for ‘subsystem_2’ (as shown in Figure 5), the search begins by checking the existing data type as follows:
To utilize existing product development data, the designer can perform a ‘variant’ search and this approach can assist designers to determine cost values of a design concept.
If a designer uses COTS, ‘fixed prices’ will be used for a new design’s cost estimation.
The application of the ‘parametric’ technique will be used to derive costs from a cost estimation relationship (CER).
For a ‘detailed/BoM’ search, the search will break the design down into finer elements and iterate again to predict the cost accurately.
If the new design is so advanced that there is no match in the databases, it is considered to be ‘uncertain technology’, and the cost estimation needs to run a ‘Monte Carlo simulation’ or the ‘technology forecasting technique’.

Rule-based searching algorithm.
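The ‘if/then’ search described above can be sketched as a simple chain of checks. Figure 6 is not reproduced here, so the order of the checks simply follows the order of the list in the text and should be read as an assumption; the dictionary keys (`has_similar_product`, `is_cots`, `has_cer`, `bom`) are hypothetical names introduced for illustration.

```python
def choose_data_type(item):
    """Return the data set to use for one subsystem or component.

    `item` is a dict describing what is known about the element; every key
    name used here is hypothetical.
    """
    if item.get("has_similar_product"):
        return "variant"        # derive cost from existing product data
    if item.get("is_cots"):
        return "COTS"           # use the supplier's fixed price
    if item.get("has_cer"):
        return "parametric"     # use a cost estimation relationship (CER)
    if item.get("bom"):
        return "detailed"       # break down into finer elements
    return "uncertain technology"  # no match: Monte Carlo or technology forecasting


def estimate_plan(item):
    """Classify an item and, for 'detailed' items, iterate over its BoM children."""
    kind = choose_data_type(item)
    plan = {"name": item["name"], "data_type": kind}
    if kind == "detailed":
        plan["children"] = [estimate_plan(child) for child in item["bom"]]
    return plan
```

For instance, a `subsystem_2` with no direct match but an available BoM would be classified as ‘detailed’, and each of its BoM entries would then be classified in turn.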
The implementation of the DSM and the techniques used for its development are discussed in the following section.
Implementation of the DSM
Rule-based techniques 27 were used to develop the DSM algorithm. The data searching uses a forward chaining method (also known as data-driven reasoning) with First In, First Out queue processing. 32 As illustrated in Figure 7, the aim of the DSM is to let a product designer search for relevant cost information within the databases during design configuration. It is at this stage that the cost of alternative design concepts needs to be determined as accurately as possible to support decision-making in the early product development stage.

Method of supporting cost data searching.
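A minimal sketch of forward chaining with a First In, First Out agenda is given below to make the data-driven reasoning concrete. It is a generic textbook-style engine, not the DSM implementation; the rule format (a set of premises and a single conclusion) and the example fact names in the usage note are assumptions.

```python
from collections import deque


def forward_chain(initial_facts, rules):
    """Data-driven inference: derive new facts until no further rule can fire.

    `rules` is a list of (premises, conclusion) pairs, where `premises` is a
    set of facts that must all be known before `conclusion` is asserted.
    """
    known = set(initial_facts)
    queue = deque(initial_facts)      # FIFO agenda of facts still to be processed
    while queue:
        fact = queue.popleft()        # the oldest fact is processed first
        for premises, conclusion in rules:
            if fact in premises and premises <= known and conclusion not in known:
                known.add(conclusion)
                queue.append(conclusion)  # derived facts join the back of the queue
    return known
```

With hypothetical rules such as `({"stage:concept", "supplier_price_available"}, "use:COTS")` and `({"use:COTS"}, "cost_type:fixed")`, the two starting facts lead first to `use:COTS` and then to `cost_type:fixed`, which illustrates how conclusions follow from the available data rather than from a goal chosen in advance.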
Case study
Figure 8 depicts the validation process of the developed approach. As illustrated in the figure, the approach has two main processes: (1) new product configuration and (2) use of the developed DSM to search for the associated cost data/models during new product configuration. Both processes are described in the next sections, ‘New product configuration’ and ‘Checking cost data of selected components/subsystems via the DSM’.

Flowchart of the validation processes.
New product configuration
Figure 9 illustrates an example product for the case study. The product is a 5-in MFD unit. The main applications of the MFD are for displaying video, text and graphics. The display head assembly consists of a liquid crystal display (LCD) panel that is driven by a custom-designed interface board and controlled by several temperature and optical sensors. The high output backlight is a cathode fluorescent lamp mounted onto a heat exchanger. The lamp and display are assembled into a rugged machined aluminium frame.

A multi-function display unit.
Figure 10 illustrates the software system’s graphical user interface (GUI), which allows users to perform a new or an existing product configuration. In order to evaluate the cost of a new product, the following procedure is carried out:
The user invokes the EMMbL for selecting the appropriate subsystems/components to perform an initial product configuration and
Uses the DSM to automatically pre-select the cost data of the chosen subsystems/components.

Software system’s GUI.
Figure 11 depicts an example of how an initial concept MFD product is configured from the EMMbL. The figure shows four screens linked by arrows (1), (2) and (3), which indicate the information flow based on the modules selected. The configuration process is based on the input and output of each module selected from the EMMbL, as shown by arrow (1). For example, if a display unit requires a maximum power of 50 W, then a power supply of that power rating must be used, as shown by arrow (2). Configuration of a product therefore depends on the ‘input’ and ‘output’ values of the components and subsystems. Arrow (3) shows the completion of an initial product configuration. Once the initial product configuration has been selected, the next stage is to invoke the DSM to analyse the individual subsystems in order to select the best cost data for the estimation process.

Product concept configuration of an MFD.
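The input/output matching used during configuration can be illustrated with the power supply example. In the sketch below, a supply is treated as compatible when its rating is at least the power required by the display, and the cheapest compatible supply is chosen; both of these policies, the function name and the catalogue values are assumptions made for illustration only.

```python
def pick_power_supply(required_watts, supplies):
    """Return the cheapest supply whose rating covers the load, or None.

    `supplies` is a list of (name, rated_watts, cost) tuples; the tuple layout
    is hypothetical.
    """
    adequate = [s for s in supplies if s[1] >= required_watts]
    return min(adequate, key=lambda s: s[2]) if adequate else None


# Invented catalogue entries for a display that needs a maximum of 50 W.
SUPPLIES = [("PSU_A", 30, 40.0), ("PSU_B", 50, 65.0), ("PSU_C", 80, 90.0)]
chosen = pick_power_supply(50, SUPPLIES)
```

Here `chosen` is the `PSU_B` entry, because `PSU_A` cannot supply 50 W and `PSU_C` is more expensive than necessary.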
Checking cost data of selected components/subsystems via the DSM
A pilot demonstrator of the DSM has been implemented as shown in Figure 12. The DSM functions are as follows:
Step 1:
1. The data inputs are used in conjunction with a database of available models to determine what kind of model will be chosen.
2. The life cycle stage may not be particularly relevant as concept will be the most likely candidate.
3. New product proportion and product knowledge are aimed at the same criterion. The percentage new product is translated to a category (low, medium or high) indicating a level of product knowledge. The translation may be overridden by the user.
Step 2:
4. The database showing the contents of the library is displayed here in a flat form.
5. One module may be chosen, and the choice is used to query for all available models.
Step 3:
6. This is a list of all available models for the chosen library module.
Step 4:
7. Rules are fired using forward chaining; they produce a pruned list of suitable candidates.
8. A priority system then selects the best model from a predefined hierarchical order.

Rules for data searching.
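Steps 1 and 4 above can be sketched as follows: the percentage of new product is mapped to a product-knowledge category, unsuitable models are pruned, and the remaining candidates are ranked by a fixed priority. The thresholds (30% and 70%), the direction of the mapping (a largely new product implies low product knowledge), the priority order and the `min_knowledge` field are all hypothetical, since the article does not state them.

```python
# Hypothetical priority order, most preferred model type first.
PRIORITY = ["COTS", "variant", "parametric", "detailed"]
KNOWLEDGE_RANK = {"low": 0, "medium": 1, "high": 2}


def knowledge_level(percent_new, override=None):
    """Translate the percentage of new product into a product-knowledge category.

    Assumption: the more of the product is new, the lower the product
    knowledge. The user may override the translated value.
    """
    if override is not None:
        return override
    if percent_new < 30:
        return "high"
    if percent_new < 70:
        return "medium"
    return "low"


def select_model(models, knowledge):
    """Prune models that need more knowledge than is available, then pick by priority."""
    suitable = [
        m for m in models
        if KNOWLEDGE_RANK[m["min_knowledge"]] <= KNOWLEDGE_RANK[knowledge]
    ]
    if not suitable:
        return None
    return min(suitable, key=lambda m: PRIORITY.index(m["type"]))
```

Keeping the pruning and the priority selection as separate steps mirrors the description above, in which the rules first reduce the list of candidates and a priority system then selects one model from what remains.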
In this case study, the concept stage is examined. For example, the ‘video_unit’ is chosen via the ‘Choose Sub-category’ button, which is directly linked to the EMMbL. The inference engine in the rule-based system then finds the best cost model in the EMMbL for the ‘video_unit’. Given a high level of product knowledge, the design engineer can make an informed decision on whether to accept the ‘video_unit’. The above example shows only the ‘video_unit’ module; the same process is applied to the power supply, the LCD and so on. Once the product configuration process has been finalized, the latest version of the new design can be stored in the EMMbL. The new design can also be retrieved for further modification.
Conclusion and future work
This article has presented and discussed the development of a digital data library. This is a generic digital data library used to support the product configuration process of LVHC, long-life electronic systems, although it is considered that the method can also be applied to a variety of other products and to high-volume production.
A new approach to cost data modelling and semi-automatic searching to support cost estimation has been developed. The semi-automatic method has been tested successfully as an aid to designers in selecting suitable subsystems and components from the digital data library during the early product configuration process. The case study has demonstrated that by creating a centralized environment, such as the data library, and using a data-driven approach, different cost modelling paradigms can be used together to create an estimate, depending on the type of cost data available, as described by Scanlan et al. 21 in the literature review. Figure 13 indicates that, compared with the industrial collaborators’ traditional cost estimation approach for new product development, the proposed centralized system is more efficient because it reduces the number of processes involved in carrying out cost estimation. In addition, the proposed approach provides more information for making an informed concept design decision during the product development process, based on the system’s capabilities for:
Storing relevant data, information and rapid cost estimation feedback and
Minimizing data holding from disparate locations.

Comparison of (a) the traditional cost estimation approach and (b) the proposed approach.
Provided the database is sufficiently populated, the approach also enables enhanced-fidelity cost models to be selected automatically at a later stage in design. The approach assumes that software is embedded in the hardware, so that its cost is already associated with the selected hardware during the estimation process, for example, COTS. Software cost estimating techniques differ markedly from the hardware and firmware techniques used here, except when the parts are priced with embedded software (COTS). There is a case for a hybrid system, but not for this particular test case.
The evaluation of new and uncertain technologies has also been defined and will be implemented as part of the DSM. Further research will explore (1) the use of the results from S-curves and trend exploration to feed into the cost models, (2) the quantification of expert opinions from the Delphi method in numerical form so that the results can support the cost estimation process being undertaken, (3) the study of obsolescence 33 and how it affects product development costs over a long period of time and (4) the adaptation of the learning curve 34 method, which would be important in LVHC systems as most of the learning occurs early in the process.
Footnotes
Acknowledgements
The authors are grateful for the support of the Industrial Collaborators GE-Aviation and the Defence Equipment and Support (DE&S) Group from the MOD, UK. Furthermore, the authors would also like to express their gratitude to the anonymous reviewers for their timely comments and suggestions.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
The authors would like to thank the following for funding their research activities in this area: the Engineering and Physical Sciences Research Council (EPSRC), UK; the University of Bath’s Innovative Design and Manufacturing Research Centre (IdMRC-GR/R67507/0) and the Innovative Electronics Manufacturing Research Centre (IeMRC-GR/T07459/01).
