Abstract
Animal experiments conducted worldwide contribute to significant findings and breakthroughs in understanding the underlying mechanisms of various diseases and lead to appropriate clinical interventions. However, their predictive value is often low, leading to translational failure. Translational failure and poorly designed animal experiments lead to loss of animal lives and to less translatable data, which affect research outcomes both ethically and economically. Because of the increasing complexity of animal usage, changes in public perception, and stringent guidelines, it is becoming difficult to use animals for conducting studies. This review deals with challenges like poor experimental design and ethical concerns and discusses key concepts that are often neglected: sample size, statistics in experimental design, humane endpoints, economic assessment, species difference, housing conditions, and systematic reviews and meta-analyses. If practiced, these strategies can refine the procedures effectively and help translate the outcomes efficiently.
Introduction
Many researchers report issues with the reproducibility of preclinical research.4–8 It has been estimated that the irreproducibility of data from preclinical research is in the range of 51–89%, which has a great impact on the economics of preclinical research worldwide.3,7,9 Besides such major issues, the ethical consideration of animal experimentation, humane procedures, and the sentience of the animals are also important concerns.10–13 Using animal models of human diseases for drug testing is common practice among biomedical researchers and scientists. Proper experimental design is paramount to good practice and to obtaining sound results. It warrants measures to prevent unwanted bias, such as allocation concealment, randomization, or blinding of observers, as well as attention to such factors as eligibility criteria (exclusion and inclusion criteria), external validity, internal validity, power, and sample size.14,15 A good experimental design is necessary to justify the ethical argument for carrying out the work, as it helps in judging the right number of animals needed for the experiment to ensure reproducible results.15,16 On the other hand, we neglect studies on non-pharmaceutical approaches, like exercise and meditation, which may be equally effective for treating depression but receive little funding compared with drug research for depression, as a Cochrane Depression, Anxiety and Neurosis Review Group has recently denounced. NIH Director Collins has also stated that drugs tested on mice have an 80–85% chance of failing in toxicity studies in human trials. In addition, on average only 8% of animal models translate further to fruitful interventions in cancer research. 17 Alongside such failure rates, one study reported that 47 out of 53 cancer studies could not be replicated afterwards, even though they had been published in esteemed journals and marked as “Significant Breakthroughs.” 18 Even in such a grave situation, new animal studies are receiving funding to develop more animal models instead of criticizing, reviewing, and troubleshooting the existing ones. To justify the use of animals in research, researchers should first justify the methods and procedures by means of literature surveys and provide sound grounding for the chosen experimental design, taking internal validity and external validity as central aspects, while also refining experiments with issues such as ethics, morals, and sentience in mind. 19 According to Russell and Burch, 11 replacement alternatives refer to procedures in which one can avoid or replace the use of animals by using inanimate systems, simulated computer programs, or invertebrates, which are less susceptible to pain perception than vertebrates; reduction alternatives are strategies that can minimize the number of animals in an experimental procedure, namely sample size calculations or harm and benefit analysis; and refinement alternatives are procedures used to modify the surroundings or handling procedures so as to enhance the welfare of animals and cause less distress and pain.
The common problems faced by researchers all over the world are experimental design, ethical concerns, animal welfare, statistical analysis, power calculation, sample size, etc.8,10 These are issues that greatly affect the experimental outcomes. 5 A few of the ways forward which can help to resolve these issues are herein discussed so as to provide a better understanding of the use of animals in biomedical research.
Issues in animal research
Economic assessment
According to an estimate from 2010, biomedical research has benefitted from global investment of up to US$240 billion, of which basic research has been the prime beneficiary. Many of the best research ideas promising translational effects have failed when it comes to applied research. This has created a bottleneck effect, making us question the value of basic research for developing disease prevention and treatment protocols. 18 It takes almost 15 years for a drug to gain approval and come to market, and the cost of development is nearly $1.3 billion. 17 As Altman stated in his report in 1994, “We need less research, better research, and research done for the right reasons.” 20 As in other fields of science, we also need to revise failed protocols, troubleshoot the problems in hypotheses, and derive a predictive value by using systematic review and meta-analysis as tools for creating a working model that reduces both economic expenditure and the loss of animal lives. 17
Humane endpoint
A refinement procedure, as defined by Morton et al., 21 comprises “Those methods which avoid, alleviate or minimize the potential pain, distress or other adverse effects suffered by the animals involved, or which enhance animal well-being.” This definition supports the practice of humane endpoints and justifies their use in experimental design. The CCAC guidelines define a humane endpoint as follows: “A humane endpoint can be defined as the point at which an experimental animal’s pain and/or distress can be terminated, minimized, or reduced by actions such as killing the animal humanely, terminating a painful procedure, or providing treatment to relieve pain and/or distress.” 22 Following this definition, specifying early endpoints should be part of good experimental design and planning. 23 Unlike in other countries, most research proposals submitted to the respective Institutional Animal Ethics Committees (IAEC) under Committee for the Purpose of Control and Supervision on Experiments on Animals (CPCSEA) guidelines in India do not include a description of humane endpoints. 24 This leads to unjustified animal suffering when animals reach severe stages and are allowed to die from experimental disease. Proposed experiments should hence include humane endpoints, defined as the level of pain or suffering that animals should not be allowed to exceed. 25 Moreover, experimenting on a suffering or moribund animal will not generate valid experimental results. Researchers should thus emphasize the establishment of humane endpoints while designing the experiment, for better outcomes and a more ethical study design overall. This refinement can not only improve the welfare of the animals but might also improve the experimental outcomes. 23
Ashall and Miller 26 have proposed a useful way to consider humane endpoints for a study: the endpoint matrix, which divides possible humane endpoints into three main categories: scientific endpoints, justifiable endpoints, and unpredicted endpoints. Scientific endpoints are based on the actual outcome achieved by the experiment, at which point the study is terminated. Justifiable endpoints are based on the maximum suffering that can be caused to animals as part of achieving the study objective, after which termination is essential; this is the so-called humane endpoint. Unpredicted endpoints are mainly based on accidental suffering, which is not covered under the aims and objectives of the study. 26 With such issues in mind, European Directive 2010/63/EU provides examples of procedures of different severities by describing the different endpoints applied, based on clinical signs, which include tumor progression. Humane endpoints hence help prevent unnecessary suffering in various animal experiments and improve the validity of the results, supporting more translatable preclinical studies.27,28
Species difference
Lack of understanding about “species difference” is a cause of concern. Mice, rats, rabbits, and guinea pigs are commonly used laboratory animals, but they are very different from each other. Despite being close to humans in terms of genetic disposition, they might differ in their pathological conditions, physiological needs, and behavioral patterns. Species differences are due to differences in the quantity and quality of DNA, RNA, and proteins at the genetic and molecular levels. 29 But there is more to it than that; species difference is also due to evolution, habitat, environmental conditions, geography, and behavior. Hence, researchers should be aware of all the differences between their model of choice and humans, since the former can only mimic humans to a certain extent. When developing a specific vertebrate model, it helps to understand the pathophysiology of the disease in the animal with respect to humans. Generally, an animal model is selected on the basis of its availability or of the literature available for certain models of disease, and not on whether human pathogenesis is matched by the model animal, which can only mimic the changes rather than show the exact pathogenesis. This in turn creates an aberration in the final outcomes if it is not taken into consideration while interpreting the results. 30 Therefore, establishing expected outcomes by analyzing the species difference yields a better understanding of the animal model of disease.
The variety of animal strains available nowadays is immense, including those that are genetically altered. Hence, researchers should be able to justify their choice of animal strain for the particular experiment. For example, type 2 diabetes research relies mostly on mouse models, and many studies have claimed to show promising results. There is still a high failure rate at the clinical level because of mechanistic differences: in human biology glucose clearance occurs mostly in the muscles, whereas in mice clearance is by the liver, which changes the physiology and pathology drastically. Hence, model selection with the correct species is a prime need. 29 Another study shows how species difference can change the predictive validity of the experimental outcomes and make a study less translatable. 31 Hence, to reduce the chance of translational failure, one can check the predictive value beforehand using the previous literature available. Another approach is to use specific strains that minimize the chance of error and increase the chance of getting relevant outcomes from the desired animal model of disease. Strain specificity plays a critical role in predicting the working of animal models.
Housing conditions
Housing conditions affect not only the behavior of the animals but also the experimental results. Adequate temperature, humidity, and air flow have to be maintained for all the animals in the first place. 32 In animal house facilities in India, basic requirements are provided, but unlike in most countries the specific needs of each species are hardly taken care of. Enrichment and refinement procedures can help in reducing the stress of animals in a particular environment.14,32 Enrichment procedures aim to provide the animals with an environment that meets their needs and gives them opportunities to perform their species-specific repertoire. This causes less stress in the animals, which affects their behavior in a positive way, and can be considered a good option. In one study, enrichment in a mouse model of cancer led to a significant reduction in tumor weight compared with a standard environment, which was associated with an increase in the number of COX-2 positive cells and an elevated inflammatory state of the mammary gland. In another study, fibulin-4 +/– knockout mice given enrichment showed less chance of arterial hemorrhage and maintained the integrity of smooth muscle cells and endothelium. 33 Hence, it can be safely assumed that animals can manifest a distorted phenotype because of being housed in captive conditions, as they do not live in such conditions naturally. This shows that housing conditions play an important role in such studies, which otherwise would have shown negative data. This is why maintaining proper enrichment and inspecting such small yet much needed factors from time to time is so often emphasized.
Sample size and statistics
As discussed earlier, the determination of sample size is a very important aspect of designing an experiment. Most of the studies are designed vaguely on the basis of the literature available without any effort to calculate the sample size. According to a study published by Tsilidis et al., they searched for the use of
Factors that determine sample size
An appropriate sample size generally depends on four study design parameters: (1) minimum expected difference (also known as the effect size); (2) estimated standard deviation; (3) statistical power; and (4) significance criterion. 37
Minimum expected difference
This is the smallest measured difference between comparison groups that the investigator would like the study to detect. The smaller the minimum expected difference, the larger will be the sample size needed to detect it. This parameter can be set based on previous studies or by estimating the magnitude of difference that would be clinically or biologically important.
Estimated measurement standard deviation
This is the expected standard deviation in the measurements made within each comparison group. As the standard deviation increases, the sample size needed to detect the minimum difference increases. Ideally, the variability should be determined on the basis of preliminary data collected from a similar study population. A review of the literature can also provide estimates of this parameter, if a pilot study is not feasible.
Statistical power
This parameter describes the probability that a study will correctly reject a false null hypothesis. As the desired statistical power increases, the required sample size also increases. Ideally, one would like the power to be as close to 1 as possible, but in practice this is not feasible. Up to a power of about 80%, each animal added contributes substantially to the power of the experiment; beyond 80% the curve becomes shallow and each additional animal contributes considerably less. Hence, a power of 0.8 or 0.9 is typically considered acceptable.
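The flattening of the power curve can be illustrated with a minimal sketch. It assumes a two-sided comparison of two equally sized groups, uses the normal approximation rather than the exact t distribution, and ignores the negligible probability of rejecting in the wrong direction. The function name `approx_power` and the example values (a difference of one standard deviation) are illustrative assumptions, not taken from the studies cited here.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, diff, sd, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation: power = Phi(diff / SE - z_crit),
    where SE = sd * sqrt(2 / n_per_group).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    se = sd * sqrt(2 / n_per_group)                # SE of the difference in means
    return NormalDist().cdf(diff / se - z_crit)


# Hypothetical scenario: detect a difference of 1 SD between two groups.
previous = 0.0
for n in (4, 8, 12, 16, 20, 24, 28, 32):
    power = approx_power(n, diff=1.0, sd=1.0)
    print(f"n per group = {n:2d}  power = {power:.2f}  gain = {power - previous:+.2f}")
    previous = power
```

In this scenario the gain from each extra block of four animals shrinks steadily once the power passes roughly 0.8, which is the behavior described above.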
Significance criterion
This parameter is the maximum P value for which a difference is to be considered statistically significant.
Calculation of sample size
The estimated sample size for comparing the means of the parameter in two groups with the Student’s
where,
Zc and Zp values for common cutoff values.
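Because the equation and the table of cutoff values are not reproduced in this text, the following is only a sketch under stated assumptions. It uses the widely used normal-approximation formula for comparing two means, n per group = 2 × SD² × (Zc + Zp)² / d², where d is the minimum expected difference, SD the estimated standard deviation, Zc the standard normal value for the significance criterion, and Zp the value for the desired power. The function names and example numbers are illustrative and may differ from the formula originally printed here.

```python
from math import ceil
from statistics import NormalDist


def z_crit(alpha, two_tailed=True):
    """Standard normal value (Zc) for the significance criterion."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


def z_power(power):
    """Standard normal value (Zp) for the desired statistical power."""
    return NormalDist().inv_cdf(power)


def n_per_group(diff, sd, alpha=0.05, power=0.8, two_tailed=True):
    """Animals per group: 2 * sd^2 * (Zc + Zp)^2 / diff^2, rounded up."""
    z = z_crit(alpha, two_tailed) + z_power(power)
    return ceil(2 * (sd * z / diff) ** 2)


# Zc and Zp for commonly used cutoffs.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  Zc (two-tailed) = {z_crit(alpha):.3f}  "
          f"Zc (one-tailed) = {z_crit(alpha, two_tailed=False):.3f}")
for power in (0.80, 0.90, 0.95):
    print(f"power = {power:.2f}  Zp = {z_power(power):.3f}")

# Hypothetical example: detect a difference of 1 SD at alpha = 0.05, power = 0.8.
print("animals per group:", n_per_group(diff=1.0, sd=1.0))
```

The normal approximation slightly underestimates the number needed for very small groups, so a calculation based on the t distribution or dedicated power software is preferable for a final protocol.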
The sample size (the number of animals, in this context) can be minimized by taking some precautions in the experimental design: (1) preferring continuous measurements over categorical measurements; (2) acquiring paired data wherever possible; (3) performing one-tailed tests; (4) making precise measurements, which reduces the standard deviation; and (5) using an inbred strain of animals for the experiment. By taking care of these points while designing the experiment and calculating the sample size, one can optimize the use of animals in biomedical research.
Systematic review and meta-analysis of literature
Animal models are used in many experiments for understanding mechanisms and etiology of a disease, 38 or to check the safety, efficacy, outcome, and side effects of a new treatment or drug before starting clinical trials.39,40 However, the results from these experiments must be accurate.41,42 Reproducible and consistent results from animal models can provide reliable data of relevance to human medicine. However, if results are biased or imprecise, this might result in exposing humans to unwanted risk in clinical trials. Moreover, experimental animals are subjected to unnecessary suffering when experiments fail to provide meaningful and reliable data without any clinical relevance. 39 Therefore, there must be compelling justification for the use of animals in experiments, also from the translation point of view.
A systematic review is a literature review process focused on answering explicit research questions by identifying, retrieving, and collecting selected data and integrating the results.38,42 This may be followed by a meta-analysis, the statistical method for the compilation and summarization of results and findings of large collection of independent and relevant studies. 43 The effort of combining studies systematically aims to obtain a large body of information, overcoming limitations and inconsistencies of individual studies, and thus provide more accurate information about the outcome.43,44 The first meta-analysis was performed in 1904 by Karl Pearson. Gene Glass coined the term “meta-analysis” to refer to the pooling of findings statistically. Gene Glass suggests that “meta-analysis was created out of the need to extract useful information from the cryptic records of inferential data analyses in the abbreviated reports of research in journals and other printed sources.” 42
Steps in systematic review and meta-analysis
Defining or identifying the research problem is the first step in performing the analysis. Research questions are focused mainly on population / species / strain; intervention / exposure; disease of interest / health problem; and outcome measures.
The eligibility criteria should be set immediately after defining the objective of the study. It is necessary to define the inclusion and exclusion criteria to avoid selection bias. Inclusion criteria should cover the following: type of study, animal characteristics, interventions, and outcomes. Duplicate articles, reviews, conference papers, commentary, and errata are excluded. Articles are also excluded on the basis of inadequate reporting.
Different databases are searched based on the research question. It is always preferable to search more than one database. Besides electronic databases, other sources such as the reference lists of retrieved articles can also be checked to identify relevant studies;45–47 these sources can also be consulted for animal search filters. The search terms are phrased to cover all potentially relevant articles and are combined with Boolean operators (like “AND” or “OR”).
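How search terms are combined with Boolean operators can be sketched as follows. The usual convention is assumed: synonyms for one concept are joined with "OR", and the different concepts are joined with "AND". The helper `build_query` and the example terms are hypothetical, and real databases add their own field tags and filters that are not modeled here.

```python
def build_query(concept_groups):
    """Join synonyms with OR inside a concept and concepts with AND."""
    blocks = []
    for synonyms in concept_groups:
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in synonyms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical research question: species, disease of interest, outcome.
query = build_query([
    ["mice", "rats"],
    ["type 2 diabetes", "insulin resistance"],
    ["glucose clearance"],
])
print(query)
```

Keeping the concepts as separate lists makes it easy to document the exact search string in the review protocol and to rerun it on more than one database.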
Based on the inclusion criteria, relevant articles are retrieved by screening the title, abstract, and, where necessary, full text. Two investigators independently judge each article against the inclusion and exclusion criteria to avoid selection bias. Disagreements or discrepancies are resolved by discussion or by a third investigator.
Several scales are available to support quality assessment of the articles. These include the Newcastle-Ottawa Scale (NOS), a method for assessing the quality of non-randomized studies (case-control studies, cohort studies, and interrupted time series) in meta-analyses. 48 Also, CAMARADES (Collaborative Approach to Meta-Analysis and Review of Animal Data in Experimental Studies) provides a supporting framework for groups involved in the systematic review and meta-analysis of data from experimental animal studies. 49
Relevant data are extracted from each selected article, and the extraction should be concise and focused. A description of the study group, size of group, age, gender, diagnoses, treatments, follow-up, ethnicity, methods, etc., should be recorded. Inconsistencies also need to be described. Data extraction should be performed by more than one investigator, and it should be rigorous and reproducible. Discrepancies should be resolved by a third investigator. Guidelines like MOOSE (Meta-analyses of Observational Studies in Epidemiology) and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) can be used for the systematic and complete reporting of the systematic review and meta-analysis.50,51
The extracted data are analyzed in the following steps: (1) choice of an effect size measure; (2) calculation of an effect size for each comparison; (3) choice of model: random or fixed effects; (4) calculation of a summary effect size; (5) assessment of heterogeneity and, if it is present, of which study characteristics explain it and by which method; (6) subgroup analysis of the influence of factors on the effect of an intervention; and (7) sensitivity analysis. Whether there is publication bias deserves special consideration.
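Steps (3) to (5) can be sketched as follows. This is a minimal illustration and not the procedure of any particular study cited here: it assumes each study supplies an effect size and its variance, uses inverse-variance weights for the fixed effects model, and swaps in the DerSimonian-Laird estimator of between-study variance for the random effects model. Heterogeneity is summarized with Cochran's Q and I². The function name and the example numbers are hypothetical.

```python
from math import sqrt


def pool_effects(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects summary estimates."""
    k = len(effects)
    w = [1 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # % variation due to heterogeneity

    # Random-effects weights add tau2 to each study's variance.
    w_re = [1 / (v + tau2) for v in variances]
    random_eff = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)

    return {
        "fixed": fixed, "se_fixed": sqrt(1 / sw),
        "random": random_eff, "se_random": sqrt(1 / sum(w_re)),
        "Q": q, "tau2": tau2, "I2": i2,
    }


# Hypothetical standardized mean differences from three small animal studies.
summary = pool_effects(effects=[0.2, 0.5, 0.8], variances=[0.04, 0.04, 0.04])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

When the studies disagree more than chance would explain, the random effects model gives a wider standard error than the fixed effects model, which is one reason the choice of model in step (3) should be justified in advance.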
Description and details of all the studies, results, and quality score must be reported. Graphical display of individual study outcome and overall results should be interpreted. Statistical significance and clinical importance need to be discussed (Figure 1 and Figure 2).

Figure 1. Flow diagram of meta-analysis.

Figure 2. Flowchart of inclusion and exclusion criteria.
Discussion and conclusion
In conclusion, with animal research currently being the backbone of biomedical research, its translational value must be improved as much as possible in order to achieve significant scientific breakthroughs in understanding human diseases and to improve healthcare. Using refined study designs, statistically justified sample sizes, ethically acceptable protocols, and proper humane endpoints in animal experimentation can decide the outcome of the proposed hypothesis and hence refine the research outcome and its reproducibility. Another fine strategy to refine and reduce the number of animals or studies is to use systematic reviews and meta-analyses to identify specific problems from the already available literature and to estimate their chance of success or failure in a model organism. Systematic reviews and meta-analyses are methods designed to identify and counter the prevalence of bias and discrepancy in individual animal studies. It is essential to outline the aims, objectives, and methodology in advance when performing these techniques. The principle behind the analysis is that the identification and data extraction process could be performed by independent researchers and yield the same data. Interpretation of the primary review studies is often followed by meta-analyses. 38 Systematic reviews and meta-analyses have provided evidence that methodological errors and shortcomings in study design, blinded outcome assessment, and sample size calculation in preclinical trials and animal model studies lead to false treatment effects.38,52 The presumption that animal species predict the human outcome underlies the use of animals as surrogate models for humans. However, bias and conflict of interest make it difficult to confirm the hypothesis, and evidence suggests that animal studies are inconsistent in translation to human health;53,54 rather than delivering reliable answers to research questions, they are often overinterpreted. 55 According to Chan et al., 56 high-quality protocols for systematic reviews and meta-analyses can lead to transparency, rigorous study implementation, and efficiency of research and external review.
Over 5 million animal studies are available on PubMed out of an estimated total of over 7 million published. 57
In 2002, why systematic reviews of animal studies were not prevalent was raised in the
Several guidelines are available that improve the reporting in articles. The procedure, study design, and sample size calculation in every article should follow these guidelines. Animal studies are often too small to show the relevance of an outcome, so smaller studies are pooled to increase the power and lend more weight to the significance of the outcome. 63 The larger the sample size, the smaller the random error, and thus the greater the power of the study. Randomization and blinding should be built into the experimental design, since omitting them leads to overestimation of the effect size by 21% and 11%, respectively. Hence, it is crucial to include both in experimental design. 64 Italian pathologist Pietro Croce argued that “results from animal experiments cannot be applied to humans because of the biological differences between animals and humans and because the results of animal experiments are too dependent on the type of animal model used.” 39 Translating animal data to humans with sufficient fidelity has proven very challenging. This translation is affected by numerous factors, such as biological differences between species, internal validity, differences in experimental design between animal studies and clinical trials, insufficient reporting, and publication bias.39,40 Therefore, the rational use of animals and discrepancies in sample size can be reviewed, whereas the disparity in translating animal experiments to clinical trials can be addressed by pooling studies with inconsistent results and small sample sizes in a meta-analysis built around specific questions. If the effect and the biases are of potentially the same magnitude, the signal cannot be validated. The effect-to-bias ratio, or signal-to-noise ratio, in animal studies affects the predictive values and outcomes. With systematic reviews and meta-analyses, one can retrospectively choose studies with high and low ratios and get significantly closer values for the current analysis. 65 When results cannot be reproduced under similar conditions, they cannot be expected to be translatable to other species, such as humans. Therefore, it is essential to combine studies with small and large effect sizes, based on the specified hypothesis, and to check the pooled results of the independent studies in order to increase the power and the precision. Based on the results of the combined studies, the feasibility and translation of animal studies to humans can be further improved.
Similarly, vibration of effect during statistical analysis is another key factor in designing and conducting an animal study. Many variables can sway the results or expected outcomes (over a range) in a single study. Some degree of bias is therefore unavoidable, and ignoring it is not an option; such variables need to be nullified to a workable extent. 65 By refining some key strategies and training students and researchers in these key concepts of laboratory animal science 66 at the time of designing or proposing a hypothesis, before actual experiments on animals are carried out, one can help deduce the outcome as accurately as possible and in a refined manner. Hence, researchers should focus on such critical yet often neglected points to refine the experimental procedures used in biomedical research.
Footnotes
Acknowledgements
The authors acknowledge the help extended by Donald Maurice Broom, Vera Baumans, Mohammad Abdulkader Akbarsha, and Anurag Agrawal by way of critical viewpoints about the literature and content. The authors thank Ravisha Rawal for editing the article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
We would like to acknowledge CSIR funded Project BSC-0403 (Visualisation of Organisms in Action [VISION]) for funding the publication of this article.
