Abstract
Introduction:
Animal tests of cosmetic ingredients and products have been banned in the EU since 2013. However, in Japan, applications for new quasi-drugs require 24-hour data on primary and cumulative skin irritation generated by animal testing. Such data are unreliable because an ingredient predicted as nonirritating after short exposure (4 hours), based on the Organization for Economic Co-operation and Development (OECD) test guideline (TG) 404, may cause irritation after a longer application period in human skin irritation tests. With insufficient data to draw conclusions about the irritation potential of an ingredient, there remains a high probability of skin irritation after extended exposure to the ingredient.
Materials and Methods:
This study assessed whether the skin irritation caused by quasi-drugs and cosmetic products can be evaluated in a step-by-step manner.
Results:
A workflow was developed considering several key steps such as the component characteristics based on physicochemical properties or the ingredient category based on existing information from animal tests and human patch test results, and its utility was assessed using the reconstructed human epidermis (RhE) test (OECD TG439), animal testing, the human patch test, and the human cumulative skin irritation test.
Conclusion:
The RhE test and the aforementioned human skin tests can be employed to evaluate test substances that cause weak or nonskin irritation and are categorized as “harmless ingredients,” thereby avoiding animal testing.
Introduction
Animal testing of cosmetic ingredients and products was banned in the EU in 2013. 1 In many regions and jurisdictions, risk assessments for the local toxicity of dermally applied substances can be completed without performing any new animal experiments. Several factors need to be considered when using these substances, such as the physicochemical properties of the ingredient, concentration of the ingredient in the formulation, application of in silico tools, and read across or benchmarking the irritancy of the ingredient or formulation using an in vitro test. 2 However, in some cases, regulations still require the generation of animal test data to assess the local toxicity of dermally applied products and/or their ingredients.
Skin irritation is a skin reaction manifesting as erythema, edema, and desquamation, and is caused by the direct contact of a substance with the skin. Skin corrosion refers to reactions severe enough to cause irreversible tissue damage. Furthermore, the irritation reaction that occurs when the irritant initially comes into contact with the skin is called primary skin irritation, and the reaction that occurs when the test substance comes into repeated contact is called cumulative skin irritation. To gain manufacturing and marketing approval of quasi-drugs and to revise cosmetic standards, the 24-hour-exposure primary skin irritation test using rabbits or guinea pigs3,4 has conventionally been used to evaluate primary skin irritation in Japan. In addition, cumulative skin irritation tests3,4 using rabbits or guinea pigs have been employed to assess the degree of skin reaction caused by repeated application of a test substance to the skin. Subsequently, the human 24-hour occlusion patch test (human patch test) is used to assess primary skin irritation.3,4 Human cumulative skin irritation tests, such as the repeated insult patch test (RIPT), the repeated open application test (ROAT), and cumulative use evaluation, 3 involve repeated application to the skin for the safety evaluation of cosmetics and cosmetic raw ingredients.
As an alternative to the skin irritation test, in July 2013, the Organization for Economic Co-operation and Development (OECD) published a test guideline on an in vitro test to assess skin irritation using the “reconstructed human epidermis” (RhE) test. The RhE test was adopted, as per the OECD test guidelines (TG) 439, 5 as an alternative to the skin irritation test (OECD TG404). 6 The RhE test involves the application of a test substance to the skin for 4 hours and can identify Category 2 (hazard identification of skin irritant) substances according to the Globally Harmonized System (GHS) of Classification and Labeling of Chemicals of the United Nations (UN) 7 ; however, Category 3 (mild irritation) substances cannot be identified using this method.
Unfortunately, the RhE test was not developed for evaluating skin irritation after a 24-hour exposure, which is required to obtain manufacturing and marketing approval for quasi-drugs and to revise cosmetic standards. An ingredient predicted to be nonirritating after short exposure (4 hours), as per OECD TG404, may become irritating after a longer application period in human skin irritation tests such as the human patch test and the human cumulative skin irritation test. Furthermore, the in vitro method has been reported to be useful for the prediction of the human patch test results 8 ; however, this has not been validated. Moreover, for cumulative skin irritation, an alternative method for animal experiments has not yet been developed. In the absence of sufficient data on the irritation potential of the ingredient, there could be skin irritation after exposure to it.
Considering the above situation, the purpose of this study was to assess whether the skin irritation caused by quasi-drugs and cosmetic products can be evaluated without the use of animal testing.
Materials and Methods
The supporting team of the Japanese Cosmetic Industry Association (JCIA) collected data on 92 test substances from the RhE test method, the 24-hour primary skin irritation test in animals (animal testing), the human patch test, and/or the human cumulative skin irritation test. The protocols are available in Supplementary Appendix SA1. These data were obtained from open-source databases (including scientific literature, the ECHA website [REACH database], and the Cosmetic Ingredient Review [CIR] website), with the exception of the RhE data generated for three substances (propylene glycol, pentylene glycol, and ethylhexylglycerin). Out of these 92 test substances, 42 had both human dermal irritation data and RhE data. These 42 test substances were used for the analyses presented in this study (Supplementary Appendix SA1).
Physicochemical properties and the ingredient category based on the existing information on the test substance
Available information on the skin permeability of the test substance and the ingredient category of skin irritation was collected. Using the information on the component characteristics based on physicochemical properties and the ingredient category based on the existing information from animal testing and the human patch tests, we determined whether the test substance and ingredients were “harmless ingredients” corresponding to class A (component characteristics) or class B (ingredient category) (Fig. 1).

Fig. 1. Skin irritation evaluation flow of a test substance by a stepwise approach incorporating human tests. *RhE, reconstructed human epidermis; **RIPT, repeated insult patch test; ROAT, repeated open application test.
Predictability of the tests
We collected the available data on the RhE test, 5 the 24-hour primary skin irritation test in animal testing,3,4,6 the human patch test,9–11 and the human cumulative skin irritation test.12–15
All data, except for RhE data of the three test substances, were taken from the references cited in Table 2.3–6,9–11 Additional data from the RhE test for propylene glycol (CAS No. 57-55-6; FUJIFILM Wako Chemicals, Japan), pentylene glycol (CAS No. 5343-92-0; Symrise K.K., Japan), and ethylhexylglycerin (CAS No. 70445-33-9; Schülke & Mayr GmbH, Germany) as raw materials were acquired for this case study according to the protocol of Labcyte EPI-MODEL 24 SIT in TG439. 16
Table 2. In Vitro and In Vivo Results for Skin Irritation Testing
The gray highlight indicates not classified at classes A and B, and positive result in RhE or positive reaction in patch test and/or hRIPT results in class B. The enclosed frames address the positive result in RhE or positive reaction in patch test and/or hRIPT results in class B (Supplementary Appendix SA2).
Conc., concentration; hRIPT, human repeated insult patch test; RhE, reconstructed human epidermis.
Results
Physicochemical properties and ingredient category of the test substance based on existing information
Class A: Component characteristics
Skin irritation caused by test substances mainly appears as erythema and edema, which result from penetration of the test substances through the stratum corneum and/or their accumulation in it, which damages the underlying epidermal cell layer. Therefore, it is important to understand the parameters related to skin penetration for the evaluation of skin irritation, and the test substances that do not penetrate the skin or penetrate less effectively are classified as “harmless ingredients.” The test substances that are expected to have extremely low skin permeability fall under either of the following categories 17 : (1) molecular weight of 1000 Da or more, 18 (2) molecular weight of 500 Da or more and an oil–water partition coefficient (log Kow) of −1 or less or 5 or more,19–21 or (3) inorganic powder that exists as a solid in the formulation (e.g., minerals such as mica and talc, and metal oxides such as zinc oxide, titanium oxide, and iron oxide), and powder that is almost insoluble in water and ethanol. 22
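The three class A criteria above can be read as a simple screening rule. The following is a minimal sketch of that rule, not the authors' implementation: the `Substance` record, its field names, and the single `insoluble_powder` flag standing in for criterion (3) are assumptions introduced for illustration, and the example inputs are hypothetical values rather than measured data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Substance:
    """Hypothetical record holding the properties the class A criteria refer to."""
    name: str
    molecular_weight_da: float
    log_kow: Optional[float] = None   # oil-water partition coefficient, if known
    insoluble_powder: bool = False    # assumption: one flag summarising criterion (3)


def is_class_a(s: Substance) -> bool:
    """Return True if the substance meets any of the three class A criteria."""
    # (1) Molecular weight of 1000 Da or more.
    if s.molecular_weight_da >= 1000:
        return True
    # (2) Molecular weight of 500 Da or more AND log Kow <= -1 or >= 5.
    if (s.molecular_weight_da >= 500 and s.log_kow is not None
            and (s.log_kow <= -1 or s.log_kow >= 5)):
        return True
    # (3) Solid inorganic powder, or powder almost insoluble in water and ethanol.
    return s.insoluble_powder


# Hypothetical inputs for illustration only.
examples = [
    Substance("large polymer", 1500.0),
    Substance("mid-size lipophilic ester", 650.0, log_kow=6.2),
    Substance("mid-size amphiphile", 650.0, log_kow=2.0),
    Substance("small diol", 80.0, log_kow=-0.9),
    Substance("mineral powder", 300.0, insoluble_powder=True),
]
for s in examples:
    print(f"{s.name}: class A = {is_class_a(s)}")
```

A substance with an unknown log Kow is treated here as not meeting criterion (2); that conservative default is a design choice of the sketch, not something the criteria specify.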
Class B: Ingredient category
To identify the ingredient category of weak or nonskin irritants, the skin irritation potential of the test substances was evaluated through animal testing and the human patch test, and a group of test substances that can be predicted to be weak or nonhuman skin irritants were selected (Table 1).23–25 We referred to published literature and reports of international evaluation organizations to decide whether an ingredient was to be included in the safe ingredient category.
Table 1. Criteria Used to Determine Ingredient Categories
Certain functional groups or chemical structures might be excluded from these categories as explained in the table.
As shown in Table 2, the ingredient category was compared with the results obtained from animal testing, the human patch test, and RIPT. If the data indicated a different outcome in animals, an irritant outcome was given priority. The accuracy of the human patch test and RIPT data was 69.4% (25/36 test substances) compared with that of animal testing data. In total, 7 out of 16 skin irritants in animal tests were negative for class A or class B (Table 2), and the false negative ratio was 43.8% (Table 3). However, the accuracy of the human patch test and RIPT data was 97.3% (36/37) and 84.2% (16/19), respectively, compared with that of class A or class B, and no false negative results were obtained for human skin irritation. Given the differences in skin sensitivity between animals and humans, it is evident that the skin irritation data in animals poorly reflect skin irritation in humans.
Table 3. Predictive Capacity of “Harmless Ingredient” Compared with Animal Data, Human Patch Data, and Human Repeated Insult Patch Test Data
The classes are shown in Figure 1.
Data are given in Table 2.
Predictability of the tests
To expand the database, the JCIA generated new RhE test data for this study. The additional results and classifications for pentylene glycol, ethylhexylglycerin, and propylene glycol were 36.6% ± 11.9% (positive), 12.0% ± 1.1% (positive), and 89.1% ± 3.4% (negative), respectively, under the acceptable conditions of Labcyte EPI-MODEL 24 SIT. The classification is presented in Table 2.
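The positive and negative calls reported above are consistent with the OECD TG439 prediction model, in which a mean tissue viability of 50% or less is read as irritant. The sketch below applies that cutoff to the three reported means. It assumes that the percentages are mean cell viabilities and that the 36.6% value belongs to pentylene glycol, which is inferred from the false positives and false negatives named later in the Results; the function and variable names are illustrative.

```python
# Cutoff from the OECD TG439 prediction model: viability <= 50% is read as irritant.
VIABILITY_CUTOFF_PERCENT = 50.0


def classify_rhe(mean_viability_percent: float) -> str:
    """Classify an RhE result as 'positive' (irritant) or 'negative'."""
    if mean_viability_percent <= VIABILITY_CUTOFF_PERCENT:
        return "positive"
    return "negative"


# Mean values as reported in the text; assumed here to be cell viability (%).
# The substance-to-value assignment is an inference, not stated unambiguously.
reported_means = {
    "pentylene glycol": 36.6,
    "ethylhexylglycerin": 12.0,
    "propylene glycol": 89.1,
}

for substance, viability in reported_means.items():
    print(f"{substance}: {viability:.1f}% -> {classify_rhe(viability)}")
```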
Reliability of the RhE test for predicting skin irritation in the human patch test
Thirty-seven test substances had both RhE and human patch test data (Table 2). The accuracy of the RhE test was 89.2% (33/37), and 3 out of 11 test substances showed a false negative result in the RhE test, with a false negative ratio of 27.3% (Table 4). These three test substances were identified as irritants using the human patch test, although they were negative in the RhE test. These test substances were chemicals No. 7 (propylene glycol), No. 13 (lactic acid), and No. 32 (benzyl alcohol) (Table 2). In contrast, among the 26 nonirritants in the human patch test, only 1 false positive, chemical No. 17 (lauryl alcohol), was found in the RhE test (false positive ratio: 3.8%). The RhE test is known to yield false results 5 ; its sensitivity is ∼85% for chemical substances 26 and 44% for pesticide preparations. 5 Therefore, a negative result in the RhE test may not always be correct. However, these results may be more acceptable than those of animal testing. Thus, a test substance should ideally be assessed using the human patch test, and the test should be carefully designed and performed under the supervision of a physician, duly considering the possibility of side effects. A test substance may be a skin irritant if it is positive in the RhE test. In this case, human skin irritation tests should not be conducted, and other options, such as applying a diluted solution of the test substance in the RhE test, should be considered to identify a weak or nonskin irritant.
Table 4. Predictive Capacity of Reconstructed Human Epidermis Test Compared with Animal Data, Human Patch Data, and Human Repeated Insult Patch Test Data
Data are given in Table 2.
Reliability of the RhE test for predicting skin irritation in animal testing
Out of 42 test substances, 36 included data from the RhE test and animal testing (Table 2). The accuracy of the RhE test was 80.6% (29/36 test substances), compared with that of the irritation score obtained from animal testing. However, 7 out of 16 skin irritants in animals were negative for the RhE test (Table 2), and the false negative ratio was high (43.8%), regardless of the absence of false positives (Table 4). Therefore, even if the RhE test applied for the test substance assesses it as negative, it would not be useful as an alternative to animal testing because it is not aimed at predicting the primary skin irritation caused by the test substances applied for 24 hours.
Reliability of the RhE test for predicting skin irritation in a human cumulative skin irritation test
A total of 19 test substances had data from both the RhE test and RIPT as the human cumulative skin irritation test. The accuracy of the RhE test data was 89.5% (17/19) compared with that of RIPT, and two test substances were misclassified by the RhE test; there were no false negatives among the two test substances that showed skin irritation in RIPT. However, the number of irritants in this case study was insufficient, and additional data are necessary to prove that the RhE test is suitable for predicting skin irritation in the human cumulative skin irritation test. Two false positives were found in the RhE test, namely chemicals No. 8 (pentylene glycol) and No. 10 (ethylhexylglycerin), and the false positive ratio was 11.8% (2/17) (Table 4). Therefore, if a test substance is negative in the RhE test, it should be further assessed using the human cumulative skin irritation test. The test should be carefully designed and performed under the supervision of a physician, considering the possibility of side effects. A test substance may be a skin irritant if it is positive in the RhE test, and other options, such as applying diluted solutions of the test substance in the RhE test, should be considered to identify a weak or nonskin irritant.
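The accuracy, false negative ratio, and false positive ratio quoted in the three comparisons above follow from ordinary 2×2 contingency counts. The sketch below recomputes them as a consistency check. The counts are reconstructed from the totals stated in the text (for example, 11 patch-test irritants with 3 false negatives gives 8 true positives) rather than read from Table 4, and the `Counts` class and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 counts with the reference test (human or animal) as ground truth."""
    tp: int  # irritant in reference test, positive in RhE
    fn: int  # irritant in reference test, negative in RhE
    fp: int  # nonirritant in reference test, positive in RhE
    tn: int  # nonirritant in reference test, negative in RhE

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    @property
    def accuracy(self) -> float:
        return 100.0 * (self.tp + self.tn) / self.total

    @property
    def false_negative_ratio(self) -> float:
        # Share of reference irritants that the RhE test missed.
        return 100.0 * self.fn / (self.tp + self.fn)

    @property
    def false_positive_ratio(self) -> float:
        # Share of reference nonirritants that the RhE test flagged.
        return 100.0 * self.fp / (self.fp + self.tn)


# Counts reconstructed from the totals given in the text (an assumption of this sketch).
comparisons = {
    "RhE vs human patch test": Counts(tp=8, fn=3, fp=1, tn=25),   # 37 substances
    "RhE vs animal testing": Counts(tp=9, fn=7, fp=0, tn=20),     # 36 substances
    "RhE vs RIPT": Counts(tp=2, fn=0, fp=2, tn=15),               # 19 substances
}

for label, c in comparisons.items():
    print(f"{label}: n={c.total}, accuracy={c.accuracy:.1f}%, "
          f"FN ratio={c.false_negative_ratio:.1f}%, "
          f"FP ratio={c.false_positive_ratio:.1f}%")
```

With only two RIPT irritants in the data set, the zero false negative ratio for that comparison rests on a very small denominator, which is the limitation the text itself raises.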
A step-by-step approach for skin irritation testing
This approach applies to test substances that may be determined to be harmless based on their physicochemical properties (class A) or on an ingredient category derived from existing skin irritation information (class B). Accordingly, we determined the utility of a test substance meeting the criteria for class A or class B, as given in Table 3. If the criteria are met, the test substance is categorized as a “harmless ingredient.” Thereafter, it is evaluated using the RhE test to confirm its safety, because many cosmetic ingredients under class A or class B are mixtures and/or may include impurities. However, this test yielded false negative results compared with the human skin irritation tests, as given in Table 4; therefore, further assessment is necessary to confirm whether the test substance is a weak or nonskin irritant, and the test substance is evaluated using the human patch test and the human cumulative skin irritation test. Notably, a test substance that is not classified as a “harmless ingredient” and/or yields a positive result in the RhE test should not be applied in human skin irritation tests.
However, the skin irritation data in animals were not reflective of skin irritation in humans. Therefore, we did not use the data from animal testing in the workflow. This workflow is adequate for assessing ingredients that are safe for human health and may help avoid unnecessary animal testing.
Based on this information, the workflow designed in this study is shown in Figure 1. As a first step, the scope of this workflow covers a test substance that meets either of two conditions: (1) the test substance is a registered ingredient that has been employed as a quasi-drug for a sufficient period of time (e.g., difference in usage) or (2) the test substance contains ingredients with enough in vivo skin irritation data for quasi-drug application.
If one of these conditions is met, the test substance is checked against the physicochemical properties or the existing information in Figure 1. If neither condition is met, the test substance is considered to be outside the applicability domain of the proposed approach. To understand the skin irritation hazard of a test substance, it is categorized into class A based on its skin permeability or class B based on its ingredient category. If either of the properties is acceptable, the test substance undergoes further evaluation, and if both are unacceptable, it is considered to be outside the applicability domain of the proposed approach. At the third step, information regarding skin irritation in humans is obtained by searching the existing information for the test substance under “harmless ingredients.” A test substance that could not be classified based on the above steps is considered to be outside the applicability domain of the proposed approach in this case study.
Based on the accepted information, the fourth step involves the RhE test. If the test substance is evaluated as negative in the RhE test, human skin irritation testing is possible as the next step. Test substances that are evaluated as “positive” in the RhE test are considered outside the applicability domain. As a fifth step, a human patch test is performed at several doses, taking the time of use into consideration, to confirm the weak or nonskin irritation dose. Subsequently, human cumulative skin irritation can be evaluated at the sixth step. Thereafter, it is possible to proceed to weak or nonskin irritant assessment of test substances applied at several doses, including a dose–response of the raw material applied in the RhE tests.
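The gating logic of the six steps described above can be summarised as a short decision function. This is a minimal sketch under stated assumptions, not a reproduction of Figure 1: the `Evidence` record and its field names are hypothetical, each step is reduced to a yes/no input, and the outcomes of the human tests in steps 5 and 6 are not modelled because the text does not specify how they branch.

```python
from dataclasses import dataclass
from typing import Optional

OUTSIDE = "outside applicability domain"


@dataclass
class Evidence:
    """Hypothetical yes/no summary of what is known about a test substance."""
    in_scope: bool                       # step 1: registered use or enough in vivo data
    class_a: bool                        # step 2: physicochemical criteria met
    class_b: bool                        # step 2: ingredient category criteria met
    human_info_acceptable: bool          # step 3: existing human information supports it
    rhe_positive: Optional[bool] = None  # step 4: None means the RhE test is not yet run


def next_action(e: Evidence) -> str:
    """Return the next action in the stepwise workflow, or the exit point."""
    if not e.in_scope:
        return OUTSIDE
    if not (e.class_a or e.class_b):
        return OUTSIDE
    if not e.human_info_acceptable:
        return OUTSIDE
    if e.rhe_positive is None:
        return "step 4: run the RhE test"
    if e.rhe_positive:
        # A positive RhE result means human tests should not be performed.
        return OUTSIDE
    return "step 5: human patch test, then step 6: human cumulative skin irritation test"


# Hypothetical cases for illustration.
print(next_action(Evidence(True, True, False, True)))
print(next_action(Evidence(True, False, True, True, rhe_positive=False)))
print(next_action(Evidence(True, False, True, True, rhe_positive=True)))
print(next_action(Evidence(True, False, False, True)))
```

Ordering the checks so that every exit occurs before any human test mirrors the text's requirement that substances outside the "harmless ingredients" scope, or positive in the RhE test, are never applied to human skin.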
Discussion
In this study, the “harmless ingredients” category, designated based on physicochemical properties or existing information, constitutes one of the key steps of the proposed workflow. Even with this categorization, further testing is needed to evaluate the safety of an ingredient. Indeed, a major concern is that many cosmetic ingredients under class A or class B are mixtures and/or may include impurities. For this purpose, the workflow includes the RhE test, which has been validated and approved by the OECD, as a second key step. However, the RhE test adopted by the OECD is used to distinguish test substances of UN GHS Category 2 or higher (hazard identification of skin irritation) when applied to animals for 4 hours. The RhE test adopted by the OECD was not designed to identify Category 3 (mild irritants). Furthermore, several false negatives were obtained with this test method when compared with the 24-hour exposure animal testing, and a few false negatives were observed when compared with the human patch test (Table 4). Therefore, if a test substance yields a negative outcome in the RhE test, human skin irritation tests are still needed to confirm the negative outcome. OECD TG439 can be applied to solids, liquids, semisolids, and waxes, regardless of their solubility in water. However, gases and aerosols have not been evaluated in the validation test, and thus, they cannot be evaluated using this test.5,27 Even if gases, aerosols, and test substances with high volatility are not categorized as “harmless ingredients,” they are considered to be outside of the application scope of the RhE test.
Taking into account the above considerations, the human patch test and cumulative skin irritation testing are another key step of the proposed approach. When conducting human skin irritation tests, clinical evaluation guidelines for quasi-drugs (Notice of Chief of Pharmaceuticals Examination and Management Division, Pharmaceutical and Living Hygiene Bureau, Ministry of Health, Labor and Welfare, Pharmaceuticals and Crude Drugs Trial 0413 No. 1, April 13, 2017) 28 should be followed after sufficiently confirming that no health hazard, other than skin irritation, will occur in humans during the safety evaluation. In addition, the study design needs to take into account all necessary ethical considerations for the subjects, 9 and the test plan must be reviewed by an ethics review board.
When conducting RIPT or ROAT, it is necessary before the study to (1) ensure that the test substance does not pose a skin sensitization hazard and (2) obtain approval from the ethics review committee of the institution. 29 When conducting ROAT, it is preferable to define the application and duration of exposure based on the actual use of the test substance. For the evaluation of skin irritation, the test conditions should be defined under the supervision of a dermatologist or an equivalent evaluation expert to confirm the assessment criteria. Records should contain photographs, and tests should be performed appropriately. For training on skin reaction determination, we refer to the “Patch Test Reaction Atlas.”30,31
The number and range of the tested substances were not sufficient to assess the predictability of the human skin irritation tests. Therefore, in this workflow, the scope of application was limited to “harmless ingredients.” This study was limited to the evaluation of quasi-drug additives and cosmetic ingredients corresponding to “harmless ingredients” because the database of the main quasi-drug ingredients in cosmetic products was insufficient for human skin irritation tests.
Even if the test substance is considered to be outside of the applicability domain of quasi-drug main ingredients, other options may be considered. For example, if the test substance is positive for the RhE test under “harmless ingredients,” additional information, such as data from animal testing or other in vitro tests, is needed. Furthermore, the human skin irritation tests should be carefully designed based on the application information of cosmetic products, including exposure-led risk assessment, 32 and should be conducted under the supervision of a physician, considering the possibility of side effects. Consumers must be aware of the dose of all the ingredients in cosmetic products that are predicted to cause weak or nonskin irritation.
Conclusions
In this study, we developed a step-by-step workflow to evaluate the applicability of skin irritation tests for quasi-drugs and cosmetic products without animal testing. The workflow was limited to “harmless ingredients,” whose component characteristics were based on physicochemical properties or whose ingredient category was based on existing information from animal testing and the human patch test results. We concluded that the RhE test and the human skin irritation tests can be used to evaluate test substances that cause weak or nonskin irritation that have been categorized as “harmless ingredients,” thereby avoiding animal testing.
Footnotes
Acknowledgments
We appreciate the support from the International Cooperation on Cosmetics Regulation (ICCR) Joint Working Group (JWG). We also thank Atsushi Satou, Kaori Sakurada, Hideki Nishiura, Megumi Sakuma, and Takeshi Kanamori of JCIA for supporting the data collection/analysis and generating new RhE data. We thank Editage for English language editing.
Disclaimer
The views expressed in this article are those of the authors and do not necessarily reflect the official views of the Pharmaceuticals and Medical Devices Agency (PMDA).
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study was supported by a grant-in-aid from the Japan Agency for Medical Research and Development (AMED) under Grant Number JP20mk0101131 and the Ministry of Health, Labor, and Welfare (MHLW), Japan.
References
