Abstract
Freshwater and marine environments are exposed to small concentrations of many different chemicals produced by industrial, agricultural, pharmaceutical, cosmetic, food, and household applications. Because of concerns about potential adverse effects from these exposures, regulatory agencies around the world have established aquatic toxicology testing protocols that measure untoward responses in a wide variety of freshwater and marine organisms. Following a literature review of databases on the toxicity of chemicals to fish, the embryonic zebrafish (Danio rerio) database compiled by the Tanguay Laboratory at Oregon State University was determined to be well suited for quantitative structure–activity relationship (QSAR) analysis. This database possesses a number of favorable characteristics: large size (1060 unique US Environmental Protection Agency ToxCast phase 1 and 2 chemical compounds), relatively recent data collected using state-of-the-art methods, 18 simultaneously measured toxicological end points, transparent embryos that develop externally and thereby facilitate toxicological evaluation, and expression of the vast majority of the genetic code during early life stages. The molecular parameters calculated for each of the chemicals in the database are the logarithm of the octanol–water partition coefficient, molar volume, and molar refractivity. The availability of these molecular parameter values for each chemical can facilitate future QSAR studies using any of the 18 measured toxicological end points as the biological activity of interest.
Introduction
Thousands of chemicals are used in industrial, agricultural, pharmaceutical, cosmetic, food, and household applications. 1 Small amounts of these chemicals find their way into both freshwater and marine environments. 2 These unintended aquatic exposures generate significant regulatory concern from the US Environmental Protection Agency (US EPA); state environmental agencies 3 ; the European Chemicals Agency (ECHA); and equivalent agencies in Canada, 4 Korea, Japan, 5 China, Australia, and other countries.
The extremely large number of potential aquatic toxicants makes testing every possible contaminant expensive and time-consuming. Rather than experimentally testing every chemical for aquatic toxicity in a number of different species, quantitative structure–activity relationship (QSAR) methods make it possible to develop computational models from structurally related subsets of chemicals and to predict the activity of other structurally similar chemicals (termed congeners). 6 In the present study, we calculated three important molecular parameters for each of the 1060 chemical compounds in the zebrafish database 7 : the calculated logarithm of the octanol–water partition coefficient (ClogP), 8 McGowan molar volume (MgVol), 9,10 and calculated molar refractivity (CMR). 11 These parameters represent hydrophobic, electronic, and steric effects of a chemical on its biological activity and are widely used in developing QSAR models that relate the biological activity of chemicals to their physical and chemical characteristics. 6
The assay protocols used to collect toxicity data suitable for QSAR analysis should be the same or quite similar for each chemical tested. The first large fish toxicity database was developed from the study of the “toxicity of 4,346 chemicals to larval lampreys and fishes.” 12 Although a monumental achievement for its time, this database was published in 1957, and the methodologies employed and the format of data reporting do not represent the current state of the art. Similarly, the US Fish and Wildlife Service conducted 1587 acute toxicity tests on 271 chemicals against 28 species of fish and 30 species of invertebrates from 1965 to 1978. 13 Again, the test protocols employed do not meet the present technological standard. Inquiries to two of the largest contract aquatic toxicology testing laboratories indicated that although such facilities have tested hundreds to thousands of different chemicals in vertebrate and invertebrate aquatic species under the Organization for Economic Cooperation and Development (OECD) guidelines, these data are not publicly available in a database format. A literature search on aquatic toxicology identified a large number of studies reporting chemical data sets of varying size and quality, of which the embryonic zebrafish (Danio rerio) database compiled by the Tanguay Laboratory at Oregon State University was found to be particularly well suited for QSAR analysis. 7
This database possesses a number of favorable characteristics: large size (1060 unique US EPA ToxCast phase 1 and 2 compounds), relatively recent data collected using state-of-the-art methods, 18 simultaneously measured toxicological end points, transparent embryos that develop externally and thereby facilitate toxicological evaluation, and expression of the vast majority of the genetic code during early life stages. 7 The 18 toxicological end points described in the zebrafish database are effects on the following: mortality, developmental delay, spontaneous movement, notochord, yolk sac edema, body axis, eye defect, snout, jaw, otic vesicle, pericardial edema, brain, somite, pectoral fin, caudal fin, pigment, circulation, and truncated body. 14 As mortality represents the ultimate toxicological end point, the lowest effective level (LEL, in micromoles) inducing death of zebrafish is noted for each chemical in the database. The chemical diversity of the 1060 chemicals is also notable in that it includes additives (such as plastics, pesticides, and insecticides), intermediates, reactants, reagents, solvents, surfactants, antioxidants, biocides, dyes, industrial agents, pharmaceuticals, and plastics. Broader chemical classes comprising the 1060 unique US EPA ToxCast phase 1 and 2 compounds are given in Table 1, and details of the type of additive and class for each of these chemicals are listed in Supplementary Table 1.
Table 1. Broader chemical classes comprising the 1060 unique US EPA ToxCast phase 1 and 2 compounds.
US EPA: US Environmental Protection Agency.
Methods
Calculation of molecular parameters
We used Bio-Loom (version 1.6; Biobyte Corp., Claremont, CA) 15 to compute three parameters from the simplified molecular input line entry system (SMILES) representation of each chemical compound: ClogP, CMR, and MgVol. The utility of Bio-Loom for comparative QSAR (C-QSAR) analysis and the parameters used in this study are discussed in detail in Hansch and Leo. 6 In brief, ClogP is the calculated logarithm of the octanol/water partition coefficient and is a measure of the hydrophobicity (or lipophilicity) of a chemical. 6,8 MgVol is the molar volume calculated by the method of Abraham and McGowan, 9,10,16 and CMR is the calculated molar refractivity (MR) for the whole molecule.
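As an illustration of the Abraham and McGowan approach mentioned above, the following minimal sketch computes a characteristic molar volume from an elemental composition. It assumes the commonly tabulated atomic contributions and the 6.56 cm³/mol per-bond correction, with the bond count taken as atoms − 1 + rings for a connected molecule. The function name and the element subset are illustrative; how Bio-Loom implements MgVol internally is not described here, so this should not be read as the program's actual code.

```python
# Sketch of a McGowan characteristic volume calculation (units: cm^3/mol / 100).
# Atomic contributions are the commonly tabulated Abraham-McGowan values;
# only a small subset of elements is included here (assumption: enough for
# illustration, not a complete parameter table).
ATOM_VOLUMES = {
    "C": 16.35, "H": 8.71, "N": 14.39, "O": 12.43,
    "F": 10.48, "Cl": 20.95, "Br": 26.21, "S": 22.91,
}
BOND_CORRECTION = 6.56  # subtracted once per bond, regardless of bond order


def mcgowan_volume(atom_counts, n_rings=0):
    """Return the McGowan volume for a connected molecule.

    atom_counts: dict mapping element symbol -> number of atoms (H included).
    n_rings: number of rings, needed to count bonds as atoms - 1 + rings.
    """
    n_atoms = sum(atom_counts.values())
    n_bonds = n_atoms - 1 + n_rings
    atom_sum = sum(ATOM_VOLUMES[el] * n for el, n in atom_counts.items())
    return (atom_sum - BOND_CORRECTION * n_bonds) / 100.0


if __name__ == "__main__":
    benzene = mcgowan_volume({"C": 6, "H": 6}, n_rings=1)
    ethanol = mcgowan_volume({"C": 2, "H": 6, "O": 1})
    print(f"benzene: {benzene:.4f}, ethanol: {ethanol:.4f}")
```

Because every bond is counted once whatever its order, only the molecular formula and the ring count are needed, which is why the descriptor is inexpensive to compute for a large library.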
MR is calculated as follows:
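Molar refractivity is conventionally defined by the Lorentz–Lorenz relation. Assuming that this standard definition is the one intended here, it can be written as follows, where the symbols n (refractive index), MW (molecular weight), and d (density) are the usual ones and are not taken from the present text:

```latex
\[
  \mathrm{MR} = \frac{n^{2} - 1}{n^{2} + 2} \cdot \frac{\mathrm{MW}}{d}
\]
```

Since the refractive index term varies little across organic compounds, MR is dominated by MW/d, that is, by molar volume, with a smaller contribution from polarizability.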
Note that the ClogP values are for the neutral form of acids and bases that may be partially ionized. If the degree of ionization is about the same for a set of congeners, the ionization factor can be neglected; otherwise, good correlation can be obtained using electronic terms. 8,17 For the 13,815 chemicals in the CLOG program, which is a part of Bio-Loom, 15 the correlation between experimental LogP and ClogP values is 0.98: Experimental LogP = 1.00 ClogP − 0.03 (n = 13,815, r = 0.98, s = 0.35). Many programs are available for calculating octanol–water partition coefficients and are reviewed in Mannhold et al. 18 We used the ClogP parameter in this study because it has been widely used and cited by the QSAR community, both for environmental studies and for drug design. 19 –29 The very high correlation (r = 0.98) between experimental LogP and ClogP gives confidence in using ClogP values whenever experimental LogP values are not available.
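The regression quoted above can be applied directly to estimate an experimental LogP from a ClogP value. The sketch below uses only the reported slope, intercept, and standard deviation; the function names are illustrative, and treating ±2s as a rough uncertainty band is an assumption made here for illustration, not a claim from the cited work.

```python
# Coefficients reported in the text: Experimental LogP = 1.00 ClogP - 0.03
# (n = 13,815, r = 0.98, s = 0.35).
SLOPE = 1.00
INTERCEPT = -0.03
STD_DEV = 0.35


def estimate_logp(clogp):
    """Estimate experimental LogP from ClogP using the reported regression."""
    return SLOPE * clogp + INTERCEPT


def rough_band(clogp, k=2.0):
    """Return (low, high) as estimate +/- k*s.

    Assumption: s is used as a simple spread around the regression line;
    this is an illustrative band, not a formal prediction interval.
    """
    centre = estimate_logp(clogp)
    return centre - k * STD_DEV, centre + k * STD_DEV


if __name__ == "__main__":
    low, high = rough_band(2.0)
    print(f"estimate: {estimate_logp(2.0):.2f}, band: {low:.2f} to {high:.2f}")
```

With a slope of 1.00 and an intercept close to zero, the calculated and experimental values are nearly interchangeable on average, which is the practical point of the reported correlation.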
Results
Of the 1060 compounds tested, only 449 were found to cause mortality in zebrafish. The mortality toxicological end point, abbreviated as MORT, spanned several orders of magnitude. To facilitate visual comparison, MORT was transformed to its negative base-10 logarithm: pMORT = −log10(MORT). Larger values of pMORT represent higher toxicological potency. The number of compounds falling within each pMORT order of magnitude is shown in Figure 1. The numbers of compounds in each pMORT log10 range were as follows: (−2, −1] 136; (−1, 0] 72; (0, 1] 64; (1, 2] 87; (2, 3] 88; (3, 4] 1. This analysis shows that 208 out of 449 compounds have a pMORT value ≤0 and may not be very toxic.
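The pMORT transformation and the half-open ranges used above can be sketched in a few lines. The function names are illustrative and not part of the original analysis; the bin counts are copied from the text, and the concentrations passed to the functions in the usage lines are hypothetical.

```python
import math


def pmort(mort):
    """Return pMORT = -log10(MORT) for a positive mortality concentration."""
    if mort <= 0:
        raise ValueError("MORT must be positive")
    return -math.log10(mort)


def pmort_bin(p):
    """Return the half-open range (lower, upper] that contains p."""
    upper = math.ceil(p)  # a value exactly on a boundary falls in the bin below it
    return upper - 1, upper


# Counts per pMORT range as reported in the text.
REPORTED_COUNTS = {
    (-2, -1): 136, (-1, 0): 72, (0, 1): 64,
    (1, 2): 87, (2, 3): 88, (3, 4): 1,
}

# Compounds whose range lies entirely at or below pMORT = 0.
AT_OR_BELOW_ZERO = sum(n for (_, hi), n in REPORTED_COUNTS.items() if hi <= 0)

if __name__ == "__main__":
    print(pmort_bin(pmort(10.0)))   # hypothetical MORT value of 10
    print(pmort_bin(pmort(0.05)))   # hypothetical MORT value of 0.05
    print(AT_OR_BELOW_ZERO)
```

The tally over the two lowest ranges reproduces the 208 compounds cited as having low potency.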

Figure 1. Number of compounds causing mortality in zebrafish classified by their mortality (pMORT) logarithmic (10-fold) concentration ranges.
The ClogP values of the 449 compounds tested span an extremely large range, from −6.49 to 10.25. A higher value of ClogP indicates that the compound is more lipophilic, that is, more soluble in octanol and less soluble in water. Of the 449 compounds causing mortality in zebrafish, the ClogP values also span a large range, as displayed in Figure 2. The ClogP range from 0 to 4 is notable in that 254 (56.6%) of the 449 compounds causing zebrafish mortality fall within it, and 388 (86.4%) of the 449 compounds fall within the ClogP range from 0 to 7. Conversely, only 44 (9.8%) of the 449 have ClogP values below 0, that is, are more soluble in water than in octanol. Figure 3 shows pMORT versus ClogP for each of the 449 compounds that caused mortality in zebrafish. For each value of pMORT, there is a wide range of ClogP values, and no particular pattern is visible.

Figure 2. Number of compounds with experimentally observed zebrafish mortality for a range of ClogP.

Figure 3. ClogP values for compounds with experimentally observed zebrafish mortality (pMORT = −log10[mortality concentration]).
For the 1060 compounds, CMR ranged from 0.58 to 38.4 and MgVol ranged from 0.2 to 10.6. Figure 4 shows pMORT versus CMR, and Figure 5 shows pMORT versus MgVol. As MgVol is the molar volume and CMR is largely a measure of volume with a small correction for polarizability, it is not surprising that Figures 4 and 5 are visually similar. Note that the extreme values in Figures 4 and 5 are represented by the same compound, tannic acid (CAS Number 1401-55-4). This compound is also listed in Supplementary Table 1.

Figure 4. CMR values for compounds with experimentally observed zebrafish mortality (pMORT = −log10[mortality concentration]).

Figure 5. MgVol values for compounds with experimentally observed zebrafish mortality (pMORT = −log10[mortality concentration]).
Figure 6 shows CMR versus ClogP versus pMORT values for compounds with experimentally observed zebrafish mortality. Figure 7 shows MgVol versus ClogP versus pMORT values for compounds with experimentally observed zebrafish mortality. Given the similarity of the MgVol and CMR values, Figures 6 and 7 appear similar as expected. Figures 6 and 7 display the wide range of ClogP and pMORT values and narrower range of molecular volumes as described by CMR and MgVol. It can be noted that the spread of ClogP versus CMR versus MgVol for each mortality group is almost the same.

Figure 6. CMR versus ClogP versus pMORT values for compounds with experimentally observed zebrafish mortality.

Figure 7. MgVol versus ClogP versus pMORT values for compounds with experimentally observed zebrafish mortality.
Discussion
In 2013, the ECHA incorporated the embryonic zebrafish test into its aquatic toxicology testing scheme and issued the following statement: “Based on current knowledge, ECHA considers that OECD TG 236 30 might be used within a weight of evidence approach together with other independent, adequate, relevant and reliable sources of information leading to the conclusion that the substance has or does not have a particular dangerous property.” 31
While ECHA and US EPA have considered the embryonic zebrafish as an aquatic toxicology model, some researchers have attempted to model human disease in zebrafish. 33 Although zebrafish display many anatomical similarities to mammals, it should be noted that these vertebrates diverged on the evolutionary tree approximately 445 million years ago. 34 Given the long period since this divergence, it is not surprising that there are significant differences between zebrafish and mammals, including the following: dependency on external heat (ectothermy); lack of cardiac septa, synovial joints, cancellous (spongy or trabecular) bone, arms and legs, and lungs; 35 –37 and possession of a chorionic membrane that might provide a diffusion barrier to particular chemicals. 38 –41 Despite these differences, a small study on 18 toxic compounds reported that toxicity in zebrafish correlated positively with toxicity in rodents. 42 A number of other studies have examined concordance between zebrafish and mammalian embryo toxicity, reporting concordance of 87% (31 compounds), 43 64% (14 compounds), 44 89% (85 compounds), 45 100% (6 compounds), 46 and 55% (271 compounds). 47 Current efforts by ECHA, US EPA, and the US National Toxicology Program (NTP) should assist in developing a comparative understanding of the relative sensitivity of embryonic zebrafish and other species to chemicals.
In this study, we analyzed the US EPA phase 1 and 2 ToxCast chemical library consisting of 1060 compounds. 7 This data set is amenable to QSAR analysis as it includes a diverse group of chemicals, including biocides, pharmaceutical drugs, and industrial byproducts (see Table 1 and Supplementary Table 1 for the complete list). We obtained the zebrafish mortality end point of the ToxCast compounds from Truong et al. 7 Of the 1060 compounds, 449 were observed to have an in vivo toxicological effect, that is, mortality of zebrafish. Therefore, we used these 449 compounds for our studies.
QSAR analysis is particularly useful in situations where a large number of chemicals have been evaluated in a given biological experiment using the same test protocol. 6 In this database, 7 1060 different compounds were tested against the same organism (zebrafish) to measure their effect on 18 different toxicological end points (including mortality). Although there are many exceptions, lipophilic chemicals tend to be more toxic on average than hydrophilic chemicals as the body uses phase I and phase II metabolism to convert lipophilic chemicals to water-soluble conjugates capable of being excreted by the kidneys. 48 The lipophilic tendency of the ClogP values for the 449 compounds displaying mortality in zebrafish is evident in Figure 3.
Future work
QSAR analysis is best conducted within families of structurally similar chemicals, that is, congeners. 11 Future QSAR studies using the database given in Supplementary Table 1 will extract the members of a congeneric series based on the presence of a similar chemical backbone and use the molecular parameters (ClogP, CMR, and MgVol) calculated in the present study to develop models that correlate the biological activity in question with particular chemical structural elements of the congeners.
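As a minimal sketch of the kind of model such a study might start from, the code below fits a one-descriptor linear relationship (for example, pMORT against ClogP) by ordinary least squares. This is an illustrative stand-in rather than the authors' planned method, the function name is hypothetical, and the descriptor and activity values in the usage lines are invented placeholders, not data from the zebrafish database.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("descriptor and activity must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Hypothetical congener series: placeholder ClogP and pMORT values only.
    clogp = [1.0, 2.0, 3.0, 4.0, 5.0]
    pmort_values = [-0.6, 0.1, 0.7, 1.2, 2.0]
    slope, intercept, r = fit_line(clogp, pmort_values)
    print(f"pMORT = {slope:.2f} ClogP + {intercept:.2f} (n = {len(clogp)}, r = {r:.2f})")
```

In practice, a congeneric series would usually call for additional terms such as CMR or MgVol and for validation statistics, which a single-descriptor fit like this one does not provide.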
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
