Abstract
Human saliva contains relatively abundant proteins that are related ancestrally in sequence to the cystatin superfamily. Most, although not all, members of this superfamily are potent inhibitors of cysteine peptidases. Four related genes have been identified, CST1, 2, 4, and 5, encoding cystatins SN, SA, S, and D, respectively. CST1, 4, and probably CST5 are now known to be expressed in a limited number of other tissues in the body, primarily in exocrine epithelia, so the term SD-type cystatin is more appropriate than ‘salivary cystatin’. These genes are co-ordinately regulated in the submandibular gland during post-natal development. The organization of these tissue-specifically expressed genes in the genome, and their phylogeny, indicate that they evolved from an ancestral housekeeping gene encoding the ubiquitously expressed cystatin C, and that they are members of a larger protein family. Their relationship to rat cystatin S, a developmentally regulated rodent submandibular gland protein, remains to be established. In this review, the evolution of the SD-type cystatins within the cystatin superfamily, their genomics, expression, and structure-function relationships are examined and compared with known cystatin functions, with the goal of providing clues to their biological roles.
(I) Introduction
The cystatin superfamily comprises a large group of ancestrally related proteins, most of which are potent inhibitors of cysteine peptidases (CPs). Human saliva contains relatively abundant proteins that are related ancestrally in sequence to the cystatin superfamily. These proteins are now commonly referred to as ‘salivary cystatins’, although this is something of a misnomer (see below). It has been several years since the publication of the last comprehensive reviews that focused on the salivary cystatins (Bobek and Levine, 1992; Henskens et al., 1996c). Since then, significant advances have been made regarding the gene organization, protein structure, and properties of the human and rat proteins. A major purpose of this article is therefore to review this recent work. What is the function of the ‘salivary cystatins’? Despite a great deal of research effort, we still do not have a definitive answer to this question. A major impediment is the lack of a disease that can be associated with a defect in a specific gene: Function must be inferred from circumstantial evidence. A central premise of this review is that clues to ‘salivary cystatin’ function are contained within their phylogenetic history, and that by looking at the established or likely functions of their ‘siblings’ and ‘cousins’, one can assess the likelihood that these functions persist in the ‘salivary cystatins’. A comprehensive review of the large cystatin superfamily is beyond the scope of a single article. This review will outline the main branches of the superfamily and the common elements of their structure and function, but will pay closest attention to the nearest relatives of the ‘salivary cystatins’.
(1) Background
Scission of peptide bonds is an essential reaction in living cells, and a considerable number of peptidases catalyze this reaction with various degrees of specificity. CPs use a cysteine residue in the active site as the nucleophile in the reaction (see Dickinson, 2002, and references therein). Release of proteolytic enzymes outside their normal compartment has the potential to cause serious degradation and pathology, an effect exploited by pathogenic organisms from a wide range of phyla. Control of proteolytic enzyme activity is therefore essential. Building upon a very early report of a bovine trypsin inhibitor in chicken egg white (see Barrett et al., 1986, for a review of this earlier literature), a search for egg white inhibitors of plant proteinases (the CPs ficin and papain) identified a relatively small (12.7 kDa) protein that was a potent inhibitor (Ki = 10 nM). The term cystatin was coined for this protein, which was shown also to inhibit mammalian CPs of the papain superfamily, including cathepsins B, C, H, and L. Cystatins form 1:1 reversible complexes with CPs, in competition with the substrate, but in some cases the binding is so tight as to be physiologically irreversible.
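Because cystatins compete with substrate for the active site, their effect on an enzyme assay can be sketched with the textbook rate equation for classical competitive inhibition. The Python below is a minimal illustration under stated assumptions: free inhibitor is taken to equal total inhibitor (enzyme concentration far below Ki), and the substrate conditions ([S] = Km = 1000 nM) are hypothetical. Only Ki = 10 nM comes from the text, and the function name `residual_activity` is illustrative.

```python
def residual_activity(s, km, i, ki):
    """Fraction of the uninhibited rate remaining under classical competitive inhibition.

    v_i / v_0 = (Km + [S]) / (Km * (1 + [I]/Ki) + [S]); all concentrations in the same units.
    Assumes free [I] equals total [I], i.e. enzyme concentration is negligible.
    """
    v0 = s / (km + s)                  # relative rate without inhibitor (Vmax = 1)
    vi = s / (km * (1 + i / ki) + s)   # relative rate with competitive inhibitor
    return vi / v0


# Hypothetical assay: substrate at its Km; Ki = 10 nM as quoted for egg white cystatin.
for inhibitor_nm in (0, 10, 100, 1000):
    frac = residual_activity(s=1000.0, km=1000.0, i=inhibitor_nm, ki=10.0)
    print(f"[I] = {inhibitor_nm:>4} nM -> {frac:.3f} of uninhibited activity")
```

Under these assumptions residual activity falls to about 0.667, 0.167 and 0.020 at 10, 100 and 1000 nM inhibitor. When the enzyme concentration approaches Ki, as it can with very tight-binding cystatins, the free-inhibitor approximation breaks down and a tight-binding (Morrison) treatment is needed instead.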
(2) The cystatin superfamily
Examination of mammalian tissues and serum revealed several CP inhibitors ranging in size from 11 kDa to 175 kDa. Sequence analysis of the smaller inhibitors from tissues showed that they were related to each other, and to egg white cystatin (reviewed in Barrett et al., 1986). One of these smaller proteins was identical to human γ-trace, a 13,260-Da basic protein first described as a microprotein constituent of normal cerebrospinal fluid, and of urine from patients with renal failure. The protein was named cystatin C. Thus, historically, the cystatins have been defined as proteins with a particular sequence (and hence structural) motif that bind tightly, but reversibly, to CPs, forming an enzymatically inactive complex. Proteins that share a certain level of similarity (e.g., 50%) along with other traits, such as similarity in function or expression pattern, are considered to belong to the same family, and related families are grouped into a superfamily. Underlying this grouping is the concept of descent from a common ancestor, and genes or proteins that are related through common ancestry are described as homologous. Orthologs are formed by speciation, paralogs by duplication.
Alignment of cystatin sequences identifies three regions conserved during more than one billion years of evolution: a glycine residue in the N-terminal region (G11 in human cystatin C numbering), a QXVXG motif in one hairpin loop, and a PW motif in a second (see Figs. 1 and 2). These regions form a surface on the cystatin molecule that can dock with the substrate-binding site of family C1 (papain-like) enzymes (see below). Four main cystatin families have been distinguished, but over the past several years the superfamily, as defined by the criterion of sequence similarity, has been greatly expanded.
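The two loop motifs can be located in a sequence by simple pattern matching. This minimal Python sketch scans for QXVXG (X = any residue) and PW. The test peptides are hypothetical, not real cystatin sequences. The conserved G11 is left out because locating it requires an alignment rather than a pattern. Finding a motif does not by itself establish inhibitory activity.

```python
import re

# Patterns for the two conserved hairpin-loop motifs; "." matches any residue (X).
MOTIFS = {
    "QXVXG loop": re.compile(r"Q.V.G"),
    "PW loop": re.compile(r"PW"),
}


def find_motifs(seq):
    """Return {motif name: [1-based start positions]} for each conserved loop motif."""
    return {name: [m.start() + 1 for m in pat.finditer(seq)]
            for name, pat in MOTIFS.items()}


# Hypothetical peptides: the first carries both motifs; the second mimics a family
# member whose first loop lacks the consensus (A in place of the conserved V).
print(find_motifs("AAQLVSGAAPWA"))  # {'QXVXG loop': [3], 'PW loop': [10]}
print(find_motifs("AAQLASGAAPWA"))  # {'QXVXG loop': [], 'PW loop': [10]}
```

Both the SA1 (QIVGG) and SA2 (QIVDG) variants of the first loop would match the QXVXG pattern, because X accepts any residue.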
(a) Type 2 cystatins
(i) Chicken cystatin and cystatin C
Chicken egg white cystatin (hereafter referred to as chicken cystatin) is the prototypical type 2 family member, and human cystatin C is an ortholog. All vertebrates examined (fish, birds, and mammals) have been shown to have a cystatin C-like protein (e.g., Tu et al., 1992; Yamashita and Konagaya, 1996). Type 2 cystatins are typically about 120-125 residues long and contain two disulfide bonds. They are translated with a secretory signal peptide, and so are generally considered to be extracellular; however, there is increasing evidence for intracellular functions (see below). In this review, cystatin residues are numbered according to the human cystatin C sequence, in which a conserved N-terminal glycine (see below) is designated residue 11 (Fig. 1).
The three-dimensional structure of chicken cystatin, as determined by x-ray crystallographic analysis (Bode et al., 1988), is shown in Fig. 2. The structure in solution has also been determined by NMR (Dieckmann et al., 1993). The main feature of the structure is a five-stranded β-sheet wrapped around a five-turn α-helix. The N-terminal 10 residues are disordered and flexible. The conserved G11 lies at the N-terminus of the first short β-strand, A, which is connected to the five-turn α-helix 1. A match to the conserved QXVXG motif (QLVSG) is found in a hairpin loop between β-strands B and C. In the crystal, β-strand C is connected to strand D via a short second α-helix, α-helix 2. In solution, this α-helix is not detected by NMR, and the region forms a loop that lacks secondary structure. This α-helix 2/loop region is anchored by the first disulfide bond. The conserved PW motif is located in a hairpin loop between β-strands D and E, which are linked by the second disulfide bond. Together, the three conserved regions form a wedge-shaped edge complementary to the CP active site. The side-chains of N-terminal residues R8, L9, and V10 in human cystatin C interact with the S4, S3, and S2 substrate-binding subpockets, respectively, of the target enzyme, and G11 occupies the S1 subsite. However, the G11-A12 peptide bond lies in an inappropriate conformation, and too far from the reactive-site cysteine, to be cleaved. Both hairpin loops make major binding interactions with residues in the vicinity of the reactive-site cysteine. The three regions interact with the target enzyme fairly independently (Hall et al., 1993, 1995), and cystatins can bind to papain with little, if any, conformational change in either protein (Bode et al., 1988). Cystatin binding to cathepsin B is more complex, involving a two-step mechanism, because this CP has a loop that occludes the active site. An initial weak interaction, most likely involving the N-terminal region, is followed by a conformational change in which the inhibitor displaces the occluding loop, allowing tight binding to occur (Pavlova et al., 2000). Cystatins vary considerably in their ability to displace the loop.
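A two-step process of this kind is commonly written with the generic scheme below, in which a weak initial complex EI isomerizes to a tight complex EI*. This is the standard textbook formulation, offered here as an assumption. The rate constants are symbolic and are not values reported by Pavlova et al. (2000).

```latex
\begin{aligned}
&E + I \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} EI
       \underset{k_{-2}}{\overset{k_{2}}{\rightleftharpoons}} EI^{*} \\
&K_i = \frac{k_{-1}}{k_{1}}, \qquad
 K_i^{*} = \frac{[E][I]}{[EI] + [EI^{*}]} = \frac{K_i}{1 + k_{2}/k_{-2}}
\end{aligned}
```

Because [EI*] = (k2/k−2)[EI] at equilibrium, the overall constant Ki* is smaller than the initial Ki by the factor 1 + k2/k−2. An inhibitor that displaces the occluding loop efficiently (large k2 relative to k−2) therefore binds far more tightly than its initial encounter complex alone would suggest.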
(ii) ‘Salivary cystatins’
(a) Human
Early studies aimed at characterizing human salivary proteins by purification, followed by peptide sequencing, established that whole saliva contains several closely related proteins with homology to cystatin C and with activity as cysteine peptidase inhibitors (see Bobek and Levine, 1992; Henskens et al., 1996c, for reviews of this earlier literature). Three proteins with similar sequences (and related isoforms; see below) were identified, now named cystatins S, SA, and SN. These three proteins, and the more distantly related cystatin D, have been cDNA-cloned (Al-Hashimi et al., 1988; Freije et al., 1993a; Bobek et al., 1991). The cDNA sequences are about 740 bp long. Comparison of the encoded pre-proproteins with the purified salivary proteins reveals a typical secretory signal peptide of 20 residues, followed by an 11-residue extension N-terminal to the conserved G11 (Fig. 3). At the protein level, cystatins S, SA, and SN share about 88% identity (i.e., they are very similar), whereas cystatin D has 60% or less similarity to them. The structures of cystatins S, SA, and SN have been modeled on the structure of chicken cystatin (Bell et al., 1997) (see Fig. 2). Overall, the predicted structures of cystatins SA and SN were more similar to that of chicken cystatin than was the predicted structure of cystatin S, consistent with the different (generally poorer) inhibitory properties of cystatin S (see below). For reasons explained below, cystatins S, SA, and SN will be referred to as the S-like cystatins, and together with cystatin D the four human proteins will be referred to as the SD-type cystatins.
(b) Rodent
Rat cystatin S was initially identified as a protein called LM (large, mobile) that was conspicuously induced in saliva following isoproterenol (IPR) treatment (see below), and was shown to be a cystatin following cDNA cloning and sequencing (Shaw et al., 1988). It shows only 40-50% similarity to the S-like cystatins (Fig. 1; see below).
(c) Other salivary cystatins
Surprisingly, there are no published reports of murine salivary cystatins. Snake venom glands are modified salivary glands, and a subfamily of type 2 cystatins has been isolated from snake venom. These proteins have a six-residue insertion (compared with cystatin C) in the α-helix 2/loop (Fig. 1). A cystatin from snake venom was shown to be a good inhibitor of papain and cathepsins B, L, and S (Brillard-Bourdet et al., 1998).
(iii) Other type 2 cystatins
A novel human cystatin was independently identified by expressed sequence tag (EST) sequencing of human amniotic and fetal skin epithelial cell cDNA libraries (cystatin E; Ni et al., 1997), and as a gene down-regulated in human breast metastatic tumor cells as compared with primary tumors (cystatin M; Sotiropoulou et al., 1997). Cystatin E/M is a secreted protein with relatively low similarity to other human type 2 cystatins (26-34% amino acid identity) and a five-residue insertion in the α-helix 2/loop (Fig. 1). Cystatin F, also called leukocystatin and CMAP, was initially identified independently by EST screening of human dendritic cells (Halfon et al., 1998) and of human cDNA libraries (Ni et al., 1998), and as a metastasis-associated gene found by differential display in murine carcinoma cells with a high rate of metastasis to the liver (Morita et al., 1999). The human and mouse cDNA sequences predict a secreted protein with a large 17-residue N-terminal extension before the conserved G residue (Fig. 3). Cystatin F is also unusual in possessing a third disulfide bond that anchors the N-terminus to the body of the protein, and a positively charged residue in the normally hydrophobic QXVXG motif.
Although numerous type 2 cystatins have been described from vertebrates, they are not confined to this subphylum. A cystatin has been isolated from the hemocytes of an invertebrate, the horseshoe crab Tachypleus tridentatus (reviewed in Iwanaga et al., 1998). Tachypleus cystatin is a fairly standard type 2 protein: It is secreted, and has the two appropriately positioned disulfide bonds (Fig. 1). The protein is a potent CP inhibitor.
(iv) Generation of diversity in mammalian type 2 cystatins
At the level of the gene, diversity is generated by the existence of multigene families (see below) encoding the different proteins introduced above, and by polymorphisms affecting the coding sequence. Only a limited number of polymorphisms in human type 2 cystatins have been identified. Far more diversity in the secreted proteins is generated by several post-translational modifications.
(a) Polymorphisms
There are two common haplotypes of CST3, designated A and B, that differ at three sites. Two base-pair differences are located in the promoter region, and one is in the signal peptide domain, where it causes an A→T substitution. However, the secreted protein produced by either haplotype is the same. A mutation with the substitution L68Q has been shown to cause the rare autosomal-dominant disease hereditary cerebral hemorrhage with amyloidosis, Icelandic type (see below). A limited number of polymorphisms affecting the coding sequence of human SD-type cystatins have been found. Two alleles at the CST2 locus (encoding cystatin SA) are known (Shintani et al., 1994; Saitoh et al., 1998; Haga and Minaguchi, 1999). CST2*1 and CST2*2 differ by two point mutations: a G→A transition in exon 2, and an A→T transversion in exon 3. These produce substitutions of G59→D59 and E120→D120 in the corresponding SA1 and SA2 proteins. The first substitution is in the QXVXG motif (SA1 QIVGG→SA2 QIVDG). Recombinant SA1 and SA2 differ in their inhibitory activities (Saitoh et al., 1998). A T/C transition in exon 1 of the CST5 gene produces alleles encoding C25 or R25 (cystatin C numbering) with comparable frequencies (0.55 and 0.45, respectively) in the population (Balbin et al., 1993). Both forms have similar Ki values toward CPs. This would be expected, since the variation is in α-helix 1, on the side of the molecule opposite the inhibitory surface (Balbin et al., 1994).
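A short Python sketch can show how transitions differ from transversions and how a single base change alters the encoded residue. The codons used are hypothetical, because the text gives only the base and amino acid changes. `GGC`, `GAA` and `TGC` are assumed examples chosen to reproduce the stated G→D, E→D and C/R substitutions, and the codon table covers only those few codons.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Partial codon table: only the codons used in the examples below (not the full code).
CODONS = {"GGC": "Gly", "GAC": "Asp", "GAA": "Glu", "GAT": "Asp",
          "TGC": "Cys", "CGC": "Arg"}


def classify(ref, alt):
    """'transition' if both bases are purines or both pyrimidines, else 'transversion'."""
    pair = {ref, alt}
    return "transition" if pair <= PURINES or pair <= PYRIMIDINES else "transversion"


def mutate(codon, pos, alt):
    """Return the codon with base `alt` at 0-based position `pos`."""
    return codon[:pos] + alt + codon[pos + 1:]


# (assumed codon, 0-based position, new base)
examples = [("GGC", 1, "A"),   # G->A, cf. G59->D59 in cystatin SA
            ("GAA", 2, "T"),   # A->T, cf. E120->D120 in cystatin SA
            ("TGC", 0, "C")]   # T/C, cf. C25/R25 in cystatin D
for codon, pos, alt in examples:
    new = mutate(codon, pos, alt)
    print(codon, "->", new, classify(codon[pos], alt), CODONS[codon], "->", CODONS[new])
# GGC -> GAC transition Gly -> Asp
# GAA -> GAT transversion Glu -> Asp
# TGC -> CGC transition Cys -> Arg
```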
(b) Glycosylation
Type 2 cystatins are generally described as non-glycosylated. However, there are numerous exceptions: For example, while human cystatin C lacks N-glycosylation sites, about 20% of rat cystatin C is N-glycosylated at a consensus NLT site in the α-helix 2/loop region (Esnard et al., 1990). Cystatin F has two functional N-glycosylation sites. In cystatin E/M, a functional N-glycosylation site is located at N108, adjacent to the conserved PW motif, and about 30-40% of the protein released by cultured mammary cells is glycosylated (Ni et al., 1997; Sotiropoulou et al., 1997). The possible effect of glycosylation on CP binding is unknown.
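Candidate N-glycosylation sites, such as the NLT site in rat cystatin C, follow the N-X-S/T consensus, where X is any residue except proline. The sketch below is a minimal Python scan run on a hypothetical peptide. A sequon is necessary but not sufficient for glycosylation. The partial (about 20%) occupancy in rat cystatin C shows that a match predicts only a possible site, not its use.

```python
import re

# N-X-S/T sequon with X != proline; the lookahead lets overlapping sites be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequons(seq):
    """List (1-based position of N, sequon) for every candidate N-glycosylation site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


print(sequons("GSNLTKNPSANAT"))  # [(3, 'NLT'), (11, 'NAT')]  (NPS is excluded)
```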
(c) Phosphorylation
In chicken cystatin, but not human cystatin C, a phosphorylated site (S82) is present in the α-helix 2/loop. Human cystatins S and SA are both partially phosphorylated (Isemura et al., 1991; Lamkin et al., 1991; Ramasubbu et al., 1991; Shintani et al., 1994), whereas cystatin SN is not. Various forms of cystatin S have been purified from saliva, with S(N-terminal), S2, S98, S111, and S114 identified as phosphorylation sites. S2 and S98 conform to a Golgi kinase site [SXE/S(PO4)], and phosphorylation of S2 would make S(N-terminal) a consensus site. A consensus site is also present at S98 in cystatin SN. S111 conforms to a casein kinase 2 site [S/TXXD/E/S(PO4)]. The effects of dephosphorylation on inhibition have not been examined in detail. Two forms of cystatin S, likely resulting from phosphorylation of a fraction of the protein at the N-terminus, were found in human nasal and broncho-alveolar lavage fluid (Lindahl et al., 1999). Interestingly, the relative proportions of the two forms differed between the secretions, and the level of the presumptive phosphorylated form was decreased 10-fold in broncho-alveolar lavage fluid from smokers. The physiological significance of this finding is unknown.
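The two kinase consensus patterns quoted above can be written as regular expressions. In this sketch, writing phosphoserine as a lowercase `s` is a convention assumed only for illustration, and so is the peptide `SKSKE`. The example shows how phosphorylating one serine can create a consensus site for another, which is the hierarchical effect described for the N-terminal serine of cystatin S.

```python
import re

# Assumed convention (illustration only): lowercase "s" marks a phosphoserine.
GOLGI_KINASE = re.compile(r"(?=(S.[Es]))")   # S-X-E/pS
CK2 = re.compile(r"(?=([ST]..[DEs]))")       # S/T-X-X-D/E/pS


def sites(seq, pattern):
    """1-based positions of the acceptor S/T for every (overlapping) motif match."""
    return [m.start() + 1 for m in pattern.finditer(seq)]


toy = "SKSKE"                          # hypothetical peptide
print(sites(toy, GOLGI_KINASE))        # [3]  (S3-K-E)
phospho = toy[:2] + "s" + toy[3:]      # phosphorylate S3 -> "SKsKE"
print(sites(phospho, GOLGI_KINASE))    # [1]  (S1-K-pS is now a consensus site)
print(sites("TAAD", CK2))              # [1]
```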
(d) N-terminal processing
The predicted N-terminal sequences of SD-type cystatins extend an additional 11 residues beyond the conserved glycine (Fig. 3). However, isolated proteins are frequently shorter, and N-terminal processing is a common feature of type 2 cystatins that has been observed, for example, in chicken, rat, and human cystatin C (Esnard et al., 1990; Popovic et al., 1990; Turk and Bode, 1991), human salivary cystatins (reviewed in Saitoh and Isemura, 1993; Baron et al., 1999a), and rat cystatin S (Nishiura et al., 1991). N-terminal processing differentially affects the inhibitory properties of cystatins (see below). Processing of cystatin C also affects its interaction with neutrophils (see below).
(b) Type 1 cystatins (the stefins)
The type 1 cystatins (also known as stefins) are CP inhibitors from vertebrates. Humans have two related proteins, often called stefins A and B. They are about 100 residues in length, and differ from type 2 cystatins in several respects: They lack disulfide bonds and the α-helix 2/loop, there is a kink in α-helix 1, and they have a nine-residue C-terminal extension. The type 1 cystatin genes have different intron positions from those of the type 2 cystatins, and they do not encode a secretory signal peptide. Thus, the type 1 cystatins are generally considered to be cytoplasmic proteins, although it does appear that they might be secreted, or at least released, under certain circumstances (reviewed in Turk and Bode, 1991).
(c) Type 3 cystatins
This family consists of the kininogens (reviewed in Turk and Bode, 1991). There are three types: high- and low-molecular-weight kininogen, and T-kininogen (also known as major acute-phase protein). They consist of three tandemly arranged type 2 cystatin domains, followed by a kinin fragment, but differ in their C-termini. They all have the ability to inhibit CPs.
(d) Type 4 cystatins
The fetuins are a small family of abundant mammalian fetal serum and bone glycoproteins (reviewed in Brown et al., 1992; Brown and Dziegielewska, 1997). Fetuin was the name used for the protein from animals in the order Artiodactyla, while α2-Heremans Schmid glycoprotein (α2-HS glycoprotein, α2-HS) and histidine-rich glycoprotein (HRG; also called histidine-proline-rich glycoprotein) referred to two related human proteins. The fetuins are N- and O-glycosylated and phosphorylated. The N-terminal region consists of two tandem type 2 cystatin domains, followed by a C-terminal region comprising a histidine-rich domain between two proline-rich domains. The N- and C-terminal regions are linked by a disulfide bond. α2-HS has a structure similar to that of HRG, except that it lacks the histidine-rich tandem repeat. Unlike the kininogen cystatin domains, those of the fetuins lack detectable CP inhibitor activity. Consistent with this, they lack the conserved G, QXVXG, and PW motifs. Orthologs have been found in snake venom, where they act as anti-hemorrhagic factors. Remarkably, although these orthologs lack CP inhibitor activity, like other type 4 cystatins, they are metalloproteinase inhibitors (Valente et al., 2001). This adds a new layer of complexity to the cystatin superfamily.
(e) Unassigned proteins
(i) CRP/CRES/testatin
Human and rodent genomes contain several related genes comprising a small multigene family encoding secreted glycoproteins with sequence similarity to type 2 cystatins. They have been given a variety of names, including testatin (Eriksson et al., 2002), cystatin-related peptide (CRP; Aumuller et al., 1995), cystatin-related epididymal spermatogenic protein (CRES; Cornwall et al., 1999), and cystatin T (Shoemaker et al., 2000). However, although they possess the conserved PW motif, they lack the conserved N-terminal glycine and the consensus QXVXG motif generally required for inhibition of CPs (see below). Thus far, the function of these proteins is unknown.
(ii) ’Atypical’ cystatins from other phyla
Subsequent to the discovery of the mammalian cystatins, small (ca. 100-residue) CP inhibitors were discovered in rice and were called oryzacystatins (reviewed in Arai et al., 1996). Similar proteins have since been identified in a large number of plant species, and are collectively termed phytocystatins. Some are secreted. Although clearly homologous to vertebrate cystatins, they are structurally distinct: Like the type 1 cystatins, they lack the α-helix 2/loop and both disulfide bonds (Figs. 1 and 2). However, unlike the type 1 cystatins, the phytocystatins generally have the typical C-terminal PW motif, and they lack the C-terminal extension (Margis et al., 1998). The three-dimensional structure of oryzacystatin I is closer to that of chicken cystatin than to that of stefin A (Nagata et al., 2000). A variety of cystatins have been identified, primarily by cloning, in invertebrates such as insects (e.g., Drosophila melanogaster; Delbridge and Kelly, 1990) and nematodes (e.g., Caenorhabditis elegans; see Fig. 1). Although they possess the typical conserved motifs involved in CP inhibition, they do not fit the family classification described above. For example, the Drosophila cystatin, like the phytocystatins, lacks the α-helix 2/loop and associated disulfide bond, but it does have the C-terminal disulfide bond; conversely, many nematode cystatins have the α-helix 2/loop and associated disulfide bond, but lack the C-terminal bond (Figs. 1 and 2; Dickinson, unpublished observations; see Maizels et al., 2001, for other examples).
(iii) Other proteins
A novel cystatin-related protein has been isolated from bovine cortical bone and cloned (Hu et al., 1995). It is a 24-kDa secreted phosphoprotein with a 107-residue N-terminal region that shows closest similarity to the cystatin domains of kininogens, followed by a short phosphorylated serine-rich domain. However, the N-terminal region also showed significant matches to members of the cathelicidin family, a class of mammalian myeloid cell antimicrobial peptides (reviewed in Zanetti et al., 1997). Cathelicidins are characterized by a conserved N-terminal proregion with closest similarity to the kininogen cystatin domains, followed by a highly divergent, cationic, antimicrobial C-terminal region that is released by proteolysis. Common features of antimicrobial peptides are a high content of basic residues, which facilitates binding to the negatively charged surface of the target cell, and a tendency to adopt an amphipathic conformation that mediates membrane disruption. At least some cathelicidins are CP inhibitors, albeit rather poor ones. Thus, the cathelicidins appear to be another branch of the cystatin superfamily. The CP cathepsin F has been cloned, and its zymogen has been shown to have a large N-terminal extension (Nagler et al., 1999; reviewed in Dickinson, 2002). The first segment of this extension was predicted to fold into a cystatin-like structure with two disulfide bonds. The predicted structure would lack the α-helix 2/loop, which would be replaced by a smaller loop, and would also lack the consensus QXVXG and PW motifs. The properties of this region remain to be characterized.
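The cationic character of such peptides can be gauged crudely by counting charged residues. This Python sketch assigns +1 per Lys or Arg and −1 per Asp or Glu. It ignores His, the termini and pKa shifts, so it is only a rough approximation, and the peptides are hypothetical.

```python
def crude_net_charge(seq):
    """Approximate net charge near neutral pH: +1 per K or R, -1 per D or E.

    Ignores His, the free termini, local pKa shifts and phosphate groups.
    """
    seq = seq.upper()
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return basic - acidic


print(crude_net_charge("GRKLLRKW"))  # 4   (hypothetical cationic peptide)
print(crude_net_charge("DEKA"))      # -1
```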
Based on sequence comparisons, the MHC class II invariant chain protein Ii p41 isoform, a CP inhibitor, was at one time thought to be a member of the cystatin superfamily (reviewed in Brown and Dziegielewska, 1997). The structure is now known to be quite different, and Ii represents an example of convergent evolution (reviewed in Riese and Chapman, 2000). The lipocalins are a large and diverse family of proteins found in fluids such as saliva and tears, and are thought to be involved in binding lipophilic molecules. Their structure is completely different from that of cystatins, but a lipocalin from human von Ebner’s gland was shown to be a CP inhibitor, apparently another example of convergent evolution (van’t Hof et al., 1997). The serpins, normally considered to be serine proteinase inhibitors, can also inhibit certain CPs, a property that may reflect similarities in the catalytic mechanisms and active sites of the two enzyme classes.
(3) The place of salivary cystatins in the cystatin superfamily
It is reasonable to assume that the function(s) of the SD-type cystatins are the result of selection. While cases of complete shifts in the function of proteins are known (e.g., the recruitment of proteins as lens proteins), a more common trend in multigene families is specialization by modification of an existing function (e.g., the globins). Thus, clues to the function of SD-type cystatins may be present in the phylogeny of cystatins.
A statistically significant level of similarity between two or more proteins implies common ancestry, and, in principle, similarity can be used to organize homologous genes or proteins into a phylogenetic tree that describes their evolutionary relationships. Various hierarchical terms (superfamily, family, sub-family, group) are used to describe related groups. The choice of similarity level required to group proteins into a family (or any other level) is somewhat discretionary, and also depends upon the proteins under consideration. In general, the term family is the most commonly used, usually to denote a group of proteins that show a clear similarity in sequence to each other (e.g., > 30% residue identity, a BLAST E-value of < 10⁻⁴, etc.) along most of their lengths (or for matching domains of chimeric proteins). Strictly, a family of proteins should be a monophyletic group: The members share a most recent common ancestor that is not also an ancestor of one or more proteins not included in the group. Methods for constructing phylogenetic trees rely on various assumptions, and no single method is without weaknesses. Phylogenetic analysis of the cystatins is problematic, because the proteins are rather small and the origins of the different families and subfamilies are ancient, so there has been extensive divergence. Further, different branches appear to have evolved at different rates.
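Percent identity, the most common of these criteria, can be computed from a pairwise alignment. The Python sketch below counts identical residues over the columns in which neither sequence has a gap. This is only one of several conventions (others divide by the alignment length or by the length of the shorter sequence), so values from different tools are not always comparable. The aligned fragments are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical aligned fragments: 5 identical residues in 7 gap-free columns.
print(f"{percent_identity('QLVSG-PW', 'QIVGGAPW'):.1f}%")  # 71.4%
```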
Over the years, several evolutionary studies and phylogenetic trees of the cystatin superfamily have been reported (e.g., Rawlings and Barrett, 1990; Saitoh and Isemura, 1993; Brown and Dziegielewska, 1997; Brillard-Bourdet et al., 1998; Ni et al., 1998; Cornwall et al., 1999; Margis et al., 1998), although many did not include a statistical analysis of significance for phylogenetic trees. Several schemes have been proposed for the evolution of the different families (reviewed in Brown and Dziegielewska, 1997). A plausible model for the evolution of type 2 cystatins is as follows: Plants and animals diverged from a common ancestor that possessed a cystatin, perhaps around 1.6 billion years ago (BYa). Modern phytocystatins potentially represent this ancestral form of all types of cystatins. The α-helix 2/loop and first disulfide bond were acquired prior to the divergence of the nematodes (about 1.2 BYa). Since a type 2 cystatin is present in the horseshoe crab, the second disulfide bond and the general features of the type 2 cystatins must have evolved prior to the divergence of protostomes and deuterostomes about 1 BYa. Thus, regardless of the particular dates of these events (there is considerable argument), the type 2 cystatins are ancient proteins that have diversified further in the mammalian lineage. This model would suggest that the insect and type 1 cystatins are secondarily derived.
A typical phylogenetic tree for selected vertebrate type 2 cystatins is shown in Fig. 4. Reliability of the branches was assessed by bootstrapping, which is very conservative. The human S-like cystatins group with 100% confidence in this analysis, although their order of evolution was not estimated reliably. Importantly, cystatin D forms a monophyletic clade with the S-like cystatins with quite good confidence (84%). Consistent with this, cystatins D and SA have essentially identical expression patterns. Thus, there is no reason to place cystatin D and the S-like cystatins in different subfamilies, and it is proposed that they be collectively referred to as the SD-type cystatins. Consistent with several other analyses, this tree shows the SD-type cystatins evolving from a common cystatin C-like ancestor. Analyses with more sequences place the time of this divergence around the time of the mammalian radiation (about 100 million years ago [MYa]), but do not reliably determine whether it occurred before the start of the radiation (leading to SD-type cystatins in all species) or after it (in the limit, confining the SD-type cystatins to the primate lineage) (Margis et al., 1998; Dickinson, unpublished observations). In this tree, the rat cystatin S branch is not positioned with confidence, but it does not group with the SD-type cystatins, implying an independent origin. However, analyses with additional sequences can group it with the SD-type cystatins (Dickinson, unpublished observations). Thus, at this time it is not clear whether rat cystatin S is a highly divergent ortholog of the human proteins or a case of independent evolution. Similarly, the position of the snake venom cystatins cannot yet be estimated with any confidence.
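Bootstrap support is estimated by resampling alignment columns with replacement, rebuilding the grouping from each pseudo-alignment, and counting how often a clade recurs. The Python sketch below applies this idea to the simplest possible clade, the closest pair under uncorrected p-distance. It is not the tree-building method used for Fig. 4, and the four-sequence alignment is hypothetical.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two equal-length, gap-free sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Alphabetically ordered names of the least-distant pair (ties: first pair wins)."""
    pairs = combinations(sorted(alignment), 2)
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of column-resampled replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    names = sorted(alignment)
    n_sites = len(alignment[names[0]])
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        # Draw n_sites column indices with replacement to build a pseudo-alignment.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate = {n: "".join(alignment[n][c] for c in cols) for n in names}
        if closest_pair(replicate) == target:
            hits += 1
    return hits / replicates


# Hypothetical, gap-free toy alignment (not real cystatin sequences).
toy = {
    "seqA": "QLVSGAPWKDEL",
    "seqB": "QLVSGAPWKEEL",
    "seqC": "QIVGGNPWREQL",
    "seqD": "EIVGGNAWRQQM",
}
print(bootstrap_support(toy, ("seqA", "seqB"), replicates=500))
```

Real analyses resample columns in the same way but rebuild a full tree for each replicate (for example by neighbor joining or parsimony) and tally every clade, not just one pair.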
(II) Papain-like CP Inhibitory Activities of Human Type 2 Cystatins
Determination of the equilibrium dissociation constant (Kd = koff/kon) provides a measure of the affinity of an inhibitor for an enzyme. In most cases, cystatins bind so tightly that Kd cannot be measured directly. It is therefore determined either as the ratio of the separately measured rate constants, or as the inhibition constant Ki, obtained by measuring the substrate-dependent apparent inhibition constant Ki,app and then correcting mathematically for the presence of the competing substrate (see Bieth, 1995, and references therein). Reported values differ depending on which constant is measured, and on the experimental and mathematical methods used. To be effective, an inhibitor must be able to bind a high proportion of the target peptidase. This is determined by the ratio [inhibitor]/Ki, which must be > 10 for good efficiency. Complex dissociation is a first-order process, so the half-life of a complex depends only on koff (t½ = ln 2/koff), and the inhibitor concentration can be ignored. Thus, a high Ki arising from a high koff may mean that the enzyme can be released in a short time to interact with competing substrate, which would preclude a physiological role for the inhibitor against that enzyme. For a tight-binding inhibitor, pseudo-irreversible inhibition will occur in vivo at inhibitor concentrations > 10³ Ki and > 10 [enzyme].
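The relationships in this paragraph can be made concrete with a few lines of Python. The rate constants and concentrations below are hypothetical round numbers chosen for illustration, not measured values for any cystatin. The Ki,app correction assumes classical competitive inhibition, Ki = Ki,app/(1 + [S]/Km). The fraction-bound expression assumes that the inhibitor is in large excess over the enzyme and that no substrate is present. All function names are illustrative.

```python
import math


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant k_off / k_on (M, for k_on in M^-1 s^-1, k_off in s^-1)."""
    return k_off / k_on


def complex_half_life(k_off):
    """Half-life (s) of the enzyme-inhibitor complex; first-order, so independent of [I]."""
    return math.log(2) / k_off


def ki_from_apparent(ki_app, s, km):
    """Remove the effect of competing substrate: Ki = Ki,app / (1 + [S]/Km)."""
    return ki_app / (1 + s / km)


def fraction_bound(i, ki):
    """Fraction of enzyme complexed when [I] >> [E] and no substrate: [I] / ([I] + Ki)."""
    return i / (i + ki)


def pseudo_irreversible(i, e, ki):
    """Rule of thumb from the text: [I] > 10^3 Ki and [I] > 10 [E]."""
    return i > 1e3 * ki and i > 10 * e


kd = dissociation_constant(k_on=1e7, k_off=1e-3)                   # hypothetical constants
print(f"Kd = {kd:.1e} M, t1/2 = {complex_half_life(1e-3) / 60:.1f} min")  # 1.0e-10 M, 11.6 min
print(f"Ki = {ki_from_apparent(2e-9, s=50e-6, km=50e-6):.1e} M")    # 1.0e-09 M ([S] = Km halves Ki,app)
print(f"bound at [I]/Ki = 10: {fraction_bound(10.0, 1.0):.2f}")     # 0.91
print(pseudo_irreversible(i=1e-6, e=1e-8, ki=1e-10))                # True
```

At the suggested minimum ratio [I]/Ki = 10, about 91% of the enzyme is complexed, which is why that ratio is used as a threshold for effective inhibition.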
Inhibition constants and measured concentrations of human cystatins in various fluids are summarized in the Table. In general, the SD-type cystatins are poorer inhibitors of papain-like CPs than cystatin C. Because of their high concentrations in saliva and tears, S-like cystatins have the potential to inhibit many enzymes, although in many cases the inhibition will not be pseudo-irreversible. Salivary cystatin levels vary considerably within the periodontally healthy population (e.g., Aguirre et al., 1992; Henskens et al., 1994; Baron et al., 1999c). Although this has not been rigorously examined, cystatin and total protein levels in any single individual seem to be fairly constant, at least over a period of a few weeks (Rudney et al., 1993; Henskens et al., 1994). Since different assay techniques (e.g., immunoassay, papain inhibition, mRNA levels) give the same large population variance, this variance appears to reflect true differences in glandular production of cystatins. It could be due to variation in the total protein synthesis activity of the glands, or to a cystatin-specific variation. Either might be subject to genetic control, or to long-term physiological modulation in response to factors such as diet or oral disease. Total salivary protein does have a genetic component (Rudney et al., 1994). It also increases with gingivitis and periodontal disease (Henskens et al., 1993; Rudney et al., 1994). In principle, much of the increase in salivary protein levels with disease could be due to serum transudate. However, the lack of correlation with myeloperoxidase levels, together with an increase in parotid saliva protein, suggests a substantial contribution from a glandular source (Rudney et al., 1994; Henskens et al., 1996a).
A considerable range in S-type cystatin mRNA levels was found among submandibular glands (SMGs) from three individuals ( Dickinson et al., 2002), and levels of cystatin protein and activity have been reported to change in response to oral health status, although these studies are not entirely consistent (see below). There is some discordance among the reported levels of cystatin mRNAs, protein concentrations, and cystatin activity ( Henskens et al., 1996a, c; Dickinson et al., 2002), suggesting a contribution of post-translational and post-secretion events (such as N-terminal processing) to overall inhibitory capacity. Also, for parotid saliva, the contribution of cystatin C to total activity (against papain) may predominate and be subject to change (see below). Once the functions of SD-type cystatins are established, it will be of great interest to relate population variance in activity to oral and overall health status.
(III) Structure-Function Relationships in Type 2 Cystatins
The importance of the three evolutionarily conserved cystatin regions in CP inhibition has been confirmed by enzymatic removal of N-terminal domains and by expression of recombinant mutant proteins. Actinidin and papain are related plant CPs, yet the affinity of chicken cystatin for papain is 10,000-fold higher than for actinidin. Similarly, the S-like cystatins are 90% similar, yet cystatin S is a significantly poorer inhibitor than cystatin SA or SN. Collectively, differences in the relative contributions of the three conserved regions, and of specific residues within these regions, to the free energy change for complex formation can account in large part for the dissimilarities in Ki values between different cystatins and enzymes ( Auerswald et al., 1995). Thus, for chicken cystatin with papain and cathepsin B, the N-terminal region and the QXVXG loop are more important than the PW loop; with cathepsin L, the N-terminal region appears to dominate. As outlined below, the differences between cystatin C and the SD-type cystatins in inhibitory activities (Table) are paralleled by differences in the contributions of the conserved domains to binding.
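Because ΔG = RT ln Ki for a 1:1 complex (1 M standard state), a fold change in Ki maps onto a free-energy difference, ΔΔG = RT ln(Ki,mutant/Ki,wild-type). The short Python sketch below makes that conversion and expresses a region's contribution as a fraction of the total binding free energy. It assumes a temperature of 25 °C and additive, independent contributions from each region, which is a simplification; the function names and the 1 pM Ki are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)

def binding_free_energy(ki_molar, temp_k=298.15):
    """Standard free energy of complex formation (kJ/mol), dG = RT ln Ki; negative = favourable."""
    return R_KJ * temp_k * math.log(ki_molar)

def ddg_from_fold_change(fold_increase_in_ki, temp_k=298.15):
    """Binding free energy lost (kJ/mol) when a mutation or truncation raises Ki by this factor."""
    return R_KJ * temp_k * math.log(fold_increase_in_ki)

def fraction_of_total(fold_increase_in_ki, ki_wild_type):
    """Share of total binding free energy attributed to the altered region (assumes additivity)."""
    return ddg_from_fold_change(fold_increase_in_ki) / -binding_free_energy(ki_wild_type)

if __name__ == "__main__":
    print(f"10,000-fold change in Ki: {ddg_from_fold_change(1e4):.1f} kJ/mol")  # ~22.8
    print(f"share of total for a hypothetical Ki of 1 pM: {fraction_of_total(1e4, 1e-12):.2f}")
```

For example, under these assumptions the 300- to 900-fold loss of affinity reported for the W106G substitution in cystatin C corresponds to roughly 14-17 kJ/mol at 25 °C.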
(i) N-terminal domain
Docking models based on the three-dimensional structures of chicken cystatin and papain suggested that one function of the conserved G11 residue could be to give the N-terminal region the flexibility to adopt a conformation that provides a maximal binding contribution ( Bode et al., 1988). Consistent with this model, mutation of cystatin C G11 increases the Ki, and the size of the effect depends upon the substitution and the target enzyme ( Hall et al., 1993). Further, removal of the N-terminal decapeptide from these variants has a relatively small effect on their Ki values, unlike its effect on wild-type cystatin C, where the Ki is greatly increased. Neutrophil elastase rapidly cleaves the N-terminal region of cystatin C between V10 and G11 (the conserved glycine) to generate a form lacking 10 residues ( Abrahamson et al., 1991). The activity of the modified cystatin C is reduced, but not uniformly: its affinity for cathepsins B and L is more than three orders of magnitude lower, while for cathepsin H it is only five-fold lower (although this appears to underestimate the importance of this region for this enzyme [ Hall et al., 1995]). Therefore, the N-terminal region of cystatin C likely binds in the substrate-binding pockets of cathepsins B and L, but makes little contribution to cystatin C binding to cathepsin H. To examine the contributions of individual N-terminal side-chains to binding, investigators have used site-directed mutagenesis to replace residues 8-10 with glycine or other amino acids, either singly or in combination ( Hall et al., 1995; Mason et al., 1998). Residue 10 was found to make the main contribution to binding affinity for cathepsins B, H, L, and S. Most V10 substitutions tested decreased affinity (i.e., increased the Ki); for example, V10G increased the Ki by 2-3 orders of magnitude, depending on the target enzyme.
Some substitutions increased affinity for a particular enzyme (e.g., V10W increased the affinity for cathepsin L 10-fold but decreased the affinity for cathepsin S about two-fold). R8 and L9 were also found to make smaller, enzyme-dependent contributions to binding affinity. Thus, residues 8, 9, and 10 are involved in binding specificity, and cathepsin-specific cystatins can be produced from the broadly inhibitory cystatin C by the selection of appropriate residues at these positions.
Human S-type cystatins can be similarly truncated at the N-terminus (Fig. 3; see above). In general, truncated forms, whether isolated from saliva, produced by cleavage with enzymes (e.g., gingipain R), or made by recombinant techniques, show at most modest differences in inhibitory activity toward papain (and ficin) compared with the full-length proteins ( Bobek et al., 1994; Blankenvoorde et al., 1996; Saitoh et al., 1998; Baron et al., 1999a). Mutation of the conserved G11 (G11A-G12A) in cystatin SN also had no significant effect on papain inhibition ( Tseng et al., 2000). Three forms of rat cystatin S, designated RSC-1, -2, and -3, have been isolated from rat saliva; they differ in their extent of N-terminal processing, with RSC-1 being truncated to G11 ( Nishiura et al., 1991). Although RSC-1 is a poorer inhibitor of papain and ficin than RSC-3, which has an additional three residues, the differences were not large (< 50-fold). In contrast to papain inhibition, a cystatin SAT protein isolated from saliva with a six-residue truncation was found to be a 1000-fold poorer inhibitor of cathepsin L than cystatin SA ( Baron et al., 1999a). N-terminally truncated forms of cystatin D have also been examined ( Hall et al., 1998). A form lacking all residues up to and including G11 had essentially no activity against cathepsins H, L, or S (Ki > 1 μM), indicating that this region contributes 2-4 orders of magnitude to the Ki value, depending on the enzyme. Exchanging the N-terminal regions of human cystatins C and D resulted in inhibitors with moderately altered affinities for these three cathepsins. Collectively, these studies suggest that the N-terminal extensions of the SD-type and rat cystatins make only a modest contribution to papain binding, but are important for the specificity and strength of binding to cathepsins.
(ii) QXVXG domain
The binding affinity of human cystatin C variants lacking the N-terminal region and carrying a mutation of the highly conserved W106 (see below) indicates that the QXVXG region contributes 40-60% of the total free energy of binding to actinidin, papain, and cathepsins B and H ( Bjork et al., 1996). Mutagenesis of the QXVXG motif in chicken cystatin demonstrated that increases in the Ki were primarily the result of an effect on koff ( Auerswald et al., 1995). However, the effects of different mutations were not consistent across enzymes: e.g., alteration of the QXVXG loop had only a relatively modest effect on the Ki for cathepsin L or papain, but gave a > 1000-fold increase in the Ki for cathepsin B. This region appears to be particularly important for the inhibition of cathepsin S by human cystatin C, but less so for cathepsins B, H, and L ( Hall et al., 1995).
Alteration of the QTVGG loop of cystatin SN by deletion or substitution drastically reduces the inhibitory activity toward papain ( Bobek et al., 1994; Hiltke et al., 1999; Tseng et al., 2000).
This region is also essential for activity in rat cystatin S. Replacement of the QVVAG loop with LVL resulted in a protein with minimal activity toward papain ( Bedi et al., 1998). The allelic variants of the QXVXG motif in cystatin SA (plus a co-variant in residue 120; see above) differ in their inhibitory activities ( Saitoh et al., 1998). While SA1 (QIVGG) is a potent inhibitor of plant CPs and a good inhibitor of cathepsin K, SA2 (QIVDG) is a poorer one, especially for papain.
(iii) PW domain
As for the N-terminal region, the contribution of W106 to binding affinity (mainly by affecting koff) depends on the target enzyme. For cathepsin B, it makes a significant, although not dominant, contribution, but is less important for cathepsin S or cathepsin L, where the QXVXG and N-terminal regions, respectively, make the main contributions ( Auerswald et al., 1995; Hall et al., 1995). Substitution of W106 with G in human cystatin C reduced the affinity for papain, actinidin, and cathepsins B and H by 300- to 900-fold ( Bjork et al., 1996). Mutation of the PW motif in cystatin SN (P105G-W106G) had minimal effect on papain inhibition, but decreased the affinity for cathepsin C more than 100-fold ( Tseng et al., 2000). Thus, for papain inhibition by the SD-type cystatins, the QXVXG region has the main effect on binding, while for cathepsin CPs all three conserved regions contribute.
(iv) Other domains
Remarkably, type 2 cystatins can inhibit mammalian legumain, a CP that belongs to a family (family C13) distinctly different from the papain-like CPs (family C1). Legumains perform protein-processing functions and have been shown to be important in antigen presentation in mammals (reviewed in Dickinson, 2002). Legumains have a strict requirement for Asn at the P1 position of the substrate. The Ki values for human cystatins C, E/M, and F with pig legumain are 0.2 nM, 0.0016 nM, and 10 nM, respectively ( Alvarez-Fernandez et al., 1999). Studies with recombinant cystatin C variants demonstrated that the papain/cathepsin and legumain inhibitory activities are independent, and that N39 is required for legumain inhibition. This residue is located in a loop on the side of the molecule opposite the papain-binding surface (Fig. 2). Although cystatin D has an asparagine in this region (Fig. 1), it was found to be non-inhibitory, perhaps because the immediately adjacent residue carries a positive charge instead of the negative charge that is highly conserved in vertebrate cystatins (Fig. 1). None of the three S-like cystatins has an asparagine in this region, so they would be expected to be inactive against legumain.
Cystatin SN, but not S, SA, or chicken cystatin, was found to form a stable variant complex with papain in which the enzyme remains proteolytically active ( Baron et al., 1999b). This complex apparently does not involve the normal docking of the inhibitor with the active site, since it forms even when the active site is blocked with the irreversible inhibitor E-64. The region involved in this interaction is not known.
(IV) Cystatin Genes in the Mammalian Genome
Diversity in mammalian type 2 cystatins is created by the existence of a multigene family. The human genes for cystatin C (CST3), cystatin S (CST4), cystatin SA (CST2), cystatin SN (CST1), cystatin D (CST5), cystatin E/M (CST6), and cystatin F (CST7) have been cloned ( Saitoh et al., 1987, 1992; Abrahamson et al., 1990; Freije et al., 1991; Dickinson et al., 1993; Thiesse et al., 1994; see below for references for cystatins E/M and F). There was some initial confusion over the numbering of the human type 2 cystatin genes (see Saitoh et al., 1991, 1992; Thiesse et al., 1994). The CST3 gene spans about 4.3 kb and comprises three relatively small exons and two introns. The introns are shorter in the SD-type cystatin genes, with the greatest differences occurring in the first intron, producing genes of about 3.5 kb ( Freije et al., 1991). However, the intron-exon boundaries in these genes are the same. CST7 is unusual in that it has a fourth exon ( Halfon et al., 1998). This exon encodes the predicted leader peptide and an additional 5 N-terminal residues, and is separated from the following 3 exons by a large, ca. 5.1-kb intron. The intron-exon positions in the rest of the gene are typical. Two related pseudogenes (CSTP1, CSTP2) have also been identified ( Saitoh et al., 1987, 1992; Thiesse et al., 1994; Dickinson et al., 2002). Several studies with SD-type cystatin hybridization probes determined that the human genome carries at least 7, but probably no more than 9, genes with sufficient similarity (roughly 60-65% nucleotide identity) to be detected by Southern blotting ( Al-Hashimi et al., 1988; Thiesse et al., 1994). Cystatins E/M and F have much lower levels of similarity and would not be detected. Thus, it is probable that all SD-type genes are now known.
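Percent nucleotide identity figures such as those quoted above depend on how an alignment is scored. As a minimal sketch, assuming the two sequences have already been aligned (for example, exon by exon) and that gapped columns are simply excluded, identity can be computed as follows. The function name and toy sequences are hypothetical, and other conventions (e.g., counting gaps as mismatches) give lower values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned nucleotide sequences of equal length.

    Columns in which either sequence has a gap are skipped (one common
    convention; others count gaps as mismatches and give lower values).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

if __name__ == "__main__":
    # Hypothetical toy alignment, not real cystatin sequence.
    print(percent_identity("ATGGCC-TG", "ATGACCATG"))  # 87.5
```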
CST1-5, CST7, CSTP1, and CSTP2 have all been localized by fluorescence in situ hybridization (FISH) to 20p11.2 ( Abrahamson et al., 1989; Freije et al., 1993b; Dickinson et al., 1994; Thiesse et al., 1994; Morita et al., 2000). Physical mapping by means of pulsed-field gel electrophoresis and gene-specific hybridization probes localized all seven genes to a cluster spanning no more than ca. 365 kb, with CST3 at one end ( Dickinson et al., 1994; Thiesse et al., 1994) (Fig. 5). CSTP2 and CST1 had previously been shown, by cosmid cloning, to be tandemly linked ( Dickinson et al., 1993). At present, the public domain human genome map in the region of Chromosome 20 containing the known SD-type cystatin genes is incomplete. It may be significant that clones containing certain regions of the S-like cystatin genes are highly unstable in Escherichia coli, even in strains designed to stabilize unusual sequences ( Millar et al., 1992; Dickinson, unpublished observations). Efforts to "walk" between genes were also frustrated by dispersed repetitive sequences found in the regions between the cystatin genes ( Dickinson et al., 1993), and PCR screens of three YAC libraries for clones spanning the gene cluster were unsuccessful (Dickinson and Thiesse, unpublished observations). CST3 and CST4 have been physically linked in the genome sequence map, and placed on the telomere side of a ca. 300-kb gap (Fig. 5). A gene localized to the centromere side of this gap is currently designated as ’similar to cystatin SA’. However, comparison of its sequence with those of known genes reveals it to be CSTP1 (data not shown), consistent with the physical mapping described above. Thus, the order and orientation of CST2, CST5, and CSTP2-CST1 within this cluster remain to be established.
However, it is interesting to note that the genes in this cluster localized thus far are all tandemly oriented in a head-to-tail manner, suggesting that the gene cluster has evolved primarily by simple unequal crossover. Cystatin F (CST7) has been placed ca. 1 Mb centromeric to this cluster, and CRES (CST8) and three related genes are located within a ca. 150-kb region telomeric to CST3. This gene organization is conserved in the rat and mouse, and the rat salivary cystatin S gene (designated CST4) has been mapped close to the CST3 ortholog ( Alonso et al., 1997; Cornwall et al., 1999; Shoemaker et al., 2000). Therefore, all of these genes have been closely linked at least during mammalian evolution, and probably for much longer. Only a high-stringency Southern blot of the rat genome probed with rat cystatin S has been published ( Cox and Shaw, 1992). It would be of interest to know whether the rat genome has other CST4-like genes.
Other human cystatin genes map elsewhere. Cystatin E/M (CST6), although a type 2 cystatin, has been mapped to 11q13 by FISH ( Stenman et al., 1997). HRG, α2-HS, and kininogen map to a region of less than 0.5 Mb on chromosome 3q27 ( James et al., 1996), consistent with models for a common evolutionary origin for these three cystatin-domain proteins. Cystatin A also maps to chromosome 3, although at 3q21 it is rather distant from the fetuin gene cluster. Cystatin B maps to 21q22.3. The human secreted phosphoprotein 24 gene (SPP2) has been mapped to 2q37 → qter ( Swallow et al., 1997).
(V) Cystatin Gene Expression
Cystatin gene expression at the mRNA level has been examined in several studies with hybridization probes, at the protein level by immunoassay, and at the activity level by papain inhibition. There are two important caveats to these studies. First, given the degree of homology among the cystatins (particularly the human S-like cystatins), it is not always possible to define precisely which gene is being expressed. Second, when pooled fluid secretions are examined, the tissue source of the protein is not always clear.
(1) Human SD-type cystatins
In a recent comprehensive study, the distribution of human SD-type cystatin gene expression in 23 adult tissues was examined with gene-specific riboprobes in a sensitive RNase protection assay (RPA), and sites of expression were identified by immunohistochemistry ( Dickinson et al., 2002). Three patterns of expression were found: CSTP1 and CSTP2 were confirmed as non-expressed pseudogenes; CST3 was expressed ubiquitously, although levels varied somewhat between tissues; and the SD-type cystatin genes CST1, 2, 4, and 5 were expressed in a differential, tissue-specific manner. Most tissues did not have detectable levels of SD-type cystatin mRNA. Based on their distribution and level of expression, the SD-type cystatins could be divided into two subgroups. CST2 and CST5 were expressed only in the SMG and parotid gland, at levels comparable to those of CST3. CST1 and CST4 were both expressed in these tissues, but their SMG mRNA levels were much higher than those of CST3. Expression was localized to the serous acini and demilunes. CST1 and CST4 were also both expressed at modest levels in the acini of the orbital lobe of the lacrimal gland, and in the epithelial linings of the gall bladder and seminal vesicle. CST4 was found in the proximal convoluted tubules of the kidney and at trace levels in the prostate. In the tracheal sample used for mRNA analysis in this study, very low levels of CST1 were detected ( Dickinson et al., 2002). In another sample, significant S-like cystatin expression was observed in the serous acini and demilunes of the tracheal gland. Northern blot analysis of other samples showed quite different levels between and among samples (Dickinson, unpublished observations).
These results are generally consistent with those of several earlier studies that examined SD-type cystatin expression, primarily by Northern blotting or immunohistochemistry in a more limited number of tissues (e.g., Sabatini et al., 1989; Barka et al., 1991; Bobek et al., 1991; Freije et al., 1991; Takahashi et al., 1992). Salivary cystatins (identified by immunoassay) comprise about 10% of tear protein, and cystatins S and SN were identified in tear fluid ( Barka et al., 1991). Cystatin D has also been reported in tear fluid ( Freije et al., 1993a). Since CST5 mRNA was detected only in the SMG and parotid gland in the tissue survey described above, a likely explanation is that cystatin D is expressed not in the orbital lobe of the lacrimal gland, but in one or more of the other glands that secrete into the eye. Non-purulent bronchial secretions contain up to 30 μg/mL (2.1 μM) S-type cystatins, and up to 6 μg/mL (0.4 μM) cystatin C ( Buttle et al., 1990). Since SD-type cystatins are not produced by bronchial epithelial cells ( Burnett et al., 1995), tracheal glands appear to be a major source. Cystatin S protein has been specifically identified by electrophoresis and sequencing in both broncho-alveolar and nasal lavage fluids ( Lindahl et al., 1999). It will be of interest to examine CST1 and CST4 expression in additional tracheal gland samples, and to determine whether it can be modulated by disease. In summary, SD-type cystatin gene expression is primarily restricted to serous-type acinar and demilune cells of anterior exocrine glands, and to secretory epithelia of a limited number of other tissues in the body. However, it is clear that, as a subfamily, they are more than ’salivary cystatins’.
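The molar concentrations quoted above follow from the mass concentrations and the molecular mass of the protein: 1 μg/mL equals 1 mg/L, so c(μM) = c(μg/mL) × 1000 / M, with M in daltons (g/mol). The Python sketch below makes this conversion; the molecular mass of 14,300 Da for a mature S-type cystatin is an approximate, assumed value used only for illustration, and the constant and function names are hypothetical.

```python
def ug_per_ml_to_um(conc_ug_per_ml, mol_mass_da):
    """Convert ug/mL (= mg/L) to uM for a protein of the given molecular mass (Da = g/mol)."""
    return conc_ug_per_ml * 1000.0 / mol_mass_da

# Assumed approximate molecular mass of a mature S-type cystatin (illustrative only).
S_TYPE_MASS_DA = 14_300

if __name__ == "__main__":
    print(f"{ug_per_ml_to_um(30, S_TYPE_MASS_DA):.1f} uM")  # ~2.1 uM
```

The molar value depends on the molecular mass assumed, so rounding and the choice of mass estimate both affect the last digit of such figures.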
Expression of the human SD-type cystatin genes and CST3 was examined during pre- and post-natal development of the SMG ( Dickinson et al., 2002). CST3 was expressed at modest levels before birth, and showed only a 2.7-fold increase between 2 and 9 months of age. In contrast, all four SD-type cystatin genes were expressed at only trace levels before birth. Their expression rose to 18-38% of adult levels during the first week after full-term birth, then declined to 1-6% of adult levels by 1-2 months of age. Then, between 2 and 9 months, all four genes showed a dramatic, co-ordinate rise in expression to adult levels.
Very little is known regarding the mechanisms regulating human SD-cystatin gene expression. Independent lines of mice carrying a 22-kb CST1 transgene showed significant expression in the parotid and lacrimal glands, but not the SMG ( Dickinson and Thiesse, 1995). This lack of SMG expression could reflect the absence of an SMG enhancer in the transgene, the lack of a cognate transcription factor in the mouse SMG, or the presence of a repressor. The last two possibilities would be consistent with the seromucous nature of the mouse SMG acini, and the lack of expression of CST1 in human mucous acini. Using phylogenetic footprinting, Shaw and Chaparro (1999) have found conserved motifs in the promoter region of salivary protein genes, including human cystatins. Their functionality remains to be tested. A major hindrance is the lack of cell lines that routinely express their endogenous SD-type genes. An immortalized human submandibular gland cell line (HSG) has been reported to express cystatins immunoreactive with a polyclonal anti-cystatin SN antibody when grown on Matrigel ( Hoffman et al., 1996). However, preliminary tests with gene-specific riboprobes failed to detect expression of any SD-type cystatins in HSG cells (Dickinson and Thiesse, unpublished observations). Cross-reactivity of the antibody is an obvious concern, but this cell line warrants further study.
(2) Rat cystatin S
The rat cystatin S gene has been cloned ( Cox and Shaw, 1992). It has the typical type 2 cystatin 3-exon-2-intron structure, and the same GATAAA variant of the TATA box as the human SD-type cystatins (and cystatin C). However, it is sufficiently divergent in comparison with human SD-type cystatins (< 61% nucleotide identity between exons), such that rat and human probes would not be expected to cross-hybridize significantly. Phylogenetic analyses of the relationship of rat cystatin S and human SD-type cystatins are inconsistent (see above), and although there are similarities between the human and rat genes at the level of gene expression, there are significant differences.
In rats, development of the SMG at the molecular and histological levels continues post-natally (reviewed in Denny et al., 1997; see Nishiura and Abe, 1999, for early references). Cystatin S mRNA was undetectable by Northern blotting in 20-day-old fetuses and newborn or 10-day-old Sprague-Dawley rats, although trace levels of cystatin S mRNA were detected at 1 week by means of a sensitive quantitative reverse-transcriptase/polymerase chain-reaction (RT-PCR) ( Shaw et al., 1990; Nishiura and Abe, 1999). Cystatin S mRNA levels were found to rise dramatically between 21 and 28 days. Expression was confined to the acinar cells and coincided with acinar cell differentiation ( Shaw et al., 1990). Following this short period, cystatin S mRNA levels declined rapidly: By 32 days, expression was near the limits of detection by Northern blotting ( Shaw et al., 1990). This developmental pattern has some similarities to that seen in humans. However, in the rat, the levels do not rise again. Cystatin S protein was undetectable in adult saliva by Western blotting ( Bedi, 1991). Low-level SMG cystatin S gene expression detectable by RT-PCR was shown to persist out to 52 weeks, and mRNA levels in the adult rat were about 100-fold lower than those of cystatin C, which did not show marked developmental regulation ( Nishiura and Abe, 1999). Expression of cystatins C and S in the adult Sprague-Dawley rat SMG detectable by in situ hybridization is confined to the acinar cells ( Barka and van der Noen, 1994). As for human S-like cystatins, expression of rat cystatin S is not limited to the salivary glands. In contrast to the adult rat SMG and parotid, rat cystatin S (or an immunologically similar protein) is expressed at detectable levels in the acinar cells of the adult lacrimal gland, and in a subset of the sebaceous glands ( Takahashi et al., 1992; Cohen et al., 1996). 
The detection of rat cystatin S by immunoelectron microscopy in normal osteoclasts ( Moroi et al., 1997) suggests that this may also be a normal site of expression in adult animals, thereby implying a role for CPs in the degradation of bone extracellular matrix (see below).
SMG secretion in adult rats is regulated by both parasympathetic and sympathetic nerves of the autonomic nervous system. Chronic treatment with the β-adrenergic agent IPR causes reversible SMG enlargement and induces expression of several proteins. Cystatin S expression, but not that of cystatin C, is rapidly and dramatically up-regulated, reaching salivary levels as high as 1.6 mg/mL following chronic injection with IPR ( Shaw et al., 1990; Bedi, 1991; Barka and van der Noen, 1994). Induction is not restricted to the SMG: In adult female rats, modest levels of cystatin S mRNA are induced in the parotid following IPR treatment. Lacrimal expression is not significantly affected. The effects of β1 or β2 agonists and antagonists, and sympathectomy and parasympathectomy, indicate that regulation of SMG expression involves more than simple control via sympathetic nerves of the autonomic nervous system, β1-adrenergic receptors, and adenylate cyclase-cAMP ( Shaw and Yu, 2000, and references therein). Expression is influenced by both branches of the autonomic nervous system, perhaps involving similar factors (e.g., neuropeptides common to both types of nerve terminals), and both branches are required for the maximum response to IPR. Also, cystatin S induction by IPR in females is greater than in males ( Shaw et al., 1990).
Expression of rat cystatin S in the SMG can be induced by a variety of noxious stimuli. Dietary capsaicin (an active ingredient in hot peppers) was shown to cause enlargement of the SMG and expression of new proteins ( Katsukawa and Ninomiya, 1999). The new protein profile was very similar to that induced by chronic IPR treatment, and included cystatin S-like proteins. Although food consumption was reduced for the first two days of exposure, normal intake was then resumed for all but the highest-dose diet, indicating that the changes were not simply the result of a change in nutritional status. Chemosensory information is conveyed by the glossopharyngeal nerve, and severing this nerve substantially reduced the protein induction (although this nerve does not innervate the SMG). Very similar results were obtained by feeding animals a diet containing papain, which also promoted SMG enlargement and a dramatic increase in cystatin S levels in a dose-dependent manner ( Naito et al., 1992; Ninomiya et al., 1994). These changes, which were not caused by inactivated papain, could be blocked by a selective β1 antagonist, or by glossopharyngeal denervation. Repeated amputation of the lower incisors, or lateral separation of the upper incisors by an orthodontic appliance, also caused SMG enlargement and induction of cystatin S ( Yagil and Barka, 1986; Kamogashira et al., 1988). The common effects of these different noxious stimuli are consistent with a role for rat cystatin S in a response to injury. Consistent with this, SMG acinar cell expression was also found to be induced by the toxic chemicals cyclocytidine (a potent anti-tumor agent with adverse side-effects in humans involving the salivary glands), turpentine (which causes generalized acute inflammation), and potassium dichromate (which causes reversible acute necrosis of renal proximal tubules) ( Cohen et al., 1993a). 
In contrast, another study in Sprague-Dawley rats found that neither cystatin C nor S is up-regulated in this tissue by acute inflammation induced by turpentine, suggesting potential strain differences ( Barka and van der Noen, 1994). A role in response to injury may not be confined to the SMG. Rat cystatin S (or an immunologically identical protein) is also expressed in the proximal tubules of the kidney following a variety of treatments or agents that cause immunological or chemical injury ( Cohen et al., 1993b). Indicative of a common mechanism, chronic IPR treatment also induces a protein immunologically identical to rat cystatin S in the proximal convoluted tubules of the kidney ( Cohen et al., 1990), a site of low-level SD-type cystatin expression in humans.
Thus, like that of the human SD-type cystatins, cystatin S expression in the rat is developmentally regulated, tissue-specific, and primarily, but not exclusively, localized to similar anterior exocrine glands. Within these tissues, expression is also localized to acinar cells. However, the pattern of post-natal expression in the rat SMG (and parotid) differs from that in humans: levels fall to trace values after the initial induction during gland maturation, and expression is at least partially controlled by the autonomic nervous system. Expression in the lacrimal gland is more comparable. Human sebaceous glands have not been examined. The regulatory mechanisms governing rat cystatin S expression are largely unknown. Comparison of the 5′-flanking promoter sequence with those of other salivary-gland-specific genes (including human CST1 and 2) identified three conserved 9-10-base motifs (designated I, II, and III) that might represent cis-acting elements involved in gland-specific expression, as well as potential cAMP-responsive and steroid-responsive elements ( Shaw and Chaparro, 1999). Motifs II and III, and a GT27 repeat, are located in a 281-bp region about 700 bases upstream of the transcription start site. Transient expression assays in rat SMG-derived A5 cells with constructs containing fragments of this region indicated the presence of both positive and negative regulatory elements. The GT27 repeat, which has the potential to adopt a non-B-form DNA structure, appeared to have enhancer activity.
(3) Expression of non-SD-type cystatin genes
The CST3 gene encoding human cystatin C is expressed in all tissues and cell types, although mRNA levels vary several-fold between and among tissues (e.g., Abrahamson et al., 1990; Corticchiato et al., 1992; Dickinson et al., 2002). Among the highest levels of expression are those in the secretory epithelial cells of the choroid plexus, a feature conserved at least since the common ancestor of birds and mammals ( Tu et al., 1992; Colella et al., 1994). This pattern of ubiquitous cystatin C expression appears to have been conserved since the evolution of the bony fishes more than 400 million years ago (e.g., Cole et al., 1989; Colella et al., 1989; Hakansson et al., 1996; Yamashita and Konagaya, 1996). A high cystatin C level in the mouse parotid has been reported ( Hakansson et al., 1996). The human CST3 promoter region has a high GC content (> 70%) and a CpG-to-GpC ratio near 1 ( Abrahamson et al., 1990), features typical of a constitutively expressed ’housekeeping’ gene. Transient expression assays in HeLa cells demonstrated a strong promoter, and identified a region of 123 bases adjacent to the presumptive TATA box (GATAAA) that contains major positive regulatory elements ( Olafsson, 1995).
Cystatin C is not an acute-phase protein in mammals ( Cole et al., 1989). However, cystatin C expression has been shown to be up-regulated by lipopolysaccharide (LPS) in monocyte-derived dendritic cells ( Hashimoto et al., 2000), and by TGF-β in murine embryonic astrocyte precursor cells ( Solem et al., 1990). TGF-β can be released by platelets. Therefore, although routinely described as a housekeeping gene, cystatin C levels are regulated in a manner that suggests a relationship to injury and immune responses (see below). Consistent with a role for cystatins in the immune system, in humans, cystatin F is primarily expressed in cells of the immune system, including resting T-cells, pre-monocytic cells, and activated dendritic cells derived from stem cells ( Halfon et al., 1998; Ni et al., 1998; Hashimoto et al., 2000). Expression was not found in monocytes, or in dendritic cells derived from monocytes, although it was found to be up-regulated in the latter cells in response to LPS. In contrast, in the mouse, expression was found in differentiated T-cells (both Th1 and Th2), but little in naïve and pre-T-cells or dendritic cells, and modest amounts were found in monocytes.
Northern blotting indicated that cystatin E/M is expressed in most human tissues, although, as with cystatin C, levels vary considerably between and among tissues ( Ni et al., 1997; Sotiropoulou et al., 1997). Trace levels were detected in urine. It is particularly abundant in fetal skin and amniotic membrane cells, suggesting a role in fetal development. However, these two studies were not entirely consistent in the levels of expression they reported for various tissues. In direct contrast to both, a sensitive CST6 gene-specific RT-PCR screen of a large number of human tissues, including gingiva, trachea, prostate, kidney, and mammary gland, failed to detect cystatin E/M anywhere except the skin ( Zeeuwen et al., 2001). The protein was immunolocalized to the stratum granulosum of normal skin and to the secretory coils of eccrine sweat glands, and was expressed by epidermal keratinocytes in vitro. These discrepancies remain to be resolved. Expression of cystatins E/M and F is altered in metastasis. Cystatin E/M was initially cloned by differential display as an mRNA down-regulated in a metastatic mammary epithelial cell line compared with a primary tumor cell line from the same patient ( Sotiropoulou et al., 1997). Cystatin E/M mRNA was undetectable in several carcinoma cell lines, consistent with a loss of expression during progression from a primary to a metastatic tumor. Conversely, cystatin F is selectively overexpressed in murine cell lines that form multiple liver metastases, and is expressed in cell lines derived from human malignant tumors ( Morita et al., 1999).
Expression of the members of the CRP/CRES/testatin subfamily is largely, but not exclusively, restricted to the male reproductive tract. Expression in these tissues is androgen-dependent, but also requires other, as yet unidentified, testicular factors. Rat CRP1 and 2 are abundant glycoproteins expressed post-natally in the secretory epithelial cells of the ventral prostate, but only CRP1 is expressed in the acinar cells of the parotid ( Winderickx et al., 1990; Aumuller et al., 1995; Vercaeren et al., 1998). No expression has been detected in the SMG ( Winderickx et al., 1990; Shoemaker et al., 2000). Thus, the expression pattern of this subfamily of type 2 cystatin genes differs markedly from that of the other cystatin gene family members, although it, too, includes secretory epithelial cells.
(V) Potential Functions of Type 2 Cystatins
Over the years, various functions have been ascribed to type 2 cystatins. Most can be grouped into four general categories: direct inhibition of endogenous or exogenous CPs; modulation of the immune system; antibacterial and antiviral activities (that may be unrelated to inhibition of CPs); and a role in control of mineralization at the tooth surface. Potential protective activities have been identified largely by in vitro assays; how many are relevant in vivo? All of these activities would seem important for overall oral health. Functional redundancy is often ascribed to the protein constituents of saliva, and several other, unrelated salivary proteins have been shown to have activities overlapping those of cystatins. However, a genuinely redundant gene product is a prime candidate for loss through mutation (as is often seen in multigene families), leading to the formation of a pseudogene. Thus, an apparently redundant property of any one protein more likely reflects a specialized extension of an underlying function that is maintained by selection, either directly or indirectly through the need to preserve a structural feature required for another purpose. It is also important to note that the functions of proteins in saliva need not be confined to the oral cavity: Both cystatin S in rats and the plant CP bromelain in humans can survive passage through the stomach and be translocated into the serum ( Nishiura et al., 1995; Targoni et al., 1999).
Genetic diseases can provide a powerful tool for identifying the in vivo function of a gene product. Recently, an association between the CST3 BB genotype and late-onset Alzheimer's disease has been reported ( Deng et al., 2001, and references therein). A role for cystatin C in protection of injured tissue is discussed below. A mutation in the CST3 gene that produces an L68Q substitution causes the rare autosomal-dominant disease hereditary cerebral hemorrhage with amyloidosis, Icelandic type (see Calero et al., 2001, and references therein). The variant is still a functional inhibitor, but amyloid composed primarily of the variant cystatin C is deposited in the walls of blood vessels in the brain. The substitution causes a subtle change in conformation that predisposes the monomeric protein to associate into fibrils. As with other salivary proteins, identification of the true in vivo function(s) of the SD-type cystatins is hampered by the lack of identified individuals with functional deficiencies in just these proteins.
(1) Inhibition of endogenous CPs
Although the evidence is not definitive, cystatins appear to have a general role in protection against tissue injury by CPs released during normal processes or as a result of an insult, and they may serve as a marker for damaged regions. Cystatins may act through inhibition of CPs, either extracellularly or intracellularly following re-uptake. The epithelium of the oral cavity is clearly a site of potential mechanical injury from food, particularly in the wild, as is the eye from wind-borne abrasives. Constitutive expression of SD-type cystatins at these locations could thus reflect a continuous need for this type of protection.
(a) Physiological regulation by type 2 cystatins
The human genome encodes at least 11 cathepsins related to papain (reviewed in Dickinson, 2002). The best known are the lysosomal CPs cathepsins B, H, and L. However, these and other cathepsins can be released from cells under certain conditions. Cathepsins participate in a wide range of intracellular and extracellular activities in health and disease, and they are capable of degrading many components of the extracellular matrix. For example, cathepsins K and S are potent elastolytic and collagenolytic CPs, and cathepsin S, unlike other cathepsin CPs, is relatively stable at neutral pH and retains considerable activity. Clearly, these enzymes must be controlled. The high affinity of cystatin C for human CPs, together with its 0.1-1 μM concentration in biological fluids, implies that it is likely to be the main physiological regulator of CP activity, particularly that of cathepsin B, in mammals ( Abrahamson et al., 1986). Several lines of evidence point to multiple functions for cystatin C in controlling extracellular proteolysis. For example, cystatin C is expressed in normal vascular wall smooth-muscle cells, but levels are very low in atherosclerotic and aneurysmal aortic lesions ( Shi et al., 1999). Further, serum levels of cystatin C correlate negatively with the progression of small abdominal aortic aneurysms, consistent with a role in regulating blood vessel integrity ( Lindholt et al., 2001). Macrophage- and smooth-muscle-derived cathepsins K and S have been implicated in atherosclerotic lesions, and these CPs are overexpressed at sites of damage (reviewed in Dickinson, 2002). Thus, a balance between cystatin C and cathepsins K and S appears to be important for the remodeling of arterial walls. Cystatin C is a major product of the stroma during decidualization and is co-ordinately regulated with cathepsins B and L during mouse embryo implantation and placentation ( Afonso et al., 1997), consistent with a regulatory role in these events.
To what extent, if any, SD-type cystatins are involved in blocking endogenous CPs remains to be established. In periodontally healthy individuals, cystatin SN is present in saliva at a concentration that would allow for effective inhibition of cathepsins B, C, H, and L, and cystatin SA at levels that would inhibit cathepsins C and L (see above).
Cystatins and CPs almost certainly do not operate independently of other degradative systems, but participate in complex feedback systems that can produce a sharp change in proteolytic activity, termed the proteolytic burst ( Lah et al., 1993; reviewed in Dickinson, 2002). For example, bronchial epithelial cells secrete procathepsin B and cystatin C; the former can be activated, and the latter inactivated, by elastase from inflammatory cells ( Burnett et al., 1995). Cathepsin L has also been shown to cleave cystatin C at the G11-G12 bond, rendering it physiologically inactive ( Popovic et al., 1999). This creates a potential positive feedback network. Co-production of CPs and cystatin C is seen in other cells as well. Cystatin C is secreted by monocytes and macrophages, and its release is down-regulated by the pro-inflammatory stimuli LPS and IFNγ ( Warfel et al., 1987). In rheumatoid arthritis synovium, levels of cystatin C and cathepsin B proteins were found to be considerably elevated in macrophage-like and fibroblast-like synoviocytes at sites of cartilage and bone destruction ( Hansen et al., 2000). In healthy tissues, few cells produced these proteins. High levels of cystatin C were also detected in osteoclasts, which produce cathepsin K, and cystatin C inhibits bone resorption in vitro ( Lerner et al., 1997; Yamaza et al., 2001). Rat cystatin S may have a role in controlling breakdown of the extracellular matrix of bone, since it can be detected in osteoclasts ( Moroi et al., 1997). Expression of human SD-type cystatins in bone does not appear to have been examined. These examples are consistent with the notion that CP-mediated proteolysis is a highly regulated process, and that, to ensure tight control, many proteolytically active cells not only produce the enzymes but also help establish the systems that govern them by producing powerful inhibitors, the type 2 cystatins.
Combinations of positive and negative feedback controls within a proteolytic burst could create a large, rapid, but normally highly contained response, as is seen in blood clotting.
(b) Cystatins and cancer
Cathepsin B secretion may be important in penetration of the extracellular matrix during metastasis, and cystatin C may be involved in regulating this process ( Corticchiato et al., 1992; reviewed in Dickinson, 2002). The up-regulation of cystatin F and down-regulation of cystatin E/M associated with cancer (see above) also point to a functional association. Given the seeming importance of cystatin C in controlling CPs, it is remarkable that cystatin C-null mice appear normal, lack histological abnormalities, and are fertile ( Huh et al., 1999). A further surprise, given the evidence for an involvement of cathepsin B in cancer, is that metastasis of melanoma cells is actually reduced in cystatin C-null mice. Adhesion and seeding of melanoma cells in the lungs were reduced, and their growth in the lung parenchyma was inhibited. Possible explanations put forward were an altered response of lung macrophages, increased proteolysis of growth factors, or loss of the growth factor activity of cystatin C itself (see below). Recently, a correlation has been found between high serum levels of cystatin C and a higher risk of death in colorectal cancer patients ( Kos et al., 2000). Up-regulation of cystatin F in murine liver metastatic tumors was noted above. Mice injected with a metastatic cell line stably transfected with cystatin F antisense DNA showed significantly fewer metastases and longer survival. Together, these results suggest that while inhibition of cathepsins by cystatin E/M might be important in control of metastasis, inhibition by cystatins C and F may be less important than other functions of these inhibitors. It would be of interest to learn what effects, if any, SD-type cystatins have on metastasis. Do they have a role in suppressing oral cancer?
(c) Cystatins and injury
Cystatin C appears to be up-regulated in response to injury in the brain. Immunohistochemistry detected cystatin C protein in only a few hippocampal pyramidal cells of the normal rat brain, but the protein was present in these cells 3 days after experimental ischemia ( Palm et al., 1995). It was localized to morphologically degenerating neurons, and was absent from morphologically viable neurons. In all brains from individuals with Alzheimer's disease, but not in the majority of normal brains, strong localization of cystatin C protein was found in the pyramidal neurons in regions of the brain most susceptible to cell death in this disease ( Deng et al., 2001). The punctate pattern of staining suggested that the cystatin was localized to the endosomes and lysosomes of the neurons. In situ hybridization indicated that the cystatin C was synthesized by the glial cells (which are activated in the vicinity of amyloid deposits), and not by the neurons, suggesting that the secreted protein was endocytosed by neurons. Consistent with this finding, cystatin C is up-regulated in glial cells in the rat facial nucleus following axotomy ( Miyake et al., 1996). The exact function of cystatin C in the brain and its role in injury are presently unknown. Its appearance would be consistent with a protective role, perhaps by blocking CP activity in damaged cells to allow for recovery, or by acting as a growth factor (see below). However, given its association with damaged cells, it is conceivable that it mediates injury. As noted above, in the rat, expression of cystatin S in the SMG and the kidney can be induced by a variety of insults.
(d) Cystatins, endogenous CPs, and periodontal disease
Lysosomal cathepsins released by neutrophils and macrophages have been implicated in tissue destruction in periodontal disease, and cathepsin B levels in the gingival crevicular fluid (GCF) correlate significantly with periodontal disease status (reviewed in Dickinson, 2002). Cystatin C levels and activities in the GCF of patients with severe periodontitis have been examined ( Abrahamson et al., 1997; Blankenvoorde et al., 1997). In contrast to serum, which contains about 0.1 μM cystatin C, GCF from periodontal disease sites contains very low levels (about 15 nM) of immunologically detectable cystatin C, and no detectable cystatins S or SN. Further, cystatin C added to periodontal GCF is rapidly cleaved in the N-terminal region, rendering it physiologically inactive (see above). This would explain the detection of active CPs in GCF. Elastase appears to be the main enzyme responsible for N-terminal modification, but peptidases secreted by Porphyromonas gingivalis and Treponema denticola (and other members of the periodontal microflora) can also degrade cystatin C. Gingipain R rapidly cleaves the R8-L9 bond in the N-terminal region, resulting in a 20-fold reduction in affinity for cathepsin B ( Grenier, 1996; Abrahamson et al., 1997; Blankenvoorde et al., 1997). The Ki of this modified cystatin, 5 nM, is of the same order as the cystatin C concentration in GCF, so it would not be an effective inhibitor of cathepsin B. Nevertheless, although insufficient to inhibit the CPs present, some papain-inhibitory activity is detectable in GCF in periodontitis ( Blankenvoorde et al., 1997; Chen et al., 1998). The amount of activity present (per 30-second collection) was found to be significantly lower after treatment, although the concentration did not change significantly. This argues that the higher CP activity in GCF from periodontally diseased sites is not the result of a decline in cystatin activity.
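To make the Ki argument concrete, the fraction of cathepsin B that a 1:1 reversible inhibitor would occupy at equilibrium can be estimated from the figures given above. The Python sketch below is an illustration under stated assumptions, not an analysis from the cited studies. It assumes that all 15 nM of cystatin C in GCF is either intact or uniformly cleaved and is free to bind, and it ignores competition from substrate. It also infers a native Ki of about 0.25 nM from the reported 5 nM Ki and 20-fold loss of affinity. Because the cathepsin B concentration in GCF is not given here, the hypothetical parameter `enzyme_nM` defaults to 0, meaning the inhibitor is in large excess. Supplying a value switches to the tight-binding (Morrison) quadratic. The helper name `fraction_inhibited` is likewise hypothetical.

```python
import math


def fraction_inhibited(inhibitor_nM: float, ki_nM: float, enzyme_nM: float = 0.0) -> float:
    """Equilibrium fraction of enzyme bound by a 1:1 reversible inhibitor.

    Hypothetical helper. Uses the tight-binding (Morrison) quadratic in a
    numerically stable form; with enzyme_nM = 0 it reduces exactly to the
    hyperbola [I] / ([I] + Ki). Substrate competition is ignored.
    """
    b = enzyme_nM + inhibitor_nM + ki_nM
    disc = b * b - 4.0 * enzyme_nM * inhibitor_nM  # never negative for non-negative inputs
    # Smaller root of [EI]^2 - b[EI] + [E][I] = 0, divided by [E]
    return 2.0 * inhibitor_nM / (b + math.sqrt(disc))


GCF_CYSTATIN_C_NM = 15.0             # immunodetectable cystatin C in diseased-site GCF (from the text)
KI_CLEAVED_NM = 5.0                  # Ki of R8-L9-cleaved cystatin C for cathepsin B (from the text)
KI_NATIVE_NM = KI_CLEAVED_NM / 20.0  # assumed: inferred from the reported 20-fold loss of affinity

for label, ki in (("intact", KI_NATIVE_NM), ("R8-L9 cleaved", KI_CLEAVED_NM)):
    bound = fraction_inhibited(GCF_CYSTATIN_C_NM, ki)
    print(f"{label}: Ki = {ki} nM, {bound:.1%} inhibited, {1 - bound:.1%} residual activity")
```

Under these assumptions, intact cystatin C at 15 nM would leave roughly 1.6% of cathepsin B activity free, whereas the cleaved form would leave about 25%. That is an approximately 15-fold increase in free enzyme. Real GCF contains a mixture of intact and processed forms, so these figures bracket rather than predict the in vivo situation.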
Although cystatin A is generally considered a cytosolic protein because it lacks a leader peptide, it was detected by immunoblotting in both the GCF and saliva of periodontitis patients. It is not known whether this protein is actively released by cells or simply liberated by cell lysis. The biological significance of cystatin A in GCF is unknown. It is a modest inhibitor of cathepsin B, and a potent inhibitor of cathepsins H, L, and S ( Abrahamson, 1994).
The increased flux of lysosomal enzymes into the oral cavity resulting from gingival inflammation could potentially increase degradation of protective salivary proteins, leading to increased disease. In principle, proteolysis of salivary cystatins could exacerbate this through loss of CP inhibition, and might be met by compensating changes in SD-type cystatin secretion by the glands. Several cross-sectional studies have examined salivary cystatin levels in relation to oral health status by measuring CP inhibitory activity (usually against papain) or by immunoassay of protein levels. The conclusions reached have been somewhat contradictory: Cystatin levels have been reported to decline ( Baron et al., 1999c), not change ( Aguirre et al., 1992), or increase ( Henskens et al., 1993, 1996a) in response to increased oral inflammation and periodontal disease. Consistent with increased activity in disease, activity concentrations were found to decline following treatment ( Henskens et al., 1996b). There are several likely reasons for the disparities among these studies. Criteria for inclusion in control and experimental populations have differed, and there is evidence for population heterogeneity in salivary protein levels in both health and disease that may stem, in part, from genetic differences ( Rudney et al., 1993, 1994). Environmental factors may also contribute; for example, cystatin activity was found to differ between smokers and non-smokers ( Lie et al., 2001). Therefore, at least some of the discrepancies among studies might simply result from sampling variation, especially where relatively small groups were used. Methods of saliva collection have also differed. Some studies have emphasized specific activity (units/μg total protein), others concentration (units/μL). Measurement of total inhibitory activity in whole saliva may be the most physiologically relevant parameter.
However, it represents the sum of multiple inhibitors and their processed forms (with very different inhibitory abilities) derived from multiple secretions, and so is the hardest to interpret. Measurement of S-like cystatin protein levels with polyclonal antiserum also determines an aggregate value that may give undue weight to cystatin S (a relatively poor inhibitor), but does not include cystatin C (a potent papain inhibitor) or cystatin D. Further, it does not distinguish between active and physiologically inactive processed forms. Thus, results from different studies might not be directly comparable.
A positive correlation was found between whole saliva protein levels and gingival inflammation and periodontal disease ( Henskens et al., 1993; Rudney et al., 1994). Flow rate did not differ significantly ( Henskens et al., 1996a). The source of the additional protein is unclear. One possibility is serum transudate. Both significant increases (e.g., Henskens et al., 1993) and no changes ( Henskens et al., 1996a; Lie et al., 2001) in albumin concentration have been reported, perhaps reflecting differences in disease severity between patient populations. Increases in protein concentration in glandular secretions have also been observed in periodontitis ( Henskens et al., 1996a). These alternatives would affect cystatin activity concentration and specific activity differently, and their differing contributions in different patient populations may account for some of the discrepancies between studies. In principle, examination of cystatin proteins in saliva could resolve these alternatives. Surprisingly, S-like cystatin protein levels were found to be significantly lower in periodontitis patients than in healthy controls (about 1.5-fold lower in concentration and 2- to 2.8-fold lower in specific activity) in two studies that observed opposite effects of disease on total cystatin activity ( Henskens et al., 1996a; Baron et al., 1999c). Moreover, the levels had not returned to normal 6 months after treatment ( Henskens et al., 1996b), although they were trending upward while total activity was declining. Cystatin D levels have not been examined. Cystatin C protein levels were found to be increased 1.3-fold, along with total activity, in periodontitis patients ( Henskens et al., 1996a), and one source of this increase appeared to be the parotid gland. However, the amount of cystatin C in the tested samples would account for only a small proportion of the total inhibitory activity.
Analysis of these data indicates that other cystatins could dominate the papain-inhibitory activity of saliva in some periodontitis patients. These could come from the salivary glands, GCF, or lysis of immune cells entering the oral cavity (see above).
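The two fold-changes reported for S-like cystatin protein in periodontitis can be reconciled with a rise in total salivary protein. Specific activity here is cystatin per unit of total protein, so total protein equals cystatin concentration divided by specific activity. The short Python sketch below works through that arithmetic in relative units, with healthy controls set to 1.0. The function name is hypothetical. The sketch assumes that both fold-changes were measured on the same samples and that a ratio of group means behaves like a mean of ratios, which need not hold exactly.

```python
def implied_total_protein_fold(conc_fold_decrease: float, specific_fold_decrease: float) -> float:
    """Fold-change in total protein implied by two fold-decreases (hypothetical helper).

    specific = cystatin / total protein, hence total protein = cystatin / specific.
    All values are relative to healthy controls, which are set to 1.0.
    """
    cystatin = 1.0 / conc_fold_decrease      # relative cystatin concentration in disease
    specific = 1.0 / specific_fold_decrease  # relative cystatin per unit total protein
    return cystatin / specific               # relative total protein in disease


for s in (2.0, 2.8):
    fold = implied_total_protein_fold(1.5, s)
    print(f"{s}-fold lower specific activity -> total protein x{fold:.2f}")
```

Under these assumptions, total protein would be roughly 1.3- to 1.9-fold higher in the periodontitis groups. This is in line with the positive correlation between whole saliva protein levels and periodontal disease noted above.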
(2) Inhibition of exogenous CPs
Papain-like CPs have been identified as important virulence factors for a limited number of bacterial pathogens and a far larger number of important parasitic protozoa and nematodes (reviewed in Coombs and Mottram, 1997; Tort et al., 1999). Few bacterial CPs have been tested for inhibition by cystatins. In general, type 2 cystatins appear to be inactive against bacterial CPs. An elastolytic CP from Staphylococcus aureus was not inhibited by human cystatin C ( Potempa et al., 1988), and human cystatin C and cystatin SN were found to have no activity against clostripain from Clostridium histolyticum ( Hiltke et al., 1999). P. gingivalis Arg- and Lys-gingipains are not inhibited by S-like cystatins, cystatin C, or chicken cystatin ( Blankenvoorde et al., 1996; Abrahamson et al., 1997; Abe et al., 1998; Baron et al., 1999a), although cystatin S and chicken cystatin were found to inhibit up to 40% of the CP-substrate-hydrolyzing activity in the supernatant from P. gingivalis cultures ( Blankenvoorde et al., 1996). In contrast to bacterial CPs, cathepsin-like enzymes from parasitic organisms are potently inhibited by cystatins. Chicken cystatin and human cystatin C are potent inhibitors of congopain and the closely related cruzipain, important CP virulence factors from the parasitic protozoa Trypanosoma congolense and T. cruzi, with dissociation constants in the pM range ( Stoka et al., 1995; Chagas et al., 1997). Cystatins SN and SA (both truncated and non-truncated), but not cystatin S, were found to be good inhibitors of the trypanosome CP cruzain, with apparent Ki values between 0.9 and 7 nM ( Baron et al., 1999a). In contrast to the host of eukaryotic pathogens that can infect the GI tract, there are only two commonly encountered species of protozoa in the oral cavity, Trichomonas tenax and Entamoeba gingivalis, and no nematodes.
This raises the intriguing possibility that the primary function of the SD-type cystatins in saliva, tears, and other fluids is to suppress growth of eukaryotic, rather than prokaryotic, organisms.
The normal primate diet, and most likely that of our ancestors, consists mainly of fruit and seeds, so there is potential for significant exposure to papain- and legumain-like enzymes from plants. In the rat, inclusion of papain in the diet retards growth until cystatin S is induced ( Naito et al., 1992; Ninomiya et al., 1994). Introduction of high levels of dietary tannins leads to a similar retardation of growth. Growth resumes after 3 days, in association with a 12-fold increase in levels of parotid (but not SMG) proline-rich proteins (PRPs) ( Lu and Bennick, 1998, and references therein). The PRPs bind tannins and are thought to provide protection against their deleterious effects. Human cystatins SN and SA (although not cystatin S) are good inhibitors of papain and related enzymes (see Table ; Baron et al., 1999a). Thus, these cystatins could function to block the noxious effects of dietary CPs and to protect salivary proteins from degradation. If our primate ancestors were continuously exposed to such enzymes, it would make sense for our salivary cystatins to be continuously expressed. Further, their induction to high levels of expression in human infants between 2 and 9 months of age would coincide with initial exposure to CPs in solid food.
(3) Immunomodulation
Cystatins have been shown to have a wide range of effects on immune cells. Pre-treatment of human neutrophils with chicken cystatin (or the broad-spectrum CP inhibitor E-64) was found to inhibit chemotaxis induced by C5a, but not by formyl-methionine-leucine-phenylalanine (fMLP) or interleukin-8 (IL-8) ( Barna and Kew, 1995). Inhibition had a sharp optimum around 10 μM, which is above normal physiological levels of cystatin C, but this may reflect the use of a non-human protein. Human cystatin C variants with (des 1-4) and without (des 1-8) an N-terminal KPPR extension, as well as the tetrapeptide itself (called postin), were found to stimulate human neutrophil migration at physiological concentrations (0.1-2 μM), with the (des 1-8) form being most active ( Leung-Tack et al., 1990b). All three showed chemokinetic activity (movement in the absence of a gradient), while only (des 1-4) cystatin C was active in chemotaxis. (Des 1-4) cystatin C and postin were also found to be potent, dose-dependent inhibitors of superoxide production and phagocytosis: Inhibition at the physiological concentration of 0.1 μM was near 50%. In contrast, (des 1-8) cystatin C was inactive ( Leung-Tack et al., 1990c). Thus, the KPPR N-terminal sequence, whether or not it is attached to the rest of the molecule, would appear to mediate the inhibition. Rat cystatin C was also found to inhibit superoxide production and the phagocytic activity of rat neutrophils ( Leung-Tack et al., 1990a). These results strongly suggest a role for cystatin C, and its N-terminal processing, in the modulation of neutrophil behavior. The mechanism remains to be determined, although the similarity of postin to the tetrapeptide neutrophil activator tuftsin (TKPR) is suggestive ( Leung-Tack et al., 1990c).
Activated Th1 cells produce the macrophage-activating cytokines IL-2 and interferon-γ (IFN-γ), and nitric oxide (NO) production by cytokine-activated macrophages plays a key role in killing parasitic protozoa. Leishmania donovani, a cause of significant human morbidity and mortality, evades the human immune response by suppressing a parasite-specific Th1 response and eliciting a non-protective Th2 response. Physiological levels of chicken cystatin (0.1-0.5 μM) were found to stimulate a six- to eight-fold increase in NO production by IFN-γ-activated murine macrophages, but not by unstimulated cells ( Verdot et al., 1999; Das et al., 2001). The mechanism underlying the increased NO production has been partially elucidated ( Verdot et al., 1999). It does not depend on CP inhibition. Addition of cystatin after macrophage activation considerably increases NO production, but addition prior to activation does not. The NO increase results from increased synthesis of NO synthase protein. Chicken cystatin induces synthesis of tumor necrosis factor-α (TNF-α) and IL-10 in murine macrophages with or without IFN-γ activation. TNF-α, but not IL-10, mimics the effect of cystatin on IFN-γ-activated macrophages. IL-10 is synergistic with TNF-α if added after macrophage activation, but it inhibits NO production if present before activation, which explains the lack of induction when cystatin is added prior to activation. Thus, cystatins can modulate macrophage responses to IFN-γ. Cystatin C secretion by monocytes and macrophages is itself decreased by IFN-γ and LPS ( Warfel et al., 1987). Significantly, mice given a lethal dose of L. donovani were completely cured by a combination of chicken cystatin and a suboptimal dose of IFN-γ, which was found to suppress the Th2 response and provide a protective Th1 response ( Das et al., 2001).
CPs have essential functions in antigen-presenting cells (reviewed in Riese and Chapman, 2000; Dickinson, 2002). Cystatin C has been shown to have a surprising role as an intracellular modulator of MHC class-II-mediated antigen presentation in peripheral dendritic cells by controlling cathepsin-S-mediated degradation of the invariant chain (Ii) ( Pierre and Mellman, 1998). In peripheral dendritic cells, cathepsin S mediates cleavage of a p10 Ii intermediate to CLIP, which prevents targeting of the MHC class II molecules to the lysosomes for degradation. The CLIP fragment can then be replaced by peptide antigen fragments in the endosomes, and the new complex trafficked to the cell membrane for presentation to T-cells. Immature peripheral dendritic cells efficiently endocytose antigen, but they are inefficient at presentation, in part because they do not efficiently process Ii. After migration to lymphoid tissue, the cells mature and become the most potent of the antigen-presenting cells (APCs). Maturation was found to be accompanied by an increase in cathepsin S activity mediated not by changes in cathepsin S levels but by changes in intracellular cystatin C. Cystatin C was found in the endocytic pathway of immature cells, probably as a result of trafficking through the Golgi. Its levels decreased during maturation, and its intracellular distribution shifted to the Golgi complex. This would cause an increase in endosomal cathepsin S activity, and a concomitant increase in Ii processing and antigen presentation. Cathepsins K and F can also degrade Ii. Cathepsin F is widely distributed. Cathepsin K is found in bronchial epithelial cells, which can serve as non-professional APCs.
Since salivary and lacrimal glands express MHC class II molecules and may function as non-professional APCs ( Yang et al., 1999), the interesting speculation arises that SD-type cystatins may have an intracellular function in regulating endosomal processing and peptide presentation by these tissues.
Excreted-secreted cystatins have now been described from several parasitic nematodes (reviewed in Maizels et al., 2001). Two abundant cystatins from Brugia malayi (which causes filariasis) are potent inhibitors of lysosomal CPs, including legumain ( Manoury et al., 2001). Parasite cystatins modify the host immune response and allow the parasite to evade it: by interfering with the presentation of certain antigen epitopes, by down-regulating murine T-cell responses to mitogens, receptor cross-linking, and specific antigens, and by up-regulating IL-10 levels ( Hartmann et al., 1997; Dainichi et al., 2001; Manoury et al., 2001). To block antigen processing, the cystatin must be taken up by the cells, but the mechanism of uptake remains to be determined. Cystatin C can be internalized by Chinese hamster ovary (CHO) cells and trafficked to the lysosomes ( Merz et al., 1997). Interestingly, it is dimerized in the lysosomes, which would render it inactive. However, the physical state of cystatin trafficking through endosomes is unknown. Parasitic protozoa have also been shown to produce CP inhibitors, although their relationship to cystatins, if any, remains to be established ( Irvine et al., 1992).
Given their role in immune responses, it is perhaps not surprising that cystatins (e.g., cat Fel d 3; Ichikawa et al., 2001), CPs (e.g., dust mite Der p 1; John et al., 2000, and references therein), and papain-related enzymes ( Mansfield et al., 1985) can be potent allergens. The data above argue for an intimate relationship between cystatins and regulation of the immune system. The restriction of cystatin F expression to hematopoietic cells makes it a prime candidate for a role in immunomodulation. An exciting possibility is that SD-type cystatins in saliva (and other fluids) could modulate antigen presentation in oral dendritic cells. Co-production of cystatins and CPs may be as important to establishing the control systems governing the immune response as it is to controlling extracellular proteolysis (see above). It would also be of interest to learn whether methotrexate, which has anti-inflammatory properties, can alter cystatin expression in immune cells, as it can in rat sebaceous glands ( Cohen et al., 1996).
(4) Antimicrobial and antiviral activities
Consistent with a defensive function, the cystatin isolated from horseshoe crab hemocytes has antimicrobial activity against Gram-negative bacteria, with IC50s against Salmonella typhimurium, Escherichia coli, and Klebsiella pneumoniae in the range of 80 to 100 μg/mL (Agarwala et al., 1996). In contrast, cystatin C was ineffective against 190 strains representing 13 bacterial species (Bjorck, 1990). However, both chicken cystatin and human cystatin S were found to inhibit the growth of P. gingivalis, with IC50s of 1.1 and 1.2 μM, respectively, and with apparent bactericidal activity (Blankenvoorde et al., 1996). Similarly, rat cystatin S has been reported to inhibit the growth of P. gingivalis, but not of 17 other species tested (Naito et al., 1995). Growth inhibition by human cystatin S appears to depend on antimicrobial peptide sequences within the protein, rather than on its CP-inhibitory activity (Blankenvoorde et al., 1998).
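The activities above are reported in different units (μg/mL for the horseshoe crab cystatin, μM for chicken cystatin and cystatin S). As a rough aid to comparison, a mass concentration can be converted to a molar one by dividing by the molecular mass in kDa, because 1 μg/mL equals 1 mg/L and 1 kDa equals 1 mg/μmol. The Python sketch below assumes a mass of about 13 kDa, a typical value for a type 2 cystatin; the actual mass of the Tachypleus protein is not given here, so the constant ASSUMED_MASS_KDA and the helper names are illustrative assumptions only.

```python
def ug_per_ml_to_um(conc_ug_per_ml: float, mass_kda: float) -> float:
    """Convert a protein mass concentration (ug/mL) to a molar one (uM).

    1 ug/mL = 1 mg/L, and 1 kDa = 1 mg/umol, so mg/L divided by mg/umol
    gives umol/L, i.e. uM.
    """
    if mass_kda <= 0:
        raise ValueError("molecular mass must be positive")
    return conc_ug_per_ml / mass_kda


def um_to_ug_per_ml(conc_um: float, mass_kda: float) -> float:
    """Inverse conversion: uM to ug/mL."""
    if mass_kda <= 0:
        raise ValueError("molecular mass must be positive")
    return conc_um * mass_kda


# ASSUMPTION: ~13 kDa, typical of type 2 cystatins; the real mass of the
# Tachypleus hemocyte cystatin is not stated in the text.
ASSUMED_MASS_KDA = 13.0

if __name__ == "__main__":
    for ic50 in (80.0, 100.0):
        um = ug_per_ml_to_um(ic50, ASSUMED_MASS_KDA)
        print(f"{ic50:.0f} ug/mL ~ {um:.1f} uM at {ASSUMED_MASS_KDA:g} kDa")
```

At the assumed mass, 80 to 100 μg/mL corresponds to roughly 6 to 8 μM, the same order of magnitude as the values for chicken cystatin and cystatin S; because the target species and assays differ, the comparison is only indicative.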
Several studies argue for a general ability of cystatins to inhibit viral replication. Oryzacystatins have been shown to be potent inhibitors of herpes simplex virus-1 (HSV-1) replication following virus adsorption to monkey kidney epithelial cells (Aoki et al., 1995). Significantly, in mice that had received a lethal ocular inoculum of HSV-1, eye drops containing 7.5 μg of oryzacystatin I, given daily for one week, led to a 67% survival rate after 14 days, compared with 0% in the controls. This effect was comparable with that of acyclovir, used as a positive control. Human cystatin C was found to block HSV-1 replication completely, with an activity comparable with that of acyclovir (reviewed in Bjorck, 1990). Although cystatin-S-like proteins also inhibited HSV-1 replication, they were not as effective as cystatin C (Gu et al., 1995). Cystatin C also effectively inhibits replication of coronavirus, which can cause acute gastroenteritis, at concentrations slightly above physiological levels (Collins and Grubb, 1991). Cystatin D at physiological levels (0.12-1.9 μM) has been found to inhibit coronavirus replication in human lung cells (Collins and Grubb, 1998). Chicken cystatin was shown to block poliovirus replication partially, whereas neither human cystatin C nor rat cystatin S had any effect on poliovirus replication (reviewed in Bjorck, 1990; Naito et al., 1995). Collectively, these results suggest that cystatins are taken up by cells (see also above), where they can interfere with steps in viral replication that require host or viral CPs, such as capsid maturation. However, the targets and mechanisms of inhibition are unknown. If viral inhibition is a function of SD-type cystatins, the virus(es) against which they were selected are also unknown. The inhibition of coronavirus by cystatin D at physiological levels is certainly suggestive. The eye and oral cavity are entry points for adenoviruses.
Recently, mixtures of cystatins purified from tears and saliva (thus predominantly cystatins S, SA, and SN) were shown to inhibit adenain, a CP encoded by the adenovirus genome that is essential for infectivity, with an estimated Kd of 1.2 nM, although the in vitro binding was considered too weak to play a significant role in viral inhibition (Ruzindana-Umunyana and Weber, 2001). Further tests in vivo would be of interest.
(5) Control of mineralization
S-like cystatins are major components of the pellicle (reviewed in Bobek and Levine, 1992). Phosphorylated S-like cystatins bind to hydroxyapatite; binding is reduced, but not eliminated, by dephosphorylation. They also inhibit calcium phosphate precipitation, although only one-tenth as effectively as statherin (reviewed in Lamkin and Oppenheim, 1993). Molecular modeling of cystatin S identified two negatively charged regions that could be involved in binding to hydroxyapatite (Bell et al., 1997). One lies near the acidic N-terminal region of α-helix 1, and a tryptic peptide containing this region has been shown to bind hydroxyapatite. The sequence DXDXXDE in this region resembles highly conserved acidic motifs in the fetuins. The second region lies in the α-helix 2/loop. There is good evidence for a role of mammalian fetuins in the regulation of mineralization: They are present at high concentrations in fetal serum and become concentrated in mineralized tissues. Recombinant HRG and α2-HS inhibit the formation of apatite from solution (Schinke et al., 1997). HRG also binds to heparin and fibrinogen, while α2-HS
The relatively poor CP-inhibitory activity and stronger hydroxyapatite binding of human cystatin S are consistent with a specialized role for this protein in mineralization. However, if that is the case, why is it expressed, often at significant levels, in tissues whose secretions bathe non-mineralized surfaces, such as the lacrimal and tracheal glands?
(6) Other possibilities
Of potential relevance to the role of cystatins in cancer, chicken cystatin has been shown to have growth factor activity toward mouse fibroblasts (Sun, 1989). Further, the effect was more pronounced on transformed cells. The glycosylated form of rat cystatin C was shown to be an autocrine/paracrine growth factor required for FGF-2-induced neural stem cell proliferation (Taupin et al., 2000); the non-glycosylated form was actually inhibitory. Since this would seem to be a critical function, it is curious that human cystatin C is non-glycosylated and so would presumably be non-functional in this role, which might instead be taken over by the glycosylated cystatins F or E/M. Any effects of SD-type cystatins on the growth of cells such as oral epithelial cells have not been characterized.
(VI) Cystatin Phylogeny and Function
With the possible exception of growth factor activity, none of the potential functions outlined above is unique to a particular branch of the cystatin superfamily: Indeed, similar characteristics can be traced through a phylogeny covering around 1 billion years of evolution. This suggests that the functions mediated by the SD-type cystatins are not due to some peculiarity of their structure, but may represent an application or modification of ancient systemic protective and defensive functions to benefit the oral cavity (and other tissues) of terrestrial animals. Phytocystatins are probably the closest living representatives of the ancestral cystatin. Although relatively simple, they are multifunctional proteins. They have been shown to be antiviral (see above) and to suppress the growth of nematode and insect pests, presumably by inhibiting gut enzymes that the pests require to digest their food (e.g., Koiwa et al., 2000). In plants, they are induced by fungal infection, wounding, and environmental stresses, consistent with a general protective role (Pernas et al., 2000). They also have a role in plant development (Abe et al., 1992). These functions were probably also mediated by the earliest animal cystatins, since they are seen in vertebrate cystatins (see above). The Tachypleus type 2 cystatin is localized in the hemocytes, specialized cells in the hemolymph of invertebrates that are involved in innate immunity and release stored antimicrobial substances that kill infecting pathogens (reviewed in Iwanaga et al., 1998). Thus, at least one initial function of type 2 cystatins in animals appears to have been protection of the host, perhaps through both direct action on pathogens and inhibition of inappropriate proteolysis. The Tachypleus cystatin has much more powerful antibacterial activity than proteins in the vertebrate cystatin C-SD lineage, perhaps reflecting specialization of this function in the vertebrate cathelicidin branch of the superfamily.
Inhibition of P. gingivalis might be a unique property of proteins in the cystatin C-SD branch. Clearly, SD-type cystatins do not prevent periodontal disease, but they could help control the population of this micro-organism in the oral cavity. Good antiviral activity has been preserved in the cystatin C-SD branch. Given the activity of cystatin D against coronavirus, inhibition of specific viruses (although probably not HSV-1) is a good candidate function for SD-type cystatins. Expression of cystatin E/M and rat cystatin S at the skin surface further argues for considering antipathogen functions.
Vertebrate cystatin C has evolved as a powerful broad-spectrum inhibitor of host (and exogenous) CPs. Several lines of evidence (see above) are consistent with a central role for cystatin C in regulating lysosomal cathepsins released under normal and pathological conditions. This breadth of inhibitory activity has involved some trade-offs: Amino acid substitutions could increase the affinity for certain enzymes, but at the cost of reducing the affinity for others (Mason et al., 1998). When the inhibitory activities of the SD-type cystatins (see Table) are compared with their phylogenetic origins (see Fig. 4), a clear trend is apparent: Successive generations of salivary cystatins are progressively less active against the host lysosomal cathepsins B, H, and L. Cystatin S, which evolved most recently, is essentially inactive against these enzymes under physiological conditions. Differences among cystatins in the three evolutionarily conserved regions reflect a functionally related tuning of the inhibitory spectrum during evolution. Thus, given the interplay between minor changes in cystatin structure and inhibitory profile (see above), the evolutionary trend for the SD-type cystatins appears to have been one of considerable specialization, apparently away from a capacity for general regulation of endogenous lysosomal cathepsins. A common theme in the evolutionary history of the type 2 cystatins is targeted expression in certain secretory epithelial populations. Even the ubiquitously expressed cystatin C is expressed at high levels in the choroid plexus epithelium, as is cystatin E/M in breast luminal epithelial cells and eccrine sweat ducts. Both the SD-type cystatins and the unusual rodent CRP genes are expressed in acinar cells of various glands (see above). A general control of proteolysis at the surfaces bathed by these epithelial secretions is undoubtedly important for the integrity of other proteins in the secretion, and of the tissues themselves.
However, it seems unlikely that selection would produce inhibitors that are less suited to the task of general inhibition of endogenous CPs than the ancestral gene product, since this would require the production of large amounts of secreted protein to compensate for the reduced inhibitory activity. This argument does not preclude the possibility that SD-type cystatins have evolved to inhibit a very narrow, specialized subset of host CPs, most of which remain to be tested. Phylogenetically based structure-function studies may provide important clues to target enzymes. For example, of the host cathepsins B, H, L, and S, cystatin D most effectively inhibits cathepsin S (Table). However, studies of the contribution of the N-terminal region of cystatin D to cathepsin binding (see above) demonstrated that the presence of Ala10 in cystatin D, in place of Val10 in cystatin C, reduces the affinity for cathepsin S (as well as for cathepsin L). This suggests two possibilities. Either cathepsin S is not a normal target for cystatin D (and during evolution, this change was either unimportant or important for achieving better inhibition of a CP with a shallow S2 subpocket), or cathepsin S is a normal target, and it was important to achieve better discrimination against cathepsin L. A remarkably simple hypothesis, consistent with their specialized properties, is that the high-level, constitutively expressed SD-type cystatins evolved in primates to protect mucosae and their secretions from dietary and environmental CPs, while being unable to interfere to any great extent with the activity of endogenous CPs.
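The argument that weaker inhibitors would have to be secreted in larger amounts can be made quantitative with a simple binding model. Assuming reversible 1:1 binding, inhibitor in large excess over enzyme, and no competing substrate, the fraction of enzyme inhibited is f = [I]/([I] + Ki), so the free inhibitor concentration needed for a given f is [I] = Ki·f/(1 - f), which scales linearly with Ki. The Python sketch below uses a hypothetical function name and hypothetical Ki values (not taken from the Table) to show that a 100-fold weaker inhibitor needs 100-fold more free inhibitor for the same degree of inhibition.

```python
def inhibitor_needed(ki_nm: float, fraction_inhibited: float) -> float:
    """Free inhibitor concentration (nM) giving the requested fractional inhibition.

    Simple model (an assumption): reversible 1:1 binding, inhibitor in large
    excess over enzyme, no competing substrate. Then
        f = [I] / ([I] + Ki)   =>   [I] = Ki * f / (1 - f)
    """
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    if not 0.0 <= fraction_inhibited < 1.0:
        raise ValueError("fraction_inhibited must be in [0, 1)")
    return ki_nm * fraction_inhibited / (1.0 - fraction_inhibited)


if __name__ == "__main__":
    # Hypothetical Ki values, chosen only to illustrate a 100-fold difference.
    strong, weak = 0.01, 1.0  # nM
    for f in (0.5, 0.9, 0.99):
        need_strong = inhibitor_needed(strong, f)
        need_weak = inhibitor_needed(weak, f)
        print(f"{f:.0%} inhibition: {need_strong:.3g} nM vs {need_weak:.3g} nM "
              f"({need_weak / need_strong:.0f}-fold more)")
```

Real cystatin-cathepsin interactions are often tight-binding, and inhibition in vivo also depends on enzyme and substrate concentrations, so this linear scaling is an idealization rather than a prediction.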
The vertebrate type 2 cystatins have maintained a role in innate immunity, but they have also taken on new roles in acquired immunity since it evolved in vertebrates more than 400 million years ago. The SD-type cystatins could have a role in immunomodulation at particular mucosal surfaces. The role of cystatin C in the response to brain injury suggests a defensive function reminiscent of that seen in plant cystatins. Induction of rat cystatin S by a variety of potentially injurious agents in both the SMG and the kidney strongly suggests that this protein has a role in the response to injury. Aside from inhibition of a specific CP, this role could reflect growth factor activity. The expression pattern of the CRPs, which lack CP-inhibitory activity, highlights the need to consider other roles for cystatins, such as acting as immunomodulators or growth factors. A growth factor role would also be consistent with the peak in expression of rat cystatin S and human SD-type cystatins during post-natal SMG development.
(VII) Animal Models for Salivary Cystatin Function
Plaque-resistant and plaque-susceptible populations of the Wistar-Kyoto strain of rats have been bred. When fed commercial chow, the susceptible animals develop heavy plaque, gingivitis, and periodontal pockets, whereas the resistant animals do not (see Abe et al., 1998, and references therein). Significantly, by 12 weeks of age, plaque-resistant rats have dramatically higher levels of cystatin S than do plaque-susceptible rats, which have only trace levels (Abe et al., 1998). However, after IPR induction, both groups have comparably high levels. As outlined above, the data are consistent with a role for human salivary cystatins in control of the oral microflora and in biomineralization. Therefore, as noted by the authors, cystatin S in rats might have a role in plaque resistance. It would be of interest to know the pattern of salivary cystatin expression in wild rat populations. It might be anticipated that a salivary protein that appears to confer a significant selective advantage would be constitutively expressed. Why, then, is its expression subject to regulation through β-adrenergic receptors? It is possible that rat cystatin S levels simply act as a marker for some other, genetically linked property in these animals that remains to be discovered. This rat model clearly warrants further exploration.
The human CST1 gene is expressed in the parotid gland of transgenic mice, and immunologically reactive protein is present at significant levels in their saliva (Dickinson and Thiesse, 1995, and unpublished observations). This opens up the possibility of exploring human salivary cystatin function in a genetically manipulable system. However, as noted above, little is known about salivary cystatins in mice, and such knowledge is a prerequisite for functional studies in transgenic animals. It would be advantageous to know the phylogenetic distribution of salivary cystatin genes, their expression patterns (i.e., constitutive or inducible), and whether the genes are orthologs or examples of convergent evolution. Such information could provide important clues to the function of cystatins in saliva.
(VIII) Summary and Conclusions
Extensive progress has been made in characterizing the SD-type cystatin genes, their expression in human tissues, and structure-function relationships in the encoded proteins. The exact relationship between the SD-type cystatins and rat salivary cystatin S remains to be determined. We still do not know what roles these proteins play in the oral cavity or elsewhere in the body, and many important questions remain unanswered. The functions ascribed to SD-type cystatins would seem to be required continuously in all species. However, to date, cystatins in saliva have been reported only in humans (where they are constitutive) and rats (where they are inducible from trace levels, at least in some strains). Are constitutively expressed salivary cystatins limited to primates? If so, how do other mammals maintain oral health in the absence of such apparently important proteins? The emphasis of functional studies thus far has been on oral relevance. However, their patterns of expression clearly indicate that the term ’salivary cystatin’ is something of a misnomer, since some of the genes are expressed at several other sites in the body. What do these proteins do at these sites? The SD-type cystatins are clearly secreted proteins, so most studies have assumed an extracellular function. However, the increasing evidence that type 2 cystatins can be taken up, at least by certain cells, and can enter the endosomal pathway raises several interesting possibilities. Do SD-type cystatins have a role in immunomodulation? Has the role of SD-type cystatins in the oral cavity been unknowingly examined in structure-function studies that used papain inhibition as a measure of activity? Are they there to protect us from what we eat? Finally, regardless of their function, the human SD-type cystatins and rat cystatin S provide important models for understanding gene regulation during the development of the salivary glands.

Protein sequence alignment of select cystatins. The aligned sequences extend from 2-3 residues N-terminal to the conserved glycine to the known C-terminus. The alignment was generated and formatted essentially as described (Dickinson, 2002). The gaps around the N39 residue involved in legumain inhibition were adjusted manually. Abbreviations used: SA, S, SN, D, C, E/M, and F: the corresponding human type 2 cystatins (GenBank accession numbers NP_001313, NP_001890, NP_001889, NP_001891, NP_000090, NP_001314, and NP_003641, respectively); Rat S, rat salivary cystatin S (P19313); chicken, chicken egg white cystatin (P01038); adder, puff adder (Bitis arietans) venom cystatin (P08935); crab, horseshoe crab Tachypleus tridentatus hemocyte cystatin (JC4536); C. elegans, Caenorhabditis elegans cystatin R01B10.1 (NP_504565); and rice, oryzacystatin I (P09229). A majority consensus sequence and the conserved disulfide bonds are shown below the alignment. The positions of conserved domains are indicated above the alignment (see text). Numbers above the alignment indicate residue numbers (with the conserved glycine numbered as position 11 [cystatin C numbering]). Secondary structure regions (based on chicken cystatin, see text) are indicated above. β-A to β-E denote the five β-strands. α-1 denotes α-helix 1; α-2/loop denotes the region that forms an α-helix in the crystal, but a loop in solution (see text).

Known and predicted structures of cystatins. The known structures of oryzacystatin (1EQK.pdb) (Panel A) and chicken egg white cystatin (1CEW.pdb) (Panel C), and the predicted structures of C. elegans R01B10.1 (Panel B) and human cystatin S (Panel D), are shown. The alignment in Fig. 1 was the basis for homology modeling by means of Deep View (SWISS-MODEL; Guex and Peitsch, 1997). The prominent α-helix 1 is positioned in the center of all four structures, with the α-2/loop region (absent in oryzacystatin) above it. The conserved G, QXVXG, and PW regions and the two disulfide bonds (SS1 is N-terminal, SS2 is C-terminal) are labeled on the chicken cystatin structure and shown as space-filling atoms in all structures where present. The legumain inhibition region N39 (Leg) is also indicated on chicken cystatin.

N-terminal extensions and cleavage sites. The N-terminal regions (excluding secretory leader peptide sequences) of the human type 2 cystatins are shown up to the conserved G11. Cleavage sites detected in saliva or in the purified protein are shown by an arrow (Al-Hashimi et al., 1988; Saitoh et al., 1988; Popovic et al., 1990; Baron et al., 1999a). + None reported.

Phylogenetic tree of selected vertebrate type 2 cystatins. The alignment shown in Fig. 1 was used to generate a tree by neighbor-joining by means of the PAUP 4.0b8a software package (Swofford, 2000). The scale shows the genetic distance along branch lengths. Bootstrap values were obtained from 100 replicates and are shown as percentages beside the branches. The large arrow indicates a possible position for the root of the tree, with proteins on the cystatin C branch toward the top. Abbreviations used: hCys, human cystatin; rat S, rat salivary cystatin S; and adder, puff adder (Bitis arietans) venom cystatin.
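Neighbor joining builds a tree from a matrix of pairwise distances by repeatedly joining the pair of nodes that minimizes the criterion Q(a, b) = (n - 2)·d(a, b) - r(a) - r(b), where n is the number of current nodes and r(a) is the sum of distances from a to all other current nodes. The Python sketch below illustrates the method on uncorrected p-distances; it is not PAUP's implementation, the distance measure used for the figure is not stated here, and bootstrap support (re-running the procedure on alignments resampled column by column with replacement) is omitted. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected distance: fraction of differing sites, skipping gap ('-') columns."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) by neighbor joining.

    names: taxon labels; dist: mapping of (a, b) label pairs to distances.
    """
    d = {frozenset(pair): float(v) for pair, v in dist.items()}
    nodes = list(names)

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # r(a): total distance from a to every other current node
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b);
        # ties go to the first pair in enumeration order
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to the new internal node u
        la = 0.5 * D(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        lb = D(a, b) - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from u to each remaining node
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (D(a, c) + D(b, c) - D(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Two nodes left: join them with the single remaining edge (tree is unrooted)
    a, b = nodes
    return f"({a},{b}:{D(a, b):g});"


if __name__ == "__main__":
    # Distances from a known tree ((A:1,B:2):3,(C:4,D:5)), so NJ should recover it.
    toy = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
           ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}
    print(neighbor_joining(["A", "B", "C", "D"], toy))
```

On this toy matrix the sketch returns ((A:1,B:2),(C:4,D:5):3);, recovering the input tree with the internal edge placed on one side, as expected for an unrooted result. Tie-breaking and the handling of negative branch lengths may differ from PAUP.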

Chromosomal organization of the type 2 cystatin and CRES genes at 20p11.2. Identified genes in the most recent version of the public domain human genome sequence map (http://www.ncbi.nlm.nih.gov/) are shown. Arrowheads denote the gene orientation, but are not to scale. The open box denotes a gap in the sequence. See text for details concerning CSTP1, CSTP2, CST1, 2, and 5.
Acknowledgements
The author thanks Dr. Anita Baron for providing additional information on S-like cystatins and their inhibitory properties, and Jason Rueggeberg for assistance with manuscript preparation. This work was supported by the Medical College of Georgia School of Dentistry, and previous grants from the NIDCR and NEI.
