Abstract
Furin is a proprotein convertase that proteolytically cleaves protein precursors to yield functional proteins. Efficient cleavage depends on the presence of a specific sequence motif on the substrate. Currently, the cleavage site motif is described as a four amino acid pattern: R-X-[K/R]-R⇓. However, not all furin cleavage recognition sites can be described by this pattern and not all R-X-[K/R]-R⇓ sites are cleaved by furin. Since many furin substrates are involved in the pathogenesis of viral infection and human diseases, it is important to accurately characterize the furin cleavage site motif. In this study, the furin cleavage site motif was characterized using statistical analysis. The data were interpreted within the 3D crystal structure of the furin catalytic domain. The results indicate that the furin cleavage site motif is comprised of about 20 residues, P14-P6′. Specific physical properties such as volume, charge, and hydrophilicity are required at specific positions. The furin cleavage site motif is divided into two parts: 1) one core region (8 amino acids, positions P6-P2′) packed inside the furin binding pocket; 2) two polar regions (8 amino acids, positions P7–P14; and 4 amino acids, positions P3′-P6′) located outside the furin binding pocket. The physical properties of the core region contribute to the binding strength of the furin substrate, while the polar regions provide a solvent accessible environment and facilitate the accessibility of the core region to the furin binding pocket. This furin cleavage site motif also revealed a dynamic relationship linking the evolution of physical properties in region P1′-P6′ of viral fusion peptides, furin cleavage efficacy, and viral infectivity.
Background
Many secreted proteins are synthesized as precursors that require proteolytic cleavage of part of the polypeptide to become fully functional. Furin, a proprotein convertase involved in the proteolytic cleavage of many of these protein precursors, cycles between the trans-Golgi network and the cell surface.1,2 In the secretory pathway, furin recognizes the cleavage motif on the protein precursors and cleaves the substrates, resulting in functional proteins.
Furin substrates include both host proteins and pathogen proteins. The known host substrates of furin cleavage include precursors of extracellular matrix proteins, extracellular proteases, receptors, and hormones. Many pathogenic proteins, such as fusion proteins on the viral envelope and secreted toxins of bacteria, require furin mediated cleavage to initiate pathological infectivity. The cleavage of these furin substrates influences the molecular pathogenesis of a wide range of human diseases, including cancer, neurological disorders, and various viral infections.1–4 Therefore, an accurate description of the furin cleavage site motif is crucial for understanding the molecular mechanism of furin cleavage-mediated viral infections and furin cleavage-associated human diseases.
Many furin cleavage sites share the sequence pattern R-X-[K/R]-R⇓. However, not all furin cleavage sites include this pattern, e.g. the cleavage sites of human albumin precursor RGVFRR24⇓DA (gi|4502027) 5 and human C-type natriuretic peptide precursor RLLR73⇓DL. 6 In addition, not all sites comprising R-X-[K/R]-R are efficiently cleaved by furin, e.g. a mutated form of Sindbis virus PE2 protein SGRSKR328⇓LV (position 328 of Sindbis virus polyprotein P130 containing PE2 protein). 7
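The limits of the four-residue pattern can be illustrated with a minimal sketch. The function name and the regular expression below are illustrative assumptions, not part of the original analysis; the three input strings are the residues upstream of the cleavage site (ending at P1) quoted in the examples above.

```python
import re

# R-X-[K/R]-R at positions P4-P3-P2-P1, anchored to the end of the
# upstream sequence because the cleavage site follows the last residue.
CANONICAL = re.compile(r"R.[KR]R$")

def matches_canonical(upstream: str) -> bool:
    """Return True if the residues ending at P1 match R-X-[K/R]-R."""
    return CANONICAL.search(upstream) is not None

# Upstream residues (ending at P1) taken from the examples in the text.
print(matches_canonical("RGVFRR"))  # False: albumin precursor, P4 is V, yet cleaved
print(matches_canonical("RLLR"))    # False: C-type natriuretic peptide precursor, P2 is L
print(matches_canonical("SGRSKR"))  # True: Sindbis PE2 mutant, yet not efficiently cleaved
```

A pattern match of this kind therefore produces both false negatives and false positives on the cited substrates.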
In this report, a statistical analysis of 130 known furin cleavage sites was performed, and the consequence of biochemical mutations of the furin cleavage site was examined. The statistical data were interpreted within the 3D crystal structure of the furin catalytic domain solved by Henrich et al. 8 A sequence motif of about 20 amino acids coding for a furin cleavage recognition site was revealed. The motif is divided into two regions: a core region (8 amino acids, P6–P2′) and two polar regions (8 amino acids, positions P7–P14; and 4 amino acids, positions P3′–P6′). In addition, specific physical properties are required at specific positions. Characterization of the furin cleavage site motif suggested a relationship between furin cleavage, the evolution of the physical properties in region P1′-P6′ of viral fusion peptides, and the infectivity of dozens of viruses.
Materials and Methods
Collection of known furin cleavage sites
The furin cleavage sites were collected from both the published literature via PubMed and the Swissprot database annotation. 9 Only published and experimentally verified furin cleavage sites with the positions of the cleavage sites indicated were included in the statistical analysis. In total, 130 published and experimentally verified furin cleavage sites were used. 10 The table of substrates, their taxa, and the taxa of the furins responsible for cleavage is publicly available on the associated website www.nuolan.net.
Statistical analysis using physical properties
Furin is an extracellular enzyme and furin cleavage generally takes place after the secreted signal peptides of a protein sequence are cleaved off. Therefore, secreted signal peptides in the original proprotein sequences were substituted with gap symbols.
The nseq = 130 furin cleavage sites were aligned.
Physical properties of amino acids ν were retrieved from the AA index database (http://www.genome.jp/aaindex). 11
An approach similar to the procedure described by Georg Neuberger 12 was used to calculate the deviations devfurin(i) and correlation coefficients Rfurin(i) of physical properties ν at a specific position i.
To reduce the sequence redundancy, a sequence weight Wk was assigned to each furin cleavage site by the sum of mismatch method. 13
devfurin(i): deviation of physical property ν from the UNIREF database. 14
νfurin(i): mean value of physical property ν of the collected known furin cleavage sites at a specific position i.
νUniRef and δUniRef: mean value and standard deviation of physical property ν in the UNIREF database. 14
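The definitions above are consistent with a standardized deviation of the following form. This is a hedged reconstruction from the surrounding definitions and from the Figure 1 description (deviation from the UNIREF average in percent of the UNIREF standard deviation). The weighted-mean form of νfurin(i) and the symbol a_k(i), denoting the residue of site k at position i, are assumptions rather than the author's stated formulas.

```latex
\nu_{\mathrm{furin}}(i) = \frac{\sum_{k=1}^{n_{\mathrm{seq}}} W_k \, \nu_{a_k(i)}}{\sum_{k=1}^{n_{\mathrm{seq}}} W_k},
\qquad
\mathrm{dev}_{\mathrm{furin}}(i) = \frac{\nu_{\mathrm{furin}}(i) - \nu_{\mathrm{UniRef}}}{\delta_{\mathrm{UniRef}}} \times 100\%
```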
To analyze a physical property ν at a specific position i, the correlation coefficient Rfurin(i) between two variables was calculated:
the vector of the 20 observed amino acid counts cfurin(a, i) and
the vector ν of the 20 amino acid physical property values νa
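The correlation step can be sketched as a Pearson correlation between the two 20-element vectors. This is a minimal sketch under stated assumptions: the use of the Pearson coefficient, the use of weighted counts, and all function names are assumptions, and the toy property scale and alignment column in the example are hypothetical, not values from the AA index database or from the 130 collected sites.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def position_correlation(column, weights, prop):
    """Correlate weighted residue counts at one position with a property scale.

    column  : residues observed at position i, one per cleavage site
    weights : sequence weights W_k, one per cleavage site (assumed usage)
    prop    : dict mapping each of the 20 amino acids to its property value
    """
    counts = {aa: 0.0 for aa in AMINO_ACIDS}
    for residue, weight in zip(column, weights):
        if residue in counts:  # gap symbols and unknown residues are skipped
            counts[residue] += weight
    c = [counts[aa] for aa in AMINO_ACIDS]
    v = [prop[aa] for aa in AMINO_ACIDS]
    return pearson(c, v)

# Hypothetical toy scale: 1 for lysine and arginine, 0 otherwise.
toy_positive = {aa: (1.0 if aa in "KR" else 0.0) for aa in AMINO_ACIDS}
# Hypothetical alignment column with equal weights.
r = position_correlation("RRKRKRRA", [1.0] * 8, toy_positive)
print(round(r, 2))
```

A coefficient near 1 indicates that the residues observed at the position are those with high values of the property, which is how the coefficients in Table 1 are read in the sections that follow.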
Correlation coefficients were interpreted within the 3D structure of the dec-RVKR-cmk inhibited mouse furin binding pocket (Protein Data Bank accession code 1P8J). 8 To model the 3D structure and analyze the active site, the procedures described by Guex et al. on the Swiss-Pdb Viewer site http://www.expasy.org/spdbv/ were followed. 15
Results and Discussion
Length of the furin cleavage recognition site motif
Many furin cleavage site motifs share a short four-amino-acid pattern, R-X-[K/R]-R⇓, which is located inside the binding pocket of furin. However, one plausible structural model of the furin cleavage site motif comprises not only a short core region located inside the furin binding pocket, but also two extended polar regions located outside the binding pocket that facilitate solvent accessibility. If this hypothesis is correct, consecutive polar and flexible regions encompassing the furin cleavage site are expected. To test this hypothesis, the mean values of two representative physical properties, EISD840101, a consensus normalized hydrophobicity scale of amino acids, 16 and VINM940101, normalized flexibility parameters of amino acids, 17 of 130 known furin cleavage sites were compared with the background value of the UNIREF database. 14
As shown in Figure 1, the continuous deviations demonstrate the existence of flexible and polar regions on both sides of the furin cleavage site. By setting the threshold to 10% of the sum of absolute values of the two deviations, the plot identified two consensus boundary positions, P14 and P6′, although the exact boundaries of the furin cleavage site motif might vary from substrate to substrate. Therefore, the furin cleavage recognition site comprises about 20 amino acids, from position P14 to position P6′.

Figure 1. Motif length and boundaries (20 amino acids, P14-P6′) of the furin cleavage site. The graph illustrates the deviations of mean values of the hydrophobicity scales EISD840101 16 and the flexibility parameter VINM940104 17 over 51 positions (P26-P25′), encompassing the 130 known furin cleavage sites. The furin cleavage site is indicated as a vertical red line. The UNIREF average corresponds to 0% deviation and is indicated as a horizontal black baseline. The mean values of the physical properties were calculated as the deviation from the UNIREF average in percent of the UNIREF standard deviation. The plot was smoothed by running the average over three positions. Continuous deviations were observed on both sides of the furin cleavage sites. By setting the threshold to 10% of the sum of absolute values of the two deviations, the plot identified two consensus boundary positions: P14 and P6′ (shown in purple), although the exact boundaries of the furin cleavage site motif might vary from substrate to substrate.
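The two processing steps named in the caption, expressing a mean as a percentage of the UNIREF standard deviation and smoothing over three positions, can be sketched as follows. This is a minimal sketch: the function names, the centred window, and the treatment of the first and last positions are assumptions, and the input numbers are hypothetical rather than values from Figure 1. The boundary threshold is not sketched because its exact definition is not given.

```python
def deviation_percent(position_mean, uniref_mean, uniref_sd):
    """Deviation from the UNIREF mean, in percent of the UNIREF standard deviation."""
    return 100.0 * (position_mean - uniref_mean) / uniref_sd

def running_average(values, window=3):
    """Centred running average; edge positions average over the neighbours that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Hypothetical per-position deviations (in percent) around a cleavage site.
raw = [0.0, 30.0, 60.0, 30.0, 0.0]
print(running_average(raw))  # [15.0, 30.0, 40.0, 30.0, 15.0]
```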
Physical properties of the core region
Position P1
In the data set collected from the published literature, all 130 furin cleavage sites have a positively charged arginine residue at position P1. A mutation of this arginine at position P1 diminished the detectable furin cleavage,18–21 indicating that arginine is required at position P1. In fact, mutations of arginine at position P1 are routinely used as negative controls in biochemical experiments that test furin cleavage.
Positions P2, P3, and P4
Positively charged residues arginine and lysine were the residues most often found at position P2. A high correlation coefficient (0.99) for the positive charge property ZVEL_CH_P1 22 was observed at this position (Table 1, R2). As shown in Figure 2, a positively charged residue at position P2 extends its side chain to form favourable interactions with negatively charged aspartic acids 153, 154, and 191 in the furin binding pocket.
Table 1. List of selected correlation coefficients for physical properties. (The selected physical properties provide the best representation and most concordant results. The complete list is available on the associated website www.nuolan.net.)

Figure 2. Interactions between the furin binding pocket and substrate residues at positions P1–P4. At position P1, a positively charged arginine residue forms specific interactions with a densely negatively charged ring formed by glutamic acid 257, aspartic acid 258, aspartic acid 301, aspartic acid 306 and glutamic acid 331 of the furin binding pocket. At position P2, a positively charged lysine residue tends to form interactions with negatively charged aspartic acid 153, aspartic acid 154 or aspartic acid 191 of the furin binding pocket. At position P3, the side chain of valine is oriented outwards and it does not appear to form any specific interaction. At position P4, a positively charged arginine tends to form specific interactions with glutamic acid 236 and aspartic acid 264 of the furin binding pocket. This structure was modeled based on the published 3D structure of the furin catalytic domain in complex with an inhibitor (Protein Data Bank accession code 1P8J). 8 The figure was generated with Swiss-Pdb Viewer. 47
Interestingly, ~20% of the substrates possess residues other than arginine or lysine at position P2, notably small residues such as glycine, alanine, serine, and threonine. As indicated in Figure 2, the positively charged arginine at position P1 interacts with the negatively charged ring formed by glutamic acids 257 and 331 and aspartic acids 258, 301, and 306, while the arginine at position P4 interacts with glutamic acid 236 and aspartic acid 264. A small, flexible residue at position P2 might facilitate the furin cleavage process by allowing residues at positions P1 and P4 to form those specific interactions with the furin binding pocket. Indeed, furin substrate specificity assays revealed that motifs like KRTTR⇓ were sometimes more efficiently cleaved than some of the canonical R-X-[K/R]-R⇓ motifs. 23 Conversely, bulky and rigid aromatic residues like phenylalanine, histidine, tyrosine, and tryptophan might not be accepted at position P2 because they may impose local structural constraints. A mutation study of human albumin precursor (gi|4502027) reported that furin cleavage efficiency was greatly reduced when the arginine at position P2 was replaced with histidine, a positively charged residue with a rigid ring structure on its side chain. 5
At position P3, a preference for a positive charge was observed, but the correlation coefficient for the positive charge property ZVEL_CH_P1 22 was much weaker (0.69; Table 1, R4). This is consistent with the structural environment surrounding position P3. As Figure 2 shows, the side chain of the residue at position P3 orients outward and does not appear to form a specific interaction with the furin binding pocket.
Positions P4, P5 and P6
Approximately 90% of the 130 collected furin cleavage sites have a positively charged residue at position P4. Most of these residues are arginine; lysine and histidine are rare. As Figure 3 shows, the positively charged arginine is preferred at this position because it may interact with glutamic acid 236, aspartic acid 264, and tyrosine 308 of the furin binding pocket.

Figure 3. The structural environment explains the compensatory effect for positive charge over positions P4, P5, and P6. At position P4, the positively charged residue arginine appears to interact with side chains of glutamic acid 236, aspartic acid 264, and tyrosine 308 of the furin binding pocket. If position P4 is occupied by an aliphatic residue instead of a positively charged residue, the loss of interaction between positive charge and negative charge at position P4 can be compensated for by the gain of interaction between positive charge and negative charge at position P5 or P6, where side chains of the positively charged residues might interact with negatively charged glutamic acid 230 and glutamic acid 257. This structure was modeled based on the published 3D structure of the furin catalytic domain in complex with an inhibitor (Protein Data Bank accession code 1P8J). 8 The figure was generated with Swiss-Pdb Viewer. 47
Aliphatic residues such as valine, leucine, and isoleucine are also accepted at position P4, representing about 8% of the substrates. In these cases, the absence of a positive charge at substrate position P4 was observed to be compensated for by the presence of a positive charge at substrate position P5 or P6; known examples are listed in Table 2. The 3D structure of the furin binding pocket suggests a possible structural mechanism for this compensatory effect. The absence of a positively charged residue at position P4 results in the loss of favourable interactions with negatively charged glutamic acid 236 and aspartic acid 264 of the furin binding pocket. However, a positively charged residue at position P5 or P6 may interact with the negatively charged glutamic acid 230 or glutamic acid 257 of the furin binding pocket. As a consequence, the gain of an interaction between a positive charge and a negative charge at substrate position P5 or P6 compensates for the loss of the interaction between a positive charge and a negative charge at substrate position P4. In fact, 129 of the 130 sites contain at least one positively charged residue at position P4 or P6. The exception is one of the furin cleavage sites of human vitamin K-dependent protein C precursor (gi|131067), 24 which has a positively charged arginine at position P5. Therefore, the absence of a positively charged residue at position P4 can be compensated for by the presence of a positively charged residue at position P6 or, to a lesser extent, at position P5.
Table 2. List of known furin cleavage sites with an aliphatic residue at position P4 and a positively charged residue at position P5 or P6.
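The compensatory rule for positions P4–P6 can be written as a short check. This is a minimal sketch: the function name and return labels are illustrative assumptions, and arginine, lysine, and histidine are treated as positively charged, following the residues named for position P4 above. The order of the tests reflects the observation that compensation occurs at P6 or, to a lesser extent, at P5.

```python
POSITIVE = set("RKH")  # residues treated as positively charged in this sketch

def p4_charge_rule(upstream: str) -> str:
    """Classify the six residues P6..P1 (a string ending at P1)."""
    if len(upstream) < 6:
        raise ValueError("need at least six residues ending at P1")
    p6, p5, p4 = upstream[-6], upstream[-5], upstream[-4]
    if p4 in POSITIVE:
        return "positive charge at P4"
    if p6 in POSITIVE:
        return "compensated at P6"
    if p5 in POSITIVE:
        return "compensated at P5"
    return "no positive charge at P4-P6"

# Human albumin precursor RGVFRR: valine at P4, arginine at P6.
print(p4_charge_rule("RGVFRR"))  # compensated at P6
```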
Region P1–P6
The structural environment of the furin catalytic domain that surrounds region P1–P6 of the substrate is fairly negatively charged (Fig. 2 and Fig. 3), 8 and consequently, a general trend toward basicity is favoured in this region. The significant correlation coefficients of physical properties, such as positive charge ZVEL_CH_P1 22 and isoelectric point ZIMJ680104, 25 support this supposition (Table 1, R1–R8). Therefore, the presence of positively charged residues at positions P2–P6 should increase furin cleavage efficiency.
The effect of cysteine in region P1–P6
No cysteine was found in region P2–P6 in the known furin cleavage sites, although other small hydrophilic residues, such as serine and threonine, and even small hydrophobic residues, such as alanine, glycine, and proline, were frequently observed. Cysteine is a fairly acidic residue. Its isoelectric point is 5.02, the third lowest of all amino acids. Therefore, it is possible that the fairly negatively charged environment of the furin binding pocket does not favour the presence of cysteine in region P2–P6. Another plausible explanation could be that furin is an extracellular enzyme and most of its substrates are located in the extracellular space, where the -SH groups of cysteines have a tendency to form disulfide bonds. Introducing cysteine might entirely alter the local structure of substrates in region P1–P6 and reduce furin cleavage. In fact, a mutation study of proalbumin reported that a mutation from lysine to cysteine at position P2 greatly diminished the cleavage of proalbumin. 5 The effect of introducing cysteine into region P2–P6 was also demonstrated in the case of a single nucleotide polymorphism of isoform II of Ectodysplasin-A protein (EDA). Mutated EDA with cysteine present at position P2 or position P5 was not efficiently cleaved by furin, and this mutation was found in patients with X-linked hypohidrotic ectodermal dysplasia. Loss of furin cleavage sites due to the presence of cysteine was thought to be one of the causes of this disease. 26
Position P1′–P2′
At position P1′, a preference for small hydrophilic residues was observed. A high correlation coefficient (0.73) for the tiny residues property, ZVEL_TINY_, 22 was detected (Table 1, R14). As illustrated in Figure 4, residues at position P1′ are located in a small hydrophilic pocket formed by serine 253, asparagine 295, tyrosine 329, threonine 365, threonine 367, and serine 368 of the furin binding pocket. This small pocket can limit the volume of the residue at position P1′ and the hydrophilicity explains the preference for hydrophilic residues at this substrate position. No furin cleavage site was observed with the bulky, hydrophobic residues leucine, isoleucine, valine, or tryptophan at position P1′. Moreover, furin cleavage efficiency was greatly reduced when the aspartic acid at position P1′ of the furin cleavage site RGVFRR24⇓DA of human albumin precursor was mutated to valine (RGVFRR24⇓VA). 5

Figure 4. The structural environment explains the preference for small hydrophilic residues at position P1′. At position P1′, a serine residue of the substrate (the colour of the hydrophilic serine was changed from yellow to green for visualization purposes) is packed tightly in a small hydrophilic pocket. This structural environment should generally favor a small hydrophilic residue. The presence of positively charged histidine 194 and histidine 364 suggests that negatively charged residues are also accepted at position P1′. This structure was modeled based on the published 3D structure of the furin catalytic domain in complex with an inhibitor (Protein Data Bank accession code 1P8J). 8 The figure was generated with Swiss-Pdb Viewer. 47
At position P1′, a negatively charged residue is also accepted, and about 22% of the furin cleavage sites have large, negatively charged aspartic acid or glutamic acid. As Figure 4 shows, a negatively charged residue at position P1′ may stabilize itself by extending its side chain towards positively charged histidine 194 or histidine 364.
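The observations for position P1′ can be summarized in a small lookup. This is a minimal sketch that encodes only the residue classes explicitly named above; the function name and the return labels are illustrative assumptions, and residues outside the two named classes are deliberately left unclassified.

```python
NOT_OBSERVED = set("LIVW")  # bulky hydrophobic residues absent at P1' in the 130 sites
NEGATIVE = set("DE")        # aspartic acid and glutamic acid, about 22% of the sites

def classify_p1_prime(residue: str) -> str:
    """Report what the text states about a residue at position P1'."""
    if residue in NOT_OBSERVED:
        return "bulky hydrophobic: not observed at P1'"
    if residue in NEGATIVE:
        return "negatively charged: accepted at P1'"
    return "not classified by this sketch"

# Human albumin precursor: wild-type aspartic acid versus the valine mutant.
print(classify_p1_prime("D"))
print(classify_p1_prime("V"))
```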
At position P2′, hydrophobic residues, particularly the aliphatic residue valine, appear to be favoured. A significant correlation coefficient (0.8) for aliphatic residues, ZVEL_ALI_2, 22 was observed (Table 1, R15). The structural environment of position P2′ is less clear, but the strong preference for valine suggests that the residue is located in a hydrophobic pocket.
Physical properties of the two flexible polar regions: P7–P14 and P3′–P6′
Exit from the core region and solvent accessibility: P7–P10 and P3′-P6′
On both sides of the furin cleavage site, P7–P10 and P3′–P6′, changes in preferred physical properties were observed. Residues in these two regions share a tendency to be polar and flexible, which is supported by the correlation coefficients for many polarity and flexibility related physical properties: HUTJ700101 (Heat capacity), 27 VASM830102 (Relative population of conformational state C), 28 ISOY800104 (Normalized relative frequency of bend R), 29 WERD780101 (Propensity to be buried inside), 30 VINM940103 (Normalized flexibility parameters), 17 KARP850101 (Flexibility parameter for no rigid neighbours), 17 and ROSG850101 (Mean area buried on transfer) 31 (Table 1, R9–R13 and R16–R20). However, neither single residue type preference nor concordantly significant correlation coefficients for charge or volume were detected. The tendency toward polarity and flexibility of P7–P10 and P3′–P6′ suggests that residues in these two regions expose their side chains to the solvent and may form linkers. These two polar linkers do not seem to interact directly with the furin binding pocket; rather, it is plausible that they extend their side chains to the solvent and form weak interactions with the polar surface of furin. Thus, the solvent accessibility of these two regions (P7–P10 and P3′–P6′) facilitates binding of the core region (P6–P2′).
Taken together, the data indicate that:
the exit substrate positions from the furin binding pocket are position P7 and position P3′.
regions P7–P10 and P3′–P6′ are located outside the furin binding pocket where they form two flexible, solvent accessible linkers.
Region P11–P14
In general, there were no significant preferences for residue type or for physical properties at single positions in region P11–P14. Nonetheless, Figure 1 illustrates that deviations of hydrophobicity and flexibility do exist and that region P11–P14 is a solvent accessible transition region.
Glycosylation of substrate residues at position P1′ may interfere with viral infectivity
The structural analysis revealed a relationship between glycosylation at position P1′ and viral infectivity. There is a limitation on residue volume at position P1′ (see section Position P1′-P2′), where small, hydrophilic residues favour furin cleavage. Surprisingly, a mutation in the PE2 protein sequence of the Sindbis virus that substituted asparagine for the wild-type serine at position P1′ resulted in defective furin cleavage. 7 According to the structural model and statistical analysis, a small, hydrophilic residue like asparagine should be acceptable at position P1′. However, N-linked glycosylation on the amide nitrogen of the asparagine side chain would increase the volume of the asparagine to an extent that hinders the substrate from fitting into the narrow furin binding pocket. Although glycosylated asparagine at position P1′ greatly reduces furin cleavage, non-glycosylated asparagine should be accepted. Indeed, non-glycosylated asparagine has been found in the final cleaved product of the PE2 protein. 32
Glycosylation is a frequently observed feature of viral proteins. Both N-linked glycosylation on asparagine and O-linked glycosylation on serine or threonine can dramatically increase residue volume. Therefore, the presence of asparagine, serine, or threonine at substrate position P1′ deserves particular attention in the analysis of furin cleavage-mediated viral infection. In addition, the observation that glycosylation interferes with furin cleavage emphasizes the importance of studying the physical properties and the structural environment of a small functional motif rather than only relying on analyzing the sequence motif itself with pattern match based approaches.
Hydrophobicity of region P3′–P6′ in viral substrates
Many virus-host fusion related viral envelope proteins require furin-mediated cleavage before they can initiate viral fusion. Surprisingly, many viral substrates include a stretch of hydrophobic amino acids instead of hydrophilic residues in region P3′–P6′. Known examples include hemagglutinin precursor of influenza A virus,33–36 fusion glycoprotein F0 precursor of Newcastle disease virus, 37 fusion glycoprotein F0 precursor of human parainfluenza 3 virus, 38 fusion glycoprotein F0 precursor of human respiratory syncytial virus, 39 fusion glycoprotein F0 precursor of measles virus strain Edmonston, 40 envelope polyprotein precursor gp72 of bovine leukemia virus, 41 envelope surface glycoprotein gp160 precursor of human immunodeficiency virus 1, 42 envelope polyprotein precursor of human T-lymphotropic virus 2, 43 and envelope polyprotein precursor gp140 of human immunodeficiency virus type 2 (isolate ST). 44 The average hydrophobicity scale of region P3′–P6′ of those viral substrates is 0.546, while the average hydrophobicity scale of region P3′–P6′ of all 130 collected furin cleavage substrates is −0.183 (the hydrophobicity scale is calculated with physical property CIDH920105 normalized average hydrophobicity scale of amino acids). 45
In theory, hydrophilic residues should be favoured in region P3′–P6′ in order to provide a solvent accessible environment (see section P7–P10 and P3′–P6′). Therefore, the increased hydrophobicity of region P3′–P6′ of the viral substrates might decrease the furin cleavage efficiency, thus decreasing viral infectivity. On the other hand, the hydrophobicity of this region is essential to the cellular function of the final cleaved products, which are responsible for the fusion of the viral envelope with the host's cytoplasmic membrane. 46 The virus solves this dilemma by including small, hydrophobic amino acids such as glycine, alanine, and proline in region P3′–P6′. Consequently, viral substrates maintain sufficient hydrophobicity for the viral fusion process without compromising furin cleavage efficiency. In the rapid evolution of viruses, a subtle balance between the infectivity of viral particles and the furin cleavage efficiency of these viral fusion peptides is achieved by fine-tuning the hydrophobic amino acid composition of region P3′–P6′ (Fig. 5).

Figure 5. The hydrophobicity of region P3′–P6′ is fine-tuned by the virus and it affects the efficiency of furin cleavage and the efficiency of viral fusion.
Conclusion
This study demonstrated that the furin cleavage site motif is comprised of about 20 residues (P14–P6′). Specific physical properties are required at specific positions and regions of this motif. The furin cleavage site motif is divided into two parts: 1) a core region (8 amino acids, position P6–P2′) inside the furin binding pocket that contributes to the binding strength of the substrate and 2) two polar regions (8 amino acids, positions P7–P14; and 4 amino acids, positions P3′–P6′) outside the furin binding pocket that mainly contribute to the solvent accessibility of the substrate. The motif length and physical properties of the furin cleavage recognition site are summarized in Figure 6.

Figure 6. Summary of motif length and physical properties of the furin cleavage site.
Footnotes
Acknowledgement and Disclosure
This work is an extension and revision of the author's PhD thesis. This work was self-financed and completed in the author's own time. The author received his PhD degree from the Institute for Genomics and Bioinformatics at Graz University of Technology in June 2007. The author's PhD study was funded by the GENAU Bioinformatics Integration Network PhD programme 2005–2007 (PhD programme and grant leader: Professor Zlatko Trajanoski at TUGraz). The author presented his PhD study at the GENAU BIN PhD students meeting (March 2007) and summarized his results in his PhD thesis (June 2007) at the Institute for Genomics and Bioinformatics at Graz University of Technology. Sun Tian thanks Professor Zlatko Trajanoski.
Conflict of Interests
The author had a full-time job after PhD graduation in June 2007. This work was self-financed and completed in the author's own time. The author declares that no conflict of interest exists.
