Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. _Bioinformatics_ 31 (11), 1857-1859.BibTeX entry: ```bibtex @article{Xiao2015, author = {Xiao, Nan and Cao, Dong-Sheng and Zhu, Min-Feng and Xu, Qing-Song.}, title = {protr/{ProtrWeb}: {R} package and web server for generating various numerical representation schemes of protein sequences}, journal = {Bioinformatics}, year = {2015}, volume = {31}, number = {11}, pages = {1857--1859}, doi = {10.1093/bioinformatics/btv042} } ``` ## An example predictive modeling workflow Here, we use the subcellular localization dataset of human proteins presented in @chou2008cell to demonstrate the basic usage of protr. The complete dataset includes 3,134 protein sequences (2,750 different proteins) classified into 14 human subcellular locations. We selected two classes of proteins as our benchmark dataset. Class 1 contains 325 _extracell_ proteins and class 2 includes 307 _mitochondrion_ proteins. We aim to build a random forest classification model to classify these two types of proteins. First, we load the protr package, then read the protein sequences stored in two separated FASTA files with `readFASTA()`: ```{r} library("protr") # Load FASTA files # Note that `system.file()` is for accessing example files # in the protr package. Replace it with your own file path. extracell <- readFASTA( system.file( "protseq/extracell.fasta", package = "protr" ) ) mitonchon <- readFASTA( system.file( "protseq/mitochondrion.fasta", package = "protr" ) ) ``` To read protein sequences stored in PDB format files, use `readPDB()` instead. The loaded sequences will be stored as two lists in R, and each component is a character string representing one protein sequence. In this case, there are 325 _extracell_ protein sequences and 306 _mitonchon_ protein sequences: ```{r, eval = FALSE} length(extracell) ``` ```{r, eval = FALSE} ## [1] 325 ``` ```{r, eval = FALSE} length(mitonchon) ``` ```{r, eval = FALSE} ## [1] 306 ``` To ensure that the protein sequences only have the 20 standard amino acid types, which is usually required for the descriptor computation, we use the `protcheck()` function to do the amino acid type sanity check and remove the _non-standard_ sequences: ```{r, eval = FALSE} extracell <- extracell[(sapply(extracell, protcheck))] mitonchon <- mitonchon[(sapply(mitonchon, protcheck))] ``` ```{r, eval = FALSE} length(extracell) ``` ```{r, eval = FALSE} ## [1] 323 ``` ```{r, eval = FALSE} length(mitonchon) ``` ```{r, eval = FALSE} ## [1] 304 ``` Two protein sequences were removed from each class. For the remaining sequences, we calculate the Type II PseAAC descriptor, i.e., the amphiphilic pseudo amino acid composition (APseAAC) descriptor [@chouapaac] and make class labels for classification modeling. ```{r, eval = FALSE} # Calculate APseAAC descriptors x1 <- t(sapply(extracell, extractAPAAC)) x2 <- t(sapply(mitonchon, extractAPAAC)) x <- rbind(x1, x2) # Make class labels labels <- as.factor(c(rep(0, length(extracell)), rep(1, length(mitonchon)))) ``` In protr, the functions of commonly used descriptors for protein sequences and proteochemometric (PCM) modeling descriptors are named after `extract...()`. Next, we will split the data into a 75% training set and a 25% test set. ```{r, eval = FALSE} set.seed(1001) # Split training and test set tr.idx <- c( sample(1:nrow(x1), round(nrow(x1) * 0.75)), sample(nrow(x1) + 1:nrow(x2), round(nrow(x2) * 0.75)) ) te.idx <- setdiff(1:nrow(x), tr.idx) x.tr <- x[tr.idx, ] x.te <- x[te.idx, ] y.tr <- labels[tr.idx] y.te <- labels[te.idx] ``` We will train a random forest classification model on the training set with 5-fold cross-validation, using the `randomForest` package. ```{r, eval = FALSE} library("randomForest") rf.fit <- randomForest(x.tr, y.tr, cv.fold = 5) rf.fit ``` The training result is: ```{r, eval = FALSE} ## Call: ## randomForest(x = x.tr, y = y.tr, cv.fold = 5) ## Type of random forest: classification ## Number of trees: 500 ## No. of variables tried at each split: 8 ## ## OOB estimate of error rate: 25.11% ## Confusion matrix: ## 0 1 class.error ## 0 196 46 0.1900826 ## 1 72 156 0.3157895 ``` With the model trained on the training set, we predict on the test set and plot the ROC curve with the `pROC` package, as is shown in Figure 1. ```{r, eval = FALSE} # Predict on test set rf.pred <- predict(rf.fit, newdata = x.te, type = "prob")[, 1] # Plot ROC curve library("pROC") plot.roc(y.te, rf.pred, grid = TRUE, print.auc = TRUE) ``` The area under the ROC curve (AUC) is: ```{r, eval = FALSE} ## Call: ## plot.roc.default(x = y.te, predictor = rf.pred, col = "#0080ff", ## grid = TRUE, print.auc = TRUE) ## ## Data: rf.pred in 81 controls (y.te 0) > 76 cases (y.te 1). ## Area under the curve: 0.8697 ``` ```{r} #| fig-roc, #| echo=FALSE, #| out.width="60%", #| fig.align="center", #| fig.cap="Figure 1: ROC curve for the test set of protein subcellular localization data." knitr::include_graphics("figures/roc.png") ``` ## Package overview The protr package [@Xiao2015] implemented most of the state-of-the-art protein sequence descriptors with R. Generally, each type of the descriptor (feature) can be calculated with a function named `extractX()` in the protr package, where `X` stands for the abbreviation of the descriptor name. The descriptors are: - Amino acid composition - `extractAAC()` - Amino acid composition - `extractDC()` - Dipeptide composition - `extractTC()` - Tripeptide composition - Autocorrelation - `extractMoreauBroto()` - Normalized Moreau-Broto autocorrelation - `extractMoran()` - Moran autocorrelation - `extractGeary()` - Geary autocorrelation - CTD descriptors - `extractCTDC()` - Composition - `extractCTDT()` - Transition - `extractCTDD()` - Distribution - Conjoint triad descriptors - `extractCTriad()` - Conjoint triad descriptors - Quasi-sequence-order descriptors - `extractSOCN()` - Sequence-order-coupling number - `extractQSO()` - Quasi-sequence-order descriptors - Pseudo-amino acid composition - `extractPAAC()` - Pseudo-amino acid composition (PseAAC) - `extractAPAAC()` - Amphiphilic pseudo-amino acid composition (APseAAC) - Profile-based descriptors - `extractPSSM()` - `extractPSSMAcc()` - `extractPSSMFeature()` The descriptors commonly used in Proteochemometric Modeling (PCM) implemented in protr include: - `extractScales()`, `extractScalesGap()` - Scales-based descriptors derived by Principal Components Analysis - `extractProtFP()`, `extractProtFPGap()` - Scales-based descriptors derived by amino acid properties from AAindex (a.k.a. _Protein Fingerprint_) - `extractDescScales()` - Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.) - `extractFAScales()` - Scales-based descriptors derived by Factor Analysis - `extractMDSScales()` - Scales-based descriptors derived by Multidimensional Scaling - `extractBLOSUM()` - BLOSUM and PAM matrix-derived descriptors The protr package integrates the function of parallelized similarity score computation derived by local or global protein sequence alignment between a list of protein sequences; Biostrings and pwalign provide the sequence alignment computation, and the corresponding functions listed in the protr package include: - `twoSeqSim()` - Similarity calculation derived by sequence alignment between two protein sequences - `parSeqSim()` - Parallelized pairwise similarity calculation with a list of protein sequences (in-memory version) - `parSeqSimDisk()` - Parallelized pairwise similarity calculation with a list of protein sequences (disk-based version) - `crossSetSim()` - Parallelized pairwise similarity calculation between two sets of protein sequences (in-memory version) protr also integrates the function for parallelized similarity score computation derived by Gene Ontology (GO) semantic similarity measures between a list of GO terms / Entrez Gene IDs, the GO similarity computation is provided by GOSemSim, the corresponding functions listed in the protr package include: - `twoGOSim()` - Similarity calculation derived by GO-terms semantic similarity measures between two GO terms / Entrez Gene IDs; - `parGOSim()` - Pairwise similarity calculation with a list of GO terms / Entrez Gene IDs. To use the `parSeqSim()` function, we suggest the users to install the packages foreach and doParallel first in order to make the parallelized pairwise similarity computation available. In the following sections, we will introduce the descriptors and function usage in this order. ## Commonly used descriptors **Note**: Users need to evaluate the underlying details of the descriptors provided intelligently, instead of using protr with their data blindly, especially for the descriptor types that have more flexibility. It would be wise for the users to use some negative and positive control comparisons where relevant to help guide the interpretation of the results. A protein or peptide sequence with $N$ amino acid residues can be generally represented as $\{\,R_1, R_2, \ldots, R_n\,\}$, where $R_i$ represents the residue at the $i$-th position in the sequence. The labels $i$ and $j$ are used to index amino acid position in a sequence, and $r$, $s$, $t$ are used to represent the amino acid type. The computed descriptors are roughly categorized into 4 groups according to their major applications. A protein sequence can be partitioned equally into segments. The descriptors designed for the complete sequence can often be applied to each individual segment. ### Amino acid composition descriptor The amino acid composition describes the fraction of each amino acid type within a protein sequence. The fractions of all 20 natural amino acids are calculated as follows: $$ f(r) = \frac{N_r}{N} \quad r = 1, 2, \ldots, 20. $$ where $N_r$ is the number of the amino acid type $r$ and $N$ is the length of the sequence. As was described above, we can use the function `extractAAC()` to extract the descriptors (features) from protein sequences: ```{r, extractAAC} library("protr") x <- readFASTA(system.file( "protseq/P00750.fasta", package = "protr" ))[[1]] extractAAC(x) ``` Here, using the function `readFASTA()`, we loaded a single protein sequence (P00750, Tissue-type plasminogen activator) from a FASTA format file. Then, we extracted the AAC descriptors with `extractAAC()`. The result returned is a named vector whose elements are tagged with the name of each amino acid. ### Dipeptide composition descriptor Dipeptide composition gives a 400-dimensional descriptor, defined as: $$ f(r, s) = \frac{N_{rs}}{N - 1} \quad r, s = 1, 2, \ldots, 20. $$ where $N_{rs}$ is the number of dipeptides represented by amino acid type $r$ and type $s$. Similar to `extractAAC()`, we use `extractDC()` to compute the descriptors: ```{r, extractDC} dc <- extractDC(x) head(dc, n = 30L) ``` Here, we only showed the first 30 elements of the result vector and omitted the rest of the result. The element names of the returned vector are self-explanatory, as before. ### Tripeptide composition descriptor Tripeptide composition gives an 8000-dimensional descriptor, defined as: $$ f(r, s, t) = \frac{N_{rst}}{N - 2} \quad r, s, t = 1, 2, \ldots, 20 $$ where $N_{rst}$ is the number of tripeptides represented by amino acid type $r$, $s$, and $t$. With the function `extractTC()`, we can easily obtain the length-8000 descriptor, to save some space, here we also omitted the long outputs: ```{r, extractTC} tc <- extractTC(x) head(tc, n = 36L) ``` ### Autocorrelation descriptors Autocorrelation descriptors are defined based on the distribution of amino acid properties along the sequence. The amino acid properties used here are various types of amino acids index (Retrieved from the [AAindex database](https://www.genome.jp/dbget/aaindex.html), see @aaindex1999, @aaindex2000, and @aaindex2008; see Figure 2 for an illustrated example). Three types of autocorrelation descriptors are defined here and described below. All the amino acid indices are centralized and standardized before the calculation, i.e. $$ P_r = \frac{P_r - \bar{P}}{\sigma} $$ where $\bar{P}$ is the average of the property of the 20 amino acids: $$ \bar{P} = \frac{\sum_{r=1}^{20} P_r}{20} \quad \textrm{and} \quad \sigma = \sqrt{\frac{1}{2} \sum_{r=1}^{20} (P_r - \bar{P})^2} $$ ```{r} #| fig-aaindex, #| echo=FALSE, #| out.width="80%", #| fig.align="center", #| fig.cap="Figure 2: An illustrated example in the AAIndex database." knitr::include_graphics("figures/AAindex.png") ``` #### Normalized Moreau-Broto autocorrelation descriptors For protein sequences, the Moreau-Broto autocorrelation descriptors can be defined as: $$ AC(d) = \sum_{i=1}^{N-d} P_i P_{i + d} \quad d = 1, 2, \ldots, \textrm{nlag} $$ where $d$ is called the lag of the autocorrelation; $P_i$ and $P_{i+d}$ are the properties of the amino acids at position $i$ and $i+d$; $\textrm{nlag}$ is the maximum value of the lag. The normalized Moreau-Broto autocorrelation descriptors are defined as: $$ ATS(d) = \frac{AC(d)}{N-d} \quad d = 1, 2, \ldots, \textrm{nlag} $$ The corresponding function for this descriptor is `extractMoreauBroto()`. A typical call would be: ```{r, extractMoreau1} moreau <- extractMoreauBroto(x) head(moreau, n = 36L) ``` The eight default properties used here are: - AccNo. CIDH920105 - Normalized Average Hydrophobicity Scales - AccNo. BHAR880101 - Average Flexibility Indices - AccNo. CHAM820101 - Polarizability Parameter - AccNo. CHAM820102 - Free Energy of Solution in Water, kcal/mole - AccNo. CHOC760101 - Residue Accessible Surface Area in Tripeptide - AccNo. BIGC670101 - Residue Volume - AccNo. CHAM810101 - Steric Parameter - AccNo. DAYM780201 - Relative Mutability Users can change the property names of the AAindex database with the argument `props`. The AAindex data shipped with protr can be loaded by `data(AAindex)` which has detailed information on each property. With the arguments `customprops` and `nlag`, users can specify their own properties and lag value to calculate with. For example: ```{r, extractMoreau2} # Define 3 custom properties myprops <- data.frame( AccNo = c("MyProp1", "MyProp2", "MyProp3"), A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101), N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59), C = c(0.29, -1, 47), E = c(-0.74, 3, 73), Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1), H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57), L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73), M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91), P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31), T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130), Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43) ) # Use 4 properties in the AAindex database and 3 customized properties moreau2 <- extractMoreauBroto( x, customprops = myprops, props = c( "CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102", "MyProp1", "MyProp2", "MyProp3" ) ) head(moreau2, n = 36L) ``` About the standard input format of `props` and `customprops`, see `?extractMoreauBroto` for details. #### Moran autocorrelation descriptors For protein sequences, the Moran autocorrelation descriptors can be defined as: $$ I(d) = \frac{\frac{1}{N-d} \sum_{i=1}^{N-d} (P_i - \bar{P}') (P_{i+d} - \bar{P}')}{\frac{1}{N} \sum_{i=1}^{N} (P_i - \bar{P}')^2} \quad d = 1, 2, \ldots, 30 $$ where $d$, $P_i$, and $P_{i+d}$ are defined in the same way as in the first place; $\bar{P}'$ is the considered property $P$ along the sequence, i.e., $$ \bar{P}' = \frac{\sum_{i=1}^N P_i}{N} $$ $d$, $P$, $P_i$ and $P_{i+d}$, $\textrm{nlag}$ have the same meaning as above. With `extractMoran()` (which has the identical parameters as `extractMoreauBroto()`), we can compute the Moran autocorrelation descriptors (only print out the first 36 elements): ```{r, extractMoran} # Use the 3 custom properties defined before # and 4 properties in the AAindex database moran <- extractMoran( x, customprops = myprops, props = c( "CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102", "MyProp1", "MyProp2", "MyProp3" ) ) head(moran, n = 36L) ``` #### Geary autocorrelation descriptors Geary autocorrelation descriptors for protein sequences can be defined as: $$ C(d) = \frac{\frac{1}{2(N-d)} \sum_{i=1}^{N-d} (P_i - P_{i+d})^2}{\frac{1}{N-1} \sum_{i=1}^{N} (P_i - \bar{P}')^2} \quad d = 1, 2, \ldots, 30 $$ where $d$, $P$, $P_i$, $P_{i+d}$, and $\textrm{nlag}$ have the same meaning as above. For each amino acid index, there will be $3 \times \textrm{nlag}$ autocorrelation descriptors. The usage of `extractGeary()` is exactly the same as `extractMoreauBroto()` and `extractMoran()`: ```{r, extractGeary} # Use the 3 custom properties defined before # and 4 properties in the AAindex database geary <- extractGeary( x, customprops = myprops, props = c( "CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102", "MyProp1", "MyProp2", "MyProp3" ) ) head(geary, n = 36L) ``` ### Composition/Transition/Distribution The CTD descriptors are developed by @dubchak1 and @dubchak2. ```{r} #| fig-ctd, #| echo=FALSE, #| out.width="100%", #| fig.align="center", #| fig.cap="Figure 3: The sequence of a hypothetic protein indicating the construction of composition, transition, and distribution descriptors of a protein. The sequence index indicates the position of an amino acid in the sequence. The index for each type of amino acid in the sequence (`1`, `2` or `3`) indicates the position of the first, second, third, ... of that type of amino acid. 1/2 transition indicates the position of `12` or `21` pairs in the sequence (1/3 and 2/3 are defined similarly)." knitr::include_graphics("figures/CTD.png") ``` **Step 1: Sequence Encoding** The amino acids are categorized into three classes according to their attributes, and each amino acid is encoded by one of the indices 1, 2, 3 according to which class it belongs. The attributes used here include hydrophobicity, normalized van der Waals volume, polarity, and polarizability. The corresponding classification for each attribute is listed in Table 1. **Group 1** **Group 2** **Group 3** --------------------------------- ------------------------ ------------------------------------------------ --------------------- Hydrophobicity Polar Neutral Hydrophobicity R, K, E, D, Q, N G, A, S, T, P, H, Y C, L, V, I, M, F, W Normalized van der Waals Volume 0-2.78 2.95-4.0 4.03-8.08 G, A, S, T, P, D, C N, V, E, Q, I, L M, H, K, F, R, Y, W Polarity 4.9-6.2 8.0-9.2 10.4-13.0 L, I, F, W, C, M, V, Y P, A, T, G, S H, Q, R, K, N, E, D Polarizability 0-1.08 0.128-0.186 0.219-0.409 G, A, S, D, T C, P, N, V, E, Q, I, L K, M, H, F, R, Y, W Charge Positive Neutral Negative K, R A, N, C, Q, G, H, I, L, M, F, P, S, T, W, Y, V D, E Secondary Structure Helix Strand Coil E, A, L, M, Q, K, R, H V, I, Y, C, W, F, T G, N, P, S, D Solvent Accessibility Buried Exposed Intermediate A, L, F, C, G, I, V, W R, K, Q, E, N, D M, S, P, T, H, Y : Table 1: Amino acid attributes and the three-group classification of the 20 amino acids by each attribute For example, for a given sequence `MTEITAAMVKELRESTGAGA`, it will be encoded as `32132223311311222222` according to its hydrophobicity. **Step 2: Compute Composition, Transition, and Distribution Descriptors** Three types of descriptors, _Composition_ (_C_), _Transition_ (_T_), and _Distribution_ (_D_) can be calculated for a given attribute as follows. #### Composition Composition is defined as the global percentage for each encoded class in the protein sequence. In the above example, using the hydrophobicity classification, the numbers for encoded classes `1`, `2`, `3` are 5, 10, and 5, so that the compositions for them will be $5/20=25\%$, $10/20=10\%$, and $5/20=25\%$, where 20 is the length of the protein sequence. The composition descriptor can be expressed as $$ C_r = \frac{n_r}{n} \quad r = 1, 2, 3 $$ where $n_r$ is the number of amino acid type $r$ in the encoded sequence; $N$ is the length of the sequence. An example for `extractCTDC()`: ```{r, extractCTDC} extractCTDC(x) ``` The result shows the elements that are named as `PropertyNumber.GroupNumber` in the returned vector. #### Transition A transition from class 1 to 2 is the percent frequency with which 1 is followed by 2 or 2 is followed by 1 in the encoded sequences. The transition descriptor can be computed as $$ T_{rs} = \frac{n_{rs} + n_{sr}}{N - 1} \quad rs = \text{12}, \text{13}, \text{23} $$ where $n_{rs}$, $n_{sr}$ are the numbers of dipeptide encoded as `rs` and `sr` in the sequence; $N$ is the length of the sequence. An example for `extractCTDT()`: ```{r, extractCTDT} extractCTDT(x) ``` #### Distribution The _distribution_ descriptor describes the distribution of each attribute in the sequence. There are five "distribution" descriptors for each attribute, and they are the position percent in the whole sequence for the first residue, 25% residues, 50% residues, 75% residues, and 100% residues for a certain encoded class. For example, there are 10 residues encoded as `2` in the above example, the positions for the first residue `2`, the 2nd residue `2` (25% * 10 = 2), the 5th `2` residue (50% * 10 = 5), the 7th `2` (75% * 10 = 7) and the 10th residue `2` (100% * 10) in the encoded sequence are 2, 5, 15, 17, 20, so that the distribution descriptors for `2` are: 10.0 (2/20 * 100), 25.0 (5/20 * 100), 75.0 (15/20 * 100), 85.0 (17/20 * 100), 100.0 (20/20 * 100). To compute the distribution descriptor, use `extractCTDD()`: ```{r, extractCTDD} extractCTDD(x) ``` ### Conjoint triad descriptors Conjoint triad descriptors are proposed by @shenjw. The conjoint triad descriptors were used to model protein-protein interactions based on the classification of amino acids. In this approach, each protein sequence is represented by a vector space consisting of descriptors of amino acids. To reduce the dimensions of vector space, the 20 amino acids were clustered into several classes according to their dipoles and volumes of the side chains. The conjoint triad descriptors are calculated as follows: **Step 1: Classification of Amino Acids** Electrostatic and hydrophobic interactions dominate protein-protein interactions. These two kinds of interactions may be reflected by the dipoles and volumes of the side chains of amino acids, respectively. Accordingly, these two parameters were calculated using the density-functional theory method B3LYP/6-31G and the molecular modeling approach. Based on the dipoles and volumes of the side chains, the 20 amino acids can be clustered into seven classes (See Table 2). Amino acids within the same class likely involve synonymous mutations because of their similar characteristics. No. Dipole Scale$^1$ Volume Scale$^2$ Class ----- ------------------ ------------------ -------------------- 1 $-$ $-$ Ala, Gly, Val 2 $-$ $+$ Ile, Leu, Phe, Pro 3 $+$ $+$ Tyr, Met, Thr, Ser 4 $++$ $+$ His, Asn, Gln, Tpr 5 $+++$ $+$ Arg, Lys 6 $+'+'+'$ $+$ Asp, Glu 7 $+^3$ $+$ Cys : Table 2: Classification of amino acids based on dipoles and volumes of the side chains

^{1} Dipole Scale (Debye): $-$, Dipole < 1.0;
$+$, 1.0 < Dipole < 2.0; $++$, 2.0 < Dipole < 3.0; $+++$, Dipole > 3.0;
$+'+'+'$, Dipole > 3.0 with opposite orientation.

^{2} Volume Scale
($\overset{\lower.5em\circ}{\mathrm{A}}\lower.01em^3$):
$-$, Volume < 50; $+$, Volume > 50.

^{3} Cys is separated from class 3
because of its ability to form disulfide bonds.

^{1} The number depends on the choice of the
number of properties of amino acids and the choice of the maximum
values of the `lag`. The default is to use 8 types of properties and
`lag = 30`.

^{2} The number depends on the maximum value
of `lag`. By default, `lag = 30`. Two distance matrices are used,
so the descriptor dimension is $30 \times 2 = 60$ and
$(20 + 30) \times 2 = 100$.

^{3} The number depends on the choice of
the number of the set of amino acid properties and the choice of
the $\lambda$ value. The default is to use 3 types of properties
proposed by Kuo-Chen Chou and $\lambda = 30$.

^{4} The number depends on the choice of
the $\lambda$ vlaue. The default is $\lambda = 30$.