The Espritz web server predicts disordered regions from the primary sequence. It uses bidirectional recursive neural networks and can process proteins on a genomic scale with little effort and state-of-the-art accuracy.
    get_espritz(data, ...)

    # S3 method for character
    get_espritz(data, ...)

    # S3 method for data.frame
    get_espritz(data, sequence, id, ...)

    # S3 method for list
    get_espritz(data, ...)

    # S3 method for default
    get_espritz(
      data = NULL,
      sequence,
      id,
      model = c("X-Ray", "Disprot", "NMR"),
      FPR = c("best Sw", "5% FPR"),
      simplify = TRUE,
      progress = FALSE,
      ...
    )

    # S3 method for AAStringSet
    get_espritz(data, ...)
Argument | Description |
---|---|
data | A data frame with protein amino acid sequences as strings in one column and corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences. Alternatively, a list with elements of class "SeqFastaAA". |
... | Currently no additional arguments are accepted apart from the ones documented below. |
sequence | A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank. |
id | A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank. |
model | One of c('X-Ray', 'Disprot', 'NMR'), default is 'X-Ray'. Determines the model to be used for prediction. See details. |
FPR | One of c('best Sw', '5% FPR'), default is 'best Sw'. Determines the cutoff probability for prediction. 'best Sw' maximizes a weighted score that rewards correct disorder prediction more than correct order prediction. |
simplify | A Boolean indicating the type of returned object, defaults to TRUE. |
progress | Boolean, whether to show the progress bar; defaults to FALSE. |
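
The arguments above can be combined as in the following minimal sketch. It assumes that get_espritz is provided by the ragp package (the package name is an assumption, it is not stated on this page) and uses the default method with plain vectors. The toy identifiers and sequences are made up for illustration, and the query itself is guarded by a hypothetical `run_query` flag so that nothing is sent to the web server unless it is switched on.

```r
# Toy input: identifiers and sequences are invented for illustration only.
proteins <- data.frame(
  id = c("prot_1", "prot_2"),
  sequence = c(
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "MSSPSPSPSPTPSAPPAKSPVAAPTPEGGSKKK"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical switch: set to TRUE to actually contact the Espritz server.
run_query <- FALSE

if (run_query && requireNamespace("ragp", quietly = TRUE)) {
  # Default method: sequences and ids are passed as character vectors.
  regions <- ragp::get_espritz(
    sequence = proteins$sequence,
    id = proteins$id,
    model = "Disprot",   # long-disorder model
    FPR = "5% FPR",      # stricter cutoff than the default "best Sw"
    simplify = TRUE,     # one row per disordered region
    progress = FALSE
  )
  print(regions)
}
```

Keeping the call behind a flag is only a convenience for this sketch, since the server limits the number of daily queries.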
http://old.protein.bio.unipd.it/espritz/
If simplify == TRUE: A data frame (one row per disordered region) with columns:
start: Integer, indicating the sequence position of the disordered region start.
end: Integer, indicating the sequence position of the disordered region end.
id: Character, indicating the protein identifier.
If simplify == FALSE: A data frame (one row per protein) with columns:
Character, indicating the protein identifier.
List column of numeric vectors, vectors contain probabilities of disorder for each residue.
Character, indicating the prediction: D - disordered, O - ordered for each residue.
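
The relationship between the two return shapes can be sketched in base R: a per-residue prediction string of D and O characters can be collapsed into start and end positions of the disordered runs. The helper `regions_from_prediction` below is hypothetical and is not part of the documented API; it only illustrates the idea, and the function may derive its simplified output differently.

```r
# Hypothetical helper (not part of the documented API): collapse a per-residue
# prediction string ("D" = disordered, "O" = ordered) into one row per
# disordered region.
regions_from_prediction <- function(prediction, id) {
  states <- strsplit(prediction, "", fixed = TRUE)[[1]]
  runs <- rle(states)                    # consecutive runs of identical states
  ends <- cumsum(runs$lengths)           # last position of each run
  starts <- ends - runs$lengths + 1L     # first position of each run
  keep <- runs$values == "D"             # keep only the disordered runs
  data.frame(
    start = starts[keep],
    end = ends[keep],
    id = rep(id, sum(keep)),
    stringsAsFactors = FALSE
  )
}

regions_from_prediction("DDDOOOODDO", id = "prot_1")
```

For the toy string above this gives two regions, positions 1 to 3 and 8 to 9. A sequence predicted as fully ordered gives a data frame with zero rows.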
Three models trained on different data sets are available and can be selected via the argument model:

X-Ray - based on missing atoms from the Protein Data Bank (PDB) X-ray solved structures. If this option is chosen, the predictors with short disorder options are executed.

Disprot - contains longer disorder segments compared to X-Ray. This definition is based on DisProt, a manually curated database that is often based on functional attributes of the disordered region. A residue is defined as disordered if the DisProt curators consider it to be disordered at least once; all other residues are considered ordered. If this option is chosen, the predictors with long disorder options are executed.

NMR - based on NMR mobility. NMR flexibility is calculated using the Mobi server, optimized to replicate the ordered-disordered NMR definition used in CASP8.

These models provide quite different predictions. For further details visit http://old.protein.bio.unipd.it/espritz/help_pages/help.html and http://old.protein.bio.unipd.it/espritz/help_pages/methods.html.
The Espritz web server limits the number of daily queries per IP address. The function will inform the user when the limit has been exceeded.
Walsh I, Martin AJM, Di Domenico T, Tosatto SCE (2012) ESpritz: accurate and fast prediction of protein disorder. Bioinformatics 28(4): 503-509
    #>    start end          id
    #> 1      1  12 ATCG00660.1
    #> 2     49  54 ATCG00660.1
    #> 3    111 117 ATCG00660.1
    #> 4      1   9 AT2G43600.1
    #> 5    266 273 AT2G43600.1
    #> 6      1   8 AT2G28410.1
    #> 7     47  61 AT2G28410.1
    #> 8    115 115 AT2G28410.1
    #> 9      1   8 AT2G22960.1
    #> 10    16  19 AT2G22960.1
    #> 11   178 184 AT2G22960.1
    #> 12     1   9 AT2G19580.1
    #> 13    83  88 AT2G19580.1
    #> 14   256 270 AT2G19580.1
    #> 15     1  12 AT2G19690.2
    #> 16   104 108 AT2G19690.2
    #> 17   143 148 AT2G19690.2
    #> 18     1  12 AT2G19690.1
    #> 19   104 109 AT2G19690.1
    #> 20   139 147 AT2G19690.1
    #> 21     1  10 AT2G33130.1
    #> 22    15  17 AT2G33130.1
    #> 23    65 103 AT2G33130.1
    #> 24     1 128 AT2G05520.1
    #> 25   130 145 AT2G05520.1
    #> 26     1 118 AT2G05520.2
    #> 27   123 138 AT2G05520.2
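
A data frame of this shape is straightforward to summarize in base R. The sketch below re-enters the first five rows of the output above by hand and computes the length of each region and the total number of disordered residues per protein; the variable names `regions` and `disordered_residues` are illustrative only.

```r
# First five rows of the example output, re-entered by hand.
regions <- data.frame(
  start = c(1L, 49L, 111L, 1L, 266L),
  end   = c(12L, 54L, 117L, 9L, 273L),
  id    = c("ATCG00660.1", "ATCG00660.1", "ATCG00660.1",
            "AT2G43600.1", "AT2G43600.1"),
  stringsAsFactors = FALSE
)

# Region length in residues (both ends are inclusive).
regions$length <- regions$end - regions$start + 1L

# Total number of disordered residues per protein.
disordered_residues <- tapply(regions$length, regions$id, sum)
print(disordered_residues)
```

For these rows the totals are 25 residues for ATCG00660.1 and 17 for AT2G43600.1.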