The ESpritz web server predicts disordered regions from primary sequence. It uses Bi-directional Recursive Neural Networks and can process proteins on a genomic scale with little effort and state-of-the-art accuracy.

get_espritz(data, ...)

# S3 method for character
get_espritz(data, ...)

# S3 method for data.frame
get_espritz(data, sequence, id, ...)

# S3 method for list
get_espritz(data, ...)

# S3 method for default
get_espritz(
  data = NULL,
  sequence,
  id,
  model = c("X-Ray", "Disprot", "NMR"),
  FPR = c("best Sw", "5% FPR"),
  simplify = TRUE,
  progress = FALSE,
  ...
)

# S3 method for AAStringSet
get_espritz(data, ...)

Arguments

data

A data frame with protein amino acid sequences as strings in one column and corresponding IDs in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class SeqFastaAA resulting from a read.fasta call, or an AAStringSet object. Should be left blank if vectors are provided to the sequence and id arguments.

...

Currently no additional arguments are accepted apart from the ones documented below.

sequence

A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

id

A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

model

One of c('X-Ray', 'Disprot', 'NMR'), default is 'X-Ray'. Determines the model to be used for prediction. See details.

FPR

One of c('best Sw', '5% FPR'), default is 'best Sw'. Determines the cutoff probability for prediction. 'best Sw' maximizes a weighted score that rewards correct disorder prediction more than correct order prediction.

simplify

A Boolean indicating the type of returned object, defaults to TRUE.

progress

Boolean, whether to show the progress bar, defaults to FALSE.
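The vector defaults shown for model and FPR in the usage above follow a common R idiom in which the first element acts as the default. The sketch below illustrates that idiom with base R's match.arg(); choose_options() is a hypothetical helper written for this illustration, and whether get_espritz resolves its arguments this way internally is an assumption.

```r
# Hypothetical helper, not part of ragp: shows how a vector of choices
# in a function signature is typically resolved to a single value.
choose_options <- function(model = c("X-Ray", "Disprot", "NMR"),
                           FPR = c("best Sw", "5% FPR")) {
  model <- match.arg(model)  # first element is used when the argument is missing
  FPR <- match.arg(FPR)
  list(model = model, FPR = FPR)
}

# With no arguments the first choices are selected
defaults <- choose_options()

# Explicit choices must match one of the allowed values
custom <- choose_options(model = "NMR", FPR = "5% FPR")

# A value outside the allowed set raises an error
bad <- try(choose_options(model = "Cryo-EM"), silent = TRUE)
```

Under this idiom, calling the function without model or FPR is equivalent to passing 'X-Ray' and 'best Sw', which matches the defaults documented above.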

Source

http://old.protein.bio.unipd.it/espritz/

Value

If simplify == TRUE: A data frame (one row per disordered region) with columns:

start

Integer, indicating the sequence position of disordered region start.

end

Integer, indicating the sequence position of disordered region end.

id

Character, indicating the protein identifier.
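To make the relationship between per-residue predictions and the region table concrete, here is a minimal base R sketch that collapses a string of D/O calls into start and end positions. The prediction string and the identifier are made up for illustration, and the use of rle() is an assumption about one reasonable way to do this, not a description of the package internals.

```r
# Hypothetical per-residue prediction: D = disordered, O = ordered
prediction <- "DDDOOOOODDOOD"

# Run-length encode the residue states
runs <- rle(strsplit(prediction, "")[[1]])

# Last and first position of every run
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1L

# Keep only the disordered runs, one row per region
is_disordered <- runs$values == "D"
regions <- data.frame(
  start = starts[is_disordered],
  end = ends[is_disordered],
  id = "protein_1",  # hypothetical identifier
  stringsAsFactors = FALSE
)
regions
```

A single disordered residue yields a region whose start equals its end, which is the same shape as a row such as 115 115 in real output.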

If simplify == FALSE: A data frame (one row per protein) with columns:

id

Character, indicating the protein identifier.

probability

List column of numeric vectors containing the probability of disorder for each residue.

prediction

Character, indicating the prediction for each residue: D - disordered, O - ordered.
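The per-protein layout can be easier to work with once it is expanded to one row per residue. The sketch below builds a small mock object with the same columns and unpacks the list column using base R. All values are invented, the cutoff of 0.5 is purely illustrative and is not an ESpritz threshold, and storing the prediction as one string per protein is an assumption.

```r
# Mock result with the documented columns (values are invented)
res <- data.frame(id = c("protein_1", "protein_2"), stringsAsFactors = FALSE)
res$probability <- list(
  c(0.91, 0.75, 0.30, 0.12, 0.64),
  c(0.05, 0.22, 0.88)
)

# Illustrative cutoff only; the server chooses its own cutoff based on FPR
cutoff <- 0.5
res$prediction <- vapply(
  res$probability,
  function(p) paste(ifelse(p >= cutoff, "D", "O"), collapse = ""),
  character(1)
)

# Expand the list column to one row per residue
n_res <- lengths(res$probability)
long <- data.frame(
  id = rep(res$id, n_res),
  position = sequence(n_res),
  probability = unlist(res$probability),
  stringsAsFactors = FALSE
)
long
```

The long table can then be plotted or filtered per residue without further list handling.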

Details

Three models trained on different data sets are available and can be selected via the model argument:

X-Ray - based on missing atoms from Protein Data Bank (PDB) X-ray solved structures. If this option is chosen, the predictors with short disorder options are executed.

Disprot - contains longer disorder segments than X-Ray. This definition uses DisProt, a manually curated database that is often based on functional attributes of the disordered region. A residue is defined as disordered if the DisProt curators consider it to be disordered at least once. All other residues are considered ordered. If this option is chosen, the predictors with long disorder options are executed.

NMR - based on NMR mobility. NMR flexibility is calculated using the Mobi server, optimized to replicate the ordered-disordered NMR definition used in CASP8.

These models provide quite different predictions. For further details visit http://old.protein.bio.unipd.it/espritz/help_pages/help.html and http://old.protein.bio.unipd.it/espritz/help_pages/methods.html.

Note

The ESpritz web server limits the number of daily queries per IP address. The function will inform the user when the limit has been exceeded.

References

Walsh I, Martin AJM, Di Domenico T, Tosatto SCE (2012) ESpritz: accurate and fast prediction of protein disorder. Bioinformatics 28(4): 503-509

Examples

library(ragp)
espritz_test <- get_espritz(at_nsp[1:10,], sequence, Transcript.id)
espritz_test
#>    start end          id
#> 1      1  12 ATCG00660.1
#> 2     49  54 ATCG00660.1
#> 3    111 117 ATCG00660.1
#> 4      1   9 AT2G43600.1
#> 5    266 273 AT2G43600.1
#> 6      1   8 AT2G28410.1
#> 7     47  61 AT2G28410.1
#> 8    115 115 AT2G28410.1
#> 9      1   8 AT2G22960.1
#> 10    16  19 AT2G22960.1
#> 11   178 184 AT2G22960.1
#> 12     1   9 AT2G19580.1
#> 13    83  88 AT2G19580.1
#> 14   256 270 AT2G19580.1
#> 15     1  12 AT2G19690.2
#> 16   104 108 AT2G19690.2
#> 17   143 148 AT2G19690.2
#> 18     1  12 AT2G19690.1
#> 19   104 109 AT2G19690.1
#> 20   139 147 AT2G19690.1
#> 21     1  10 AT2G33130.1
#> 22    15  17 AT2G33130.1
#> 23    65 103 AT2G33130.1
#> 24     1 128 AT2G05520.1
#> 25   130 145 AT2G05520.1
#> 26     1 118 AT2G05520.2
#> 27   123 138 AT2G05520.2
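As a possible follow-up, the region table can be summarized per protein. The sketch below re-creates the first five rows of the output above by hand, so that it runs without contacting the server, and counts disordered residues per identifier. The object and column names espritz_out, n_residues and per_protein are introduced here for illustration only.

```r
# First five rows of the example output, typed in manually
espritz_out <- data.frame(
  start = c(1L, 49L, 111L, 1L, 266L),
  end = c(12L, 54L, 117L, 9L, 273L),
  id = c("ATCG00660.1", "ATCG00660.1", "ATCG00660.1",
         "AT2G43600.1", "AT2G43600.1"),
  stringsAsFactors = FALSE
)

# Region length in residues (both end points are inclusive)
espritz_out$n_residues <- espritz_out$end - espritz_out$start + 1L

# Total number of disordered residues per protein
per_protein <- aggregate(n_residues ~ id, data = espritz_out, FUN = sum)
per_protein
```

Dividing these totals by the sequence lengths would give the disordered fraction per protein. The lengths are not part of this output and would have to come from the input sequences.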