The SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method combines a cleavage site prediction with a signal peptide/non-signal peptide prediction, both based on a combination of several artificial neural networks.
    get_signalp(data, ...)

    # S3 method for character
    get_signalp(
      data,
      org_type = c("euk", "gram-", "gram+"),
      Dcut_type = c("default", "sensitive", "user"),
      Dcut_noTM = 0.45,
      Dcut_TM = 0.5,
      method = c("best", "notm"),
      minlen = NULL,
      trunc = 70L,
      splitter = 1000L,
      attempts = 2,
      progress = FALSE,
      ...
    )

    # S3 method for data.frame
    get_signalp(data, sequence, id, ...)

    # S3 method for list
    get_signalp(data, ...)

    # S3 method for default
    get_signalp(data = NULL, sequence, id, ...)

    # S3 method for AAStringSet
    get_signalp(data, ...)
data | A data frame with protein amino acid sequences as strings in one column and corresponding ids in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class |
---|---|
... | Currently no additional arguments are accepted apart from the ones documented below. |
org_type | One of c("euk", "gram-", "gram+"), defaults to "euk". Which model should be used for prediction. |
Dcut_type | One of c("default", "sensitive", "user"), defaults to "default". The default cutoff values for SignalP 4 are chosen to optimize the performance measured as Matthews Correlation Coefficient (MCC). This results in a lower sensitivity (true positive rate) than SignalP 3.0 had. Setting this argument to "sensitive" will yield the same sensitivity as SignalP 3.0. This will make the false positive rate slightly higher, but still better than that of SignalP 3.0. |
Dcut_noTM | A numeric value, with range 0 - 1, defaults to 0.45. For experimenting with cutoff values. |
Dcut_TM | A numeric value, with range 0 - 1, defaults to 0.5. For experimenting with cutoff values. |
method | One of c("best", "notm"), defaults to "best". SignalP 4.1 contains two types of neural networks. SignalP-TM has been trained with sequences containing transmembrane segments in the data set, while SignalP-noTM has been trained without those sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to determine whether to use SignalP-TM or SignalP-noTM in the final prediction (if 4 or more positions are predicted to be in a transmembrane state, SignalP-TM is used, otherwise SignalP-noTM). An exception is Gram-positive bacteria, where SignalP-TM is always used. If you are confident that there are no transmembrane segments in your data, you can get slightly better performance by choosing "notm" (the "Input sequences do not include TM regions" option of the web server), which tells SignalP 4.1 to always use SignalP-noTM. |
minlen | An integer value corresponding to the minimal predicted signal peptide length, by default 10. SignalP 4.0 could, in rare cases, erroneously predict signal peptides shorter than 10 residues. SignalP 4.1 eliminates these errors by imposing a lower limit on the cleavage site position (signal peptide length). The minimum length is 10 by default, but you can adjust it. Signal peptides shorter than 15 residues are very rare. To disable this length restriction completely, set it to 0 (zero). |
trunc | An integer value corresponding to the N-terminal truncation of the input sequence, by default 70. The predictor truncates each sequence to at most 70 residues before submitting it to the neural networks. To predict extremely long signal peptides, try a higher value, or disable truncation completely by setting it to 0 (zero). |
splitter | An integer indicating the number of sequences in each .fasta file sent to the server. Default is 1000. Change only in case of a server-side error. Accepted values are in the range 1 to 2000. |
attempts | Integer, number of attempts if the server is unresponsive, by default 2. |
progress | Boolean, whether to show the progress bar, by default FALSE. |
sequence | A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank. |
id | A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank. |
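
The `trunc` and `splitter` arguments describe two preprocessing steps: each sequence is cut to its first `trunc` residues, and the input is divided into batches of at most `splitter` sequences. The base R sketch below only illustrates those two steps. The helper names `truncate_seqs()` and `split_batches()` are hypothetical, and this is not the code that ragp or the SignalP server actually runs.

```r
# Hypothetical helpers illustrating the `trunc` and `splitter` arguments.
# They are not part of ragp.

# Keep only the first `trunc` residues of each sequence; 0 disables truncation.
truncate_seqs <- function(sequence, trunc = 70L) {
  if (trunc == 0L) return(sequence)
  substr(sequence, 1L, trunc)
}

# Divide identifiers into consecutive batches of at most `splitter` items.
split_batches <- function(id, splitter = 1000L) {
  stopifnot(splitter >= 1L, splitter <= 2000L)
  batch <- ceiling(seq_along(id) / splitter)
  unname(split(id, batch))
}

seqs <- c(strrep("A", 100), "MKT")
nchar(truncate_seqs(seqs))             # lengths after truncation to 70
nchar(truncate_seqs(seqs, trunc = 0L)) # truncation disabled

ids <- sprintf("prot%04d", 1:2500)
lengths(split_batches(ids))            # number of sequences per batch
```

With the default `splitter` of 1000, 2500 sequences would be sent as three batches, which is why lowering the value is only suggested when the server reports an error.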
https://services.healthtech.dtu.dk/service.php?SignalP-4.1
A data frame with columns:

Column | Description |
---|---|
id | Character, as from input. |
Cmax | Numeric, C-score (raw cleavage site score). The output from the CS networks, which are trained to distinguish signal peptide cleavage sites from everything else. Note the position numbering of the cleavage site: the C-score is trained to be high at the position immediately after the cleavage site (the first residue in the mature protein). |
Cmax.pos | Integer, position of Cmax, that is, the position immediately after the cleavage site (the first residue in the mature protein). |
Ymax | Numeric, Y-score (combined cleavage site score). A combination (geometric average) of the C-score and the slope of the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is because multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep. |
Ymax.pos | Integer, position of Ymax. |
Smax | Numeric, S-score (signal peptide score). The output from the SP networks, which are trained to distinguish positions within signal peptides from positions in the mature part of the proteins and from proteins without signal peptides. |
Smax.pos | Integer, position of Smax. |
Smean | Numeric, the average S-score of the possible signal peptide (from position 1 to the position immediately before the maximal Y-score). |
Dmean | Numeric, D-score (discrimination score). A weighted average of the mean S-score and the maximal Y-score. This is the score used to discriminate signal peptides from non-signal peptides. |
is.sp | Character, whether the sequence contains an N-terminal signal peptide (N-sp), "Y" or "N". |
Dmaxcut | Numeric, as from input: Dcut_noTM if the SignalP-noTM network was used and Dcut_TM if the SignalP-TM network was used. |
Networks.used | Character, which network was used for the prediction: SignalP-noTM or SignalP-TM. |
is.signalp | Logical, whether SignalP predicted the presence of a signal peptide. |
sp.length | Integer, length of the predicted signal peptide. |
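
The description of `method`, `Dcut_noTM` and `Dcut_TM`, together with the cutoff, network and prediction values above, amounts to a small decision rule. A minimal sketch of that rule follows. `pick_network()` and `call_signal_peptide()` are hypothetical names, and two points are assumptions rather than documented behaviour: that `method = "notm"` takes precedence over the Gram-positive exception, and that a signal peptide is called when the D-score is strictly greater than the cutoff.

```r
# Hypothetical restatement of the documented decision rule; not ragp code.

# Which network makes the final prediction for one sequence.
# `n_tm_positions`: number of positions SignalP-TM predicts as transmembrane.
pick_network <- function(n_tm_positions, org_type = "euk", method = "best") {
  if (method == "notm") return("SignalP-noTM")   # assumption: overrides all
  if (org_type == "gram+") return("SignalP-TM")  # always TM for Gram-positive
  if (n_tm_positions >= 4) "SignalP-TM" else "SignalP-noTM"
}

# Pick the cutoff that matches the network and compare the D-score to it.
call_signal_peptide <- function(Dmean, network,
                                Dcut_noTM = 0.45, Dcut_TM = 0.5) {
  Dmaxcut <- ifelse(network == "SignalP-TM", Dcut_TM, Dcut_noTM)
  # Assumption: strictly greater than the cutoff counts as a signal peptide.
  data.frame(Dmaxcut = Dmaxcut, is.signalp = Dmean > Dmaxcut)
}

pick_network(0)
pick_network(4)
pick_network(0, org_type = "gram+")

# D-scores 0.154 (SignalP-noTM) and 0.671 (SignalP-TM)
calls <- call_signal_peptide(c(0.154, 0.671),
                             c("SignalP-noTM", "SignalP-TM"))
calls
```

Under these assumptions the first sequence falls below its 0.45 cutoff and the second exceeds its 0.5 cutoff.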
This function creates temporary files in the working directory.
Petersen TN, Brunak S, von Heijne G, Nielsen H (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8: 785-786
    library(ragp)
    signalp_pred <- get_signalp(data = at_nsp[1:10,], sequence, Transcript.id)
    signalp_pred
    #>             id  Cmax Cmax.pos  Ymax Ymax.pos  Smax Smax.pos Smean Dmean is.sp
    #> 1  ATCG00660.1 0.210       30 0.162       30 0.260        2 0.146 0.154     N
    #> 2  AT2G43600.1 0.860       23 0.894       23 0.984       13 0.930 0.913     Y
    #> 3  AT2G28410.1 0.779       23 0.826       23 0.940       15 0.877 0.853     Y
    #> 4  AT2G22960.1 0.701       23 0.790       23 0.948       15 0.891 0.844     Y
    #> 5  AT2G19580.1 0.422       26 0.586       26 0.885       17 0.798 0.671     Y
    #> 6  AT2G19690.2 0.797       29 0.870       29 0.987       18 0.952 0.914     Y
    #> 7  AT2G19690.1 0.797       29 0.870       29 0.987       18 0.952 0.914     Y
    #> 8  AT2G33130.1 0.318       33 0.530       27 0.990       16 0.933 0.748     Y
    #> 9  AT2G05520.1 0.633       24 0.782       24 0.990       16 0.966 0.881     Y
    #> 10 AT2G05520.2 0.633       24 0.782       24 0.990       16 0.966 0.881     Y
    #>    Dmaxcut Networks.used is.signalp sp.length
    #> 1    0.450  SignalP-noTM      FALSE        30
    #> 2    0.450  SignalP-noTM       TRUE        23
    #> 3    0.450  SignalP-noTM       TRUE        23
    #> 4    0.450  SignalP-noTM       TRUE        23
    #> 5    0.500    SignalP-TM       TRUE        26
    #> 6    0.450  SignalP-noTM       TRUE        29
    #> 7    0.450  SignalP-noTM       TRUE        29
    #> 8    0.450  SignalP-noTM       TRUE        27
    #> 9    0.450  SignalP-noTM       TRUE        24
    #> 10   0.450  SignalP-noTM       TRUE        24
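
The returned data frame can be filtered like any other. The sketch below shows two typical follow-up steps without querying the server: it rebuilds three rows (1, 5 and 8) and a subset of columns by hand from the output printed above, so the object here is a stand-in for a real `get_signalp()` result, not a new prediction.

```r
# Three rows copied by hand from the printed example output
# (a stand-in for a real result; no server query is made).
signalp_pred <- data.frame(
  id = c("ATCG00660.1", "AT2G19580.1", "AT2G33130.1"),
  Cmax.pos = c(30L, 26L, 33L),
  Ymax.pos = c(30L, 26L, 27L),
  Dmean = c(0.154, 0.671, 0.748),
  Dmaxcut = c(0.45, 0.5, 0.45),
  Networks.used = c("SignalP-noTM", "SignalP-TM", "SignalP-noTM"),
  is.signalp = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only sequences with a predicted signal peptide.
with_sp <- signalp_pred[signalp_pred$is.signalp, ]
with_sp$id

# Flag predictions where the raw C-score peak and the Y-score peak disagree,
# which may be worth a manual look at the cleavage site.
disagree <- with_sp$id[with_sp$Cmax.pos != with_sp$Ymax.pos]
disagree
```

In these rows only AT2G33130.1 has its C-score and Y-score maxima at different positions (33 and 27).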