The SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

get_signalp(data, ...)

# S3 method for character
get_signalp(
  data,
  org_type = c("euk", "gram-", "gram+"),
  Dcut_type = c("default", "sensitive", "user"),
  Dcut_noTM = 0.45,
  Dcut_TM = 0.5,
  method = c("best", "notm"),
  minlen = NULL,
  trunc = 70L,
  splitter = 1000L,
  attempts = 2,
  progress = FALSE,
  ...
)

# S3 method for data.frame
get_signalp(data, sequence, id, ...)

# S3 method for list
get_signalp(data, ...)

# S3 method for default
get_signalp(data = NULL, sequence, id, ...)

# S3 method for AAStringSet
get_signalp(data, ...)

Arguments

data

A data frame with protein amino acid sequences as strings in one column and the corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class SeqFastaAA resulting from a read.fasta call, or an AAStringSet object. Should be left blank if vectors are provided to the sequence and id arguments.

...

Currently no additional arguments are accepted apart from the ones documented below.

org_type

One of c("euk", "gram-", "gram+"), defaults to "euk". Which model should be used for prediction.

Dcut_type

One of c("default", "sensitive", "user"), defaults to "default". The default cutoff values for SignalP 4 are chosen to optimize the performance measured as Matthews Correlation Coefficient (MCC). This results in a lower sensitivity (true positive rate) than SignalP 3.0 had. Setting this argument to "sensitive" will yield the same sensitivity as SignalP 3.0. This will make the false positive rate slightly higher, but still better than that of SignalP 3.0.

Dcut_noTM

A numeric value, with range 0 - 1, defaults to 0.45. For experimenting with cutoff values.

Dcut_TM

A numeric value, with range 0 - 1, defaults to 0.5. For experimenting with cutoff values.

method

One of c("best", "notm"), defaults to "best". SignalP 4.1 contains two types of neural networks. SignalP-TM has been trained with sequences containing transmembrane segments in the data set, while SignalP-noTM has been trained without those sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to determine whether to use SignalP-TM or SignalP-noTM in the final prediction (if 4 or more positions are predicted to be in a transmembrane state, SignalP-TM is used, otherwise SignalP-noTM). An exception is Gram-positive bacteria, where SignalP-TM is always used. If you are confident that there are no transmembrane segments in your data, you can get slightly better performance by choosing "notm" (the server option "Input sequences do not include TM regions"), which tells SignalP 4.1 to always use SignalP-noTM.

minlen

An integer value corresponding to the minimal predicted signal peptide length, by default 10. SignalP 4.0 could, in rare cases, erroneously predict signal peptides shorter than 10 residues. SignalP 4.1 eliminates these errors by imposing a lower limit on the cleavage site position (signal peptide length). The limit can be adjusted; note that signal peptides shorter than 15 residues are very rare. To disable the length restriction completely, enter 0 (zero).

trunc

An integer value corresponding to the N-terminal truncation of the input sequence, by default 70. The predictor truncates each sequence to at most 70 residues before submitting it to the neural networks. To predict extremely long signal peptides, try a higher value, or disable truncation completely by entering 0 (zero).

splitter

An integer indicating the number of sequences in each .fasta file that is sent to the server. Default is 1000. Change only in case of a server-side error. Accepted values are in the range 1 to 2000.

attempts

Integer, number of attempts if the server is unresponsive, by default 2.

progress

Logical, whether to show the progress bar, by default FALSE.

sequence

A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.

id

A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
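The behaviour described for trunc, splitter and method = "best" can be pictured with a few small helpers. This is only a rough sketch of the documented rules, not the code used by ragp or by the SignalP server; the function names truncate_seq, chunk_indices and choose_network_best are hypothetical and exist only for illustration.

```r
# Illustrative helpers only; names are hypothetical and this is not the
# ragp or SignalP implementation.

# N-terminal truncation as described for `trunc`: keep the first `trunc`
# residues, or the whole sequence when trunc is 0.
truncate_seq <- function(sequence, trunc = 70L) {
  if (trunc == 0L) return(sequence)
  substr(sequence, 1L, trunc)
}

# Batching as described for `splitter`: group sequence indices into
# consecutive chunks of at most `splitter` elements.
chunk_indices <- function(n, splitter = 1000L) {
  stopifnot(splitter >= 1L, splitter <= 2000L)
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / splitter)))
}

# Network choice as described for method = "best": SignalP-TM when 4 or
# more positions are predicted to be transmembrane, and always for
# Gram-positive input; otherwise SignalP-noTM.
choose_network_best <- function(n_tm_positions, org_type = "euk") {
  if (org_type == "gram+" || n_tm_positions >= 4L) {
    "SignalP-TM"
  } else {
    "SignalP-noTM"
  }
}

# Example: 2500 sequences with the default splitter give three batches.
batch_sizes <- lengths(chunk_indices(2500L))
```

With the default splitter, 2500 input sequences would be sent as batches of 1000, 1000 and 500 sequences under this sketch.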

Source

https://services.healthtech.dtu.dk/service.php?SignalP-4.1

Value

A data frame with columns:

id

Character, as from input

Cmax

Numeric, C-score (raw cleavage site score). The output from the CS networks, which are trained to distinguish signal peptide cleavage sites from everything else. Note the position numbering of the cleavage site: the C-score is trained to be high at the position immediately after the cleavage site (the first residue in the mature protein).

Cmax.pos

Integer, position of Cmax, that is, the position immediately after the cleavage site (the first residue in the mature protein).

Ymax

Numeric, Y-score (combined cleavage site score). A combination (geometric average) of the C-score and the slope of the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is because multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep.

Ymax.pos

Integer, position of Ymax

Smax

Numeric, S-score (signal peptide score). The output from the SP networks, which are trained to distinguish positions within signal peptides from positions in the mature part of the proteins and from proteins without signal peptides.

Smax.pos

Integer, position of Smax

Smean

Numeric, the average S-score of the possible signal peptide (from position 1 to the position immediately before the maximal Y-score).

Dmean

Numeric, D-score (discrimination score). A weighted average of the mean S and the max. Y scores. This is the score that is used to discriminate signal peptides from non-signal peptides.

is.sp

Character, whether the sequence contains an N-terminal signal peptide ("Y" or "N").

Dmaxcut

Numeric, as from input: Dcut_noTM if the SignalP-noTM network was used and Dcut_TM if the SignalP-TM network was used.

Networks.used

Character, which network was used for the prediction: SignalP-noTM or SignalP-TM

is.signalp

Logical, did SignalP predict the presence of a signal peptide

sp.length

Integer, length of the predicted signal peptide.
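The relation between Dmean, Dmaxcut and is.signalp can be checked on a few rows of the example output shown on this page. The sketch below assumes that a signal peptide is called when the D-score is strictly greater than the cutoff; the strict inequality is an assumption, and the column name called is hypothetical.

```r
# Values copied from rows 1, 5 and 8 of the example output on this page.
pred <- data.frame(
  id = c("ATCG00660.1", "AT2G19580.1", "AT2G33130.1"),
  Dmean = c(0.154, 0.671, 0.748),
  Dmaxcut = c(0.450, 0.500, 0.450),
  Networks.used = c("SignalP-noTM", "SignalP-TM", "SignalP-noTM"),
  is.signalp = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Assumed decision rule: D-score above the network-specific cutoff.
pred$called <- pred$Dmean > pred$Dmaxcut

# The assumed rule agrees with the is.signalp values reported for these rows.
agrees <- identical(pred$called, pred$is.signalp)
```

Note that the cutoff differs by row: the SignalP-TM row is compared against 0.5 (Dcut_TM) and the SignalP-noTM rows against 0.45 (Dcut_noTM).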

Note

This function creates temporary files in the working directory.
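If temporary files in the current working directory are unwanted, one possible workaround is to evaluate the call inside a throwaway directory. The helper below is a hypothetical base R sketch and not part of ragp; it assumes the function only needs the working directory for its own temporary files.

```r
# Hypothetical helper, not part of ragp: evaluate `expr` with the working
# directory switched to a fresh temporary directory, then restore the
# original directory and delete the temporary one.
with_temp_wd <- function(expr) {
  tmp <- tempfile("signalp_")
  dir.create(tmp)
  old <- setwd(tmp)
  on.exit({
    setwd(old)
    unlink(tmp, recursive = TRUE)
  }, add = TRUE)
  # `expr` is a lazily evaluated promise, so it runs here, inside `tmp`.
  force(expr)
}

# Demonstration without contacting the server: the expression sees the
# temporary directory, and the original directory is restored afterwards.
wd_before <- getwd()
wd_inside <- with_temp_wd(getwd())
wd_after <- getwd()

# Intended use (not run here, requires ragp and server access):
# signalp_pred <- with_temp_wd(
#   ragp::get_signalp(data = at_nsp[1:10, ], sequence, Transcript.id)
# )
```

A relative .fasta path passed as data would no longer resolve inside the temporary directory, so an absolute path would be needed in that case.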

References

Petersen TN, Brunak S, von Heijne G, Nielsen H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8: 785-786

Examples

library(ragp)
signalp_pred <- get_signalp(data = at_nsp[1:10,], sequence, Transcript.id)
signalp_pred
#>             id  Cmax Cmax.pos  Ymax Ymax.pos  Smax Smax.pos Smean Dmean is.sp
#> 1  ATCG00660.1 0.210       30 0.162       30 0.260        2 0.146 0.154     N
#> 2  AT2G43600.1 0.860       23 0.894       23 0.984       13 0.930 0.913     Y
#> 3  AT2G28410.1 0.779       23 0.826       23 0.940       15 0.877 0.853     Y
#> 4  AT2G22960.1 0.701       23 0.790       23 0.948       15 0.891 0.844     Y
#> 5  AT2G19580.1 0.422       26 0.586       26 0.885       17 0.798 0.671     Y
#> 6  AT2G19690.2 0.797       29 0.870       29 0.987       18 0.952 0.914     Y
#> 7  AT2G19690.1 0.797       29 0.870       29 0.987       18 0.952 0.914     Y
#> 8  AT2G33130.1 0.318       33 0.530       27 0.990       16 0.933 0.748     Y
#> 9  AT2G05520.1 0.633       24 0.782       24 0.990       16 0.966 0.881     Y
#> 10 AT2G05520.2 0.633       24 0.782       24 0.990       16 0.966 0.881     Y
#>    Dmaxcut Networks.used is.signalp sp.length
#> 1    0.450  SignalP-noTM      FALSE        30
#> 2    0.450  SignalP-noTM       TRUE        23
#> 3    0.450  SignalP-noTM       TRUE        23
#> 4    0.450  SignalP-noTM       TRUE        23
#> 5    0.500    SignalP-TM       TRUE        26
#> 6    0.450  SignalP-noTM       TRUE        29
#> 7    0.450  SignalP-noTM       TRUE        29
#> 8    0.450  SignalP-noTM       TRUE        27
#> 9    0.450  SignalP-noTM       TRUE        24
#> 10   0.450  SignalP-noTM       TRUE        24