Query the SignalP web server.

The SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method combines a cleavage site prediction with a signal peptide/non-signal peptide prediction, both based on a combination of several artificial neural networks.

get_signalp(data, ...)

# S3 method for character
get_signalp(
  data,
  org_type = c("euk", "gram-", "gram+"),
  Dcut_type = c("default", "sensitive", "user"),
  Dcut_noTM = 0.45,
  Dcut_TM = 0.5,
  method = c("best", "notm"),
  minlen = NULL,
  trunc = 70L,
  splitter = 1000L,
  attempts = 2,
  progress = FALSE,
  ...
)

# S3 method for data.frame
get_signalp(data, sequence, id, ...)

# S3 method for list
get_signalp(data, ...)

# S3 method for default
get_signalp(data = NULL, sequence, id, ...)

# S3 method for AAStringSet
get_signalp(data, ...)

Arguments

data	A data frame with protein amino acid sequences as strings in one column and corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class `SeqFastaAA` resulting from a `read.fasta` call, or an `AAStringSet` object. Should be left blank if vectors are provided to the sequence and id arguments.
...	Currently no additional arguments are accepted apart from the ones documented below.
org_type	One of c("euk", "gram-", "gram+"), defaults to "euk". The organism model to use for prediction.
Dcut_type	One of c("default", "sensitive", "user"), defaults to "default". The default cutoff values for SignalP 4 are chosen to optimize the performance measured as Matthews Correlation Coefficient (MCC). This results in a lower sensitivity (true positive rate) than SignalP 3.0 had. Setting this argument to "sensitive" will yield the same sensitivity as SignalP 3.0. This will make the false positive rate slightly higher, but still better than that of SignalP 3.0.
Dcut_noTM	A numeric value in the range 0 - 1, defaults to 0.45. The D-score cutoff for the SignalP-noTM network, for experimenting with cutoff values.
Dcut_TM	A numeric value in the range 0 - 1, defaults to 0.5. The D-score cutoff for the SignalP-TM network, for experimenting with cutoff values.
method	One of c("best", "notm"), defaults to "best". SignalP 4.1 contains two types of neural networks. SignalP-TM has been trained with sequences containing transmembrane segments in the data set, while SignalP-noTM has been trained without those sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to determine whether to use SignalP-TM or SignalP-noTM in the final prediction (if 4 or more positions are predicted to be in a transmembrane state, SignalP-TM is used, otherwise SignalP-noTM). An exception is Gram-positive bacteria, where SignalP-TM is always used. If you are confident that there are no transmembrane segments in your data, you can get slightly better performance by setting this argument to "notm" (the "Input sequences do not include TM regions" option on the web server), which tells SignalP 4.1 to always use SignalP-noTM.
minlen	An integer value corresponding to the minimal predicted signal peptide length, by default set to 10. SignalP 4.0 could, in rare cases, erroneously predict signal peptides shorter than 10 residues. SignalP 4.1 eliminates these errors by imposing a lower limit on the cleavage site position (signal peptide length). The minimum length is 10 by default, but it can be adjusted. Signal peptides shorter than 15 residues are very rare. To disable this length restriction completely, enter 0 (zero).
trunc	An integer value corresponding to the N-terminal truncation of the input sequence, by default set to 70. The predictor truncates each sequence to at most 70 residues before submitting it to the neural networks. To predict extremely long signal peptides, try a higher value, or disable truncation completely by entering 0 (zero).
splitter	An integer indicating the number of sequences in each .fasta file that is sent to the server. Default is 1000. Change only in case of a server-side error. Accepted values are in the range 1 to 2000.
attempts	Integer, number of attempts if the server is unresponsive, defaults to 2.
progress	Boolean, whether to show the progress bar, defaults to FALSE.
sequence	A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
id	A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. If a .fasta file path or a list with elements of class "SeqFastaAA" is provided to data, this should be left blank.
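
The behaviour described for the trunc, splitter and method arguments can be sketched in plain R. This is only an illustration under stated assumptions: the helper names `truncate_seq`, `batch_indices` and `choose_network` are hypothetical and are not part of the package or of SignalP, and the precedence of method = "notm" over the Gram-positive exception is an assumption based on the argument description.

```r
# Hypothetical helpers illustrating the documented behaviour;
# they are not part of the package or of SignalP itself.

# Keep only the first `trunc` residues; 0 disables truncation.
truncate_seq <- function(sequence, trunc = 70L) {
  if (trunc == 0L) return(sequence)
  substr(sequence, 1L, trunc)
}

# Split the indices of `n` sequences into batches of at most `splitter`.
batch_indices <- function(n, splitter = 1000L) {
  idx <- seq_len(n)
  unname(split(idx, ceiling(idx / splitter)))
}

# Choose the network as described for the `method` argument.
# `n_tm_positions` stands for the number of positions that the SignalP-TM
# preprocessor predicts to be in a transmembrane state.
choose_network <- function(n_tm_positions, org_type = "euk", method = "best") {
  if (method == "notm") return("SignalP-noTM")   # assumed to take precedence
  if (org_type == "gram+") return("SignalP-TM")  # Gram-positive exception
  if (n_tm_positions >= 4) "SignalP-TM" else "SignalP-noTM"
}
```

For instance, `lengths(batch_indices(2500L))` shows how 2500 sequences would be spread over three files under this sketch, and `choose_network(3)` falls back to SignalP-noTM because fewer than 4 positions are in a transmembrane state.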

Source

https://services.healthtech.dtu.dk/service.php?SignalP-4.1

Value

A data frame with columns:

id: Character, as from input
Cmax: Numeric, C-score (raw cleavage site score). The output from the CS networks, which are trained to distinguish signal peptide cleavage sites from everything else. Note the position numbering of the cleavage site: the C-score is trained to be high at the position immediately after the cleavage site (the first residue in the mature protein).
Cmax.pos: Integer, position of Cmax, that is, the position immediately after the cleavage site (the first residue in the mature protein).
Ymax: Numeric, Y-score (combined cleavage site score). A combination (geometric average) of the C-score and the slope of the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is because multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep.
Ymax.pos: Integer, position of Ymax
Smax: Numeric, S-score (signal peptide score). The output from the SP networks, which are trained to distinguish positions within signal peptides from positions in the mature part of the proteins and from proteins without signal peptides.
Smax.pos: Integer, position of Smax
Smean: Numeric, the average S-score of the possible signal peptide (from position 1 to the position immediately before the maximal Y-score)
Dmean: Numeric, D-score (discrimination score). A weighted average of the mean S and the max. Y scores. This is the score that is used to discriminate signal peptides from non-signal peptides.
is.sp: Character, whether the sequence contains an N-terminal signal peptide (N-sp)
Dmaxcut: Numeric, as from input, Dcut_noTM if SignalP-noTM network used and Dcut_TM if SignalP-TM network used
Networks.used: Character, which network was used for the prediction: SignalP-noTM or SignalP-TM
is.signalp: Logical, did SignalP predict the presence of a signal peptide
sp.length: Integer, length of the predicted signal peptide.
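
How the summary columns relate to each other can be illustrated with a small worked example. The per-position scores below are made-up values, the weight `w` is a hypothetical placeholder (the actual weights used by SignalP are not given here), and both the comparison of the D-score against the cutoff and the derivation of the peptide length from the Ymax position are assumptions based on the column descriptions above.

```r
# Toy per-position scores for a short sequence (made-up values).
s_score <- c(0.90, 0.95, 0.92, 0.88, 0.30, 0.10, 0.05)
y_score <- c(0.10, 0.12, 0.15, 0.20, 0.70, 0.25, 0.10)

# Ymax and its position (first residue of the mature protein).
ymax_pos <- which.max(y_score)
ymax <- y_score[ymax_pos]

# Smean: average S-score from position 1 to the position before Ymax.
smean <- mean(s_score[seq_len(ymax_pos - 1L)])

# D-score: weighted average of Smean and Ymax.
w <- 0.5  # hypothetical weight, not the value used by SignalP
d_score <- w * smean + (1 - w) * ymax

# Compare against the cutoff (Dcut_noTM default shown here).
dmaxcut <- 0.45
is_signalp <- d_score > dmaxcut  # assumed decision rule

# Assumed: the signal peptide ends just before the Ymax position.
sp_length <- ymax_pos - 1L
```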

Note

This function creates temporary files in the working directory.

References

Petersen TN, Brunak S, von Heijne G, Nielsen H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8: 785-786

Examples
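
A minimal sketch of a call using the data.frame method. It assumes the function is available from the ragp package, that the server is reachable, and that column names can be passed unquoted to sequence and id. The two sequences are made up for illustration and are not real proteins. The query itself only runs in an interactive session, since it needs network access.

```r
# Made-up amino acid sequences, for illustration only.
seqs <- data.frame(
  id = c("protein_1", "protein_2"),
  sequence = c(
    "MKLLVLSLALLAVASAQDTPEKLCGRWYVHNSFDPTKEVGALM",
    "MSDNGTEQKRPWYHVFDLKEAGRTNIPSQCMVHWEDKLTAGFR"
  ),
  stringsAsFactors = FALSE
)

# Query the server only when run interactively and the package
# (assumed here to be ragp) is installed.
if (interactive() && requireNamespace("ragp", quietly = TRUE)) {
  pred <- ragp::get_signalp(
    seqs,
    sequence = sequence,  # column holding the sequences
    id = id,              # column holding the identifiers
    org_type = "euk",
    progress = TRUE
  )
  print(pred[, c("id", "is.signalp", "sp.length")])
}
```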