Query the HMMER web server.

The HMMER web server offers biosequence analysis using profile hidden Markov models. This function searches protein sequences against a profile-HMM database (Pfam-A) with hmmscan.

get_hmm(data, ...)

# S3 method for character
get_hmm(data, ...)

# S3 method for data.frame
get_hmm(data, sequence, id, ...)

# S3 method for list
get_hmm(data, ...)

# S3 method for default
get_hmm(
  data = NULL,
  sequence,
  id,
  verbose = FALSE,
  sleep = 1,
  attempts = 2L,
  timeout = 10,
  progress = FALSE,
  ievalue = NULL,
  bitscore = NULL,
  ...
)

# S3 method for AAStringSet
get_hmm(data, ...)

Arguments

data	A data frame with protein amino acid sequences as strings in one column and the corresponding ids in another. Alternatively, a path to a .fasta file with protein sequences, a list with elements of class `SeqFastaAA` resulting from a `read.fasta` call, or an `AAStringSet` object. Leave blank if vectors are provided to the sequence and id arguments.
...	Currently no additional arguments are accepted apart from the ones documented below.
sequence	A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to the data argument. Leave blank if a .fasta file path or a list with elements of class `SeqFastaAA` is provided to data.
id	A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to the data argument. Leave blank if a .fasta file path or a list with elements of class `SeqFastaAA` is provided to data.
verbose	Logical, whether to print the output for each sequence. Defaults to FALSE.
sleep	Numeric, pause in seconds between server calls. Defaults to 1.
attempts	Integer, number of attempts if the server is unresponsive. Defaults to 2.
timeout	Numeric, time in seconds to wait for a server response. Defaults to 10.
progress	Logical, whether to show a progress bar. Defaults to FALSE.
ievalue	Numeric, only hits with an independent E-value less than or equal to this value are retained in the output. Use it to filter out low-similarity matches. If set, some queried sequences may be absent from the output. Suggested values: 1e-2 to 1e-5.
bitscore	Numeric, only hits with a bit score greater than or equal to this value are retained in the output. Use it to filter out low-similarity matches. If set, some queried sequences may be absent from the output. Suggested values: 10 to 20.
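Assuming the ievalue and bitscore thresholds are applied per domain hit (one row of the output), a sequence drops out of the result only when none of its hits pass. The minimal sketch below illustrates these rules on a small data frame built from values in the example output; `hits` and `filter_hits()` are hypothetical names used for illustration only, not part of the package, and get_hmm may implement its filtering differently.

```r
# Toy hit table: id/ievalue/bitscore values copied from the example output.
# The last row mimics a sequence without any Pfam hit (all NA).
hits <- data.frame(
  id       = c("AT2G43600.1", "AT2G43600.1", "AT2G43600.1",
               "AT2G22960.1", "AT2G22960.1", "AT2G28410.1"),
  ievalue  = c(9.1e+03, 2.3e-30, 1.5e-04, 8.9e+02, 6.1e-36, NA),
  bitscore = c(-3.7497714, 106.1714096, 21.6809082, -0.8947501, 124.8126907, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): keep rows whose ievalue is
# <= the ievalue threshold and whose bitscore is >= the bitscore threshold.
# A NULL threshold means "do not filter on this column".
filter_hits <- function(hits, ievalue = NULL, bitscore = NULL) {
  keep <- rep(TRUE, nrow(hits))
  if (!is.null(ievalue))  keep <- keep & hits$ievalue  <= ievalue
  if (!is.null(bitscore)) keep <- keep & hits$bitscore >= bitscore
  # which() drops NA comparisons, so a sequence without hits (NA row)
  # is discarded as soon as any threshold is set.
  hits[which(keep), , drop = FALSE]
}

filter_hits(hits)                   # no thresholds: all 6 rows kept
filter_hits(hits, ievalue = 1e-5)   # 2 rows with very small E-values
filter_hits(hits, bitscore = 20)    # 3 rows with bit score >= 20
```

With `ievalue = 1e-5`, AT2G28410.1 (which has no hit) and AT2G43600.1's weaker hits are gone, which is how a queried sequence can disappear from the output entirely.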

Source

https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan

Value

A data frame with columns:

id: Character, as supplied in the function call
name: Character, Pfam family name
acc: Character, Pfam family accession
desc: Character, Pfam family description
clan: Character, Pfam clan
align_start: Numeric, start of domain alignment in query sequence
align_end: Numeric, end of domain alignment in query sequence
model_start: Numeric, start of alignment in domain model
model_end: Numeric, end of alignment in domain model
ievalue: Numeric, the "independent E-value", the E-value that the sequence/profile comparison would have received if this were the only domain envelope found in it, excluding any others. This is a stringent measure of how reliable this particular domain may be. The independent E-value uses the total number of targets in the target database.
cevalue: Numeric, the "conditional E-value", a permissive measure of how reliable this particular domain may be.
bitscore: Numeric, the domain bit score.
reported: Logical, whether the hit is reported on the HMMER website. The HMMER web server returns more profile-HMM matches than it presents to the user; results below a certain threshold are not reported (hidden) on the site.
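In the example output, the sequence without a Pfam match (AT2G28410.1) appears as a single row of NA values with `reported` set to FALSE. A minimal sketch for reducing a result to the hits the HMMER website would display, assuming the column names listed above; `res` and `shown` are illustrative names, and the values are copied from the example output.

```r
# Toy result with a subset of the documented columns (values from the example output).
res <- data.frame(
  id       = c("ATCG00660.1", "AT2G43600.1", "AT2G43600.1",
               "AT2G43600.1", "AT2G28410.1"),
  name     = c("Ribosomal_L20", "Glyco_hydro_19", "Glyco_hydro_19",
               "Glyco_hydro_19", NA),
  reported = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep hits shown on the HMMER website: `%in% TRUE` treats NA as FALSE, and
# the is.na() check also drops placeholder rows for sequences without hits.
shown <- res[res$reported %in% TRUE & !is.na(res$name), , drop = FALSE]

# Number of displayed domains per sequence identifier.
table(shown$id)
```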

Note

hmmscan does not handle sequences longer than 1000 amino acids. get_hmm therefore splits such sequences into shorter substrings that overlap by 300 amino acids and queries hmmscan with each of them. In this case some results may be redundant or partially overlapping. When this is an issue, provide a subsequence of appropriate length as get_hmm input.
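The splitting described above can be pictured as a sliding window: windows of at most 1000 residues, each starting 700 residues after the previous one so that neighbouring windows share 300 residues. This is a minimal sketch under that assumption; `split_overlapping()` is a hypothetical helper, not part of the package, and the exact window placement used by get_hmm (for example, how it handles a short final window) may differ.

```r
# Hypothetical helper (not part of the package): split one sequence into
# windows of at most `max_len` residues, consecutive windows overlapping by
# `overlap` residues. Returns window coordinates and the substrings.
split_overlapping <- function(x, max_len = 1000L, overlap = 300L) {
  n <- nchar(x)
  if (n <= max_len) {
    return(data.frame(start = 1L, end = n, subseq = x,
                      stringsAsFactors = FALSE))
  }
  step <- max_len - overlap            # 700 with the defaults
  starts <- 1L
  # Add windows until the last one reaches the end of the sequence.
  while (starts[length(starts)] + max_len - 1L < n) {
    starts <- c(starts, starts[length(starts)] + step)
  }
  ends <- pmin(starts + max_len - 1L, n)
  data.frame(start = starts, end = ends,
             subseq = substring(x, starts, ends),
             stringsAsFactors = FALSE)
}

# A 2500-residue toy sequence yields windows starting at 1, 701, 1401 and 2101.
long_seq <- paste(rep("A", 2500), collapse = "")
split_overlapping(long_seq)[, c("start", "end")]
```

In this sketch, a domain found in the 300-residue overlap of two windows is seen by both queries, which is why the results for long sequences can contain redundant or partially overlapping hits.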

Examples


library(ragp)  # provides get_hmm() and the at_nsp example data
pfam_pred <- get_hmm(data = at_nsp[1:5,],
                     sequence = sequence,
                     id = Transcript.id,
                     verbose = FALSE,
                     sleep = 0)
pfam_pred
#>            id           name        acc                    desc   clan
#> 1 ATCG00660.1  Ribosomal_L20 PF00453.20   Ribosomal protein L20   <NA>
#> 2 AT2G43600.1 Glyco_hydro_19 PF00182.21       Chitinase class I   <NA>
#> 3 AT2G43600.1 Glyco_hydro_19 PF00182.21       Chitinase class I CL0037
#> 4 AT2G43600.1 Glyco_hydro_19 PF00182.21       Chitinase class I   <NA>
#> 5 AT2G28410.1           <NA>       <NA>                    <NA>   <NA>
#> 6 AT2G22960.1  Peptidase_S10 PF00450.24 Serine carboxypeptidase   <NA>
#> 7 AT2G22960.1  Peptidase_S10 PF00450.24 Serine carboxypeptidase CL0028
#> 8 AT2G19580.1    Tetraspanin PF00335.22      Tetraspanin family CL0347
#>   align_start align_end model_start model_end ievalue cevalue    bitscore
#> 1           3       108           1       104 2.0e-32 1.0e-36 111.5875473
#> 2          38        54          76        93 9.1e+03 4.8e-01  -3.7497714
#> 3          83       219           1       156 2.3e-30 1.2e-34 106.1714096
#> 4         227       273         185       232 1.5e-04 8.1e-09  21.6809082
#> 5          NA        NA          NA        NA      NA      NA          NA
#> 6          30        48           1        19 8.9e+02 4.6e-02  -0.8947501
#> 7          48       181         276       416 6.1e-36 3.2e-40 124.8126907
#> 8           6       251           3       230 3.3e-30 1.7e-34 105.5670090
#>   reported
#> 1     TRUE
#> 2    FALSE
#> 3     TRUE
#> 4    FALSE
#> 5    FALSE
#> 6    FALSE
#> 7     TRUE
#> 8     TRUE