This short section explains how to get started with ragp, from installation and basic function arguments to manipulating function outputs.
There are several ways to install R packages hosted on GitHub; the simplest is to use remotes::install_github(), which performs all the required steps automatically.
To install ragp, run:
# install.packages("remotes") #if not present
# install.packages("git2r") #if not present
remotes::install_github("missuse/ragp")
Alternatively, to also build the vignettes, run:
# install.packages("remotes")
# install.packages("git2r")
remotes::install_git("https://github.com/missuse/ragp",
                     build_vignettes = TRUE)
The vignettes can then be viewed by:
browseVignettes("ragp")
Most ragp functions require single-letter protein sequences and the corresponding identifiers as input. These can be provided as basic R data types such as vectors or data frames. Additionally, ragp imports the seqinr package for the manipulation of .FASTA files, so the input can also be a list of SeqFastaAA objects returned by seqinr::read.fasta(). The path to a .FASTA file is also accepted as input. As of ragp version 0.3.5, objects of class AAStringSet are also supported.
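Because AAStringSet input depends on the installed version, it can be worth checking before relying on it. The following is a minimal sketch using only base R; the variable names has_ragp and supports_aastringset are illustrative and not part of ragp.

```r
# Is ragp installed at all? requireNamespace() returns FALSE instead of erroring.
has_ragp <- requireNamespace("ragp", quietly = TRUE)

# AAStringSet input is described as available from version 0.3.5 onwards.
# The && short-circuits, so packageVersion() is only called if ragp is present.
supports_aastringset <- has_ragp &&
  utils::packageVersion("ragp") >= "0.3.5"

if (!supports_aastringset) {
  message("AAStringSet input needs ragp >= 0.3.5; consider reinstalling.")
}
```

If the check fails, the other input types (vectors, data frames, SeqFastaAA lists or a file path) remain available.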
The input options are illustrated below using the scan_ag() function:
Provide a character vector of protein sequences to the sequence argument and a character vector of protein identifiers to the id argument:
library(ragp)
data(at_nsp) #a data frame of 2700 Arabidopsis sequences
input1 <- scan_ag(sequence = at_nsp$sequence,
id = at_nsp$Transcript.id)
Provide a data.frame to the data argument, and the names of the columns containing the protein sequences and corresponding identifiers to the sequence and id arguments:
input2 <- scan_ag(data = at_nsp,
sequence = "sequence",
id = "Transcript.id")
Quoting the column names is not necessary:
input3 <- scan_ag(data = at_nsp,
sequence = sequence,
id = Transcript.id)
Provide a list of SeqFastaAA objects to the data argument:
library(seqinr) #to create a fasta file with protein sequences
#write a FASTA file
seqinr::write.fasta(sequences = strsplit(at_nsp$sequence, ""),
                    names = at_nsp$Transcript.id, file.out = "at_nsp.fasta")
#read a FASTA file to a list of SeqFastaAA objects
At_seq_fas <- read.fasta("at_nsp.fasta",
seqtype = "AA",
as.string = TRUE)
input4 <- scan_ag(data = At_seq_fas)
Provide the path to the .FASTA file to be analyzed as a string:
input5 <- scan_ag(data = "at_nsp.fasta") #file at_nsp.fasta is in the working directory
Provide an AAStringSet object to the data argument:
dat <- Biostrings::readAAStringSet("at_nsp.fasta") #file at_nsp.fasta is in the working directory
input6 <- scan_ag(data = dat)
All of the outputs are equal:
all.equal(input1,
input2)
#> [1] TRUE
all.equal(input1,
input3)
#> [1] TRUE
all.equal(input1,
input4)
#> [1] TRUE
all.equal(input1,
input5)
#> [1] TRUE
all.equal(input1,
input6)
#> [1] TRUE
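The pairwise checks above can be condensed into a single call. The following is a minimal sketch in base R; the toy data frames stand in for input1 to input6, and the helper name all_match is hypothetical, not a ragp function.

```r
# Toy stand-ins for the scan_ag() outputs (hypothetical values).
ref <- data.frame(id = c("a", "b"), hits = c(1L, 0L))
outputs <- list(input1 = ref, input2 = ref, input3 = ref)

# Compare every element against the first one.
# all.equal() returns TRUE or a character description of the differences,
# so isTRUE() is used to turn the result into a plain logical.
all_match <- function(x) {
  vapply(x[-1], function(el) isTRUE(all.equal(x[[1]], el)), logical(1))
}

all_match(outputs)
```

The result is a named logical vector with one entry per compared object, so a mismatch can be traced to the input type that produced it.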
The only exceptions to this design are the plotting function plot_prot(), which requires protein sequences to be supplied as character vectors (as in the input1 example above), and pfam2go(), which does not take sequences as input.
All ragp functions return basic R data structures such as data frames, lists of vectors and lists of data frames, which makes them convenient to manipulate for anyone familiar with R. An especially effective way to work with these objects is the tidyverse collection of packages, in particular dplyr and ggplot2. Several dplyr functions that are especially handy for data wrangling are:
dplyr::left_join()
dplyr::mutate()
dplyr::group_by()
dplyr::summarise()
dplyr::filter()
dplyr::distinct()
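To make the list above concrete, here is a minimal sketch that chains these verbs. The two toy data frames and their column names (id, n_motifs, family) are invented for illustration and do not reproduce actual ragp output.

```r
library(dplyr)

# Hypothetical per-sequence results and a hypothetical annotation table.
hits <- data.frame(
  id = c("p1", "p2", "p3", "p3"),
  n_motifs = c(3L, 0L, 5L, 5L)
)
annot <- data.frame(
  id = c("p1", "p2", "p3"),
  family = c("A", "A", "B")
)

per_family <- hits %>%
  distinct() %>%                         # drop the duplicated p3 row
  left_join(annot, by = "id") %>%        # attach the family label
  mutate(has_motif = n_motifs > 0) %>%   # flag sequences with any motif
  filter(has_motif) %>%                  # keep only flagged sequences
  group_by(family) %>%                   # one group per family
  summarise(n_seq = n(), mean_motifs = mean(n_motifs))

per_family
```

The same pattern applies when joining the output of one ragp function to another by the sequence identifier column.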
Examples of using these functions on objects returned by ragp functions are provided in the HRGP filtering and HRGP analysis tutorials. Many further examples of these functions are available online.
Clear visualizations are usually the goal of the data manipulations mentioned above. The current gold standard for R graphics is the ggplot2 package, and we recommend it for summarizing the data graphically. Additionally, ragp contains the plot_prot() function, which is a wrapper around ggplot2. While plot_prot() can be used without knowing ggplot2 syntax, tweaking the plot style requires at least a basic knowledge of ggplot2. Examples are provided in the protein sequence visualization tutorial.
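As a rough illustration of summarizing results graphically, the sketch below draws a bar chart with ggplot2. The data frame is made up for the example, and the sketch does not use plot_prot(), whose arguments are covered in the visualization tutorial.

```r
library(ggplot2)

# Hypothetical summary table: number of sequences per class.
counts <- data.frame(
  class = c("A", "B", "C"),
  n_seq = c(12L, 7L, 3L)
)

# Build the plot object; layers and styling are added with `+`.
p <- ggplot(counts, aes(x = class, y = n_seq)) +
  geom_col() +
  labs(x = "Class", y = "Number of sequences") +
  theme_bw()

p
```

Styling is layered onto a ggplot object with `+`, as theme_bw() and labs() are here, which is why some ggplot2 knowledge helps when adjusting plots.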