Hydroxyproline-rich glycoproteins (HRGPs) are one of the most complex families of macromolecules found in plants, owing both to the diversity of glycans decorating the protein backbone and to the heterogeneity of the protein backbones themselves. While this diversity underlies the wide array of physiological functions associated with HRGPs, it hinders homology-based identification. Current approaches, which identify sequences with characteristic motifs and biased amino acid composition, are limited to prototypical sequences.
ragp is an R package for mining and analysis of HRGPs, with emphasis on arabinogalactan protein (AGP) sequences. The ragp filtering pipeline exploits one of the key features of HRGPs: the presence of hydroxyprolines, which represent glycosylation sites. The main package features include prediction of proline hydroxylation sites; amino acid motif and bias analyses; efficient communication with web servers for prediction of N-terminal signal peptides and glycosylphosphatidylinositol modification sites; and the ability to annotate sequences through CDD or hmmscan, with subsequent GO enrichment based on predicted Pfam domains.
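To make the hydroxyproline-based filtering idea concrete, here is a minimal base R sketch of the final filtering step. It assumes per-proline hydroxylation probabilities are already available in a data frame with hypothetical columns `id`, `position` and `prob`; this layout, the 0.5 cutoff and the helper `filter_by_hyp()` are illustrative assumptions, not the documented output or interface of `predict_hyp()`.

```r
# Hypothetical per-proline predictions: the column names (id, position, prob)
# and all values are made up for illustration; they are not predict_hyp() output.
hyp <- data.frame(
  id       = c("seq1", "seq1", "seq1", "seq2", "seq2", "seq3"),
  position = c(25, 27, 31, 40, 42, 12),
  prob     = c(0.91, 0.78, 0.66, 0.85, 0.20, 0.95),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the ids of sequences with at least `min_hyp`
# prolines whose hydroxylation probability is >= `cutoff`.
filter_by_hyp <- function(pred, cutoff = 0.5, min_hyp = 3) {
  hits <- pred[pred$prob >= cutoff, , drop = FALSE]  # keep likely hydroxyprolines
  counts <- table(hits$id)                           # sites per sequence
  names(counts)[counts >= min_hyp]
}

filter_by_hyp(hyp)               # "seq1"
filter_by_hyp(hyp, min_hyp = 1)  # "seq1" "seq2" "seq3"
```

Keeping the probability cutoff and the minimum number of sites as arguments mirrors the "three or more potential hydroxyprolines" example from the filtering layer while leaving both easy to tune.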
The workflow in ragp is illustrated with the following diagram (ragp functions to be used for each of the tasks are boxed grey):
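The filtering layer:

1. Predict the presence of N-terminal signal peptides (`get_signalp5()`, which queries SignalP5 (Almagro Armenteros et al. 2019)).
2. Predict the probability of proline hydroxylation in sequences with signal peptides (`predict_hyp()`) and filter sequences containing several (for example three or more) potential hydroxyprolines.

The analysis layer:

1. Find arabinogalactan (AG) glycomodules (`scan_ag()`) among the hydroxyproline-containing sequences.
2. Use motif and amino acid bias analysis (`maab()`) to classify hydroxyproline-containing sequences.
3. Find known Pfam domains using hmmscan (`get_hmm()`) and enrich them with gene ontology (GO) terms (`pfam2go()`). Alternatively, annotate domains using CDD.
4. Predict glycosylphosphatidylinositol (GPI) modification sites (`get_big_pi()`, `get_pred_gpi()` and `get_netGPI()`).
5. Predict disordered regions (`get_espritz()`).
6. Predict transmembrane regions (`get_phobius()` and `get_tmhmm()`).

Additionally, ragp provides tools for visualizing these attributes via `plot_prot()`.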
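The two sequence-level analyses of the analysis layer (glycomodule scanning and amino acid bias) can be roughly approximated in a few lines of base R. The sketch below uses two hypothetical helpers: `count_ag_dipeptides()` counts overlapping AG glycomodule dipeptides ([AST]P or P[AST]) and `past_percent()` reports the percentage of Pro, Ala, Ser and Thr (PAST) residues, a compositional bias typical of AGPs. Both are simplifications for illustration only; `scan_ag()` and `maab()` apply their own, more detailed rules, so these helpers do not reproduce ragp's results. The example sequences are made up.

```r
# Hypothetical helpers for illustration; they are not part of ragp.

# Count overlapping AG glycomodule dipeptides ([AST]P or P[AST]).
# The zero-width lookahead lets neighbouring matches overlap, so "SPA"
# contributes two dipeptides (SP and PA).
count_ag_dipeptides <- function(sequences) {
  out <- vapply(toupper(sequences), function(s) {
    m <- gregexpr("(?=[AST]P|P[AST])", s, perl = TRUE)[[1]]
    if (m[1] == -1) 0L else length(m)  # gregexpr returns -1 when nothing matches
  }, integer(1), USE.NAMES = FALSE)
  names(out) <- names(sequences)
  out
}

# Percentage of residues that are P, A, S or T (PAST) in each sequence.
past_percent <- function(sequences, residues = c("P", "A", "S", "T")) {
  out <- vapply(toupper(sequences), function(s) {
    aa <- strsplit(s, "")[[1]]
    100 * sum(aa %in% residues) / length(aa)
  }, numeric(1), USE.NAMES = FALSE)
  names(out) <- names(sequences)
  out
}

# Made-up example sequences
seqs <- c(agp_like = "MASPAPTPSAPK", other = "MKLVWEDGHRNQ")
count_ag_dipeptides(seqs)      # agp_like = 7, other = 0
round(past_percent(seqs), 1)   # agp_like = 83.3, other = 0
```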
You can install ragp from GitHub with:

```r
# install.packages("remotes") #if not present
# install.packages("git2r") #if not present
remotes::install_github("missuse/ragp")
```
Alternatively, to build the vignettes, use:

```r
# install.packages("remotes")
# install.packages("git2r")
remotes::install_git("https://github.com/missuse/ragp",
                     build_vignettes = TRUE)
```
The vignettes can then be viewed with:

```r
browseVignettes("ragp")
```
If you encounter undesired behavior in ragp functions or have ideas for improving them, please open an issue at: https://github.com/missuse/ragp/issues.
A Shiny web interface to the ragp functions `predict_hyp()`, `scan_ag()` and `maab()` is available at https://ragp.shinyapps.io/Rapp/.
If you find ragp useful in your own research, please cite our Glycobiology paper:
Milan B Dragićević, Danijela M Paunović, Milica D Bogdanović, Slađana I Todorović, Ana D Simonović (2020) ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R, Glycobiology 30(1) 19–35, https://doi.org/10.1093/glycob/cwz072
You can get citation info via `citation("ragp")` or by copying the following BibTeX entry:
```bibtex
@article{10.1093/glycob/cwz072,
    author = "{Dragićević, Milan B and Paunović, Danijela M and Bogdanović, Milica D and Todorović, Slađana I and Simonović, Ana D}",
    title = "{ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R}",
    journal = "{Glycobiology}",
    issn = "{1460-2423}",
    publisher = "{Oxford University Press}",
    year = "{2020}",
    volume = "{30}",
    number = "{1}",
    pages = "{19–35}",
    url = "{https://doi.org/10.1093/glycob/cwz072}",
    doi = "{10.1093/glycob/cwz072}",
    eprint = "{https://academic.oup.com/glycob/article-pdf/30/1/19/5567434/cwz072.pdf}"
}
```
This software was developed with funding from the Ministry of Education, Science and Technological Development of the Republic of Serbia (Projects TR31019 and OI173024).