Mining for Hydroxyproline rich glycoprotein sequences • ragp

Hydroxyproline-rich glycoproteins (HRGPs) are one of the most complex families of macromolecules found in plants, owing to the diversity of the glycans decorating the protein backbone and the heterogeneity of the protein backbones themselves. While this diversity is responsible for the wide array of physiological functions associated with HRGPs, it hinders attempts at homology-based identification. Current approaches, based on identifying sequences with characteristic motifs and biased amino acid composition, are limited to prototypical sequences.

ragp is an R package for mining and analysis of HRGPs, with emphasis on arabinogalactan protein sequences (AGPs). The ragp filtering pipeline exploits one of the key features of HRGPs: the presence of hydroxyprolines, which represent glycosylation sites. Main package features include prediction of proline hydroxylation sites, amino acid motif and bias analyses, efficient communication with web servers for prediction of N-terminal signal peptides and glycosylphosphatidylinositol modification sites, as well as the ability to annotate sequences through CDD or hmmscan and to perform subsequent GO enrichment based on predicted Pfam domains.

The workflow in ragp is illustrated with the following diagram (ragp functions to be used for each of the tasks are boxed grey):

The filtering layer:

predict the presence of secretory signals (N-sp) and keep the sequences that contain them. Several prediction algorithms are available; the currently recommended function is get_signalp5(), which queries SignalP 5.0 (Almagro Armenteros et al. 2019).
predict proline hydroxylation (predict_hyp()) and keep the sequences that contain several (for example three or more) potential hydroxyprolines.
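
The logic of the two filtering steps can be sketched in base R on mock prediction results. The data frames and column names below (`id`, `is_nsp`, `hyp`) are hypothetical stand-ins chosen for illustration; they are not the actual return values of get_signalp5() or predict_hyp(), whose structure is described in the respective help pages.

```r
# Hypothetical signal peptide calls, one row per sequence
# (a stand-in for the output of a signal peptide predictor)
nsp <- data.frame(
  id = c("seq1", "seq2", "seq3"),
  is_nsp = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical hydroxylation calls, one row per proline
hyp <- data.frame(
  id = c("seq1", "seq1", "seq1", "seq2", "seq3", "seq3", "seq3"),
  hyp = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Step 1: keep sequences with a predicted N-terminal signal peptide
secreted <- nsp$id[nsp$is_nsp]

# Step 2: count predicted hydroxyprolines per sequence
hyp_counts <- tapply(hyp$hyp, hyp$id, sum)

# Keep secreted sequences with three or more predicted hydroxyprolines
min_hyp <- 3
rich <- names(hyp_counts)[hyp_counts >= min_hyp]
candidates <- intersect(secreted, rich)
candidates
```

In this mock data only `seq1` passes both filters: `seq2` has a single predicted hydroxyproline and `seq3` lacks a signal peptide. The threshold `min_hyp` mirrors the "three or more" example above and would be adjusted to the data at hand.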

The analysis layer:

find localized clusters of characteristic arabinogalactan motifs (AG glycomodules) to identify potential AGPs (scan_ag()) among the hydroxyproline containing sequences.
perform motif and amino acid bias (MAAB; Johnson et al. 2017) classification (maab()) to classify hydroxyproline-containing sequences.
annotate domains using Pfam (get_hmm()) and enrich with gene ontology (GO) terms (pfam2go()). Alternatively annotate domains using CDD.
predict the presence of potential glycosylphosphatidylinositol attachment sites (get_big_pi(), get_pred_gpi() and get_netGPI()).
predict disordered regions in proteins (get_espritz()).
predict transmembrane regions in proteins (get_phobius() and get_tmhmm()).
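
To give a feel for what "motif" and "amino acid bias" mean in the first two analysis steps, here is a deliberately simplified base R sketch. It is not the algorithm implemented in scan_ag() or maab(); the sequence is made up, and the choice of residues (P, A, S, T) and dipeptides (A, S or T followed by P) is an assumption made only for illustration.

```r
# Made-up protein sequence, for illustration only
protein <- "MAPSPTPKKL"

# Split the sequence into single residues
residues <- strsplit(protein, "")[[1]]

# Amino acid bias: percentage of residues that are P, A, S or T
past_percent <- 100 * mean(residues %in% c("P", "A", "S", "T"))

# Motif scan: start positions of A, S or T immediately followed by P.
# The lookahead keeps the proline unconsumed, so consecutive
# dipeptides are all reported.
hits <- gregexpr("[AST](?=P)", protein, perl = TRUE)[[1]]
n_dipeptides <- if (hits[1] == -1) 0L else length(hits)

past_percent
as.integer(hits)
n_dipeptides
```

The package functions additionally take predicted hydroxyproline positions and motif spacing into account, so results from this sketch should not be expected to match their output.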

Additionally, ragp provides tools for visualizing the attributes mentioned above via plot_prot().

Installation

You can install ragp from GitHub with:

# install.packages("remotes") #if not present
# install.packages("git2r") #if not present
remotes::install_github("missuse/ragp")

Alternatively, to also build the vignettes, use:

# install.packages("remotes") 
# install.packages("git2r") 
remotes::install_git("https://github.com/missuse/ragp",
                     build_vignettes = TRUE)

Vignettes can be viewed with:

browseVignettes("ragp")

Bug reports

If you encounter undesired behavior in ragp functions, or have ideas on how to improve them, please open an issue at: https://github.com/missuse/ragp/issues.

Shiny application

A shiny web interface to ragp functions predict_hyp(), scan_ag() and maab() is available at https://ragp.shinyapps.io/Rapp/.

Citation

If you find ragp useful in your own research, please cite our Glycobiology paper:

Milan B Dragićević, Danijela M Paunović, Milica D Bogdanović, Slađana I Todorović, Ana D Simonović (2020) ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R, Glycobiology 30(1) 19–35, https://doi.org/10.1093/glycob/cwz072

You can get citation info via citation("ragp") or by copying the following BibTeX entry:

@article{10.1093/glycob/cwz072,
    author = "{Dragićević, Milan B and Paunović, Danijela M and Bogdanović, Milica D and Todorović, Slađana I and Simonović, Ana D}",
    title = "{ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R}",
    journal = "{Glycobiology}",
    issn = "{1460-2423}",
    publisher = "{Oxford University Press}",
    year = "{2020}",
    volume = "{30}",  
    number = "{1}",
    pages = "{19–35}",
    url = "{https://doi.org/10.1093/glycob/cwz072}",
    doi = "{10.1093/glycob/cwz072}",
    eprint = "{https://academic.oup.com/glycob/article-pdf/30/1/19/5567434/cwz072.pdf}"
}

Acknowledgements

This software was developed with funding from the Ministry of Education, Science and Technological Development of the Republic of Serbia (Projects TR31019 and OI173024).

References

Almagro Armenteros, José Juan, Konstantinos D. Tsirigos, Casper Kaae Sønderby, Thomas Nordahl Petersen, Ole Winther, Søren Brunak, Gunnar von Heijne, and Henrik Nielsen. 2019. “SignalP 5.0 Improves Signal Peptide Predictions Using Deep Neural Networks.” Nature Biotechnology 37: 420–23. https://doi.org/10.1038/s41587-019-0036-z.

Johnson, Kim L., Andrew M. Cassin, Andrew Lonsdale, Antony Bacic, Monika S. Doblin, and Carolyn J. Schultz. 2017. “Pipeline to Identify Hydroxyproline-Rich Glycoproteins.” Plant Physiology 174 (2): 886–903. https://doi.org/10.1104/pp.17.00294.
