The function splits a FASTA-formatted file into smaller .fasta files, each holding a defined number of sequences, for further processing.
split_fasta( path_in, path_out, num_seq = 20000, trim = FALSE, trunc = NULL, id = FALSE )
path_in | A path to the .FASTA formatted file that is to be processed |
---|---|
path_out | A path where the resulting .FASTA formatted files should be stored. The path should also contain the prefix name of the fasta files, to which _n (an integer from 1 to the number of fasta files generated) will be appended along with the extension ".fa". |
num_seq | Integer defining the number of sequences to be in each resulting .fasta file. Defaults to 20000. |
trim | Logical, should the sequences be trimmed to 4000 amino acids to bypass the CBS server restrictions? Defaults to FALSE. |
trunc | Integer, truncate the sequences to this length. The first 1:trunc amino acids will be kept. Defaults to NULL. |
id | Logical, should the protein ids be returned? Defaults to FALSE. |
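
The way `path_out`, `num_seq` and `trunc` interact can be illustrated with a minimal base R sketch. It is not the package's implementation: `sketch_split_plan` is a hypothetical helper written for this illustration, and the `_n` plus ".fa" naming is assumed from the argument descriptions above.

```r
# Hypothetical helper, not part of ragp: shows how sequences could be
# assigned to output files and how the file names could be built.
sketch_split_plan <- function(n_seq, path_out, num_seq = 20000) {
  stopifnot(n_seq >= 1, num_seq >= 1)
  # Sequence i goes to file number ceiling(i / num_seq)
  file_index <- ceiling(seq_len(n_seq) / num_seq)
  # One path per output file: prefix, "_n", then the ".fa" extension
  n_files <- ceiling(n_seq / num_seq)
  paths <- paste0(path_out, "_", seq_len(n_files), ".fa")
  list(file_index = file_index, paths = paths)
}

plan <- sketch_split_plan(n_seq = 1200, path_out = "at_nsp_split", num_seq = 500)
plan$paths
# "at_nsp_split_1.fa" "at_nsp_split_2.fa" "at_nsp_split_3.fa"
table(plan$file_index)
# 500 500 200 sequences in files 1, 2 and 3

# Truncation as described for `trunc`: keep the first 1:trunc residues
substr("MKTAYIAKQR", 1, 4)
# "MKTA"
```

With 1200 sequences and `num_seq = 500`, the last file holds only the remaining 200 sequences, so the number of files is the sequence count divided by `num_seq`, rounded up.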
If id = FALSE, a character vector of the paths to the resulting .FASTA formatted files.
If id = TRUE, a list with two elements:
Character, protein identifiers.
Character, paths to the resulting .FASTA formatted files.
if (FALSE) {
library(ragp)

# create a fasta file to be processed, not needed if the input file is already present
data(at_nsp)
library(seqinr)
write.fasta(sequence = strsplit(at_nsp$sequence, ""),
            name = at_nsp$Transcript.id,
            file = "at_nsp.fasta")

# assumes input/output file are in working directory:
file_paths <- split_fasta(path_in = "at_nsp.fasta",
                          path_out = "at_nsp_split",
                          num_seq = 500)
}