R: Parsing Fasta Files

As a quick follow-up to my previous posts about parsing FASTA files in Perl, Python, and Ruby, I wanted to make a quick note about an efficient way to get the data into R.

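library(Rcpp)
library(microbenchmark)

# Compile read_fasta.cpp and make its read_fasta() function available in R
sourceCpp("read_fasta.cpp")

# Return the sequence length of every record in a FASTA file
fasta_lengths <- function(file) {
    records <- read_fasta(file)
    sapply(records, nchar)
}

# Time a single run on Hg19.fa
microbenchmark(fasta_lengths("Hg19.fa"), times = 1)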

And the results:

## Unit: seconds
##                       expr   min    lq median    uq   max
## 1 fasta_lengths("Hg19.fa") 33.99 33.99  33.99 33.99 33.99

At roughly 34 seconds for a single run, this is actually faster than the Python implementation, which is an impressive feat. Rcpp is a very nice package!
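The contents of read_fasta.cpp are not shown in this post, so here is only a minimal sketch of the kind of parsing such a file might do, written in plain standard C++ so it can be compiled on its own. The names `FastaRecord`, `parse_fasta`, and the C++ `fasta_lengths` are hypothetical and chosen for illustration; the real file would also need Rcpp types and an export attribute to be callable from R, and it may well work differently.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One FASTA record: the header line (without the leading '>') and the
// sequence with its line breaks removed.
struct FastaRecord {
    std::string name;
    std::string sequence;
};

// Hypothetical stand-in for read_fasta(); the actual read_fasta.cpp is not
// shown in the post and may differ.
std::vector<FastaRecord> parse_fasta(std::istream& in) {
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line[0] == '>') {
            // A header line starts a new record.
            records.push_back(FastaRecord{line.substr(1), std::string()});
        } else if (!records.empty()) {
            // Sequence lines are appended to the current record.
            records.back().sequence += line;
        }
        // Sequence data appearing before the first header is ignored.
    }
    return records;
}

// Rough counterpart of sapply(records, nchar) in the R code above.
std::vector<std::size_t> fasta_lengths(const std::vector<FastaRecord>& records) {
    std::vector<std::size_t> lengths;
    lengths.reserve(records.size());
    for (const auto& record : records) lengths.push_back(record.sequence.size());
    return lengths;
}
```

To try it without a file, wrap a small string in a `std::istringstream` and pass it to `parse_fasta`; for a real file, pass a `std::ifstream` instead. Appending each sequence line to one growing string keeps the sketch short, though a tuned reader would probably reserve space up front for chromosome-sized records.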

Jim Hester
Software Engineer

I’m a Senior Software Engineer at Netflix and R package developer.