As a follow-up to my previous posts about parsing FASTA files in Perl, Python, and Ruby, I wanted to make a quick note about an efficient way to get the data into R.
library(Rcpp)
sourceCpp("read_fasta.cpp")
library(microbenchmark)
fasta_lengths <- function(file) {
  records <- read_fasta(file)
  sapply(records, nchar)
}
microbenchmark(fasta_lengths("Hg19.fa"), times = 1)
And the results:
## Unit: seconds
## expr min lq median uq max
## 1 fasta_lengths("Hg19.fa") 33.99 33.99 33.99 33.99 33.99
So at about 34 seconds for Hg19.fa, this is actually faster than the Python implementation, which is an impressive feat. Rcpp is a very nice package!