Hello
I am enjoying using your excellent seqan tools. I have a couple of queries too though.
Partly as a learning exercise I have been trying to use seqan to re-create a version of the fastqc software accessed from R - which I pass summary data for graphics and trivial stats.
I'm using the double pass-reader to minimise memory of fastq - which works nicely for a minimal speed penalty (nice). So I can do most of what I want using the longer version of quality reads..
...
seqan::StringSet<seqan::CharString> ids;
seqan::StringSet<seqan::CharString> seqs; // a Charstring
seqan::StringSet<seqan::CharString> quals;
if (read2(ids, seqs, quals, reader, seqan::Fastq()) != 0)
etc...
However it strikes me that it would be preferable to be able to use the Dna5Q Stringset and half the code and runtime:
...
seqan::StringSet<seqan::CharString> ids;
seqan::StringSet<seqan::Dna5Q> seqsQ; // Dna5Q
if (read2(ids, seqsQ, reader, seqan::Fastq()) != 0)
etc...
This because I would prefer for efficiency (i.e not double passing) and code legibility (laziness???) to iterate a single loop through a StringSet
seqsQ for QC calculations. However I cannot see how to obtain the quality value from the score. The tutorial material and the documentation I can find suggests to me that this should be possible though I cannot find an actual example...
ordValue I find works for qual CharString but is not overloaded for Dna5Q. Nor does getQualityValue seem to work as I think it is described in the documentation...
//where seqs is the StringSet of Dna5Q read in above
seqan::Dna5Q seq= seqs[0];
Rcpp::Rcout << getQualityValue(seq[0]) << std::endl; // error
Rcpp::Rcout << (int)(ordValue(seq[0]) - 33) << std::endl; //error
So ... sorry for the lengthy post but how do I access the qualities?... or indeed the sequences of the Dna5Q when I have read in both?
I realise this is probably a trivial syntactic error but it has stumped me for a couple of days now.
Thanks
Stephen Henderson
UCL Cancer Institute
|