Dear JD,

the DnaQ and Dna5Q alphabets use PHRED-scaled qualities. You should be able to read arbitrary qualities by passing an additional CharString for the qualities to the readRecord function call:

    // ...
    RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
    CharString id;
    String<Dna5Q> seq;
    CharString qual;
    if (readRecord(id, seq, qual, reader, Fastq()) != 0)
        return 1;

You then have to convert the qualities into PHRED scale manually and assign them to the Dna5Q String seq.
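
A minimal sketch of that manual conversion, assuming the file uses the Illumina 1.3+ encoding (ASCII offset 64). The helper toPhredScores is a hypothetical name, not part of SeqAn; it uses only the standard library and takes the offset as a parameter so that other encodings can be tried:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a SeqAn function): convert an ASCII-encoded
// quality string into numeric PHRED scores.
// `offset` is the ASCII offset of the encoding; 64 is assumed here for
// Illumina 1.3+ reads, 33 would be the Sanger-style offset.
std::vector<int> toPhredScores(const std::string& qual, int offset)
{
    std::vector<int> scores;
    scores.reserve(qual.size());
    for (unsigned char c : qual)
    {
        int q = static_cast<int>(c) - offset;
        // A negative score means the string cannot be in this encoding.
        if (q < 0)
            throw std::invalid_argument("quality character below the given offset");
        scores.push_back(q);
    }
    return scores;
}
```

With offset 64, the character 'b' becomes 34 and 'B' becomes 2. The resulting scores could then be written into seq position by position, for example with SeqAn's assignQualityValue(); that step is left out here to keep the sketch free of library dependencies.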
Another solution would be to convert the FASTQ files to PHRED scale in a preprocessing step.
There is currently no such functionality in SeqAn, but we plan to add it in a later release. The main problem to tackle is how to make a robust guess about the quality scale: it is not obvious which scale a FASTQ file is in.
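
To illustrate why the guess is hard, here is one possible heuristic, sketched under assumptions that are mine and not SeqAn's. Characters below ';' (ASCII 59) cannot occur in any offset-64 encoding, so they point to offset 33; characters above 'J' (ASCII 74) are assumed to indicate offset 64, which does not hold for offset-33 files with unusually high scores. guessQualityOffset is a hypothetical helper and returns 0 when the observed characters fit both scales:

```cpp
#include <string>

// Hypothetical heuristic (not a SeqAn function): guess the ASCII offset of
// a quality string. Returns 33, 64, or 0 if the string is ambiguous.
int guessQualityOffset(const std::string& qual)
{
    if (qual.empty())
        return 0;

    // Find the smallest and largest character codes in the string.
    unsigned char lo = 255, hi = 0;
    for (unsigned char c : qual)
    {
        if (c < lo) lo = c;
        if (c > hi) hi = c;
    }

    if (lo < 59)   // below ';': impossible with an offset of 64
        return 33;
    if (hi > 74)   // above 'J': assumed too high for an offset of 33
        return 64;
    return 0;      // every character fits both scales
}
```

A string such as BBBBBB fits both scales, so the function reports 0. In practice one would scan many records before deciding, which is part of what makes a robust implementation non-trivial.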

HTH
Manuel

On 03/31/2012 10:35 PM, ngs geek wrote:
I forgot to add an example:

raw quality string:
bbbeeeeegfggfiiiiiiidgfagdghfdfhdcdfdfb_ca`BOWU^GBBOLGLU_H_Z_BLLLGLLFX]aNWWWY_bb_``BBBBBBBBBBBBBBBBB
seqan quality string:
___________________________________________!OWU^G!!OLGLU_H_Z_!LLLGLLFX]_NWWWY______B!BBBBBBBBBBBBBBB

Best
JD

On Sat, Mar 31, 2012 at 3:39 PM, ngs geek <ngsgeek@gmail.com> wrote:

Thanks! I'm seeing that the quality string is not the same that I read from the fastq file. Are they scaled? I read the doc here http://trac.seqan.de/wiki/WhitePapers/QualityHandling but I can't figure out if these are guess by assignQualityValues ... in any case, I can't see consistent results for my Illumina 1.3+ reads.

Thanks
JD

On Sat, Mar 31, 2012 at 5:22 AM, Holtgrewe, Manuel <manuel.holtgrewe@fu-berlin.de> wrote:

Hi JD,

good catch, you found a bug I just fixed. The problem was that assignQualityValues() did not resize the target string, it does now. Update your SVN checkout and it should work. Also, there now is a demo for reading FASTQ from gzipped files in core/demos/stream_read_fastq_gz.cpp

HTH
Manuel

------------------------------------------------------------------------
*From:* ngs geek [ngsgeek@gmail.com]
*Sent:* Friday, March 30, 2012 7:44 PM
*To:* seqan-dev@lists.fu-berlin.de
*Subject:* [Seqan-dev] Reading FastQ files from GZ archive

Hi All:

I'm trying to filter reads contained in a gz file based on quality score; I was able to stream the id and the sequence with GZFile stream and using a singlepass reader (I don't want to keep the reads in memory). Unfortunately, I can't access the quality string.

    RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
    CharString id;
    String<Dna5Q> seq;
    CharString qual;
    while (!atEnd(reader))
    {
        if (readRecord(id, seq, reader, Fastq()) != 0)
        {
            std::cerr << "Problem with your FASTQ file." << std::endl;
            return 1;
        }
        // reading qual
    }

Any idea on this? I tried assignQualities etc. with no luck.

Thanks
JD

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev