Dear JD,
The DnaQ and Dna5Q alphabets use PHRED-scaled qualities.
You should be able to read arbitrary qualities by passing an additional CharString for the qualities to the readRecord call:
    RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
    CharString id;
    String<Dna5Q> seq;
    CharString qual;
    // ...
    if (readRecord(id, seq, qual, reader, Fastq()) != 0)
        return 1;
You then have to convert the qualities to PHRED scale manually and assign them to the Dna5Q string seq.
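For Illumina 1.3+ reads, the manual conversion could look roughly like the sketch below. It is plain C++ rather than SeqAn code, and it rests on two assumptions: that the file uses the Illumina 1.3+ encoding (PHRED scores stored with an ASCII offset of 64), and that the target alphabet caps scores at 62 (inferred from the '_' characters in the SeqAn quality string quoted further down, not taken from the documentation). The function name phred64ToScores is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: Illumina 1.3+ encoding, i.e. PHRED score = ASCII code - 64.
const int kIllumina13Offset = 64;
// Assumption: the target alphabet stores scores in the range 0..62.
const int kMaxStoredQuality = 62;

// Hypothetical helper: decode a Phred+64 quality string into integer
// PHRED scores, clamped to the range the target alphabet can hold.
std::vector<int> phred64ToScores(const std::string & qual)
{
    std::vector<int> scores;
    scores.reserve(qual.size());
    for (std::size_t i = 0; i < qual.size(); ++i)
    {
        int q = static_cast<unsigned char>(qual[i]) - kIllumina13Offset;
        q = std::max(0, std::min(q, kMaxStoredQuality));
        scores.push_back(q);
    }
    return scores;
}
```

Each resulting score would then be written into seq, presumably with something like assignQualityValue(seq[i], scores[i]); please check the exact call against your SeqAn version.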
Another solution would be to convert the FASTQ files to PHRED scale in a preprocessing step.
There is currently no such functionality in SeqAn, but we plan to add it in a later release. The main problem to tackle is how to guess the quality scale robustly: it is not obvious which scale a FASTQ file uses.
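To illustrate why the guess is hard, here is a naive heuristic, again as plain C++ and not as anything SeqAn provides. It only looks at the smallest and largest quality characters seen. The thresholds are assumptions: characters below ';' (ASCII 59) should not occur in the offset-64 encodings, and characters above 'J' (ASCII 74) are unusual in offset-33 Illumina output. The name guessQualityOffset is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns 33 or 64 as the guessed ASCII offset,
// or 0 if the observed characters do not allow a decision.
int guessQualityOffset(const std::vector<std::string> & qualities)
{
    int minChar = 256;
    int maxChar = -1;
    for (std::size_t i = 0; i < qualities.size(); ++i)
    {
        for (std::size_t j = 0; j < qualities[i].size(); ++j)
        {
            int c = static_cast<unsigned char>(qualities[i][j]);
            if (c < minChar) minChar = c;
            if (c > maxChar) maxChar = c;
        }
    }
    if (maxChar < 0)
        return 0;   // no quality characters seen at all
    if (minChar < 59)
        return 33;  // assumed threshold: too low for an offset-64 encoding
    if (maxChar > 74)
        return 64;  // assumed threshold: too high for typical offset-33 output
    return 0;       // every character lies in the ambiguous range
}
```

A file whose characters all fall between ';' and 'J' stays undecided (the function returns 0), and this sketch does not attempt to tell Solexa-scaled files apart from Illumina 1.3+ files, although both use offset 64. That is the kind of ambiguity a robust implementation would have to handle.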
HTH
Manuel
On 03/31/2012 10:35 PM, ngs geek wrote:
I forgot to add an example:
raw quality string:
bbbeeeeegfggfiiiiiiidgfagdghfdfhdcdfdfb_ca`BOWU^GBBOLGLU_H_Z_BLLLGLLFX]aNWWWY_bb_``BBBBBBBBBBBBBBBBB
seqan quality string:
___________________________________________!OWU^G!!OLGLU_H_Z_!LLLGLLFX]_NWWWY______B!BBBBBBBBBBBBBBB
Best
JD
On Sat, Mar 31, 2012 at 3:39 PM, ngs geek <ngsgeek@gmail.com> wrote:
Thanks! I'm seeing that the quality string is not the same as the one I
read from the FASTQ file.
Are the qualities scaled? I read the doc here
http://trac.seqan.de/wiki/WhitePapers/QualityHandling
but I can't figure out whether they are guessed by assignQualityValues ...
In any case, I can't get consistent results for my Illumina 1.3+ reads.
Thanks
JD
On Sat, Mar 31, 2012 at 5:22 AM, Holtgrewe, Manuel <manuel.holtgrewe@fu-berlin.de> wrote:
Hi JD,
good catch, you found a bug that I have just fixed. The problem was that
assignQualityValues() did not resize the target string; it does
now. Update your SVN checkout and it should work.
Also, there is now a demo for reading FASTQ from gzipped files
in core/demos/stream_read_fastq_gz.cpp.
HTH
Manuel
*From:* ngs geek [ngsgeek@gmail.com]
*Sent:* Friday, March 30, 2012 7:44 PM
*To:* seqan-dev@lists.fu-berlin.de
*Subject:* [Seqan-dev] Reading FastQ files from GZ archive
Hi All:
I'm trying to filter reads contained in a gz file based on
quality score; I was able to stream the id and the
sequence with GZFile stream and using a singlepass reader (I
don't want to keep the reads in memory).
Unfortunately, I can't access the quality string.
    RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
    CharString id;
    String<Dna5Q> seq;
    CharString qual;
    while (!atEnd(reader))
    {
        if (readRecord(id, seq, reader, Fastq()) != 0)
        {
            std::cerr << "Problem with your FASTQ file." << std::endl;
            return 1;
        }
        // reading qual
    }
Any idea on this? I tried assignQualities etc. with no luck.
Thanks
JD
_______________________________________________
seqan-dev mailing list
https://lists.fu-berlin.de/listinfo/seqan-dev