Re: [Seqan-dev] Reading FastQ files from GZ archive

  • From: ngs geek <ngsgeek@gmail.com>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Mon, 2 Apr 2012 14:34:00 -0400
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Reading FastQ files from GZ archive

Thanks for your help. I'm importing a fairly large number of FASTQ sequences into a FragmentStore, but when I look at the PHRED score with getQualityValue(fragmentStore.readSeqStore[read_id][i])
I still get inconsistent results. It looks like a different offset is being applied to each character. I'm working with Illumina data (phred+64).

Cheers
JD



On Sun, Apr 1, 2012 at 7:32 AM, Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de> wrote:
Dear JD,

the DnaQ and Dna5Q alphabets are using PHRED scaled qualities.

You should be able to read arbitrary qualities by giving an additional CharString for the qualities to the readRecord function call:

// ...


RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
CharString id;
String<Dna5Q> seq;
CharString qual;

if (readRecord(id, seq, qual, reader, Fastq()) != 0)
   return 1;

You then have to convert the qualities into PHRED scale manually and assign them to the Dna5Q String seq.
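
The manual conversion step could look roughly like the sketch below. It is plain C++ with no SeqAn dependency; the function name decodeQualities and the maxScore cap are hypothetical and only serve as illustration. The offset is 64 for Illumina 1.3+ (phred+64) and 33 for Sanger-style files (phred+33).

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Convert an ASCII-encoded quality string into numeric PHRED scores.
// offset:   64 for Illumina 1.3+ (phred+64), 33 for Sanger (phred+33).
// maxScore: hypothetical cap; pick it to match the range that your
//           quality-aware alphabet can actually store.
std::vector<int> decodeQualities(const std::string& qual, int offset, int maxScore = 62)
{
    std::vector<int> scores;
    scores.reserve(qual.size());
    for (unsigned char c : qual)
    {
        int q = static_cast<int>(c) - offset;
        if (q < 0)
            throw std::invalid_argument("quality character below the chosen offset");
        scores.push_back(q > maxScore ? maxScore : q);  // clamp to the cap
    }
    return scores;
}
```

With the scores in hand, a loop over the read along the lines of assignQualityValue(seq[i], scores[i]) should store them in the Dna5Q string; please check the exact call against your SeqAn version.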

Another solution would be to convert the FASTQ files to PHRED scale in a preprocessing step.

There is currently no such functionality in SeqAn, but we plan to add it in a later release. The main problem to tackle is how to make a robust guess about the quality scale -- it is not obvious which scale a FASTQ file is in.
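
To illustrate why the guess is hard, here is a minimal sketch of a common heuristic based on the lowest quality character seen. It is not SeqAn functionality; the function name guessQualityOffset and the thresholds are assumptions. Characters below ';' (ASCII 59) cannot occur in a +64 encoding, so they indicate phred+33. If nothing below '@' (ASCII 64) shows up, phred+64 is likely, although a phred+33 file containing only high qualities would be misclassified. Anything in between is left undecided.

```cpp
#include <string>
#include <vector>

// Guess the ASCII offset of FASTQ quality strings from the lowest
// character observed. Returns 33 or 64, or 0 if the input is empty
// or the evidence is ambiguous. This is a heuristic, not a proof.
int guessQualityOffset(const std::vector<std::string>& qualityStrings)
{
    int lowest = 256;  // sentinel: larger than any character value
    for (const std::string& qual : qualityStrings)
        for (unsigned char c : qual)
            if (c < lowest)
                lowest = c;

    if (lowest > 255)
        return 0;   // no quality characters seen
    if (lowest < 59)
        return 33;  // too low for any +64 encoding (Solexa+64 starts at 59)
    if (lowest >= 64)
        return 64;  // consistent with phred+64; phred+33 is unlikely but possible
    return 0;       // 59..63: Solexa+64 or phred+33, cannot tell
}
```

Sampling more records makes the guess more reliable, since a single low-quality base is enough to rule out the +64 encodings.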

HTH
Manuel


On 03/31/2012 10:35 PM, ngs geek wrote:
I forgot to add an example:

raw quality string:
bbbeeeeegfggfiiiiiiidgfagdghfdfhdcdfdfb_ca`BOWU^GBBOLGLU_H_Z_BLLLGLLFX]aNWWWY_bb_``BBBBBBBBBBBBBBBBB

seqan quality string:
___________________________________________!OWU^G!!OLGLU_H_Z_!LLLGLLFX]_NWWWY______B!BBBBBBBBBBBBBBB

Best
JD

On Sat, Mar 31, 2012 at 3:39 PM, ngs geek <ngsgeek@gmail.com> wrote:

   Thanks! I'm seeing that the quality string is not the same as the
   one I read from the FASTQ file.
   Are they scaled? I read the doc here
   http://trac.seqan.de/wiki/WhitePapers/QualityHandling
   but I can't figure out whether these are guessed by
   assignQualityValues ... in any case, I can't see
   consistent results for my Illumina 1.3+ reads.

   Thanks
   JD

   On Sat, Mar 31, 2012 at 5:22 AM, Holtgrewe, Manuel
   <manuel.holtgrewe@fu-berlin.de> wrote:

       Hi JD,

       good catch, you found a bug I just fixed. The problem was that
       assignQualityValues() did not resize the target string, it does
       now. Update your SVN checkout and it should work.

       Also, there now is a demo for reading FASTQ from gzipped files
       in core/demos/stream_read_fastq_gz.cpp

       HTH
       Manuel

       ------------------------------------------------------------------------
       *From:* ngs geek [ngsgeek@gmail.com]
       *Sent:* Friday, March 30, 2012 7:44 PM
       *To:* seqan-dev@lists.fu-berlin.de
       *Subject:* [Seqan-dev] Reading FastQ files from GZ archive


       Hi All:

       I'm trying to filter reads contained in a gz file based on
       quality score; I was able to stream the id and the
       sequence with GZFile stream and using a singlepass reader (I
       don't want to keep the reads in memory).
       Unfortunately, I can't access the quality string.

       RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
       CharString id;
       String<Dna5Q> seq;
       CharString qual;

       while (!atEnd(reader))
       {
           if (readRecord(id, seq, reader, Fastq()) != 0)
           {
               std::cerr << "Problem with your FASTQ file." << std::endl;
               return 1;
           }

           // reading qual
       }

       Any idea on this? I tried assignQualities etc. with no luck.

       Thanks
       JD


       _______________________________________________
       seqan-dev mailing list
       seqan-dev@lists.fu-berlin.de
       https://lists.fu-berlin.de/listinfo/seqan-dev






_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

