
Re: [Seqan-dev] Reading FastQ files from GZ archive

  • From: Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Sun, 01 Apr 2012 13:32:52 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Reading FastQ files from GZ archive

Dear JD,

The DnaQ and Dna5Q alphabets use PHRED-scaled qualities.

You should be able to read arbitrary qualities by passing an additional CharString for the qualities to readRecord():

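// ...

// fastq_file: the opened Stream<GZFile>, set up as in your code.
RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
CharString id;
String<Dna5Q> seq;
CharString qual;  // receives the raw quality characters, unconverted

// Passing qual as an extra argument makes readRecord() store the quality
// string there; a non-zero return value signals a read error.
if (readRecord(id, seq, qual, reader, Fastq()) != 0)
    return 1;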

You then have to convert the qualities to PHRED scale manually and assign them to the Dna5Q string seq.

Another solution would be to convert the FASTQ files to PHRED scale in a preprocessing step.

There is currently no such functionality in SeqAn, but we plan to add it in a later release. The main problem to tackle is how to guess the quality scale robustly: it is not obvious from a FASTQ file which scale it uses.

HTH
Manuel

On 03/31/2012 10:35 PM, ngs geek wrote:
I forgot to add an example:

raw quality string:
bbbeeeeegfggfiiiiiiidgfagdghfdfhdcdfdfb_ca`BOWU^GBBOLGLU_H_Z_BLLLGLLFX]aNWWWY_bb_``BBBBBBBBBBBBBBBBB

SeqAn quality string:
___________________________________________!OWU^G!!OLGLU_H_Z_!LLLGLLFX]_NWWWY______B!BBBBBBBBBBBBBBB

Best
JD

On Sat, Mar 31, 2012 at 3:39 PM, ngs geek <ngsgeek@gmail.com> wrote:

    Thanks! I'm seeing that the quality string is not the same as the one
    I read from the FASTQ file. Are they scaled? I read the documentation at
    http://trac.seqan.de/wiki/WhitePapers/QualityHandling
    but I can't figure out whether the scale is guessed by
    assignQualityValues(). In any case, I can't get consistent results
    for my Illumina 1.3+ reads.

    Thanks
    JD

    On Sat, Mar 31, 2012 at 5:22 AM, Holtgrewe, Manuel
    <manuel.holtgrewe@fu-berlin.de> wrote:

        Hi JD,

        good catch: you found a bug, which I have just fixed. The problem
        was that assignQualityValues() did not resize the target string;
        it does now. Update your SVN checkout and it should work.

        Also, there is now a demo for reading FASTQ from gzipped files
        in core/demos/stream_read_fastq_gz.cpp

        HTH
        Manuel

        ------------------------------------------------------------------------
        *From:* ngs geek [ngsgeek@gmail.com]
        *Sent:* Friday, March 30, 2012 7:44 PM
        *To:* seqan-dev@lists.fu-berlin.de
        *Subject:* [Seqan-dev] Reading FastQ files from GZ archive

        Hi All:

        I'm trying to filter the reads in a gzipped FASTQ file by
        quality score. I was able to stream the ID and the sequence
        using a GZFile stream and a SinglePass reader (I don't want to
        keep the reads in memory). Unfortunately, I can't access the
        quality string.

        RecordReader<Stream<GZFile>, SinglePass<> > reader(fastq_file);
        CharString id;
        String<Dna5Q> seq;
        CharString qual;

        while (!atEnd(reader))
        {
            if (readRecord(id, seq, reader, Fastq()) != 0)
            {
                std::cerr << "Problem with your FASTQ file." << std::endl;
                return 1;
            }

            // reading qual
        }

        Any idea on this? I tried assignQualities etc. with no luck.

        Thanks
        JD


        _______________________________________________
        seqan-dev mailing list
        seqan-dev@lists.fu-berlin.de
        https://lists.fu-berlin.de/listinfo/seqan-dev




