Re: [Seqan-dev] CheckStreamFormat for FastQ

  • From: Felix Heeger <fheeger@mi.fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Mon, 09 Jan 2012 10:35:09 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] CheckStreamFormat for FastQ

Hi,

My configuration:
Ubuntu 11.04 64bit
64 bit processor
gcc version 4.5.2

felix

On Fri, 2012-01-06 at 16:53 +0100, Manuel Holtgrewe wrote:
> What is your configuration (OS, 32/64 bit, compiler, version)?
> 
> On 01/06/2012 03:53 PM, Felix Heeger wrote:
> > Hi Manuel,
> >
> > I did a fresh checkout, but I still see the same problem.
> >
> > However, if I remove the last 6 records from the file, it is
> > recognized. I also tried removing the first 6 records instead, to make
> > sure that the file size is causing the issue and not a specific record.
> > Same result.
> >
> > In short: it works for me if the file size is <= 8 KB.
> >
> > felix
> >
> > On Fri, 2012-01-06 at 15:06 +0100, Manuel Holtgrewe wrote:
> >> I tested the program on the file that you attached and it worked. Does
> >> the program detect the format of the small file, too?
> >>
> >> $ make file_detect
> >> [...]
> >> $ ./sandbox/holtgrew/demos/file_detect /tmp/lane_5_p1.fastq
> >> Detected FASTQ.
> >>
> >> Could you try again with a fresh checkout?
> >>
> >> On 01/06/2012 02:35 PM, Felix Heeger wrote:
> >>> Hi Manuel,
> >>>
> >>> Thank you for your effort. I tried your suggestion today and it did
> >>> not fix my problem. Also, your example program cannot identify my FASTQ
> >>> file. I am pretty sure it is valid FASTQ, as other programs work fine on
> >>> it. I attached the first part of the file, if you want to have a look at
> >>> it.
> >>>
> >>> felix
> >>>
> >>> On Wed, 2011-12-21 at 18:31 +0100, Manuel Holtgrewe wrote:
> >>>> Felix,
> >>>>
> >>>> The documentation of checkStreamFormat() was misleading. I fixed it in
> >>>> [10948].
> >>>>
> >>>> http://docs.seqan.de/seqan/dev2/?i=Function.checkStreamFormat
> >>>>
> >>>> (The documentation is regenerated every hour, so you might have to
> >>>> wait a bit to see it.)
> >>>>
> >>>> The following is a simple example program I compiled and tested. Please
> >>>> write another email if the problem persists.
> >>>>
> >>>> HTH,
> >>>> Manuel
> >>>>
> >>>> #include <fstream>
> >>>> #include <iostream>
> >>>>
> >>>> #include <seqan/sequence.h>
> >>>> #include <seqan/stream.h>
> >>>>
> >>>> int main(int argc, char ** argv)
> >>>> {
> >>>>     using namespace seqan;
> >>>>
> >>>>     if (argc != 2)
> >>>>         return 1;
> >>>>     std::fstream in(argv[1]);
> >>>>
> >>>>     RecordReader<std::fstream, SinglePass<> > reader(in);
> >>>>     AutoSeqStreamFormat tagSelector;
> >>>>     bool b = checkStreamFormat(reader, tagSelector);
> >>>>     if (!b)
> >>>>     {
> >>>>         std::cerr << "Could not detect file format!" << std::endl;
> >>>>         return 1;
> >>>>     }
> >>>>
> >>>>     // b is true if any format was detected successfully.
> >>>>     if (tagSelector.tagId == 1)
> >>>>         std::cerr << "Detected FASTA." << std::endl;
> >>>>     else if (tagSelector.tagId == 2)
> >>>>         std::cerr << "Detected FASTQ." << std::endl;
> >>>>     else
> >>>>         std::cerr << "Unknown file format!" << std::endl;
> >>>>     return 0;
> >>>> }
> >>>>
> >>>>
> >>>> On 12/21/2011 05:15 PM, Felix Heeger wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have two different functions that I want to call depending on
> >>>>> whether an input file is in FASTA or FASTQ format.
> >>>>>
> >>>>> My approach to this is:
> >>>>>
> >>>>>> RecordReader<std::ifstream, SinglePass<> > reader(inFile);
> >>>>>> if (checkStreamFormat(reader, Fasta()))
> >>>>>> {
> >>>>>>     std::cerr << "Input file format is fasta." << std::endl;
> >>>>>>     [call function for fasta]
> >>>>>> }
> >>>>>> else if (checkStreamFormat(reader, Fastq()))
> >>>>>> {
> >>>>>>     std::cerr << "Input file format is fastq." << std::endl;
> >>>>>>     [call function for fastq]
> >>>>>> }
> >>>>>> else
> >>>>>> {
> >>>>>>     std::cerr << "ERROR: Input file format is not fasta or fastq." << std::endl;
> >>>>>>     return -1;
> >>>>>> }
> >>>>>
> >>>>> This works fine for FASTA. However, my FASTQ file is not recognized.
> >>>>> I looked into the code for checkStreamFormat a bit, and the file is not
> >>>>> recognized because the iterator in the readRecord function reaches
> >>>>> atEnd before the quality meta data for the 35th record is finished (l. 392).
> >>>>> This happens with two different FASTQ files.
> >>>>>
> >>>>> So my theory is the following:
> >>>>> The checkStreamFormat function uses LimitRecordReaderInScope. The
> >>>>> documentation states that this prevents the stream from "rebuffering".
> >>>>> This probably prevents the reader from reading the complete record, so
> >>>>> the recognition of the file fails.
> >>>>>
> >>>>> I hope I have made myself clear. I can also provide my code and a
> >>>>> sample FASTQ file if that would be helpful.
> >>>>>
> >>>>> Cheers,
> >>>>> felix
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> seqan-dev mailing list
> >>>>> seqan-dev@lists.fu-berlin.de
> >>>>> https://lists.fu-berlin.de/listinfo/seqan-dev
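
The theory discussed in this thread (detection works only for files of at most 8 KB because the reader is not allowed to rebuffer) can be illustrated with a small standalone sketch. This is not SeqAn code and makes no claim about how checkStreamFormat is implemented. The names looksLikeFastq, readLine and makeFastq are hypothetical, and the sketch assumes simple four-line FASTQ records that end with a newline. It contrasts a strict check, which rejects the input when the buffered window ends in the middle of a record, with a tolerant check, which accepts a cut-off last record as long as at least one complete record was seen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: read one '\n'-terminated line from buf, starting at pos.
// Returns false if the buffer ends before a newline is found.
static bool readLine(const std::string & buf, std::size_t & pos, std::string & line)
{
    const std::size_t nl = buf.find('\n', pos);
    if (nl == std::string::npos)
        return false;
    line.assign(buf, pos, nl - pos);
    pos = nl + 1;
    return true;
}

// Hypothetical detector: inspect only the first `limit` bytes of `data`,
// mimicking a reader that must not refill its buffer.
// With tolerateCut == false, a record cut off by the limit makes detection fail.
// With tolerateCut == true, a cut-off last record is accepted if the input is
// longer than the window and at least one complete record was validated.
bool looksLikeFastq(const std::string & data, std::size_t limit, bool tolerateCut)
{
    assert(limit > 0);
    const std::string window = data.substr(0, limit);
    const bool cut = data.size() > limit;
    std::size_t pos = 0;
    std::size_t complete = 0;
    std::string head, seq, plus, qual;
    while (pos < window.size())
    {
        const bool whole = readLine(window, pos, head) && readLine(window, pos, seq) &&
                           readLine(window, pos, plus) && readLine(window, pos, qual);
        if (!whole)  // ran out of buffered data in the middle of a record
            return tolerateCut && cut && complete > 0;
        if (head.empty() || head[0] != '@' || plus.empty() || plus[0] != '+' ||
            seq.size() != qual.size())
            return false;  // not a well-formed four-line FASTQ record
        ++complete;
    }
    return complete > 0;
}

// Hypothetical test data: n identical 15-byte FASTQ records.
std::string makeFastq(std::size_t n)
{
    std::string out;
    for (std::size_t i = 0; i < n; ++i)
        out += "@r\nACGT\n+\nIIII\n";
    return out;
}
```

With makeFastq(10) (150 bytes), both variants accept the input when the limit is 8192. With a limit of 100 the window ends inside the seventh record, so the strict variant rejects the input while the tolerant one accepts it, which matches the size-dependent behaviour reported above.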




