Re: [Seqan-dev] CheckStreamFormat for FastQ

  • From: Felix Heeger <fheeger@mi.fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 06 Jan 2012 15:53:31 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] CheckStreamFormat for FastQ

Hi Manuel,

I did a fresh checkout, but I still have the same problem.

However, if I remove the last 6 records from the file, it is
recognized. I also tried removing the first 6 records instead, to make
sure that the file size is causing the issue and not a specific record.
Same result.
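
A trim like this could be scripted as in the sketch below. It is only an illustration: trimFastqRecords is a hypothetical helper, not part of SeqAn, and it assumes that every record occupies exactly four lines (no wrapped sequence or quality lines).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of SeqAn: returns the FASTQ text with the
// first `dropFront` and the last `dropBack` records removed.
// Assumption: every record occupies exactly four lines.
std::string trimFastqRecords(const std::string& fastq,
                             std::size_t dropFront,
                             std::size_t dropBack)
{
    // Split the input into lines.
    std::vector<std::string> lines;
    std::istringstream in(fastq);
    for (std::string line; std::getline(in, line); )
        lines.push_back(line);

    // A trailing partial record (fewer than four lines) is ignored.
    const std::size_t numRecords = lines.size() / 4;
    if (dropFront + dropBack >= numRecords)
        return std::string();

    // Copy the remaining records, four lines each.
    std::string out;
    for (std::size_t r = dropFront; r < numRecords - dropBack; ++r)
        for (std::size_t i = 0; i < 4; ++i)
            out += lines[4 * r + i] + '\n';
    return out;
}
```

Calling trimFastqRecords(text, 0, 6) drops the last 6 records and trimFastqRecords(text, 6, 0) drops the first 6, which gives two files of similar size with different content.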

In short: it works for me if the file size is <= 8 KB.
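
The size dependence is what one would expect from a detector that only looks at a fixed-size buffer and gives up when that buffer ends in the middle of a record. The following toy model illustrates the effect. It is not SeqAn code: looksLikeFastqStrict and makeFastq are hypothetical names, the records are synthetic, and the 8192-byte window is an assumption taken from the 8 KB observation, not a confirmed SeqAn buffer size.

```cpp
#include <cstddef>
#include <string>

// Toy model, not SeqAn code: checks whether the first `window` bytes of
// `data` consist only of complete four-line FASTQ records. It returns false
// as soon as the window ends inside a record, which mimics a detector that
// is not allowed to refill its buffer.
bool looksLikeFastqStrict(const std::string& data, std::size_t window)
{
    const std::string buf = data.substr(0, window);
    std::size_t pos = 0;
    std::size_t records = 0;
    while (pos < buf.size())
    {
        std::size_t seqLen = 0;
        for (int field = 0; field < 4; ++field)
        {
            const std::size_t eol = buf.find('\n', pos);
            if (eol == std::string::npos)
                return false;  // window ended inside a record
            const std::size_t len = eol - pos;
            if (field == 0 && (len == 0 || buf[pos] != '@')) return false;
            if (field == 1) seqLen = len;
            if (field == 2 && (len == 0 || buf[pos] != '+')) return false;
            if (field == 3 && len != seqLen) return false;
            pos = eol + 1;
        }
        ++records;
    }
    return records > 0;
}

// Hypothetical helper: builds `n` synthetic records of 210 bytes each.
std::string makeFastq(std::size_t n)
{
    std::string out;
    for (std::size_t i = 0; i < n; ++i)
        out += "@read\n" + std::string(100, 'A') + "\n+\n"
             + std::string(100, 'I') + "\n";
    return out;
}
```

With these synthetic records, looksLikeFastqStrict(makeFastq(39), 8192) accepts the input because all 8190 bytes fit into the window, while looksLikeFastqStrict(makeFastq(40), 8192) rejects it because the window cuts the 40th record short.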

felix

On Fri, 2012-01-06 at 15:06 +0100, Manuel Holtgrewe wrote:
> I tested the program on the file that you attached and it worked. Does 
> the program detect the format of the small file, too?
> 
> $ make file_detect
> [...]
> $ ./sandbox/holtgrew/demos/file_detect /tmp/lane_5_p1.fastq
> Detected FASTQ.
> 
> Could you try again with a fresh checkout?
> 
> On 01/06/2012 02:35 PM, Felix Heeger wrote:
> > Hi Manuel,
> >
> > thank you for your effort. I tried your suggestion today, and it did
> > not fix my problem. Your example program also cannot identify my FASTQ
> > file. I am pretty sure it is valid FASTQ, as other programs work fine on
> > it. I attached the first part of the file, in case you want to have a
> > look at it.
> >
> > felix
> >
> > On Wed, 2011-12-21 at 18:31 +0100, Manuel Holtgrewe wrote:
> >> Felix,
> >>
> >> The documentation of checkStreamFormat() was misleading. I fixed it in
> >> [10948].
> >>
> >> http://docs.seqan.de/seqan/dev2/?i=Function.checkStreamFormat
> >>
> >> (The documentation is regenerated every hour, so you might have to
> >> wait a bit to see it.)
> >>
> >> The following is a simple example program I compiled and tested. Please
> >> write another email if the problem persists.
> >>
> >> HTH,
> >> Manuel
> >>
> >> #include <fstream>
> >> #include <iostream>
> >>
> >> #include <seqan/sequence.h>
> >> #include <seqan/stream.h>
> >>
> >> int main(int argc, char ** argv)
> >> {
> >>     using namespace seqan;
> >>
> >>     if (argc != 2)
> >>         return 1;
> >>     std::fstream in(argv[1]);
> >>
> >>     RecordReader<std::fstream, SinglePass<> > reader(in);
> >>     AutoSeqStreamFormat tagSelector;
> >>     bool b = checkStreamFormat(reader, tagSelector);
> >>     if (!b)
> >>     {
> >>         std::cerr << "Could not detect file format!" << std::endl;
> >>         return 1;
> >>     }
> >>
> >>     // b is true if any format was detected successfully.
> >>     if (tagSelector.tagId == 1)
> >>         std::cerr << "Detected FASTA." << std::endl;
> >>     else if (tagSelector.tagId == 2)
> >>         std::cerr << "Detected FASTQ." << std::endl;
> >>     else
> >>         std::cerr << "Unknown file format!" << std::endl;
> >>     return 0;
> >> }
> >>
> >>
> >> On 12/21/2011 05:15 PM, Felix Heeger wrote:
> >>> Hi,
> >>>
> >>> I have two different functions that I want to call depending on
> >>> whether an input file is in fasta or fastq format.
> >>>
> >>> My approach to this is:
> >>>
> >>>> RecordReader<std::ifstream, SinglePass<> > reader(inFile);
> >>>> if (checkStreamFormat(reader, Fasta()))
> >>>> {
> >>>>     std::cerr << "Input file format is fasta." << std::endl;
> >>>>     [call function for fasta]
> >>>> }
> >>>> else if (checkStreamFormat(reader, Fastq()))
> >>>> {
> >>>>     std::cerr << "Input file format is fastq." << std::endl;
> >>>>     [call function for fastq]
> >>>> }
> >>>> else
> >>>> {
> >>>>     std::cerr << "ERROR: Input file format is not fasta or fastq." << std::endl;
> >>>>     return -1;
> >>>> }
> >>>
> >>> This works fine for fasta. However, my fastq file is not recognized.
> >>> I looked into the code for checkStreamFormat a bit, and the file is not
> >>> recognized because the iterator in the readRecord function reaches
> >>> atEnd before the quality meta data for the 35th record is finished (l. 392).
> >>> This happens with two different fastq files.
> >>>
> >>> So my theory is the following:
> >>> The checkStreamFormat function uses LimitRecordReaderInScope.
> >>> The documentation states that this prevents the stream from
> >>> "rebuffering". This probably keeps the reader from reading the
> >>> complete record, so the recognition of the file fails.
> >>>
> >>> I hope I have made myself clear. I can also provide my code and a sample
> >>> fastq file if that would be helpful.
> >>>
> >>> Cheers,
> >>> felix
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> seqan-dev mailing list
> >>> seqan-dev@lists.fu-berlin.de
> >>> https://lists.fu-berlin.de/listinfo/seqan-dev