
Re: [Seqan-dev] CheckStreamFormat for FastQ

  • From: Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Mon, 09 Jan 2012 11:09:30 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] CheckStreamFormat for FastQ

OK, apparently the mailing list converted the file from Unix line endings to Windows line endings.

r1018 contains a fix for the bug. Please update and report back if the problem is not fixed yet.
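A side note on the line-ending conversion: CRLF adds one byte per line, so a test on the received attachment may not behave exactly like a test on the original file. To get Unix line endings back, tools such as dos2unix can be used. The following standalone C++ sketch does the same conversion. It is not part of SeqAn, and `crlfToLf` and `convertFileCrlfToLf` are illustrative names.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Return a copy of `text` in which every CRLF ("\r\n") pair is replaced by a
// single LF ('\n'). A lone '\r' that is not followed by '\n' is kept as-is.
std::string crlfToLf(const std::string & text)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        // Drop the '\r' of a CRLF pair; the '\n' is copied in the next iteration.
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue;
        out += text[i];
    }
    return out;
}

// Hypothetical helper: convert the file at `inPath` and write the result to
// `outPath`. Both files are opened in binary mode so that the C++ library does
// not translate line endings itself. Returns false if a file cannot be opened
// or writing fails.
bool convertFileCrlfToLf(const std::string & inPath, const std::string & outPath)
{
    std::ifstream in(inPath.c_str(), std::ios::binary);
    if (!in)
        return false;
    std::string text((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    std::ofstream out(outPath.c_str(), std::ios::binary);
    if (!out)
        return false;
    out << crlfToLf(text);
    return static_cast<bool>(out);
}
```

For example, `convertFileCrlfToLf("lane_5_p1.fastq", "lane_5_p1.unix.fastq")` writes a copy with Unix line endings next to the received file.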

HTH

On 01/09/2012 10:35 AM, Felix Heeger wrote:
Hi,

My configuration:
Ubuntu 11.04, 64-bit
64-bit processor
gcc version 4.5.2

felix

On Fri, 2012-01-06 at 16:53 +0100, Manuel Holtgrewe wrote:
What is your configuration (OS, 32/64 bit, compiler, version)?

On 01/06/2012 03:53 PM, Felix Heeger wrote:
Hi Manuel,

I did a fresh checkout, but I still have the same problem.

However, if I remove the last 6 records from the file, it is
recognized. I also tried removing the first 6 records instead, to make
sure that the file size causes the issue and not a specific record.
Same result.

In short: it works for me if the file size is <= 8 KB.

felix

On Fri, 2012-01-06 at 15:06 +0100, Manuel Holtgrewe wrote:
I tested the program on the file that you attached and it worked. Does
the program detect the format of the small file, too?

$ make file_detect
[...]
$ ./sandbox/holtgrew/demos/file_detect /tmp/lane_5_p1.fastq
Detected FASTQ.

Could you try again with a fresh checkout?

On 01/06/2012 02:35 PM, Felix Heeger wrote:
Hi Manuel,

Thank you for your effort. I tried your suggestion today, but it did
not fix my problem. Your example program also cannot identify my FASTQ
file. I am fairly sure the file is valid FASTQ, as other programs
process it without problems. I attached the first part of the file in
case you want to have a look at it.

felix

On Wed, 2011-12-21 at 18:31 +0100, Manuel Holtgrewe wrote:
Felix,

The documentation of checkStreamFormat() was misleading. I fixed it in
[10948].

http://docs.seqan.de/seqan/dev2/?i=Function.checkStreamFormat

(The documentation is regenerated every hour, so you may have to wait
a bit to see the change.)

The following is a simple example program that I compiled and tested.
Please write another email if the problem persists.

HTH,
Manuel

#include <fstream>
#include <iostream>

#include <seqan/sequence.h>
#include <seqan/stream.h>

int main(int argc, char ** argv)
{
    using namespace seqan;

    if (argc != 2)
    {
        std::cerr << "Usage: " << argv[0] << " FILE" << std::endl;
        return 1;
    }

    // Open read-only and in binary mode (no newline translation).
    std::fstream in(argv[1], std::ios::in | std::ios::binary);
    if (!in.good())
    {
        std::cerr << "Could not open " << argv[1] << "!" << std::endl;
        return 1;
    }

    RecordReader<std::fstream, SinglePass<> > reader(in);
    AutoSeqStreamFormat tagSelector;
    // b is true if any format was detected successfully;
    // tagSelector.tagId then tells which one.
    bool b = checkStreamFormat(reader, tagSelector);
    if (!b)
    {
        std::cerr << "Could not detect file format!" << std::endl;
        return 1;
    }

    if (tagSelector.tagId == 1)
        std::cerr << "Detected FASTA." << std::endl;
    else if (tagSelector.tagId == 2)
        std::cerr << "Detected FASTQ." << std::endl;
    else
        std::cerr << "Unknown file format!" << std::endl;
    return 0;
}


On 12/21/2011 05:15 PM, Felix Heeger wrote:
Hi,

I have two different functions that I want to call depending on whether
an input file is in FASTA or FASTQ format.

My approach to this is:

RecordReader<std::ifstream, SinglePass<> > reader(inFile);
if (checkStreamFormat(reader, Fasta()))
{
    std::cerr << "Input file format is fasta." << std::endl;
    // ... call the function for FASTA ...
}
else if (checkStreamFormat(reader, Fastq()))
{
    std::cerr << "Input file format is fastq." << std::endl;
    // ... call the function for FASTQ ...
}
else
{
    std::cerr << "ERROR: Input file format is not fasta or fastq." << std::endl;
    return -1;
}

This works fine for FASTA. However, my FASTQ file is not recognized.
I looked into the code of checkStreamFormat a bit: the file is not
recognized because the iterator in the readRecord function reaches
atEnd before the quality meta data of the 35th record has been read
completely (l. 392). This happens with two different FASTQ files.

My theory is the following:
checkStreamFormat uses LimitRecordReaderInScope. According to the
documentation, this prevents the stream from "rebuffering". This
probably keeps the reader from reading the complete record, so the
recognition of the file fails.
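The theory above can be illustrated with a small standalone model. If detection only inspects a fixed-size buffer (the roughly 8 KB threshold reported in this thread suggests such a limit), the last record in that buffer is usually cut off. A check that treats this as a parse error rejects every valid file larger than the buffer. The sketch below is hypothetical: it is not SeqAn's checkStreamFormat() and not the change made in r1018. `splitLines` and `prefixLooksLikeFastq` are illustrative names, and the sketch assumes four-line FASTQ records without wrapped sequence or quality lines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split `text` at '\n'. The last element is whatever follows the final '\n':
// empty if the text ends with a newline, otherwise a possibly partial line.
std::vector<std::string> splitLines(const std::string & text)
{
    std::vector<std::string> lines;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = text.find('\n', start)) != std::string::npos)
    {
        lines.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    lines.push_back(text.substr(start));
    return lines;
}

// Hypothetical check of a buffered prefix, assuming 4-line FASTQ records
// (header, sequence, '+' line, quality) without line wrapping.
// `truncated` is true if the prefix was cut off at a buffer limit; then an
// incomplete final record is expected and must not count as an error.
bool prefixLooksLikeFastq(const std::string & prefix, bool truncated)
{
    std::vector<std::string> lines = splitLines(prefix);
    std::string tail = lines.back();
    lines.pop_back();
    // In a truncated buffer the text after the last '\n' may be a partial
    // line, so it is ignored. In a complete file it is a real last line that
    // merely lacks a trailing newline.
    if (!truncated && !tail.empty())
        lines.push_back(tail);

    std::size_t complete = 0;
    std::size_t i = 0;
    for (; i + 4 <= lines.size(); i += 4)
    {
        if (lines[i].empty() || lines[i][0] != '@')
            return false;  // header must start with '@'
        if (lines[i + 2].empty() || lines[i + 2][0] != '+')
            return false;  // separator line must start with '+'
        if (lines[i + 3].size() != lines[i + 1].size())
            return false;  // one quality character per base
        ++complete;
    }

    if (i < lines.size())  // lines of an incomplete final record remain
    {
        if (!truncated)
            return false;  // the whole file ends mid-record: not valid FASTQ
        if (lines[i].empty() || lines[i][0] != '@')
            return false;  // even a cut-off record must start with a header
    }
    return complete > 0;   // require at least one complete record
}
```

In this model, calling the check with `truncated == false` on a buffer that was in fact cut off reproduces the failure described above: the cut-off record at the end of the buffer is reported as an error, although the file itself is valid.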

I hope this is clear. I can also provide my code and a sample FASTQ
file if that would help.

Cheers,
felix



_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev
