Re: [Seqan-dev] CheckStreamFormat for FastQ
Hi,
My configuration:
Ubuntu 11.04 64bit
64 bit processor
gcc version 4.5.2
felix
On Fri, 2012-01-06 at 16:53 +0100, Manuel Holtgrewe wrote:
> What is your configuration (OS, 32/64 bit, compiler, version)?
>
> On 01/06/2012 03:53 PM, Felix Heeger wrote:
> > Hi Manuel,
> >
> > I did a fresh checkout, but I still get the same problem.
> >
> > However if I remove the last 6 records from the file it will be
> > recognized. I also removed the first 6 records to make sure it is the
> > file size that is causing the issue and not a specific record. Same
> > result.
> >
> > In short: it is working for me if the file size is <= 8 KB.
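If the 8 KB limit gets in the way, a stopgap that does not depend on how
much of the file is buffered is to look only at the first character: FASTA
records start with '>' and FASTQ records with '@'. A minimal sketch in plain
standard C++ follows. guessFormatByFirstChar and SeqFormat are made-up names
for illustration, not SeqAn API, and the check is only a heuristic that does
not validate the rest of the file.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, NOT part of SeqAn: guess the sequence file format
// from the first non-whitespace character of the stream.
enum class SeqFormat { Fasta, Fastq, Unknown };

SeqFormat guessFormatByFirstChar(std::istream & in)
{
    in >> std::ws;       // skip leading whitespace and blank lines
    int c = in.peek();   // look at the next character without consuming it
    if (c == '>')
        return SeqFormat::Fasta;   // FASTA header line
    if (c == '@')
        return SeqFormat::Fastq;   // FASTQ header line
    return SeqFormat::Unknown;     // empty stream or some other content
}
```

peek() leaves the character in the stream, so the same stream can be handed
to the actual parser afterwards.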
> >
> > felix
> >
> > On Fri, 2012-01-06 at 15:06 +0100, Manuel Holtgrewe wrote:
> >> I tested the program on the file that you attached and it worked. Does
> >> the program detect the format of the small file, too?
> >>
> >> $ make file_detect
> >> [...]
> >> $ ./sandbox/holtgrew/demos/file_detect /tmp/lane_5_p1.fastq
> >> Detected FASTQ.
> >>
> >> Could you try again with a fresh checkout?
> >>
> >> On 01/06/2012 02:35 PM, Felix Heeger wrote:
> >>> Hi Manuel,
> >>>
> >>> thank you for your effort. I checked your suggestion today and it did
> >>> not fix my problem. Also your example program cannot identify my FASTQ
> >>> file. I am pretty sure it is valid FASTQ as other programs work fine on
> >>> it. I attached the first part of the file, if you want to have a look at
> >>> it.
> >>>
> >>> felix
> >>>
> >>> On Wed, 2011-12-21 at 18:31 +0100, Manuel Holtgrewe wrote:
> >>>> Felix,
> >>>>
> >>>> The documentation of checkStreamFormat() was misleading. I fixed it in
> >>>> [10948].
> >>>>
> >>>> http://docs.seqan.de/seqan/dev2/?i=Function.checkStreamFormat
> >>>>
> >>>> (The documentation is regenerated every hour, so you might wait for a
> >>>> bit to see it).
> >>>>
> >>>> The following is a simple example program I compiled and tested. Please
> >>>> write another email, if the problem persists.
> >>>>
> >>>> HTH,
> >>>> Manuel
> >>>>
> >>>> #include <fstream>
> >>>> #include <iostream>
> >>>>
> >>>> #include <seqan/sequence.h>
> >>>> #include <seqan/stream.h>
> >>>>
> >>>> int main(int argc, char ** argv)
> >>>> {
> >>>>     using namespace seqan;
> >>>>
> >>>>     if (argc != 2)
> >>>>         return 1;
> >>>>     std::fstream in(argv[1]);
> >>>>
> >>>>     RecordReader<std::fstream, SinglePass<> > reader(in);
> >>>>     AutoSeqStreamFormat tagSelector;
> >>>>     bool b = checkStreamFormat(reader, tagSelector);
> >>>>     if (!b)
> >>>>     {
> >>>>         std::cerr << "Could not detect file format!" << std::endl;
> >>>>         return 1;
> >>>>     }
> >>>>
> >>>>     // b is true if any format was detected successfully.
> >>>>     if (tagSelector.tagId == 1)
> >>>>         std::cerr << "Detected FASTA." << std::endl;
> >>>>     else if (tagSelector.tagId == 2)
> >>>>         std::cerr << "Detected FASTQ." << std::endl;
> >>>>     else
> >>>>         std::cerr << "Unknown file format!" << std::endl;
> >>>>     return 0;
> >>>> }
> >>>>
> >>>>
> >>>> On 12/21/2011 05:15 PM, Felix Heeger wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I have two different functions that I want to call depending on
> >>>>> whether an input file is in fasta or fastq format.
> >>>>>
> >>>>> My approach to this is:
> >>>>>
> >>>>>> RecordReader<std::ifstream, SinglePass<> > reader(inFile);
> >>>>>> if (checkStreamFormat(reader, Fasta()))
> >>>>>> {
> >>>>>>     std::cerr << "Input file format is fasta." << std::endl;
> >>>>>>     [call function for fasta]
> >>>>>> }
> >>>>>> else if (checkStreamFormat(reader, Fastq()))
> >>>>>> {
> >>>>>>     std::cerr << "Input file format is fastq." << std::endl;
> >>>>>>     [call function for fastq]
> >>>>>> }
> >>>>>> else
> >>>>>> {
> >>>>>>     std::cerr << "ERROR: Input file format is not fasta or fastq." << std::endl;
> >>>>>>     return -1;
> >>>>>> }
> >>>>>
> >>>>> This works fine for fasta. However, my fastq file is not recognized.
> >>>>> I looked into the code for checkStreamFormat a bit and the file is not
> >>>>> recognized because the iterator in the readRecord function reaches
> >>>>> atEnd before the quality meta data for the 35th record is finished (l. 392).
> >>>>> This happens with two different fastq files.
> >>>>>
> >>>>> So my theory is the following:
> >>>>> In the checkStreamFormat function LimitRecordReaderInScope
> >>>>> is used. The documentation states that this prevents the stream from
> >>>>> "rebuffering". This probably prevents the reader from reading the
> >>>>> complete record, so the recognition of the file fails.
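To illustrate the theory with a toy model (plain standard C++, not SeqAn's
actual implementation; countFastqRecords, strictCheck and tolerantCheck are
hypothetical names): if a checker only sees a fixed-size window of the file
and treats running out of data in the middle of a record as a failure, any
file larger than the window is rejected, while a checker that accepts a
cut-off last record is not affected. The sketch assumes 4-line FASTQ records
in which every line is terminated by '\n'.

```cpp
#include <cstddef>
#include <string>

// Toy model, NOT SeqAn code: count complete 4-line FASTQ records in `data`.
// Sets `truncated` to true if the data ends in the middle of a record.
// Returns -1 if a complete record violates the FASTQ layout.
int countFastqRecords(std::string const & data, bool & truncated)
{
    truncated = false;
    int records = 0;
    std::size_t pos = 0;
    while (pos < data.size())
    {
        std::string lines[4];
        for (int i = 0; i < 4; ++i)
        {
            std::size_t nl = data.find('\n', pos);
            if (nl == std::string::npos)  // ran out of data mid-record
            {
                truncated = true;
                return records;
            }
            lines[i] = data.substr(pos, nl - pos);
            pos = nl + 1;
        }
        if (lines[0].empty() || lines[0][0] != '@')
            return -1;  // header line must start with '@'
        if (lines[2].empty() || lines[2][0] != '+')
            return -1;  // separator line must start with '+'
        if (lines[1].size() != lines[3].size())
            return -1;  // one quality value per base
        ++records;
    }
    return records;
}

// Strict check: a truncated window counts as "not FASTQ".
// This models the suspected behaviour.
bool strictCheck(std::string const & window)
{
    bool truncated = false;
    return countFastqRecords(window, truncated) > 0 && !truncated;
}

// Tolerant check: one complete record is enough, a cut-off tail is ignored.
bool tolerantCheck(std::string const & window)
{
    bool truncated = false;
    return countFastqRecords(window, truncated) > 0;
}
```

With a window that ends inside a record, strictCheck rejects the data and
tolerantCheck accepts it, which would match a file being recognized only
while it fits into the buffer completely.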
> >>>>>
> >>>>> I hope I could make myself clear. I can also provide my code and a sample
> >>>>> fastq file if it would be helpful.
> >>>>>
> >>>>> Cheers,
> >>>>> felix
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> seqan-dev mailing list
> >>>>> seqan-dev@lists.fu-berlin.de
> >>>>> https://lists.fu-berlin.de/listinfo/seqan-dev