Re: [Seqan-dev] CheckStreamFormat for FastQ

  • From: Felix Heeger <fheeger@mi.fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 06 Jan 2012 14:35:36 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] CheckStreamFormat for FastQ

Hi Manuel,

thank you for your effort. I tested your suggestion today, but it did
not fix my problem. Your example program also cannot identify my FASTQ
file. I am fairly sure the file is valid FASTQ, since other programs
read it without problems. I have attached the first part of the file in
case you want to have a look at it.
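
For an independent check that the file is structurally valid, a small
validator can be run over it. The following is only a minimal sketch,
assuming plain four-line records as in the attachment. The names
FastqCheck, checkFastq and checkFastqText are hypothetical and are not
part of SeqAn.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Outcome of a structural FASTQ check (hypothetical helper type).
struct FastqCheck
{
    bool ok;              // true if every record had the expected layout
    std::size_t records;  // number of complete, well-formed records seen
    std::string error;    // description of the first problem, empty if ok
};

// Drops a trailing carriage return so files with Windows line endings pass.
static void stripCr(std::string & line)
{
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);
}

// Records the first problem and marks the check as failed.
static FastqCheck fail(FastqCheck result, char const * message)
{
    result.ok = false;
    result.error = message;
    return result;
}

// Checks the plain four-line layout only: '@' header, sequence line,
// '+' separator, and a quality line as long as the sequence.
// Multi-line records are deliberately not handled in this sketch.
FastqCheck checkFastq(std::istream & in)
{
    FastqCheck result = {true, 0, ""};
    std::string header, seq, plus, qual;
    while (std::getline(in, header))
    {
        stripCr(header);
        if (header.empty())
            continue;  // tolerate blank lines between records
        if (header[0] != '@')
            return fail(result, "header line does not start with '@'");
        if (!std::getline(in, seq) || !std::getline(in, plus) ||
            !std::getline(in, qual))
            return fail(result, "record is truncated");
        stripCr(seq);
        stripCr(plus);
        stripCr(qual);
        if (plus.empty() || plus[0] != '+')
            return fail(result, "separator line does not start with '+'");
        if (seq.size() != qual.size())
            return fail(result, "sequence and quality lengths differ");
        ++result.records;
    }
    assert(result.ok);  // every failure path above has already returned
    return result;
}

// Convenience wrapper for text that is already in memory.
FastqCheck checkFastqText(std::string const & text)
{
    std::istringstream in(text);
    return checkFastq(in);
}
```

Calling checkFastq on a std::ifstream opened on the file reports the
number of well-formed records, or the first problem found. If it accepts
the file, the detection failure is more likely to lie in how much of the
file is inspected than in the file itself.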

felix

On Wed, 2011-12-21 at 18:31 +0100, Manuel Holtgrewe wrote:
> Felix,
> 
> The documentation of checkStreamFormat() was misleading. I fixed it in 
> [10948].
> 
> http://docs.seqan.de/seqan/dev2/?i=Function.checkStreamFormat
> 
> (The documentation is regenerated every hour, so you may have to wait
> a bit before the change shows up.)
> 
> The following is a simple example program that I compiled and tested.
> Please write again if the problem persists.
> 
> HTH,
> Manuel
> 
> #include <fstream>
> #include <iostream>
> 
> #include <seqan/sequence.h>
> #include <seqan/stream.h>
> 
> int main(int argc, char ** argv)
> {
>      using namespace seqan;
> 
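>      // Expect exactly one argument: the path to the sequence file.
>      if (argc != 2)
>          return 1;
>      // Open read-only; the default fstream mode also requests write access,
>      // which fails on files that are not writable.
>      std::fstream in(argv[1], std::ios::in | std::ios::binary);
>      if (!in.good())
>      {
>          std::cerr << "Could not open input file!" << std::endl;
>          return 1;
>      }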
> 
>      RecordReader<std::fstream, SinglePass<> > reader(in);
>      AutoSeqStreamFormat tagSelector;
>      bool b = checkStreamFormat(reader, tagSelector);
>      if (!b)
>      {
>          std::cerr << "Could not detect file format!" << std::endl;
>          return 1;
>      }
> 
>      // b is true if any format was detected successfully; tagId then
>      // identifies which one (here 1 is FASTA and 2 is FASTQ).
>      if (tagSelector.tagId == 1)
>          std::cerr << "Detected FASTA." << std::endl;
>      else if (tagSelector.tagId == 2)
>          std::cerr << "Detected FASTQ." << std::endl;
>      else
>          std::cerr << "Unknown file format!" << std::endl;
>      return 0;
> }
> 
> 
> On 12/21/2011 05:15 PM, Felix Heeger wrote:
> > Hi,
> >
> > I have two different functions that I want to call depending on whether
> > an input file is in FASTA or FASTQ format.
> >
> > My approach to this is:
> >
> >> RecordReader<std::ifstream, SinglePass<> > reader(inFile);
> >> if (checkStreamFormat(reader, Fasta()))
> >> {
> >>      std::cerr << "Input file format is fasta." << std::endl;
> >>      [call function for fasta]
> >> }
> >> else if (checkStreamFormat(reader, Fastq()))
> >> {
> >>      std::cerr << "Input file format is fastq." << std::endl;
> >>      [call function for fastq]
> >> }
> >> else
> >> {
> >>      std::cerr << "ERROR: Input file format is not fasta or fastq." << std::endl;
> >>      return -1;
> >> }
> >
> > This works fine for fasta. However, my fastq file is not recognized.
> > I looked into the code for checkStreamFormat a bit, and the file is not
> > recognized because the iterator in the readRecord function reaches
> > atEnd before the quality meta data for the 35th record is finished (l. 392).
> > This happens with two different fastq files.
> >
> > So my theory is the following:
> > checkStreamFormat uses LimitRecordReaderInScope. The documentation
> > states that this prevents the stream from "rebuffering". This probably
> > keeps the reader from reading the last record in the buffer completely,
> > so the recognition of the file fails.
> >
> > I hope this is clear. I can also provide my code and a sample
> > fastq file if that would help.
> >
> > Cheers,
> > felix
> >
> >
> >
> > _______________________________________________
> > seqan-dev mailing list
> > seqan-dev@lists.fu-berlin.de
> > https://lists.fu-berlin.de/listinfo/seqan-dev
> 

@DHCDZDN1:106:5:2203:1124:1021#0/1
CGCCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGC
+
b__eeeeegggeghf_eghfiifgfeeghifhhfghiiiiegg\bceeedadddcdcccc\accbbcbcccaZac`accccccccRXb`ba[acc_b`bc
@DHCDZDN1:106:5:2203:1134:1060#0/1
CTCTTGAATTCATGTCCTCTTCTCTATTCCCATGGCCACTGTCTTAGTTGAGGACTTCATCTTCTCTTATTTGGACTACTGTCATGGCCTTCTTTCTGGG
+
_bbeeeccgggggiiiiiiiiiiiiiiiiiifhiiiihiiigghihiiiighiOWafghiiiiiiiiiiiiihfg_fgiihhgegeggeeeeedddcbbb
@DHCDZDN1:106:5:2203:1175:1067#0/1
TCCGAGTGTTGTGGATTAGGTTAACGATCGTCTCGTTCCCCTTTTGGTATTCGTTGCACAAGTATTGGGCCAGGAAAATGAGCAGTTCCCGTCCAACAGC
+
abbececcgggggihiiiiigghhhihhhihfhhiiiiiiihiiiiieghiiihiii`ghigbdeggedeebbc`accbccccccbccccc`acc]bccc
@DHCDZDN1:106:5:2203:1136:1093#0/1
CCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCA
+
b__eeeeefggeghhhhhhegf[beghfhcghghhdhegfddgggdegededeeddccc_accb`bbc\]_accc_]aacaccacbcccca]bbbccca[
@DHCDZDN1:106:5:2203:1191:1105#0/1
CGGGCGCACATTGCCAGGGTACCTCAGACCCCTGAGCTCCGTGCAGATGTCTGTGTCCAGCCCTGGGCCCCCAGCCCCTCATCTGCATGAGAACTTGGGG
+
bbbeeeeegggggiiiiiiegiiiiiiiiiiiiigiiiiiibgghiiiihiihggegfggeeeedcccaccaccccccacccbbccG]_bcbcccccaaa
@DHCDZDN1:106:5:2203:1115:1107#0/1
CTCTGCTGGGGGTAGGCACTTCGCACAGGGTACACAGCGGTCTGGTAGGGGTTGGGGGAGGAGGAGTATGGTGGCACTGCCCCGCTGGTGGGGGGACAGG
+
bbbeeeeegggg_fgigfiiiiiiiiiiiiefgfhhiiiiaghgg^dgeeeX`accccT_aW^a^_X^bcc`bc[^`bc]bacccaccJ[^cccEHOWW^
@DHCDZDN1:106:5:2203:1149:1112#0/1
CACGCCCCCACGGGATACAGCAGTGATAAAAATTAAGCCATGAACGAAAGTTCGACTAAGCTATGTTGATTAAGGGTTGGTTAATTTCGTGCCAGCCACC
+
b__eeeeegggggighfhfghffcefhhiiifiihfhiiifgdfgfeffdbgfhgfgeeeecbddcddbcc`bccc``acacccdccbb`aacbcccc_W
@DHCDZDN1:106:5:2203:1152:1131#0/1
CCTCCCAACACAGCTTTCCGCTTCAGCGGACCCGTCCCGTCCTGATCCCGCGGCCGCCATGGTGCACGACTGTCCACACACCCACGCAGGCTCACCCATG
+
_^^^cc]cecg^e^[eaeYefUef]a`[Y^eg_ee\eeW\_[Z_ZbdZ`\[`]WWVTTVa`^JYSYSWWaa]`b]_`_^W[_]WQX_EX[]^_]`BBBBB
@DHCDZDN1:106:5:2203:1148:1148#0/1
CGGGAGGACGACGGGGCGGACGTCGTGTGCTCTGTGAACCATGAGTCTCTAAAGGGGGCTGACAGGTCGACCTCTCAGCGCATCGAAGTCCTGTACACAC
+
abbeeecegggggiiighgeecaccZ_^acbcbbbbcccccbcbbbbbcc`bccccaccccccbc_Y`aacccbbb`bcccL[aab]^^bbbcbcbbbba
@DHCDZDN1:106:5:2203:1074:1161#0/1
CGCGGCAGCACAGCGCCGGGCAGCTCTGCCGCCTGCAAGATGTCGCCAGCCCTCTGAAGCTTCATCGGACAGAGACTTTTCCTGCCTACCGATCCGAGCA
+
bbbeeeeegggeghiiiiiihhiiiiihiihhgdegceeeedddcccaccccccccccccccccddbc_aaabb[^]bcc]b]``a]_bcRTX[^[[_aX
@DHCDZDN1:106:5:2203:1243:1169#0/1
AAATAGGTTTGGGGAAGTGGATAAGAGTTAACTTTAAAAGGAAATGTCAATCAGGCACTGCATACATTAATCTAGGATTCAGGCAAGGGTTCAAGCTCGA
+
^__eeeecgggggifffdghghhiiggdghhiiihiiihiiiihhhghiiiiiihhhiiiiiiiiiiiiiiiggggfgeeeeeecdcccGZ`cbbccccc
@DHCDZDN1:106:5:2203:1218:1199#0/1
CGCAGGTACTTGAGAACGTACAGGGGCAGACAGCTGACCAGAGTGATAACAGAGACCTTCCACAAGAACGACAAAGTGGCGATGAAGTACACATCGAGAT
+
bbbeeecegggggiiiiigfhiiiiiihighihiiiiiiiiih^bghihhiihihiihiigggggeeeeccccccb`bccccaabbcYY__bbbbabaa^
@DHCDZDN1:106:5:2203:1185:1220#0/1
CTCCGGGAAATGGCAGATGCAAAGAGCAAGGGGGTGCAGGTGTGCACACCTGCTCAGCGATATGGGATGTGGGGACAGACTTCACTCAGTTTTCATCCCA
+
bbbeeeeeggggcghifhiifgiihihihhhiiiV`gghighhggfgeeeeeddddddcaccccccbbabbccca]aabcccbbbccccbcdcbbccccb
@DHCDZDN1:106:5:2203:1218:1246#0/1
ACCGCCTCTTCACGGGAAGGTCAATTTCACTGATTGGAAGTAAGAGACAGTTAAACCCTCGTGTGGCCTTTCATACAAGTCCTTAATTAGAGAACAAATG
+
^aaeeeeegggggiiighhifgiiiiiiiiiifhiiifggfghhiihihgghiiiiiiiiicgggeeeeedddddbdbcbccccccdddcccbbc^bccb
@DHCDZDN1:106:5:2203:1194:1248#0/1
CGAGAAATATACCCCACCAGCAAGAAAGTAAGCAGGGATGTTTGCTGCTTCAACTGCAGCTGAGGCCCAAGGCGGCTAGGAGACAGTTGTAAGTATTCGG
+
_b_eccceeggggiidgffhiiihghiifhfhhggiiefggfhhfhdhiiiiiiiiihhihichhfgggdeeecccaccc]bW^bb]b_]`b_]abcbc^
@DHCDZDN1:106:5:2203:1393:1024#0/1
GTCTGCAGGGCACGGCAGGGTGTGCAACAATTAATAATTTTTCTCCATTTCCACTGAGATCATACCTCCTTGTAGAAAGTATCTGTTCCTTACTCATCTG
+
___cccccggggghhhhhhhXcYaeffffhhhfhfhfhhhhhhhhhhhhhhhefghhhhcggeggdddddbbbbbdbbb``abbccbbbbbbbbbbbbbb
@DHCDZDN1:106:5:2203:1428:1052#0/1
ATGGACCCACATTTTCATTGGTGGTTCCACTGGGCAAACTAAACATGTAGCATAGAGTCATGGGCATGCTGGCGTTAGGATGGCCGTGGCATAGTTAGCG
+
^__eceeeegcgghhhfhhhheeghhhh_fffhhhacafghddfeghhfghhhhh_efffhhhhfbgdgfeggec]]abbb`_bbc\\^a^__bcbb_ba
@DHCDZDN1:106:5:2203:1406:1061#0/1
CTCACCGCTCCATGTGCGTCCCTCCCGAAGCTGCGCGCTCGGTCGAAGAGGACGACCTTCCCCGATAGAGGAGGACCGTTCTTCGGTCAAGGGTATACGA
+
bbbeeeeegggggiiihiigfhiiiiiiiiifgfghiiiige^c`^acccc_ac\YacccccccaZWabcaacc_ac_X[^acccc_]]b`b^GX_`ca[
@DHCDZDN1:106:5:2203:1354:1064#0/1
GAGCAACCCAAAGGTTATGAATCTCATCAGTAAATTGTCAGCCAAGTTTGGAGGTCAAGCATAATGCCCTTCTGACAAATAAAGCCCTTGCTGAAGGAAA
+
bbbeeeeegggggifgiiiiiiiiihiiiiiiiiiiighiiiighifgiiiiiieghiiihiiihifhiiiiigfggggeeeeeedcddccccccccccc
@DHCDZDN1:106:5:2203:1394:1084#0/1
CTTGCGGCTGCCGTGCGCGGGCACAGGTTTGCCCTTCCCGTAGTATCCCGTGAAGATGGCCCTGACCTCCAGCTTCTTGGCCAGGCTCAGCAGCCCGCGC
+
___cccccggae^[b_d_cgh[U__\bM^dZV^c^b`bb[UZ^KZZ_bbTKZW^_]R_bX^aW[bbb_`^^`bbbYY_GY]bX[_BBBBBBBBBBBBBBB
@DHCDZDN1:106:5:2203:1261:1093#0/1
GCCATGGCCACATCCGGGCTGAACCGCATGGCTGTCTCGATGCCCGACTCCTGCTCAAAGTAGCCCGAGCCCGACCCTGAATAGAGGTTATCCAGCTCAT
+
___cccccgegcgghY[degfffhhhhdcgdffffddddb]eegfhhheggdddddc]`bZ`bbbb]W^_\^aaaa]^^^bb`bbbaY^bbbbRR]Y]_`
@DHCDZDN1:106:5:2203:1428:1104#0/1
GCTTGCTGATCAGTGGTAAGATTACATTGCAAGTAATGAGTTCCACCACTACATGGCGCCCAGTCCGGGTCTCCAAGTGGGGCTTTGGCACTAGTCCTTG
+
bbbeeeeeegggghiihhhiiiihifhiiihgighiiiiiiiiichiiiihigiiiiiiiiiiiiiigggeeeddddddccccccccccccccccccccc
@DHCDZDN1:106:5:2203:1339:1107#0/1
GTAGAGCTCACTGGGAAAAAAATGCACACATTGCCATGTCCACATATATGCTTACAGTTTCAGCAGGAATGTGTGGGGCACCTACCACGTGCCAGACCCA
+
aaaeeeeegfgggiighiiiiiiiiiiiiiiiiiiiiiiiiihiihiiiiiiiiiiieghiiiiiigggggeeeeeccccccccccccc`acc`bacccc
@DHCDZDN1:106:5:2203:1267:1116#0/1
GTAGCTGCCCGAGTAAAGCCAGTCCTGTGCTCCATTGTAGGGGTTAGGACCTTGATGGAACCACCTCGTCTTCCAGTCCTCCTCCTTTGTCCCTTTGGGG
+
bbbeeeeegggegbedfhhhhhghiiaeffghidgiihibfhiffg]e^egddgcegbfhigfggfee^cccdddbbcbbbccccccccbbbccccbb_a
@DHCDZDN1:106:5:2203:1287:1122#0/1
GTGTGTGTGCCGTGGGGTGGGGTGTCACTATACTGTCCTAAGATACAAGCCAAGGGTAATCTCCGTGGACACACACGCTACCCTGCCTCCCATAATCCCT
+
_bbcceecegggcegfhbdcfggdcggfhagZegcggfhhhhghdggfgdee_baaZ^bcccccaacc^b`abcaacca_ccaYX``]b`aRSGS__bcb
@DHCDZDN1:106:5:2203:1465:1126#0/1
GTCACGGACGGCGATACGGCGACCAGGGTACAGAACAGAGTCTGCCAGCCTTCTGGCCAGGAAGGAAATCTAGACTATCGTTCTACACATCCAGGCAGAA
+
_aaeeeeegggggiiiiiihF_aggfee^ccd_bddcccc`bccccccccbccccccccccccccaccbccccbccccbc`accdcbbcbbcbbca^^_b
@DHCDZDN1:106:5:2203:1311:1161#0/1
GTGGAATGTGCGTCTTCAACGTTGTATTTTTTCTCACATACAAGCGGTGCAGCGACAATGGTAACTGCTACACATGACTTCCAGGGCCAACCATGACGCA
+
aaaeceeeggggghiiiiiiiiiigghiiiiihhiiiiiiiiiiiiighiiiiiiggeeeebdddddccccccccccbccccbcccacccccccccccca
@DHCDZDN1:106:5:2203:1259:1184#0/1
CTGGAGTGCAGTGGCTATTCACAGGCGCGATCCCACTAATGATCAGCACGGGAGTTTTGACCTGCTCCGTTTCCGACCTGGGCCGGTTCACCCCTCCTTA
+
_[_^cccc^eeJbRdZdfK`[^ebdP[WWHOOaZ^HHIIXXXNaIXbSZ_edBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@DHCDZDN1:106:5:2203:1274:1191#0/1
GGGTACTGGAGACACGATGCTGTCTGAGCCAACAGAGGATTCGCTACTTTGAGTAAAAGGTGGTGAGCTGGATGTTTCTGACTGTACTGGCAGTTCGGCA
+
___`cdeegggggighghfhgfadfhfffhbafZaafg_ghiihaegfgihae\_eghhhM\dY`Zaaccd]_b]]``bbZ`]bZ]``Y`aX^BBBBBBB
@DHCDZDN1:106:5:2203:1416:1193#0/1
GGCCCAGGTCGGAAACGGAGCAGGTCAAAACTCCCGTGCTGATCAGTAGTGGGATCGCGCCTGTGAATAGCCACTGCACTCCAGCCTGGGCAACATAGCG
+
bbbeeeeegggggiiiiiiiiiii^ggiiihiiiiigiiiiiiiiihiifgiiiggggeccccbbcdddccccccbccccccccccbaccccccccbcca
@DHCDZDN1:106:5:2203:1311:1197#0/1
CCTCGTGTGGCCTTTCATACAAGTCCTTATTTAGAGAACAAATGATTATGCTACCTTTGCACGGTCAGAATACCGCGGCCGTTTAACTGATGTCACCGGG
+
abbeeceeaggggiiiiiiiiiighhiiiiiifihihhghiiiihiiiiiiiiiiiiiiiiiihggfgihh`gggcdccccccccccccccccccccacc
@DHCDZDN1:106:5:2203:1282:1244#0/1
CATCGAGGTCGTAAACCCTATTGTCGATATGGACTCTTAAACAGGATTGCGCTGTTATCCCTAGGGTAACTTGTTCCGTTGATCAAAGAAGTTTTGGATC
+
___eecee_ceegfhhhhhfhhhgfhfdfgfhhffhhhfdddfhhhhh_eghhfdhffgggddceeZ^bddcddcbbc]`aaccdc`bbab_bcccc]`b
@DHCDZDN1:106:5:2203:1437:1249#0/1
ATTATATTAGGAGTTAATTGGATATTTTTGGTATCAGGTTAGGTGAGAAGATGGGACTAGAATACGTCAATTTGGAAGTTGTCATCATATCAAGAGCTTT
+
^aaeeeeegggceeggigiiiiihiiiiiiieeghhhifghiiggfhhhhghhiigiiiiiiiiihghifgiiigfgggeeeeeeedddddddccccccc
@DHCDZDN1:106:5:2203:1526:1031#0/1
CATGTCTTTGGATAGAGAAATCAGCGTTTTCAGCTAGATTGCAAAACTGGTTTAAAGTATTCCCTCGGCGTGGGGCGAATACCAAGATCGGAAGAGCGGT
+
b_beeeeegfggcghhhfhihiiifgbfgiihiiiiiihiiiihiiiihifhhhhiieghhiiihiiiigcecccaaacccccccccccccaccacbac]
@DHCDZDN1:106:5:2203:1732:1032#0/1
CCGCCACTCCGGATTCGGGGATCTGAACCCGACTCCCTTTCGATCGGCCGAGGGCAACGGAGGCCATCGCCCGTCCCTTCGGAACGGCGCTCGCCCATCC
+
___c^cccg^egedfeghhha^c[aa^c_egad`defheRb_bHUS\^dbaZ\_aaaa_aaaaa_abGGWTZ_EWTT]bb_]OQT__BBBBBBBBBBBBB
@DHCDZDN1:106:5:2203:1669:1033#0/1
GTGTGTGGTATGTGTGACAGTATAGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTATGTGGAGGGGGTGTGTGTGTGTGTGGTGTGTGTATGTGTGG
+
aaaccceecgggghghghghgggghegdgdeefbfdfgffgefgfcfegegefefefgH\bH\bHGT`[`ZQZ^`^^``GQ[GQ[bG[_a^`]aGS]Y`B
@DHCDZDN1:106:5:2203:1556:1034#0/1
AGAAGGGTTCAAAGACCAAACCAAGAAGATCGGATTGCCGAGACTATCACATCTTTGTAAACATGCAGTGTGAAGTAGGAGAAAGAAAGTGACAGGTAAC
+
J\^^cccceee^edZ^^^Yba`_c`cbcI^dfdbZ_ce]_c`U[\_SV__`bcdbd]V_dV\\`daZ]^aZ]]`_a_``R]TGZ_aa^YKYZ`]]^BBBB
@DHCDZDN1:106:5:2203:1578:1048#0/1
GTTTTTTTTTTTTTTCCAGAGCTCAGCAGCATGTGTGTTCAAAGGGCTGTGAATGTTGGGTTCTCCTAGCAGGCTCTGGATGGACAGCAGGATGGGCCTG
+
___cccccggggghffgd`V\c_c]]]X]dZ^ZMVMMU_bddbbba^_^KY]]_YTZ___QWWR]_]]__R_a_WJRY`RY]_XY][[[_aOSY_BBBBB
@DHCDZDN1:106:5:2203:1725:1059#0/1
CCCGTGCGGAGGAGCGAGGAGGAAGGAAGCGCTTCCTCAAGATGCCTTTTTCTCCCATCAGCTGAGAAACAGTTGTAATAATGCAGAAATCTGCTGTGCC
+
bbbeeeeeggfggigigiiiiiiiiiiihhiiiiiiiggggggeeeeeeddddddcccccccccccccccccbccccdddcdccccb]b_bbccccbccc
@DHCDZDN1:106:5:2203:1683:1061#0/1
TTTTCTTGGACTTTGGGGTCTTTTCTCAATTACCCAGTTGTATCCTGGCACCAATCCCTCTTTCAAAGCTGGGTAGAGACATGGTTTCTTTGCATGACTT
+
bbbeeeeegggggiiiiibfhiiiiiiiiiiiiiiiihiiihiiiiiiiiiiiiiiiiiiiiiiiiehhiiggggcceedddddbdccccccccdccccc
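
Each record in the excerpt above has a 34-character header and a
100-base read, so with Unix line endings it occupies 239 bytes. The
sketch below illustrates the buffering theory from the original message:
if only a fixed-size window of the file is inspected, the window will
usually end in the middle of a record. The 8192-byte window used in the
usage note is purely an assumption for illustration; the actual limit
applied by LimitRecordReaderInScope is not confirmed here. The names
makeRecord, WindowResult and scanWindow are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds one synthetic four-line FASTQ record with the same dimensions as
// the excerpt: a 34-character header and a 100-base read.
std::string makeRecord()
{
    std::string header = "@" + std::string(33, 'H');
    std::string seq(100, 'A');
    std::string qual(100, 'b');
    std::string record = header + "\n" + seq + "\n+\n" + qual + "\n";
    assert(record.size() == 239);  // 35 + 101 + 2 + 101 bytes
    return record;
}

// What a reader confined to a fixed window gets to see.
struct WindowResult
{
    std::size_t complete;  // number of complete four-line records
    bool truncated;        // true if the window ends inside a record
};

// Counts complete four-line records in the first `limit` bytes of `data`.
WindowResult scanWindow(std::string const & data, std::size_t limit)
{
    std::size_t end = limit < data.size() ? limit : data.size();
    std::size_t lines = 0;      // newline-terminated lines in the window
    std::size_t lastBreak = 0;  // offset just past the last newline
    for (std::size_t i = 0; i < end; ++i)
    {
        if (data[i] == '\n')
        {
            ++lines;
            lastBreak = i + 1;
        }
    }
    WindowResult result;
    result.complete = lines / 4;
    // Truncated if a record is only partly there or a line is cut off.
    result.truncated = (lines % 4 != 0) || (lastBreak != end);
    return result;
}
```

With 40 such records and an assumed window of 8192 bytes, scanWindow
reports 34 complete records and a truncated 35th one, because
34 * 239 = 8126 leaves only 66 bytes for the next record. A strict parser
that requires every record in the window to be complete would reject
this input, which would be in line with the failure in the 35th record
described in the original message.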