This data set came from the BFCounter program, which introduced the Bloom filter for filtering out erroneous k-mers:
http://pritch.bsd.uchicago.edu/bfcounter.html

Here is a snippet of one of these files:

@EAS18:1:1:1:1:1119:0/1
NGTTACTTCGCGCTTTCACCGGAAGACGAAGCGCGCGATGCAGCGCGTCATATTCGTGACCANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:1119:0/1 ltrim=1 rtrim=38
BTWa^`^``X^__bb_`_a]QX`\UW_H_\V^OZGZMZ_]RGWWGZTGZZWYT\GZX]_YWZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:1:927:0/1
NGTAGCAAATCAAAAAGGTGGCGCTGGCAACGCTGGCGAGTGAGCCTAAATTCAGTGCCGCCGTCATCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:927:0/1 ltrim=1 rtrim=32
B`\`bbbbabbb_[`bbYb`Zbbba[\b\bbab`X^\^[\GTbVH^\]XY[_VQQ_T^^a^]\]bba\BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:1:639:0/1
NACTTTTGCGGGAAGAATGGAAATAATATTAACGGTTTTAGTTTCTTGTTTGGTATTCAGTTGATCGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:639:0/1 ltrim=1 rtrim=32
BbbbbbbZbbbbbbVbbbb`bbbbba^ab_^ab_]G]baa_\`aa[VTH^_KV`bb``^[U[H]H[RKBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:1:1479:0/1
NCCTTTCTGCGCTGCATTAACTTCCTCGAAAAACCGAGTGAAGGGTCGATCGTGGTCAATGGCCAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:1479:0/1 ltrim=1 rtrim=34
Ba^`aaaaa`abaaabbZ`[a]bbb[b]bbVa_bbbbb]ZY`aWMG^ab]baZZ\I_b`\MM[\aQBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:1:1131:0/1
NCGCATACATCAGCGAGAAACCGCCATCACGACGCGGATCGGTTGGCTCATACAGCTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:1131:0/1 ltrim=1 rtrim=42
BbUa_Y_b__`[_bZWQ_[Xab`bba`a_a]bb``_\\V[ZTIXGT_a_Z]^`MM\\[BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:1:1469:0/1
NGAACCTCGGAATGCCGGAATCAGAATCCGGTGCCTTACCGCTTGGCGATACCCCAACAAATTGGTTTTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:1469:0/1 ltrim=1 rtrim=30
BaZbbbabaa`[`aa`_`[`babbb_ZQ_U]]_bb^a_W`aba]^\ZTb\W^^_SHT`I]H\QZJTbabaBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@EAS18:1:1:1:1:989:0/1
NGAGTGAAACACCATTGCCAGAAAATCATTTACTGGATGCGCGGTTACGTAAAGAAAAAGAAGATGCAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+EAS18:1:1:1:1:989:0/1 ltrim=1 rtrim=31
Bbab]_bbbbbbb\baaab[]W\\a`bba`^Za`Y`[\abY`\_bZbbbW^aZVV`bbXSJa[`S[aa`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

On 1/30/2013 6:00 AM, seqan-dev-request@lists.fu-berlin.de wrote:
> Message: 1
> Date: Wed, 30 Jan 2013 10:14:31 +0100
> From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
> To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
> Subject: Re: [Seqan-dev] Zlib linking errors under 64bit Windows 7
>
> Hi Theo,
>
> The FASTQ format is a good example of the file format fuzziness in bioinformatics. There is no real standard, only an article [1] describing what is out there. The supplemental material is a tar.gz file containing an example which says that different IDs for sequence and quality meta are an error.
>
> That said, what is the source of the FASTQ file? Ignoring the quality meta would not be a big change in the parser, and it is a change that I would be quite willing to make, given that a "major" source of FASTQ generates such files.
>
> In a future version, such things should be configurable when reading FASTQ, but alas, we do not currently have the time to make the I/O of sequences more configurable.
>
> HTH,
> Manuel
>
> [1] http://nar.oxfordjournals.org/content/38/6/1767
>
> ________________________________
> From: Theodore Omtzigt [theo@stillwater-sc.com]
> Sent: Wednesday, January 30, 2013 1:31 AM
> To: SeqAn Dev List
> Subject: Re: [Seqan-dev] Zlib linking errors under 64bit Windows 7
>
> I have got the linking to work outside of the SeqAn CMake build environment, so I have now at least isolated it to be a problem in the SeqAn build environment.
>
> However, I also run into a format error on a fastq file that is passing with the standard code from Genome Research Lab (kseq.h).
>
> The problem occurs in the last test in this code fragment from read_fasta_fastq.h
>
> template <typename TIdString,
>           typename TQualString,
>           typename TFile,
>           typename TPass>
> inline int
> _readQualityBlock(TQualString & qual,
>                   RecordReader<TFile, TPass > & reader,
>                   unsigned const seqLength,
>                   TIdString const & meta,
>                   Fastq const & /*tag*/)
> {
>     // READ AND CHECK QUALITIES' META
>     if (atEnd(reader))
>         return EOF_BEFORE_SUCCESS;
>     if (value(reader) != '+')
>         return RecordReader<TFile, TPass >::INVALID_FORMAT;
>     goNext(reader);
>     if (resultCode(reader))
>         return resultCode(reader);
>     if (atEnd(reader)) // empty ID, no sequence, this is legal? TODO
>         return 0;
>
>     CharString qualmeta_buffer;
>     int res = readLine(qualmeta_buffer, reader);
>     if (res && res == EOF_BEFORE_SUCCESS)
>         return EOF_BEFORE_SUCCESS;
>     else if (res)
>         return RecordReader<TFile, TPass >::INVALID_FORMAT;
>
>     // meta string has to be empty or identical to sequence's meta
>     if ((qualmeta_buffer != "") && (qualmeta_buffer != meta))
>         return RecordReader<TFile, TPass >::INVALID_FORMAT;
>     ...
>
> and the test fails because of the qualmeta_buffer not being equal to meta.
>
> + meta {data_begin=0x001757e0 "EAS18:1:1:1:1:1119:0/1??????????????????????????" data_end=0x001757f6 "??????????????????????????" data_capacity=32 } const seqan::String<char,seqan::Alloc<void> > &
> + qualmeta_buffer {data_begin=0x0017dc40 "EAS18:1:1:1:1:1119:0/1 ltrim=1 rtrim=38?????????????????????????????????" data_end=0x0017dc67 "?????????????????????????????????" data_capacity=49 } seqan::String<char,seqan::Alloc<void> >
>
> that section: " ltrim=1 rtrim=38" appears to be a format difference that kseq.h accepts as a valid quality segment, but read_fasta_fastq.h does not.
>
> So, this question now has bifurcated into a second question and that is what is considered a valid fastq format?
>
> End of seqan-dev Digest, Vol 40, Issue 4
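
To make the two readings of the '+' line concrete, here is a minimal sketch in plain C++, using std::string rather than SeqAn strings. The helper names firstToken, strictQualMetaOk, lenientQualMetaOk and checkExamplesFromThread are made up for illustration and are not part of SeqAn or kseq.h. The strict rule mirrors the comparison in the quoted _readQualityBlock(). The lenient rule is only one possible relaxation, and it assumes that just the identifier up to the first whitespace has to match.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of SeqAn or kseq.h): returns the part of a
// header line before the first space or tab, i.e. the record identifier.
// If there is no whitespace, the whole line is returned.
inline std::string firstToken(std::string const & line)
{
    return line.substr(0, line.find_first_of(" \t"));
}

// Strict rule, as in the quoted _readQualityBlock(): the text after '+'
// must be empty or identical to the text after '@'.
inline bool strictQualMetaOk(std::string const & meta,
                             std::string const & qualMeta)
{
    return qualMeta.empty() || qualMeta == meta;
}

// Lenient rule (an assumption for illustration, not an agreed standard):
// the '+' line may carry extra annotations such as "ltrim=1 rtrim=38",
// as long as its identifier matches the identifier of the '@' line.
inline bool lenientQualMetaOk(std::string const & meta,
                              std::string const & qualMeta)
{
    return qualMeta.empty() || firstToken(qualMeta) == firstToken(meta);
}

// Runs both rules on the header pair shown in the debugger output above.
inline void checkExamplesFromThread()
{
    std::string const meta = "EAS18:1:1:1:1:1119:0/1";
    std::string const qualMeta = "EAS18:1:1:1:1:1119:0/1 ltrim=1 rtrim=38";

    assert(!strictQualMetaOk(meta, qualMeta));   // rejected, as reported
    assert(lenientQualMetaOk(meta, qualMeta));   // same identifier, accepted
    assert(strictQualMetaOk(meta, ""));          // bare '+' line passes
    // An identifier from a different record is still rejected.
    assert(!lenientQualMetaOk(meta, "EAS18:1:1:1:1:927:0/1 ltrim=1 rtrim=32"));
}
```

Calling checkExamplesFromThread() from a main() exercises both rules on the first record of the snippet. Ignoring the quality meta entirely, as suggested in the quoted reply, would be simpler still. The lenient variant only keeps a basic sanity check that the two header lines belong to the same read.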