On 3/2/2013 6:00 AM,
seqan-dev-request@lists.fu-berlin.de wrote:
> What are you reading the sequences into? DnaString? CharString? Can you give more details here? Your snippet parses nicely with SequenceStream.
>
> Currently, there is a limitation: when reading a sequence into a Dna5String, any non-CGATN character causes an error. We will resolve this in the future with a configuration object for the readRecord function that allows switching between error and coerce-to-N for other characters (e.g. when there are IUPAC characters indicating an A-C ambiguity).
>
> *m

For this snippet, the SequenceStream gets into an infinite loop: _atEnd and _isGood both become false, and subsequent calls to readRecord do not turn _atEnd to true:

    // Create SequenceStream object for reading, optimized for reading single records.
    seqan::SequenceStream seqStream(argv[1]);
    if (!isGood(seqStream))
    {
        std::cerr << "ERROR: Could not open the file.\n";
        return 1;
    }

    static const size_t NUM_BITS = 8*1024*1024; // 8 Mbits
    typedef boost::mpl::vector<boost_hash<DnaString, 0xAAAAAAAA> > HashFns;
    basic_bloom_filter<DnaString, NUM_BITS, HashFns> bloom;

    // Buffers for the sequence ids and characters.
    seqan::CharString id;
    seqan::String<Dna> kmer;
    seqan::String<Dna> seq;
    seqan::String<Dna> shortRead;
    unsigned long seqCount = 0;

    while (!atEnd(seqStream))
    {
        if (0 == readRecord(id, shortRead, seqStream))
        {
            seqCount++;
            seq = shortRead;
            bloom.insert(seq);
        }
        else
        {
            cout << '[' << seqCount+1 << "] ERROR: " << shortRead << endl;
            // ----> infinite loop: _atEnd and _isGood both go to false,
            //       and _atEnd never turns to true
        }
    }
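
As a stopgap on my side, the loop can simply stop at the first failed read instead of retrying. Below is a minimal, self-contained sketch of that guard. Note that MockStream, ReadResult and readAll are hypothetical names made up for illustration; MockStream is not the real seqan::SequenceStream, it only imitates the behaviour described above (after a failed read it never reports end-of-stream and never advances). The only assumption taken from the snippet is that readRecord returns 0 on success.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for seqan::SequenceStream (NOT the real class).
// It imitates the reported behaviour: once a read fails, the stream is
// stuck, so atEnd() stays false and every later read fails as well.
struct MockStream {
    std::vector<std::string> records;
    std::size_t pos;
    bool failed;

    explicit MockStream(std::vector<std::string> r)
        : records(std::move(r)), pos(0), failed(false) {}
};

// True only if every record was consumed without an error.
bool atEnd(const MockStream& s) {
    return !s.failed && s.pos == s.records.size();
}

// Returns 0 on success and non-zero on error, as in the snippet above.
// Any character outside A, C, G, T counts as an error here.
int readRecord(std::string& seq, MockStream& s) {
    if (s.failed || s.pos >= s.records.size()) return 1;
    const std::string& rec = s.records[s.pos];
    if (rec.find_first_not_of("ACGT") != std::string::npos) {
        s.failed = true;  // the stream does not advance past the bad record
        return 1;
    }
    seq = rec;
    ++s.pos;
    return 0;
}

struct ReadResult {
    std::size_t count;  // number of records read successfully
    bool ok;            // false if reading stopped because of an error
};

// Reads until the end of the stream or the first error, whichever comes first.
ReadResult readAll(MockStream& s) {
    ReadResult result = {0, true};
    std::string seq;
    while (!atEnd(s)) {
        if (readRecord(seq, s) != 0) {
            result.ok = false;
            break;  // without this break the loop would never terminate
        }
        ++result.count;
    }
    return result;
}
```

In my program the equivalent change would be a break (or a return with an error code) in the else branch. That only avoids the hang; it does not skip the offending record and continue, which is what I would actually like to be able to do.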