Re: [Seqan-dev] Fastq test files



On 3/2/2013 6:00 AM, seqan-dev-request@lists.fu-berlin.de wrote:

What are you reading the sequences into? DnaString? CharString? Can you give more details here?

Your snippet parses nicely with SequenceStream.

Currently, there is a limitation: when reading sequences into a Dna5String, any character other than C, G, A, T, or N causes an error. We will resolve this in the future with a configuration object for the readRecord function that allows switching between error and coerce-to-N for other characters (e.g. IUPAC characters indicating an A-C ambiguity).
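
Until such an option exists, the coerce-to-N behaviour can be approximated by hand. The following is only a minimal sketch: coerceToN is a hypothetical helper written on plain std::string, not part of SeqAn, and it assumes the record was first read into a character buffer (e.g. a CharString copied to a std::string) rather than directly into a Dna5String.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a SeqAn function): upper-case the sequence and
// replace every character outside A, C, G, T, N with 'N'.
// Returns the number of characters that were replaced.
std::size_t coerceToN(std::string & seq)
{
    std::size_t replaced = 0;
    for (char & c : seq)
    {
        char u = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        if (u == 'A' || u == 'C' || u == 'G' || u == 'T' || u == 'N')
        {
            c = u;          // valid base, keep it (normalised to upper case)
        }
        else
        {
            c = 'N';        // e.g. an IUPAC ambiguity code such as M or R
            ++replaced;
        }
    }
    return replaced;
}
```

The returned count lets the caller decide whether to warn about a read or drop it instead of silently accepting it.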

*m


For this snippet, the SequenceStream gets into an infinite loop: _atEnd and _isGood both go false, and subsequent calls to readRecord do not turn _atEnd to true:

    // Create SequenceStream object for reading, optimized for reading single records.
    seqan::SequenceStream seqStream(argv[1]);
    if (!isGood(seqStream))
    {
        std::cerr << "ERROR: Could not open the file.\n";
        return 1;
    }
    static const size_t NUM_BITS = 8*1024*1024; // 8 Mbit (8 * 2^20 bits)

    typedef boost::mpl::vector<boost_hash<DnaString, 0xAAAAAAAA> > HashFns;
    basic_bloom_filter<DnaString, NUM_BITS, HashFns> bloom;

    // Buffers for the sequence ids and characters.
    seqan::CharString id;
    seqan::String<Dna> kmer;
    seqan::String<Dna> seq;
    seqan::String<Dna> shortRead;
    
    unsigned long seqCount = 0;
    while (!atEnd(seqStream))
    {

        if (0 == readRecord(id, shortRead, seqStream)) {
            seqCount++;
            seq = shortRead;
            bloom.insert(seq);
        }
        else {
            cout << '[' << seqCount+1 << "] ERROR:  " << shortRead << endl;
            // ----> infinite loop: _atEnd and _isGood both go false, and _atEnd never turns true
        }
    }
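
As a stop-gap on the caller's side, the loop can leave on the first failed read instead of retrying. The sketch below only illustrates that pattern: FakeStream and its atEnd/readRecord are hypothetical stand-ins that mimic the behaviour described above (a failed read does not advance and atEnd stays false). They are not the SeqAn SequenceStream, and the sketch does not explain why the real stream gets stuck.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a record stream (NOT the SeqAn class).
// A failed read leaves pos unchanged, so atEnd() keeps returning false.
struct FakeStream
{
    std::vector<std::string> records;
    std::size_t pos = 0;
};

bool atEnd(FakeStream const & s) { return s.pos >= s.records.size(); }

// Returns 0 on success, non-zero if the record has a character outside ACGT.
int readRecord(std::string & seq, FakeStream & s)
{
    std::string const & r = s.records[s.pos];
    if (r.find_first_not_of("ACGT") != std::string::npos)
        return 1;   // error: the stream does not advance
    seq = r;
    ++s.pos;
    return 0;
}

// Count readable records, stopping at the first failed read.
std::size_t countUntilError(FakeStream & s, bool & hadError)
{
    std::size_t count = 0;
    std::string seq;
    hadError = false;
    while (!atEnd(s))
    {
        if (readRecord(seq, s) != 0)
        {
            hadError = true;
            break;  // without this, the loop would spin on the bad record
        }
        ++count;
    }
    return count;
}
```

In the snippet above, the equivalent change would be a break (or return) in the else branch after printing the error message, at the cost of not processing any records after the bad one.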