Re: [Seqan-dev] Fastq test files

  • From: Theodore Omtzigt <theo@stillwater-sc.com>
  • To: seqan-dev@lists.fu-berlin.de
  • Date: Mon, 04 Mar 2013 10:28:06 -0500
  • Cc: seqan-dev-request@lists.fu-berlin.de
  • Organization: Stillwater Supercomputing, Inc.
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Fastq test files


On 3/2/2013 6:00 AM, seqan-dev-request@lists.fu-berlin.de wrote:

What are you reading the sequences into? DnaString? CharString? Can you give more details here?

Your snippet parses nicely with SequenceStream.

Currently, there is a limitation: when reading sequences into a Dna5String, any non-CGATN character causes an error. In the future we will resolve this with a configuration object for the readRecord function that allows switching between error and coerce-to-N for other characters (e.g. IUPAC characters indicating an A-C ambiguity).

*m
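
The error versus coerce-to-N switch described above could look roughly like the following. This is a minimal sketch in plain C++, not SeqAn code: the names InvalidCharPolicy and normalizeDna5 are hypothetical, and nothing here reflects how the planned readRecord configuration object will actually be designed.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical policy switch, made up for illustration; the post does not
// specify what the future readRecord configuration object will look like.
enum class InvalidCharPolicy { Error, CoerceToN };

// Returns seq upper-cased. Every character outside A, C, G, T, N is either
// rejected with an exception (Error) or replaced by 'N' (CoerceToN).
std::string normalizeDna5(const std::string & seq, InvalidCharPolicy policy)
{
    std::string out;
    out.reserve(seq.size());
    for (char raw : seq)
    {
        char c = static_cast<char>(std::toupper(static_cast<unsigned char>(raw)));
        bool valid = (c == 'A' || c == 'C' || c == 'G' || c == 'T' || c == 'N');
        if (!valid)
        {
            if (policy == InvalidCharPolicy::Error)
                throw std::invalid_argument("non-ACGTN character in sequence");
            c = 'N';  // e.g. IUPAC 'M' (A or C) becomes 'N'
        }
        out.push_back(c);
    }
    return out;
}
```

With CoerceToN, a read containing the IUPAC code M (A or C) keeps its length and only loses the ambiguity information; with Error, the caller has to decide what to do with the rejected record.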


For this snippet, SequenceStream gets into an infinite loop: _atEnd and _isGood both go false, and subsequent calls to readRecord do not turn _atEnd to true:

    // Create SequenceStream object for reading, optimized for reading single records.
    seqan::SequenceStream seqStream(argv[1]);
    if (!isGood(seqStream))
    {
        std::cerr << "ERROR: Could not open the file.\n";
        return 1;
    }
    static const size_t NUM_BITS = 8*1024*1024; // 8 Mbit (1 MiB)

    typedef boost::mpl::vector<boost_hash<DnaString, 0xAAAAAAAA> > HashFns;
    basic_bloom_filter<DnaString, NUM_BITS, HashFns> bloom;

    // Buffers for the sequence ids and characters.
    seqan::CharString id;
    seqan::String<Dna> kmer;
    seqan::String<Dna> seq;
    seqan::String<Dna> shortRead;
    
    unsigned long seqCount = 0;
    while (!atEnd(seqStream))
    {

        if (0 == readRecord(id, shortRead, seqStream)) {
            seqCount++;
            seq = shortRead;
            bloom.insert(seq);
        }
        else {
            cout << '[' << seqCount+1 << "] ERROR:  " << shortRead << endl;
            // ----> infinite loop: _atEnd and _isGood both go to false and _atEnd never turns to true
        }
    }
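
Until the stream behaviour is sorted out, one way to keep a loop like the one above from spinning is to leave it on the first failed read instead of relying on atEnd alone. A minimal sketch of that pattern follows. StubReader is a hypothetical stand-in written for this example, not SeqAn's SequenceStream; it only imitates the reported symptom (a failed read does not advance, so atEnd never becomes true).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a record reader, NOT SeqAn's SequenceStream.
// It imitates the reported symptom: after a failed read the position does
// not advance, so atEnd() stays false on every later call.
struct StubReader
{
    std::vector<std::string> records;
    std::size_t pos = 0;

    bool atEnd() const { return pos >= records.size(); }

    // Returns 0 on success and non-zero on failure, as in the loop above.
    int readRecord(std::string & out)
    {
        if (atEnd())
            return 1;
        const std::string & r = records[pos];
        if (r.find_first_not_of("ACGT") != std::string::npos)
            return 1;  // stuck: pos is deliberately not advanced
        out = r;
        ++pos;
        return 0;
    }
};

// Reads until the end or the first failure and returns the number of
// records read. failed is set to true when a read error stopped the loop.
std::size_t countRecords(StubReader & reader, bool & failed)
{
    std::size_t count = 0;
    failed = false;
    std::string seq;
    while (!reader.atEnd())
    {
        if (reader.readRecord(seq) != 0)
        {
            failed = true;
            break;  // leave the loop instead of retrying the same record
        }
        ++count;
    }
    return count;
}
```

Breaking out loses every record after the bad one, so this is a workaround for getting a clean exit and an error count, not a substitute for the reader recovering or skipping the record itself.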

