[Seqan-dev] Fwd: Building QGramSA fibre from StringSet
Hi
I'm getting an error whilst trying to get the SA fibre of a QGram
index over a StringSet
There's a minimal example below. Is there any chance someone could
have a look at this?
I'm on Rev: 9996 of trunk.
Daniel
#include <algorithm>
#include <seqan.h>
using namespace seqan;
// Generates random nucleotides.
struct MyGenerator : std::unary_function<char, void>
{
    std::string syms;
    MyGenerator (std::string syms = "ACGT") : syms(syms) { srand(time(NULL)); }
   char operator()(void) { return syms[rand() % syms.size()]; }
};
int main(int argc, char** argv)
{
   typedef StringSet<DnaString>                                  TMyStringSet;
   typedef Index<TMyStringSet, IndexQGram<UngappedShape<8> > >   TMyIndex;
   StringSet<DnaString> myStringSet;
   for (unsigned i = 0; i < 100; ++i) {
       DnaString input;
       resize(input, 60);
       generate_n(begin(input), 60, MyGenerator());
       appendValue(myStringSet, input);
   }
   std::cout << myStringSet[0] << std::endl;
   std::cout << "requiring QGramSA..." << std::endl;
   TMyIndex index(myStringSet);
   indexRequire(index, QGramSA());
   return 0;
}
---------- Forwarded message ----------
From: Daniel James <danielpeterjames@gmail.com>
Date: 5 July 2011 15:18
Subject: Building QGramSA fibre from StringSet
To: seqan-dev@lists.fu-berlin.de
Hi
I'm running into some unexpected behaviour whilst try to use an SA
fibre of a QGram index that's built over a string set.
The below code runs OK with 1 or 10 as command line args, (strings in
string set), but fails at 100 with the following exception:
/Users/dj5/usr/local/include/seqan/sequence/string_base.h:238
Assertion failed : static_cast<TStringPos>(pos) <
static_cast<TStringPos>(length(me)) was: 171599218 >= 100 (Trying to
access an element behind the last one!)
Have I made a coding blunder or is this a bug?
Many thanks,
Daniel
<code>
#include <ctime>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <algorithm>
#include <seqan.h>
using namespace std;
using namespace seqan;
// Generates random nucleotides.
struct MyGenerator : public unary_function<char, void>
{
  string syms;
  MyGenerator (string syms = "ACGT") : syms(syms) { srand(time(NULL)); }
  char operator()(void) { return syms[rand() % syms.size()]; }
};
int main(int argc, char** argv)
{
  // Input the number of sequences for the string set.
  stringstream ss(argv[1]);
  unsigned n;
  ss >> n;
  typedef StringSet<DnaString>                                  TMyStringSet;
  typedef Index<TMyStringSet, IndexQGram<UngappedShape<8> > >   TMyIndex;
  typedef Fibre<TMyIndex, QGramSA>::Type                        TMySAFibre;
  typedef Fibre<TMyIndex, QGramDir>::Type                       TMyDirFibre;
  // Fill a string set with 60-mer DNA sequences.
  StringSet<DnaString> myStringSet;
  string input_s;
  DnaString input;
  for (unsigned i = 0; i < n; ++i)
  {
      input_s.resize(60);
      generate_n(input_s.begin(), 60, MyGenerator());
      input = input_s;
      appendValue(myStringSet, input);
  }
  // Build the index.
  TMyIndex index(myStringSet);
  // Require the QGramSA fibre.
  cout << "requiring SA fibre:\n";
  float t0 = clock();
  indexRequire(index, QGramSA());
  cout << (clock() - t0)/CLOCKS_PER_SEC << endl;
  cout << "requiring Dir fibre:\n";
  t0 = clock();
  indexRequire(index, QGramDir());
  cout << (clock() - t0)/CLOCKS_PER_SEC << endl;
  TMySAFibre mySAFibre = getFibre(index, QGramSA());
  cout << "QGramSA length: " << length(mySAFibre) << endl;
  TMyDirFibre myDirFibre = getFibre(index, QGramDir());
  cout << "QGramDir length: " << length(myDirFibre) << endl;
  return 0;
}
</code>