[Seqan-dev] Building QGramSA fibre from StringSet
Hi
I'm running into some unexpected behaviour whilst try to use an SA fibre of a QGram index that's built over a string set.
The below code runs OK with 1 or 10 as command line args, (strings in string set), but fails at 100 with the following exception:
/Users/dj5/usr/local/include/seqan/sequence/string_base.h:238 Assertion failed : static_cast<TStringPos>(pos) < static_cast<TStringPos>(length(me)) was: 171599218 >= 100 (Trying to access an element behind the last one!)
Have I made a coding blunder or is this a bug?
Many thanks,
Daniel
<code>
#include <ctime>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <algorithm>
#include <seqan.h>
using namespace std;
using namespace seqan;
// Generates random nucleotides.
struct MyGenerator : public unary_function<char, void>
{
string syms;
MyGenerator (string syms = "ACGT") : syms(syms) { srand(time(NULL)); }
char operator()(void) { return syms[rand() % syms.size()]; }
};
int main(int argc, char** argv)
{
// Input the number of sequences for the string set.
stringstream ss(argv[1]);
unsigned n;
ss >> n;
typedef StringSet<DnaString> TMyStringSet;
typedef Index<TMyStringSet, IndexQGram<UngappedShape<8> > > TMyIndex;
typedef Fibre<TMyIndex, QGramSA>::Type TMySAFibre;
typedef Fibre<TMyIndex, QGramDir>::Type TMyDirFibre;
// Fill a string set with 60-mer DNA sequences.
StringSet<DnaString> myStringSet;
string input_s;
DnaString input;
for (unsigned i = 0; i < n; ++i)
{
input_s.resize(60);
generate_n(input_s.begin(), 60, MyGenerator());
input = input_s;
appendValue(myStringSet, input);
}
// Build the index.
TMyIndex index(myStringSet);
// Require the QGramSA fibre.
cout << "requiring SA fibre:\n";
float t0 = clock();
indexRequire(index, QGramSA());
cout << (clock() - t0)/CLOCKS_PER_SEC << endl;
cout << "requiring Dir fibre:\n";
t0 = clock();
indexRequire(index, QGramDir());
cout << (clock() - t0)/CLOCKS_PER_SEC << endl;
TMySAFibre mySAFibre = getFibre(index, QGramSA());
cout << "QGramSA length: " << length(mySAFibre) << endl;
TMyDirFibre myDirFibre = getFibre(index, QGramDir());
cout << "QGramDir length: " << length(myDirFibre) << endl;
return 0;
}
</code>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.