On 20.10.2010 at 11:58, Tilo Eißler wrote:
> Hello,
>
> as the subject denotes I'm currently trying to build a SearchIndex based
> on an external StringSet using the actual trunk version of seqan.
>
> I started with the "Index Finder StringSet" example program. Then I've
> altered the String type to the external specialisation which results in
> the following term for the StringSet:
>
> StringSet < String<String<char>, External<> > > mySet;
>
> Afterwards I've resized the set and appended values to each entry of the
> set using the append function.
>
> Building the index and a finder at least compiles as well, but trying to
> search the finder using a pattern does not work, or more precisely, the
> compilation fails. I've attached my sourcecode.
> I'm getting confused with the compiler error message, so I'm asking if
> someone is kind and takes a look at it to help me :-)
> Is there a major flaw in my thought process? Or is it possible to do
> what I'm trying?

1) The type given to the Pattern is the type of the string you want to
search for. It can differ from the type of the string you build the index
on. The specializations of the string classes are completely independent;
only the alphabets have to be compatible (for certain definitions of
"compatible" ;). So, first of all, rather use:

    Pattern< String<char> > pat = "agg";

2) About the Strings and the StringSet: I think the specializations were
not what you wanted. The String template has the form

    String<TAlphabet, TSpec=Alloc<> >

i.e. an external string is

    String<char, External<> >

and not

    String<***String<char>***, External<> >

(the *** only mark the wrong part). Typedefs often help to make the code
more readable. The attached program is more readable and should probably do
what you want.

> Another (related) topic:
>
> To my knowledge the default index type is the enhanced suffix array.
> I've read that the build process can use external memory. Is this done
> by default or do I need to provide an extra specialisation to achieve this?
> The resulting suffix array lies in main memory, right?

I think this depends on the algorithm you use for building the suffix
array. If I remember correctly, at least the Skew algorithms have an
external but no internal implementation. This should not matter greatly,
though, since for small indices the kernel will not write the buffers to
disk anyway. I guess this is not documented yet, since I could not find it
in the documentation. index_base.h has the following:

    // suffix array construction specs
    struct Skew3;
    struct Skew7;
    struct LarssonSadakane;
    struct ManberMyers;
    struct SAQSort;
    struct QGram_Alg;

    // lcp table construction algorithms
    struct Kasai;
    struct KasaiOriginal; // original, but more space-consuming algorithm

    // enhanced suffix array construction algorithms
    struct ChildTab;
    struct BWT;

David would be the right one to explain this in more detail and eventually
document it.

*m
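To illustrate point 2 without pulling in the library, here is a minimal sketch in plain C++. The templates String, Alloc and External below are hypothetical stand-ins that only mimic the shape String<TAlphabet, TSpec>; they are not the SeqAn classes and store no data. SameAlphabet is likewise an invented helper, and it uses strict type equality as a simplification of "compatible alphabets". The sketch only shows which template argument ends up as the element type in the intended and in the posted declaration.

```cpp
#include <type_traits>

// Hypothetical stand-ins that only mimic the shape String<TValue, TSpec>.
// They are NOT the SeqAn classes and hold no data.
template <typename TConfig = void> struct Alloc {};
template <typename TConfig = void> struct External {};

template <typename TValue, typename TSpec = Alloc<> >
struct String
{
    typedef TValue Value;  // first argument: element (alphabet) type
    typedef TSpec  Spec;   // second argument: storage specialization
};

// Intended: a string of char kept in external storage.
typedef String<char, External<> > TExternalString;

// As posted: the element type is itself String<char>, so this is an
// externally stored string whose "characters" are whole strings.
typedef String<String<char>, External<> > TNestedString;

// The search pattern only has to agree on the alphabet, not on the storage.
typedef String<char> TNeedle;

// Invented helper: true if two string types have the same element type.
// Strict equality is a simplification of "compatible alphabets".
template <typename TA, typename TB>
struct SameAlphabet
    : std::is_same<typename TA::Value, typename TB::Value> {};

// The intended type and the needle share the alphabet char ...
static_assert(SameAlphabet<TExternalString, TNeedle>::value,
              "external char string and needle share an alphabet");
// ... while the nested type has String<char> as its alphabet.
static_assert(!SameAlphabet<TNestedString, TNeedle>::value,
              "nested string has a different element type");
```

With the nested declaration the element type is String<char> rather than char, which is consistent with the search against a character pattern failing to compile.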
Attachment:
seqan_ext_stringset_esa.cpp