Re: [Seqan-dev] SearchIndex based on external StringSet

  • From: Tilo Eißler <eissler@in.tum.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Wed, 20 Oct 2010 14:17:42 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] SearchIndex based on external StringSet

Hello again,

> 
> 1) The type in the Pattern is the String you want to search. It can differ from the type of the string you want to build an index on. The specializations of the string classes are completely independent, the alphabets have to be compatible (for certain definitions of "compatible" ;).
> 
> So, first of all rather use:
> 
>     Pattern< String<char> > pat = "agg";

Ok, my first Pattern variant looks odd anyways ;-)

> 
> 2) To the Strings, StringSet: I think the specializations were not what you wanted.
> 
> External Strings are used like this:
> 
> String<TAlphabet, TSpec=Alloc<> >
> 
> I.e. String<char, External<> > for an external string and not String<***String<char>***, External<> > (the *** are there for marking the wrong part).
> 
> Typedefs often help to make the code more readable. The attached program is more readable and should probably do what you want. 

hmm, I don't remember how I came to my external string. Now I've got it,
thank you very much for the corrected version.

> 
>> Another (related) topic:
>>
>> To my knowledge the default index type is the enhanced suffix array.
>> I've read that the build process can use external memory. Is this done
>> by default or do I need to provide an extra specialisation to achieve this?
>> The resulting suffix array lies in main memory, right?
> 
> I think this depends on the algorithm you use for building the SA. If I remember correctly, at least the Skew algorithms have external but no internal implementation, however this should not matter greatly since for small indices, the kernel will not write buffers to the disk anyway.

Right, for small indices it doesn't matter, but it may be of interest
for large sequences/sets of sequences or on computers with little
main memory. So it matters when building an application that has to
handle widely varying amounts of input data.

> 
> I guess it is not documented yet since I could not find it in the documentation. index_base.h has the following:
> 
> 	// suffix array construction specs
> 	struct Skew3;
> 	struct Skew7;
> 	struct LarssonSadakane;
> 	struct ManberMyers;
> 	struct SAQSort;
> 	struct QGram_Alg;
> 
> 	// lcp table construction algorithms
> 	struct Kasai;
> 	struct KasaiOriginal;	// original, but more space-consuming algorithm
> 
> 	// enhanced suffix array construction algorithms
> 	struct ChildTab;
> 	struct BWT;
> 
> David would be the right one to explain this in more detail and eventually document this.
> 

I came to this question because I read in the dissertation about SeqAn
that there are ESA construction algorithms that use external storage, but
I haven't been able to find any hint in the documentation as to whether
this is done by default or not.
I'm toying around a little more, but I'd appreciate any further hints
as well :-)

Thanks again and best regards

Tilo



  • Follow-Ups:
    • Re: [Seqan-dev] SearchIndex based on external StringSet
      • From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
  • References:
    • [Seqan-dev] SearchIndex based on external StringSet
      • From: Tilo Eißler <eissler@in.tum.de>
    • Re: [Seqan-dev] SearchIndex based on external StringSet
      • From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>