Re: [Seqan-dev] Best index for task

  • From: Marcel Schulz <maschulz@andrew.cmu.edu>
  • To: seqan-dev@lists.fu-berlin.de
  • Date: Tue, 22 Mar 2011 12:23:08 -0400
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Best index for task

Hi John,
you should definitely try the Wotd algorithm if you do a pruned search. Using the Wotd, we have seen huge improvements over the ESA in both memory and running time on several different problems. This is especially true if you're working on a range of suffix lengths, where Q-gram indices are trickier to employ. Pruning with the Wotd algorithm is also very effective on large alphabets, such as proteins.
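The following sketch shows why wotd-style construction pairs well with pruning. It is a minimal illustration under two assumptions: it uses an uncompacted suffix trie (a real wotd suffix tree compacts edges), and it applies a hypothetical pruning rule that drops substrings occurring fewer than minCount times. A node's suffixes are partitioned by their next character only when the node is first visited, so pruned subtrees are never built. LazyNode, expand, descend and runDemo are illustrative names. This is not SeqAn's IndexWotd implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical lazily expanded suffix-trie node in the spirit of wotd:
// it stores the start positions of the suffixes below it and only computes
// its children when first visited. Edges are single characters (no compaction).
struct LazyNode {
    std::vector<std::size_t> suffixes;  // suffix start positions in this subtree
    std::size_t depth = 0;              // length of the path label from the root
    bool expanded = false;
    std::map<char, std::unique_ptr<LazyNode>> children;
};

// Partition a node's suffixes by their next character.
// Returns false if the node was already expanded by an earlier descent.
bool expand(LazyNode& node, const std::string& text) {
    if (node.expanded) return false;
    node.expanded = true;
    for (std::size_t pos : node.suffixes) {
        std::size_t next = pos + node.depth;
        if (next >= text.size()) continue;  // this suffix ends at the node
        std::unique_ptr<LazyNode>& child = node.children[text[next]];
        if (!child) {
            child = std::make_unique<LazyNode>();
            child->depth = node.depth + 1;
        }
        child->suffixes.push_back(pos);
    }
    return true;
}

// Depth-limited descent with an illustrative pruning rule: a node whose
// substring occurs fewer than minCount times is skipped and never expanded.
void descend(LazyNode& node, const std::string& text, std::size_t maxDepth,
             std::size_t minCount, std::string& label,
             std::vector<std::string>& reported, std::size_t& expandedNodes) {
    assert(node.depth <= maxDepth);               // never go past maxDepth
    if (node.suffixes.size() < minCount) return;  // prune
    if (node.depth == maxDepth) {
        reported.push_back(label);
        return;
    }
    if (expand(node, text)) ++expandedNodes;
    for (auto& entry : node.children) {  // std::map iterates in character order
        label.push_back(entry.first);
        descend(*entry.second, text, maxDepth, minCount, label, reported,
                expandedNodes);
        label.pop_back();
    }
}

struct DemoResult {
    std::vector<std::string> reported;  // substrings of length maxDepth kept
    std::size_t expandedNodes = 0;      // nodes whose children were built
};

DemoResult runDemo(const std::string& text, std::size_t maxDepth,
                   std::size_t minCount) {
    LazyNode root;
    for (std::size_t i = 0; i < text.size(); ++i) root.suffixes.push_back(i);
    DemoResult result;
    std::string label;
    descend(root, text, maxDepth, minCount, label, result.reported,
            result.expandedNodes);
    return result;
}
```

For example, runDemo("ACGTACGTACGA", 3, 2) reports ACG, CGT, GTA and TAC while building children for only 9 nodes. The pruned branches GA and CGA are never expanded. If the root were kept alive across repeated descents, later searches could reuse nodes that had already been expanded.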

Bests,
Marcel

On 22.03.11 06:50, John Reid wrote:
Hi,

I have a motif search algorithm that I have coded using an enhanced suffix array, and I'm wondering whether it's worth investigating other indexes to see if they are more efficient. The algorithm builds an index over a set of sequences, say 5 Mb in total on average. It descends the index many times to a given maximum depth (say 20 bases) but never goes deeper. It doesn't descend all paths; it does some pruning on the way down.

Up until now I have been using IndexEsa. I notice I could also use IndexWotd, IndexQGram or perhaps something from Pizza&Chili. Has anyone got any recommendations about what might be quickest for this sort of task? I realise I haven't given you much to go on, but perhaps it is enough without describing the algorithm in full.

My code compiles with either IndexWotd or IndexEsa, but with IndexQGram I get compilation errors. Should these indexes have the same programming interface?
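The sketch below shows how a depth-limited, pruned descent maps onto a suffix array, which is the core of an ESA. It is a minimal illustration assuming a naively built suffix array and a hypothetical pruning rule (a minimum occurrence count). Each node of the virtual suffix tree is an interval of sorted suffixes, and descending one level splits the interval into runs sharing the next character. findFrequentKmers, descendSa, charAt and minCount are illustrative names, not SeqAn's IndexEsa API. An ESA typically adds LCP and child tables to locate child intervals faster than the linear scan used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Character at offset depth into the suffix starting at pos, or '\0' when the
// suffix is too short. Assumes the text itself contains no '\0'.
char charAt(const std::string& text, std::size_t pos, std::size_t depth) {
    return pos + depth < text.size() ? text[pos + depth] : '\0';
}

// Visit the suffix-array interval [lo, hi), whose suffixes share the prefix
// `label` of length depth. Children are the runs of equal next characters.
void descendSa(const std::string& text, const std::vector<std::size_t>& sa,
               std::size_t lo, std::size_t hi, std::size_t depth,
               std::size_t maxDepth, std::size_t minCount, std::string& label,
               std::vector<std::string>& reported) {
    assert(depth <= maxDepth);       // never descend past maxDepth
    if (hi - lo < minCount) return;  // prune: too few occurrences
    if (depth == maxDepth) {
        reported.push_back(label);
        return;
    }
    std::size_t i = lo;
    while (i < hi) {
        const char c = charAt(text, sa[i], depth);
        std::size_t j = i;
        while (j < hi && charAt(text, sa[j], depth) == c) ++j;
        if (c != '\0') {  // a '\0' run holds the suffix that ends here
            label.push_back(c);
            descendSa(text, sa, i, j, depth + 1, maxDepth, minCount, label,
                      reported);
            label.pop_back();
        }
        i = j;
    }
}

// Build a suffix array naively (O(n^2 log n) worst case, fine for a sketch)
// and report every substring of length maxDepth occurring >= minCount times.
std::vector<std::string> findFrequentKmers(const std::string& text,
                                           std::size_t maxDepth,
                                           std::size_t minCount) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    std::vector<std::string> reported;
    std::string label;
    descendSa(text, sa, 0, sa.size(), 0, maxDepth, minCount, label, reported);
    return reported;
}
```

For example, findFrequentKmers("ACGTACGTACGA", 3, 2) returns ACG, CGT, GTA and TAC in lexicographic order. Unlike a lazily built index, the whole suffix array is constructed up front, however little of it the pruned search actually visits.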

Thanks for a great library,
John.



--
------------------------------------------------------------------------------
Marcel H. Schulz
Ray and Stephanie Lane Center      email: maschulz@cs.cmu.edu
for Computational Biology  		
Carnegie Mellon University	
7413 Gates-Hillman Complex
5000 Forbes Avenue
Pittsburgh, PA 15213
http://www.cs.cmu.edu/~maschulz/
------------------------------------------------------------------------------




  • References:
    • [Seqan-dev] Best index for task
      • From: John Reid <j.reid@mail.cryst.bbk.ac.uk>