Re: [Seqan-dev] Disk-based index

  • From: "Siragusa, Enrico" <Enrico.Siragusa@fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Wed, 28 Aug 2013 11:46:28 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Disk-based index

Hi John,

On Aug 28, 2013, at 10:54 AM, John Reid <j.reid@mail.cryst.bbk.ac.uk> wrote:

> Hi all,
>
> I would like to index the mouse or human genome with an ESA. I need to do this more than once though and would like to store the ESA on disk as it takes some hours to construct. Is this feasible? Is there any way to do this in SeqAn already?

Sure. To save an index after constructing it, call save(index, "/path/to/index"). To load it again, call open(index, "/path/to/index"). The path must be given as a C-style string, so if you're holding it in a SeqAn String, use toCString() to convert it.

> Parallel construction is also interesting to me. To quote Wikipedia (http://en.wikipedia.org/wiki/Suffix_tree#External_construction):
> "ERA is a recent parallel suffix tree construction method that is significantly faster. ERA can index the entire human genome in 19 minutes on an 8-core desktop computer with 16GB RAM. On a simple Linux cluster with 16 nodes (4GB RAM per node), ERA can index the entire human genome in less than 9 minutes."
> Are there any plans to incorporate the ERA algorithm (http://www.vldb.org/pvldb/vol5/p049_essammansour_vldb2012.pdf) into SeqAn?

We have had some thoughts on parallel suffix array construction as well as parallel direct BWT construction. However, ERA is a suffix tree construction algorithm, so it would not be directly useful for constructing enhanced suffix arrays.

Enrico

> Thanks,
> John.
