Re: [Seqan-dev] {Disarmed} Re: Performance advice for whole genome ESA

  • From: John Reid <j.reid@mail.cryst.bbk.ac.uk>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 06 Jul 2012 13:55:58 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] {Disarmed} Re: Performance advice for whole genome ESA


On 05/07/12 11:40, Siragusa, Enrico wrote:

On Jul 5, 2012, at 11:02 AM, John Reid wrote:

Great. That looks very helpful. So in your example, how do you arrive at 38 GB? You are using unsigned int instead of long unsigned int. Where does the unsigned char in the Fibre<>::Type come into the calculation? I think I need my code to handle sequence sets with more than 256 sequences. I'm guessing that if I replace the unsigned char with unsigned long I get back to 48 GB?

I counted 13n bytes: 1+4 bytes for each suffix array value, 4 bytes for each lcp value and 4 bytes for each childtab value. For n = 3 Gbp that comes to roughly 38 GB.
The value sizes really depend on your input sequences. How many strings do you want to index, and what is their maximum length?
Do you have enough memory available? Depending on your application, another index could be more efficient...
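
The per-position count in the reply can be checked with a few lines of arithmetic. The following is only a sketch in plain C++, not SeqAn code: the names EsaWidths, bytesPerPosition, totalBytes and compactWidths are hypothetical, and it assumes the suffix array pair is stored packed, with no padding between the unsigned char and the unsigned int.

```cpp
#include <cstdint>

// Bytes stored per text position by each ESA table, as listed in the reply.
// All names here are hypothetical helpers, not part of SeqAn.
struct EsaWidths {
    std::uint64_t saSeqId;   // first member of the suffix array pair (sequence number)
    std::uint64_t saOffset;  // second member of the suffix array pair (offset in the sequence)
    std::uint64_t lcp;       // one lcp value
    std::uint64_t childtab;  // one childtab value
};

// Sum of the four widths; assumes a packed pair without alignment padding.
constexpr std::uint64_t bytesPerPosition(EsaWidths w) {
    return w.saSeqId + w.saOffset + w.lcp + w.childtab;
}

// Total table size in bytes for a text of n positions.
constexpr std::uint64_t totalBytes(EsaWidths w, std::uint64_t n) {
    return bytesPerPosition(w) * n;
}

// unsigned char + unsigned int for the pair, unsigned int for lcp and childtab.
constexpr EsaWidths compactWidths{1, 4, 4, 4};
```

With these widths bytesPerPosition(compactWidths) is 1 + 4 + 4 + 4 = 13, and totalBytes(compactWidths, 3000000000ULL) is 39 × 10^9 bytes for exactly 3 × 10^9 positions. If the pair were not packed, a struct holding an unsigned char and an unsigned int would typically occupy 8 bytes on common platforms, which would raise the per-position count.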

I have one more question if you don't mind. What are the constraints on the string sets I can pass to an index for which I have overloaded these types?

For example, I think the types in the FibreSA String of Pairs limit the number of sequences and the number of items in each sequence, respectively. But what about the types for the FibreLcp and the FibreChildtab? Do these need to be the same as the second type in the FibreSA Pair, or do they relate to something different? Sorry, my knowledge of the internals of suffix arrays is not up to speed.
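
To make the question concrete, here is a rough sketch of the range checks involved, in plain C++ rather than SeqAn. It rests on assumptions that this thread does not confirm: that lcp values are common-prefix lengths and so never exceed the longest sequence, and that childtab values are positions in the suffix array and so never exceed the total length of all sequences, as in the usual enhanced suffix array definition. The names fits, StringSetStats and widthsSuffice are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if every value in [0, maxValue] fits in type T.
template <typename T>
constexpr bool fits(std::uint64_t maxValue) {
    return maxValue <= static_cast<std::uint64_t>(std::numeric_limits<T>::max());
}

// Summary of a string set; the caller is assumed to compute these numbers.
struct StringSetStats {
    std::uint64_t numSequences;  // number of strings in the set
    std::uint64_t maxLength;     // length of the longest string
    std::uint64_t totalLength;   // sum of all string lengths
};

// Checks each table's value type against the bound assumed for it:
//   sequence id -> numSequences - 1
//   offset, lcp -> maxLength    (conservative upper bound)
//   childtab    -> totalLength  (assumed to index the suffix array)
template <typename TSeqId, typename TOffset, typename TLcp, typename TChild>
constexpr bool widthsSuffice(StringSetStats s) {
    return fits<TSeqId>(s.numSequences == 0 ? 0 : s.numSequences - 1)
        && fits<TOffset>(s.maxLength)
        && fits<TLcp>(s.maxLength)
        && fits<TChild>(s.totalLength);
}
```

Under these assumptions an unsigned char sequence id covers at most 256 sequences, and a 32-bit childtab type is enough as long as the total length stays below 2^32, even when each individual sequence is much shorter.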

Thanks,
John.