Re: [Seqan-dev] {Disarmed} Re: Performance advice for whole genome ESA
- From: "Siragusa, Enrico" <Enrico.Siragusa@fu-berlin.de>
- To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Date: Thu, 5 Jul 2012 08:27:29 +0000
- Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Hi John,
On Jul 5, 2012, at 9:46 AM, John Reid wrote:
If you build the Esa as it is, it will consume 4 long ints per character on a 64-bit machine, i.e. ~96 GB of memory for a 3 Gbp genome.
But you can redefine the index fibres to suit your needs, e.g. you can replace long ints with ints or chars.
// TGenome is the type of sequence, e.g. StringSet<Dna5String>
typedef StringSet<Dna5String> TGenome;
typedef Index<TGenome, IndexEsa<> > TGenomeEsa;

namespace seqan
{
    template <>
    struct Fibre<TGenomeEsa, FibreSA>
    {
        // Works for up to 256 contigs of length 4Gbp
        typedef String< Pair<unsigned char, unsigned int, Compressed>, DefaultIndexStringSpec<TGenomeEsa>::Type > Type;
        // Use a mmapped string
        // typedef String< Pair<unsigned char, unsigned int, Compressed>, MMap<> > Type;
    };

    template <>
    struct Fibre<TGenomeEsa, FibreLcp>
    {
        typedef String<unsigned int, DefaultIndexStringSpec<TGenomeEsa>::Type > Type;
    };

    template <>
    struct Fibre<TGenomeEsa, FibreChildtab>
    {
        typedef String<unsigned int, DefaultIndexStringSpec<TGenomeEsa>::Type > Type;
    };
}
In this way your Esa will fit in ~38 GB of memory.
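As a rough check of the two memory figures, here is a small sketch of the arithmetic. It assumes that the default suffix array entry is a pair of two 8-byte integers, that the packed pair of unsigned char and unsigned int takes 5 bytes, and that the genome is exactly 3 Gbp. The names EsaLayout and esaBytes are made up for this illustration and are not part of SeqAn.

```cpp
#include <cstdint>

// Hypothetical helper type: bytes stored per text character by each ESA table.
struct EsaLayout
{
    std::uint64_t saBytes;        // suffix array entry
    std::uint64_t lcpBytes;       // LCP table entry
    std::uint64_t childtabBytes;  // child table entry
};

// Total size in bytes of the three tables for a text of the given length.
constexpr std::uint64_t esaBytes(EsaLayout layout, std::uint64_t textLength)
{
    return (layout.saBytes + layout.lcpBytes + layout.childtabBytes) * textLength;
}

// Assumed default on a 64-bit machine: 2 + 1 + 1 = 4 long ints per character.
constexpr EsaLayout defaultLayout{16, 8, 8};

// Assumed sizes after redefining the fibres: packed (1 + 4)-byte SA entry,
// 4-byte LCP entry, 4-byte child table entry.
constexpr EsaLayout compactLayout{5, 4, 4};

// Assumed genome length of 3 Gbp.
constexpr std::uint64_t genomeLength = 3000000000ULL;

static_assert(esaBytes(defaultLayout, genomeLength) == 96000000000ULL,
              "32 bytes per character, 96 GB in total");
static_assert(esaBytes(compactLayout, genomeLength) == 39000000000ULL,
              "13 bytes per character, 39 GB in total");
```

Under these assumptions the compact layout needs 13 bytes per character, i.e. about 39 GB (roughly 36 GiB), which is in the same range as the estimate above. The text itself is not counted here.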
You might want to try out mmapped strings depending on your memory requirements and the access pattern of your algorithm.
You can also try to redefine the Size and StringSetLimits metafunctions for your sequence types.
namespace seqan
{
    template <>
    struct Size<Dna5String>
    {
        typedef unsigned int Type;
    };

    template <>
    struct StringSetLimits<TGenome>
    {
        typedef String<unsigned char> Type;
    };
}
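The narrower types only work while the data stay within their ranges (the comment in the FibreSA specialization gives 256 contigs of up to 4 Gbp each). A minimal sketch of such a check in plain C++ could look like this. The function fitsCompactTypes is a hypothetical helper, not a SeqAn function, and the contig lengths are assumed to be known beforehand.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: returns true if every contig number fits into TSeqNo
// and every contig length fits into TSeqPos.
template <typename TSeqNo, typename TSeqPos>
bool fitsCompactTypes(std::vector<std::uint64_t> const & contigLengths)
{
    if (contigLengths.empty())
        return true;

    // Contig numbers run from 0 to count - 1.
    std::uint64_t const lastSeqNo = contigLengths.size() - 1;
    if (lastSeqNo > static_cast<std::uint64_t>(std::numeric_limits<TSeqNo>::max()))
        return false;

    // Each length must be representable in the position type.
    std::uint64_t const maxLength =
        static_cast<std::uint64_t>(std::numeric_limits<TSeqPos>::max());
    for (std::size_t i = 0; i < contigLengths.size(); ++i)
        if (contigLengths[i] > maxLength)
            return false;

    return true;
}
```

For example, fitsCompactTypes<unsigned char, unsigned int>(lengths) would return false for a set of 257 contigs, or for a single contig of 5 Gbp on a platform where unsigned int has 32 bits.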
Please overload metafunctions only in your applications, not in library modules!
Ciao,
Enrico