Hi Manuel,
Thanks for the advice.
I'm having some memory problems when I build an ESA on a whole genome
(3 Gb or so). I don't even know if I can reasonably expect to do
this. Does anyone out there have any experience with this? If so,
what sort of hardware are you running on, and did you have to take
any special measures in software to handle such large sequence sets?
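A rough way to judge whether a whole-genome ESA can fit in RAM is to multiply the text length by the bytes stored per position. The sketch below does that arithmetic. The per-entry sizes (1 byte of text plus 8-byte suffix array, LCP and child table entries) are assumptions for illustration, not SeqAn's documented layout, and temporary buffers used during construction are ignored. EsaEntrySizes and estimate_esa_bytes are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical per-position byte sizes for an enhanced suffix array.
// These are ASSUMPTIONS, not SeqAn's actual layout: the real sizes depend
// on the index's size type and on which fibres are built.
struct EsaEntrySizes {
    std::uint64_t text  = 1;  // one byte per nucleotide
    std::uint64_t sa    = 8;  // suffix array entry (e.g. a 64-bit position)
    std::uint64_t lcp   = 8;  // LCP table entry
    std::uint64_t child = 8;  // child table entry
};

// Hypothetical estimate: n * (text + SA + LCP + child) bytes.
// Ignores allocator overhead and construction-time temporaries,
// which can add substantially to the peak memory use.
std::uint64_t estimate_esa_bytes(std::uint64_t n,
                                 const EsaEntrySizes& s = EsaEntrySizes{}) {
    return n * (s.text + s.sa + s.lcp + s.child);
}
```

Under these assumptions the 2,730,871,774 bp mouse genome comes to about 68 GB (roughly 63.6 GiB); with hypothetical 4-byte SA, LCP and child entries it would be about 35.5 GB.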
Thanks,
John.
On 26/06/12 16:20, Holtgrewe, Manuel wrote:
Hi John,
I would recommend using a Double-Pass MMap
RecordReader as described here:
I'm not sure how much compression on disk will help you; it
depends on where the overhead is.
You could also use the GZFile Stream with a Single-Pass
RecordReader. The question then is whether your disk (for
reading the compressed data) or your CPU (for decompressing
it) becomes the bottleneck.
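The double-pass idea can be sketched in plain C++: a first pass over the raw FASTA bytes measures each record, so the second pass can reserve exactly the right amount of memory before copying, instead of repeatedly growing multi-gigabyte strings. This is not SeqAn's RecordReader API; parse_fasta_two_pass and FastaRecords are hypothetical names, and the sketch assumes the whole file is already available as one contiguous buffer (for example a memory-mapped region).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for parsed records (not a SeqAn type).
struct FastaRecords {
    std::vector<std::string> ids;   // header line without the leading '>'
    std::vector<std::string> seqs;  // concatenated sequence lines
};

// Hypothetical two-pass FASTA parser over a contiguous buffer.
FastaRecords parse_fasta_two_pass(const char* data, std::size_t size) {
    // Pass 1: count records and the number of residues in each one.
    std::vector<std::size_t> lengths;
    bool in_header = false, line_start = true;
    for (std::size_t i = 0; i < size; ++i) {
        const char c = data[i];
        if (line_start && c == '>') { lengths.push_back(0); in_header = true; }
        else if (c == '\n') { in_header = false; }
        else if (!in_header && !lengths.empty() && c != '\r') { ++lengths.back(); }
        line_start = (c == '\n');
    }

    // Allocate every sequence once with its exact final size.
    FastaRecords out;
    out.ids.resize(lengths.size());
    out.seqs.resize(lengths.size());
    for (std::size_t r = 0; r < lengths.size(); ++r) out.seqs[r].reserve(lengths[r]);

    // Pass 2: copy headers and residues into the reserved buffers.
    std::size_t rec = 0;
    bool have_rec = false;
    in_header = false;
    line_start = true;
    for (std::size_t i = 0; i < size; ++i) {
        const char c = data[i];
        if (line_start && c == '>') {
            if (have_rec) ++rec;
            have_rec = true;
            in_header = true;
        } else if (c == '\n') {
            in_header = false;
        } else if (c != '\r' && have_rec) {
            if (in_header) out.ids[rec].push_back(c);
            else out.seqs[rec].push_back(c);
        }
        line_start = (c == '\n');
    }
    return out;
}
```

On a gzip stream a second pass would mean decompressing everything twice, which is likely one reason a single-pass reader pairs more naturally with compressed input.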
Cheers,
Manuel
From: John Reid [j.reid@mail.cryst.bbk.ac.uk]
Sent: Tuesday, June 26, 2012 4:20 PM
To: SeqAn Development
Subject: Re: [Seqan-dev] Performance advice for whole genome ESA
Hi,
I've done some more reading
(http://trac.seqan.de/wiki/HowTo/EfficientImportOfMillionsOfSequences),
and as far as I can tell I should just be using memory-mapped
files as a mechanism to read large sequence sets into main
memory. Likewise, this is the area where compression on disk
could help. If I want to iterate over an ESA, I'm best off
copying the sequences into a standard seqan StringSet in main
memory and building the ESA on top of that. Please let me know
if I've got the wrong end of the stick.
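For reference, memory-mapping a FASTA file with plain POSIX calls (open, fstat, mmap, munmap) looks roughly like this. It is a minimal sketch assuming a POSIX system; MappedFile is a hypothetical helper, not SeqAn's MMap string specialization. The kernel loads pages lazily, so mapping a 3 Gb file is cheap until the bytes are actually read.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper: maps a whole file read-only into memory.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        fd_ = ::open(path.c_str(), O_RDONLY);
        if (fd_ < 0) throw std::runtime_error("open failed: " + path);
        struct stat st;
        if (::fstat(fd_, &st) != 0) {
            ::close(fd_);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap() rejects a zero length
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd_, 0);
            if (p == MAP_FAILED) {
                ::close(fd_);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(p);
        }
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
        if (fd_ >= 0) ::close(fd_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }  // first byte of the file
    std::size_t size() const { return size_; }  // file length in bytes

private:
    int fd_ = -1;
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Copying the sequences from data()/size() into an in-memory StringSet, as described above, then becomes a single sequential scan over the mapping.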
Regards,
John.
On 21/06/12 16:33, John Reid wrote:
Hi,
I'm reading the whole mouse genome into a seqan::IndexEsa based on a
seqan::StringSet. At the moment I have the genome (2,730,871,774 bp)
stored in one uncompressed FASTA file on disk. Once the genome is
loaded, I iterate over it many times, looking at all words shorter than
about 20 bp. I'm wondering if there is a better way to go about this.
Should I be looking at memory-mapped files and/or compression on disk?
Any pointers or advice would be welcome.
Thanks,
John.
_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev