Hi Enrico,

does Masai support multithreading (without splitting the input files) by now? 20 GB of memory (or more) is no problem.

Best,
Matthias

On 06/07/13 15:29, Siragusa, Enrico wrote:
Hi Matthias,

If you don't need strict 100% sensitivity, then you could try out Masai. There is an option to select how many reads the program maps at once, e.g. with "--mapping-block 1000000" the program loads only 1M reads at a time. Btw, Masai gets faster with more reads, so I advise you to map at least 10M reads. Mapping 10M reads on human requires 19 GB of memory. If you don't have that much memory, you can use the FM-index: pass the option "--index fm" to the indexer/mapper apps. That way you should be able to run a mapping job within 9 GB of memory. If 9 GB is still too much, you can disable multiple backtracking, but the mapper won't take less than ~7 GB of memory…

Enrico

On Jun 7, 2013, at 2:52 PM, Matthias Lienhard <firstname.lastname@example.org> wrote:

Hi Dave,

I see. Is it planned that the reads will be processed in blocks internally? 10M reads is not a realistic input size these days, and I imagine a lot of potential users are scared off when the program (or worse: the server) crashes. Splitting the files is not really a practical solution either, as handling the data is difficult enough.

Best,
Matthias

On 06/07/13 09:16, Weese, David wrote:

Hi Matthias,

RazerS keeps a q-gram index of the reads in memory, hence its memory consumption is directly proportional to the input size, and it requires about 10 GB for 10M x 100 bp reads. Unfortunately, there is currently no option other than splitting the input file into chunks and mapping them independently, one after another or in parallel on a cluster. BAM output will certainly be supported in the near future, and gzipped FASTQ input could be supported, but that requires benchmarking the alternative I/O module first.
Cheers,
Dave

--
David Weese
email@example.com
Freie Universität Berlin
http://www.inf.fu-berlin.de/
Institut für Informatik
Phone: +49 30 838 75137
Takustraße 9
Algorithmic Bioinformatics
14195 Berlin
Room 020

On 05.06.2013, at 11:44, Matthias Lienhard <firstname.lastname@example.org> wrote:

Hi,

when running razers3 on my paired-end HiSeq FASTQ files I get the following errors:

razers3 -i 94 -rr 95 -tc 20 -o sample.sam reads1.fastq reads2.fastq
terminate called recursively
terminate called recursively
Aborted

or:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

It seems that memory usage is very high (>50 GB). Each of the FASTQ files is about 7 GB. When I take only the first 100000 reads, razers3 seems to work fine. However, I don't want to split the files into small chunks and merge them together afterwards (because of disk usage and convenience: I have about 50 samples to process). Is there another way to handle this issue? Also, it would be very convenient if gzipped FASTQ files could be used as input directly, and output in BAM format would be nice as well.

Best wishes,
Matthias

_______________________________________________
seqan-dev mailing list
email@example.com
https://lists.fu-berlin.de/listinfo/seqan-dev
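Enrico's low-memory advice boils down to two invocations. A dry-run sketch of that workflow: the flags "--index fm" and "--mapping-block" are quoted from his mail, but the binary names (masai_indexer, masai_mapper) and the file names are assumptions here, so the commands are echoed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the Masai workflow described in the thread.
# Binary names and file names are placeholders; only the two flags
# (--index fm, --mapping-block) are taken from Enrico's mail.
REF=hg19.fa            # reference genome (placeholder name)
READS=reads.fastq      # input reads (placeholder name)
BLOCK=1000000          # load 1M reads at a time instead of all at once

INDEX_CMD="masai_indexer --index fm $REF"
MAP_CMD="masai_mapper --index fm --mapping-block $BLOCK $REF $READS"

echo "$INDEX_CMD"      # build the FM-index (~9 GB on human, per the mail)
echo "$MAP_CMD"        # map in 1M-read blocks against that index
```

Mapping in blocks trades a little speed (Masai is faster with more reads loaded) for a bounded memory footprint.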
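The chunk-and-map workaround Dave describes can be sketched in plain POSIX shell: a FASTQ record is exactly 4 lines, so splitting on a multiple of 4 lines keeps records intact. The razers3 flags are copied from Matthias's original command; the tiny demo input, file names, and chunk size are illustrative, and the mapper call is echoed rather than executed since razers3 may not be installed.

```shell
#!/bin/sh
# Sketch: split paired FASTQ files into record-aligned chunks and map
# each chunk pair independently, as suggested in the thread.
set -e

# Tiny demo input (2 read pairs) so the sketch runs anywhere;
# in practice reads1.fastq / reads2.fastq are the real 7 GB files.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGC\n+\nIIII\n' > reads1.fastq
printf '@r1\nCCGA\n+\nIIII\n@r2\nGATC\n+\nIIII\n' > reads2.fastq

CHUNK_READS=1                  # reads per chunk; use ~1M in practice
LINES=$((CHUNK_READS * 4))     # 4 FASTQ lines per read keeps records whole

split -l "$LINES" reads1.fastq chunk1_
split -l "$LINES" reads2.fastq chunk2_

for c1 in chunk1_*; do
    c2="chunk2_${c1#chunk1_}"  # the matching mate chunk (same suffix)
    # Echoed rather than executed, since razers3 may not be on PATH:
    echo razers3 -i 94 -rr 95 -tc 20 -o "out_${c1#chunk1_}.sam" "$c1" "$c2"
done
# The per-chunk SAM outputs can then be concatenated or merged downstream.
```

The chunks can also be submitted as independent cluster jobs, which is the in-parallel variant Dave mentions.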