Re: [Seqan-dev] razers3 memory problem

"Siragusa, Enrico" <Enrico.Siragusa@fu-berlin.de> · Fri, 07 Jun 2013 15:56:07 +0200

Oh, I just noticed that you are mapping paired-end data! In that case I wouldn't recommend Masai: it wouldn't be as fast as on single-end data and, worse, you could potentially get wrong results, as it hasn't been used much on paired-end data...

On Jun 7, 2013, at 3:36 PM, Matthias Lienhard <lienhard@molgen.mpg.de> wrote:

> Hi Enrico,
> 
> Does Masai support multithreading (without splitting the input files) by now? 20 GB of memory (or more) is no problem.
> 
> Best, Matthias
> 
> On 06/07/13 15:29, Siragusa, Enrico wrote:
>> Hi Matthias,
>> 
>> If you don't need strict 100% sensitivity, then you could try out Masai. There is an option to select how many reads the program should map at once, e.g. with "--mapping-block 1000000" the program would load only 1M reads at a time. Btw, Masai gets faster with more reads, so I advise you to map at least 10M reads.
>> 
>> Mapping 10M reads on human requires 19GB of memory. If you don't have that much memory available, then you could use the FM-index. You need to pass the option "--index fm" to the indexer/mapper apps. This way you should be able to run a mapping job within 9GB of memory. If 9GB of memory is still too much, you can disable multiple backtracking, but even then the mapper wouldn't take less than ~7GB of memory…
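The memory figures above can be read as a small decision table. The following is only a sketch of that reading: the function name and the returned labels are hypothetical, the numbers are the ones quoted for mapping 10M reads on human, and the only command-line option named is "--index fm", since the thread does not give the option for disabling multiple backtracking.

```python
# Memory figures (in GB) as quoted in the message above, for 10M reads on human.
DEFAULT_INDEX_GB = 19
FM_INDEX_GB = 9
FM_INDEX_NO_MULTI_BACKTRACKING_GB = 7  # approximate lower bound ("~7GB")


def suggest_masai_setup(available_gb):
    """Return a label for the least restrictive setup that fits in memory.

    Hypothetical helper: it only encodes the three thresholds from the email
    and returns None when even the smallest configuration would not fit.
    """
    if available_gb >= DEFAULT_INDEX_GB:
        return "default index"
    if available_gb >= FM_INDEX_GB:
        return "--index fm"
    if available_gb >= FM_INDEX_NO_MULTI_BACKTRACKING_GB:
        return "--index fm, multiple backtracking disabled"
    return None


if __name__ == "__main__":
    for gb in (20, 12, 8, 4):
        print(gb, "GB ->", suggest_masai_setup(gb))
```

With the 20 GB mentioned in the question, this sketch would pick the default index.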
>> 
>> Enrico
>> 
>> On Jun 7, 2013, at 2:52 PM, Matthias Lienhard <lienhard@molgen.mpg.de> wrote:
>> 
>>> Hi Dave,
>>> 
>>> I see. Is it planned that the reads will be processed in blocks internally? 10M reads is not really a realistic input size these days, and I imagine a lot of potential users are scared off when the program (or worse: the server) crashes.
>>> Also, it is not really a practical solution to split the files, as handling the data is difficult enough.
>>> 
>>> Best, Matthias
>>> 
>>> On 06/07/13 09:16, Weese, David wrote:
>>>> Hi Matthias,
>>>> 
>>>> RazerS keeps a q-gram index of the reads in memory. Hence its memory consumption is directly proportional to the input size, and it requires about 10GB for 10M x 100bp reads. Unfortunately, there is currently no other option than to split the input file into chunks and map them independently, one after another or in parallel on a cluster.
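For paired-end input, the splitting described above has to keep the two mate files in sync, so that read i of the first file and read i of the second file end up in chunks with the same index. A minimal sketch in Python, assuming plain (uncompressed) FASTQ with exactly four lines per record; the function names and the chunk file naming scheme are made up for illustration and are not part of RazerS or SeqAn:

```python
import itertools
from pathlib import Path


def read_fastq_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_paired_fastq(path1, path2, out_dir, reads_per_chunk):
    """Split two mate files in lockstep; return a list of (chunk1, chunk2) paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path1) as f1, open(path2) as f2:
        pairs = itertools.zip_longest(read_fastq_records(f1), read_fastq_records(f2))
        for idx in itertools.count():
            block = itertools.islice(pairs, reads_per_chunk)
            first = next(block, None)
            if first is None:  # both inputs exhausted
                break
            c1 = out_dir / f"chunk{idx:04d}_1.fastq"
            c2 = out_dir / f"chunk{idx:04d}_2.fastq"
            with open(c1, "w") as o1, open(c2, "w") as o2:
                # Stream pairs straight to disk instead of buffering a whole chunk.
                for r1, r2 in itertools.chain([first], block):
                    if r1 is None or r2 is None:
                        raise ValueError("mate files differ in read count")
                    o1.writelines(r1)
                    o2.writelines(r2)
            chunks.append((c1, c2))
    return chunks
```

Each returned pair of chunk files could then be mapped as a separate job, and the per-chunk outputs merged afterwards. Note that this doubles the disk footprint of the reads while the chunks exist.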
>>>> 
>>>> BAM output will certainly be supported in the near future. Gzipped FASTQ input could be supported as well, but we would first need to benchmark the alternative I/O module.
>>>> 
>>>> Cheers,
>>>> Dave
>>>> 
>>>> --
>>>> David Weese                     weese@inf.fu-berlin.de
>>>> Freie Universität Berlin        http://www.inf.fu-berlin.de/
>>>> Institut für Informatik         Phone: +49 30 838 75137
>>>> Takustraße 9                    Algorithmic Bioinformatics
>>>> 14195 Berlin                    Room 020
>>>> 
>>>> On 05.06.2013 at 11:44, Matthias Lienhard <lienhard@molgen.mpg.de> wrote:
>>>> 
>>>>> Hi,
>>>>> when running razers3 on my paired-end HiSeq FASTQ files I get the following errors:
>>>>> 
>>>>> 
>>>>> razers3 -i 94 -rr 95 -tc 20 -o sample.sam reads1.fastq reads2.fastq
>>>>> terminate called recursively
>>>>> terminate called recursively
>>>>> Aborted
>>>>> 
>>>>> or
>>>>> 
>>>>> terminate called after throwing an instance of 'std::bad_alloc'
>>>>>  what():  std::bad_alloc
>>>>> Aborted
>>>>> 
>>>>> It seems that memory usage is very high (>50 GB). Each of the FASTQ files is about 7 GB. When I take only the first 100000 reads, razers3 seems to work fine. However, I don't want to split the files into small chunks and merge the results afterwards (because of disk usage and convenience - I have about 50 samples to process).
>>>>> Is there another way to handle this issue?
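The observed memory usage is roughly what the "about 10GB for 10M x 100bp reads" figure from David Weese's reply would predict. A back-of-the-envelope sketch, under two assumptions that are not stated in the thread: the reads are about 100bp long, and one FASTQ record then takes about 250 bytes (sequence, qualities, header, separator and newlines). The function names are illustrative only.

```python
def estimate_reads(fastq_bytes, bytes_per_record=250):
    """Rough read count of a FASTQ file (250 bytes/record is an assumption)."""
    return fastq_bytes // bytes_per_record


def estimate_razers_memory_gb(num_reads, gb_per_10m_reads=10.0):
    """Linear extrapolation from the 'about 10GB per 10M reads' figure."""
    return num_reads * gb_per_10m_reads / 10_000_000


if __name__ == "__main__":
    reads_per_file = estimate_reads(7_000_000_000)  # one ~7 GB FASTQ file
    total_reads = 2 * reads_per_file                # two mate files
    print(f"{total_reads / 1e6:.0f}M reads, "
          f"~{estimate_razers_memory_gb(total_reads):.0f} GB")
```

Under these assumptions the two files hold about 56M reads in total, which extrapolates to roughly 56 GB and is in line with the >50 GB reported above.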
>>>>> 
>>>>> Also, it would be very convenient if gzipped FASTQ files could be used as input directly - and output in BAM format would be nice as well.
>>>>> 
>>>>> Best wishes, Matthias
>>>>> 
>>>>> _______________________________________________
>>>>> seqan-dev mailing list
>>>>> seqan-dev@lists.fu-berlin.de
>>>>> https://lists.fu-berlin.de/listinfo/seqan-dev