Hi Enrico,

does Masai support multithreading (without splitting the input files) by now? 20 GB of memory (or more) is no problem.

Best,
Matthias

On 06/07/13 15:29, Siragusa, Enrico wrote:
Hi Matthias,

If you don't need strict 100% sensitivity, then you could try out Masai. There is an option to select how many reads the program maps at once, e.g. with "--mapping-block 1000000" the program loads only 1M reads at a time. Btw, Masai gets faster with more reads, so I advise you to map at least 10M reads. Mapping 10M reads on human requires 19 GB of memory. If you don't have that much memory, you can use the FM-index: pass the option "--index fm" to the indexer/mapper apps. That way you should be able to run a mapping job within 9 GB of memory. If 9 GB is still too much, you can disable multiple backtracking, but the mapper won't take less than ~7 GB of memory…

Enrico

On Jun 7, 2013, at 2:52 PM, Matthias Lienhard <firstname.lastname@example.org> wrote:

Hi Dave,

I see. Is it planned that the reads will be processed in blocks internally? 10M reads is not a realistic input size these days, and I imagine a lot of potential users are scared off when the program (or worse: the server) crashes. Splitting the files is not really a practical solution either, as handling the data is difficult enough.

Best,
Matthias

On 06/07/13 09:16, Weese, David wrote:

Hi Matthias,

RazerS keeps a q-gram index of the reads in memory, hence its memory consumption is directly proportional to the input size, and it requires about 10 GB for 10M x 100 bp reads. Unfortunately, there is currently no option other than splitting the input file into chunks and mapping them independently, one after another or in parallel on a cluster. BAM output will certainly be supported in the near future, and gzipped FASTQ input could be supported, but that requires benchmarking the alternative I/O module first.
Cheers,
Dave

--
David Weese
email@example.com
Freie Universität Berlin
http://www.inf.fu-berlin.de/
Institut für Informatik
Phone: +49 30 838 75137
Takustraße 9
Algorithmic Bioinformatics
14195 Berlin
Room 020

On 05.06.2013, at 11:44, Matthias Lienhard <firstname.lastname@example.org> wrote:

Hi,

when running razers3 on my paired-end HiSeq FASTQ files I get the following errors:

razers3 -i 94 -rr 95 -tc 20 -o sample.sam reads1.fastq reads2.fastq
terminate called recursively
terminate called recursively
Aborted

or:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

It seems that memory usage is very high (>50 GB). Each of the FASTQ files is about 7 GB. When I take only the first 100000 reads, razers3 seems to work fine. However, I don't want to split the files into small chunks and merge them together afterwards (because of disk usage and convenience: I have about 50 samples to process). Is there another way to handle this issue? Also, it would be very convenient if gzipped FASTQ files could be used as input directly, and output in BAM format would be nice as well.

Best wishes,
Matthias

_______________________________________________
seqan-dev mailing list
email@example.com
https://lists.fu-berlin.de/listinfo/seqan-dev
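Enrico's low-memory advice boils down to two invocations. A dry-run sketch of that workflow: the flags "--index fm" and "--mapping-block" are quoted from his mail, but the binary names (masai_indexer, masai_mapper) and the file names are assumptions here, so the commands are echoed rather than executed.

```shell
#!/bin/sh
# Dry-run sketch of the Masai workflow described in the thread.
# Binary names and file names are placeholders; only the two flags
# (--index fm, --mapping-block) are taken from Enrico's mail.
REF=hg19.fa            # reference genome (placeholder name)
READS=reads.fastq      # input reads (placeholder name)
BLOCK=1000000          # load 1M reads at a time instead of all at once

INDEX_CMD="masai_indexer --index fm $REF"
MAP_CMD="masai_mapper --index fm --mapping-block $BLOCK $REF $READS"

echo "$INDEX_CMD"      # build the FM-index (~9 GB on human, per the mail)
echo "$MAP_CMD"        # map in 1M-read blocks against that index
```

Mapping in blocks trades a little speed (Masai is faster with more reads loaded) for a bounded memory footprint.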
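The chunk-and-map workaround Dave describes can be sketched in plain POSIX shell: a FASTQ record is exactly 4 lines, so splitting on a multiple of 4 lines keeps records intact. The razers3 flags are copied from Matthias's original command; the tiny demo input, file names, and chunk size are illustrative, and the mapper call is echoed rather than executed since razers3 may not be installed.

```shell
#!/bin/sh
# Sketch: split paired FASTQ files into record-aligned chunks and map
# each chunk pair independently, as suggested in the thread.
set -e

# Tiny demo input (2 read pairs) so the sketch runs anywhere;
# in practice reads1.fastq / reads2.fastq are the real 7 GB files.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGC\n+\nIIII\n' > reads1.fastq
printf '@r1\nCCGA\n+\nIIII\n@r2\nGATC\n+\nIIII\n' > reads2.fastq

CHUNK_READS=1                  # reads per chunk; use ~1M in practice
LINES=$((CHUNK_READS * 4))     # 4 FASTQ lines per read keeps records whole

split -l "$LINES" reads1.fastq chunk1_
split -l "$LINES" reads2.fastq chunk2_

for c1 in chunk1_*; do
    c2="chunk2_${c1#chunk1_}"  # the matching mate chunk (same suffix)
    # Echoed rather than executed, since razers3 may not be on PATH:
    echo razers3 -i 94 -rr 95 -tc 20 -o "out_${c1#chunk1_}.sam" "$c1" "$c2"
done
# The per-chunk SAM outputs can then be concatenated or merged downstream.
```

The chunks can also be submitted as independent cluster jobs, which is the in-parallel variant Dave mentions.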