Re: [Seqan-dev] razers3 memory problem

  • From: Matthias Lienhard <lienhard@molgen.mpg.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 07 Jun 2013 14:52:00 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] razers3 memory problem

Hi Dave,

I see. Are there plans to process the reads in blocks internally? 10M reads is not a realistic input size these days, and I imagine a lot of potential users are scared off when the program (or worse, the server) crashes. Also, splitting the files is not really a practical solution, as handling the data is difficult enough already.

Best, Matthias

On 06/07/13 09:16, Weese, David wrote:
Hi Matthias,

RazerS keeps a q-gram index of the reads in memory, so its memory consumption is directly proportional to the input size. It requires about 10 GB for 10M x 100 bp reads. Unfortunately, there is currently no other option than to split the input file into chunks and map them independently, one after another or in parallel on a cluster.
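A back-of-the-envelope sketch of the proportional scaling described above, assuming the ~10 GB per 10M x 100 bp figure scales linearly with the total number of read bases. This is a simplification: the real index size also depends on q-gram settings not discussed in this thread, and for paired-end input it is an assumption that both mates count toward num_reads. The helper names are hypothetical and not part of RazerS or SeqAn.

```python
# Linear model built only from the figure quoted above:
# roughly 10 GB of RAM for 10M reads of 100 bp (1e9 read bases).
# Assumption: RAM grows proportionally with the total number of read bases.
REF_RAM_GB = 10.0
REF_BASES = 10_000_000 * 100


def estimated_ram_gb(num_reads: int, read_len: int) -> float:
    """Estimated RAM (GB) for an in-memory q-gram index over all reads."""
    return REF_RAM_GB * num_reads * read_len / REF_BASES


def max_reads_per_chunk(ram_budget_gb: float, read_len: int) -> int:
    """Largest number of reads per chunk whose estimate fits the budget."""
    return int(ram_budget_gb * REF_BASES // (REF_RAM_GB * read_len))


if __name__ == "__main__":
    print(estimated_ram_gb(10_000_000, 100))  # 10.0
    print(max_reads_per_chunk(32, 100))       # 32000000
```

Under this model, a machine with a 32 GB budget could map roughly 32M reads of 100 bp per chunk, leaving no headroom for the reference or the OS, so a smaller chunk size would be the safer choice.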

BAM output will certainly be supported in the near future. Gzipped FASTQ input could be supported as well, but this first requires benchmarking the alternative I/O module.

Cheers,
Dave

--
David Weese                     weese@inf.fu-berlin.de
Freie Universität Berlin        http://www.inf.fu-berlin.de/
Institut für Informatik         Phone: +49 30 838 75137
Takustraße 9                    Algorithmic Bioinformatics
14195 Berlin                    Room 020

On 05.06.2013 at 11:44, Matthias Lienhard <lienhard@molgen.mpg.de> wrote:

Hi,
when running razers3 on my paired-end HiSeq FASTQ files, I get the following errors:


razers3 -i 94 -rr 95 -tc 20 -o sample.sam reads1.fastq reads2.fastq
terminate called recursively
terminate called recursively
Aborted

or

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted

It seems that memory usage is very high (>50 GB). Each of the FASTQ files is about 7 GB. When I take only the first 100,000 reads, razers3 seems to work fine. However, I don't want to split the files into small chunks and merge the results afterwards (because of disk usage and convenience; I have about 50 samples to process).
Is there another way to handle this issue?
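For reference, splitting paired-end input is only safe if both mate files are cut at the same record boundaries, so that chunk k of reads1.fastq and chunk k of reads2.fastq still hold matching mates. A minimal Python sketch, assuming plain (uncompressed) FASTQ with exactly 4 lines per record and mates in the same order in both files; `split_paired_fastq`, `read_fastq` and the `chunk_NNN_1.fastq` naming are hypothetical, not part of SeqAn or RazerS.

```python
import itertools
from pathlib import Path


def read_fastq(handle):
    """Yield FASTQ records as strings, assuming exactly 4 lines per record."""
    while True:
        lines = list(itertools.islice(handle, 4))
        if not lines:
            return
        if len(lines) != 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield "".join(lines)


def split_paired_fastq(path1, path2, reads_per_chunk, out_dir):
    """Split two mate files into chunk pairs cut at identical record positions.

    Returns a list of (chunk_1_path, chunk_2_path) tuples.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(path1) as f1, open(path2) as f2:
        # zip_longest (not zip) so that mate files of unequal length are detected
        pairs = itertools.zip_longest(read_fastq(f1), read_fastq(f2))
        for index in itertools.count():
            block = itertools.islice(pairs, reads_per_chunk)
            first = next(block, None)
            if first is None:  # both files exhausted
                break
            out1 = out_dir / f"chunk_{index:03d}_1.fastq"
            out2 = out_dir / f"chunk_{index:03d}_2.fastq"
            with open(out1, "w") as o1, open(out2, "w") as o2:
                # Stream records instead of holding a whole chunk in memory.
                for rec1, rec2 in itertools.chain([first], block):
                    if rec1 is None or rec2 is None:
                        raise ValueError("mate files contain different numbers of reads")
                    o1.write(rec1)
                    o2.write(rec2)
            chunks.append((out1, out2))
    return chunks
```

Each chunk pair could then be mapped with the same options as above, e.g. `razers3 -i 94 -rr 95 -tc 20 -o chunk_000.sam chunk_000_1.fastq chunk_000_2.fastq`, after which the per-chunk SAM files still have to be combined; that merge step is exactly the overhead described here.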

Also, it would be very convenient if gzipped FASTQ files could be used as input directly, and output in BAM format would be nice as well.

Best wishes, Matthias

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev



  • Follow-Ups:
    • Re: [Seqan-dev] razers3 memory problem
      • From: "Siragusa, Enrico" <Enrico.Siragusa@fu-berlin.de>
  • References:
    • [Seqan-dev] razers3 memory problem
      • From: Matthias Lienhard <lienhard@molgen.mpg.de>
    • Re: [Seqan-dev] razers3 memory problem
      • From: "Weese, David" <weese@campus.fu-berlin.de>