Hmm, hard to say what the reason is. I would like to look at your datasets, so if you could mail me the third dataset, I will have a look at this time sink.
Freie Universität Berlin
Institut für Informatik
Phone: +49 30 838 75137
Algorithmic Bioinformatics
Takustraße 9, Room 020
14195 Berlin
I am currently observing some possibly undesired behavior when using razers3, trunk version (checked out 29/08/2012, 9 am, Berlin time).
Within a Rabema scenario the following is observed (impatient readers go straight to (3)):
1) 100,000 reads from Illumina GAII, single-end, 100Nt
../bin/rabema/razers3 -v -tc 32 -of sam -rr 100 -i 90 -m 1000000 -ds -o ./rabema/NA18507_1_1e5complex.sam ../ref_genomes/hg19.fa ../reads/NA18507_1_1e5complex.fastq
=> ran entirely and finished successfully.
2) 100,000 454-Titanium reads, ranging from 50Nt up to 3575
../bin/rabema/razers3 -v -tc 32 -of sam -rr 100 -i 90 -m 1000000 -ds -o ./rabema/NA12878_1e5complex.sam ../ref_genomes/hg19.fa ../reads/NA12878_1e5complex.fastq
=> hasn't finished yet, but steady progress is observed and memory consumption seems reasonable (below 20 GB).
3) 100,000 IonTorrent reads, ranging from 5Nt up to 387Nt (fewer than 100 reads are actually shorter than 50Nt)
../bin/rabema/razers3 -v -tc 32 -of sam -rr 100 -i 90 -m 1000000 -ds -o ./rabema/NS12911_1e5complex.sam ../ref_genomes/hg19.fa ../reads/NS12911_1e5complex.fastq
=> this didn't finish and no progress is observed any more.
### The corresponding STDERR output is:
Genome file: ../ref_genomes/hg19.fa
Read file: ../reads/NS12911_1e5complex.fastq
Compute forward matches: YES
Compute reverse matches: YES
Allow Indels: YES
Error rate: 0.1
Pigeonhole mode with overlap: 0
Repeat threshold: 1000
Overabundance threshold: 1
Program PID: 29923
25014636 bps of 100000 reads loaded.
Loading reads took 0.507097 seconds
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29923 uappelt 25 0 136g 97g 1928 R 3197.9 77.3 141:03.96 razers3
29931 uappelt 16 0 11544 1772 780 R 1.0 0.0 0:02.60 top
### My corresponding output is: *DOOOOH*
So sample 3 kind of screwed the machine. The poor thing is close to swapping and the CPUs are under heavy load, while progress hasn't moved a single period further (within minutes). I guess extremely short reads (below 50 nukes) are the culprits to blame, since
this phenomenon doesn't occur with the longer reads in samples 1 & 2. Still, fewer than 100 reads are in fact shorter than 50 nukes. And even a million reference-genome locations requested to be reported per read shouldn't cause this, should it?
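For the record, the "fewer than 100 reads below 50 nukes" claim is easy to double-check before mapping. A quick sanity-check sketch in plain Python (the filename below is a placeholder for the actual dataset):

```python
# Count how many reads in a 4-line-per-record FASTQ fall below a length
# cutoff (50 nt here, matching the suspected culprits).

def count_short_reads(path, cutoff=50):
    """Return (total, short) read counts for a FASTQ file."""
    total = short = 0
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # in a 4-line FASTQ record, line 1 is the sequence
                total += 1
                if len(line.rstrip("\n")) < cutoff:
                    short += 1
    return total, short

# Example (placeholder path):
# total, short = count_short_reads("NS12911_1e5complex.fastq")
# print(short, "of", total, "reads are shorter than 50 nt")
```

Note this assumes strictly 4-line FASTQ records (no wrapped sequence lines), which holds for typical Illumina/IonTorrent exports.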
Is there an easy way to prevent this behavior? Filtering out short reads would certainly be an option. Couldn't, however, long reads lying entirely within repetitive elements (e.g. SINEs or LINEs) cause similar malloc behavior? That case couldn't be taken care
of so easily from outside the read mapper, could it?
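In case it helps anyone following along, the "filter out short reads" workaround can be sketched in a few lines of plain Python; the function name and paths are illustrative, not part of razers3:

```python
# Sketch of the short-read filtering workaround: copy a FASTQ, keeping
# only records whose sequence is at least min_len nucleotides long.
# Assumes 4-line FASTQ records (no line wrapping).

def filter_fastq(in_path, out_path, min_len=50):
    """Write records with len(seq) >= min_len; return (kept, dropped)."""
    kept = dropped = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of input
            if len(record[1].rstrip("\n")) >= min_len:
                fout.writelines(record)
                kept += 1
            else:
                dropped += 1
    return kept, dropped

# Example (placeholder paths):
# kept, dropped = filter_fastq("NS12911_1e5complex.fastq",
#                              "NS12911_1e5complex.min50.fastq")
```

This only addresses the short-read suspicion, of course; it does nothing for the hypothetical repeat-element case above.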
Let me know if you want the reads of sample 3 emailed somewhere (gzipped file size: 22,558 KB).
seqan-dev mailing list email@example.com