[Seqan-dev] RazerS3 result duplication and other issues

  • From: Fredrik Boulund <fredrik.boulund@chalmers.se>
  • To: <seqan-dev@lists.fu-berlin.de>
  • Date: Thu, 1 Nov 2012 13:44:42 +0100
  • Reply-to: fredrik.boulund@chalmers.se, SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: [Seqan-dev] RazerS3 result duplication and other issues

Hi,

I'm trying to use RazerS3 in a project where I want to map reads to contigs, and I'm running RazerS3 from an automated pipeline written in Python.
The version of RazerS3 I'm currently using is:
VERSION                                                             
    razers3 version: 3.1 [12568]                                    
    Last update 2012-08-20                                          
This was the most recent binary on the SeqAn website. I have also compiled RazerS3 from source (razers3 version: 3.1 [12787], Last update 2012-09-03), but the same problems exist there.

First, a minor annoyance: every time it runs, RazerS3 writes output to stderr that looks more like debugging information than information for the user.
It looks like this:
useExternalSort == 0                      
useSequentialCompaction == 0              
length(threadLocalStorages[0].matches) == 20
length(store.alignedReadStore) == 20  
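
Until this is changed in RazerS3 itself, a pipeline can keep these lines off the terminal on its side. What follows is a minimal sketch, assuming the pipeline starts RazerS3 through Python's subprocess module; run_quietly is a hypothetical helper name, not part of RazerS3 or SeqAn. It captures stderr and only passes it on when the command fails, so real error messages are not lost.

```python
import subprocess
import sys


def run_quietly(command):
    """Run a command, capture its stderr, and return (exit code, stderr text).

    run_quietly is a hypothetical helper for illustration only.
    """
    completed = subprocess.run(command, stderr=subprocess.PIPE, text=True)
    if completed.returncode != 0:
        # Only surface the captured stderr when the command failed.
        sys.stderr.write(completed.stderr)
    return completed.returncode, completed.stderr


# Possible use in the pipeline (assumes razers3 is on the PATH):
# exit_code, log_text = run_quietly(["razers3", "contigs.fa", "reads.fa"])
```

The captured text is still returned, so it can be written to a log file if needed.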


Second, and a bit more severe: in some cases RazerS3 seems to produce duplicate output for the same read mapped to the same contig at the same position. Is this intended or just a "feature"? Attached you'll find a zip file with contigs and reads that reproduce the problem.
In some cases this gets worse when running RazerS3 multithreaded (i.e. using the -tc command-line option). When running multithreaded we sometimes see more duplicates in the output, and the run is also a lot slower (sometimes taking substantially more time than a single-threaded run) while still consuming CPU time on every thread, as if it were doing the entire alignment N times (N = number of cores chosen). Very strange. Unfortunately, I haven't been able to reproduce this behavior with files small enough to attach to an email.

Below is an example using the attached files to illustrate the duplication (and the debug-looking output that is printed to stderr on program exit).

[example]$ ls
contigs.fa reads.fa

[example]$ razers3
razers3 - Fast Read Mapping with Sensitivity Control                
====================================================                
    razers3 [OPTIONS] <GENOME FILE> <READS FILE>                    
    razers3 [OPTIONS] <GENOME FILE> <PE-READS FILE1> <PE-READS FILE2>
    Try 'razers3 --help' for more information.                      
                                                                    
VERSION                                                             
    razers3 version: 3.1 [12568]                                    
    Last update 2012-08-20                                          

[example]$ razers3 contigs.fa reads.fa
useExternalSort == 0                      
useSequentialCompaction == 0              
length(threadLocalStorages[0].matches) == 20
length(store.alignedReadStore) == 20                

[example]$ cat reads.fa.results
read1   0       75      F       contig1 0       75      100  
read1   0       75      F       contig1 0       75      100  
read1   0       75      F       contig3 0       74      98.667
read1   0       75      F       contig3 0       74      98.667
read1   0       75      F       contig3 0       74      98.667
read1   0       75      R       contig4 582     657     100  
read2   0       67      F       contig1 0       67      100  
read2   0       67      F       contig1 0       67      100  
read2   0       67      F       contig3 0       66      98.507
read2   0       67      F       contig3 0       66      98.507
read2   0       67      F       contig3 0       66      98.507
read2   0       67      R       contig4 590     657     100  
read3   0       72      F       contig1 0       75      95.833
read3   0       72      F       contig1 0       75      95.833
read3   0       72      R       contig4 582     657     95.833
read4   0       65      F       contig1 5       70      100  
read4   0       65      F       contig2 0       64      98.462
read4   0       65      F       contig3 4       69      100  
read4   0       65      F       contig3 4       69      100  
read4   0       65      R       contig4 587     652     100  


As you can see, there are several duplicates of identical mappings (for example the first two rows). Using "-m 1" seems to alleviate this by outputting only the best match for each read, but that is not desirable in my case.
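
As a stopgap on the pipeline side, identical rows can be filtered out after the run. This is a minimal sketch, assuming the result file is whitespace-separated text with one match per line, as in the listing above, and that two rows count as duplicates only when every column agrees; unique_matches is a hypothetical helper name. It does not address why RazerS3 reports the rows more than once.

```python
def unique_matches(lines):
    """Yield each distinct result row once, in first-seen order.

    Hypothetical helper: rows are compared column by column after
    splitting on whitespace, so trailing spaces do not hide duplicates.
    """
    seen = set()
    for line in lines:
        key = tuple(line.split())
        if not key or key in seen:
            # Skip blank lines and rows already emitted.
            continue
        seen.add(key)
        yield "\t".join(key)


# Possible use (file name taken from the example above):
# with open("reads.fa.results") as handle:
#     rows = list(unique_matches(handle))
```

Applied to the 20 rows listed above, this leaves 12 distinct rows.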

I hope some of this information helps you figure out the issues.

I also have a couple of questions about how RazerS3 reads its input files. Would it be possible for RazerS3 to accept reads from a stream or FIFO? I ran some tests and had a quick glance at the source code, and it appears that RazerS3 tries to seek in the reads file at some point, which breaks my pipe. I sincerely hope that it does not read the entire reads file into memory at any point, since I'm concerned that it will start swapping if I load large data sets. Do you have any tips to help me out?
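
For reference, the workaround I would otherwise fall back on is to spool the stream to a regular file first, since a regular file can be seeked while a pipe cannot. This is a minimal sketch, assuming the reads arrive as a binary stream in the pipeline; spool_to_seekable_file is a hypothetical helper name. It copies in chunks, so the Python side never holds the whole data set in memory, but it costs disk space and says nothing about how much RazerS3 itself loads.

```python
import os
import shutil
import tempfile


def spool_to_seekable_file(stream, suffix=".fa"):
    """Copy a non-seekable binary stream to a temp file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        # copyfileobj reads and writes in fixed-size chunks.
        shutil.copyfileobj(stream, handle)
    return path


# Possible use (assumes razers3 is on the PATH and reads_stream is a pipe):
# reads_path = spool_to_seekable_file(reads_stream)
# ... run razers3 with reads_path as the reads file, then os.remove(reads_path)
```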

Best regards

Fredrik Boulund
-- 
——————————————————————————————————————————————————
Fredrik Boulund, PhD student
Department of Mathematical Sciences
Division of Mathematical Statistics
Chalmers University of Technology, 
SE-412 96 Gothenburg, Sweden
Email: fredrik.boulund@chalmers.se 
Phone: +46(0)31-772-5342, mobile +46(0)73-770 6629
——————————————————————————————————————————————————

Attachment: example.zip
Description: Zip compressed data

  • Follow-Ups:
    • Re: [Seqan-dev] RazerS3 result duplication and other issues
      • From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>