Dear Fredrik,
sorry for not getting back to you earlier.
I removed the erroneous message; thank you for reporting this.
We will look into the duplication issue as soon as possible. However, because of deadlines I have to ask you for a bit more patience regarding this issue.
Cheers,
Manuel
From: Fredrik Boulund [fredrik.boulund@chalmers.se]
Sent: Thursday, November 01, 2012 1:44 PM
To: seqan-dev@lists.fu-berlin.de
Subject: [Seqan-dev] RazerS3 result duplication and other issues

Hi,
I'm trying to use RazerS3 in a project where I want to map reads to contigs, and I'm putting RazerS3 in an automated pipeline written in Python. The version of RazerS3 I'm currently using is:

    VERSION
        razers3 version: 3.1 [12568]
        Last update 2012-08-20

This was the most recent binary on the SeqAn website. I have also compiled RazerS3 from sources (razers3 version: 3.1 [12787], last update 2012-09-03), but the same problems exist there.

First off, a minor annoyance: RazerS3 writes some output to stderr that looks like debugging information rather than information for the user. It is shown every time the program is run and looks like this:

    useExternalSort == 0
    useSequentialCompaction == 0
    length(threadLocalStorages[0].matches) == 20
    length(store.alignedReadStore) == 20

Second, and a bit more severe: RazerS3 seems to produce duplicate output in some cases for the same read mapped to the same contig (in the same position). Is this intended or just a "feature"? Attached you'll find a zip file with contigs and reads that reproduce the errors.

In some cases this gets worse when running RazerS3 multithreaded (i.e. using the -tc command-line option). When running multithreaded we sometimes see more duplicates in the output, and the run is also a lot slower (sometimes taking substantially more time than a single-threaded run) while still consuming CPU time on each thread, as if it were doing the entire alignment N times (N = number of cores chosen). Very strange. Unfortunately, I haven't been able to reproduce the multithreaded behavior with files small enough to attach to an email.

Below is an example using the attached files to illustrate what I mean about the duplications (and the debug-looking output that is printed to stderr on program exit).

    [example]$ ls
    contigs.fa  reads.fa
    [example]$ razers3
    razers3 - Fast Read Mapping with Sensitivity Control
    ====================================================

        razers3 [OPTIONS] <GENOME FILE> <READS FILE>
        razers3 [OPTIONS] <GENOME FILE> <PE-READS FILE1> <PE-READS FILE2>

        Try 'razers3 --help' for more information.

    VERSION
        razers3 version: 3.1 [12568]
        Last update 2012-08-20
    [example]$ razers3 contigs.fa reads.fa
    useExternalSort == 0
    useSequentialCompaction == 0
    length(threadLocalStorages[0].matches) == 20
    length(store.alignedReadStore) == 20
    [example]$ cat reads.fa.results
    read1 0 75 F contig1 0 75 100
    read1 0 75 F contig1 0 75 100
    read1 0 75 F contig3 0 74 98.667
    read1 0 75 F contig3 0 74 98.667
    read1 0 75 F contig3 0 74 98.667
    read1 0 75 R contig4 582 657 100
    read2 0 67 F contig1 0 67 100
    read2 0 67 F contig1 0 67 100
    read2 0 67 F contig3 0 66 98.507
    read2 0 67 F contig3 0 66 98.507
    read2 0 67 F contig3 0 66 98.507
    read2 0 67 R contig4 590 657 100
    read3 0 72 F contig1 0 75 95.833
    read3 0 72 F contig1 0 75 95.833
    read3 0 72 R contig4 582 657 95.833
    read4 0 65 F contig1 5 70 100
    read4 0 65 F contig2 0 64 98.462
    read4 0 65 F contig3 4 69 100
    read4 0 65 F contig3 4 69 100
    read4 0 65 R contig4 587 652 100

As you can see, there are several duplicates of identical mappings (for example, the first two rows). This seems to be alleviated by using "-m 1", which outputs only the best match for each read, although that is not desirable in my case. I hope some of this information helps you figure out the issues.

I also have a couple of questions about how RazerS3 reads its input files. Would it be possible for RazerS3 to accept reads from a stream or FIFO? I made some tests and had a quick glance at the source code, and it appears that RazerS3 tries to seek in the reads file at some point, which breaks my pipe. I sincerely hope that it does not read the entire reads file into memory at any point, since I'm concerned that it will start swapping if I load large data sets.
Do you have any tips here to help me out?

Best regards
Fredrik Boulund

--
--------------------------------------------------
Fredrik Boulund, PhD student
Department of Mathematical Sciences
Division of Mathematical Statistics
Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
Email: fredrik.boulund@chalmers.se
Phone: +46(0)31-772-5342, mobile +46(0)73-770 6629
--------------------------------------------------
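
Until the duplication is fixed, one possible stopgap in a Python pipeline like the one described above is to drop exact duplicate rows from the results file after RazerS3 has finished. The following is only a minimal sketch and not part of RazerS3: the helper name unique_matches is hypothetical, and it assumes that each match occupies one line of reads.fa.results and that two identical lines always describe the same mapping.

```python
def unique_matches(lines):
    """Return result lines with exact duplicates removed, keeping first-seen order.

    Hypothetical helper, not part of RazerS3. Assumes one match per line and
    that identical lines are true duplicates of the same mapping.
    """
    seen = set()
    unique = []
    for line in lines:
        record = line.strip()
        # Skip blank lines and any record that has already been emitted.
        if not record or record in seen:
            continue
        seen.add(record)
        unique.append(record)
    return unique


if __name__ == "__main__":
    # A few rows copied from the example output in the report above.
    sample = [
        "read1 0 75 F contig1 0 75 100",
        "read1 0 75 F contig1 0 75 100",
        "read1 0 75 F contig3 0 74 98.667",
        "read1 0 75 F contig3 0 74 98.667",
        "read1 0 75 R contig4 582 657 100",
    ]
    for record in unique_matches(sample):
        print(record)
```

To apply it to a real run, open the results file and pass the file object to unique_matches. The set of seen records grows with the number of distinct matches, so for very large outputs a sort-based approach outside Python may be a better fit.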