Re: [Seqan-dev] rabema_prepare_sam "File must be sorted by query name."

"Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de> · Tue, 11 Sep 2012 14:14:57 +0000

Hi Uwe,

RazerS 3 will write out the reads in the same order as in the input. This is independent of multi-threading.

The problem is that rabema_prepare_sam assumes that the reads are sorted by name in the order used by "Windows 7 Explorer File Sorting", where runs of digits are compared by their numeric value, i.e. zz100zz comes after zz10zz. This order is therefore not lexicographic. It is also the order that "samtools sort" uses when sorting by query name. In rabema_prepare_sam, it is only used as a sanity check.
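
To illustrate the difference, here is a minimal Python sketch of such a "natural" comparison. It is not the actual code from rabema_prepare_sam or samtools, and it may differ from them in corner cases such as leading zeros; the names natural_key and first_unsorted_pair are made up for this example. The two read names are the ones from your error message.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits by numeric value.

    re.split with a capturing group alternates text and digit runs,
    so digit runs always sit at the odd indices of the result.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def first_unsorted_pair(names):
    """Return the first adjacent (previous, current) pair that violates
    the natural order, or None if the list is sorted."""
    for prev, cur in zip(names, names[1:]):
        if natural_key(cur) < natural_key(prev):
            return prev, cur
    return None


if __name__ == "__main__":
    names = ["16A7I:49:92", "16A7I:4:100"]
    print(sorted(names))                   # plain lexicographic order
    print(sorted(names, key=natural_key))  # natural order
    print(first_unsorted_pair(names))      # pair a sanity check would reject
```

With these two names, the lexicographic order puts 16A7I:49:92 first (because '9' sorts before ':'), while the natural order puts 16A7I:4:100 first (because 4 < 49). That is the kind of mismatch the sanity check reports.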

I have just added an option --dont-check-sorting to rabema_prepare_sam that disables this check. I hope this fixes your issues.

Sorry for any inconvenience.

Cheers,
Manuel
________________________________________
From: Uwe Appelt [uappelt@clcbio.com]
Sent: Tuesday, September 11, 2012 11:10 AM
To: SeqAn Development
Subject: [Seqan-dev] rabema_prepare_sam "File must be sorted by query name."

Hi guys,

I've got a small problem in Rabema: I am preparing another gold standard
for a Rabema benchmark and received the following error msg:

./razers3 -v -tc 32 -of sam -rr 100 -i 90 -m 1000000 -ds -o ./output0.sam ./hg19.fa ./input.fastq
[load of outputs]
./rabema_prepare_sam ./output0.sam > ./output.sam

ERROR: 16A7I:4:100 succeeds 16A7I:49:92 in SAM file.
File must be sorted by query name.

Seems like razers3 and rabema_prepare_sam are somehow out of sync with
respect to their sorting code (maybe it's just the multi-threading in
razers3 that sometimes results in slightly "unordered" output of mapping
results)? The Trunk-checkout is, however, already a week old, so I might
have to re-run everything. Before doing so, I wanted to ask for advice
here, because re-running razers3 will take 4 days. Any ideas or suggestions?

Thanks in advance,
Uwe
