OK, there was another gotcha, I guess :) In the code I took the sample from, the contig is actually in the fragment store. In your case, you have to use "Nothing" for the contig sequence. See the attached, and now hopefully fixed, file.
On 21.03.2011 at 14:29, Mat wrote:
Yes Manuel, you are right. This assumption was only valid when the store is sorted via:

    //sortAlignedReads(fragStore.alignedReadStore, SortId());

which should be uncommented, I guess.

Sorry, but I am still not getting consistent results (though I have the feeling I am getting closer at least ;-) ). So if I use the function positionGapToSeq, I should always get the SAM coordinates, not gap space, right? I created a mini SAM file containing these two lines:

    illumina_80bp_3kb.000000571 99 contig00002 19626 60 80M = 22401 2855 TGAAAACTGGGTAGAATTTCTGTTCGTTCCAAAAATGTCTCTCTACGTGGCAGCTGATGGTACTCTGGAGACACATGTCA HHHIHHHHHHGIGHGIJHIJJIIJKIGEQFHEHDJGEDMMBHKBFMIHEIJGLEEFIJKFID@A?LLHKEMLIDBLGBGG XT:A:U NM:i:2 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:29G32T17
    illumina_80bp_3kb.000000571 147 contig00002 22401 60 80M = 19626 -2855 GTGTTTATGTTTAAAAAAAATTTCAAACCAGATTGAGATGCAATCTTTCAAATGAGGGTAACAGTAAATATATATGTAGT HIHIIHHHIIIGGGIIIFIIHGHLIHOFLFHKKIKHJHILFMMNLGFDLMFPHIEGPHOLJHJHKGII@HHNPDIIAQTD XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:80

Then I get (using your code):

    illumina_80bp_3kb.000000571 0,0

If I use it on SAM files of different sizes, it varies between 0,0 (smaller files) and the correct coordinates (big files)...

cheers
mat

On 3/21/11 1:12 PM, Holtgrewe, Manuel wrote:

Your problem is that you use an alignment id as the read id. See my proposal for a fixed version attached; it yields consistent results for me. (That is, regarding the positions. There still is the bug of using alignment ids as read ids, so the printed read names are wrong.)

    $ ./demos/SAMtest -s bug2.sam
    Loading SAM file bug2.sam Reads cached... n:2825 id: 1022 illumina_80bp_3kb.000000199 19625,19705
    $ ./demos/SAMtest -s bug3.sam
    Loading SAM file bug3.sam Reads cached... n:3253 id: 1022 illumina_80bp_3kb.000000170 19625,19705
    $ ./demos/SAMtest -s bug4.sam
    Loading SAM file bug4.sam Reads cached... n:3648 id: 1022 illumina_80bp_3kb.000000148 19625,19705

On 21.03.2011 at 12:52, Mat wrote:

Hi! So if I adapt the code, I still get the weird results:

    test.sam: illumina_80bp_3kb.000000571 2128,2048
    bug2.sam (70000 lines of test.sam): illumina_80bp_3kb.000000571 237,157

(code attached...)

cheers

<SAMtest.cpp>

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

<mini.sam>
--
Manuel Holtgrewe            manuel.holtgrewe@fu-berlin.de
Freie Universität Berlin    http://www.inf.fu-berlin.de/
Institut für Informatik     Phone: +49 30 838 75246
Takustraße 9                Algorithmic Bioinformatics
14195 Berlin                Room 021
Attachment:
SAMtest.cpp
Description: Binary data