Hi Tina,

There was a bug in the breakpoint computation for edit-distance split mapping. A read position directly adjacent to an insertion breakpoint could be used more than once (in the suffix match as well as in the prefix match). The resulting incorrect matches then led to wrong cigar strings. I just checked in the fixed version.

Cheers,
Anne-Katrin

-----------------------------------------------
Anne-Katrin Emde
+49 30 83875240
emde@inf.fu-berlin.de
Algorithmic Bioinformatics
Department of Computer Science
Freie Universität Berlin
----------------------------------------------
________________________________________
From: Tina Hu [t.hu@celmatix.com]
Sent: Wednesday, June 20, 2012 16:20
To: seqan-dev@lists.fu-berlin.de
Subject: Re: [Seqan-dev] splazers cigar mismatch

Hi Manuel,

Below are the sequences that produce this mismatch. They were run against chr1 of hg19.

Thanks!
Tina

@truncated
AAATTGCGTTTAAATTCTTCCCTGGAGGCAGAACACTAAATCCTTTTGTAA
TTATTTACAGCTCACATCCTTAGAGCTGCAATATGTTTCGCCATAACTTCTGATGTGCCG
GGACCTTGAATTGGCTGTGCCAGGAACATATGACCCAACCNGCCAATCATTCGCATTCAG
TCCATAG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIII
@full
GTGCCTGCTTTCTCGTCTCAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGC
AGGAAGTACATGAAATCAGGGAATGTCAAGGACCTCACGCAAGCCTGGGACCTCTATTAT
CATGTGTTCCGACGAATCTCAAAGCAGCTGCCTCAGGTAGGATCTTCAGGCTCCTGGCAG
GGTTAACTGTCATTATAGTCCTTTCTGTTTTACTGTCTATGTAATTGTTCTAACACTCAT
TCCAAAGCATCTGGTTTTACTCTGCTTTGGGACAAGTAATTGTTATTAGCAATCCACTGA
AAAATCTTAAAATTGCGTTTAAATTCTTCCCTGGAGGCAGAACACTAAATCCTTTTGTAA
TTATTTACAGCTCACATCCTTAGAGCTGCAATATGTTTCGCCAAAACTTCTGATGTGCCG
GGACCTTGAATTGGCTGTGCCAGGAACATATGACCCAACCNGCCAATCATTCGCATTCAG
TCCATAG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIII

--
Tina Hu, Ph.D.
Bioinformatics Scientist
Celmatix, Inc.
1 Little West 12th St.
New York, NY 10014
917.426.2878
www.celmatix.com

This email message and any attachments being sent by Celmatix, Inc., are confidential, and may be privileged. If you are not the intended recipient, please notify us immediately, by replying to this message, and destroy all copies of this message and any attachments. Thank you.

On Jun 20, 2012, at 6:00 AM, seqan-dev-request@lists.fu-berlin.de wrote:

> Today's Topics:
>
> 1. splazers cigar mismatch (Tina Hu)
> 2. Re: splazers cigar mismatch (Holtgrewe, Manuel)
>
> ------------------------------
> Message: 2
> Date: Wed, 20 Jun 2012 05:15:29 +0000
> From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
> To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
> Subject: Re: [Seqan-dev] splazers cigar mismatch
> Message-ID:
> <FCCAB9D80C3DAB47B5601C5B0E62872B0A2D1F@ex02a.campus.fu-berlin.de>
> Content-Type: text/plain; charset="windows-1252"
>
> Dear Tina,
>
> could you provide us with the whole problematic record?
>
> Cheers,
> Manuel
>
> ________________________________
> From: Tina Hu [t.hu@celmatix.com]
> Sent: Tuesday, June 19, 2012 9:13 PM
> To: seqan-dev@lists.fu-berlin.de
> Subject: [Seqan-dev] splazers cigar mismatch
>
> Hi,
>
> I get the following error from snp_store (on output from splazers):
> WARNING! Read TESTING: cigar alignment length does not match genome coordinates. Discarding read..
>
> I'm fairly certain this error stems from splazers as the alignment length computed from the cigar string does not match the start and end match position (off by 1 bp).
> Interestingly, when I truncate the same sequence by 300 bps, from 478 bps to 178 bps, the cigar string appears to be correct. Note that 32M is now 31M, which is the correct length.
>
> cigar of the longer sequence (478 bps):
> 14M1I206M1I2M1I35M1D45M1D3M1I147M1D32M
>
> cigar of the truncated sequence (178 bps):
> 147M1D31M
>
> Has anyone else encountered this issue, and is there a fix?
>
> Thanks!
> Tina
>
> --
> Tina Hu, Ph.D.
> Bioinformatics Scientist
> Celmatix, Inc.
> 1 Little West 12th St.
> New York, NY 10014
> 917.426.2878
>
> www.celmatix.com
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <https://lists.fu-berlin.de/pipermail/seqan-dev/attachments/20120620/8f874102/attachment.htm>
>
> ------------------------------
>
> _______________________________________________
> seqan-dev mailing list
> seqan-dev@lists.fu-berlin.de
> https://lists.fu-berlin.de/listinfo/seqan-dev
>
>
> End of seqan-dev Digest, Vol 33, Issue 6
> ****************************************
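
For anyone who wants to check records like the ones quoted in this thread, here is a minimal sketch of the length check, assuming the standard SAM rules for which cigar operations consume read bases and which consume reference bases. It is not the code used by snp_store or SplazerS; `cigar_lengths` is a hypothetical helper name chosen for illustration.

```python
import re

# Which cigar operations consume read bases and which consume
# reference bases, following the SAM specification.
READ_OPS = set("MIS=X")
REF_OPS = set("MDN=X")


def cigar_lengths(cigar):
    """Return (read_length, reference_length) implied by a cigar string.

    Hypothetical helper for illustration only.
    """
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError("malformed cigar string: %r" % cigar)
    read_len = 0
    ref_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in READ_OPS:
            read_len += n
        if op in REF_OPS:
            ref_len += n
    return read_len, ref_len


# The two cigar strings quoted in the thread.
print(cigar_lengths("14M1I206M1I2M1I35M1D45M1D3M1I147M1D32M"))  # (488, 487)
print(cigar_lengths("147M1D31M"))                               # (178, 179)
```

The truncated read's cigar consumes 178 read bases, matching the 178 bps stated for it. The longer cigar consumes 488 read bases, one more than it would with a final 31M, which is consistent with the off-by-one described above. Comparing the first value with the actual read length, and the second with the reported end minus start position, is one way to spot such records before they reach snp_store.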