Re: [Seqan-dev] bandedChainAlignment default failing due to default k
- From: Brett Bowman <bnbowman@gmail.com>
- To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Date: Mon, 8 Jun 2015 10:46:27 -0700
- Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Subject: Re: [Seqan-dev] bandedChainAlignment default failing due to default k
If you're asking for the seed chain used used generate the above example, it's given in the above code:
"""
appendValue(seedChain, TSeed( 0, 0, 14));
appendValue(seedChain, TSeed(30, 31, 14));
"""
So the entire SeedChain is just equivalent to
"""
[ (0, 0, 14, 14), (30, 31, 44, 45) ]
"""
Unless you mean something else by SeedChain?
Also I found another case of this in a larger sequence (over 100bp on each side), such that I don't think it's a simple size-of-the-matrix issue:
Query:
TAACTCATCATCAGCAATCCGAACAAAACCGCTGAACCGACCCTCGGACAAACAACAACGGAGGAGGAGAGAGGAGAGAGGAGCAGTTCAGGCGGTTGATTACGGAGGATTAGAAATAC
Reference:
ACGCCATAGTGACTGGCGATGCTGTCGGAATGGACGATATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGATATCGTCCATTCCGACAGCATCGCCAGTCACTATGGC
With a single matching 19mer seed in the dead center:
Query 50 AACAACAACGGAGGAGGAG 69
Reference 50 AACAACAACGGAGGAGGAG 69
What I except to see is something like this:
0 . : . : . : . : . :
------TAACTCAT-CATCAGCAAT-CCGAACAAAAC-C--GC-TGA-AC
| | || | | | | | ||| | | | || || |
GCTTGGT---T-ATGC--CGGTACTGCCGGGC----CTCTTGCGGGATAT
50 . : . : . : . : . :
CGACCCTCGGACAAACAACAACGGAGGAGGA-GAGAGGAGAGAG-GA---
| | ||| || |||||||||||||||||| || | ||||||| |
C-TCTCTC-AAC-AACAACAACGGAGGAGGAGGAAAAGAGAGAGATATCC
100 . : . : . : . :
-GC-AGTTCAGG--CGGTTGATTACGGAGGATTAGA--AATAC
|| || ||| ||| | ||| | | | || | ||
CGCAAG---AGGCCCGG--CAGTACCG-GCA-TA-ACCAA---
Which I can then trim down to the core region I want here:
0 . : . : . :
AACAACAACGGAGGAGGA-GAGAGGAGAGAG
|||||||||||||||||| || | |||||||
AACAACAACGGAGGAGGAGGAAAAGAGAGAG
But instead, if my bandExtension is below 14 I get this garbage instead:
0 . : . : . : . : . :
-------------------------------------TAACTCATCATCA
|| ||| || |||
GCTTGGTTATGCCGGTACTGCCGGGCCTCTTGCGGGATATCTC-TC-TCA
50 . : . : . : . : . :
GCAATCCGAACAAAACCGCTGAACCGACCCTCGGACAAACAACAACGGAG
||| | ||| || || | | | ||| || || | ||
ACAA--C-AAC--AA-CG--G-A--G----GAGGAGGAA-AA-GA--GA-
100 . : . : . : . : . :
GAGGAGA----G-AGGAGAGAG---GAGCAGTTCAGGCGGTTGATTACGG
|| || | | | |||| | | |||| | | ||| || ||
GA-GATATCCCGCA--AGAG-GCCCG-GCAG-T-A-CCGG--CATAACCA
150 . :
AGGATTAGAAATAC
|
A-------------
-Brett
On Wed, Jun 3, 2015 at 8:28 AM, Rahn, René <rene.maerker@fu-berlin.de> wrote:
Hi Brett,
the k-Parameter specifies the band width around seeds given by the passed seed chain. This is compliant to the banded chain alignment algorithm described in the LAGAN paper.So instead of taking the seed as is, one can compute a banded DP around the seed to find the optimal trace into and out of the seed.
My best guess, in your example is, that the entire alignment matrix is to small for a band extension of 15.Nevertheless it should work, but maybe this is an uncovered border case. Can you please supply the seed chain?It shouldn’t be to much on these small sequences?
Cheers,
René
On 23 May 2015, at 00:53, Brett Bowman <bnbowman@gmail.com> wrote:
_______________________________________________I'm trying to align two highly similar sequences found via Kmer search:Query = "ATCTCTCTCAACAAAACAACGAGGAGGAGTGAAAAGAGAGAGAT"Reference = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT"
The expected alignment looks like this:
Score: 800 . : . : . : . : .ATCTCTCTCAACAA-AACAAC-GAGGAGGAGTGAAAAGAGAGAGAT|||||||||||||| |||||| ||||||||| ||||||||||||||ATCTCTCTCAACAACAACAACGGAGGAGGAG-GAAAAGAGAGAGAT
But when I align it using the default values suggested by the tutorial, it doesn't show any inserted gaps at all, and I wind up with this instead:
Score: 800 . : . : . : . :
ATCTCTCTCAACAAAACAACGAGGAGGAGTGAAAAGAGAGAGAT|||||||||||||| | | | | | | |||ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGA
I finally traced it down to the k-value (bandExtension value) passed into the alignment algorithm - values of K <= 13 succeed and generate the top-most alignment, while the values of 14-15 like the default (15) report the low-quality alignment.
Yet oddly, both alignments report the correct alignment score at the end - so it's not failing, precisely. It's just not storing or displaying the correct alignment.
So I have two questions:1) What exactly does the k / bandExtension variable do?2) What is going on here?
My code is pasted below for your use.
Sincerely,-Brett
"""#include <seqan/seeds.h>
using namespace seqan;
DnaString query = "ATCTCTCTCAACAAAACAACGAGGAGGAGTGAAAAGAGAGAGAT";DnaString ref = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT";typedef Seed<Simple> TSeed;String<TSeed> seedChain;appendValue(seedChain, TSeed( 0, 0, 14));appendValue(seedChain, TSeed(30, 31, 14));Score<int, Simple> scoringScheme(2, -1, -2);Align<DnaString, ArrayGaps> alignment1;resize(rows(alignment1), 2);assignSource(row(alignment1, 0), query);assignSource(row(alignment1, 1), ref);Align<DnaString, ArrayGaps> alignment2;resize(rows(alignment2), 2);assignSource(row(alignment2, 0), query);assignSource(row(alignment2, 1), ref);int result1 = bandedChainAlignment(alignment1, seedChain, scoringScheme, 14);std::cout << "Score: " << result1 << std::endl;std::cout << alignment1 << std::endl;int result2 = bandedChainAlignment(alignment2, seedChain, scoringScheme, 13);std::cout << "Score: " << result2 << std::endl;std::cout << alignment2 << std::endl;"""
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev
---
René RahnPh.D. Student--------------------------------Institute of Computer ScienceAlgorithmic Bioinformatics (ABI)--------------------------------Freie Universität BerlinTakustraße 914195 Berlin--------------------------------
_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev
- Follow-Ups:
- Re: [Seqan-dev] bandedChainAlignment default failing due to default k
- From: Rahn, René <rene.maerker@fu-berlin.de>
- Re: [Seqan-dev] bandedChainAlignment default failing due to default k
- References:
- Re: [Seqan-dev] bandedChainAlignment default failing due to default k
- From: Rahn, René <rene.maerker@fu-berlin.de>
- Re: [Seqan-dev] bandedChainAlignment default failing due to default k
-
seqan-dev - June 2015 - Archives indexes sorted by:
[ thread ] [ subject ] [ author ] [ date ] - Complete archive of the seqan-dev mailing list
- More info on this list...