Hi Brett,
the k-Parameter specifies the band width around seeds given by the passed seed chain. This is compliant to the banded chain alignment algorithm described in the LAGAN paper.So instead of taking the seed as is, one can compute a banded DP around the seed to find the optimal trace into and out of the seed.
My best guess, in your example is, that the entire alignment matrix is to small for a band extension of 15.Nevertheless it should work, but maybe this is an uncovered border case. Can you please supply the seed chain?It shouldn’t be to much on these small sequences?
Cheers,
René
On 23 May 2015, at 00:53, Brett Bowman <bnbowman@gmail.com> wrote:
_______________________________________________I'm trying to align two highly similar sequences found via Kmer search:Query = "ATCTCTCTCAACAAAACAACGAGGAGGAGTGAAAAGAGAGAGAT"Reference = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT"
The expected alignment looks like this:
Score: 800 . : . : . : . : .ATCTCTCTCAACAA-AACAAC-GAGGAGGAGTGAAAAGAGAGAGAT|||||||||||||| |||||| ||||||||| ||||||||||||||ATCTCTCTCAACAACAACAACGGAGGAGGAG-GAAAAGAGAGAGAT
But when I align it using the default values suggested by the tutorial, it doesn't show any inserted gaps at all, and I wind up with this instead:
Score: 800 . : . : . : . :
ATCTCTCTCAACAAAACAACGAGGAGGAGTGAAAAGAGAGAGAT|||||||||||||| | | | | | | |||ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGA
I finally traced it down to the k-value (bandExtension value) passed into the alignment algorithm - values of K <= 13 succeed and generate the top-most alignment, while the values of 14-15 like the default (15) report the low-quality alignment.
Yet oddly, both alignments report the correct alignment score at the end - so it's not failing, precisely. It's just not storing or displaying the correct alignment.
So I have two questions:1) What exactly does the k / bandExtension variable do?2) What is going on here?
My code is pasted below for your use.
Sincerely,-Brett
"""#include <seqan/seeds.h>
using namespace seqan;
DnaString query = "ATCTCTCTCAACAAAACAACGAGGAGGAGTGAAAAGAGAGAGAT";DnaString ref = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT";typedef Seed<Simple> TSeed;String<TSeed> seedChain;appendValue(seedChain, TSeed( 0, 0, 14));appendValue(seedChain, TSeed(30, 31, 14));Score<int, Simple> scoringScheme(2, -1, -2);Align<DnaString, ArrayGaps> alignment1;resize(rows(alignment1), 2);assignSource(row(alignment1, 0), query);assignSource(row(alignment1, 1), ref);Align<DnaString, ArrayGaps> alignment2;resize(rows(alignment2), 2);assignSource(row(alignment2, 0), query);assignSource(row(alignment2, 1), ref);int result1 = bandedChainAlignment(alignment1, seedChain, scoringScheme, 14);std::cout << "Score: " << result1 << std::endl;std::cout << alignment1 << std::endl;int result2 = bandedChainAlignment(alignment2, seedChain, scoringScheme, 13);std::cout << "Score: " << result2 << std::endl;std::cout << alignment2 << std::endl;"""
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev
---
René RahnPh.D. Student--------------------------------Institute of Computer ScienceAlgorithmic Bioinformatics (ABI)--------------------------------Freie Universität BerlinTakustraße 914195 Berlin--------------------------------
_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev