Hey Brett,
Just from looking at the seed chain, I suspect that the problem is the bad chain in the first place.
The first seed begins at position 7688 in the first sequence, but at position 57 in the second sequence.
So before that seed I can have at most 57 matches and need at least 7631 gaps. With a score of -7 per gap and 4 per match, the best possible score there is 57 * 4 - 7631 * 7 = -53189, which is outside the range of a signed short (-32,768 to 32,767).
This would also explain why it works with a gap penalty of -2: the score then stays at roughly -15,000, which still fits.
An unsigned short cannot give correct results either, as it only covers the values from 0 to 65535. In this case -7 would be interpreted as 65529.
So there is an overflow somewhere in the first matrix, which might cause the problem with the traceback. I will investigate this more thoroughly in the next few days.
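To double-check the numbers, here is a small standalone C++ sketch. It is not SeqAn code: the helper names bestLeadingScore and fitsInto are made up for this mail, and it assumes linear gap costs and that the whole offset before the first seed has to be filled with gaps.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of SeqAn): best score reachable in front of
// the first seed, assuming linear gap costs. The shorter prefix can match
// completely; the length difference must be covered by gaps.
constexpr std::int64_t bestLeadingScore(std::int64_t posH, std::int64_t posV,
                                        std::int64_t match, std::int64_t gap)
{
    return (posH < posV ? posH : posV) * match
         + (posH < posV ? posV - posH : posH - posV) * gap;
}

// Hypothetical helper: does 'value' fit into the integer type T?
template <typename T>
constexpr bool fitsInto(std::int64_t value)
{
    return value >= static_cast<std::int64_t>(std::numeric_limits<T>::min())
        && value <= static_cast<std::int64_t>(std::numeric_limits<T>::max());
}

// Seed starts at 7688 / 57, match = 4, gap = -7.
static_assert(bestLeadingScore(7688, 57, 4, -7) == -53189, "57*4 - 7631*7");
static_assert(!fitsInto<std::int16_t>(-53189), "too small for a signed short");
static_assert(fitsInto<std::int32_t>(-53189), "a 32-bit int is wide enough");

// Same seed with gap = -2: the score fits into a signed short again.
static_assert(bestLeadingScore(7688, 57, 4, -2) == -15034, "57*4 - 7631*2");
static_assert(fitsInto<std::int16_t>(-15034), "fits into a signed short");

// An unsigned short cannot even store the gap penalty itself.
static_assert(!fitsInto<std::uint16_t>(-7), "negative values do not fit");
static_assert(static_cast<std::uint16_t>(-7) == 65529, "-7 wraps to 65529");
```

All checks are static_asserts, so simply compiling the file (C++11 or later) already verifies the arithmetic.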
For the moment I suggest two things:
A) Try a wider signed integer type (e.g. int) for the scores and see if it runs through. (Just to check whether this is really related to the value overflow.)
B) Use chaos chaining when adding seeds to the seed set. From the data set I would guess you used the simple merge strategy, but given that the sequences might not be "very" similar, chaos chaining might compute better results, and then the banded chain alignment might work as well.
Did you change the default band width of the bandedChainAlignment (the parameter k)?
Cheers,
René
---
René Rahn
Ph.D. Student
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------