FU Logo
  • Startseite
  • Kontakt
  • Impressum
  • Home
  • Listenauswahl
  • Anleitungen

Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback

<-- thread -->
<-- date -->
  • From: Brett Bowman <bnbowman@gmail.com>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Tue, 18 Nov 2014 11:30:07 -0800
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback

Thanks Rene, I came to the same conclusion.  That portion of the code I got from a colleague, and I've begun looking at what he's doing to see where the problem arises.  We're both new to your library and so there is a bit of a learning curve...  Are there any good examples or tutorials on using SeqAn's Q-grams or other SA tools to find seeds we could use?  The Seed-And-Extend tutorial only talks about hand-picked seeds it seems.

As for the Signed/Unsigned tools - digging into the internal tools, it looks like the code that uses shorts accepts large references, but only <100bp queries, which explains why they can get away with using shorts.  Considering we're trying to parallelize many large alignments the savings to memory from staying within shorts instead of longs would be significant - but clearly we'll need to write our own code to do DP while handling over-flows if so... 

-Brett

On Tue, Nov 18, 2014 at 7:25 AM, Rahn, René <rene.maerker@fu-berlin.de> wrote:
Sorry, 

On 18 Nov 2014, at 16:19, Rahn, René <rene.maerker@fu-berlin.de> wrote:

Hey Brett, 

by just looking at the seed chain I might think that the problem is the bad chain in the first place.
The first seed begins at position 7688 for the first sequence, while the seed for the second sequence starts at position 57. 
There I can have at most 57 matches and have 7631 gaps. Given a score of -7 for the gaps and 4 for the matches I have a score of at least -53189 which in fact exceeds the bound of
unsigned shorts -32.768 - 32.767.

I meant signed shorts of course :).

This would also explain why it works with a gap penalty of -2 ~ -15300.
Using unsigned short cannot give correct results, as it is only defined on the values from 0 to 65535. In this case -7 would also be interpreted as 65529.
So somewhere in the first matrix is an overflow, which might lead to the problem with the traceback. I will investigate this more thoroughly in the next days. 
For the moment I suggest 2 things. 
A) Try to use signed integer types and see if it runs through. (Just to check if it is really related to the value overflow.)
B) Use chaos chaining when adding seeds to the seed set. From the data set I would guess you used the simple merge strategy, but given that the sequences might not be “very” similar the chaos chaining might compute better results and then the banded chain alignment might work as well.

Did you changed the default band settings for the band width of the bandedChainAlignment (this parameter k)?

Cheers,

René

On 18 Nov 2014, at 02:17, Brett Bowman <bnbowman@gmail.com> wrote:

Opps, I just noticed a mistake in my previous e-mail.  I wrote:

"""
The scoring scheme had to be changed from:
Score<short, Simple> scoringScheme(4, -13, -7);   // Determined empirically to work well on PacBio data with my home-brewed aligner
to:
Score<short, Simple> scoringScheme(2, -1, -2);     // Scoring scheme recommended/tested by SeqAn docs
"""

but what I meant was (changes in CAPITAL):
"""
The scoring scheme had to be changed from:
Score<UNSIGNED short, Simple> scoringScheme(4, -13, -7);   // Determined empirically to work well on PacBio data with my home-brewed aligner
to:
Score<UNSIGNED short, Simple> scoringScheme(2, -1, -2);     // Scoring scheme recommended/tested by SeqAn docs
"""

I was not surprised when regular shorts were not large enough given that we're dealing with ~10kb alignments, but I was surprised that unsigned shorts did not work.  They are preferable given their smaller size, and used by some of my home-brewed aligners. 

-Brett

On Sun, Nov 16, 2014 at 11:52 PM, Brett Bowman <bnbowman@gmail.com> wrote:
I've gotten a rudimentary alignment out, but I had to tweak a few things:

"""
      0     .    :    .    :    .    :    .    :    .    : 
        AAAGAG--AGAGAT-ATGCCAGTAATA-CTTGAAAGATATTGCCGAGCTG
           | |   |  || |||||||||||| ||||||||||||||||||||||
        ---GTGCCCGTCATAATGCCAGTAATAGCTTGAAAGATATTGCCGAGCTG

     50     .    :    .    :    .    :    .    :    .    : 
        GTGCCGTTTGCCCATCGTTATGGTGCAAAAA-TTTCGTCACGCTTAACAC
        ||||||||||||||||||||||||||||||| ||||||||||||||||||
        GTGCCGTTTGCCCATCGTTATGGTGCAAAAATTTTCGTCACGCTTAACAC

    100     .    :    .    :    .    :    .    :    .    : 
        CATTTTGCATGATGATGAGCTGGAACCCGCGC-ACGGCTGATTACTGACC
        |||||||||||||||||||||||||||||||| |||||||||||||||||
        CATTTTGCATGATGATGAGCTGGAACCCGCGCAACGGCTGATTACTGACC

    150     .    :    .    :    .    :    .    :    .    : 
        TCTACCAGACCGGTGTCGATGCGCTGATTGTT-AGGATATGGGGATTCTG
        |||||||||||||||||||||||||||||||| |||||||||||||||||
        TCTACCAGACCGGTGTCGATGCGCTGATTGTTCAGGATATGGGGATTCTG

    200     .    :    .    :    .    :    .    :    .    : 
        GAAACCTTGATATTCGCGCGCGATTGAACTGGCACGCCAGTACCGCAGTG
        | || |||||||||| ||| |||||||||| ||||||||||| |||||||
        G-AA-CTTGATATTC-CGC-CGATTGAACT-GCACGCCAGTA-CGCAGTG
"""

The scoring scheme had to be changed from:
Score<short, Simple> scoringScheme(4, -13, -7);   // Determined empirically to work well on PacBio data with my home-brewed aligner
to:
Score<short, Simple> scoringScheme(2, -1, -2);     // Scoring scheme recommended/tested by SeqAn docs

The initial Seed-set had to be set by hand with the reference-alignment's start position:
SeedSet<Simple> seeds2;
addSeed(seeds2, TSeed(  12,   12, 100), seqan::Single());
addSeed(seeds2, TSeed( 120,  120, 100), seqan::Single());

So it appears that there are at least two issues at play here:
-The value-type used for the bandedChainAlignment isn't large enough to handle multi-kb alignments with moderate-sized penalties
-The default chaining algorithm doesn't handle large numbers of seeds with small indel gaps well.

-Brett

On Sun, Nov 16, 2014 at 9:39 PM, Brett Bowman <bnbowman@gmail.com> wrote:
Thanks Rene,

My query, reference, and canonical alignment from BLASR (the PacBio-specific aligner) attached.

I also noticed that the seeds it's chaining with chainGlobalAlignment() are way off the diagonal, which may be related - details below: 

"""
Query #1 - m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/5779/0_10397
Finished finding seeds
Seeds: 1939
Finishing chaining seeds
Chain: 44
Seed<Simple, TConfig>(7688, 57, 7706, 75, lower diag = 7631, upper diag = 7631) Length: 18
Seed<Simple, TConfig>(7707, 94, 7722, 109, lower diag = 7613, upper diag = 7613) Length: 15
Seed<Simple, TConfig>(7933, 111, 7946, 124, lower diag = 7822, upper diag = 7822) Length: 13
Seed<Simple, TConfig>(7949, 130, 7965, 146, lower diag = 7819, upper diag = 7819) Length: 16
Seed<Simple, TConfig>(8065, 299, 8078, 312, lower diag = 7766, upper diag = 7766) Length: 13
Seed<Simple, TConfig>(8109, 314, 8126, 331, lower diag = 7795, upper diag = 7795) Length: 17
Seed<Simple, TConfig>(8169, 363, 8181, 375, lower diag = 7806, upper diag = 7806) Length: 12
Seed<Simple, TConfig>(8208, 381, 8222, 395, lower diag = 7827, upper diag = 7827) Length: 14
Seed<Simple, TConfig>(8222, 402, 8234, 414, lower diag = 7820, upper diag = 7820) Length: 12
Seed<Simple, TConfig>(8287, 446, 8304, 463, lower diag = 7841, upper diag = 7841) Length: 17
Seed<Simple, TConfig>(8304, 466, 8316, 478, lower diag = 7838, upper diag = 7838) Length: 12
Seed<Simple, TConfig>(8326, 748, 8338, 760, lower diag = 7578, upper diag = 7578) Length: 12
Seed<Simple, TConfig>(8354, 761, 8366, 773, lower diag = 7593, upper diag = 7593) Length: 12
Seed<Simple, TConfig>(8426, 801, 8438, 813, lower diag = 7625, upper diag = 7625) Length: 12
Seed<Simple, TConfig>(8702, 2291, 8721, 2310, lower diag = 6411, upper diag = 6411) Length: 19
Seed<Simple, TConfig>(8769, 2314, 8781, 2326, lower diag = 6455, upper diag = 6455) Length: 12
Seed<Simple, TConfig>(8794, 2387, 8806, 2399, lower diag = 6407, upper diag = 6407) Length: 12
Seed<Simple, TConfig>(8835, 2400, 8847, 2412, lower diag = 6435, upper diag = 6435) Length: 12
Seed<Simple, TConfig>(9162, 2650, 9174, 2662, lower diag = 6512, upper diag = 6512) Length: 12
Seed<Simple, TConfig>(9184, 2676, 9196, 2688, lower diag = 6508, upper diag = 6508) Length: 12
Seed<Simple, TConfig>(9196, 2736, 9208, 2748, lower diag = 6460, upper diag = 6460) Length: 12
Seed<Simple, TConfig>(9223, 2818, 9235, 2830, lower diag = 6405, upper diag = 6405) Length: 12
Seed<Simple, TConfig>(9251, 2844, 9267, 2860, lower diag = 6407, upper diag = 6407) Length: 16
Seed<Simple, TConfig>(9294, 2871, 9308, 2885, lower diag = 6423, upper diag = 6423) Length: 14
Seed<Simple, TConfig>(9314, 2891, 9327, 2904, lower diag = 6423, upper diag = 6423) Length: 13
Seed<Simple, TConfig>(9432, 2976, 9447, 2991, lower diag = 6456, upper diag = 6456) Length: 15
Seed<Simple, TConfig>(9515, 2992, 9527, 3004, lower diag = 6523, upper diag = 6523) Length: 12
Seed<Simple, TConfig>(9675, 3385, 9701, 3411, lower diag = 6290, upper diag = 6290) Length: 26
Seed<Simple, TConfig>(9724, 3414, 9736, 3426, lower diag = 6310, upper diag = 6310) Length: 12
Seed<Simple, TConfig>(9791, 3571, 9804, 3584, lower diag = 6220, upper diag = 6220) Length: 13
Seed<Simple, TConfig>(9805, 3596, 9817, 3608, lower diag = 6209, upper diag = 6209) Length: 12
Seed<Simple, TConfig>(9818, 4915, 9830, 4927, lower diag = 4903, upper diag = 4903) Length: 12
Seed<Simple, TConfig>(9842, 4939, 9854, 4951, lower diag = 4903, upper diag = 4903) Length: 12
Seed<Simple, TConfig>(9873, 5045, 9885, 5057, lower diag = 4828, upper diag = 4828) Length: 12
Seed<Simple, TConfig>(9886, 5077, 9898, 5089, lower diag = 4809, upper diag = 4809) Length: 12
Seed<Simple, TConfig>(9943, 5242, 9955, 5254, lower diag = 4701, upper diag = 4701) Length: 12
Seed<Simple, TConfig>(9957, 5385, 9969, 5397, lower diag = 4572, upper diag = 4572) Length: 12
Seed<Simple, TConfig>(9994, 5423, 10006, 5435, lower diag = 4571, upper diag = 4571) Length: 12
Seed<Simple, TConfig>(10062, 7096, 10074, 7108, lower diag = 2966, upper diag = 2966) Length: 12
Seed<Simple, TConfig>(10102, 7109, 10114, 7121, lower diag = 2993, upper diag = 2993) Length: 12
Seed<Simple, TConfig>(10156, 7131, 10178, 7153, lower diag = 3025, upper diag = 3025) Length: 22
Seed<Simple, TConfig>(10277, 7354, 10299, 7376, lower diag = 2923, upper diag = 2923) Length: 22
Seed<Simple, TConfig>(10299, 7378, 10313, 7392, lower diag = 2921, upper diag = 2921) Length: 14
Seed<Simple, TConfig>(10371, 7400, 10385, 7414, lower diag = 2971, upper diag = 2971) Length: 14
"""

-Brett

On Sun, Nov 16, 2014 at 2:57 AM, Rahn, René <rene.maerker@fu-berlin.de> wrote:
Hey Brett, 

It could have multiple reasons. 
In general, it means that there was a traceback path from a seed or a gap, that couldn’t be connected with one of the so far generated traceback paths. Note, that the traceback is computed incrementally to reduce the space requirements. 
If you could send me a simple text file containing the seeds of the seed chain (one seed per row using the <<-operator), that is input to the banded chain alignment function, I could have a look at it and fix the problem. 
Also, I opened an issue in github: https://github.com/seqan/seqan/issues/699, for tracking the issue.


Kind regards, 

René Rahn

On 16 Nov 2014, at 04:17, Brett Bowman <bnbowman@gmail.com> wrote:

I'm trying to build a tool with SeqAn to quickly align some PacBio data, using "bandedChainAlignment", but I'm consistently getting a crash during the trace-back step.  The generation of the Seeds (SuffixArray) and the finding of the base SeedChain works fine, but the final step of turning that into an alignment doesn't appear to work.  

Suggestions?

Error Log:
"""
bbowman@localhost:~/git/SRSLI$ ./src/C++/srsli test/data/query/ecoli_5p_sample.fastq test/data/reference/ecoliK12_5p_assembly.fasta 
Query 0
Finished finding seeds
Finishing chaining seeds
Starting alignment of sequences
/usr/include/seqan/seeds/banded_chain_alignment_traceback.h:195 Assertion failed : isGlued == true was: 0 != 1 (Fatal error while trying to connect trace backs: No glue point available!)
Aborted (core dumped)
"""

-Brett
_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

---

René Rahn
Ph.D. Student
--------------------------------
Tel:  (+49) 30 838 75277
Mail: rene.rahn@fu-berlin.de
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------


_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev




_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

---

René Rahn
Ph.D. Student
--------------------------------
Tel:  (+49) 30 838 75277
Mail: rene.rahn@fu-berlin.de
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

---

René Rahn
Ph.D. Student
--------------------------------
Tel:  (+49) 30 838 75277
Mail: rene.rahn@fu-berlin.de
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------


_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev


<-- thread -->
<-- date -->
  • References:
    • [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Brett Bowman <bnbowman@gmail.com>
    • Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Rahn, René <rene.maerker@fu-berlin.de>
    • Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Brett Bowman <bnbowman@gmail.com>
    • Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Brett Bowman <bnbowman@gmail.com>
    • Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Brett Bowman <bnbowman@gmail.com>
    • Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Rahn, René <rene.maerker@fu-berlin.de>
    • Re: [Seqan-dev] Fatal Error in Banded Chain Alignment Traceback
      • From: Rahn, René <rene.maerker@fu-berlin.de>
  • seqan-dev - November 2014 - Archives indexes sorted by:
    [ thread ] [ subject ] [ author ] [ date ]
  • Complete archive of the seqan-dev mailing list
  • More info on this list...

Hilfe

  • FAQ
  • Dienstbeschreibung
  • ZEDAT Beratung
  • postmaster@lists.fu-berlin.de

Service-Navigation

  • Startseite
  • Listenauswahl

Einrichtung Mailingliste

  • ZEDAT-Portal
  • Mailinglisten Portal