MyersBitVector assumes edit distance, so the result of globalAlignment() is the edid distance. I do not know whether MyersBitVector is really faster than the new alignment code, it will certainly depend on the length of the sequences. Myers is certainly fast if you have less than 64 characters in each sequence. Otherwise, the theoretic adavantages might be eaten up by emulating longer words. getAlignmentScore() does not return the alignment length. If you can band your alignment, using the Needleman-Wunsch will be quite fast. What exactly is your application? Is the alignment computation on a critical path/in a computational core/kernel? I would say that one would have to try the options (Myers-Hirschberg after fixing, MyersBitVector, and Needleman-Wunsch) with a real-world data set and see which is fastest. Note that NW is the only implementation that you can band at the moment. We would be interested in your benchmark results. Cheers, Manuel ________________________________________ From: Johannes Dröge [johdro@mpi-inf.mpg.de] Sent: Wednesday, August 21, 2013 4:19 PM To: seqan-dev@lists.fu-berlin.de Subject: Re: [Seqan-dev] Memory allocation error with globalAlignmentScore() and globalAlignment() Hi Manuel, I replaced seqan::MyersHirschberg() with seqan::MyersBitVector() in the below code and it seems to work just fine. For this reason I believe it is a bug in the alignment code, a regression certainly. Is the score the number of gaps and mismatches in the alignment (edit distance), when used like this? I just need this number but also it would be good to have the number of matches. When I have the length of the alignment, I can calculate this quantity by using (alignment_length+score) because score is given as a negative integer. How can I do this with globalAlignmentScore()? Gruß Johannes Am 21.08.2013 13:48, schrieb Holtgrewe, Manuel: > Hi Johannes, > > we will investigate this problem. > > Meanwhile, you could use the globalAlignment() function without a tag. We do not have current benchmarks for MyerHirschberg vs. our new NeedlemanWunsch implementation but the latter is definitely more used and then less error prone. > > HTH > Manuel > ________________________________________ > From: Johannes Dröge [johdro@mpi-inf.mpg.de] > Sent: Wednesday, August 21, 2013 12:54 PM > To: SeqAn Development > Subject: [Seqan-dev] Memory allocation error with globalAlignmentScore() and globalAlignment() > > Hello Seqan developers, > > I just ported my software, which was using globalAlignment() extensively without problems, from a Seqan version from 2011 to the recent headers. I had to change the syntax related to the scoring until it compiled well. Anyway, now I got a segfault when running globalAlignment(). I then changed the code to use globalAlignmentsScore(), because I don't actually need the explicit alignment. The error is still persists and seems to be related to the data/types or similar. > > 1) Here is the original code fragment: > > const seqan::Dna5String qseq = ... > seqan::Dna5String rseq = ... > seqan::Align< seqan::Dna5String > aln; > seqan::resize( seqan::rows( aln ), 2); > seqan::assignSource( seqan::row( aln, 0 ), qseq ); > seqan::assignSource( seqan::row( aln, 1 ), rseq ); > const int score = -seqan::globalAlignment( aln, seqan::SimpleScore(), seqan::MyersHirschberg() ); > > 2) Here is the intermediate code: > > const seqan::Dna5String qseq = ... > seqan::Dna5String rseq = ... > seqan::Align< seqan::Dna5String > aln; > seqan::resize( seqan::rows( aln ), 2); > seqan::assignSource( seqan::row( aln, 0 ), qseq ); > seqan::assignSource( seqan::row( aln, 1 ), rseq ); > const int score = -seqan::globalAlignment( aln, seqan::MyersHirschberg() ); > > 3) Here is the current code: > > const seqan::Dna5String qseq = ... > seqan::Dna5String rseq = ... > const int score = -seqan::globalAlignmentScore( qseq, rseq, seqan::MyersHirschberg() ); > > ---------- > > This is what I get from (2) and (3): > > aligning two sequences globally: > seq1: TAGCACTCAGGGAGAATGAGTGCTAAAACATAGAATGAGAAATGGAGGCGAGAGTATGGAGCTGACCAATCGAAAAAAGCGAATCCTGCGGGCCATTGTTGAGATCTATATCTCCACCGCGGAACCGGTAGGTTCCAAGGCCGTGGCAGAGCAGGCCGGACTGGACATCTCCACCGCCACCATACGAAATGAGATGGCCGACCTCACCGAACTGGGCTATCTGGAGCAGCCCCACACCTCGGCCGGACGGATTCCGTCCCCCATGGGCTACCGGCTCTACGTCAACGAGCTCATGGGTGAGCACCAGCTGACCATGCAGGAGACCCAGCGCATCAACGACGCGCTGAACCTGAAAATGGAGGAGCTGGACCGGGTCATCGACCGGGCGGGCAAGGTGCTCTCCCAGATCAGCGACTACCCCGTCTTTACCATGGCCCAGCCCAAGCAGCGGGTGACGGTAAAGCGGTACGACCTGCTGATGGTAGAGGAAAACGCCTTTATCGCTGTGGTGATGACCGACAACTCGGTGGTGCGCAACAAGCTTATCCACCTGTCCGATGAGCTTTCCGACACCCAGCTGCAGCTGCTGTCCACCGTTTTGAACAGCTCCTTTGTAGGTCTGACCGTGGAGGAAATGGAACAG > seq2: TATATTAGCACTCGGAAAGCGAGAGTGCCAACAGTCCGAAGCGGAGAGTGCTAACATGGCAATCAGTGAGAGGAAAAAGAAAATACTGGCGGCGGTGGTGGATGAATACATCCGCACGGCAGAGCCGGTAGGCAGCAAGGCCATCGCCCAGAGCGGCGGGCTGAACTGCTCCTCGGCCACCATCCGCAACGAGCTGGCGGAGCTGGTCGCCATGGGCTATCTGGAGCAGCCCCACACCTCTGCGGGCCGGGTGCCCACCCCCATGGGCTACCGCATGTACGTCAACGAGCTCATGGAGAAGCAGAAGATGAGCCTGGAGGAGACGGAGGAGATGAACCGCCGCCTGAACCAGAAGCTCCAGGAGCTGGACGACACCATCCGGGACGTGAGCAAGCTGGCTTCCCAGCTGACGAATTATCCCGCCCTGGCCCTGACGGCCCAGAGCTCCGTCACGGTAAAGCGCTTCGACCTGATCTATGTGGACGCCAACAACTTCATCATCGTGCTGATGCTGTCCAACAACAGCGTAAAGAGCAAGCTGGTGCATCTGCCGGTGTCCGTGGACCAGGACATGATCAAGCGCCTTTCCACCCTCTTCAACGCCAGCTTTACCGGCGTGGAGGATCAGCAGATCACGCCG > prog: malloc.c:4631: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed. > > ---------- > > GDB backtrace: > > 0x00007ffff6a7d1b5 in raise () from /lib/libc.so.6 > (gdb) bt > #0 0x00007ffff6a7d1b5 in raise () from /lib/libc.so.6 > #1 0x00007ffff6a7ffc0 in abort () from /lib/libc.so.6 > #2 0x00007ffff6ab35bb in ?? () from /lib/libc.so.6 > #3 0x00007ffff6abce16 in ?? () from /lib/libc.so.6 > #4 0x00000000004f13ce in seqan::deallocate<seqan::String<unsigned int, seqan::Alloc<void> >, unsigned int, unsigned long, seqan::AllocateStorage_> (data=0x832831460) > at seqan/basic/allocator_interface.h:329 > #5 0x00000000004ecc09 in seqan::_deallocateStorage<unsigned int, void, unsigned int, unsigned long> (me=..., ptr=0x832831460, capacity=32) > at seqan/sequence/string_alloc.h:402 > #6 0x00000000004e76d6 in ~String (this=0x7fffede12e80, __in_chrg=<value optimized out>) > at seqan/sequence/string_alloc.h:177 > #7 0x00000000004e1ebd in seqan::_globalAlignment<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::Tag<seqan::ArrayGaps_>, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::Tag<seqan::ArrayGaps_> > (gapsH=..., gapsV=..., algorithmTag=...) > at seqan/align/global_alignment_myers_hirschberg_impl.h:745 > #8 0x00000000004df9ea in seqan::_globalAlignment<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::Tag<seqan::ArrayGaps_>, seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::Tag<seqan::ArrayGaps_> > (gapsH=..., gapsV=..., algorithmTag=...) > at seqan/align/global_alignment_myers_hirschberg_impl.h:183 > #9 0x00000000004db18c in seqan::globalAlignment<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::Tag<seqan::ArrayGaps_> > (align=..., > algorithmTag=...) at seqan/align/global_alignment_specialized.h:93 > #... > > Could this be related to the differing sequence lengths or am I doing something else wrong? > > Gruß Johannes > > _______________________________________________ > seqan-dev mailing list > seqan-dev@lists.fu-berlin.de > https://lists.fu-berlin.de/listinfo/seqan-dev > > _______________________________________________ > seqan-dev mailing list > seqan-dev@lists.fu-berlin.de > https://lists.fu-berlin.de/listinfo/seqan-dev > _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de https://lists.fu-berlin.de/listinfo/seqan-dev