Subject: [Seqan-dev] fast + memory efficient hashtable of sequences
I'm trying to store a hashtable of SeqAn sequences but am having trouble getting it to be fast enough. I'm using boost::unordered_map< String< Dna > >. My hash function looks like:
Note: PackedMer in this case is just String< Dna >.
    class HashKeys
    {
    public:
        std::size_t operator()( PackedMer *a ) const
        {
            // djb2-style hash over the characters of the sequence
            unsigned long hash = 5381;
            typedef Iterator< Mer >::Type TIterator;
            Mer s = *a;  // copies the whole sequence on every call
            for ( TIterator it = begin( s ); it != end( s ); ++it )
            {
                char ch = ( char ) value( it );
                hash = ((hash << 5) + hash) + ( int ) ch;
            }
            return hash;
        }
    };
I simply took my code that used "const char *" as PackedMer and replaced it with the SeqAn equivalent. Compared with the const char * version, the program runs at least 20x slower. Any ideas? I've narrowed the bottlenecks down to these two functions (the hash and the key comparison). It makes sense that these might be slow, but what would be a good workaround? I need a fast way to compute a hash value and a fast way to test the keys for equality.
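One thing that may be worth checking, assuming Mer is the same type as PackedMer: the line "Mer s = *a;" copies the whole sequence on every hash call, which the const char * version never did. Below is a minimal sketch of a hash and an equality functor that read the key through a const reference instead. To keep it self-contained it uses std::string as a stand-in for String< Dna > and std::unordered_map in place of boost::unordered_map. The names EqualKeys, MerTable and checkSameContentSharesEntry are made up for illustration, and the real SeqAn version would iterate with begin()/end() on the referenced string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for the SeqAn key type (assumption: the real code uses String< Dna >).
typedef std::string PackedMer;

// djb2-style hash that reads the key through a const reference, so no copy is made.
struct HashKeys
{
    std::size_t operator()( const PackedMer *a ) const
    {
        const PackedMer &s = *a;  // reference, not a copy
        std::size_t hash = 5381;
        for ( PackedMer::const_iterator it = s.begin(); it != s.end(); ++it )
            hash = ((hash << 5) + hash) + static_cast< unsigned char >( *it );
        return hash;
    }
};

// Compares the pointed-to sequences rather than the pointer values.
struct EqualKeys
{
    bool operator()( const PackedMer *a, const PackedMer *b ) const
    {
        return *a == *b;
    }
};

// Hypothetical table type: pointer keys, integer counts.
typedef std::unordered_map< const PackedMer *, int, HashKeys, EqualKeys > MerTable;

// Two distinct objects with the same content should land in the same entry.
void checkSameContentSharesEntry()
{
    PackedMer a( "ACGT" ), b( "ACGT" );
    MerTable table;
    ++table[ &a ];
    ++table[ &b ];
    assert( table.size() == 1 );
    assert( table[ &a ] == 2 );
}
```

Whether this removes most of the 20x gap is not something the sketch can show. It only takes the per-call copy out of the hash path, so profiling the SeqAn version before and after the change would be the way to confirm.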