[Seqan-dev] fast + memory efficient hashtable of sequences
- From: Isaac Ho <isaacyho@gmail.com>
- To: seqan-dev@lists.fu-berlin.de
- Date: Mon, 23 Jul 2012 15:54:50 -0700
- Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Subject: [Seqan-dev] fast + memory efficient hashtable of sequences
I'm trying to store SeqAn sequences in a hash table, but I'm having trouble making it fast enough. I'm using boost::unordered_map< String< Dna > >. My comparison function looks like this:
Note: PackedMer in this case actually just equals String< Dna >
class CompareKeys
{
public:
    // Keys are pointers, so compare the sequences they point to, not the addresses.
    bool operator()( PackedMer *a, PackedMer *b ) const
    {
        return ( *a == *b );
    }
};
My hash function:
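class HashKeys
{
public:
    // djb2 hash over the bases of the pointed-to sequence.
    std::size_t operator()( PackedMer *a ) const
    {
        unsigned long hash = 5381;
        typedef Iterator< PackedMer const >::Type TIterator;
        // Const reference: avoids copying the whole sequence on every hash call.
        PackedMer const & s = *a;
        for ( TIterator it = begin( s ); it != end( s ); ++it )
        {
            // Mix in each base; the value must come from the sequence itself.
            char ch = ( char ) value( it );
            hash = ((hash << 5) + hash) + ( int ) ch;
        }
        return hash;
    }
};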
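The two functors above can be exercised outside SeqAn with a minimal self-contained sketch. It assumes `std::string` as a stand-in for `String< Dna >` so it compiles without SeqAn, and the names `SeqHash`, `SeqEqual`, `MerTable`, and `exampleUsage` are hypothetical. The hash reads the key through a const reference and feeds each character into the djb2 accumulator, and equality compares contents because the keys are pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for String< Dna > (assumption, so the sketch builds without SeqAn).
typedef std::string PackedMer;

// djb2 over the pointed-to sequence; no copy of the key is made.
struct SeqHash
{
    std::size_t operator()(const PackedMer *a) const
    {
        std::size_t hash = 5381;
        for (PackedMer::const_iterator it = a->begin(); it != a->end(); ++it)
            hash = ((hash << 5) + hash) + static_cast<unsigned char>(*it);
        return hash;
    }
};

// Pointer keys: equality must compare the sequences, not the addresses.
struct SeqEqual
{
    bool operator()(const PackedMer *a, const PackedMer *b) const
    {
        return *a == *b;
    }
};

typedef std::unordered_map<const PackedMer *, int, SeqHash, SeqEqual> MerTable;

// Usage: a lookup through a different object holding the same sequence succeeds.
inline void exampleUsage()
{
    PackedMer stored = "ACGTACGT";
    MerTable table;
    table[&stored] = 1;

    PackedMer query = "ACGTACGT";   // different object, same content
    assert(table.count(&query) == 1);

    PackedMer other = "ACGTACGA";   // differs in the last base
    assert(table.count(&other) == 0);
}
```

Because the table stores pointers, every stored sequence has to outlive the table; this design keeps keys cheap to move but leaves ownership to the caller.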
I simply took code that used "const char *" as PackedMer and replaced it with the SeqAn equivalent. Compared with the const char * version, the program runs at least 20x slower... any ideas? I've narrowed the bottlenecks down to these two functions. It makes sense that they might be slow, but what would be a good workaround? (I need a fast way to compute a hash value and a fast way to test keys for equality.)
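One common workaround, offered here as a swapped-in technique rather than anything proposed in the thread, is to pack each k-mer into a single 64-bit integer at 2 bits per base when k is at most 32. Hashing and equality then become single-integer operations, and the table can use the standard integer hash. The function `packKmer`, the alias `PackedTable`, and the A=0, C=1, G=2, T=3 encoding are hypothetical choices for illustration; with SeqAn, `ordValue()` on a `Dna` character should give the same 0 to 3 rank without the character switch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical 2-bit encoding: A=0, C=1, G=2, T=3.
// Returns false for any other character or for k-mers longer than 32 bases.
inline bool packKmer(const std::string &kmer, std::uint64_t &out)
{
    if (kmer.size() > 32)
        return false;                    // 32 bases * 2 bits = 64 bits
    std::uint64_t packed = 0;
    for (std::string::size_type i = 0; i < kmer.size(); ++i)
    {
        std::uint64_t code;
        switch (kmer[i])
        {
        case 'A': code = 0; break;
        case 'C': code = 1; break;
        case 'G': code = 2; break;
        case 'T': code = 3; break;
        default: return false;           // N and other symbols are not representable
        }
        packed = (packed << 2) | code;   // append the base as the lowest 2 bits
    }
    out = packed;
    return true;
}

// With one fixed k per table, equal integers mean equal k-mers, so the default
// std::hash<std::uint64_t> and operator== replace the custom functors.
typedef std::unordered_map<std::uint64_t, int> PackedTable;

// Usage: count one occurrence of "ACGT".
inline void exampleUsage()
{
    PackedTable counts;
    std::uint64_t key = 0;
    bool ok = packKmer("ACGT", key);
    assert(ok);
    ++counts[key];
    assert(counts[key] == 1);
}
```

The encoding is only unambiguous for a fixed k: "A" and "AA" both pack to 0, so k-mers of different lengths should go in separate tables (or have their length folded into the key).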
Thanks,
Isaac