[Seqan-dev] fast + memory efficient hashtable of sequences

From: Isaac Ho <isaacyho@gmail.com>
To: seqan-dev@lists.fu-berlin.de
Date: Mon, 23 Jul 2012 15:54:50 -0700
Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Subject: [Seqan-dev] fast + memory efficient hashtable of sequences

I'm trying to store SeqAn sequences in a hashtable but am having trouble getting it to be fast enough. I'm using boost::unordered_map< String< Dna > >. (Note: PackedMer in this case is just String< Dna >.) My comparison function looks like:

class CompareKeys
{
public:
    // Keys are stored as pointers, so compare the sequences they point to.
    bool operator()( PackedMer *a, PackedMer *b ) const
    {
        return( *a == *b );
    }
};

My hash function:

class HashKeys
{
public:
    // djb2-style hash: hash = hash * 33 + character, starting from 5381.
    std::size_t operator()( PackedMer *a ) const
    {
        unsigned long hash = 5381;
        typedef Iterator< Mer >::Type TIterator;
        Mer s = *a;   // copy of the key, made on every call
        for ( TIterator it = begin( s ); it != end( s ); ++it )
        {
            char ch = ( char ) value( it );
            hash = ((hash << 5) + hash) + ( int ) ch;   // mix in ch (was an uninitialized variable c)
        }
        return hash;
    }
};
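
For reference, here is a minimal sketch of the same djb2-style hash and pointer-based equality in a form that can be compiled without SeqAn or Boost. It is only an illustration under assumptions: std::string stands in for String< Dna >, std::unordered_map stands in for boost::unordered_map, and the names Seq, HashSeqPtr, EqualSeqPtr, SeqCountMap and countOccurrences are made up for this example. The hash reads the sequence through the pointer instead of copying it first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Stand-in for the sequence type (assumption: the post uses SeqAn's String< Dna >).
typedef std::string Seq;

// djb2-style hash over the characters of the pointed-to sequence.
// The sequence is read in place; no copy is made.
struct HashSeqPtr
{
    std::size_t operator()(const Seq *a) const
    {
        std::size_t hash = 5381;
        for (Seq::const_iterator it = a->begin(); it != a->end(); ++it)
            hash = ((hash << 5) + hash) + static_cast<unsigned char>(*it);
        return hash;
    }
};

// Two pointer keys are equal when the sequences they point to are equal.
struct EqualSeqPtr
{
    bool operator()(const Seq *a, const Seq *b) const
    {
        return *a == *b;
    }
};

typedef std::unordered_map<const Seq *, int, HashSeqPtr, EqualSeqPtr> SeqCountMap;

// Hypothetical helper: count how many of the n keys have the same content as query.
inline int countOccurrences(const Seq *keys, std::size_t n, const Seq &query)
{
    SeqCountMap counts;
    for (std::size_t i = 0; i < n; ++i)
        ++counts[&keys[i]];
    SeqCountMap::const_iterator it = counts.find(&query);
    return it == counts.end() ? 0 : it->second;
}
```

Because both functors dereference the pointers, two distinct objects with the same content land on the same map entry, which is the behaviour the comparison and hash functions above are aiming for.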

I simply converted my code, which was using "const char *" as PackedMer, to the SeqAn equivalent. Compared with the const char * version, the program runs at least 20x slower. Any ideas? I've narrowed the bottlenecks down to these two functions. It makes sense that these might be slow, but what would be a good workaround? (I need a fast way to compute a hash value and a fast way to test the keys for equality.)
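
One direction that might give both a cheap hash and a cheap equality test is to pack each k-mer into a single integer and use that integer as the key. The sketch below is not what my code does today and it does not use SeqAn. It assumes every key has the same fixed length of at most 32 bases and contains only A, C, G and T, and the names baseCode, packKmer and KmerCountMap are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Map a nucleotide to a 2-bit code. Returns false for anything other than A, C, G, T.
inline bool baseCode(char base, std::uint64_t &code)
{
    switch (base)
    {
        case 'A': code = 0; return true;
        case 'C': code = 1; return true;
        case 'G': code = 2; return true;
        case 'T': code = 3; return true;
        default:  return false;
    }
}

// Pack a k-mer of at most 32 bases into one 64-bit integer, 2 bits per base.
// Returns false if the k-mer is too long or contains a non-ACGT character.
// Assumption: all keys share one length k, since "A" and "AA" would both pack to 0.
inline bool packKmer(const std::string &kmer, std::uint64_t &packed)
{
    if (kmer.size() > 32)
        return false;
    packed = 0;
    for (std::size_t i = 0; i < kmer.size(); ++i)
    {
        std::uint64_t code;
        if (!baseCode(kmer[i], code))
            return false;
        packed = (packed << 2) | code;
    }
    return true;
}

// With integer keys the default hash and operator== apply, so no custom functors are needed.
typedef std::unordered_map<std::uint64_t, int> KmerCountMap;
```

With this layout equality is a single integer comparison and hashing works on one 64-bit value, at the cost of the length and alphabet restrictions stated above.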

Thanks,

Isaac
