On Jul 5, 2012, at 1:44 PM, John Reid wrote:
OK, the values in the code snippet should work for any genome; for example, they work fine for hg18/hg19.
Concerning index construction: if I am right, the Esa for StringSets should be built on external memory by default.
Concerning index querying: if you don't have 40 GB of memory, then overload the fibres to be memory-mapped (as in the commented line in the code snippet). That way only a small part of the index is kept in memory at any time.
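
To illustrate why memory mapping keeps the resident footprint small, here is a minimal sketch of the underlying mechanism using plain POSIX mmap. This is not SeqAn code and not how SeqAn implements its memory-mapped strings; the names MappedTable, writeTable, mapTable and unmapTable are hypothetical, and the table of 64-bit values only stands in for a suffix array fibre stored on disk. The operating system loads pages of the file on first access, so only the entries you actually touch end up in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of a file holding 64-bit table entries
// (a stand-in for a suffix array fibre; not a SeqAn type).
struct MappedTable {
    const std::uint64_t* data = nullptr;
    std::size_t count = 0;   // number of complete 64-bit entries
    std::size_t bytes = 0;   // length of the mapping in bytes

    std::uint64_t at(std::size_t i) const {
        assert(i < count);
        return data[i];      // the page holding entry i is loaded on first touch
    }
};

// Write the table to disk with plain stdio (the "construction" step).
inline bool writeTable(const char* path, const std::vector<std::uint64_t>& values) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written =
        std::fwrite(values.data(), sizeof(std::uint64_t), values.size(), f);
    bool closed = (std::fclose(f) == 0);
    return closed && written == values.size();
}

// Map the file read-only instead of reading it into a heap buffer.
inline bool mapTable(const char* path, MappedTable& out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return false;
    }
    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;
    out.data = static_cast<const std::uint64_t*>(p);
    out.count = len / sizeof(std::uint64_t);
    out.bytes = len;
    return true;
}

// Release the mapping and reset the view.
inline void unmapTable(MappedTable& t) {
    if (t.data) munmap(const_cast<std::uint64_t*>(t.data), t.bytes);
    t.data = nullptr;
    t.count = 0;
    t.bytes = 0;
}
```

With this sketch you would call writeTable once, then mapTable in the querying program and read entries through at(); in SeqAn the equivalent effect comes from overloading the fibre type, as in the commented line of the snippet.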
Alternatively, if you only need the top of the tree along with some sparse subtrees, you could try using a lazy suffix tree (Wotd index in SeqAn) instead of the Esa.
The Wotd index provides the same iterator interface as the Esa. Moreover, you can overload the Wotd FibreSA metafunction in exactly the same way.
Or, if you are very limited by memory, you might want to try the FM-Index (it is not yet in the core library).
The constructed FM-Index would fit into 3 GB of memory.