We use the indices in applications that map reads to human reference genomes, so all indices should work on large data sets. Could you provide some more information on the problem you are running into?
Concerning the large number of files that are created, I assume you are using an index built over a StringSet. The save function stores each String in the StringSet in a separate file. However, if you declare the StringSet as a ConcatDirect StringSet (StringSet<TString, Owner<ConcatDirect<> > >), then all strings are concatenated internally and only two files are stored (one with the sequences and one with the sequence length information).
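To illustrate why the concatenated layout needs only two files, here is a minimal sketch in plain C++. It does not use SeqAn and is not how the library is implemented; the type ConcatSet and its members are hypothetical names chosen for this example. The idea is simply that one buffer holds all sequences back to back and a second array records where each sequence ends, so two arrays are enough to recover every string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration type, NOT part of SeqAn.
// It mimics the idea of a concatenated string set: one buffer with all
// sequences and one array of boundaries.
struct ConcatSet {
    std::string concat;               // all sequences stored back to back
    std::vector<std::size_t> limits;  // limits[i] is the start of sequence i,
                                      // limits.back() is the total length

    ConcatSet() : limits(1, 0) {}

    // Append a sequence: extend the buffer and record the new end offset.
    void append(const std::string& s) {
        concat += s;
        limits.push_back(concat.size());
    }

    // Number of sequences stored.
    std::size_t size() const { return limits.size() - 1; }

    // Recover sequence i from the buffer using its two boundaries.
    std::string get(std::size_t i) const {
        assert(i < size());
        return concat.substr(limits[i], limits[i + 1] - limits[i]);
    }
};
```

Writing such a set to disk would mean writing concat and limits, i.e. two files no matter how many sequences were appended, whereas a set of independent strings needs one file per string.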
At the moment there is no compression of the index files available; you would have to do it manually, but it's a thought we should keep in mind.
I hope that helps!
On 13.09.2013, at 12:04, John Reid wrote:
Algorithmic Bioinformatics Working Group
Freie Universität Berlin
Takustr. 9, 14195 Berlin
Phone +49 30 838 75228, Room K25