Hi Lucas,
thanks for writing to us.
In the future, though, please move such conversations to our issue tracker:
https://github.com/seqan/seqan/issues
You will reach a much wider audience there.
In general, I am not familiar with the serialisation of SA indices, but I think your approach can be done more efficiently with a q-gram index.
You can build it over the set of reads and then query, for every k-mer, its occurrences in your reads.
Using an open-addressing q-gram index keeps the memory footprint at roughly 30% more than the space needed for the q-grams that actually occur.
Building it is fairly fast, so you could run it with different q-gram sizes (each size requires rebuilding the index).
Please have a look at: http://seqan.readthedocs.io/en/master/Tutorial/DataStructures/Indices/QgramIndex.html
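
To illustrate the idea of building a q-gram table over the reads and querying it per k-mer, here is a minimal sketch in plain C++. It is not the SeqAn q-gram index: it uses std::unordered_map from the standard library as a stand-in, and the names Occurrence, QGramTable, buildQGramTable and findOccurrences are made up for this example. Please see the tutorial linked above for the actual SeqAn interface.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One occurrence of a q-gram: (index of the read, offset within that read).
using Occurrence = std::pair<std::size_t, std::size_t>;

// Hypothetical stand-in for a q-gram index: maps each q-gram that actually
// occurs in the reads to the list of its occurrences.
using QGramTable = std::unordered_map<std::string, std::vector<Occurrence>>;

// Build the table over all reads for one fixed q.
// Trying a different q means calling this function again.
QGramTable buildQGramTable(std::vector<std::string> const & reads, std::size_t q)
{
    QGramTable table;
    if (q == 0)
        return table;

    for (std::size_t readId = 0; readId < reads.size(); ++readId)
    {
        std::string const & read = reads[readId];
        // Reads shorter than q contain no q-gram; the loop is simply skipped.
        for (std::size_t pos = 0; pos + q <= read.size(); ++pos)
            table[read.substr(pos, q)].emplace_back(readId, pos);
    }
    return table;
}

// Return all occurrences of a k-mer (empty if it does not occur).
// The k-mer must have the same length q the table was built with.
std::vector<Occurrence> findOccurrences(QGramTable const & table, std::string const & kmer)
{
    auto it = table.find(kmer);
    if (it == table.end())
        return {};
    return it->second;
}
```

For example, building the table over the two reads "ACGTAC" and "GTACGT" with q = 3 and querying "GTA" reports one hit at offset 2 of the first read and one at offset 0 of the second.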
IHTH,
René
---
René Rahn
Ph.D. Student (de.NBI - CIBI)
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------