Re: [Seqan-dev] Saving an index for StringSet to disk: lots of files
- From: Rahn, René <Rene.Rahn@fu-berlin.de>
- To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Date: Wed, 21 Mar 2018 10:26:13 +0100
- Subject: Re: [Seqan-dev] Saving an index for StringSet to disk: lots of files
Hi Lucas,
Thanks for writing to us.
In the future, though, please move such conversations to our issue tracker:
https://github.com/seqan/seqan/issues
You will reach a much wider range of people there.
I am not familiar with the serialisation of SA indices, but I think your approach can be done more efficiently with a q-gram index.
You can build it over the set of reads and then query the occurrences of every k-mer in your reads.
An OpenAddressing q-gram index keeps memory consumption at roughly 30% more than the space needed for the q-grams that actually occur.
Building it is fairly fast, so you could run it with different q-gram sizes (each size requires rebuilding the index).
Please have a look at: http://seqan.readthedocs.io/en/master/Tutorial/DataStructures/Indices/QgramIndex.html
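To illustrate the idea only, here is a minimal sketch in plain C++ of a q-gram table over a set of reads. It does not use the SeqAn API, and it is not how the OpenAddressing q-gram index is implemented; the names QGramTable, buildQGramTable and occurrences are hypothetical and exist only for this example. It simply records, for every q-gram that occurs, the (read, offset) pairs where it starts.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One occurrence of a q-gram: (index of the read, offset within that read).
using Occurrence = std::pair<std::size_t, std::size_t>;

// Illustrative stand-in for a q-gram index (hypothetical type, not SeqAn):
// maps each q-gram that actually occurs to the list of its occurrences.
using QGramTable = std::unordered_map<std::string, std::vector<Occurrence>>;

// Build the table over all reads for a fixed q-gram length q.
// Changing q means building a new table from scratch.
QGramTable buildQGramTable(const std::vector<std::string>& reads, std::size_t q)
{
    QGramTable table;
    if (q == 0)
        return table;

    for (std::size_t r = 0; r < reads.size(); ++r)
    {
        const std::string& read = reads[r];
        // Reads shorter than q contain no q-gram; the loop body is skipped.
        for (std::size_t pos = 0; pos + q <= read.size(); ++pos)
            table[read.substr(pos, q)].emplace_back(r, pos);
    }
    return table;
}

// Look up all occurrences of one k-mer; returns an empty list if it is absent.
const std::vector<Occurrence>& occurrences(const QGramTable& table,
                                           const std::string& kmer)
{
    static const std::vector<Occurrence> none;
    auto it = table.find(kmer);
    return it == table.end() ? none : it->second;
}
```

With such a table, the query step is a single lookup per k-mer: build the table once with buildQGramTable(reads, q), then call occurrences(table, kmer) for each k-mer of length q. The tutorial linked above shows the corresponding SeqAn index type.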
IHTH,
René
---
René Rahn
Ph.D. Student (de.NBI - CIBI)
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------
- References:
- [Seqan-dev] Saving an index for StringSet to disk: lots of files
- From: Lucas van Dijk <info@lucasvandijk.nl>