Re: [Seqan-dev] Disk-based index

From: "Singer, Jochen" <Jochen.Singer@fu-berlin.de>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Date: Fri, 13 Sep 2013 12:44:04 +0200
Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Subject: Re: [Seqan-dev] Disk-based index

Hi John,

We use the indices in several applications that map reads to human reference genomes, so all indices should work on large data sets. Could you provide some more information on the problem you are running into?

Concerning the large number of files that are created: I assume you are using an index built over a StringSet. The save function stores each String in the StringSet in a separate file. However, if you declare the StringSet as a ConcatDirect StringSet (StringSet<TString, Owner<ConcatDirect<> > >), then all strings are concatenated internally and only two files are stored (one with the sequences and one with the sequence length information).
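To illustrate why a concatenated layout needs only two files, here is a minimal sketch in plain standard C++. It is not SeqAn code and does not reproduce SeqAn's file format; the names ConcatSet, appendSequence, sequenceAt and saveConcatSet, as well as the ".concat" and ".limits" file suffixes, are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a concatenated string set: all sequences live in
// one buffer, and limits[i]..limits[i + 1] delimits sequence i.
struct ConcatSet
{
    std::string concat;                      // all sequences back to back
    std::vector<std::size_t> limits = {0};   // cumulative end positions
};

void appendSequence(ConcatSet & set, std::string const & seq)
{
    set.concat += seq;
    set.limits.push_back(set.concat.size());
}

std::size_t sequenceCount(ConcatSet const & set)
{
    return set.limits.size() - 1;
}

std::string sequenceAt(ConcatSet const & set, std::size_t i)
{
    assert(i + 1 < set.limits.size());
    return set.concat.substr(set.limits[i], set.limits[i + 1] - set.limits[i]);
}

// Writes exactly two files, no matter how many sequences are stored:
// one with the concatenated text and one with the length information.
bool saveConcatSet(ConcatSet const & set, std::string const & prefix)
{
    std::ofstream text(prefix + ".concat", std::ios::binary);
    std::ofstream lims(prefix + ".limits", std::ios::binary);
    if (!text || !lims)
        return false;
    text.write(set.concat.data(),
               static_cast<std::streamsize>(set.concat.size()));
    lims.write(reinterpret_cast<char const *>(set.limits.data()),
               static_cast<std::streamsize>(set.limits.size() * sizeof(std::size_t)));
    text.flush();
    lims.flush();
    return text.good() && lims.good();
}
```

With one file per sequence, the number of files grows with the number of sequences; with this layout it stays at two, and each sequence can still be recovered from the limits.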

At the moment no compression of the index files is available; you would have to do it manually, but it's a thought we should keep in mind.

I hope that helps!

Kind regards,

Jochen

On 13.09.2013, at 12:04, John Reid wrote:

Hi Enrico,

On 28/08/13 10:46, Siragusa, Enrico wrote:

Hi John,

On Aug 28, 2013, at 10:54 AM, John Reid <j.reid@mail.cryst.bbk.ac.uk> wrote:

Hi all,

I would like to index the mouse or human genome with an ESA. I need to do this more than once though and would like to store the ESA on disk as it takes some hours to construct. Is this feasible? Is there any way to do this in SeqAn already?

Sure. To save an index after constructing it, you can call save(index, "/path/to/index"). To load it, call open(index, "/path/to/index"). The path must be given as a C style string, so if you're using a SeqAn String, please use toCString() to convert it.
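The build-once, load-later workflow described above can be sketched without SeqAn as follows. This is a plain standard C++ illustration under stated assumptions, not SeqAn's save()/open() implementation: the naive suffix array construction stands in for the real ESA construction, and the names SuffixArray, buildSuffixArray, saveSuffixArray, loadSuffixArray and loadOrBuild are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <numeric>
#include <string>
#include <vector>

using SuffixArray = std::vector<std::uint64_t>;

// Naive construction, for illustration only; far too slow for a genome.
SuffixArray buildSuffixArray(std::string const & text)
{
    SuffixArray sa(text.size());
    std::iota(sa.begin(), sa.end(), std::uint64_t{0});
    std::sort(sa.begin(), sa.end(), [&text](std::uint64_t a, std::uint64_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Dumps the array as raw 64-bit integers (hypothetical format, not portable
// across machines with different endianness).
bool saveSuffixArray(SuffixArray const & sa, std::string const & path)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    out.write(reinterpret_cast<char const *>(sa.data()),
              static_cast<std::streamsize>(sa.size() * sizeof(std::uint64_t)));
    out.flush();
    return out.good();
}

bool loadSuffixArray(SuffixArray & sa, std::string const & path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return false;
    std::streamsize const bytes = static_cast<std::streamsize>(in.tellg());
    std::streamsize const width = static_cast<std::streamsize>(sizeof(std::uint64_t));
    if (bytes < 0 || bytes % width != 0)
        return false;
    sa.resize(static_cast<std::size_t>(bytes / width));
    in.seekg(0);
    in.read(reinterpret_cast<char *>(sa.data()), bytes);
    return in.gcount() == bytes;
}

// Reuses the file on disk when it matches the text length; otherwise builds
// the array and stores it for the next run.
SuffixArray loadOrBuild(std::string const & text, std::string const & path)
{
    SuffixArray sa;
    if (loadSuffixArray(sa, path) && sa.size() == text.size())
        return sa;
    sa = buildSuffixArray(text);
    assert(sa.size() == text.size());
    saveSuffixArray(sa, path);   // best effort; a failed save only costs time later
    return sa;
}
```

The first call to loadOrBuild pays the construction cost and writes the file; later calls with the same path only read it back. The length check is a weak sanity test and would not detect a file built from a different text of the same length.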

Do you have any experience using this functionality with genome sized indexes (3Gb or so)? Would you expect it to work? I seem to be running into some issues I need to debug. I was just wondering if anyone else had used it in this way. Also the save function seems to create many files in the same directory. I imagine this could be a problem for some filesystems. Might you consider changing this? Also as mentioned before the ability to save in a compressed format would be very attractive to me as well.

Thanks for all the great work in SeqAn,
John.


Jochen Singer
Institute of Computer Science

Algorithmic Bioinformatics Working Group

Freie Universität Berlin
Takustr. 9, 14195 Berlin
Phone +49 30 838 75228, Room K25
