Re: [Seqan-dev] Random access of large FASTA file
- From: "Weese, David" <weese@campus.fu-berlin.de>
- To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Date: Thu, 21 Jul 2011 15:02:14 +0200
- Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
- Subject: Re: [Seqan-dev] Random access of large FASTA file
On 21.07.2011 at 10:58, Johannes Dröge wrote:
Hi,
Yes, you could do it this way. This requires either:

1) a persistent StringSet which you construct once and reopen every time
2) a StringSet which uses a temporary file to store the concatenated sequences (e.g. StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > >), which you generate from the FASTA file every time you start your application and which is deleted automatically in the StringSet destructor

2) is the easiest way, but the conversion on every start is certainly more time-consuming.

1) requires making both members of the ConcatDirect StringSet (not only the concatenated sequence) persistent strings. The second member is limits and stores the sequence breakpoints in concat. By default it is an Alloc String. In your application you can specialize:

    template <>
    struct StringSetLimits<TYourStringSet>
    {
        typedef Size<TYourStringSet>::Type TSize_;
        typedef String<TSize_, MMap<> > Type;
    };

to use an MMap<> String instead. Before doing anything with your StringSet, simply call:

    TYourStringSet stringSet;
    open(stringSet.concat, "yourfile.concat");  // assigns a file to the mmap string
    open(stringSet.limits, "yourfile.limits");  // if not called, a temporary file is created = non-persistent

    // append your sequences (when run for the first time)
    // or
    // use the sequences (later)
    //
    // save() is not required as the string is always in sync with the file on disk

Cheers,
David
Freie Universität Berlin, Institut für Informatik, Algorithmic Bioinformatics
Takustraße 9, 14195 Berlin, Room 021
Phone: +49 30 838 75246
http://www.inf.fu-berlin.de/
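
To make the role of the two members mentioned above (concat and limits) more concrete, here is a minimal sketch in plain standard C++. It is not SeqAn code: the type ConcatSet and its member functions are hypothetical names chosen for illustration, and it assumes that the breakpoints are stored as n+1 offsets for n sequences, starting with 0. It only shows why random access needs both members: sequence i is the slice of concat between two neighbouring breakpoints.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration type; NOT part of SeqAn.
// Mirrors the idea of a concatenated string plus a breakpoint array.
struct ConcatSet {
    std::string concat;               // all sequences stored back to back
    std::vector<std::size_t> limits;  // assumed layout: limits[i]..limits[i+1] delimit sequence i

    ConcatSet() : limits(1, 0) {}     // a single 0 marks the start of the first sequence

    // Append one sequence and record where it ends.
    void append(const std::string& seq) {
        concat += seq;
        limits.push_back(concat.size());
    }

    // Number of stored sequences.
    std::size_t size() const { return limits.size() - 1; }

    // Random access: copy out sequence i using two neighbouring breakpoints.
    std::string get(std::size_t i) const {
        assert(i < size());
        return concat.substr(limits[i], limits[i + 1] - limits[i]);
    }
};
```

If only concat were persistent, the breakpoints would have to be recomputed from the FASTA file on every start, which is why both strings need to be backed by a file in variant 1).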