On 08.07.2011 at 16:05, Johannes Dröge wrote:

> Sorry, I still don't get it.
> How can [ MultiFastaFile ==> Dna5String ==> StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > > ] work, if it copies the value of the sequence?
> Doesn't assignSeq() copy the value into the Dna5String seq?

assignSeq *extracts* the sequence information from a block that may contain a header, a sequence interspersed with newlines, quality values, etc. If you take sequence substrings directly from an unprocessed Fasta file, they may contain whitespace.

>
> What happens when I use appendValue to add seq to the StringSet, where does it actually reside (it should still be in the MultiFasta file).

Because assignSeq(seq, ...) extracts the sequence character by character, there is no association between seq and the Fasta file.

>
> I need to access the MultiFastaFile (on the hard disk) as a regular StringSet to read its contents on demand, not copy its sequences into a new memory-mapped file.

Then you need to keep the split MultiSeqFile and extract the sequences on demand with assignSeq. If you access the sequences very often, I would recommend filling a StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > > (see my last mail), which also resides on your hard disk but can be accessed without assignSeq.

>
> Johannes
>
>
> On Thursday, 7 July 2011 at 20:28:00, Weese, David wrote:
>> Hi,
>>
>> follow the howto on http://trac.mi.fu-berlin.de/seqan/wiki/HowTo/EfficientImportOfMillionsOfSequences and simply change:
>>
>> StringSet<String<Dna5Q> > seqs;
>>
>> into:
>>
>> StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > > seqs;
>>
>> That should do what you want.
>>
>> Regards,
>> David
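
To illustrate the point about extraction made in the reply above: the following is a minimal sketch in plain standard C++, not SeqAn code. The helper name extractSequence is hypothetical and only mimics the idea behind assignSeq under simplifying assumptions (a single Fasta record, an optional '>' header line, no quality values). It shows why the raw record cannot be used as the sequence itself and why the result is an independent copy.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of SeqAn: copies the residues of one raw
// Fasta record into a new string, skipping the header line and whitespace.
std::string extractSequence(const std::string& record) {
    std::string seq;
    std::size_t pos = 0;
    // Skip the header line if the record starts with '>'.
    if (!record.empty() && record[0] == '>') {
        pos = record.find('\n');
        if (pos == std::string::npos) return seq;  // header only, no sequence
        ++pos;
    }
    // Copy character by character, dropping newlines and other whitespace.
    for (; pos < record.size(); ++pos) {
        unsigned char c = static_cast<unsigned char>(record[pos]);
        if (!std::isspace(c)) seq.push_back(static_cast<char>(c));
    }
    return seq;
}

// Usage example: the raw record contains a header and line breaks,
// the extracted copy contains only the residues.
inline void usageExample() {
    const std::string record = ">seq1 demo\nACGT\nNNAC\n";
    assert(extractSequence(record) == "ACGTNNAC");
}
```

Since the result is built character by character into its own storage, changing it afterwards leaves the record untouched, which is the same reason seq has no remaining link to the Fasta file after assignSeq.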