Re: [Seqan-dev] Random access of large FASTA file

  • From: "Weese, David" <weese@campus.fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 8 Jul 2011 19:35:20 +0200
  • Acceptlanguage: de-DE
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Random access of large FASTA file


On 08.07.2011 at 16:05, Johannes Dröge wrote:

> Sorry, I still don't get it.
> How can [ MultiFastaFile  ==> Dna5String ==>  StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > > ] work, if it copies the value of the sequence?
> Doesn't assignSeq() copy the value into the Dna5String seq?

assignSeq *extracts* the sequence information from a block that may contain a header, a sequence interspersed by newlines, quality values, etc.
If you take sequence substrings directly from an unprocessed Fasta file instead, they may contain whitespace such as the line breaks inside a wrapped sequence.
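
To illustrate the point, here is a minimal sketch in plain standard C++. It is not SeqAn code and not how assignSeq is implemented; extractSequence is a hypothetical helper that only mimics the idea of copying the residues out of one raw record while skipping the header and the line breaks. It assumes the record is a single FASTA entry with an optional '>' header line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SeqAn): copies only the residues out of
// one raw FASTA record, i.e. a header line followed by wrapped sequence lines.
std::string extractSequence(const std::string& record)
{
    std::string seq;
    std::size_t pos = 0;

    // Skip the header line if the record starts with '>'.
    if (!record.empty() && record[0] == '>')
    {
        pos = record.find('\n');
        if (pos == std::string::npos)
            return seq;              // header only, no sequence lines
        ++pos;                       // first character after the header
    }

    // Copy character by character, dropping line breaks and other whitespace.
    for (; pos < record.size(); ++pos)
    {
        const char c = record[pos];
        if (c != '\n' && c != '\r' && c != ' ' && c != '\t')
            seq += c;
    }
    return seq;
}
```

For a record such as ">r1\nACGT\nNNAC\nGT\n", a raw substring taken from the sequence part still contains the '\n' characters, whereas the extracted copy is one contiguous sequence with no link back to the file.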

> 
> What happens when I use appendValue to add seq to the StringSet? Where does it actually reside (it should still be in the MultiFasta file)?

As assignSeq(seq, ...) extracts the sequence character by character, there is no association between seq and the Fasta file. seq is an independent copy, and appendValue copies it once more into the StringSet.

> 
> I need to access the MultiFastaFile (on the hard disk) as a regular StringSet to read its contents on demand, not copy its sequences into a new memory-mapped file.

Then you need to keep the split MultiSeqFile and extract the sequences on demand with assignSeq.
If you access the sequences very often, I would recommend filling a StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > > (see my last mail), which also resides on your hard disk but can be accessed without assignSeq.
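
The first option (keep the split file and extract on demand) can be sketched in plain standard C++ as follows. This is only an illustration of the access pattern, not SeqAn code: splitRecords and sequenceAt are hypothetical stand-ins for splitting the file into records and for assignSeq, and the std::string holding the file contents stands in for what would be a memory-mapped file in practice. Only the small table of record offsets is kept in memory; a sequence is copied out when it is requested.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Half-open [begin, end) offsets of one record inside the raw file contents.
typedef std::pair<std::size_t, std::size_t> RecordRange;

// Hypothetical stand-in for splitting a multi-FASTA file into records.
std::vector<RecordRange> splitRecords(const std::string& data)
{
    std::vector<RecordRange> records;
    std::size_t begin = std::string::npos;
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        // A record starts at a '>' in the first column of a line.
        if (data[i] == '>' && (i == 0 || data[i - 1] == '\n'))
        {
            if (begin != std::string::npos)
                records.push_back(RecordRange(begin, i));
            begin = i;
        }
    }
    if (begin != std::string::npos)
        records.push_back(RecordRange(begin, data.size()));
    return records;
}

// Hypothetical stand-in for on-demand extraction: copies the residues of
// one record, skipping its header line and all line breaks.
std::string sequenceAt(const std::string& data, const RecordRange& rec)
{
    std::string seq;
    std::size_t pos = data.find('\n', rec.first);   // end of the header line
    if (pos == std::string::npos || pos >= rec.second)
        return seq;                                 // record has no sequence
    for (++pos; pos < rec.second; ++pos)
        if (data[pos] != '\n' && data[pos] != '\r')
            seq += data[pos];
    return seq;
}
```

The trade-off is the one described above: every access pays for the extraction again, which is why a pre-filled, disk-backed StringSet is preferable when the same sequences are read many times.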

> 
> Johannes
> 
> 
> On Thursday, 7 July 2011 20:28:00, Weese, David wrote:
>> Hi,
>> 
>> follow the howto on http://trac.mi.fu-berlin.de/seqan/wiki/HowTo/EfficientImportOfMillionsOfSequences and simply change:
>> 
>> StringSet<String<Dna5Q> > seqs;
>> 
>> into:
>> 
>> StringSet<String<Dna5Q, MMap<> >, Owner<ConcatDirect<> > > seqs;
>> 
>> That should do what you want.
>> 
>> Regards,
>> David
>> 
>> 




