Re: [Seqan-dev] Saving an index for StringSet to disk: lots of files

  • From: Hannes Hauswedell <hannes.hauswedell@fu-berlin.de>
  • To: Lucas van Dijk <info@lucasvandijk.nl>
  • Date: Fri, 23 Mar 2018 11:58:54 +0100
  • Cc: seqan-dev@lists.fu-berlin.de
  • Organization: MPI MolGen / FU-Berlin
  • Subject: Re: [Seqan-dev] Saving an index for StringSet to disk: lots of files

On Friday, 16 March 2018, 20:40:00, Lucas van Dijk wrote:
> Hi all,
> 
> I'm trying to build a simple (short) read filter: given a list of k-mers,
> keep only reads that contain at least one of the given k-mers. This will
> be used to analyse the behaviour of a tool we're working on. It doesn't
> need to be super memory efficient, it'll mostly be a debugging tool.
> 
> My strategy was to use SeqAn to build a suffix array index for the
> StringSet of reads, and then quickly enumerate which reads contain a given
> k-mer. I got a simple prototype working that reads the whole FASTQ file into
> memory as a StringSet, and then builds an Index<StringSet<Dna5String>,
> IndexSa<>>.


You should use the StringSet<Dna5String, Owner<ConcatDirect<>>>
specialisation here. It results in only two files being written for the
StringSet: the concatenation of all strings plus a vector of delimiters.
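
To illustrate why that layout needs only two buffers, here is a minimal
sketch in plain C++ of a concatenated string set. It is not SeqAn's
implementation: the type ConcatSet and its members (concat, limits, append,
at, seqNo) are hypothetical names made up for this example. The idea is
simply that all reads live in one contiguous string and a second vector
records where each read starts, so a hit position in the concatenation can be
mapped back to its read with a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration type; NOT SeqAn's actual ConcatDirect code.
struct ConcatSet {
    // All sequences stored back to back in one buffer.
    std::string concat;
    // limits[i] is the start offset of sequence i;
    // the last entry is the total length of the concatenation.
    std::vector<std::size_t> limits{0};

    // Append one sequence and record its end offset.
    void append(const std::string& seq) {
        concat += seq;
        limits.push_back(concat.size());
    }

    // Number of sequences stored.
    std::size_t size() const { return limits.size() - 1; }

    // Return a copy of sequence i.
    std::string at(std::size_t i) const {
        return concat.substr(limits[i], limits[i + 1] - limits[i]);
    }

    // Map a position in the concatenation (pos < concat.size())
    // to the index of the sequence that contains it.
    std::size_t seqNo(std::size_t pos) const {
        auto it = std::upper_bound(limits.begin(), limits.end(), pos);
        return static_cast<std::size_t>(it - limits.begin()) - 1;
    }
};
```

Writing such a structure to disk only requires dumping the two buffers, which
is the reason the file count stays constant no matter how many reads the set
holds.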

Hope that helps,
Hannes
-- 
Hannes Hauswedell

Scientific staff & PhD candidate
Freie Universität Berlin / Max Planck Institute for Molecular Genetics

address     Institut für Informatik
            Takustraße 9
            Room 019
            14195 Berlin
telephone   +49 (0)30 838-75241
fax         +49 (0)30 838-75218
e-mail      hannes.hauswedell@fu-berlin.de


