
Re: [Seqan-dev] IndexEsa and longest prefix search (single step of greedy LZW decomposition)

  • From: Sebastian Wandelt <wandelt@informatik.hu-berlin.de>
  • To: seqan-dev@lists.fu-berlin.de
  • Date: Sat, 15 Jun 2013 07:43:25 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] IndexEsa and longest prefix search (single step of greedy LZW decomposition)

Hi David,

thank you for your help. The keywords you mentioned helped me to solve my small puzzle :-)

Here is a short code snippet (in case anybody else encounters a similar problem):

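    #include <iostream>
    #include <string>
    #include <seqan/index.h>

    int main()
    {
        // Top-down iterator with parent links over an enhanced suffix array index.
        typedef seqan::TopDown<seqan::ParentLinks<> > TIterSpec;
        typedef seqan::Index<seqan::String<char>, seqan::IndexEsa<> > TIndex;
        typedef seqan::Iter<TIndex, seqan::VSTree<TIterSpec> > TIter;

        std::string rawstring = "TESTERTES";
        TIndex index_esa(rawstring);

        TIter myit(index_esa);  // starts at the root node

        // Walk down along the query string as far as it matches the indexed text.
        seqan::goDown(myit, "TESA");

        // Copy the occurrences of the reached node and sort them by text position.
        seqan::String<seqan::SAValue<TIndex>::Type> occs = seqan::getOccurrences(myit);
        seqan::orderOccurrences(occs);

        for (unsigned i = 0; i < seqan::length(occs); i++)
        {
            std::cout << occs[i] << "\n";
        }
        return 0;
    }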

Best regards
Sebastian


On 06/13/2013 02:24 PM, Weese, David wrote:
Hi Sebastian,

Sounds like you want to do a simple prefix search of a pattern. That can be done with a top-down iterator, say iter, that starts in the root node and that you move down along your non-indexed string with goDown(iter, "CCCGGGAAAT"). After that it stops in the node that is still a substring of the text, i.e. the longest matching prefix.
To get its occurrences or its length, use getOccurrences(iter) or repLength(iter).
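
To illustrate what such a top-down descent does, here is a minimal sketch that uses only the C++ standard library, not SeqAn. It emulates the walk with a plain suffix array whose interval is narrowed one pattern character at a time, whereas a suffix tree iterator moves edge by edge. The names PrefixMatch, buildSuffixArray and longestPrefixOccurrences are made up for this example, and the sort-based construction is only suitable for short texts.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical result type: matched prefix length plus sorted text positions.
struct PrefixMatch
{
    std::size_t length;
    std::vector<std::size_t> occurrences;
};

// Builds a suffix array by plain comparison sorting (fine for short texts only).
std::vector<std::size_t> buildSuffixArray(const std::string& text)
{
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t(0));
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Narrows the suffix array interval character by character and stops as soon
// as the next pattern character would make the interval empty.
PrefixMatch longestPrefixOccurrences(const std::string& text,
                                     const std::vector<std::size_t>& sa,
                                     const std::string& pattern)
{
    // Compare as unsigned char, matching the order std::string::compare uses.
    auto key = [&text](std::size_t pos) { return static_cast<unsigned char>(text[pos]); };

    auto lo = sa.begin();
    auto hi = sa.end();
    std::size_t depth = 0;
    for (; depth < pattern.size(); ++depth)
    {
        const unsigned char c = static_cast<unsigned char>(pattern[depth]);
        // All suffixes in [lo, hi) share pattern[0, depth). Suffixes that end
        // at this depth sort first; the rest is ordered by the next character.
        auto first = std::partition_point(lo, hi, [&](std::size_t s) {
            return s + depth >= text.size() || key(s + depth) < c;
        });
        auto last = std::partition_point(first, hi, [&](std::size_t s) {
            return key(s + depth) == c;
        });
        if (first == last)
            break;  // no suffix continues with c: keep the previous interval
        lo = first;
        hi = last;
    }

    PrefixMatch result{depth, {}};
    if (depth > 0)  // report nothing if not even the first character matched
    {
        result.occurrences.assign(lo, hi);
        std::sort(result.occurrences.begin(), result.occurrences.end());
    }
    return result;
}
```

For the text TESTERTES and the pattern TESA this sketch reports length 3 with positions 0 and 6; for AAAACCCCGGGGTTTT and CCCGGGAAAT it reports length 6 at position 5.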

Cheers,
David

--
David Weese                     weese@inf.fu-berlin.de
Freie Universität Berlin        http://www.inf.fu-berlin.de/
Institut für Informatik         Phone: +49 30 838 75137
Takustraße 9                    Algorithmic Bioinformatics
14195 Berlin                    Room 020

On 13.06.2013 at 10:00, Sebastian Wandelt <wandelt@informatik.hu-berlin.de> wrote:

Hi,

I would like to use IndexEsa to find the longest infix of the indexed text that is also a prefix of another (non-indexed) string.
This step is part of an algorithm to compute a greedy LZW-decomposition of a string with respect to a reference string.
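
For context, the surrounding greedy loop could look roughly like the following sketch. It is plain C++ without SeqAn and uses std::string::find for the longest-prefix step, so it is slow and only meant to show the control flow. The Factor struct, the function name greedyFactorize and the handling of characters that do not occur in the reference (emitted as single literals) are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical factor type: either a copy of reference[refPos, refPos + length)
// or, if length == 0, a single literal character that the reference lacks.
struct Factor
{
    std::size_t refPos;
    std::size_t length;
    char literal;
};

// Greedily splits `input` into the longest possible pieces that occur as
// infixes of `reference`.
std::vector<Factor> greedyFactorize(const std::string& reference,
                                    const std::string& input)
{
    std::vector<Factor> factors;
    std::size_t i = 0;
    while (i < input.size())
    {
        std::size_t pos = 0;
        std::size_t len = 0;
        // Extend the match by one character while it still occurs in the reference.
        while (i + len < input.size())
        {
            const std::size_t found = reference.find(input.substr(i, len + 1));
            if (found == std::string::npos)
                break;
            pos = found;  // leftmost occurrence of the current, longer match
            ++len;
        }
        if (len == 0)
        {
            factors.push_back({0, 0, input[i]});  // character not in reference
            ++i;
        }
        else
        {
            factors.push_back({pos, len, '\0'});
            i += len;
        }
    }
    return factors;
}
```

Replacing the inner loop with an index-based longest-prefix search changes the running time but not the resulting factor lengths.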

Example:

Given an IndexEsa over AAAACCCCGGGGTTTT, I would like to have the longest prefix of CCCGGGAAAT that occurs as an infix in AAAACCCCGGGGTTTT.
The result should be position 5 and length 6, since the prefix CCCGGG occurs at position 5 (0-based string access) and has length 6.
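
The expected result can be checked with a brute-force reference implementation. This is a minimal sketch in plain C++ without any index; the function name longestPrefixMatch and the choice to return the leftmost position on ties are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Returns {position, length} of the longest prefix of `pattern` that occurs
// as an infix of `text`. Ties are resolved in favour of the leftmost position.
// A length of 0 means that not even the first character of `pattern` occurs.
std::pair<std::size_t, std::size_t>
longestPrefixMatch(const std::string& text, const std::string& pattern)
{
    std::size_t bestPos = 0;
    std::size_t bestLen = 0;
    for (std::size_t start = 0; start < text.size(); ++start)
    {
        // Count how many pattern characters match the text from `start` on.
        std::size_t len = 0;
        while (start + len < text.size() && len < pattern.size()
               && text[start + len] == pattern[len])
            ++len;
        if (len > bestLen)
        {
            bestPos = start;
            bestLen = len;
        }
    }
    return {bestPos, bestLen};
}
```

Calling longestPrefixMatch("AAAACCCCGGGGTTTT", "CCCGGGAAAT") yields position 5 and length 6. The quadratic running time is what an index-based search is meant to avoid.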

Is there any ready-made code for solving this problem in SeqAn? I think this should be a standard operation for suffix trees.
I haven't found any code that can be used directly, although I think that top-down iterators might be the way to go.

Thanks,
Sebastian

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev


