FU Logo
  • Startseite
  • Kontakt
  • Impressum
  • Home
  • Listenauswahl
  • Anleitungen

Re: [Seqan-dev] Trouble Hashing Kmers into Qgram Index

<-- thread -->
<-- date -->
  • From: Brett Bowman <bnbowman@gmail.com>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Thu, 20 Nov 2014 17:05:03 -0800
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Trouble Hashing Kmers into Qgram Index

Thanks Rene, those solutions worked for me.

I have two follow-up Qgram Index questions:

1) Is there any way to iterate over all of the Kmers stored in a Qgram index?  You have sample code for the other index types here (http://seqan.readthedocs.org/en/master/Tutorial/IndexIterators.html), but it doesn't seem to work for me for the QgramIndx:
"""
 typedef Iterator<TIndex, TopDown<ParentLinks<>>> TIter;

 TIter::Type it(refSetIndex);
 do {
     // Print theletters from the root to the current node
     std::cout << representative(it) << std::endl;

     if (!goDown(it) && !goRight(it))
         while (goUp(it) && !goRight(it));
 } while (!isRoot(it));
'"""

2) Is there any way to mask or remove certain Kmers from an Index once it's been created?  I.e., to hide or delete common, low-complexity Kmers like homopolymers

-Brett




On Wed, Nov 19, 2014 at 4:25 AM, Rahn, René <rene.maerker@fu-berlin.de> wrote:
Hey,

the problem lies in the hash(indexShape(index), kmer) function.
The hash function only takes an iterator or pointer as second argument. 
So in your case the simple solution would be to call the function hash(indexShape(index), it), while using an iterator to iterate over the text instead of using the unsigned integer. This little behaviour does not seem obvious, as in the tutorial you pass a string literal. But in fact a string literal is just a const char *, so it works.

Here you see an example code to work with a) using the finder interface and b) directly operating on the index.
IHTH.

cheers,

René 

    typedef Index<DnaString, IndexQGram<UngappedShape<12> > > TQGramIndex;
    typedef Finder<TQGramIndex> TFinder;
    typedef Iterator<DnaString, Standard>::Type TIterator;
    typedef Fibre<TQGramIndex, QGramShape>::Type TShape;

    TQGramIndex index(ref);
    indexRequire(index, QGramSADir());  // On-demand index creation.

    // a) Using the finder interface.
    TFinder finder(index);

    for (TIterator it = begin(query, Standard()); it != end(query, Standard()) - 12; ++it)
    {
        std::cout << "Occ at: ";
        while(find(finder, infix(query, it, it+12)))
            std::cout << position(finder) << " ";
        std::cout << std::endl;
        clear(finder);  // Clear finder for next search.
    }

    // b) Using the index interface.
    TShape & shape = indexShape(index);
    hashInit(shape, begin(query, Standard()));
    for (TIterator it = begin(query, Standard()); it != end(query, Standard()) - 12; ++it)
    {
        std::cout << "Occ at: ";
        hashNext(shape, it);
        for (unsigned i = 0; i < length(getOccurrences(index, shape)); ++i)
            std::cout << getOccurrences(index, shape)[i] << " ";
        std::cout << std::endl;

    }

On 19 Nov 2014, at 07:34, Brett Bowman <bnbowman@gmail.com> wrote:

Sorry to be a bother, but I decided to go back to the docs and start fresh and I'm still having issues.  

We're using Qgrams, so I trying to write a simple loop to search for all 12mers from a query sequence "seq" in the index formed from a StringSet of references "refSeq", building on the example code in the Q-Gram Index tutorial:
"""
  TFinder qgramFinder(index);
 for (size_t i = 0; i < length(seq)-12; i++)
 {
     TInfix::Type kmer = infix(seq, i, i+12);
     std::cout << kmer << std::endl;

     seqan::hash(indexShape(index), kmer);
     for (unsigned i = 0; i < length(getOccurrences(index, indexShape(index))); ++i)
     {
         std::cout << getOccurrences(index, indexShape(index))[i] << std::endl;
     }
     std::cout << std::endl;
 }
"""

but I'm getting a fairly obtuse error message I don't quite understand, and doesn't point me back to any specific lines in my code.

"""
/usr/include/seqan/index/shape_base.h:535:55: error: indirection requires pointer operand ('seqan::Segment<const seqan::String<seqan::SimpleType<unsigned
      char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::InfixSegment>' invalid)
                        hash * ValueSize<TValue>::VALUE + ordValue((TValue)*it),
                                                                           ^~~
/usr/include/seqan/index/shape_base.h:548:22: note: in instantiation of function template specialization 'seqan::_hashFixedShape<unsigned long,
      seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Segment<const seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >,
      seqan::InfixSegment>, 12>' requested here
                return me.hValue = _hashFixedShape(me.hValue, it, TValue(), UngappedShape<q>());
                                   ^
/home/bbowman/git/SRSLI/src/C++/SparseAlignment.hpp:102:16: note: in instantiation of function template specialization 'seqan::hash<seqan::SimpleType<unsigned
      char, seqan::Dna5_>, 12, seqan::Segment<const seqan::String<seqan::SimpleType<unsigned char, seqan::Dna5_>, seqan::Alloc<void> >, seqan::InfixSegment>
      >' requested here
        seqan::hash(indexShape(index), kmer);
               ^
/home/bbowman/git/SRSLI/src/C++/main.cpp:86:9: note: in instantiation of function template specialization 'FindSeeds2<FindSeedsConfig<12,
      seqan::UngappedShape<12>, seqan::IndexQGram<seqan::UngappedShape<12>, seqan::Tag<seqan::Default_> > > >' requested here
        FindSeeds2(querySeedHits, refSetIndex, idxAndRecord.second.Seq);
"""

Suggestions?

Sincerely,
Brett
_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

---

René Rahn
Ph.D. Student
--------------------------------
Tel:  (+49) 30 838 75277
Mail: rene.rahn@fu-berlin.de
--------------------------------
Institute of Computer Science
Algorithmic Bioinformatics (ABI)
--------------------------------
Freie Universität Berlin
Takustraße 9
14195 Berlin
--------------------------------


_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev


<-- thread -->
<-- date -->
  • Follow-Ups:
    • Re: [Seqan-dev] Trouble Hashing Kmers into Qgram Index
      • From: "Siragusa, Enrico" <Enrico.Siragusa@fu-berlin.de>
  • References:
    • [Seqan-dev] Trouble Hashing Kmers into Qgram Index
      • From: Brett Bowman <bnbowman@gmail.com>
    • Re: [Seqan-dev] Trouble Hashing Kmers into Qgram Index
      • From: Rahn, René <rene.maerker@fu-berlin.de>
  • seqan-dev - November 2014 - Archives indexes sorted by:
    [ thread ] [ subject ] [ author ] [ date ]
  • Complete archive of the seqan-dev mailing list
  • More info on this list...

Hilfe

  • FAQ
  • Dienstbeschreibung
  • ZEDAT Beratung
  • postmaster@lists.fu-berlin.de

Service-Navigation

  • Startseite
  • Listenauswahl

Einrichtung Mailingliste

  • ZEDAT-Portal
  • Mailinglisten Portal