FU Logo
  • Startseite
  • Kontakt
  • Impressum
  • Home
  • Listenauswahl
  • Anleitungen

[Seqan-dev] Finder and Pattern

<-- thread -->
<-- date -->
  • From: "Henderson, Stephen" <s.henderson@ucl.ac.uk>
  • To: "seqan-dev@lists.fu-berlin.de" <seqan-dev@lists.fu-berlin.de>
  • Date: Thu, 16 May 2013 21:34:12 +0000
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: [Seqan-dev] Finder and Pattern

Hi David

Thanks for offering to look at the rest (again). I'm just trying to get the hang of C++. A few points...

1) This maybe look odd as I am calling from and returning to R using RCpp libraries. So NumericMatrix is an R/C++ type. I "think" this bit is working as intended?? 

2) My best guess is the problem is coercing a std::string kmers[i] to a Dna5String Pattern??
 
i.e.
Pattern<Dna5String, Horspool> pattern(kmers[i]);

...or maybe not??? It seems to have "worked"

3) The data I am using is ~the first 400K head of SRR031709.fastq I know it doesn't have this odd pattern (i.e. it's not some artificial constructed test file) as I checked it with FastQC... and all the other stuff I tested with this file works OK.


#include <seqan/sequence.h> 
#include <iostream>
#include <seqan/file.h>
#include <seqan/seq_io.h>
#include <seqan/find.h>
#include <Rcpp.h>

using namespace Rcpp;
using namespace seqan;

// [[Rcpp::export]]

NumericMatrix kmerPlot3Cpp(std::string argv, std::vector< std::string > kmers)
{    
    std::fstream in(argv.c_str(), std::ios::binary | std::ios::in);
    seqan::RecordReader<std::fstream, seqan::SinglePass<> > reader(in);    

    // Read file record-wise.
    seqan::CharString id;
    seqan::Dna5String seq;
    seqan::CharString qual;
    unsigned numreads = 0;
    
    // read single record to get the cycle length
    if (readRecord(id, seq, qual, reader, seqan::Fastq()) != 0)
            return 1;  // Could not record from file.
    unsigned numcycles= length(seq);
    
    NumericMatrix kmerMatrix(kmers.size(), numcycles-4); // presume 5 mers    
    
    
    while (!atEnd(reader))
    {
        if (readRecord(id, seq, qual, reader, seqan::Fastq()) != 0)
            return 1;  // Could not record from file.

        Finder<Dna5String> finder(seq);      
        for(unsigned i=0;  i< kmers.size(); ++i)
        {
          Pattern<Dna5String, Horspool> pattern(kmers[i]);
          while (find(finder, pattern))
          {
            kmerMatrix(i,beginPosition(finder)) += 1;
          }
        }
    
    }  
  
    return kmerMatrix;
}



...hmmm

Stephen

Message: 1
Date: Thu, 16 May 2013 09:57:32 +0200
From: "Weese, David" <weese@campus.fu-berlin.de>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Subject: Re: [Seqan-dev] Finder and Pattern
Message-ID: <3C493D68-64BF-429A-B5ED-A0AB1B39E3B5@fu-berlin.de>
Content-Type: text/plain; charset="iso-8859-1"

Hi Stephen,

mmh, I don't see why it shouldn't work. Maybe you can send us an example dataset and your full source code for debugging.

David

--
David Weese weese@inf.fu-berlin.de<mailto:weese@inf.fu-berlin.de>
Freie Universit?t Berlin http://www.inf.fu-berlin.de/
Institut f?r Informatik Phone: +49 30 838 75137
Takustra?e 9 Algorithmic Bioinformatics
14195 Berlin Room 020



<-- thread -->
<-- date -->
  • seqan-dev - May 2013 - Archives indexes sorted by:
    [ thread ] [ subject ] [ author ] [ date ]
  • Complete archive of the seqan-dev mailing list
  • More info on this list...

Hilfe

  • FAQ
  • Dienstbeschreibung
  • ZEDAT Beratung
  • postmaster@lists.fu-berlin.de

Service-Navigation

  • Startseite
  • Listenauswahl

Einrichtung Mailingliste

  • ZEDAT-Portal
  • Mailinglisten Portal