[Seqan-dev] Finder and Pattern
Hi David
Thanks for offering to look at the rest (again). I'm just trying to get the hang of C++. A few points...
1) This maybe look odd as I am calling from and returning to R using RCpp libraries. So NumericMatrix is an R/C++ type. I "think" this bit is working as intended??
2) My best guess is the problem is coercing a std::string kmers[i] to a Dna5String Pattern??
i.e.
Pattern<Dna5String, Horspool> pattern(kmers[i]);
...or maybe not??? It seems to have "worked"
3) The data I am using is ~the first 400K head of SRR031709.fastq I know it doesn't have this odd pattern (i.e. it's not some artificial constructed test file) as I checked it with FastQC... and all the other stuff I tested with this file works OK.
#include <seqan/sequence.h>
#include <iostream>
#include <seqan/file.h>
#include <seqan/seq_io.h>
#include <seqan/find.h>
#include <Rcpp.h>
using namespace Rcpp;
using namespace seqan;
// [[Rcpp::export]]
NumericMatrix kmerPlot3Cpp(std::string argv, std::vector< std::string > kmers)
{
std::fstream in(argv.c_str(), std::ios::binary | std::ios::in);
seqan::RecordReader<std::fstream, seqan::SinglePass<> > reader(in);
// Read file record-wise.
seqan::CharString id;
seqan::Dna5String seq;
seqan::CharString qual;
unsigned numreads = 0;
// read single record to get the cycle length
if (readRecord(id, seq, qual, reader, seqan::Fastq()) != 0)
return 1; // Could not record from file.
unsigned numcycles= length(seq);
NumericMatrix kmerMatrix(kmers.size(), numcycles-4); // presume 5 mers
while (!atEnd(reader))
{
if (readRecord(id, seq, qual, reader, seqan::Fastq()) != 0)
return 1; // Could not record from file.
Finder<Dna5String> finder(seq);
for(unsigned i=0; i< kmers.size(); ++i)
{
Pattern<Dna5String, Horspool> pattern(kmers[i]);
while (find(finder, pattern))
{
kmerMatrix(i,beginPosition(finder)) += 1;
}
}
}
return kmerMatrix;
}
...hmmm
Stephen
Message: 1
Date: Thu, 16 May 2013 09:57:32 +0200
From: "Weese, David" <weese@campus.fu-berlin.de>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Subject: Re: [Seqan-dev] Finder and Pattern
Message-ID: <3C493D68-64BF-429A-B5ED-A0AB1B39E3B5@fu-berlin.de>
Content-Type: text/plain; charset="iso-8859-1"
Hi Stephen,
mmh, I don't see why it shouldn't work. Maybe you can send us an example dataset and your full source code for debugging.
David
--
David Weese weese@inf.fu-berlin.de<mailto:weese@inf.fu-berlin.de>
Freie Universit?t Berlin http://www.inf.fu-berlin.de/
Institut f?r Informatik Phone: +49 30 838 75137
Takustra?e 9 Algorithmic Bioinformatics
14195 Berlin Room 020