Hi David Thanks for offering to look at the rest (again). I'm just trying to get the hang of C++. A few points... 1) This maybe look odd as I am calling from and returning to R using RCpp libraries. So NumericMatrix is an R/C++ type. I "think" this bit is working as intended?? 2) My best guess is the problem is coercing a std::string kmers[i] to a Dna5String Pattern?? i.e. Pattern<Dna5String, Horspool> pattern(kmers[i]); ...or maybe not??? It seems to have "worked" 3) The data I am using is ~the first 400K head of SRR031709.fastq I know it doesn't have this odd pattern (i.e. it's not some artificial constructed test file) as I checked it with FastQC... and all the other stuff I tested with this file works OK. #include <seqan/sequence.h> #include <iostream> #include <seqan/file.h> #include <seqan/seq_io.h> #include <seqan/find.h> #include <Rcpp.h> using namespace Rcpp; using namespace seqan; // [[Rcpp::export]] NumericMatrix kmerPlot3Cpp(std::string argv, std::vector< std::string > kmers) { std::fstream in(argv.c_str(), std::ios::binary | std::ios::in); seqan::RecordReader<std::fstream, seqan::SinglePass<> > reader(in); // Read file record-wise. seqan::CharString id; seqan::Dna5String seq; seqan::CharString qual; unsigned numreads = 0; // read single record to get the cycle length if (readRecord(id, seq, qual, reader, seqan::Fastq()) != 0) return 1; // Could not record from file. unsigned numcycles= length(seq); NumericMatrix kmerMatrix(kmers.size(), numcycles-4); // presume 5 mers while (!atEnd(reader)) { if (readRecord(id, seq, qual, reader, seqan::Fastq()) != 0) return 1; // Could not record from file. Finder<Dna5String> finder(seq); for(unsigned i=0; i< kmers.size(); ++i) { Pattern<Dna5String, Horspool> pattern(kmers[i]); while (find(finder, pattern)) { kmerMatrix(i,beginPosition(finder)) += 1; } } } return kmerMatrix; } ...hmmm Stephen Message: 1 Date: Thu, 16 May 2013 09:57:32 +0200 From: "Weese, David" <weese@campus.fu-berlin.de> To: SeqAn Development <seqan-dev@lists.fu-berlin.de> Subject: Re: [Seqan-dev] Finder and Pattern Message-ID: <3C493D68-64BF-429A-B5ED-A0AB1B39E3B5@fu-berlin.de> Content-Type: text/plain; charset="iso-8859-1" Hi Stephen, mmh, I don't see why it shouldn't work. Maybe you can send us an example dataset and your full source code for debugging. David -- David Weese weese@inf.fu-berlin.de<mailto:weese@inf.fu-berlin.de> Freie Universit?t Berlin http://www.inf.fu-berlin.de/ Institut f?r Informatik Phone: +49 30 838 75137 Takustra?e 9 Algorithmic Bioinformatics 14195 Berlin Room 020