[Seqan-dev] Problem Reading Records

Daniel Mapleson <daniel.mapleson@tgac.ac.uk> · Tue, 13 Aug 2013 11:19:30 +0100

Hello,

I work at The Genome Analysis Centre in Norwich, UK, and am using SeqAn
within a project, primarily to load sequences from FASTQ and FASTA files.

I have encountered a problem reading records when a record contains a
sequence with characters outside the Dna5 alphabet.  It was my
understanding from the documentation that any invalid characters would be
automatically converted to an 'N'.  Instead, reading the file seems to
fail, with no way of continuing from where it left off.  Is this a known
problem, or am I doing something wrong here?
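
To make the expected behaviour concrete, here is a minimal standalone
sketch of the conversion I understood would happen.  It is plain C++, not
SeqAn code: the helper name toDna5Text is made up for illustration, and
folding lower-case letters to upper case is my assumption rather than
something taken from the documentation.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of SeqAn): returns a copy of `raw` in which
// every character outside {A,C,G,T,N} has been replaced by 'N'.
// Assumption: lower-case input is folded to upper case first.
std::string toDna5Text(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (unsigned char c : raw)
    {
        const char u = static_cast<char>(std::toupper(c));
        switch (u)
        {
            case 'A': case 'C': case 'G': case 'T': case 'N':
                out.push_back(u);    // already a valid Dna5 symbol
                break;
            default:
                out.push_back('N');  // anything else becomes 'N'
                break;
        }
    }
    return out;
}
```

With this helper, toDna5Text("acgRYn-") returns "ACGNNNN".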

Relevant code is as follows:

        // Open file, create RecordReader and check all is well
        std::fstream in(args->fasta_arg, std::ios::in);
        seqan::RecordReader<std::fstream, seqan::SinglePass<> > reader(in);

        // Create the AutoSeqStreamFormat object and guess the file format.
        seqan::AutoSeqStreamFormat formatTag;
        if (!guessStreamFormat(reader, formatTag))
        {
            cerr << "ERROR: Could not detect file format for: "
                 << args->fasta_arg << endl;
            return;
        }

    ......

        for (unsigned i = 0; (res == 0) && (i < BATCH_SIZE) && !atEnd(reader); ++i)
        {
            CharString id;

            Dna5String seq; // Supposed to auto-convert chars not in {A,T,G,C,N} to N

            res = seqan::readRecord(id, seq, reader, formatTag);
            if (res == 0)
            {
                seqan::appendValue(names, id);
                seqan::appendValue(seqs, seq);

                recordIndex++;
            }
            else
            {
                cerr << endl
                     << "ERROR: cannot finish processing all records in file.  "
                     << "Encountered an error reading file at record: " << recordIndex
                     << "; Error code: " << res
                     << "; Last sequence ID: " << id
                     << "; Continuing to process currently loaded records." << endl;
            }
        }

Best regards,
Dan Mapleson