Re: [Seqan-dev] reading fasta with non-DNA5 characters
If I remember correctly, the <seqan/stream.h> interface does not allow reading non-ACGTN characters with read(..., Fasta()); it returns a non-zero error value in that case.
Bernd, what you can do right now is to load your reads into CharStrings and convert them to Dna5.
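To illustrate the conversion step, here is a minimal sketch in plain C++ that deliberately does not use the SeqAn API. The helper name sanitizeToDna5 and its fallback parameter are hypothetical and exist only for this illustration. It assumes the reads have already been loaded as plain character strings, and it assumes lowercase input should be uppercased:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of SeqAn): keep A, C, G and T and replace
// every other character with `fallback`. Lowercase input is uppercased first.
// Use 'N' to mimic a Dna5 target, or 'A' to mimic a plain Dna target.
std::string sanitizeToDna5(const std::string& raw, char fallback = 'N')
{
    std::string out;
    out.reserve(raw.size());
    for (unsigned char c : raw)
    {
        const char up = static_cast<char>(std::toupper(c));
        const bool isAcgt = (up == 'A' || up == 'C' || up == 'G' || up == 'T');
        out.push_back(isAcgt ? up : fallback);
    }
    return out;
}
```

Calling sanitizeToDna5("AYCMG") replaces the IUPAC codes Y and M with N. Passing 'A' as the second argument gives the behaviour described for Dna targets instead.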
In the long term, I guess we will need to give users more control over I/O behaviour. At the moment, the assumptions are fairly conservative (e.g. non-ACGT(N) characters are not allowed for Dna(5)) and are tailored to what you find in NGS reads and whole-genome data.
Cheers,
Manuel
________________________________________
From: Weese, David [weese@campus.fu-berlin.de]
Sent: Wednesday, April 25, 2012 10:44 PM
To: SeqAn Development
Subject: Re: [Seqan-dev] reading fasta with non-DNA5 characters
Hi,
Actually, it should automatically convert every non-ACGT character to N (or to A for Dna targets). Have you already tried reading your files into strings over the Dna5 alphabet?
Cheers,
David
--
David Weese weese@inf.fu-berlin.de
Freie Universität Berlin http://www.inf.fu-berlin.de/
Institut für Informatik Phone: +49 30 838 75246
Takustraße 9 Algorithmic Bioinformatics
14195 Berlin Room 021
On 25.04.2012 at 11:08, Bernd Jagla wrote:
> Hi,
>
> I have a couple of genome sequences that contain characters other than ACTGN (e.g. Y, M, ...).
>
> Is there a way to read those sequences in as well and automatically convert the non-conforming letters to N?
>
> Thanks,
>
> Bernd
>
> PS:
>
> I am using:
>
> RecordReader<String<char, MMap<> >, DoublePass<Mapped> > refReader(seqMMapString);
> int read2out = read2(seqIds, faSeqs, refReader, Fasta());
>
> for reading in the data and get an INVALID_FORMAT error...
>
>
> _______________________________________________
> seqan-dev mailing list
> seqan-dev@lists.fu-berlin.de
> https://lists.fu-berlin.de/listinfo/seqan-dev