
Re: [Seqan-dev] reading fasta with non-DNA5 characters

  • From: Hannes Hauswedell <hauswedell@mi.fu-berlin.de>
  • To: seqan-dev@lists.fu-berlin.de
  • Date: Wed, 16 May 2012 16:37:23 +0800
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] reading fasta with non-DNA5 characters

Hi,

It all depends on what type of String you give to the recordReader. It is designed to check against the SequenceType's alphabet and use that as the criterion, so the following should fail on non-ACGTN characters:


CharString seqIds;
Dna5String faSeqs;  // Dna5 target: anything other than A, C, G, T, N should be rejected

RecordReader<String<char, MMap<> >, DoublePass<Mapped> > refReader(seqMMapString);
int read2out = read2(seqIds, faSeqs, refReader, Fasta());


but the following should accept any alphabetic character [1]:


CharString seqIds;
CharString faSeqs;  // char target: any alphabetic character should be accepted

RecordReader<String<char, MMap<> >, DoublePass<Mapped> > refReader(seqMMapString);
int read2out = read2(seqIds, faSeqs, refReader, Fasta());


This will not convert anything, however. Conversion could be done later, perhaps using ModifiedString<> to avoid copying.
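
To illustrate the "convert later without copying" idea, here is a minimal sketch in plain standard C++. It does not use SeqAn's ModifiedString<>; the names toDna5 and Dna5View are hypothetical and only show the principle of a read-only view that maps characters on access. It assumes lower-case letters should be accepted and that every character outside ACGT maps to N.

```cpp
#include <cstddef>
#include <string>

// Map one character to the alphabet {A, C, G, T, N}.
// Assumption: lower-case input is accepted; everything else becomes 'N'.
inline char toDna5(char c)
{
    switch (c)
    {
        case 'A': case 'a': return 'A';
        case 'C': case 'c': return 'C';
        case 'G': case 'g': return 'G';
        case 'T': case 't': return 'T';
        default:            return 'N';
    }
}

// Hypothetical read-only view (not a SeqAn class): it converts each
// character when it is accessed and never copies or modifies the host.
class Dna5View
{
public:
    explicit Dna5View(std::string const & host) : host_(&host) {}

    std::size_t size() const { return host_->size(); }

    char operator[](std::size_t i) const { return toDna5((*host_)[i]); }

private:
    std::string const * host_;  // non-owning; the host must outlive the view
};
```

With a host string such as "ACGTYM", a Dna5View over it yields 'N' at the positions of Y and M while the host keeps its original characters.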

Regards,
Hannes


[1] I just checked this because I want to read a sequence that includes gaps, which fails because '-' is not alphabetic (I am working on a patch). However, the characters Y and M that were mentioned should not be a problem.
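
The kind of check described in [1] can be sketched in plain standard C++ as follows. This is not SeqAn's implementation; isValidSequence and its allowGaps parameter are hypothetical and only show why Y and M pass an "alphabetic" test while '-' does not unless it is explicitly allowed.

```cpp
#include <cctype>
#include <string>

// Returns true if every character is a letter, or a gap ('-') when
// allowGaps is set. Hypothetical helper, not part of SeqAn.
inline bool isValidSequence(std::string const & seq, bool allowGaps)
{
    for (char c : seq)
    {
        // Cast to unsigned char: std::isalpha is undefined for negative values.
        bool isLetter = std::isalpha(static_cast<unsigned char>(c)) != 0;
        bool isGap = allowGaps && c == '-';
        if (!isLetter && !isGap)
            return false;
    }
    return true;
}
```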


On 04/26/2012 06:29 AM, Holtgrewe, Manuel wrote:
If I remember correctly, the <seqan/stream.h> interface will not allow you to read non-ACGTN characters when using read(..., Fasta()); it returns an error value != 0 in this case.

Bernd, what you can do right now is to load your reads into CharStrings and convert them to Dna5.
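
As a rough illustration of this two-step approach (read as plain characters first, convert afterwards), here is a sketch in standard C++ that does not use the SeqAn API. FastaRecord, readFastaRaw and convertToDna5 are hypothetical names, and the parser handles only simple FASTA input: '>' header lines followed by sequence lines.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct FastaRecord
{
    std::string id;   // header line without the leading '>'
    std::string seq;  // sequence lines joined together
};

// Step 1: read all records as plain characters, with no alphabet check.
inline std::vector<FastaRecord> readFastaRaw(std::istream & in)
{
    std::vector<FastaRecord> records;
    std::string line;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // tolerate Windows line endings
        if (line.empty())
            continue;
        if (line[0] == '>')
            records.push_back(FastaRecord{line.substr(1), std::string()});
        else if (!records.empty())
            records.back().seq += line;
    }
    return records;
}

// Step 2: convert in place; everything outside ACGT becomes N.
inline void convertToDna5(std::string & seq)
{
    for (char & c : seq)
    {
        char u = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        c = (u == 'A' || u == 'C' || u == 'G' || u == 'T') ? u : 'N';
    }
}
```

The records can be read from any std::istream, for example a std::istringstream holding the file contents, and convertToDna5 is then applied to each record's seq.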

In the long term, I guess we will need to give users more control over I/O behaviour. At the moment, the assumptions are fairly conservative (e.g. non-ACGT(N) characters are not allowed for Dna(5)) and tailored to what you find in NGS reads and whole-genome data.

Cheers,
Manuel
________________________________________
From: Weese, David [weese@campus.fu-berlin.de]
Sent: Wednesday, April 25, 2012 10:44 PM
To: SeqAn Development
Subject: Re: [Seqan-dev] reading fasta with non-DNA5 characters

Hi,

Actually, it should automatically convert every non-ACGT character to N (or A for Dna targets). Have you already tried reading your files into strings over the Dna5 alphabet?

Cheers,
David
--
David Weese                             weese@inf.fu-berlin.de
Freie Universität Berlin                http://www.inf.fu-berlin.de/
Institut für Informatik                 Phone: +49 30 838 75246
Takustraße 9                            Algorithmic Bioinformatics
14195 Berlin                            Room 021

On 25.04.2012 at 11:08, Bernd Jagla wrote:

Hi,

I have a couple of genome sequences that contain characters other than ACGTN (e.g. Y, M, ...).

Is there a way to read those sequences in as well and automatically convert the non-conforming letters to N?

Thanks,

Bernd

PS:

I am using:

        RecordReader<String<char, MMap<> >, DoublePass<Mapped> > refReader(seqMMapString);
        int read2out = read2(seqIds, faSeqs, refReader, Fasta());

for reading in the data and get an INVALID_FORMAT error...


_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev
