Re: [Seqan-dev] Getting a sequence as a char *

  • From: Nick Mapsy <nmapsy@gmail.com>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 8 Sep 2017 15:55:48 -0700
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Getting a sequence as a char *

Hi Hannes,

Thank you so much for your reply! I'm really lost here so I appreciate it a lot.

Unfortunately I do need a char **, since I'm passing the data back to C code (actually, to Python ctypes).

Thank you for the tip on going from the row to a CharString. Is there a function which can take my TRow and return a CharString? I couldn't find anything in the documentation. Instead, I found out that my TRow can act as a Gaps, which allowed me to use all the Gaps functions. That allowed me to reconstruct the alignment using isGap() and the unaligned sequences.

Thank you,
Nick

P.S. My solution, in case anyone else runs into the same trouble:
(I'm using SeqAn 2.2.0 now)

#include <iostream>
#include <stdlib.h>
#include <seqan/align.h>
#include <seqan/score.h>
#include <seqan/sequence.h>
#include <seqan/graph_msa.h>

using namespace seqan;

char **align(int nseq, char *seqs[]) {

    Align<String<Dna5>> align;
    resize(rows(align), nseq);
    for (int i = 0; i < nseq; i++) {
        assignSource(row(align, i), seqs[i]);
    }

    globalMsaAlignment(align, EditDistanceScore());

    // Convert the Align rows to char *'s and store back in seqs.
    typedef typename Row<Align<String<Dna5>>>::Type TRow;
    for (int i = 0; i < nseq; i++) {
        // Each row is type TRow, but also functions as a Gaps. This is why isGap accepts it.
        TRow arow = row(align, i);
        int len = (int)length(arow);
        // One extra byte for the terminating '\0'; parenthesized so the intent is explicit.
        char *new_seq = (char *)malloc(sizeof(char) * (len + 1));
        int offset = 0;
        for (int j = 0; j < len; j++) {
            if (isGap(arow, j)) {
                new_seq[j] = '-';
                offset--;
            } else {
                new_seq[j] = seqs[i][j+offset];
            }
        }
        new_seq[len] = '\0';
        seqs[i] = new_seq;
    }

    return seqs;
}

int main(int argc, char *argv[]) {
    for (int i = 1; i < argc; i++) {
        argv[i-1] = argv[i];
    }
    char **aligned_seqs = align(argc-1, argv);
    for (int i = 0; i < argc-1; i++) {
        std::cout << aligned_seqs[i] << std::endl;
    }
    return 0;
}
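
The index arithmetic in the inner loop above can be checked without SeqAn. The following is a minimal sketch, not part of the original solution: `apply_gaps` is a hypothetical helper, and the `is_gap` vector is an assumed stand-in for calling isGap(arow, j) at each view position j. Instead of a negative offset it keeps a counter of source characters consumed so far, which should be equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a SeqAn function): rebuild one gapped row.
// is_gap[j] stands in for isGap(arow, j) at view position j.
std::string apply_gaps(const std::string &source, const std::vector<bool> &is_gap) {
    std::string gapped;
    gapped.reserve(is_gap.size());
    std::size_t next = 0;  // index of the next unconsumed source character
    for (std::size_t j = 0; j < is_gap.size(); ++j) {
        if (is_gap[j]) {
            gapped.push_back('-');
        } else {
            assert(next < source.size());  // the mask must not outrun the source
            gapped.push_back(source[next++]);
        }
    }
    return gapped;
}
```

For example, the source "ACGT" with gaps at view positions 1 and 4 of a six-column row gives "A-CG-T".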



On Fri, Sep 8, 2017 at 8:18 AM, Hannes Hauswedell <hannes.hauswedell@fu-berlin.de> wrote:
Hi Nick,

On Wednesday, 6 September 2017, 03:38:54, Nick Mapsy wrote:
> Hi, I'm just getting started with SeqAn (and C++), so I'm sure I'm missing
> something simple here.
>
> I've got a multiple sequence alignment working and producing an
> Align<String<Dna5> > object. Now all I need is to return the aligned
> sequences (with gaps) as C strings (char *) from the function.

Are you sure you want to be passing around these char** ? This is C++ after
all and we have references :D

> It seems like a simple thing, but after hours reading through the
> documentation of all the types and functions (and yes, Language Entity
> Types), I can't find the path from Align to char *.
>
> I found toCString(), but it takes a String, and I don't know how to get
> (gapped) Strings out of an Align.
>
> Thank you for any help, and hopefully I'm able to make use of this great
> library!

You can create a CharString from the alignment row and then call toCString()
on the CharString. But, like I said, I would really recommend working with
Strings and StringSets instead of pointers and [].
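
One caveat when the result has to cross into C or ctypes: a pointer taken from a string object is generally only valid while that string is alive, so the characters need to be copied into a buffer the caller owns. The sketch below uses std::string as a stand-in for the CharString (an assumption; whether toCString() has the same lifetime rule should be checked in the SeqAn documentation). Both function names are hypothetical. The free function is declared `extern "C"` on the assumption that the caller looks it up by its unmangled name.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: copy a string into a malloc'd, NUL-terminated buffer
// that outlives the string object. Returns nullptr if allocation fails.
char *copy_to_c_string(const std::string &s) {
    char *buf = static_cast<char *>(std::malloc(s.size() + 1));
    if (buf == nullptr) {
        return nullptr;
    }
    std::memcpy(buf, s.c_str(), s.size() + 1);  // includes the trailing '\0'
    return buf;
}

extern "C" {

// Hypothetical counterpart the C or Python side calls to release the buffer,
// so that allocation and deallocation use the same C runtime.
void free_c_string(char *s) {
    std::free(s);
}

}
```

Each pointer returned by copy_to_c_string should be passed to free_c_string exactly once.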

> P.S. Here's what I've written so far:
> (I'm using SeqAn 1.4.1 on Ubuntu 16.04.)

Please update to SeqAn2, as SeqAn1 has been deprecated for a while now. The
Ubuntu package is called libseqan2-dev. It has been available since Ubuntu
17.04, but can also be installed manually:
http://seqan.readthedocs.io/en/master/Infrastructure/Use/Install.html#library-package

Best regards,
Hannes
--
Hannes Hauswedell

Scientific staff & PhD candidate
Freie Universität Berlin / Max Planck Institute for Molecular Genetics

address     Institut für Informatik
            Takustraße 9
            Room 019
            14195 Berlin
telephone   +49 (0)30 838-75241
fax         +49 (0)30 838-75218
e-mail      hannes.hauswedell@[molgen.mpg.de|fu-berlin.de]

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev

  • Follow-Ups:
    • Re: [Seqan-dev] Getting a sequence as a char *
      • From: Rahn, René <Rene.Rahn@fu-berlin.de>
  • References:
    • [Seqan-dev] Getting a sequence as a char *
      • From: Nick Mapsy <nmapsy@gmail.com>
    • Re: [Seqan-dev] Getting a sequence as a char *
      • From: Hannes Hauswedell <hannes.hauswedell@fu-berlin.de>