Re: [Seqan-dev] CharString question

Raymond Wan <rwan@cuhk.edu.hk> · Thu, 09 Feb 2012 16:01:30 +0800

Hi Manuel,

Thank you for the prompt reply!

Quoting Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de>:

> > Somewhat of a basic question, but I haven't been able to figure it out.
> >
> > How does one take a substring (and if possible, prefix or suffix) of a
> > CharString?
> 
> There is the Segment class with Infix, Prefix, Suffix specializations. 
> Try the infix(), prefix() and suffix() functions.

Ah!  Thanks for this!  I was looking in the documentation for the search term
"substring" and couldn't find anything.  "infix"...never thought of that; but
with prefix + suffix functions, I guess that makes sense!  Thanks!

> > And how does one find out the length of a CharString?
> Try length(). same as with all strings. ;)

Ah, I got it...I was using it as a member function.  i.e.,

foo.length ()

Never thought of using it as a function...  Thank you!
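To illustrate the free-function style discussed above, here is a minimal sketch that uses only the standard library, not SeqAn itself.  The namespace "sketch" and everything in it are hypothetical stand-ins: they copy characters into a new std::string, whereas SeqAn's Segment types are, as far as I understand, lightweight views onto the original string.  The half-open [begin, end) convention for infix is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace sketch {  // hypothetical namespace, NOT part of SeqAn

// Free-function length for std::string: called as length(foo), not foo.length().
inline std::size_t length(std::string const & s) { return s.size(); }

// The same name also works for C strings, which cannot have member functions.
inline std::size_t length(char const * s) { return std::strlen(s); }

// Copy of the half-open range [begin, end) of s.
inline std::string infix(std::string const & s, std::size_t begin, std::size_t end)
{
    return s.substr(begin, end - begin);
}

// Copy of the first `end` characters of s.
inline std::string prefix(std::string const & s, std::size_t end)
{
    return infix(s, 0, end);
}

// Copy of s from position `begin` to the end.
inline std::string suffix(std::string const & s, std::size_t begin)
{
    return infix(s, begin, length(s));
}

}  // namespace sketch

// Small usage example; the sequence is made up for illustration.
inline void demo()
{
    std::string seq = "ACGTACGT";
    assert(sketch::length(seq) == 8);
    assert(sketch::length("ACGT") == 4);      // picks the char const* overload
    assert(sketch::infix(seq, 2, 6) == "GTAC");
    assert(sketch::prefix(seq, 3) == "ACG");
    assert(sketch::suffix(seq, 5) == "CGT");
}
```

The point of the design is that one spelling, length(x), covers types that have no member functions at all (such as char*), which a member call like foo.length () cannot do.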

> (Summary: Eat your own dog food, simplicity, control over code, fewer 
> dependencies. Still, please tell us if you think that an external 
> library would be more suitable for our users!)

Thank you for the detailed explanation!

I can't think of a "better way"...I was just curious why the SeqAn developers
did it because it's obviously more work.  However, of all the libraries you
listed, Boost would be the only one that I would call "standard".  But, I know
many people who would disagree with this view.  Boost is the only one that is
supposedly one step away from standardization...mind you, that could be several
years away or never.  :-)

But as long as SeqAn and Boost are each in their own namespace, it doesn't
matter.  Again, I was really only asking because it seems like SeqAn did a lot
of extra work.  Strings, though, are part of the standard library.

> By the way, you should be able to use char* and std::string just like a 
> SeqAn String and thus also CharString (length(), infix should work).

Actually, I didn't explicitly create a CharString.  I'm reading in data in SAM
format using BamAlignmentRecords and the sequences are stored in "seq" as a
CharString.  And I wanted to access and change them.

I thought the reason for not using a std::string was to do some error checking
if the input wasn't a nucleotide (for example).  i.e., throw an exception or
something.

[Not to be nit-picky, but for your sentence above, can I say that the reverse is
not true, i.e., "able to use a SeqAn String just like a char* and std::string"?
Is it only true one way?]

> Another reason is that you have full control of your own code versus 
> other people's code: You don't have to inherit other code's complexity 
> (Boost accumulators come to mind) or bugs (Boost statistics generated 
> compiler warnings, GCC TR1 random module was buggy).

I see.  No arguments here.  Boost is fairly bloated nowadays and while some
parts are solid, others are a bit buggy.  Installing Boost gives you
everything.  If SeqAn relies on Boost, then you're at the mercy of Boost, but
you also will need to pick and choose which parts of Boost to use.

For example, Boost has graphs, too, but I've found them a bit difficult to use.
So, in that sense, SeqAn's implementation does give us an alternative.

> That said, we always welcome suggestions as to which functionality is 
> also provided by other libraries. If we provide duplicative 
> functionality that can be just as well come from another library then 
> please tell us so!

Thanks again for your excellent explanation!  :-)  I was just asking; don't mean
to suggest an alternative, though.

Ray
