Re: [Seqan-dev] CharString question

Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de> · Thu, 09 Feb 2012 08:18:01 +0100

Hi Ray,

On 02/09/2012 06:26 AM, Raymond Wan wrote:

> Hi all,
>
> Somewhat of a basic question, but I haven't been able to figure it out.
>
> How does one take a substring (and if possible, prefix or suffix) of a
> CharString?

There is the Segment class with the Infix, Prefix and Suffix
specializations. Try the infix(), prefix() and suffix() functions to
obtain them.

> And how does one find out the length of a CharString?

Try length(), same as with all strings. ;)
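
To make the semantics of these functions concrete, here is a minimal sketch in plain standard C++. It does not use SeqAn itself: the namespace sketch_segments and the functions in it are hypothetical stand-ins, and they assume half-open positions [begin, end). For simplicity they return copies as std::string, which a real segment type need not do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins built on std::string only; this is NOT SeqAn code.
// All positions are half-open: [begin, end).
namespace sketch_segments {

// Number of characters in the string.
inline std::size_t length(const std::string& s) { return s.size(); }

// Characters at positions begin, begin + 1, ..., end - 1.
inline std::string infix(const std::string& s, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= s.size());
    return s.substr(begin, end - begin);
}

// The first `end` characters.
inline std::string prefix(const std::string& s, std::size_t end) {
    return infix(s, 0, end);
}

// Everything from position `begin` up to the end of the string.
inline std::string suffix(const std::string& s, std::size_t begin) {
    return infix(s, begin, s.size());
}

}  // namespace sketch_segments
```

With the text "ACGTACGT", prefix(text, 3) gives "ACG", infix(text, 2, 5) gives "GTA", suffix(text, 5) gives "CGT", and length(text) is 8.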

> And can someone tell me the reasoning for creating CharString type in SeqAn and
> not just using (or extending) C++'s <string>?  Likewise, I've noticed
> functions, etc. that is also done by other libraries -- Boost is the one that
> comes to mind.  I can't think of examples at the moment, but was there any
> reason for duplicating such effort?  Was it to not rely on other non-standard

(Summary: Eat your own dog food, simplicity, control over code, fewer
dependencies. Still, please tell us if you think that an external
library would be more suitable for our users!)

CharString is a typedef for String<char, Alloc<> > and is mostly there to
show that the String class works fine for char, too. This goes along the
"eat your own dog food" line: "What better way to test our library and
show that it is good than to use it wherever possible, as long as this
does not take too much effort?" In the case of CharString, it is just a
typedef.

By the way, you should be able to use char* and std::string just like a
SeqAn String, and thus also like a CharString (length() and infix()
should work).
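
The general mechanism that makes this possible is a free function that is overloaded per string type. The following is a rough sketch of that idea in plain standard C++, not SeqAn's actual implementation. The namespace sketch_overloads and the helper isLongerThan() are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration; this is NOT SeqAn code.
namespace sketch_overloads {

// One free function name, overloaded for each supported string type.
inline std::size_t length(const std::string& s) { return s.size(); }

inline std::size_t length(const char* s) {
    assert(s != nullptr);  // a C string must point to valid memory
    return std::strlen(s);
}

// Generic code calls length() and never needs to know the concrete type.
template <typename TText>
bool isLongerThan(const TText& text, std::size_t n) {
    return length(text) > n;
}

}  // namespace sketch_overloads
```

Here isLongerThan(std::string("ACGT"), 3) and isLongerThan("ACGT", 3) both go through the same template. Only the length() overload that gets selected differs.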

The reason for the duplication is mostly that we do not want to depend on
Boost for the main library, and users often have problems installing it.
Don't get me wrong: We really like (most of) Boost, but it is too strong
a dependency.

One of the nicest things in SeqAn is (in my opinion) that you just
download one library and you are ready to go. The same is true for most
SeqAn apps: Download the SeqAn tarball and the only dependencies for
building are CMake and Python (where Python will hopefully disappear as
a hard dependency for building). OK, you will need gzip for compression,
but that's optional and most likely installed on your system.

I have really struggled with compiling some bioinformatics software
because it (indirectly) depends on a pandemonium of other (sometimes
obscure) libraries. This would not be too much of a problem with Boost
but, as stated above, users sometimes have problems installing it.

Another reason is that you have full control of your own code versus
other people's code: You don't have to inherit other code's complexity
(Boost accumulators come to mind) or bugs (Boost statistics generated
compiler warnings, GCC TR1 random module was buggy).

So far, the policy has mostly been "the core library can only depend on
C++ standard libraries", meaning STL, C++ streams, C standard library
and POSIX/Windows for platform-dependent code. Apps can use whatever
they want and will only be compiled when the dependencies are available.

Also, since SeqAn is a library, you can just combine it with any other
library. Don't like our I/O or RNG code? Fine, use something from Boost,
the NCBI library, Bio++, etc.

Through the global interface of SeqAn you can even write adaptations for
your own string implementation, for example, so that you can use our
indices and algorithms on your own strings.
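
As a rough illustration of what such an adaptation could look like in principle, the sketch below adapts a made-up string type to a made-up generic algorithm through a metafunction and two free functions. Everything in it (sketch_adapt, MyDna, Value, value(), countValue()) is hypothetical and only loosely modeled on the global-function style. It is not a recipe for a real SeqAn adaptor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration; this is NOT SeqAn code.
namespace sketch_adapt {

// A user-defined string type that the "library" knows nothing about.
struct MyDna {
    std::vector<char> data;
};

// Adaptation layer, part 1: a metafunction that names the character type.
template <typename T> struct Value;
template <> struct Value<MyDna> { typedef char Type; };

// Adaptation layer, part 2: free functions for size and element access.
inline std::size_t length(const MyDna& s) { return s.data.size(); }

inline char value(const MyDna& s, std::size_t i) {
    assert(i < s.data.size());
    return s.data[i];
}

// "Library" algorithm, written only against length(), value() and Value<>.
template <typename TString>
std::size_t countValue(const TString& s, typename Value<TString>::Type c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < length(s); ++i)
        if (value(s, i) == c)
            ++n;
    return n;
}

}  // namespace sketch_adapt
```

The algorithm countValue() never touches MyDna::data directly, so a different string type only needs its own Value specialization and its own length() and value() overloads to work with it.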

That said, we always welcome suggestions as to which functionality is
also provided by other libraries. If we provide duplicate functionality
that could just as well come from another library, then please tell us
so!

HTH,
Manuel