Hi Ray, On 02/09/2012 06:26 AM, Raymond Wan wrote:
Hi all, Somewhat of a basic question, but I haven't been able to figure it out. How does one take a substring (and if possible, prefix or suffix) of a CharString?
There is the Segment class with Infix, Prefix, Suffix specializations. Try the infix(), prefix() and suffix() functions.
And how does one find out the length of a CharString?
Try length(), same as with all strings. ;)
And can someone tell me the reasoning for creating the CharString type in SeqAn and not just using (or extending) C++'s <string>? Likewise, I've noticed functions, etc. that are also provided by other libraries -- Boost is the one that comes to mind. I can't think of examples at the moment, but was there any reason for duplicating such effort? Was it to not rely on other non-standard
(Summary: Eat your own dog food, simplicity, control over code, fewer dependencies. Still, please tell us if you think that an external library would be more suitable for our users!)
CharString is a typedef to String<char, Alloc<> > and mostly there to show that the String class works fine for char, too. This goes along the "eat your own dog food" line: "What better way to test our library and show that it is good than to use it wherever possible, if this does not come with too much effort?" In the case of CharString, the effort is a single typedef.
By the way, you should be able to use char* and std::string just like a SeqAn String and thus also like a CharString (length() and infix() should work).
The reason for the duplication is mostly that we do not want to depend on Boost for the main library, and users often have problems installing it. Don't get me wrong: We really like (most of) Boost, but it is too heavy a dependency.
One of the nicest things in SeqAn is (in my opinion) that you just download one library and you are ready to go. The same is true for most SeqAn apps: Download the SeqAn tarball and the only dependencies for building are CMake and Python (where Python will hopefully disappear as a hard dependency for building). OK, you will need gzip for compression but that's optional and most likely installed on your system.
I have really struggled with compiling some bioinformatics software since it (indirectly) depends on a pandemonium of other (sometimes obscure) libraries. This would not be too much of a problem with Boost but, as stated above, users sometimes have problems installing it.
Another reason is that you have full control of your own code versus other people's code: You don't have to inherit other code's complexity (Boost accumulators come to mind) or bugs (Boost statistics generated compiler warnings, GCC TR1 random module was buggy).
So far, the policy has mostly been "the core library can only depend on C++ standard libraries", meaning STL, C++ streams, C standard library and POSIX/Windows for platform dependent code. Apps can use whatever they want and will only be compiled when the dependencies are available.
Also, since SeqAn is a library, you can just combine it with any other library. Don't like our I/O or RNG code? Fine, use something from Boost, the NCBI library, Bio++ etc.
Through the global interface of SeqAn you can even write adaptations of your own string implementation so that, for example, you can use our indices and algorithms on your own strings.
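To illustrate the idea of a global interface, here is a minimal standalone sketch. It does not use SeqAn and is not SeqAn's actual code: the namespace sketch, the MyString type and the value() helper are made up for this example. Also, the sketch copies characters into a std::string for simplicity, whereas SeqAn's infix(), prefix() and suffix() return Segment objects. The point is only that generic code written against free functions works on any string type that provides those functions.

```cpp
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace, not part of SeqAn

// Hypothetical user-defined string type: a non-owning pointer plus a length.
struct MyString {
    const char* data;
    std::size_t size;
};

// Adaptation layer: each string type supplies the same two free functions.
inline std::size_t length(const std::string& s) { return s.size(); }
inline char value(const std::string& s, std::size_t i) { return s[i]; }

inline std::size_t length(const MyString& s) { return s.size; }
inline char value(const MyString& s, std::size_t i) { return s.data[i]; }

// Generic code that relies only on length() and value().
// Returns a copy of the characters in the half-open range [begin, end),
// clipped to the length of the string.
template <typename TString>
std::string infix(const TString& s, std::size_t begin, std::size_t end) {
    std::string out;
    for (std::size_t i = begin; i < end && i < length(s); ++i)
        out += value(s, i);
    return out;
}

// Prefix: the characters in [0, end).
template <typename TString>
std::string prefix(const TString& s, std::size_t end) {
    return infix(s, 0, end);
}

// Suffix: the characters in [begin, length(s)).
template <typename TString>
std::string suffix(const TString& s, std::size_t begin) {
    return infix(s, begin, length(s));
}

}  // namespace sketch
```

With these definitions, sketch::prefix(s, 5) gives the same result for a std::string and for a MyString that hold the same text. Supporting yet another string type only takes two more overloads of length() and value().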
That said, we always welcome suggestions as to which functionality is also provided by other libraries. If we provide duplicate functionality that could just as well come from another library, then please tell us so!
HTH, Manuel