Hi Manuel,

I'm looking to generally contribute performance-wise to SeqAn. My domain knowledge of bioinformatics is near zero, so C++-centric optimizations and single-threaded to multithreaded conversions are of most interest to me.

In which files do the performance-critical algorithms you mentioned (q-gram, ESA, WOTD, alignment, etc.) reside? How can performance tests be written against these algorithms?

In which files does the non-parallel compression and decompression of BGZF files reside? How can a performance test be written against this code?

Thanks for the seqan-contrib link; that fixed the compiler errors.

Cheers,
Neil

On Fri, Jun 1, 2012 at 5:00 AM, <seqan-dev-request@lists.fu-berlin.de> wrote:
> Message: 1
> Date: Thu, 31 May 2012 11:07:22 +0000
> From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
> To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
> Subject: Re: [Seqan-dev] seqan-dev Digest, Vol 32, Issue 8
> Message-ID: <FCCAB9D80C3DAB47B5601C5B0E62872B096300@ex02a.campus.fu-berlin.de>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi Neil,
>
> Regarding performance sensitive code: My main assumption here is that you are looking for an area to contribute to. Maybe you could elaborate a bit on what you want to do.
> Do you want to look into performance-tweaking parts of the library or "generally contribute performance-wise" to the library? Below, I give some possible projects for both.
>
> The inner loops are inside the indices (most interesting: q-gram, ESA, and WOTD) and the online string search finders (the Myers and banded Myers algorithms are the most interesting here for edit distance). The index building algorithms are also a point for possible optimization. Also important to many applications are the SWIFT (q-gram counting based) and pigeonhole filter algorithms.
>
> Another larger area with inner loops is the alignment algorithms, but we are in the process of rewriting this part of SeqAn. Hopefully this will be completed in 1-2 months, so that it becomes a possible target.
>
> I am not too sure whether these are low-hanging fruit for performance optimization because we looked into this in detail in most cases. That said, if you are interested in this area, I can give you the exact locations and explain how to write performance test programs with real-world input.
>
> If I may somewhat widen your question from "performance issues" to "performance-related", it might be easier to find a "big bang for the buck" problem to tackle. Here, improved implementations or parallelization of certain important library parts come to mind. Some examples:
>
> - Parallel compression and decompression of BGZF files (they consist of gzip compressed blocks and could thus be parallelized). This comes down to porting an existing threadpool library to C++11 and using C++11 threads for a parallel BGZF reader/writer and BAM file sorter. I have the beginning of this on my local disk but we could easily make this a GitHub project. This requires no prior knowledge in sequence analysis algorithms but expertise in C++ and parallel programming.
>
> - Space efficient BWT construction. This basically consists of implementing an existing algorithm in SeqAn.
>   More information can be found here: https://www.mi.fu-berlin.de/w/ABI/SpaceEfficientBWTConstruction. The requirement on sequence analysis algorithms is probably quite low here and the necessary parts could be learned on the fly.
>
> - Parallelization of suffix array construction. This is a large chunk of work and would require rewriting the parallel external memory part of the algorithm and probably extending the external memory algorithms and data structures part of SeqAn to support parallelism.
>
> Cheers,
> Manuel
>
> PS: A valuable resource for looking SeqAn-related things up besides the Tutorial is http://docs.seqan.de/seqan/dev2/.
>
> ------------------------------
>
> _______________________________________________
> seqan-dev mailing list
> seqan-dev@lists.fu-berlin.de
> https://lists.fu-berlin.de/listinfo/seqan-dev
>
> End of seqan-dev Digest, Vol 33, Issue 1
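
For the parallel BGZF idea quoted above, a minimal sketch of the block-level approach with C++11 threads might look like the following. This is not SeqAn code and not Manuel's threadpool version: the names processBlock, splitIntoBlocks, and processBlocksParallel are hypothetical, processBlock is only a stand-in for the real per-block deflate call, and the sketch uses a simple static partitioning of blocks over threads instead of a thread pool. The only point it illustrates is that independent blocks can be handled concurrently while the output order is preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-block compression. A real implementation
// would call zlib here; reversing the bytes keeps the sketch self-contained.
std::string processBlock(const std::string& block)
{
    return std::string(block.rbegin(), block.rend());
}

// Split the input into consecutive blocks of at most blockSize bytes.
std::vector<std::string> splitIntoBlocks(const std::string& input,
                                         std::size_t blockSize)
{
    assert(blockSize > 0);
    std::vector<std::string> blocks;
    for (std::size_t pos = 0; pos < input.size(); pos += blockSize)
        blocks.push_back(input.substr(pos, blockSize));
    return blocks;
}

// Process all blocks using numThreads worker threads. results[i] always
// corresponds to blocks[i], so the original block order is preserved.
std::vector<std::string> processBlocksParallel(
    const std::vector<std::string>& blocks, unsigned numThreads)
{
    if (numThreads == 0)
        numThreads = 1;

    // Sized up front and never resized, so each thread can safely write
    // to its own distinct elements without locking.
    std::vector<std::string> results(blocks.size());

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([&blocks, &results, t, numThreads]() {
            // Worker t handles blocks t, t + numThreads, t + 2 * numThreads, ...
            for (std::size_t i = t; i < blocks.size(); i += numThreads)
                results[i] = processBlock(blocks[i]);
        });
    }

    // Wait for all workers before handing the results back.
    for (std::thread& worker : workers)
        worker.join();

    return results;
}
```

With GCC or Clang this should build with -std=c++11 -pthread. A thread pool, as suggested in the quoted message, would replace the fixed stride with a shared work queue, which balances load better when blocks take uneven time to compress.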