On 04.11.2011 02:10, patrick reed wrote:
Hi, I am trying to use TopHat/Bowtie to align RNA-seq data to a reference genome that is about 7 GB in size. From what I can decipher from your internals (I am a C++ novice), they use your software in the build functionality to build a reference genome, and SeqAn uses 32-bit integers, which limits the maximum reference size to about 4 GB. Is there any way to convert to 64-bit integers to allow for an extremely large reference genome?
Dear Patrick, can you point me to the place (file and line numbers) in the TopHat/Bowtie download ZIP file that you think causes the problem?
SeqAn is very generic in this respect: Strings can be configured to use either 32-bit addresses (to save memory in pointers/locations) or 64-bit addresses (to give you virtually unlimited addressable memory). We have successfully used SeqAn Strings of more than 4 GB, and indices on them.
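To make the 4 GB limit concrete, here is a minimal sketch in plain standard C++. It does not use the SeqAn API; the helper name fitsPositionType and the two constants are hypothetical and only for illustration. It assumes one byte per base, so a 7 GB reference has about 7 * 2^30 positions, and checks whether every position can be stored in a given unsigned integer type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of SeqAn): returns true if every position
// 0 .. length-1 of a sequence can be stored in the unsigned type Pos.
template <typename Pos>
constexpr bool fitsPositionType(std::uint64_t length) {
    return length == 0 ||
           length - 1 <= static_cast<std::uint64_t>(std::numeric_limits<Pos>::max());
}

// Assumption: one byte per base, so "about 7 GB" is taken as 7 * 2^30 positions.
constexpr std::uint64_t kGiB = 1ull << 30;
constexpr std::uint64_t kGenomeLength = 7 * kGiB;

// A 32-bit position type covers 2^32 positions (4 GiB) and no more.
static_assert(!fitsPositionType<std::uint32_t>(kGenomeLength),
              "a 7 GiB reference does not fit 32-bit positions");
static_assert(fitsPositionType<std::uint64_t>(kGenomeLength),
              "a 7 GiB reference fits 64-bit positions");
```

The checks run at compile time, so the file compiles only if a 7 GiB reference really does overflow a 32-bit position type and fit a 64-bit one.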
It might be that the TopHat/Bowtie authors used the String class in a way that only 32-bit pointers were used (intentionally or unintentionally), or that they triggered a bug in SeqAn that we are not aware of. We might be able to help in the first case, and we will be able to help in the second.
Bests, Manuel