Hello,

I am using SeqAn to access a large FASTA file. In this case, I am importing the whole RefSeq DB for random access (into memory or memory-mapped). This can be quite a huge file, so I decided to go for a dynamic strategy and wrote a generic SequenceStorage object. It works well for

    typedef seqan::String< seqan::Dna5 > StringType; // (default type)
    typedef seqan::String< seqan::Dna5, seqan::Packed<> > StringType;

but not for

    typedef seqan::String< seqan::Dna5, seqan::MMap<> > StringType;

Here is the code that imports the data using the MMap trick from the HowTo and puts it into a

    StringSet< StringType > data_;

with an index data structure

    std::map< std::string, long unsigned int > id2pos_;

--------------------------------------------------------------------------------
// Memory-map the FASTA file and split it into individual records.
seqan::MultiSeqFile db_sequences;
seqan::open( db_sequences.concat, filename.c_str(), seqan::OPEN_RDONLY );
seqan::split( db_sequences, seqan::Fasta() );

for( unsigned int i = 0; i < num_records; ++i )
{
    // Parse the sequence and the ID line of record i.
    StringType seq;
    seqan::assignSeq( seq, db_sequences[i], fasta_format_ );
    std::string id;
    seqan::assignSeqId( id, db_sequences[i], fasta_format_ );

    // Store the sequence and index its position by the "gi" field.
    id2pos_[ extractFastaCommentField( id, "gi" ) ] = seqan::assignValueById( data_, seq );
}
--------------------------------------------------------------------------------

1) seqan::assignValueById() causes a segfault at sequence number 33,924 out of 276,313 when using a StringSet with MMap strings.

2) Also, I don't know how to define a StringSet using array strings.

3) Using a regular Dna5 string, the whole operation takes about 5 minutes. A packed string takes much longer to load. Is there any way to speed this up? I could think of a (binary) sink for a StringSet to avoid parsing and recoding every time I load the DB sequences. Is there anything like this (planned)?

I appreciate your help!

Regards,
Johannes