Hi Hieu,
This incomprehensible error message means that the index construction algorithm ran out of disk space! The suffix array construction algorithm implemented in SeqAn (Skew7) works on external memory and requires at least 20-25 times the size of the input text in free disk space. The algorithm writes this temporary data to the system’s temporary folder (usually /tmp). In general, you can change this folder by (re)defining the environment variable TMPDIR.
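If you want to check for enough room before starting a long construction, here is a minimal sketch in plain C++17. It is not part of SeqAn: the names requiredScratchBytes and hasEnoughScratchSpace are made up for illustration, and it assumes the scratch folder is whatever std::filesystem::temp_directory_path() reports (on POSIX systems this may follow TMPDIR).

```cpp
#include <cstdint>
#include <filesystem>

namespace fs = std::filesystem;

// Scratch space needed for an input of inputBytes. The default factor is the
// upper end of the 20-25x range quoted above (hypothetical helper, not SeqAn API).
constexpr std::uintmax_t requiredScratchBytes(std::uintmax_t inputBytes,
                                              std::uintmax_t factor = 25)
{
    return inputBytes * factor;
}

// True if scratchDir currently has enough free space for the construction.
// fs::space throws fs::filesystem_error if the directory cannot be queried.
inline bool hasEnoughScratchSpace(std::uintmax_t inputBytes,
                                  const fs::path & scratchDir = fs::temp_directory_path())
{
    return fs::space(scratchDir).available >= requiredScratchBytes(inputBytes);
}
```

Calling hasEnoughScratchSpace(fs::file_size(genomeFile)) before building the index, where genomeFile is the path to your input, lets the program fail early with a readable message.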
Concerning any application that you write using SeqAn (or the MiniBowtie demo): you should redefine the SAValue metafunction to reduce the size of the suffix array. By default, SAValue is a Pair<__uint64, __uint64>. For hg19, you can redefine SAValue as follows:
namespace seqan {

template <typename TString, typename TSpec>
struct SAValue<StringSet<TString, TSpec> >
{
    typedef Pair<__uint8, __uint32, Pack> Type;
};

}
In this way each SA value will consume only 5 bytes (1 byte to index any sequence in the text collection + 4 bytes to index any position within any sequence).
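To make the saving concrete, here is a small standalone sketch that does not use SeqAn at all. The two structs are only stand-ins that mimic the layout of the default and the packed Pair, and the hg19 length is a rounded figure I am assuming (about 3.1 billion positions), so treat the numbers as estimates.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the default Pair<__uint64, __uint64> (illustration only).
struct DefaultSAValue
{
    std::uint64_t seqId;   // which sequence in the collection
    std::uint64_t seqPos;  // position within that sequence
};

// Stand-in for Pair<__uint8, __uint32, Pack>: packing removes the padding
// the compiler would otherwise insert after the 1-byte member.
#pragma pack(push, 1)
struct PackedSAValue
{
    std::uint8_t  seqId;   // up to 256 sequences
    std::uint32_t seqPos;  // up to 2^32 positions per sequence
};
#pragma pack(pop)

static_assert(sizeof(DefaultSAValue) == 16, "two 8-byte fields");
static_assert(sizeof(PackedSAValue) == 5, "1 + 4 bytes, no padding");

// Size of a suffix array with one entry per text position.
constexpr std::uint64_t suffixArrayBytes(std::uint64_t textLength,
                                         std::size_t entryBytes)
{
    return textLength * entryBytes;
}

// Assumed, rounded length of hg19; not an exact figure.
constexpr std::uint64_t kApproxHg19Length = 3100000000ULL;
```

With these assumptions, suffixArrayBytes(kApproxHg19Length, sizeof(DefaultSAValue)) gives about 49.6 GB, while the packed variant gives about 15.5 GB.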
Concerning Masai: please upgrade to Yara (http://www.seqan.de/projects/yara/) if you haven’t already!
Enrico
On 27 Nov 2014, at 05:26, Tran Ngoc Hieu (Dr) <NHTran@ntu.edu.sg> wrote:
|