Just to say that in the end I've managed to index UniRef100 2012_06, so I could finally run lambda and check that it lives up to expectations. It handled multiple query sequences in less than a second per query! That really is impressive; we will try to switch from blast as soon as possible. For our application (http://www.eppic-web.org) we need speed but not so much sensitivity: we are only interested in homologs with > 50% sequence identity.
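As an illustration of the > 50% identity cutoff mentioned above, here is a minimal sketch of how such a filter could be applied to search results. It assumes the hits are available as BLAST-style tab-separated lines with the percent identity in the third column (the usual 12-column tabular layout); the function name `filter_by_identity` and the sample rows are hypothetical and not part of lambda or EPPIC.

```python
def filter_by_identity(lines, min_identity=50.0):
    """Yield (query_id, subject_id, identity) for hits above the cutoff.

    Assumes BLAST-style tabular lines: query id, subject id and percent
    identity in the first three tab-separated columns (an assumption,
    not something confirmed for any particular lambda version).
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines such as "# Fields: ..."
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a hit line in the assumed layout
        identity = float(fields[2])
        # Strictly greater than the cutoff, matching "> 50%"
        if identity > min_identity:
            yield fields[0], fields[1], identity


if __name__ == "__main__":
    # Hypothetical sample rows, truncated to the first four columns.
    sample = [
        "# made-up example hits",
        "query1\tsubjA\t62.5\t200",
        "query1\tsubjB\t35.0\t180",
        "query2\tsubjC\t50.0\t90",
    ]
    for hit in filter_by_identity(sample):
        print(hit)
```

The comparison is strict, so a hit at exactly 50.0% identity is dropped; use `>=` if the cutoff should be inclusive.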
Cheers

Jose

On 08.05.2015 12:55, Jose Manuel Duarte wrote:
Hi Hannes

Thanks so much for the answers. Some comments below.

> If this is a different error from the one below, it is unexpected. Can you open an issue for this in the seqan bug tracker with a link to the exact file used? Please note that the requirements for free disk space for skew are very high (see below).

The error looked different, just a segfault without a trace. But of course it must have been the disk space as you explain.

> Indeed the requirements for disk space are quite high for skew. As described in the help-page, I have measured 30x. So if your file is 8GB and say 6GB of this is sequence data, then the external space requirement might well be 180GB...
>
> You might want to try the quicksort or quicksortbuckets algorithms. They don't require external disk space and if you have 128GB of RAM, this should be enough to build the index for your 8GB file.

Alright I'll try that, thanks for the tip. But at the end of the day I would like to index the current UniRef100. Following your estimates I would need something like 600GB of disk space for that... I might still be able to try it, but surely in a few months from now UniRef100 will have a size that will be impossible to deal with. It's great that you guys are already working on new algos for indexing :)

On an unrelated note, I also tried out the pre-indexed nr files you guys distribute from your website. There I get this:

./bin/lambda -q query.fasta -d nr/nr.fasta -p blastp
LAMBDA - the Local Aligner for Massive Biological DatA
======================================================
Version 0.4.7

Loading Subj Sequences… done.
Loading Subj Ids…/home/mi/h4nn3s/takifugu/seqan-lambda-v0.4.7/core/include/seqan/basic/basic_exception.h:345 FAILED!
(Uncaught exception of type std::bad_alloc: std::bad_alloc)

stack trace:
  0 [0xa97a0e] seqan::ClassTest::fail() + 0xe
  1 [0x8fd5a2] ./bin/lambda()
  2 [0x1510ed6] __cxxabiv1::__terminate(void (*)()) + 0x6
  3 [0x1510f03] ./bin/lambda()
  4 [0x151131e] ./bin/lambda()
  5 [0x151121d] operator new(unsigned long) + 0x7d
  6 [0xfb8dd2] void seqan::AssignString_<seqan::Tag<seqan::TagExact_> >::assign_<seqan::String<char, seqan::Alloc<void> >, seqan::String<char, seqan::External<seqan::ExternalConfigLarge<seqan::File<seqan::Async<void> >, 4194304u, 2u> > > const>(seqan::String<char, seqan::Alloc<void> >&, seqan::String<char, seqan::External<seqan::ExternalConfigLarge<seqan::File<seqan::Async<void> >, 4194304u, 2u> > > const&) + 0x192
  7 [0x919266] ./bin/lambda()
  8 [0xfe74d0] int loadSubjects<(seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0, seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > >, seqan::Score<int, seqan::ScoreMatrix<seqan::SimpleType<unsigned char, seqan::AminoAcid_>, seqan::Blosum62_> >, seqan::FMIndex<void, seqan::FMIndexConfig<void> > >(GlobalDataHolder<seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > >, seqan::Score<int, seqan::ScoreMatrix<seqan::SimpleType<unsigned char, seqan::AminoAcid_>, seqan::Blosum62_> >, seqan::FMIndex<void, seqan::FMIndexConfig<void> >, (seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0>&, LambdaOptions const&) + 0x230
  9 [0x14e15f5] int argConv2<(seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0, seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > > >(LambdaOptions const&, seqan::Tag<seqan::BlastFormat_<(seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0> > const&, seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > > const&) + 0x335
  10 [0x15078ec] argConv0(LambdaOptions const&) + 0x6c
  11 [0x8e5cb7] main + 0x3e7
  12 [0x7fea204d0a40] __libc_start_main + 0xf0
  13 [0x8e6299] ./bin/lambda()
Aborted (core dumped)

I am assuming that the nr db is protein and that I can do protein queries (blastp) against it, is that right?

Thanks for all the help!

Jose

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev