Re: [Seqan-dev] lambda_indexer trouble

  • From: Jose Manuel Duarte <jose.duarte@psi.ch>
  • To: Hannes Hauswedell <hannes.hauswedell@fu-berlin.de>, <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 8 May 2015 12:55:28 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] lambda_indexer trouble

Hi Hannes

Thanks so much for the answers. Some comments below


> If this is a different error from the one below, it is unexpected. Can you
> open an issue for this in the seqan bug tracker with a link to the exact file
> used? Please note that the requirements for free disk space for skew are very
> high (see below).

The error looked different, just a segfault without a trace. But of course it must have been the disk space as you explain.

> Indeed the requirements for disk space are quite high for skew. As described
> in the help-page, I have measured 30x. So if your file is 8GB and say 6GB of
> this is sequence data, then the external space requirement might well be
> 180GB...
>
> You might want to try the quicksort or quicksortbuckets algorithms. They don't
> require external disk space and if you have 128GB of RAM, this should be
> enough to build the index for your 8GB file.

Alright, I'll try that, thanks for the tip. But at the end of the day I would like to index the current UniRef100. Following your estimates I would need something like 600GB of disk space for that... I might still be able to try it, but surely in a few months from now UniRef100 will have grown to a size that is impossible to deal with. It's great that you guys are already working on new algos for indexing :)
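
For reference, the arithmetic behind these numbers can be sketched in a few lines of Python. This is only a rough sketch that assumes the measured 30x factor for skew scales linearly with the amount of sequence data; the function names and the optional free-space check are made up for illustration and are not part of lambda or SeqAn.

```python
import shutil

# Factor measured for the skew algorithm (from the help page, as quoted above).
# It is an estimate, not a guarantee.
SKEW_DISK_FACTOR = 30


def estimate_skew_disk_gb(sequence_gb, factor=SKEW_DISK_FACTOR):
    """Estimate the external disk space (in GB) skew might need."""
    return sequence_gb * factor


def enough_disk_for_skew(sequence_gb, free_gb=None, path="."):
    """Compare the estimate with the free space on the volume holding `path`.

    `free_gb` can be passed explicitly; otherwise it is read from the
    file system via shutil.disk_usage.
    """
    if free_gb is None:
        free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= estimate_skew_disk_gb(sequence_gb)


# 6GB of sequence data, as in the example above.
print(estimate_skew_disk_gb(6))  # 180
```

Read the other way round, the 600GB figure corresponds to roughly 20GB of sequence data under the same assumption.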

On an unrelated note, I also tried out the pre-indexed nr files you guys distribute from your website. There I get this:

./bin/lambda -q query.fasta -d nr/nr.fasta -p blastp
LAMBDA - the Local Aligner for Massive Biological DatA
======================================================
Version 0.4.7

Loading Subj Sequences… done.
Loading Subj Ids…/home/mi/h4nn3s/takifugu/seqan-lambda-v0.4.7/core/include/seqan/basic/basic_exception.h:345 FAILED! (Uncaught exception of type std::bad_alloc: std::bad_alloc)

stack trace:
  0          [0xa97a0e]  seqan::ClassTest::fail() + 0xe
  1          [0x8fd5a2]  ./bin/lambda()
  2         [0x1510ed6]  __cxxabiv1::__terminate(void (*)()) + 0x6
  3         [0x1510f03]  ./bin/lambda()
  4         [0x151131e]  ./bin/lambda()
  5         [0x151121d]  operator new(unsigned long) + 0x7d
  6          [0xfb8dd2]  void seqan::AssignString_<seqan::Tag<seqan::TagExact_> >::assign_<seqan::String<char, seqan::Alloc<void> >, seqan::String<char, seqan::External<seqan::ExternalConfigLarge<seqan::File<seqan::Async<void> >, 4194304u, 2u> > > const>(seqan::String<char, seqan::Alloc<void> >&, seqan::String<char, seqan::External<seqan::ExternalConfigLarge<seqan::File<seqan::Async<void> >, 4194304u, 2u> > > const&) + 0x192
  7          [0x919266]  ./bin/lambda()
  8          [0xfe74d0]  int loadSubjects<(seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0, seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > >, seqan::Score<int, seqan::ScoreMatrix<seqan::SimpleType<unsigned char, seqan::AminoAcid_>, seqan::Blosum62_> >, seqan::FMIndex<void, seqan::FMIndexConfig<void> > >(GlobalDataHolder<seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > >, seqan::Score<int, seqan::ScoreMatrix<seqan::SimpleType<unsigned char, seqan::AminoAcid_>, seqan::Blosum62_> >, seqan::FMIndex<void, seqan::FMIndexConfig<void> >, (seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0>&, LambdaOptions const&) + 0x230
  9         [0x14e15f5]  int argConv2<(seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0, seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > > >(LambdaOptions const&, seqan::Tag<seqan::BlastFormat_<(seqan::BlastFormatFile)8, (seqan::BlastFormatProgram)1, (seqan::BlastFormatGeneration)0> > const&, seqan::SimpleType<unsigned char, seqan::ReducedAminoAcid_<seqan::Tag<seqan::Murphy10_> > > const&) + 0x335
 10         [0x15078ec]  argConv0(LambdaOptions const&) + 0x6c
 11          [0x8e5cb7]  main + 0x3e7
 12    [0x7fea204d0a40]  __libc_start_main + 0xf0
 13          [0x8e6299]  ./bin/lambda()

Aborted (core dumped)






I am assuming that the nr db is a protein database and that I can run protein queries (blastp) against it, is that right?

Thanks for all the help!

Jose


