Dear Jose,

I am sorry to hear that it is not working for you as expected. The next Lambda version will contain indexing methods that are more memory-efficient. I will still try to answer your questions between the lines:

On Friday, 8 May 2015 at 10:27:01, Jose Manuel Duarte wrote:
> I am having a lot of trouble using lambda_indexer to index full
> UniRef100 fasta files. I followed the instructions on the Lambda website:
>
> % /path/to/segmasker -infmt fasta -in db.fasta -outfmt interval -out db.seg
>
> % bin/lambda_indexer -d db.fasta -s db.seg

That's correct.

> I first tried with the current UniRef100 release (2015_05), which is
> huge (a 26GB uncompressed fasta file), and then I came across the memory
> problems that are documented in lambda_indexer's help. So I ended up
> using "-a skew7ext" and ran it on the largest-memory system I had
> available (128GB). The program ran, but at some point after "Generating
> Index..." it died with a segfault and no other information.

If this is a different error from the one below, it is unexpected. Can you open an issue for this in the SeqAn bug tracker with a link to the exact file used? Please note that the free disk space requirements for skew are very high (see below).

> Then I decided to try a smaller UniRef, so I took an older release
> (2012_06, only an 8GB uncompressed fasta file). I ran again with "-a
> skew7ext" and this time it did go further, but eventually also died:
>
> Dumping unreduced Subj Sequences… done.
> Generating Index…Asynchronous I/O operation failed (waitFor): "Success"
> [...]

This is always an indicator of running out of disk space in the TMPDIR (see the P.S. below for how to check and redirect it).

> I'm pretty sure I have enough space available on the disk where I'm
> running it (>100GB). Is there anything obvious that I am doing wrong? Do
> you guys have any experience indexing large files like this?

Indeed, the disk space requirements for skew are quite high. As described in the help page, I have measured 30x the sequence data. So if your file is 8GB and, say, 6GB of that is sequence data, then the external space requirement may well be 180GB... You might want to try the quicksort or quicksortbuckets algorithms instead (see the P.P.S. below): they don't require external disk space, and if you have 128GB of RAM, that should be enough to build the index for your 8GB file.

> Apologies in advance if the message does not belong on the dev list. I
> couldn't find a more appropriate place to post it. I'd like to confirm
> this is a bug rather than me misusing the software before submitting an
> issue to GitHub.

Feel free to write to this list, post on GitHub, or write to me directly! All ways are accepted :)

Best regards,

--
Hannes Hauswedell
PhD student
Max Planck Institute for Molecular Genetics / Freie Universität Berlin

address    Institut für Informatik
           Takustraße 9, Room 019
           14195 Berlin
telephone  +49 (0)30 838-75241
fax        +49 (0)30 838-75218
e-mail     hannes.hauswedell@[molgen.mpg.de|fu-berlin.de]
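
P.S.: To rule out the temp-space problem before another long skew run, you can check how much space is free where temporary files land, and point TMPDIR at a bigger disk if needed. This is a plain-shell sketch; the scratch path is just a placeholder for whatever large volume you have:

% df -h "${TMPDIR:-/tmp}"               # free space where temp files go by default
% export TMPDIR=/scratch/bigdisk        # hypothetical path with >180GB free
% bin/lambda_indexer -d db.fasta -s db.seg -a skew7ext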
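
P.P.S.: A sketch of the in-memory run I suggested above, reusing the "-a" syntax from your skew7ext invocation. I am assuming the algorithm name is accepted verbatim by your Lambda version, so please double-check the names listed in lambda_indexer's help output:

% bin/lambda_indexer -d db.fasta -s db.seg -a quicksortbuckets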