Have you tried storing the database file (including lambda's files) in a shared-memory filesystem, e.g. /dev/shm? If you do this, all data will already be in main memory when the program is started -- however, it will still need to be copied around, so of course it's not optimal. Also, during the program's run-time the sequences will be both in the program's allocated memory and in the shm, so they will effectively use double the space. But it might still be worthwhile for you; I can't say without knowing the exact use-case and the hardware available.
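Just to illustrate what I mean, here is a rough sketch (Python 3). All the paths, file names and the final command line are placeholders, not lambda's actual files or invocation -- substitute your real database/index files and your real search command:

import shutil
import subprocess
from pathlib import Path

# tmpfs lives in RAM, so anything copied here is read without touching the disk
shm = Path("/dev/shm/lambda_db")
shm.mkdir(exist_ok=True)

# copy the database plus whatever index files lambda produced next to it
for f in Path(".").glob("db.fasta*"):          # hypothetical file names
    shutil.copy2(f, shm / f.name)

# point the search at the copy in shared memory (placeholder command line)
subprocess.run(["your_lambda_search_command", "--db", str(shm / "db.fasta")],
               check=True)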
I've tried /dev/shm already, but it didn't make a difference. Here are the runtimes from 3 consecutive runs (269 sequences in one file against a database that takes 7.7 GB as a plain-text FASTA file).
Reading from disk:
real 1m49.591s
real 1m49.259s
real 1m49.282s

Reading from /dev/shm:
real 1m49.480s
real 1m49.290s
real 1m49.007s

As you say, the data still needs to be copied around, and that is most likely where most of that time is spent (the "Loading Subj Sequences" and "Loading Subj Ids" steps seem to be the slow ones).
My guess is that there's also a lot of disk buffering happening when the data is read from disk: the system I'm running on has 128 GB of memory and isn't heavily loaded at the moment, so I'm sure it has enough memory to keep all the files in the buffer cache. That would explain why there's not much difference between the disk and the /dev/shm runs.
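A quick way to check that hypothesis would be something along these lines ("db.fasta" is a placeholder for the actual database file): read the file through twice and compare the timings. If the second pass is much faster, the data really is being served from the page cache.

import time

def read_through(path, chunk=1 << 20):
    # stream the file in 1 MiB chunks and return the number of bytes read
    total = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk)
            if not block:
                break
            total += len(block)
    return total

for label in ("first pass (whatever the cache holds now)", "second pass (warm)"):
    start = time.perf_counter()
    n = read_through("db.fasta")   # placeholder path for the 7.7 GB database
    print(f"{label}: {n / 1e9:.1f} GB in {time.perf_counter() - start:.1f} s")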
Cheers Jose