Re: [Seqan-dev] Client-server lambda?


Have you tried storing the database file (including lambda's files) in a
shared memory filesystem, e.g. /dev/shm ? If you do this all data will already
be in main memory when the program is started -- however it will still need to
be copied around, so of course it's not optimal. Also during program run-time
the sequences will both be in the program's allocated memory and in the shm,
so they will effectively use double the space. But it might still be
worthwhile for you, I can't say without knowing the exact use-case and
hardware available.

I've tried /dev/shm already but it didn't make a difference.

Here are the runtimes from 3 consecutive runs of each setup (269 sequences in one file, searched against a database that takes 7.7GB as a plain-text FASTA file).

Reading from disk:

real    1m49.591s
real    1m49.259s
real    1m49.282s

Reading from /dev/shm:

real    1m49.480s
real    1m49.290s
real    1m49.007s


As you say, the data still needs to be copied around, and that is most likely where most of the time is spent (the steps "Loading Subj Sequences" and "Loading Subj Ids" seem to be the slow ones).

My guess is that there's also a lot of disk buffering happening when the data is read from disk (the system I'm running on has 128GB of memory and is not heavily loaded at the moment, so I'm sure it has enough memory to keep all the files in the buffer cache). That would explain why there's not much difference between the disk and the /dev/shm runs.
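
One way to check the buffer-cache guess would be to time a plain sequential read of the database files, separately from lambda itself. The following is only a minimal sketch: the function name time_raw_read and the chunk size are made up for illustration, and it measures raw read time only, not the parsing or copying that lambda does while loading.

```python
import sys
import time


def time_raw_read(path, chunk_size=16 * 1024 * 1024):
    """Read a file sequentially; return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Pass the same file once from disk and once from /dev/shm.
    for path in sys.argv[1:]:
        n_bytes, seconds = time_raw_read(path)
        print(f"{path}: {n_bytes / 1e9:.2f} GB in {seconds:.2f} s")
```

If the disk copy and the /dev/shm copy both read in a few seconds, the files are probably being served from memory in both cases, and the remaining time in the runs above is spent elsewhere.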

Cheers

Jose