Note that SeqAn rounds up the allocated memory when using Generous() allocation (the default), so the problem might occur earlier than you expect.
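To make the rounding concrete, here is a rough sketch. It assumes that Generous() reserves about 1.5 times the requested length; that factor and the helper name generousCapacity are my assumptions for illustration, not SeqAn's actual code, so check the SeqAn sources for the exact rule.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Generous() growth rule.
// Assumption: the reserved capacity is the requested length plus one half,
// with a small minimum for tiny strings.
std::uint64_t generousCapacity(std::uint64_t requested)
{
    if (requested < 32)
        return 32;
    return requested + (requested >> 1);
}
```

Under that assumption, a request of 800 MB asks the allocator for one contiguous block of about 1200 MB, which is much harder to satisfy in a 32-bit process than the file size alone suggests.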
I reproduced your error for the 1.5 GB case on Windows 7 with Visual Studio 2010, using a 2 GB file in a 32-bit build.
I tracked the error down to a failing MapViewOfFile() call. This is a Windows API call, so you might want to look up the limitations of this function.
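As a back-of-the-envelope check of why mapping the whole file can fail in a 32-bit build, consider the sketch below. It assumes the default 2 GiB of user-mode address space for a 32-bit Windows process; fitsInSingleView and the alreadyReserved figure are hypothetical names for illustration and not part of the Windows API or SeqAn.

```cpp
#include <cstdint>

const std::uint64_t GiB = 1024ull * 1024 * 1024;

// Optimistic upper bound: can a file of fileSize bytes be mapped as one
// contiguous view? Address-space fragmentation usually makes the real
// limit lower than this check suggests.
bool fitsInSingleView(std::uint64_t fileSize,
                      std::uint64_t userAddressSpace,
                      std::uint64_t alreadyReserved)
{
    if (alreadyReserved >= userAddressSpace)
        return false;
    return fileSize <= userAddressSpace - alreadyReserved;
}
```

With 2 GiB of address space and even a few megabytes already taken by the executable, DLLs, heap and stack, a 2 GB file cannot fit in a single view. A 1.5 GB file passes this optimistic check but can still fail when no contiguous free region of that size is left.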
The assignSeq() function itself is written such that only constant memory plus the resulting string is required for reading files.
I also reproduced your other problem with an 800 MB file. Here, the internal aligned malloc call returns NULL, which is not handled by SeqAn.
I already added a ticket for this here: https://trac.seqan.de/ticket/925
Note that this ticket only asks for notifying the user that a malloc error happened; it would not enable you to make the function work in your configuration.
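For illustration, handling the failed allocation could look roughly like the following. This is a generic sketch using plain std::malloc; checkedAlloc is a hypothetical helper and not SeqAn's internal aligned allocation code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: report a failed allocation instead of handing a
// NULL pointer to the caller, who would crash on it later.
void* checkedAlloc(std::size_t bytes)
{
    void* p = std::malloc(bytes);
    if (p == nullptr && bytes != 0)
        throw std::bad_alloc();
    return p;
}
```

A check like this turns the crash into a clear error, but it does not make more memory available.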
On 12/16/2011 09:05 PM, Johannes Merkle wrote:
Hello, I'm trying to read a sequence from a textfile has a size of around 3gigabytes based on the tutorial to import millions of sequences. For files bigger than around 1.5gig the program of the tutorial immediately stops because open(multiSeqFile.concat, argv[1], OPEN_RDONLY) returns false. For file(s) (since i only tested it with one file) bigger than around 700mb the program crashs on an invalid null pointer after a while. What are the limitations here? I'm running this on a win7 64 machine with 4gig of ram. The program is compiled in visual studio 2010 Is there a better way to do this? Since i'm not trying to read multiple sequences but only one that is very large. Thanks, Johannes