Re: [Seqan-dev] Reading very long sequences from file

  • From: Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Mon, 19 Dec 2011 17:10:27 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Reading very long sequences from file

Short answer: try the "Visual Studio 10 Win64" CMake generator. Your problem appears to be that you are doing 32-bit builds, so your application can use only 2 GB of memory. Switching to a 64-bit build should resolve the problems described below.

Note that SeqAn rounds up the allocated memory when using Generous() allocation (the default), so the problem might occur earlier than you expect.
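
To get a feeling for the effect of rounding up, here is a small arithmetic sketch. It assumes a growth rule of "at least 32 elements, otherwise 1.5 times the requested size". That factor is an assumption for illustration only, so please check the SeqAn sources of your version for the actual rule. The function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of "generous" rounding (ASSUMPTION: factor 1.5,
// minimum 32 elements; not taken from this thread).
inline std::uint64_t generousCapacity(std::uint64_t requested)
{
    if (requested < 32)
        return 32;
    return requested + requested / 2;
}

// Bytes requested for a sequence of the given size in MB,
// with one byte per character, under the assumed rule above.
inline std::uint64_t generousBytes(std::uint64_t sizeInMB)
{
    return generousCapacity(sizeInMB * 1024 * 1024);
}
```

Under this assumption, a sequence of 800 MB would ask for about 1200 MB in one contiguous block, which can be hard to find in the 2 GB address space of a 32-bit process.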


I reproduced the error you see with files above roughly 1.5 GB on Windows 7 with Visual Studio 2010, using a 2 GB file in a 32-bit build.

I tracked the error down to a failing MapViewOfFile() call. This is a Windows API call, so you might want to look up the limitations of this function.

The assignSeq() function itself is written such that only constant memory plus the resulting string is required for reading files.
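
To illustrate what "constant memory plus the resulting string" means, here is a minimal sketch in plain standard C++. It is not SeqAn's assignSeq() implementation; the names readSequenceChunked() and stripLineBreaks() are hypothetical. The file is read through a fixed 64 KiB buffer, so the only memory that grows with the input is the result string.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Illustrative only, not SeqAn code: append all characters from `in`
// to `result`, skipping line breaks. Uses a fixed-size buffer, so the
// extra memory is constant regardless of the input size.
// Returns the number of characters appended.
inline std::size_t readSequenceChunked(std::istream & in, std::string & result)
{
    char buffer[65536];
    std::size_t appended = 0;
    // read() fails on the last, partial chunk; gcount() still reports it.
    while (in.read(buffer, sizeof(buffer)) || in.gcount() > 0)
    {
        std::streamsize n = in.gcount();
        for (std::streamsize i = 0; i < n; ++i)
        {
            char c = buffer[i];
            if (c != '\n' && c != '\r')
            {
                result.push_back(c);
                ++appended;
            }
        }
    }
    return appended;
}

// Convenience wrapper for trying the function on an in-memory string.
inline std::string stripLineBreaks(std::string const & text)
{
    std::istringstream in(text);
    std::string result;
    readSequenceChunked(in, result);
    return result;
}
```

For a real file you would pass a std::ifstream opened in binary mode instead of the string stream. The result string itself still has to fit into the address space of the process, which is exactly where the 32-bit build runs into trouble.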


I also reproduced your other problem with an 800 MB file. Here, the internal aligned malloc call returns NULL, which is not handled by SeqAn.
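
For illustration, this is roughly what handling a failed allocation could look like. It is a sketch using plain std::malloc rather than the aligned variant SeqAn calls internally, and checkedMalloc() is a hypothetical name, not a SeqAn function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical wrapper (not part of SeqAn): allocate `bytes` bytes and
// throw std::bad_alloc instead of handing a null pointer to the caller.
// The caller releases the memory with std::free().
inline void * checkedMalloc(std::size_t bytes)
{
    void * p = std::malloc(bytes);
    if (p == nullptr && bytes != 0)
        throw std::bad_alloc();
    return p;
}
```

With a check like this, the program would stop with a clear out-of-memory error instead of crashing later on a null pointer. It would not make the allocation itself succeed.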

I already added a ticket for this here:

https://trac.seqan.de/ticket/925

Note that this ticket only asks for notifying the user that a malloc error happened; it would not make the function work in your configuration.

On 12/16/2011 09:05 PM, Johannes Merkle wrote:
Hello,

I'm trying to read a sequence from a text file that has a size of around
3 gigabytes, based on the tutorial to import millions of sequences.

For files bigger than around 1.5 GB, the program of the tutorial
immediately stops because open(multiSeqFile.concat, argv[1],
OPEN_RDONLY) returns false.

For a file bigger than around 700 MB (I only tested it with one file),
the program crashes on an invalid null pointer after a while.

What are the limitations here? I'm running this on a Win7 64-bit machine
with 4 GB of RAM. The program is compiled in Visual Studio 2010.

Is there a better way to do this? I'm not trying to read multiple
sequences, only one that is very large.

Thanks,
Johannes




  • References:
    • [Seqan-dev] Reading very long sequences from file
      • From: Johannes Merkle <merkle@in.tum.de>