From theo@stillwater-sc.com Sat Mar 02 00:00:29 2013
From: Theodore Omtzigt
Organization: Stillwater Supercomputing, Inc.
Date: Fri, 01 Mar 2013 18:00:11 -0500
To: SeqAn Dev List
Subject: [Seqan-dev] Fastq test files

I just got a set of FASTQ test files from Illumina BaseSpace and SeqAn is barfing on them, reporting INVALID_FORMAT.

s_G1_L001_I1_001.fastq.1,
s_G1_L001_I1_002.fastq.1,
s_G1_L001_R1_001.fastq.1,
s_G1_L001_R1_002.fastq.1,
s_G1_L001_R2_001.fastq.1,
s_G1_L001_R2_002.fastq.1
Here is a quick snippet of the first file:

@:89:A0172:1:1:12008:1323 1:N:0:1
TTAGGC
+
;B@FFF
@:89:A0172:1:1:15627:1329 1:N:0:1
TTAGGC
+
@CCFFF
@:89:A0172:1:1:19263:1331 1:N:0:1
TTAGGC
+
@@CDDF
@:89:A0172:1:1:24249:1331 1:N:0:1
TTAGGC
+
BCCFFF
@:89:A0172:1:1:15721:1332 1:N:0:1
TTAGGC
+
<@<DAD
@:89:A0172:1:1:15433:1333 1:N:0:1
TTAGGC


Would it be possible to include a couple of very short test files in the SeqAn src tree, say under seqan/data, that do pass successfully through readRecord() so that software development can continue while I/O issues are sorted out?

Would also be nice to know why these Illumina BaseSpace files don't pass.
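
One way to narrow down a failure like this, independent of SeqAn, is a purely structural check of the files. The sketch below is not SeqAn code: `FastqCheck`, `stripCr`, and `checkFastq` are hypothetical names introduced for illustration. It assumes the simple four-line FASTQ layout shown in the snippet (no wrapped sequence lines) and, as a further assumption, tolerates CRLF line endings. It reads records in fixed groups of four lines because quality lines in the snippet, such as `@CCFFF`, themselves begin with `@`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the check on in-memory text
#include <string>

// Hypothetical helper type, not part of SeqAn: result of a structural check.
struct FastqCheck {
    bool ok;              // true if every record was well-formed
    std::size_t records;  // number of complete, valid records seen
    std::size_t line;     // 1-based line number of the first problem (0 if none)
    std::string message;  // human-readable reason (empty if none)
};

// Remove a trailing carriage return so CRLF files are handled like LF files.
static void stripCr(std::string& s) {
    if (!s.empty() && s.back() == '\r') s.pop_back();
}

// Check that the input consists of four-line FASTQ records:
//   @id / sequence / +[id] / qualities (same length as the sequence).
// Records are consumed four lines at a time, so a quality line that starts
// with '@' is never mistaken for the header of a new record.
FastqCheck checkFastq(std::istream& in) {
    FastqCheck result{true, 0, 0, ""};
    std::string id, seq, plus, qual;
    std::size_t lineNo = 0;
    while (std::getline(in, id)) {
        ++lineNo;
        stripCr(id);
        if (id.empty()) continue;  // tolerate blank lines between records
        if (id[0] != '@')
            return {false, result.records, lineNo, "header does not start with '@'"};
        if (!std::getline(in, seq) || !std::getline(in, plus) || !std::getline(in, qual))
            return {false, result.records, lineNo, "record is truncated"};
        stripCr(seq);
        stripCr(plus);
        stripCr(qual);
        if (plus.empty() || plus[0] != '+')
            return {false, result.records, lineNo + 2, "separator does not start with '+'"};
        if (seq.size() != qual.size())
            return {false, result.records, lineNo + 3, "sequence and quality lengths differ"};
        assert(seq.size() == qual.size());  // invariant for every accepted record
        lineNo += 3;
        ++result.records;
    }
    return result;
}
```

Running such a check over each of the six files (for example via a `std::ifstream`) would report the line of the first structural problem, if there is one. Characters outside the target alphabet are a separate question that this check does not cover.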
From manuel.holtgrewe@fu-berlin.de Sat Mar 02 00:17:15 2013
From: "Holtgrewe, Manuel"
Date: Sat, 02 Mar 2013 00:17:11 +0100
To: SeqAn Development
Subject: Re: [Seqan-dev] Fastq test files
In-Reply-To: <513132FB.7050804@stillwater-sc.com>

Hi Theo,

we already have nightly tests:

http://cdash.seqan.de/index.php?project=SeqAn

Are you using SequenceStream? What does your source code look like?

What are you reading the sequences into? DnaString? CharString? Can you give more details here?

Your snippet parses nicely with SequenceStream.

Currently, there is a limitation: when reading sequences into a Dna5String, any character other than C, G, A, T, or N causes an error. In the future, we will resolve this with a configuration object for the readRecord function that allows switching between raising an error and coercing to N for other characters (e.g. IUPAC characters indicating an A-C ambiguity).
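
To make the two behaviours concrete, here is a minimal standalone sketch of an error versus coerce-to-N switch. It is not the planned SeqAn configuration object, whose interface is not described here: `OnInvalidChar`, `isDna5`, and `toDna5` are hypothetical names, plain `std::string` stands in for Dna5String, and accepting lower-case input is an assumption of the sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical policy switch; the eventual SeqAn option may look different.
enum class OnInvalidChar { Error, CoerceToN };

// True for the five characters a Dna5 alphabet can hold (case-insensitive).
static bool isDna5(char c) {
    switch (std::toupper(static_cast<unsigned char>(c))) {
        case 'A': case 'C': case 'G': case 'T': case 'N':
            return true;
        default:
            return false;
    }
}

// Convert raw sequence text to upper-case Dna5 characters.
// Under OnInvalidChar::Error, the first invalid character makes the function
// return false and leaves `out` empty. Under OnInvalidChar::CoerceToN, every
// invalid character (e.g. the IUPAC code 'M' for "A or C") becomes 'N'.
bool toDna5(const std::string& raw, OnInvalidChar policy, std::string& out) {
    out.clear();
    for (char c : raw) {
        if (isDna5(c)) {
            out.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
        } else if (policy == OnInvalidChar::CoerceToN) {
            out.push_back('N');
        } else {
            out.clear();
            return false;
        }
    }
    assert(out.size() == raw.size());  // one output character per input character
    return true;
}
```

With this sketch, `toDna5("TTMGGC", OnInvalidChar::Error, out)` reports failure, while the same call with `OnInvalidChar::CoerceToN` yields `TTNGGC`.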

*m
