FU Logo
  • Startseite
  • Kontakt
  • Impressum
  • Home
  • Listenauswahl
  • Anleitungen

Re: [Seqan-dev] SAM issue?

<-- thread -->
<-- date -->
  • From: Mat <matthias.dodt@mdc-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Mon, 21 Mar 2011 11:11:13 +0100
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] SAM issue?

Hi Manuel!

I tried this but the generated SAM looks fine. Still i get the wrong coordinates displayed- I simplified the loop to a basic output (cpp attached).

So i tried to increase the filesize to see what happens:

head -n 70000 test.sam > bug2.sam
illumina_80bp_3kb.000000571
19626,19706

head -n 80000 test.sam > bug2.sam
illumina_80bp_3kb.000000571
19628,19708

head -n 90000 test.sam > bug2.sam
illumina_80bp_3kb.000000571
19629,19709

Pretty weired! Looks to me like an overflow problem? I am running a Mac with OSX...

Could you check if you run into the same problems on windows?

I don't want to spam the list with parts the SAM file but you can download the entire sam file from here:

http://rapidshare.com/files/453617711/test.sam

Thanks!

best,

mat
 



Am 3/18/11 5:06 PM, schrieb Manuel Holtgrewe:
Another thing you could try is to read the SAM file, and write it out immediately after and see whether there are any problems with the generated SAM.

*m

Am 18.03.2011 um 17:05 schrieb Mat:

Hi Manuel&thanks for the reply!

I only send you the head of the SAM file i used- i don't get the issue with that one either - so now i have to figure out what's wrong there...

Ill text you guys if i find an issue - thanks a lot!

cheers

mat

Am 3/18/11 4:46 PM, schrieb Manuel Holtgrewe:

Hi, Mat.

Here is what I get:

$ ./demos/SAMtest -s bug.sam | grep -A 1 571
illumina_80bp_3kb.000000571
illumina_80bp_3kb.000000571
19625,19705 and 22480,22400

Which is OK since indices are 1-based in SAM and 0-based in SeqAn.

I just saw that there were uninitialized variables in your code, try to compile it with all warnings enabled and in optimized mode (the compiler only detects certain things when you tell him to think harder about the code). Me getting the right values (probably my OS initializes data differently from yours), and you wrong indicates problems with initializations.

Normally, I would also point you at valgrind, but it did not find the uninitialized memory on the stack :)

Below is the C++ output:

Building CXX object demos/CMakeFiles/SAMtest.dir/Users/manuel/Development/seqan-trunk/projects/library/demos/SAMtest.cpp.o
/Users/manuel/Development/seqan-trunk/projects/library/demos/SAMtest.cpp: In function ‘int main(int, const char**)’:
/Users/manuel/Development/seqan-trunk/projects/library/demos/SAMtest.cpp:57: warning: ‘cId1’ may be used uninitialized in this function
/Users/manuel/Development/seqan-trunk/projects/library/demos/SAMtest.cpp:57: warning: ‘id1’ may be used uninitialized in this function



HTH

Am 18.03.2011 um 15:52 schrieb Mat:

Hey guys!

I have a strange issue loading a SAM-File into a FragmentStore. The SAM file has the following paired-read alignment:
...
illumina_80bp_3kb.000000571     99      contig00002     19626   60      80M     =       22401   2855
illumina_80bp_3kb.000000571     147     contig00002     22401   60      80M     =       19626   -285


I load this SAM file into a FragmentStore and sort it by PairMatchId. Then i extract all the pairs that map on the same contig:

illumina_80bp_3kb.000000571
illumina_80bp_3kb.000000571
19638,19718 and 22495,22415

I don't really get why these coordinates differ?!

Thanks a lot!!

(see sourcecode attached)...
<bug.sam><SAMtest.cpp><ATT00001..txt>

<ATT00001..txt>

//============================================================================
// Name        : SAMtest.cpp
// Author      : 
// Version     :
// Copyright   : Your copyright notice
// Description : Hello World in C++, Ansi-style
//============================================================================

#include <iostream>
#include <iostream>
#include <sstream>
#include <seqan/sequence.h>  // CharString, ...
#include <seqan/file.h>      // to stream a CharString into cout
#include <seqan/misc/misc_cmdparser.h>
#include <seqan/store.h>
#include <seqan/file.h>
#include <seqan/graph_types.h>

using namespace seqan;

int main(int argc, const char* argv[]) {
	CommandLineParser cmdParser("parser");
	addOption(cmdParser,CommandLineOption("s","sam", "SAM file(s) with aligned reads", OptionType::String));

	if(!parse(cmdParser,argc,argv,std::cerr)){
			std::cerr << "Error parsing command line. Check the mandatory parameters." << std::endl;
			help(cmdParser,std::cerr);
			return -1;
		}
	std::string SAMFileName;
	typedef FragmentStore<> TFragmentStore;
	TFragmentStore fragStore;

	typedef TFragmentStore::TAlignedReadStore ARStore;
	typedef TFragmentStore::TReadStore TReadStore;
	typedef Value<TReadStore>::Type TReadStoreElement;

	typedef unsigned int TID;
	typedef unsigned int TValue;

	if(getOptionValueLong(cmdParser,"sam",SAMFileName)&&(length(SAMFileName)>0)){
			std::cout << "Loading SAM file " << SAMFileName << std::endl;
			std::fstream strmReads(SAMFileName.c_str(), std::fstream::in | std::fstream::binary);
			if(!strmReads.is_open()){
				std::cerr << "Couldn't open file " << SAMFileName.c_str() << std::endl;
				return -1;
			}
			else{
				read(strmReads, fragStore, seqan::Sam());
				std::cout << "Reads cached..." << std::endl;


				unsigned int nrAlignments = length(fragStore.alignedReadStore);
				//sortAlignedReads(fragStore.alignedReadStore, SortPairMatchId());
				//sortAlignedReads(fragStore.alignedReadStore, SortId());

				//std::fstream strmWrite("/Users/mat/test2.sam",std::fstream::out | std::fstream::binary);
				//write(strmWrite,fragStore,seqan::Sam());

				TID lastPairMatchId = TReadStoreElement::INVALID_ID;
				TID id1,id2,cId1,cId2;
				//Iterate over aligned reads (sorted by pairMatchId)
				for (unsigned int n=0;n<nrAlignments;++n){
					//if current alignment has a mate...
					std::cout << "n:" << n << " id: " << fragStore.alignedReadStore[n].id << std::endl;
					id1 = fragStore.alignedReadStore[n].id;
					TValue start1 = fragStore.alignedReadStore[id1].beginPos;
					TValue end1 = fragStore.alignedReadStore[id1].endPos;
					//TValue l1 = length(fragStore.contigStore[cId1].seq);
																	//	TValue l2 = length(fragStore.contigStore[cId2].seq);
					std::cout << fragStore.readNameStore[fragStore.alignedReadStore[id1].readId]<<std::endl;
											std::cout << start1 << "," << end1 << std::endl;

					if(fragStore.readNameStore[fragStore.alignedReadStore[id1].readId]=="illumina_80bp_3kb.000000571"){
						//std::cout << fragStore.readNameStore[fragStore.alignedReadStore[id1].readId]<<std::endl;
						//std::cout << start1 << "," << end1 << std::endl;
											exit(0);
					}

					/*if(fragStore.alignedReadStore[n].pairMatchId!=TReadStoreElement::INVALID_ID){

						//indexed by pairMatchID
						if(lastPairMatchId == fragStore.alignedReadStore[n].pairMatchId){
							id2 = fragStore.alignedReadStore[n].id;
							cId2 = fragStore.alignedReadStore[n].contigId;

							//now check if contigs do not span...
							if(cId1==cId2){
								//Now check coordinates
								TValue start1 = fragStore.alignedReadStore[id1].beginPos;
								TValue end1 = fragStore.alignedReadStore[id1].endPos;
								TValue start2 = fragStore.alignedReadStore[id2].beginPos;
								TValue end2 = fragStore.alignedReadStore[id2].endPos;
								//TValue l1 = length(fragStore.contigStore[cId1].seq);
																				//	TValue l2 = length(fragStore.contigStore[cId2].seq);


								if(fragStore.readNameStore[fragStore.alignedReadStore[id1].readId]=="illumina_80bp_3kb.000000571"){
									std::cout << fragStore.readNameStore[fragStore.alignedReadStore[id1].readId]<<std::endl;
									std::cout << fragStore.readNameStore[fragStore.alignedReadStore[id2].readId]<<std::endl;
									std::cout << start1 << "," << end1 << " and " << start2 << "," << end2 << std::endl;
								}



							}

							lastPairMatchId = TReadStoreElement::INVALID_ID;
						}
						else {
							id1 = fragStore.alignedReadStore[n].id;
							cId1 = fragStore.alignedReadStore[n].contigId;
							lastPairMatchId = fragStore.alignedReadStore[n].pairMatchId;
						}
					}*/
				}
			}
		}
		else {
			std::cout << "No SAM file specified...\n" << std::endl;
		}

	return 0;
}
<-- thread -->
<-- date -->
  • Follow-Ups:
    • Re: [Seqan-dev] SAM issue?
      • From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
  • References:
    • [Seqan-dev] SAM issue?
      • From: Mat <matthias.dodt@mdc-berlin.de>
    • Re: [Seqan-dev] SAM issue?
      • From: Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de>
    • Re: [Seqan-dev] SAM issue?
      • From: Mat <matthias.dodt@mdc-berlin.de>
    • Re: [Seqan-dev] SAM issue?
      • From: Manuel Holtgrewe <manuel.holtgrewe@fu-berlin.de>
  • seqan-dev - March 2011 - Archives indexes sorted by:
    [ thread ] [ subject ] [ author ] [ date ]
  • Complete archive of the seqan-dev mailing list
  • More info on this list...

Hilfe

  • FAQ
  • Dienstbeschreibung
  • ZEDAT Beratung
  • postmaster@lists.fu-berlin.de

Service-Navigation

  • Startseite
  • Listenauswahl

Einrichtung Mailingliste

  • ZEDAT-Portal
  • Mailinglisten Portal