Python - Getting records which are different from two FASTQ files


I have two FASTQ files, f1.fastq and f2.fastq. f2.fastq is a smaller file containing a subset of the reads in f1.fastq. I want the reads that are in f1.fastq but not in f2.fastq. The following Python code does not seem to work. Can anyone suggest edits?

needed_reads = []
reads_array = []
chosen_array = []

for x in Bio.SeqIO.parse("f1.fastq", "fastq"):
    reads_array.append(x)

for y in Bio.SeqIO.parse("f2.fastq", "fastq"):
    chosen_array.append(y)

for y in chosen_array:
    for x in reads_array:
        if str(x.seq) != str(y.seq):
            needed_reads.append(x)

output_handle = open("diff.fastq", "w")
SeqIO.write(needed_reads, output_handle, "fastq")
output_handle.close()

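You can use sets for this: convert list1 and list2 to sets, and set(list1) - set(list2) gives the items that are in list1 but not in list2. The nested loops in the question append x every time it differs from some y, so almost every read ends up in the output, many times over. Note that set elements must be hashable and compare by value, so with Biopython it is safer to build the sets from record IDs or sequence strings than from the SeqRecord objects themselves.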
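As a minimal illustration of the set difference itself, here is a sketch on plain strings. The read IDs are made up and only stand in for the records of the two files.

```python
# Hypothetical read IDs standing in for the records of f1.fastq and f2.fastq
list1 = ["read1", "read2", "read3", "read4"]
list2 = ["read2", "read4"]

# Set difference: items in list1 that are not in list2 (a set has no order)
only_in_list1 = set(list1) - set(list2)

# To keep the original order of list1, filter against a set instead
lookup = set(list2)
ordered = [item for item in list1 if item not in lookup]

print(sorted(only_in_list1))
print(ordered)
```

The second form is usually the more convenient one for reads, because it keeps them in the order they appear in the first file.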

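Sample code, comparing by record ID because SeqRecord objects do not compare by value (recent Biopython versions deliberately leave record comparison unimplemented). It assumes read IDs are unique and identical in both files:

from Bio import SeqIO

# IDs of the reads present in the subset file
chosen_ids = set(rec.id for rec in SeqIO.parse("f2.fastq", "fastq"))

# Reads from f1.fastq whose ID is not in the subset, in their original order
needed_reads = [rec for rec in SeqIO.parse("f1.fastq", "fastq")
                if rec.id not in chosen_ids]

with open("diff.fastq", "w") as output_handle:
    SeqIO.write(needed_reads, output_handle, "fastq")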
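For large files, a streaming variant that filters by read ID avoids holding every record in memory. The following is a minimal sketch using only the standard library, not Biopython. It assumes every record occupies exactly four lines and that read IDs are unique and identical in both files. The helper names `read_fastq`, `read_id` and `write_difference` are hypothetical, introduced here for illustration.

```python
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file.

    Assumes the simple layout of exactly four lines per record.
    """
    with open(path) as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if len(record) < 4:  # end of file or incomplete trailing record
                break
            yield tuple(record)


def read_id(header):
    """Return the read ID: the header without '@', up to the first whitespace."""
    return (header[1:].split() or [""])[0]


def write_difference(full_path, subset_path, out_path):
    """Write the reads of full_path whose ID is absent from subset_path.

    Returns the number of reads written.
    """
    # Only the IDs of the smaller file are kept in memory
    subset_ids = {read_id(rec[0]) for rec in read_fastq(subset_path)}
    kept = 0
    with open(out_path, "w") as out:
        for rec in read_fastq(full_path):
            if read_id(rec[0]) not in subset_ids:
                out.write("\n".join(rec) + "\n")
                kept += 1
    return kept
```

With the file names from the question, the call would be `write_difference("f1.fastq", "f2.fastq", "diff.fastq")`.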
