Python - getting the records that differ between two FASTQ files
I have two FASTQ files, f1.fastq and f2.fastq. f2.fastq is a smaller file containing a subset of the reads in f1.fastq. I want the reads that are in f1.fastq but not in f2.fastq. The following Python code does not seem to work. Can anyone suggest edits?
    from Bio import SeqIO

    needed_reads = []
    reads_array = []
    chosen_array = []

    for x in SeqIO.parse("f1.fastq", "fastq"):
        reads_array.append(x)

    for y in SeqIO.parse("f2.fastq", "fastq"):
        chosen_array.append(y)

    # Compare every read in f2.fastq against every read in f1.fastq
    for y in chosen_array:
        for x in reads_array:
            if str(x.seq) != str(y.seq):
                needed_reads.append(x)

    output_handle = open("diff.fastq", "w")
    SeqIO.write(needed_reads, output_handle, "fastq")
    output_handle.close()
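A quick way to see why the nested loop misbehaves, sketched with plain strings standing in for read sequences (the data below is made up for illustration):

```python
# Plain strings stand in for read sequences (hypothetical data).
reads = ["AAAA", "CCCC", "GGGG"]   # plays the role of f1.fastq
chosen = ["CCCC", "GGGG"]          # plays the role of f2.fastq, a subset

# The nested loop from the question: x is appended once for every y it
# does not match, so reads present in both files still get through.
needed = []
for y in chosen:
    for x in reads:
        if x != y:
            needed.append(x)

print(needed)  # ['AAAA', 'GGGG', 'AAAA', 'CCCC']

# What was intended: keep x only if it matches none of the chosen reads.
wanted = [x for x in reads if all(x != y for y in chosen)]
print(wanted)  # ['AAAA']
```

Because each read from f1.fastq is checked against each read from f2.fastq separately, it only has to differ from one of them to be appended, and it can be appended several times.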
You can use sets to accomplish this. Convert list1 to a set and list2 to a set; then set(list1) - set(list2) gives the items in list1 that are not in list2.
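A small plain-Python illustration of that set difference, with made-up strings standing in for read sequences. It only works when the elements compare by value (strings do); Biopython `SeqRecord` objects do not compare equal just because their sequences match, so with SeqIO it is safer to build the sets from `str(record.seq)` or `record.id` than from the records themselves.

```python
list1 = ["AAAA", "CCCC", "GGGG", "TTTT", "AAAA"]  # e.g. sequences from f1.fastq
list2 = ["CCCC", "TTTT"]                          # e.g. sequences from f2.fastq

# Plain set difference: the result is unordered and duplicates collapse.
only_in_list1 = set(list1) - set(list2)
print(sorted(only_in_list1))  # ['AAAA', 'GGGG']

# Same idea, but keeps list1's order and any duplicates.
exclude = set(list2)
kept = [s for s in list1 if s not in exclude]
print(kept)  # ['AAAA', 'GGGG', 'AAAA']
```

Each `in` test against a set is a hash lookup, so this also avoids comparing every pair of reads as the nested loop does.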
Sample code:
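    from Bio import SeqIO

    # Load the reads; the order of reads_array is kept for the output.
    reads_array = list(SeqIO.parse("f1.fastq", "fastq"))
    chosen_array = list(SeqIO.parse("f2.fastq", "fastq"))

    # SeqRecord objects do not compare by content, so build the sets
    # from the sequence strings rather than from the records themselves.
    chosen_seqs = set(str(y.seq) for y in chosen_array)
    needed_seqs = set(str(x.seq) for x in reads_array) - chosen_seqs

    # Map the surviving sequences back to their full records.
    needed_reads = [x for x in reads_array if str(x.seq) in needed_seqs]

    with open("diff.fastq", "w") as output_handle:
        SeqIO.write(needed_reads, output_handle, "fastq")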
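If loading both files into lists uses too much memory for large FASTQ files, a streaming variant only needs to keep the IDs from the smaller file in memory. The sketch below is plain Python with hypothetical helper names (`read_fastq`, `read_id`, `write_difference`). It assumes each record is exactly four lines (no wrapped sequence or quality lines) and it matches reads by ID (the header text up to the first whitespace) rather than by sequence as in the question's code.

```python
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield the four raw lines (header, sequence, '+', quality) of each record.

    Hypothetical helper: assumes every record spans exactly four lines.
    """
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines, e.g. at the end of the file
            if not header.startswith("@"):
                raise ValueError(f"expected a '@' header line, got {header!r}")
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Return the read ID: header text after '@', up to the first whitespace."""
    return header[1:].split()[0]


def write_difference(big: str, small: str, out: str) -> int:
    """Copy records of `big` whose ID is not in `small` to `out`; return the count."""
    # Only the IDs from the smaller file are held in memory.
    chosen_ids = {read_id(rec[0]) for rec in read_fastq(small)}
    written = 0
    with open(out, "w") as out_handle:
        for rec in read_fastq(big):
            if read_id(rec[0]) not in chosen_ids:
                out_handle.writelines(rec)  # lines still carry their newlines
                written += 1
    return written
```

For the files in the question this would be called as `write_difference("f1.fastq", "f2.fastq", "diff.fastq")`. To match on sequence instead of ID, use the sequence line (`rec[1]`) in both places where `read_id(rec[0])` appears. Biopython also provides `FastqGeneralIterator` in `Bio.SeqIO.QualityIO`, which yields `(title, sequence, quality)` string tuples and could stand in for the hypothetical `read_fastq` helper.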