I am trying to sort the duplicate values in a column of a CSV file in Python, but I am not getting the expected result.
input file: (.csv)
column names:
uniprot acc, pdb id, ligand id, structure title, uniprot recommended name, gene name, macromolecular name
I want to sort the uniprot acc column so that duplicate values are grouped together, keeping the values that occur only once as well, along with their pdb id and ligand id.
input file:

uniprot acc   pdb id   ligand id
p0aet8        1ahi     nai
p04036        1arz     nai
q59771        1c1d     nai
p0c0f4        1dlj     nai
q9qyy9        1e3e     nai
q9qyy9        1e3i     nai
q14376        1ek6     nai
q16836        1f17     nai
p0aet8        1fmc     nai
q46220        1giq     nai
p97852        1gz6     nai
p07195        1i0z     nai
p00338        1i10     nai
p11986        1jki     nai
p10760        1ky5     nai
q2rsb2        1l7e     nai
q27743        1ldg     nai
o32080        1lsu     nai
p00334        1mg5     nai
p26392        1n2s     nai
p9wgt1        1nfq     nai
p0abh7        1nxg     nai
p05091        1nzw     nai
p05091        1nzz     nai
p27443        1o0s     nai
p0a6d5        1o9b     nai
p20974        1og4     nai
p11986        1p1j     nai

expected result:

uniprot acc   pdb id   ligand id
p0aet8        1ahi     nai
p0aet8        1fmc     nai
p04036        1arz     nai
q59771        1c1d     nai
p0c0f4        1dlj     nai
q9qyy9        1e3e     nai
q9qyy9        1e3i     nai
. . .

I want to sort the rows so that those with the same uniprot acc ID are listed together with their pdb IDs, alongside the IDs that occur only once. There is no need to remove any ID.
code:
import csv
import re
import sys
import os

f1 = csv.reader(open('one.csv', 'rb'))
writer = csv.writer(open("output_file_1.csv", "wb"))

def has_duplicates(f1):
    for i in range(0, len(f1)):
        for x in range(i + 1, len(f1)):
            if f1[i] == f1[x]:
                var = f1[i]
                writer.writerow(var)
You can first store the values in a list, and then find the duplicate values in sorted order. See the code below.
import csv
import re
import sys
import os

f1 = csv.reader(open('one.csv', 'rb'))
writer = csv.writer(open("output_file_1.csv", "wb"))

def has_duplicates(f1):
    # copy the rows into a list first (as tuples, so they can go in a set)
    rows = []
    for row in f1:
        rows.append(tuple(row))
    # write each row that occurs more than once, in sorted order
    for var in sorted(set([x for x in rows if rows.count(x) > 1])):
        writer.writerow(var)
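If only the uniprot acc column should decide what counts as a duplicate, rather than the whole row, counting that column may be simpler. This is just a rough sketch, not the code above: it assumes Python 3, that the accession is the first column (index 0), and that the rows are already read into a list. The name duplicate_keys and the key_index parameter are made up for illustration.

```python
from collections import Counter

def duplicate_keys(rows, key_index=0):
    """Return the sorted values of column `key_index` that occur in more than one row."""
    # Count how often each value of the chosen column appears
    counts = Counter(row[key_index] for row in rows)
    # Keep only the values seen more than once, in sorted order
    return sorted(key for key, n in counts.items() if n > 1)

# A few rows taken from the input file in the question
rows = [
    ["p0aet8", "1ahi", "nai"],
    ["p04036", "1arz", "nai"],
    ["q9qyy9", "1e3e", "nai"],
    ["q9qyy9", "1e3i", "nai"],
    ["p0aet8", "1fmc", "nai"],
]
print(duplicate_keys(rows))  # ['p0aet8', 'q9qyy9']
```

Counter makes a single pass over the rows, so it avoids calling count() once per row.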
New edit, as per the expected result:
If you use sorted(), it gives a result that differs a little from the expected one. You can use the following code to get the expected result.
def sort_duplicates(f1):
    for i in range(0, len(f1)):
        # put a copy of f1[i] right after its first occurrence, then drop the original
        f1.insert(f1.index(f1[i])+1, f1[i])
        f1.pop(i+1)
    for var in f1:
        writer.writerow(var)
I have tested it with a list. Here is the result:
>>> a=['p0aet8', 'q59771', 'p0c0f4','dfc4h', 'p0aet8','q59771','acg5d']
>>> print sorted(a)
['acg5d', 'dfc4h', 'p0aet8', 'p0aet8', 'p0c0f4', 'q59771', 'q59771']
And if you use the above code, the result is:
>>> a=['p0aet8', 'q59771', 'p0c0f4','dfc4h', 'p0aet8','q59771','acg5d']
>>> for i in range(0,len(a)):
...     a.insert(a.index(a[i])+1, a[i])
...     a.pop(i+1)
>>> print a
['p0aet8', 'p0aet8', 'q59771', 'q59771', 'p0c0f4', 'dfc4h', 'acg5d']
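For full CSV rows, the same "keep first-seen order, pull duplicates together" idea can also be done by grouping on the first column. This is only a sketch under some assumptions: Python 3.7+ (where a dict keeps insertion order), a comma-delimited file, the uniprot acc in column 0, and no header row. The function name group_by_first_seen and the in-memory sample are made up for illustration; it is a different technique from the insert/pop loop above.

```python
import csv
import io

def group_by_first_seen(rows, key_index=0):
    """Group rows sharing the same key, keeping keys in order of first appearance."""
    groups = {}  # dict preserves insertion order in Python 3.7+
    for row in rows:
        groups.setdefault(row[key_index], []).append(row)
    # Flatten the groups back into a single list of rows
    return [row for group in groups.values() for row in group]

# In-memory stand-in for a few rows of the input file (no header row)
sample = (
    "p0aet8,1ahi,nai\n"
    "p04036,1arz,nai\n"
    "q59771,1c1d,nai\n"
    "q9qyy9,1e3e,nai\n"
    "q9qyy9,1e3i,nai\n"
    "p0aet8,1fmc,nai\n"
)
rows = list(csv.reader(io.StringIO(sample)))
grouped = group_by_first_seen(rows)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(grouped)
print(out.getvalue())
```

With real files you would pass open('one.csv', newline='') to csv.reader instead of the StringIO object. Each row is visited once, whereas index() inside a loop rescans the list for every row.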