What I'm doing: I'm feeding a Python script a CSV file that contains millions of records separated by commas. The strings are "contained in double quotes".
I pass the .csv file through this Python script:
    import csv
    import string
    import sys, getopt

    infile = open(sys.argv[1], 'r')
    outfile = open(sys.argv[1][:-4] + '_no-nulls.csv', 'w')
    data = csv.reader(infile)
    writer = csv.writer(outfile)
    specials = "null"
    for line in data:
        line = [value.replace(specials, '') for value in line]
        writer.writerow(line)
    infile.close()
    outfile.close()

and the end result has the quotes stripped off the strings.
What am I doing wrong?
edit
sample input:
897555,2021-03-31 00:00:00.000,null,"45687","b","qa",29,null,null,null,null,null,null,null,"5648987qexxx",6,null,null,"doe","john",null,null,null,null,null,"q",1994-04-24 00:00:00.000,"r","cx","zz",null,null,null,null,null,"y",null,"ga","r","de",null,null,null,null,null,"en",null,"y","op",null,"r","xz",null,null,null,"8945564",2005-03-01 12:00:00.000,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null sample output:
897555,2021-03-31 00:00:00.000,,"45687","b","qa",29,,,,,,,,"5648987qexxx",6,,,"doe","john",,,,,,"q",1994-04-24 00:00:00.000,"r","cx","zz",,,,,,"y",,"ga","r","de",,,,,,"en",,"y","op",,"r","xz",,,,"8945564",2005-03-01 12:00:00.000,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
This is normal. When reading, csv.reader strips off the quotes because it's assumed that the program consuming the data doesn't want or need them. csv.writer will put them back on if necessary, depending on the setting of the quoting parameter you pass. The default is QUOTE_MINIMAL: add quotes only if there are characters in the string that would otherwise be misinterpreted, such as the delimiter.
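To see that behaviour in isolation, here is a minimal sketch that round-trips one short record through the default reader and writer. The record and the variable names are made up for illustration, and io.StringIO stands in for the real files.

```python
import csv
import io

# One record in the style of the sample: quoted strings, bare numbers, null markers.
source = '897555,null,"45687","b",29\r\n'

# The default reader removes the quotes while parsing.
row = next(csv.reader(io.StringIO(source)))

# Writing the row back with the default QUOTE_MINIMAL setting adds no quotes,
# because none of these values contains a character that needs protecting.
out = io.StringIO()
csv.writer(out).writerow(row)
default_output = out.getvalue()

# A value containing the delimiter is the kind of field QUOTE_MINIMAL does quote.
out_with_comma = io.StringIO()
csv.writer(out_with_comma).writerow(['doe, john', 'b'])
quoted_output = out_with_comma.getvalue()

print(row)
print(repr(default_output))
print(repr(quoted_output))
```

So the quotes are not lost by the replace step; they are gone as soon as the reader has parsed the line, and the writer sees no reason to add them again.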
You can set both the reader and the writer to QUOTE_NONE to preserve the quotes in the original file, or set the writer to QUOTE_ALL to requote the output.
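The two options could look roughly like the sketch below, again on a made-up record with io.StringIO in place of real files. Two things here are my assumptions rather than part of the answer: the hypothetical helper strip_nulls blanks only fields that are exactly null (the script in the question uses a substring replace), and the QUOTE_NONE writer is given quotechar=None, because as far as I can tell it otherwise refuses to emit a field containing a double quote unless an escapechar is set.

```python
import csv
import io

source = '897555,null,"45687","b",29,null\r\n'

def strip_nulls(row):
    # Hypothetical helper: blank out fields that are exactly the marker null.
    return ['' if value == 'null' else value for value in row]

# Option 1: QUOTE_NONE on both sides, so quotes are treated as ordinary characters.
reader = csv.reader(io.StringIO(source), quoting=csv.QUOTE_NONE)
preserved = io.StringIO()
# Assumption: quotechar=None so the writer does not try to escape the '"' characters.
writer = csv.writer(preserved, quoting=csv.QUOTE_NONE, quotechar=None)
for row in reader:
    writer.writerow(strip_nulls(row))
preserved_output = preserved.getvalue()

# Option 2: default reader (quotes stripped), writer puts quotes around every field.
reader = csv.reader(io.StringIO(source))
requoted = io.StringIO()
writer = csv.writer(requoted, quoting=csv.QUOTE_ALL)
for row in reader:
    writer.writerow(strip_nulls(row))
requoted_output = requoted.getvalue()

print(repr(preserved_output))
print(repr(requoted_output))
```

Option 1 leaves the originally quoted strings quoted and the numbers bare, which matches the expected output in the question. Note that with QUOTE_NONE the reader splits on every comma, including commas inside quoted strings. Option 2 quotes everything, including the numbers and the emptied fields, so the result is valid CSV but not identical in style to the input.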