What I'm doing: I'm feeding a Python script a CSV file that contains millions of records separated by commas. The strings are "contained in double quotes".
I pass the .csv file through this Python script:
import csv
import string
import sys, getopt

infile = open(sys.argv[1], 'r')
outfile = open(sys.argv[1][:-4] + '_no-nulls.csv', 'w')
data = csv.reader(infile)
writer = csv.writer(outfile)
specials = "null"
for line in data:
    line = [value.replace(specials, '') for value in line]
    writer.writerow(line)
infile.close()
outfile.close()
The end result has the quotes stripped off the strings.
What am I doing wrong?
edit
sample input:
897555,2021-03-31 00:00:00.000,null,"45687","b","qa",29,null,null,null,null,null,null,null,"5648987qexxx",6,null,null,"doe","john",null,null,null,null,null,"q",1994-04-24 00:00:00.000,"r","cx","zz",null,null,null,null,null,"y",null,"ga","r","de",null,null,null,null,null,"en",null,"y","op",null,"r","xz",null,null,null,"8945564",2005-03-01 12:00:00.000,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null
sample output:
897555,2021-03-31 00:00:00.000,,"45687","b","qa",29,,,,,,,,"5648987qexxx",6,,,"doe","john",,,,,,"q",1994-04-24 00:00:00.000,"r","cx","zz",,,,,,"y",,"ga","r","de",,,,,,"en",,"y","op",,"r","xz",,,,"8945564",2005-03-01 12:00:00.000,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
This is normal. When reading, csv.reader will strip off the quotes, because it's assumed that the program consuming the data doesn't want or need them. csv.writer will put them back on if necessary, depending on the setting of the quoting parameter you pass. The default is csv.QUOTE_MINIMAL, which only adds quotes if there are characters in the string that could be misinterpreted. You can set both the reader and the writer to csv.QUOTE_NONE to preserve the quotes from the original file, or set the writer to csv.QUOTE_ALL to requote the output.
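To see the three settings side by side, here is a minimal sketch that runs a shortened version of the sample row through each one. The helper name `convert`, the in-memory `io.StringIO` buffers, and the shortened `SAMPLE` row are illustrative choices of this sketch, not part of the original script. It also blanks a field only when it is exactly `null`, rather than using `str.replace`, so a value that merely contains the letters "null" is left alone. Passing `quotechar=None` to the QUOTE_NONE writer is an assumption added here: it is meant to stop the writer from treating the literal quote characters as something it has to escape.

```python
import csv
import io

# Shortened, hypothetical version of the sample row from the question.
SAMPLE = '897555,null,"45687","doe",29,null\n'


def convert(text, reader_kwargs, writer_kwargs):
    """Read CSV text, blank out fields equal to 'null', return the rewritten text."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(text), **reader_kwargs)
    writer = csv.writer(out, lineterminator="\n", **writer_kwargs)
    for row in reader:
        writer.writerow(["" if value == "null" else value for value in row])
    return out.getvalue()


# Defaults: the reader strips the quotes, and the writer (QUOTE_MINIMAL)
# sees nothing in these fields that would require quoting.
default_out = convert(SAMPLE, {}, {})

# QUOTE_ALL on the writer: every field is quoted, including numbers
# and the now-empty fields, so the output differs from the input layout.
quote_all_out = convert(SAMPLE, {}, {"quoting": csv.QUOTE_ALL})

# QUOTE_NONE on both sides: quote characters are read as ordinary data
# and written back unchanged. quotechar=None is an assumption of this sketch.
keep_out = convert(
    SAMPLE,
    {"quoting": csv.QUOTE_NONE},
    {"quoting": csv.QUOTE_NONE, "quotechar": None},
)

print(default_out, quote_all_out, keep_out, sep="")
```

Note that with QUOTE_NONE on the reader, a quoted field that itself contains a comma would be split into two fields, so this approach only fits data like the sample, where quoted strings contain no commas.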