I have this code:
import pandas
import datetime
from decimal import Decimal

file_ = open('myfile.csv', 'r')
result = pandas.read_csv(
    file_,
    header=None,
    names=('sec', 'date', 'sale', 'buy'),
    usecols=('date', 'sale', 'buy'),
    parse_dates=['date'],
    iterator=True,
    chunksize=100,
    compression=None,
    engine="c",
    date_parser=lambda dt: datetime.datetime.strptime(dt, '%Y%m%d %H:%M:%S.%f'),
    converters={'sale': (lambda u: Decimal(u)), 'buy': (lambda u: Decimal(u))}
)
and when I try...
result.get_chunk()
I only get this error:
CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
from this file (I show the first 4 lines; the file has no header, and all lines have this format):
EUR/USD,20160701 00:00:00.071,1.11031,1.11033
EUR/USD,20160701 00:00:00.255,1.11031,1.11033
EUR/USD,20160701 00:00:00.256,1.11025,1.11033
EUR/USD,20160701 00:00:00.258,1.11027,1.11033
... (more than 10.000.000 lines like these)
My intention is to get an object that iterates over chunks, so that I do not have the whole crap in memory (the actual file is 560 MB!). The file has 4 columns, but since the first column holds the same value on every line, I want to discard it. I want to keep columns 1, 2, and 3 (discarding 0) as date, sale, and purchase price.
Actually, this is my first attempt with pandas, since my former solution used the standard Python csv module, and it takes a lot of time.
What am I missing? Why am I getting such an error?
# Try this code
import pandas as pd
import numpy as np
import csv

# Give names to the columns in the CSV file (header row, ',' as separator),
# then build a data frame from 3 of the columns and print it.

myfile.csv:
sec,date,sale,buy
EUR/USD,20160701 00:00:00.071,1.11031,1.11033
EUR/USD,20160701 00:00:00.255,1.11031,1.11033
EUR/USD,20160701 00:00:00.256,1.11025,1.11033
EUR/USD,20160701 00:00:00.258,1.11027,1.11033

data = pd.read_csv('myfile.csv', sep=',')
df = pd.DataFrame({'date': data.date, 'sale': data.sale, 'buy': data.buy})
print(df)

Output:
       buy                   date     sale
0  1.11033  20160701 00:00:00.071  1.11031
1  1.11033  20160701 00:00:00.255  1.11031
2  1.11033  20160701 00:00:00.256  1.11025
3  1.11033  20160701 00:00:00.258  1.11027
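The answer above reads the whole file at once and needs a header row. For the chunked reading the question asks for, on a file without a header, a minimal sketch could look like the one below. It is only one possible approach, not a diagnosis of the error in the question. The helper name iter_chunks and the SAMPLE string are hypothetical and used only for illustration; the dates are parsed after reading with pd.to_datetime instead of the date_parser argument.

```python
import io
from decimal import Decimal

import pandas as pd

# Sample rows copied from the question (hypothetical stand-in for myfile.csv);
# a real run would pass the file path to iter_chunks instead.
SAMPLE = """\
EUR/USD,20160701 00:00:00.071,1.11031,1.11033
EUR/USD,20160701 00:00:00.255,1.11031,1.11033
EUR/USD,20160701 00:00:00.256,1.11025,1.11033
EUR/USD,20160701 00:00:00.258,1.11027,1.11033
"""


def iter_chunks(source, chunksize=100):
    """Yield DataFrames of at most `chunksize` rows with date, sale, buy columns."""
    reader = pd.read_csv(
        source,
        header=None,                           # the file has no header row
        names=["sec", "date", "sale", "buy"],  # name all 4 columns in the file
        usecols=["date", "sale", "buy"],       # drop the constant first column
        converters={"sale": Decimal, "buy": Decimal},
        chunksize=chunksize,                   # makes read_csv return an iterator
    )
    for chunk in reader:
        # Parse the timestamps once per chunk with an explicit format.
        chunk["date"] = pd.to_datetime(chunk["date"], format="%Y%m%d %H:%M:%S.%f")
        yield chunk


chunks = list(iter_chunks(io.StringIO(SAMPLE), chunksize=2))
for chunk in chunks:
    print(chunk)
```

Only one chunk is held in memory at a time when the generator is consumed in a loop; the list(...) call here is just to keep the small example easy to inspect.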