i'm doing bit more complex operation on dataframe compare 2 rows can anywhere in frame.
here's example:
import pandas pd import numpy np d = {'a':['a','a','c','e','e','b','b'],'b':['c','f','a','b','d','a','e']\ ,'aw':[1,2,3,4,5,6,7],'bw':[10,20,30,40,50,60,70]} p = pd.dataframe(d) p = p.sort_values(['a','b']) p['ab'] = p.a+'_'+p.b p['awbw'] = p.aw+p.bw
now doing here have pairings of strings in a
, b
, example a_c
call ab
. , have reverse pairing c_a
well. sum on numbers aw
, bw
each pairing, called awbw
.
now want subtract summed value of a_c
value of c_a
, same thing every string pairing both variants exist. other values should nan
, result should this:
aw b bw ab awbw rowdelta 0 1 c 10 a_c 11 -22.0 1 2 f 20 a_f 22 nan 5 b 6 60 b_a 66 nan 6 b 7 e 70 b_e 77 33.0 2 c 3 30 c_a 33 22.0 3 e 4 b 40 e_b 44 -33.0 4 e 5 d 50 e_d 55 nan
i have solved way this, there's 1 problem left i'm stuck at.
here's solution far:
for i,row in p.iterrows(): p.ix[i,'rowdelta'] = row['awbw']\ - p[(p['a'] == row.ab[2]) & (p['b'] == row.ab[0])]['awbw'].get(0,np.nan)
the problem p[(p['a'] == row.ab[2]) & (p['b'] == row.ab[0])]['awbw']
returns series either empty or has 1 element index variable.
now series.get
method solves problem of returning nan
when series empty wants definitive index value, in case use 0
, can not dynamic index there.
i can not example
t = p[(p['a'] == row.ab[2]) & (p['b'] == row.ab[0])]['awbw'] t.get(t.index[0],np.nan)
because there no index if series empty , leads error when doing t.index[0]
. same goes attempts using iloc
.
is there way dynamically unknown 1 index of series if has 1 element (and never more one) while @ same time handling case of empty series?
credit goes pirsquared pointing me right direction solution:
ab = p.ab.str.split('_', expand=true) ab = ab.merge(ab, left_on=[0, 1], right_on=[1, 0],how='inner')[[0,1]] ab = ab.merge(p,left_on=[0,1], right_on=['a','b'])[['a','aw','b','bw']] ab = ab.merge(p,left_on=['a','b'], right_on=['b','a'])[['aw_x','bw_x','aw_y','bw_y','ab']] ab['rowdelta'] = ab.aw_y+ab.bw_y-ab.aw_x-ab.bw_x p = p.merge(ab[['ab','rowdelta']],on='ab',how='outer')
maybe can made shorter or nicer, works sure.
Comments
Post a Comment