Python - Remove columns where all items in the column are identical (excluding the header) and match a specified string
My question is an extension of "Delete column in pandas based on condition", but I have headers and my information isn't binary. Instead of removing columns containing zeros, I'd like to be able to pass a variable "search_var" (containing a string) and filter out the columns that contain only that string.
I thought I should read in the df, iterate across each column, read each column into a list, and print the columns where len(col_list) > 2 and search_var not in col_list. The solution in the previous post, which uses a boolean DataFrame (df != search_var), made me think there might be a simpler way, but how do I get around the issue that the header will not match, so I cannot filter purely on True/False?
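The column-by-column idea does work if the test is "every cell equals search_var" rather than "search_var not in col_list"; the latter would also drop header2 and header3, which contain 'red' in some rows but are kept in the expected output. A minimal sketch, assuming the example table has already been loaded (it is built inline here as a hypothetical stand-in for input.tsv):

```python
import pandas as pd

# Hypothetical in-memory copy of the example table from the question
df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4', 'name5'],
    'header1': ['red', 'red', 'red', 'red', 'red'],
    'header2': ['red', 'orange', 'yellow', 'green', 'blue'],
    'header3': ['red', 'red', 'red', 'red', 'blue'],
})
search_var = 'red'

keep = []
for col in df.columns:
    # df[col] holds only the data cells; the header is the label `col`
    # and is never compared against search_var
    values = df[col].tolist()
    # Drop the column only when every cell equals search_var
    if not all(v == search_var for v in values):
        keep.append(col)

result = df[keep]
print(result)
```

The loop is easy to follow but scales per column in Python; the vectorised one-liner in the answer does the same check in a single pass.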
What I have (non-working):
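```python
import pandas as pd

df = pd.read_table('input.tsv', dtype=str)
# 'aw' is not a valid open() mode (use 'w' or 'a'); this raises ValueError
with open('output.tsv', 'aw') as ofh:
    # list(df.values) yields one array per row, not per column
    df['col_list'] = list(df.values)
    # col_list is never defined as a variable, so this raises NameError
    if len(col_list) < 3 and search_var not in col_list:
        df.to_csv(ofh, sep='\t', encoding='utf-8', header=False)
```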
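A working counterpart to the read-filter-write attempt might look like the sketch below. It assumes input.tsv is tab-separated with the header on its first line; an in-memory string (the hypothetical tsv_text) stands in for the file and a StringIO buffer (out) stands in for output.tsv, so the example runs without touching disk.

```python
import io
import pandas as pd

# Stand-in for input.tsv (assumption: tab-separated, first line is the header)
tsv_text = (
    "name\theader1\theader2\theader3\n"
    "name1\tred\tred\tred\n"
    "name2\tred\torange\tred\n"
    "name3\tred\tyellow\tred\n"
    "name4\tred\tgreen\tred\n"
    "name5\tred\tblue\tblue\n"
)
search_var = 'red'

# read_csv with sep='\t' behaves like read_table; the first line becomes
# the column labels, so the header is not among the values compared below
df = pd.read_csv(io.StringIO(tsv_text), sep='\t', dtype=str)

# Keep a column if at least one of its cells differs from search_var
filtered = df.loc[:, (df != search_var).any()]

# Write to a buffer instead of output.tsv; a file opened with
# open('output.tsv', 'w') could be passed to to_csv the same way
out = io.StringIO()
filtered.to_csv(out, sep='\t', index=False)
print(out.getvalue())
```

This leaves header=True (the default) so the header row is written, matching the expected output; index=False keeps pandas' row numbers out of the file.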
Example input, with search_var = 'red':
```
name   header1  header2  header3
name1  red      red      red
name2  red      orange   red
name3  red      yellow   red
name4  red      green    red
name5  red      blue     blue
```
Expected output:
```
name   header2  header3
name1  red      red
name2  orange   red
name3  yellow   red
name4  green    red
name5  blue     blue
```
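You can count the number of non-red items in each column and, where that count is not 0, select the column using loc. The header row becomes the column labels rather than data, so it is never part of the comparison and does not get in the way: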
```python
df.loc[:, (df != 'red').sum() != 0]
#     name header2 header3
# 0  name1     red     red
# 1  name2  orange     red
# 2  name3  yellow     red
# 3  name4   green     red
# 4  name5    blue    blue
```
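To see why the one-liner works, it helps to look at the intermediate per-column counts. This sketch rebuilds the example table inline (an assumption, in place of input.tsv), generalises 'red' to search_var, and shows an equivalent spelling with .all() that reads as "drop a column when every cell matches":

```python
import pandas as pd

# Hypothetical in-memory copy of the example table
df = pd.DataFrame({
    'name': ['name1', 'name2', 'name3', 'name4', 'name5'],
    'header1': ['red'] * 5,
    'header2': ['red', 'orange', 'yellow', 'green', 'blue'],
    'header3': ['red', 'red', 'red', 'red', 'blue'],
})
search_var = 'red'

# Boolean frame: True where a data cell differs from search_var.
# Column labels (the header) are not cells, so they are never compared.
mask = df != search_var

# Per-column count of differing cells:
# name 5, header1 0, header2 4, header3 1
non_matching = mask.sum()

# The answer's selection: keep columns with at least one differing cell
answer = df.loc[:, non_matching != 0]

# Equivalent: drop columns in which every cell equals search_var
result = df.loc[:, ~(df == search_var).all()]
print(result)
```

Only header1 has a count of 0, so it is the only column removed; the name column survives because none of its cells equal search_var.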