python - Remove columns where all items in column are identical (excluding header) and match a specified string


My question is an extension of "Delete Column in Pandas based on Condition", but I have headers, and the information isn't binary. Instead of removing a column containing only zeros, I'd like to be able to pass a variable "search_var" (containing a string) and filter out the columns that contain only that string.

I thought I should read in the df, iterate across each column, read each column into a list, and print the columns where len(col_list) > 2 and search_var not in col_list. The solution provided in the previous post, involving a boolean dataframe (df != search_var), intrigued me and suggested there might be a simpler way, but how do I get around the issue that the header will not match, so I cannot purely filter on True/False?

What I have (non-working):

    import pandas as pd

    df = pd.read_table('input.tsv', dtype=str)
    with open('output.tsv', 'aw') as ofh:
        df['col_list'] = list(df.values)
        if len(col_list) < 3 and search_var not in col_list:
            df.to_csv(ofh, sep='\t', encoding='utf-8', header=False)

Example input, with search_var = 'red':

    name  header1 header2 header3
    name1 red     red     red
    name2 red     orange  red
    name3 red     yellow  red
    name4 red     green   red
    name5 red     blue    blue

Expected output:

    name  header2 header3
    name1 red     red
    name2 orange  red
    name3 yellow  red
    name4 green   red
    name5 blue    blue

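You can count the non-'red' items in each column and, where that count is not 0, select the column using loc. The header is not an issue here: read_table stores the first line as column labels rather than as data, so it never takes part in the comparison.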

    df.loc[:, (df != 'red').sum() != 0]

    #     name  header2  header3
    # 0  name1      red      red
    # 1  name2   orange      red
    # 2  name3   yellow      red
    # 3  name4    green      red
    # 4  name5     blue     blue
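To use the search_var variable from the question rather than a hard-coded string, the same idea can be sketched as follows. This is a minimal sketch, not a tested drop-in for the real file: the sample table is inlined as a string in place of input.tsv, it is assumed to be tab-separated, and the names `data`, `mask`, `result` and `output_tsv` are made up for illustration.

```python
import io

import pandas as pd

# Sample table from the question, inlined so the sketch is self-contained.
# Assumption: the real input.tsv is tab-separated like this.
data = (
    "name\theader1\theader2\theader3\n"
    "name1\tred\tred\tred\n"
    "name2\tred\torange\tred\n"
    "name3\tred\tyellow\tred\n"
    "name4\tred\tgreen\tred\n"
    "name5\tred\tblue\tblue\n"
)

search_var = "red"

# The first line becomes the column labels, so it is not compared below.
df = pd.read_csv(io.StringIO(data), sep="\t", dtype=str)

# One boolean per column: True if the column holds at least one value
# other than search_var.
mask = (df != search_var).any()

# Keep only those columns; header1 (entirely 'red') is dropped.
result = df.loc[:, mask]

# Tab-separated text with the header row and without the row index.
output_tsv = result.to_csv(sep="\t", index=False)
print(output_tsv)
```

Here `.any()` plays the same role as `.sum() != 0` in the one-liner above. Note that a missing value compares as not equal to the string, so a column made up of search_var plus empty cells would be kept.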
