I have a dataframe and am interested in the relationship between 2 categorical variables, type and location. type has 5 levels and location has 20 levels.
I want to plot the percentage of types for each location. I wanted to know if there is a concise way of doing this using ggplot2?
In this case the variable on the x axis has 20 levels, so I am running into spacing issues. Any help is appreciated.
Edit: a more concrete example:

    df
       gender beverage
    1  female     coke
    2    male     bear
    3    male     coke
    4  female     bear
    5    male      tea
    6    male     bear
    7  female    water
    8  female      tea
    9  female     bear
    10   male      tea
I want to plot the gender-wise percentage of each beverage. E.g. there are 3 tea drinkers, of which 2 are male and 1 is female, so the male percentage is 66.67 and the female percentage is 33.33. On the x axis, corresponding to tea, there should be 2 bars: male with y = 66.67 and female with y = 33.33.
The easiest way is to pre-process, since you have to calculate the percentages separately by gender. I use complete to make sure you have the 0 percent bars explicitly in the data.frame, otherwise ggplot will ignore that bar and widen the other gender's bar.
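    library(dplyr)
    library(tidyr)
    library(ggplot2)

    df2 <- df %>%
      group_by(gender, beverage) %>%
      tally() %>%
      # ungroup first so complete() can fill in every gender/beverage pair
      ungroup() %>%
      complete(gender, beverage, fill = list(n = 0)) %>%
      # percentage of each beverage within each gender
      group_by(gender) %>%
      mutate(percentage = n / sum(n) * 100)

    ggplot(df2, aes(beverage, percentage, fill = gender)) +
      geom_bar(stat = 'identity', position = 'dodge') +
      theme_bw()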
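To see what complete contributes, here is a minimal sketch. It assumes the example data from the question, typed in by hand as character columns (the names counts and filled are just for illustration). It compares the counts before and after filling in the missing gender/beverage pairs.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question (values kept as posted)
df <- data.frame(
  gender = c("female", "male", "male", "female", "male",
             "male", "female", "female", "female", "male"),
  beverage = c("coke", "bear", "coke", "bear", "tea",
               "bear", "water", "tea", "bear", "tea")
)

# Only the gender/beverage pairs that actually occur in the data
counts <- count(df, gender, beverage)

# Every gender/beverage pair, with n = 0 for pairs that never occur
filled <- complete(counts, gender, beverage, fill = list(n = 0))

nrow(counts)            # number of observed pairs
nrow(filled)            # number of pairs after completing
filter(filled, n == 0)  # the rows that complete() added
```

The rows with n == 0 are the ones that would otherwise be missing from the plot, which is what lets the remaining bar in that x position take the full width.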
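Or the other way around, with the percentages calculated within each beverage instead of within each gender (this is the version that gives 66.67 and 33.33 for tea, as in the question):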
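    df3 <- df %>%
      group_by(beverage, gender) %>%
      tally() %>%
      # ungroup first so complete() can fill in every beverage/gender pair
      ungroup() %>%
      complete(beverage, gender, fill = list(n = 0)) %>%
      # percentage of each gender within each beverage
      group_by(beverage) %>%
      mutate(percentage = n / sum(n) * 100)

    ggplot(df3, aes(beverage, percentage, fill = gender)) +
      geom_bar(stat = 'identity', position = 'dodge') +
      theme_bw()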
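The question also mentions spacing issues when the x variable has 20 levels, which the code above does not address. One possible option, sketched here under assumptions, is to flip the coordinates so the 20 labels run down the side. The data below is simulated, and the names sim, pct, loc1..loc20 and type1..type5 are invented for illustration; they stand in for the real location and type columns.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)

# Hypothetical data: 20 locations and 5 types (made-up names)
sim <- data.frame(
  location = sample(paste0("loc", 1:20), 500, replace = TRUE),
  type     = sample(paste0("type", 1:5), 500, replace = TRUE)
)

# Percentage of each type within each location, with explicit zero rows
pct <- sim %>%
  count(location, type) %>%
  complete(location, type, fill = list(n = 0)) %>%
  group_by(location) %>%
  mutate(percentage = n / sum(n) * 100) %>%
  ungroup()

# Horizontal dodged bars leave room for many category labels
p <- ggplot(pct, aes(location, percentage, fill = type)) +
  geom_col(position = "dodge") +
  coord_flip() +
  theme_bw()
p
```

If you prefer to keep the bars vertical, replacing coord_flip() with theme(axis.text.x = element_text(angle = 90, hjust = 1)) rotates the x axis labels instead. Which one reads better will depend on the plot size and label lengths.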