lets have n lists known. each list has items, may repeat (not set) eg:
{a,a,b,c}, {a,b,c}, {b,b,b,c,c}
i need algorithm (some machine-learning 1 maybe?) answers following question:
given new & unknown partial list of items, example, {a,b}, probability c appear in list based on know previous lists. if possible, more fine-grained probability of: given partial list l, probability c appear in list once, probability appear twice, etc... order doesn't matter. probability of c appearing twice in {a,b} should equal appearing twice in {b,a}
any algorithms can this?
this pure mathematics, no actual "algorithms", estimate probabilities dataset (literally count occurences). in particular can simple data structure achieve goal. represent each "list" bag of letters, thus:
{a,a,b,c} -> {a:2, b:1, c:1} {a,b} -> {a:1, b:1}
etc. , create basic reverse indexing of sort, example keep indexes each letter separately, sorted counts.
now, when query comes, {a,b} + c
search data contains @ least 1 , 1 b (using indexes), , estimate probability computing fraction of retrived results containing c (or 1 c) vs. retrived results (this valid probability estimate assuming data bunch of independent samples underlying data-generating distribution).
alternatively, if alphabet small can precompute values p(c|{a,b})
etc. combinations of letters.
Comments
Post a Comment