How do I get the vocabulary array out of an LDA model (org.apache.spark.ml.clustering.LDA)? Calling vocabSize only returns the number of words scanned.
Ideally I need the array of actual words from the model, so that, based on termIndices, I can see the words inside each bucket (topic).
I need this in Scala. Any suggestion would be helpful.
Things I have tried so far: topicIndices is a DataFrame:

    topicIndices: org.apache.spark.sql.DataFrame = [topic: int, termIndices: array<int>, termWeights: array<double>]

I am trying to fetch the topics like this:

    val topics = topicIndices.map { case (terms, termWeights) =>
      terms.zip(termWeights).map { case (term, weight) => (vocabArray(term.toInt), weight) }
    }

But it throws the following error:
    <console>:96: error: constructor cannot be instantiated to expected type;
     found   : (T1, T2)
     required: org.apache.spark.sql.Row
           val topics = topicIndices.map { case (terms, termWeights) =>
                                                ^
    <console>:97: error: not found: value terms
           terms.zip(termWeights).map { case (term, weight) => (vocabArray(term.toInt), weight) }
           ^
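Reading the error: `topicIndices.map` hands the lambda one `Row` per topic, and that row has three fields (`topic`, `termIndices`, `termWeights`), so the two-element tuple pattern `(terms, termWeights)` can never match it. The sketch below reproduces the intended mapping in plain Scala so it runs without Spark; `TopicRow` is a hypothetical case class standing in for Spark's `Row`, and the vocabulary and rows are made-up toy data.

```scala
// Hypothetical plain-Scala stand-in for one row of the describeTopics DataFrame.
// In Spark the element type is org.apache.spark.sql.Row, not this case class.
final case class TopicRow(topic: Int, termIndices: Seq[Int], termWeights: Seq[Double])

object TopicWords {
  // Each element carries three fields, so the pattern has to bind all three;
  // a two-element tuple pattern like `case (terms, termWeights)` cannot match.
  def toWords(rows: Seq[TopicRow], vocab: Array[String]): Seq[(Int, Seq[(String, Double)])] =
    rows.map { case TopicRow(topic, terms, weights) =>
      // Pair each term index with its weight, then look the index up in the vocabulary.
      topic -> terms.zip(weights).map { case (term, weight) => (vocab(term), weight) }
    }

  def main(args: Array[String]): Unit = {
    val vocab = Array("spark", "scala", "topic", "model") // hypothetical toy vocabulary
    val rows = Seq(
      TopicRow(0, Seq(2, 3), Seq(0.6, 0.4)),
      TopicRow(1, Seq(0, 1), Seq(0.7, 0.3))
    )
    toWords(rows, vocab).foreach { case (topic, words) =>
      println(s"topic $topic: " + words.map { case (w, p) => f"$w%s ($p%.2f)" }.mkString(", "))
    }
  }
}
```

Against the real DataFrame, the same three-field destructuring can be written with Spark's `Row` extractor on collected rows, for example `case Row(topic: Int, terms: Seq[Int] @unchecked, weights: Seq[Double] @unchecked) => ...`, with `vocabArray` taken from the fitted `CountVectorizerModel`'s `vocabulary`.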
Got the issue solved; here is the missing piece. Once you have the DataFrame from describeTopics, the following code gets the corresponding words. (Note: this code works with the ml library's LDA.)

    // Top 10 terms per topic; columns: topic, termIndices, termWeights
    val topicDF = model.describeTopics(maxTermsPerTopic = 10)
    topicDF.printSchema()

    // Print the topic numbers (collect() so the output appears on the driver, not in executor logs)
    for (row <- topicDF.collect()) {
      val topicNumber = row.getInt(0)
      println("Topic: " + topicNumber)
    }

    // vectorizer is the fitted CountVectorizerModel that produced the LDA input vectors
    val vocab = vectorizer.vocabulary

    // First 4 words of each topic: termIndices are positions in vocab
    for (row <- topicDF.collect()) {
      val topicNumber = row.getInt(0)
      val terms: Seq[Int] = row.getSeq[Int](1)
      for (t <- terms.take(4)) {
        println("Topic:" + topicNumber + " Word:" + vocab(t))
      }
    }

    // Print each raw row
    topicDF.collect().foreach(row => println("Topic:" + row))

    // All words of each topic
    topicDF.collect().foreach { r =>
      println("Topic:" + r(0))
      val terms: Seq[Int] = r.getSeq[Int](1)
      terms.foreach { t =>
        println("Term:" + vocab(t))
      }
    }
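Why the LDA model can only report `vocabSize`: LDA only ever sees term-count vectors, so it knows their length but not the words. The words are kept by the fitted `CountVectorizerModel` (presumably the `vectorizer` in the answer) in its `vocabulary` array, ordered by term frequency across the corpus, and each entry of `termIndices` is a position in that array. The following is a rough plain-Scala illustration of that relationship, not Spark's implementation; `VocabSketch`, `buildVocab`, and `decode` are hypothetical names, and breaking ties alphabetically is an assumption of this sketch.

```scala
object VocabSketch {
  // Simplified stand-in for CountVectorizer: keep the vocabSize most frequent words,
  // ordered by total count (descending). Ties are broken alphabetically here (an assumption).
  def buildVocab(docs: Seq[Seq[String]], vocabSize: Int): Array[String] =
    docs.flatten
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }
      .take(vocabSize)
      .map(_._1)
      .toArray

  // Map term indices (as found in describeTopics' termIndices) back to words.
  def decode(termIndices: Seq[Int], vocab: Array[String]): Seq[String] =
    termIndices.map(i => vocab(i))

  def main(args: Array[String]): Unit = {
    val docs = Seq(Seq("spark", "lda", "spark"), Seq("scala", "spark", "lda")) // hypothetical corpus
    val vocab = buildVocab(docs, vocabSize = 10)
    println(vocab.mkString(", "))                     // spark, lda, scala
    println(decode(Seq(2, 0), vocab).mkString(", "))  // scala, spark
  }
}
```

The point of the sketch is the index contract: the LDA model's `vocabSize` equals `vocab.length`, and decoding a topic is just an array lookup per index.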