scala - How to convert topic index to topic words in LDA


How can I get the vocabArray out of an LDA model (org.apache.spark.ml.clustering.LDA)? I am only getting vocabSize, which returns the number of words scanned.

Ideally I need the array of actual words from the model, so that, based on termIndices, I can see the words inside each bucket.

I need to do this in Scala. Any suggestion would be helpful.

Things I have tried till now: my topicIndices is a DataFrame

    topicIndices: org.apache.spark.sql.DataFrame = [topic: int, termIndices: array<int>, termWeights: array<double>]

I am trying to fetch the topics like this

    val topics = topicIndices.map { case (terms, termWeights) =>
      terms.zip(termWeights).map { case (term, weight) => (vocabArray(term.toInt), weight) }
    }

But it throws the following error

    > val topics = topicIndices.map { case (terms, termWeights) =>
        terms.zip(termWeights).map { case (term, weight) => (vocabArray(term.toInt), weight) }
      }
    <console>:96: error: constructor cannot be instantiated to expected type;
     found   : (T1, T2)
     required: org.apache.spark.sql.Row
           val topics = topicIndices.map { case (terms, termWeights) =>
                                                ^
    <console>:97: error: not found: value terms
                 terms.zip(termWeights).map { case (term, weight) => (vocabArray(term.toInt), weight) }
                 ^

I got the issue solved. Here is the missing piece. Once you have the DataFrame from describeTopics, here is the code that gets the corresponding words. (Note: this code works with the ml library LDA.)

    val topicDF = model.describeTopics(maxTermsPerTopic = 10)

    // Print the topic number of every row.
    for (row <- topicDF) {
      val topicNumber = row.get(0)
      val topicTerms  = row.get(1)
      println("topic: " + topicNumber)
    }

    import scala.collection.mutable.WrappedArray

    // vectorizer is presumably the fitted CountVectorizerModel that built the
    // LDA input; vocabulary(i) is the word for term index i.
    val vocab = vectorizer.vocabulary

    for (row <- topicDF) {
      val topicNumber = row.get(0)
      //val terms = row.get(1)
      val terms: WrappedArray[Int] = row.get(1).asInstanceOf[WrappedArray[Int]]
      // Look up the words for the first four term indices of this topic.
      for (termIdx <- 0 until 4) {
        println("topic:" + topicNumber + " word:" + vocab(terms(termIdx)))
      }
    }

    topicDF.printSchema
    import org.apache.spark.sql.Row

    // Sanity check: every collected element is a Row.
    topicDF.collect().foreach { r =>
      r match {
        case _: Row  => println("topic:" + r)
        case unknown => println("something else")
      }
    }

    // Print every term of every topic as a word.
    topicDF.collect().foreach { r =>
      println("topic:" + r(0))
      val terms: WrappedArray[Int] = r(1).asInstanceOf[WrappedArray[Int]]
      terms.foreach { t => println("term:" + vocab(t)) }
    }
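The question originally asked for (word, weight) pairs rather than just printed words. The lookup itself is plain collection code, so here is a minimal sketch of it without Spark. Everything in it is a stand-in I made up for illustration: `TopicRow` mirrors one row of the describeTopics output, `vocab` plays the role of the vectorizer's vocabulary, and the sample values are invented.

```scala
// Hypothetical stand-in for one row of describeTopics:
// (topic, termIndices, termWeights).
final case class TopicRow(topic: Int, termIndices: Seq[Int], termWeights: Seq[Double])

object TopicWords {
  // Replace every term index with its vocabulary word, keeping the weight.
  def describe(vocab: Array[String], rows: Seq[TopicRow]): Seq[(Int, Seq[(String, Double)])] =
    rows.map { row =>
      val words = row.termIndices.zip(row.termWeights).map {
        case (termIdx, weight) => (vocab(termIdx), weight)
      }
      (row.topic, words)
    }

  def main(args: Array[String]): Unit = {
    // Made-up vocabulary and topic rows, for illustration only.
    val vocab = Array("spark", "scala", "topic", "model")
    val rows = Seq(
      TopicRow(0, Seq(2, 3), Seq(0.6, 0.4)),
      TopicRow(1, Seq(1, 0), Seq(0.7, 0.3))
    )
    describe(vocab, rows).foreach { case (topic, words) =>
      println(s"topic: $topic")
      words.foreach { case (word, weight) => println(s"  word: $word weight: $weight") }
    }
  }
}
```

With real data you would first collect the describeTopics rows and convert each one into this shape. The zip keeps each weight next to its word, which is what the failing map in the question was trying to do.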
