python - Why eigenvalues of the adjacency matrix are actually the sentence scores in TextRank -


Here is the route TextRank takes:

  1. The document to be summarized is expressed as a tf-idf matrix.
  2. (tf-idf matrix) * (tf-idf matrix).transpose = the adjacency matrix of a graph whose vertices are the sentences of the above document.
  3. PageRank is applied to the graph and returns the PR value of each sentence.

Now, these PR values are said to be the eigenvalues of the adjacency matrix. What is the physical meaning or intuition behind this?

Why do the eigenvalues give the ranks?

Here is a link about PageRank: http://www.cs.princeton.edu/~chazelle/courses/bib/pagerank.htm

Here is an extract from the above page:
PageRank, or PR(A), can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

A link about TextRank: https://joshbohde.com/blog/document-summarization

To begin with, the question is a bit mistaken: the eigenvalues are not the scores. Rather, the entries of the stationary eigenvector are the scores.

TextRank takes a graph-based approach. It has a number of variations, which share the following common steps:

  1. Create a weighted graph whose vertices are entities (words or sentences), and whose edge weights are transition probabilities between the entities.

  2. Find the stochastic matrix associated with the graph, and score each entity according to the stationary distribution.

In this case, the graph is built as follows. First, a matrix is built whose rows are sentences and whose columns are words; the entries of the matrix are given by tf-idf. To find the similarity between sentences, this normalized matrix is multiplied by its transpose. The reason is that, for each pair of sentences and each word, that word contributes to the similarity between the two sentences through the product of its tf-idf value in each sentence, and we need to sum these contributions over all words. If you think about it a bit, summing these products is exactly what multiplying the matrix by its transpose does.
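This "sum of per-word products" claim is easy to verify numerically. Below, the tf-idf values are a small made-up example, not computed from a real corpus:

```python
import numpy as np

# Toy tf-idf matrix: 3 sentences (rows) x 4 words (columns).
A = np.array([
    [0.5, 0.0, 0.3, 0.0],
    [0.0, 0.7, 0.3, 0.2],
    [0.5, 0.7, 0.0, 0.0],
])

# Similarity between sentences i and j: sum over words w of
# A[i, w] * A[j, w] -- exactly the (i, j) entry of A @ A.T.
S = A @ A.T

# Check one entry by hand: similarity of sentences 0 and 1.
manual = sum(A[0, w] * A[1, w] for w in range(A.shape[1]))
assert np.isclose(S[0, 1], manual)
print(S)
```

Note that `S` is symmetric and its diagonal holds each sentence's similarity with itself.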

So we have a stochastic matrix P whose entries can be interpreted as the probability of transitioning from sentence i to sentence j. The score is the stationary distribution x, which means that

P x = x = 1 · x.

This means that x is an eigenvector associated with the eigenvalue 1. By the Perron-Frobenius theorem, this eigenvector exists under mild conditions, and 1 is the largest eigenvalue. This last part is exactly PageRank.
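A quick numerical check of these two facts, using a small hand-made column-stochastic matrix (the entries are illustrative, not derived from any document):

```python
import numpy as np

# Column-stochastic matrix: P[i, j] is the probability of moving
# from sentence j to sentence i, so each column sums to 1.
P = np.array([
    [0.1, 0.5, 0.3],
    [0.6, 0.2, 0.3],
    [0.3, 0.3, 0.4],
])

# Power iteration: repeatedly apply P until the distribution converges.
x = np.ones(3) / 3
for _ in range(100):
    x = P @ x

# x is the stationary distribution: P x = x, i.e. an eigenvector with
# eigenvalue 1, which Perron-Frobenius guarantees is the largest.
assert np.allclose(P @ x, x)
assert np.isclose(max(abs(np.linalg.eigvals(P))), 1.0)
print(x)   # the entries of this eigenvector are the sentence scores
```

Since every entry of P here is positive, the chain is irreducible and the power iteration converges to a unique stationary vector.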
