I have a large (~500,000 x ~500,000) sparse matrix in R, and I am trying to divide each column by its sum:
sm = t(t(sm) / colSums(sm))
However, when I do this, I get the following error:
# Error in evaluating the argument 'x' in selecting a method for function 't':
# Error: cannot allocate vector of size 721.1 Gb
Is there a better way to do this in R? I am able to store colSums fine, and I can compute and store the transpose of the sparse matrix, but the problem seems to arise when trying to perform the "/". It looks like the sparse matrix is converted to a full dense matrix at that point.
Any help would be appreciated. Thank you!
This is what you can do, assuming A is a dgCMatrix:
A@x <- A@x / rep.int(colSums(A), diff(A@p))
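If you need this in several places, the one-liner can be wrapped in a small function. This is a minimal sketch: the name rescale_columns() is hypothetical (it is not part of Matrix), and the class check simply refuses anything that is not a dgCMatrix, since the trick depends on that class's slot layout.

```r
library(Matrix)

# rescale_columns() is a hypothetical helper name, not part of Matrix.
# R copies A on modification, so the caller's matrix is left unchanged.
rescale_columns <- function(A) {
  stopifnot(inherits(A, "dgCMatrix"))
  # colSums(A) has one entry per column; diff(A@p) is the number of stored
  # non-zeros in each column, so rep.int() lines the sums up with A@x.
  A@x <- A@x / rep.int(colSums(A), diff(A@p))
  A
}

# Usage on a small 3 x 3 sparse matrix (values chosen for illustration)
A <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 3), x = c(1, 3, 2, 4))
B <- rescale_columns(A)
print(B)
colSums(B)   # each non-empty column now sums to 1
```

Because the function only touches the @x slot, the result keeps the same sparsity pattern and class as the input.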
This requires some understanding of the dgCMatrix class.
@x stores the non-zero matrix values in a packed 1D array, column by column; @p stores the cumulative number of non-zero elements by column (it has ncol(A) + 1 entries and starts at 0), hence diff(A@p) gives the number of non-zero elements in each column.
We repeat each element of colSums(A) by the number of non-zero elements in that column, then divide A@x by this vector. In the end, we update A@x with the rescaled values. In this way, the column rescaling is done in a sparse manner.
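The slot bookkeeping described above can be seen on a tiny example. This sketch builds a 3 x 4 dgCMatrix by hand (the values are arbitrary and chosen for illustration) and deliberately leaves column 3 empty, to show what happens to a column whose sum is zero.

```r
library(Matrix)

# Hand-built 3 x 4 dgCMatrix (arbitrary values); column 3 is left empty.
A <- sparseMatrix(i = c(1, 2, 2, 3), j = c(1, 1, 2, 4),
                  x = c(2, 6, 5, 1), dims = c(3, 4))

A@x                              # non-zero values, column by column: 2 6 5 1
A@p                              # cumulative non-zero counts: 0 2 3 3 4
diff(A@p)                        # non-zeros per column: 2 1 0 1
colSums(A)                       # column sums: 8 5 0 1
rep.int(colSums(A), diff(A@p))   # one divisor per entry of A@x: 8 8 5 1

# Column 3's zero sum is repeated zero times, so no 0/0 is ever computed.
A@x <- A@x / rep.int(colSums(A), diff(A@p))
A@x                              # 0.25 0.75 1.00 1.00
```

A side benefit of working on @x only: an all-zero column stays all-zero, whereas a plain dense computation would evaluate 0/0 = NaN for every entry of that column.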
Example:
library(Matrix)
set.seed(2)
A <- Matrix(rbinom(100, 10, 0.05), nrow = 10)
A
#10 x 10 sparse Matrix of class "dgCMatrix"
# [1,] . . 1 . 2 . 1 . . 2
# [2,] 1 . . . . . 1 . 1 .
# [3,] . 1 1 1 . 1 1 . . .
# [4,] . . . 1 . 2 . . . .
# [5,] 2 . . . 2 . 1 . . .
# [6,] 2 1 . 1 1 1 . 1 1 .
# [7,] . 2 . 1 2 1 . . 2 .
# [8,] 1 . . . . 3 . 1 . .
# [9,] . . 2 1 . 1 . . 1 .
#[10,] . . . . 1 1 . . . .

diff(A@p)    ## number of non-zeros per column
# [1] 4 3 3 5 5 7 4 2 4 1

colSums(A)   ## column sums
# [1] 6 4 4 5 8 10 4 2 5 2

A@x <- A@x / rep.int(colSums(A), diff(A@p))   ## sparse column rescaling
A
#10 x 10 sparse Matrix of class "dgCMatrix"
# [1,] . . 0.25 . 0.250 . 0.25 . . 1
# [2,] 0.1666667 . . . . . 0.25 . 0.2 .
# [3,] . 0.25 0.25 0.2 . 0.1 0.25 . . .
# [4,] . . . 0.2 . 0.2 . . . .
# [5,] 0.3333333 . . . 0.250 . 0.25 . . .
# [6,] 0.3333333 0.25 . 0.2 0.125 0.1 . 0.5 0.2 .
# [7,] . 0.50 . 0.2 0.250 0.1 . . 0.4 .
# [8,] 0.1666667 . . . . 0.3 . 0.5 . .
# [9,] . . 0.50 0.2 . 0.1 . . 0.2 .
#[10,] . . . . 0.125 0.1 . . . .
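As a sanity check on the example above, this sketch repeats the rescaling on the same seeded matrix and compares it with right-multiplying by a sparse diagonal matrix of reciprocal column sums, built with Matrix's Diagonal(). The diagonal-matrix approach is a common alternative and is not part of the original answer; the guard that gives empty columns a scale of 0 is an added assumption. The checks do not rely on the exact random values, so they should hold even if the seed produces a different matrix on your R version.

```r
library(Matrix)

set.seed(2)
# sparse = TRUE forces a sparse result even if the random draw is less sparse
A <- Matrix(rbinom(100, 10, 0.05), nrow = 10, sparse = TRUE)

# Sparse rescaling from the answer, applied to a copy
B <- A
B@x <- B@x / rep.int(colSums(B), diff(B@p))

# Check 1: every non-empty column of B now sums to 1
nz <- diff(B@p) > 0
stopifnot(isTRUE(all.equal(colSums(B)[nz], rep(1, sum(nz)))))

# Check 2: same values as A %*% D, where D is a sparse diagonal matrix of
# reciprocal column sums (alternative approach, not from the answer).
# Empty columns get a scale of 0 instead of 1/0 = Inf.
s <- colSums(A)
D <- Diagonal(x = ifelse(s == 0, 0, 1 / s))
C <- A %*% D
stopifnot(isTRUE(all.equal(as.matrix(B), as.matrix(C))))
```

The diagonal product also stays sparse, but it allocates a new matrix; the @x update only rewrites the existing value vector.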
@thelatemail mentioned another method, which first converts the dgCMatrix to a dgTMatrix:
AA <- as(A, "dgTMatrix")
A@x <- A@x / colSums(A)[AA@j + 1L]
For the dgTMatrix class there is no @p slot, but there is @j, giving the (0-based) column index of each non-zero matrix element.
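The two approaches can be compared directly on a small matrix. This is a minimal sketch under two assumptions that go beyond the original answer: it uses as(A, "TsparseMatrix"), which returns a dgTMatrix for a dgCMatrix input and is the spelling recent Matrix releases recommend over as(A, "dgTMatrix"); and it rescales the triplet copy's own @x and converts back, rather than pairing AA@j with A@x, so it does not depend on both objects storing their values in the same order.

```r
library(Matrix)

# Small dgCMatrix with positive entries (values chosen for illustration)
A <- sparseMatrix(i = c(1, 2, 2, 3, 1), j = c(1, 1, 2, 2, 3),
                  x = c(1, 3, 2, 2, 5))

# Method 1: expand the column sums with diff(A@p)
B1 <- A
B1@x <- B1@x / rep.int(colSums(A), diff(A@p))

# Method 2: triplet (dgTMatrix) form; AA@j holds 0-based column indices,
# so + 1L turns them into R's 1-based indices into colSums(A).
AA <- as(A, "TsparseMatrix")
AA@x <- AA@x / colSums(A)[AA@j + 1L]
# Rescaling AA@x and converting back avoids assuming that AA@x and A@x
# are stored in the same order.
B2 <- as(AA, "CsparseMatrix")

stopifnot(isTRUE(all.equal(as.matrix(B1), as.matrix(B2))))
```

The triplet route builds an extra copy of the index data (AA), so on a ~500,000 x ~500,000 matrix the diff(A@p) version is the lighter of the two.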