Memory - Merging really not that large data.tables immediately results in R being killed
I have 32 GB of RAM on this machine, yet R could hardly be killed any faster ;)
Example
The goal here is to achieve an rbind() of 2 data.tables using functions that make use of data.table's efficiency.
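For reference, a minimal sketch of the intended operation: appending one data.table to another with rbindlist() and then restoring the key with setkey(). The table names and toy columns are hypothetical stand-ins for join.table and tmp.table, and the sketch assumes a data.table version that includes the fix for #5355 discussed below.

```r
library(data.table)

# Hypothetical toy tables standing in for join.table and tmp.table
existing <- data.table(x1 = 1:3, x2 = factor(c("1", "2", "1")), x6 = c(0.1, 0.2, 0.3))
incoming <- data.table(x1 = 4:5, x2 = factor(c("2", "2")), x6 = c(0.4, 0.5))

# rbindlist() stacks a list of data.tables row-wise in one pass;
# both inputs share the same column names and order here
combined <- rbindlist(list(existing, incoming))

# The bound result carries no key, so re-key it (sorts by reference)
setkey(combined, x1, x2)
```

rbindlist() takes a list, so the same call scales to many chunks without growing the table one rbind() at a time.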
Input:
rm(list=ls())
gc()
Output:
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells 1604987 85.8    2403845  128.4   2251281  120.3
Vcells 3019405 23.1  537019062 4097.2 468553954 3574.8
Input:
library(data.table)

tmp.table <- data.table(x1=sample(1:7,4096000,replace=TRUE),
                        x2=as.factor(sample(1:2,4096000,replace=TRUE)),
                        x3=sample(1:1000,4096000,replace=TRUE),
                        x4=sample(1:256,4096000,replace=TRUE),
                        x5=sample(1:16,4096000,replace=TRUE),
                        x6=rnorm(4096000))
setkey(tmp.table,x1,x2,x3,x4,x5,x6)

join.table <- data.table(x1 = integer(), x2 = factor(),
                         x3 = integer(), x4 = integer(),
                         x5 = integer(), x6 = numeric())
setkey(join.table,x1,x2,x3,x4,x5,x6)

tables()
Output:
     NAME            NROW  MB COLS              KEY
[1,] join.table         0   1 x1,x2,x3,x4,x5,x6 x1,x2,x3,x4,x5,x6
[2,] tmp.table  4,096,000 110 x1,x2,x3,x4,x5,x6 x1,x2,x3,x4,x5,x6
Total: 111MB
Input:
join.table <- merge(join.table,tmp.table,all.y=TRUE)
Output:
Ha! Nope. RStudio restarts the session.
Question
What's going on here? Explicitly setting the factor levels in join.table had no effect. Using rbind() instead of merge() didn't help either; it showed exactly the same behavior. I have done far more complicated and bulky things with related data without problems.
Version info
$platform
[1] "x86_64-pc-linux-gnu"
$arch
[1] "x86_64"
$os
[1] "linux-gnu"
$system
[1] "x86_64, linux-gnu"
$version.string
[1] "R version 3.0.2 (2013-09-25)"
$nickname
[1] "Frisbee Sailing"

> rstudio::versionInfo()
$version
[1] ‘99.9.9’
$mode
[1] "server"

data.table version 1.8.11.
Update: this has been fixed in commit 1123 of v1.8.11. From NEWS:
  o  rbindlist with at least 1 factor column along with the presence of at least 1 empty data.table resulted in a segfault (or, on Linux/Mac, a reported error related to hash tables). Fixed, #5355. Thanks to Trevor Alexander for reporting (and mnel for filing the bug report): "Merging not large data.tables results in R being killed".
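The NEWS entry pins the trigger to rbindlist() receiving a factor column together with an empty data.table. A minimal sketch that exercises exactly that combination (the object names are hypothetical) can serve as a quick check of whether an installed build has the fix. On an affected build it may crash R, so it is safer to run it in a throwaway session (for example via Rscript) than inside RStudio.

```r
library(data.table)

# The trigger described in the NEWS item: a factor column plus an
# empty data.table in the list passed to rbindlist().
# On a build that includes fix #5355 this returns normally; on an
# affected build it may crash R, so try it in a throwaway session.
empty_dt <- data.table(x = factor())
one_row  <- data.table(x = factor("a"))

out <- rbindlist(list(empty_dt, one_row))
print(out)
```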
This can be reproduced with a single-row data.table with a factor column and a zero-row data.table with a factor column.
library(data.table)
a <- data.table(x=factor(1), key='x')
b <- data.table(x=factor(), key='x')
merge(b, a, all.y=TRUE)
# RStudio -> R encountered a fatal error
# R GUI -> R windoze GUI has stopped working
Using debugonce(data.table:::merge.data.table), this can be traced to the line rbind(dt, yy), which here is the equivalent of rbind(b, a); running that directly gives the same error.
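To make the traced equivalence concrete, a sketch that runs both the original merge() and the direct rbind(b, a) on the same single-row and zero-row tables. It assumes a data.table build that includes the fix; on such a build both calls should return the single row of a, while on an affected build both crash the session.

```r
library(data.table)

a <- data.table(x = factor(1), key = "x")  # one row, factor key
b <- data.table(x = factor(), key = "x")   # zero rows, factor key

# With all.y = TRUE, merge() binds y's unmatched rows onto the join
# result with rbind() (the rbind(dt, yy) line traced above), so both
# calls below hit the same code path
via_merge <- merge(b, a, all.y = TRUE)
via_rbind <- rbind(b, a)
```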
This has been reported to the package authors as issue #5355.
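On builds that predate the fix, one possible workaround, inferred from the NEWS description of the trigger rather than taken from the package authors, is to keep zero-row tables away from rbind()/rbindlist() entirely. bind_nonempty() below is a hypothetical helper written for this sketch, and it assumes its input is a non-empty list of data.tables with identical columns.

```r
library(data.table)

# Hypothetical helper: bind only the non-empty tables so that a zero-row
# data.table with a factor column never reaches rbindlist().
# Assumes `tables` is a non-empty list of data.tables with the same columns.
bind_nonempty <- function(tables) {
  keep <- Filter(function(dt) nrow(dt) > 0L, tables)
  if (length(keep) == 0L) {
    return(copy(tables[[1L]]))  # everything empty: return an empty copy
  }
  rbindlist(keep)
}

empty_tbl <- data.table(x1 = integer(), x2 = factor())
full_tbl  <- data.table(x1 = 1:3, x2 = factor(c(1, 2, 1)))

result <- bind_nonempty(list(empty_tbl, full_tbl))

# For the merge() in the question: with an empty join.table and all.y = TRUE
# the result should hold exactly tmp.table's rows, so copying avoids the bind:
# if (nrow(join.table) == 0L) join.table <- copy(tmp.table)
```

Filtering before binding costs one nrow() call per table, which is negligible next to the bind itself.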