r - How to specify covariates in a regression model
The dataset I analyse looks like this:
n <- 4000
tmp <- t(replicate(n, sample(49, 6)))
dat <- matrix(0, nrow=n, ncol=49)
colnames(dat) <- paste("p", 1:49, sep="")
dat <- as.data.frame(dat)
dat[, "win.frac"] <- rnorm(n, mean=0.0176504, sd=0.002)
for (i in 1:nrow(dat)) for (j in 1:6) dat[i, paste("p", tmp[i, j], sep="")] <- 1
str(dat)
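As a side observation (not part of the original question): because sample(49, 6) marks exactly six of the 49 indicator columns in every row, the indicators have a constant row sum. A small check, regenerating the same kind of simulated data (the set.seed value is arbitrary, added only for reproducibility):

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 4000
tmp <- t(replicate(n, sample(49, 6)))
dat <- as.data.frame(matrix(0, nrow = n, ncol = 49,
                            dimnames = list(NULL, paste("p", 1:49, sep = ""))))
dat$win.frac <- rnorm(n, mean = 0.0176504, sd = 0.002)
for (i in 1:n) dat[i, paste("p", tmp[i, ], sep = "")] <- 1

# every row flags exactly six indicators, so the p-columns sum to 6 per row
all(rowSums(dat[, paste("p", 1:49, sep = "")]) == 6)  # TRUE
```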
Now I want to perform a regression with win.frac as the dependent variable and the other variables (p1, ..., p49) as explanatory variables.
However, with every approach I tried the coefficient of p49 comes out as NA, with the message "1 not defined because of singularities". I tried
modspec <- paste("win.frac ~", paste("p", 1:49, sep="", collapse=" + "))
fit1 <- lm(as.formula(modspec), data=dat)
fit2 <- lm(win.frac ~ ., data=dat)
Interestingly, the regression works if I use only 48 explanatory variables. The subset may (p2, ..., p49) or may not (p1, ..., p48) contain p49, so I don't think the problem is related to the variable p49 itself. I also tried larger values of n, with the same result.
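For illustration, a 48-regressor version of fit1 (dropping p1 here, though any single indicator works) runs without NA coefficients; the setup below repeats the simulation from the question (the set.seed value is arbitrary):

```r
set.seed(1)  # arbitrary seed; simulated data as in the question
n <- 4000
tmp <- t(replicate(n, sample(49, 6)))
dat <- as.data.frame(matrix(0, nrow = n, ncol = 49,
                            dimnames = list(NULL, paste("p", 1:49, sep = ""))))
dat$win.frac <- rnorm(n, mean = 0.0176504, sd = 0.002)
for (i in 1:n) dat[i, paste("p", tmp[i, ], sep = "")] <- 1

# same model as fit1, but with p1 left out of the formula
modspec48 <- paste("win.frac ~", paste("p", 2:49, sep = "", collapse = " + "))
fit48 <- lm(as.formula(modspec48), data = dat)
sum(is.na(coef(fit48)))  # 0 -- no singularities
```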
I also tried betareg from the betareg package, since win.frac is restricted to values between 0 and 1. The regression fails in that case too, with the error message (roughly translated) "error in optim(...): non-finite value of optim specified":
library(betareg)
fit3 <- betareg(as.formula(modspec), data=dat, link="log")
Now I am stuck. How can I perform this regression? Is there a maximum number of variables? Is the problem due to the fact that the explanatory variables are all either 0 or 1?
Any hint is appreciated!
I assume these are dummy-encoded factor variables.
If you run the following, you can see the perfect fit you get when you try to model one of the regressors with the others:
regressormod <- lm(p49 ~ . - win.frac, data = dat)
summary(regressormod)$r.sq
#[1] 1
It's (mathematically) impossible to include coefficients for all dummy-encoded levels of a factor variable in a regression model that includes an intercept (see this answer on Cross Validated). That's why R excludes one factor level by default if you let it do the dummy encoding for you.
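A consequence worth sketching (an illustration, not part of the original answer): in these data every row activates exactly six of the 49 indicators, so the indicators sum to a constant and are collinear with the intercept. Dropping either one indicator or the intercept lifts the singularity; the no-intercept variant makes all 49 coefficients estimable (the set.seed value is arbitrary):

```r
set.seed(1)  # arbitrary seed; simulated data as in the question
n <- 4000
tmp <- t(replicate(n, sample(49, 6)))
dat <- as.data.frame(matrix(0, nrow = n, ncol = 49,
                            dimnames = list(NULL, paste("p", 1:49, sep = ""))))
dat$win.frac <- rnorm(n, mean = 0.0176504, sd = 0.002)
for (i in 1:n) dat[i, paste("p", tmp[i, ], sep = "")] <- 1

# "- 1" removes the intercept, lifting the collinearity with rowSums == 6
fit_noint <- lm(win.frac ~ . - 1, data = dat)
length(coef(fit_noint))      # 49 coefficients, one per indicator
sum(is.na(coef(fit_noint)))  # 0 -- all estimable
```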