r - How to specify covariates in a regression model -


The dataset I want to analyse looks like this:

    n <- 4000
    # each row of tmp holds 6 distinct numbers drawn from 1..49
    tmp <- t(replicate(n, sample(49, 6)))
    dat <- matrix(0, nrow=n, ncol=49)
    colnames(dat) <- paste("p", 1:49, sep="")
    dat <- as.data.frame(dat)
    dat[, "win.frac"] <- rnorm(n, mean=0.0176504, sd=0.002)
    # set the indicator columns for the 6 numbers drawn in each row
    for (i in 1:nrow(dat))
      for (j in 1:6)
        dat[i, paste("p", tmp[i, j], sep="")] <- 1
    str(dat)

Now I want to perform a regression where the dependent variable is win.frac and all other variables (p1, ..., p49) are explanatory variables.

However, with all approaches I tried, the coefficient for p49 is NA, with the message "1 not defined because of singularities". I tried:

    modspec <- paste("win.frac ~", paste("p", 1:49, sep="", collapse=" + "))
    fit1 <- lm(as.formula(modspec), data=dat)
    fit2 <- lm(win.frac ~ ., data=dat)

Interestingly, the regression works if I use only 48 explanatory variables. These may (p2, ..., p49) or may not (p1, ..., p48) contain p49, hence I think it is not related to the variable p49 itself. I also tried larger values of n, with the same result.

I also tried betareg from the betareg package, since win.frac is restricted to values between 0 and 1. The regression fails in this case too, with the error message (roughly translated) "error in optim(...): non-finite value of optim specified".

    library(betareg)
    fit3 <- betareg(as.formula(modspec), data=dat, link="log")

Now I am stuck. How can I perform this regression? Is there a maximum number of variables? Is the problem due to the fact that the explanatory variables are either 0 or 1?

Any hint is appreciated!

I assume these are dummy-encoded factor variables.

If you do the following, you can see that you get a perfect fit when you try to model one of the regressors with the others:

    regressormod <- lm(p49 ~ . - win.frac, data = dat)
    summary(regressormod)$r.sq
    #[1] 1
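As a further check, `alias()` from base R's stats package can report which linear dependency `lm` found. The following is only a sketch: it rebuilds a smaller version of the question's data (the seed, the smaller `n`, and the matrix-indexing shortcut are assumptions made for brevity and reproducibility, not part of the original code).

```r
set.seed(1)                      # assumption: seed added for reproducibility
n <- 500                         # assumption: smaller n than in the question
draws <- t(replicate(n, sample(49, 6)))

# Build the 0/1 indicator matrix in one step via matrix indexing
X <- matrix(0, nrow = n, ncol = 49,
            dimnames = list(NULL, paste0("p", 1:49)))
X[cbind(rep(seq_len(n), 6), as.vector(draws))] <- 1

dat <- as.data.frame(X)
dat$win.frac <- rnorm(n, mean = 0.0176504, sd = 0.002)

fit <- lm(win.frac ~ ., data = dat)

# Coefficients that lm could not estimate show up as NA
na_terms <- names(coef(fit))[is.na(coef(fit))]
print(na_terms)

# alias() expresses each aliased term as a combination of the other terms
aliased <- alias(fit)$Complete
print(rownames(aliased))
```

The row names of `aliased` name the term that was dropped, and the entries in that row give its expression in terms of the intercept and the remaining columns.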

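It's (mathematically) impossible to include coefficients for all dummy-encoded factor variables in a regression model that also includes an intercept (see the answer on Cross Validated). Here each row contains exactly six 1s, so p1 + ... + p49 = 6 for every observation, which makes the 49 columns perfectly collinear with the intercept column. That's why R excludes one factor level by default if you let it do the dummy encoding for you.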
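A minimal sketch of the dependency and two possible ways around it, assuming the data are generated as in the question (the seed, the smaller `n`, and the object names are assumptions introduced here for illustration). Either drop the intercept, or drop one of the indicator columns; which one fits better depends on how you want to interpret the coefficients.

```r
set.seed(1)                      # assumption: seed added for reproducibility
n <- 500                         # assumption: smaller n than in the question
draws <- t(replicate(n, sample(49, 6)))

X <- matrix(0, nrow = n, ncol = 49,
            dimnames = list(NULL, paste0("p", 1:49)))
X[cbind(rep(seq_len(n), 6), as.vector(draws))] <- 1

dat <- as.data.frame(X)
dat$win.frac <- rnorm(n, mean = 0.0176504, sd = 0.002)

# Every row has exactly six 1s, so the columns sum to a constant
print(all(rowSums(X) == 6))

# The design matrix with an intercept has 50 columns but lower rank
rank_with_intercept <- qr(cbind(1, X))$rank
print(rank_with_intercept)

# Option 1: keep all 49 indicators, remove the intercept
fit_no_int <- lm(win.frac ~ . - 1, data = dat)

# Option 2: keep the intercept, drop one indicator (here p49)
fit_drop <- lm(win.frac ~ . - p49, data = dat)

print(anyNA(coef(fit_no_int)))
print(anyNA(coef(fit_drop)))
```

Both models estimate 49 coefficients and give the same fitted values; they differ only in parameterisation. The same rank deficiency would also affect `betareg`, so dropping a column or the intercept is a reasonable first thing to try there as well, though that is not verified here.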

