python - Evaluate slope and error for a specific category in a statsmodels OLS fit


I have a dataframe df with the following fields: weight, length, and animal. The first two are continuous variables, while animal is a categorical variable with the values cat, dog, and snake.

I'd like to estimate the relationship between weight and length, but this needs to be conditioned on the type of animal, so I interact the length variable with the animal categorical variable.

    model = ols(formula='weight ~ length * animal', data=df)
    results = model.fit()

How can I programmatically extract the slope of the relationship between weight and length for, e.g., snakes? I understand how to do this manually: add the coefficient for length to the coefficient for animal[T.snake]:length. But this is cumbersome and manual, and requires me to handle the base case specially, so I'd like to extract this information automatically.
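For concreteness, that manual lookup would be something like the following (a sketch assuming the default patsy parameter names, with cat as the base category):

    # Slope for snakes: base slope plus the snake interaction coefficient.
    slope_snake = results.params['length'] + results.params['animal[T.snake]:length']
    # For the base category (cat), the slope is results.params['length'] alone.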

Furthermore, I'd like to estimate the error on the slope. I believe I understand how to calculate this by combining the standard errors and covariances (more precisely, performing the calculation described here). But this is even more cumbersome than the above, and I'm similarly wondering if there's a shortcut to extract this information.
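The combination in question is the variance formula for a sum of estimated coefficients, Var(b1 + b2) = Var(b1) + Var(b2) + 2 Cov(b1, b2). A sketch for the snake case, under the same naming assumptions as above:

    cov = results.cov_params()  # covariance matrix of the estimates (a DataFrame)
    var_slope = (cov.loc['length', 'length']
                 + cov.loc['animal[T.snake]:length', 'animal[T.snake]:length']
                 + 2 * cov.loc['length', 'animal[T.snake]:length'])
    se_snake = var_slope ** 0.5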

My manual method for calculating these follows.

Edit (06/22/2015): there seems to be an error in my original code below for calculating the errors. The standard errors calculated in user333700's answer are different from the ones I calculate, but I haven't invested any time in figuring out why.

    def get_contained_animal(animals, p):
        # Relies on parameter names of the form animal[T.snake]:length.
        for a in animals:
            if a in p:
                return a
        return None

    animals = ['cat', 'dog', 'snake']
    slopes = {}
    errors = {}
    for animal in animals:
        slope = 0.
        params = []
        # If a parameter involves the length variable and either no animal
        # (the base case) or the animal in question, add it to the slope.
        for param, val in results.params.items():
            ac = get_contained_animal(animals, param)
            if (param == 'length' or
                    ('length' in param and
                     (ac is None or ac == animal))):
                params.append(param)
                slope += val

        # Calculate the overall standard error:
        # Var(sum of coefficients) = sum of variances
        #                            + 2 * sum of pairwise covariances.
        tot_err = 0.
        for i, p1 in enumerate(params):
            tot_err += results.bse[p1] * results.bse[p1]
            for p2 in params[i + 1:]:
                tot_err += 2 * results.cov_params().loc[p1, p2]

        slopes[animal] = slope
        errors[animal] = tot_err ** 0.5
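Note that matching parameters by substring is fragile: a category name that happens to be a substring of another category (or of another variable name) would match incorrectly, which is part of why the contrast-based approach in the answer below scales better to models with many categories.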

This code might seem like overkill, but in my real-world use case I have a continuous variable interacting with two separate categorical variables, each with a large number of categories (along with other terms in the model that I need to ignore for these purposes).

Answer (user333700):

Very brief background:

The general question is: how does the prediction change if we change one of the explanatory variables, holding the other explanatory variables fixed or averaging over them?

In nonlinear discrete models, there is a special margins method that calculates this, although it is not implemented for changes in categorical variables.
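(For reference, a minimal sketch of what that margins machinery looks like in a discrete model, using the Spector dataset that ships with statsmodels; this is separate from the linear-model approach below.)

    import statsmodels.api as sm

    spector = sm.datasets.spector.load_pandas().data
    logit_res = sm.Logit(spector['GRADE'],
                         sm.add_constant(spector[['GPA', 'TUCE', 'PSI']])).fit(disp=0)
    # Average marginal effects of the regressors on P(GRADE = 1).
    print(logit_res.get_margeff().summary())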

In a linear model, the prediction, and therefore the change in prediction, is a linear function of the estimated parameters, so we can (mis)use t_test to calculate the effect, its standard error, and its confidence interval for us.
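The smallest version of that idea, applied directly to the question's fitted results object (a sketch; the parameter names are the default patsy names assumed earlier):

    import numpy as np

    names = list(results.params.index)
    contrast = np.zeros(len(names))
    contrast[names.index('length')] = 1                   # base slope
    contrast[names.index('animal[T.snake]:length')] = 1   # snake adjustment
    tt_snake = results.t_test(contrast)
    print(tt_snake.effect, tt_snake.sd)  # slope for snakes and its standard error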

(Aside: There are more helper methods in the works for statsmodels that will make prediction and margin calculations like this easier; they should become available later in the year.)

As a brief explanation of the code below:

  • I make up a similar example dataset.
  • I define the explanatory variables for length = 1 and for length = 2, for each animal type.
  • Then I calculate the difference between these two sets of explanatory variables.
  • This difference defines a linear combination, or contrast, of the parameters, which can be used in t_test.

Finally, I compare the result with predict to check that I didn't make any obvious mistakes. (I assume it is correct, but I had written it pretty fast.)

    import numpy as np
    import pandas as pd

    from statsmodels.regression.linear_model import OLS

    np.random.seed(2)
    nobs = 20

    animal_names = np.array(['cat', 'dog', 'snake'])
    animal_idx = np.random.randint(0, 3, size=nobs)  # integer codes 0, 1, 2
    animal = animal_names[animal_idx]
    length = np.random.randn(nobs) + animal_idx
    weight = np.random.randn(nobs) + animal_idx + length

    data = pd.DataFrame(dict(length=length, weight=weight, animal=animal))

    res = OLS.from_formula('weight ~ length * animal', data=data).fit()
    print(res.summary())


    data_predict1 = pd.DataFrame(dict(length=np.ones(3), weight=np.ones(3),
                                      animal=animal_names))

    data_predict2 = pd.DataFrame(dict(length=2 * np.ones(3), weight=np.ones(3),
                                      animal=animal_names))

    import patsy
    x1 = patsy.dmatrix('length * animal', data_predict1)
    x2 = patsy.dmatrix('length * animal', data_predict2)

    tt = res.t_test(x2 - x1)
    print(tt.summary(xname=animal_names.tolist()))
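To see why this works: subtracting the two design matrices cancels the intercept and animal main-effect columns, which are identical in both prediction datasets, leaving a 1 in the length column and in the matching animal-length interaction column of each row. Each row of x2 - x1 is therefore exactly the contrast vector for one animal's slope, so t_test returns all three slopes with their standard errors in a single call.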

The result of the last print is:

                             Test for Constraints                             
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    cat            1.0980      0.280      3.926      0.002         0.498     1.698
    dog            0.9664      0.860      1.124      0.280        -0.878     2.811
    snake          1.5930      0.428      3.720      0.002         0.675     2.511

We can verify this result by using predict and comparing the difference in the predicted weight when length for a given animal type increases from 1 to 2:

    >>> [res.predict({'length': 2, 'animal': [an]}) -
    ...  res.predict({'length': 1, 'animal': [an]}) for an in animal_names]
    [array([ 1.09801656]), array([ 0.96641455]), array([ 1.59301594])]
    >>> tt.effect
    array([ 1.09801656,  0.96641455,  1.59301594])

Note: When I first posted this I forgot to set a seed for the random numbers, so the numbers shown above cannot be replicated exactly.

