SUMMARY MARKETING STRATEGY RESEARCH
WEEK 1 | INTRODUCTION
Consumers -> data -> tools -> strategy. There are 9,932 analytical tools – this course covers 5:
o Linear regression: market response
o Conjoint analysis: new product design
o Bass model: new product diffusion
o Cluster analysis: segmentation
o Multi-dimensional scaling: positioning
Principles of data-driven marketing
1. Any statistical analysis aims to reduce information loss
2. Causation cannot be learned directly from data
3. Prediction does not care about statistical significance
4. Practical usefulness trumps statistical criteria
Case: pricing strategy for Jetstar
o Formulate strategies based on analytical results
WEEK 2 | LINEAR REGRESSION
LECTURE
What & why: intro to predictive modeling
o Market response model: how to predict market response
o E.g. Target knowing when someone is pregnant based on purchase behavior
o Prediction machine: find the functional relationship between input (data) and output (prediction)
o Linear regression = simplest form, a straight line: y = a + bx, with intercept a and slope b
Terminologies: with a toy example, using price to predict sales - sales = a + b*price
o X: price = independent variable - input
o Y: sales = dependent variable - output
o Principle: any statistical analysis is to reduce information loss
Prediction should be as close as possible to the observations – choose the line that minimizes the differences
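A minimal R sketch of this toy example, with simulated (made-up) prices and sales:
# Sketch: fit sales = a + b*price on simulated data
set.seed(1)
price <- runif(100, 5, 15)                       # hypothetical prices
sales <- 200 - 8 * price + rnorm(100, sd = 10)   # hypothetical sales response
fit <- lm(sales ~ price)                         # estimate intercept a and slope b
coef(fit)                                        # (Intercept) = a, price = b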
How: 5 steps to perform a linear regression
o Examining the data: make sure the data is clean; check for correlation, multi-collinearity, etc.
Multi-collinearity: VIF < 10 is not an issue, VIF > 10 indicates high collinearity
High correlation indicates trouble: biased and misleading estimated coefficients
Remedies: use one variable in the regression, transform correlated variables, collect more data
o Formulating the model: decide which variables to use as input: IVs, DV, and residual
Translate the conceptual model to an R formula
o Estimating the model: any statistical analysis is to minimize information loss (residuals)
Choose coefficients so the differences (residuals) between actual & predicted values are minimized
Least squares criterion: minimize residual sum of squares (RSS)
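Continuing the simulated example above, a small sketch of the quantity lm() minimizes:
# Sketch: the least squares criterion in terms of the fitted model above
res <- residuals(fit)    # actual sales minus predicted sales
rss <- sum(res^2)        # residual sum of squares (RSS) that lm() minimizes
rss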
o Validating the model: look at the model's significance
Naïve prediction: prediction with only the intercept, no other IVs - the null assumption
Null hypothesis: test using the F statistic and check the p-value in the R output - significance
R-squared: model fit or strength of association - % of variation in the DV explained by the model
How good is the model for prediction? – validate the model
Test the significance of individual coefficients: H0, t-test, check the p-value
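A sketch of reading the validation output for the simulated fit above (base R only):
# Sketch: validate the model via its summary
s <- summary(fit)
s$r.squared                                # % of variation in sales explained by the model
f <- s$fstatistic                          # F test of the model vs. the naive intercept-only model
pf(f[1], f[2], f[3], lower.tail = FALSE)   # overall p-value: small => reject H0
coef(s)                                    # per-coefficient t statistics and p-values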
o Making predictions: use the predict() function, a new data set, and a confidence interval
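A minimal sketch of the prediction step, continuing the simulated example (the new prices are made up):
# Sketch: predict sales for new (hypothetical) prices, with a confidence interval
newdat <- data.frame(price = c(8, 10, 12))
predict(fit, newdata = newdat, interval = "confidence")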
Extending the use of linear regression
o Nominal variables: cannot be put directly into a regression - they need to be numeric
Designate a variable as a factor and R will do the rest: weather <- as.factor(weather)
o Dummy coding (binary variables, 0-1): always a baseline; with M levels, M-1 dummy variables
Choose the baseline: weather <- relevel(weather, ref = "sunny")
o Interpretation of coefficients: we only learn the difference between conditions
When a coefficient is not significant: the difference from the baseline is not significant - same level
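A sketch of dummy coding with a made-up weather variable, continuing the simulated example:
# Sketch: factor, baseline, and M-1 dummies (weather values are made up)
weather <- as.factor(sample(c("sunny", "rain", "cloudy"), 100, replace = TRUE))
weather <- relevel(weather, ref = "sunny")   # "sunny" becomes the baseline
fit2 <- lm(sales ~ price + weather)          # R adds M-1 = 2 dummy variables
coef(fit2)                                   # weather coefficients = difference vs. sunny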
Risk control: assumptions in linear regression on residuals
o Normality - test using a histogram of the residuals or a K-S test
o Equal variance - test using a scatter plot of Ŷ and the residuals
Obtain the residuals and DV -> standardize both -> draw a scatter plot with the DV on the x-axis and the residuals on the y-axis
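A sketch of both residual checks on the simulated fit above:
# Sketch: test the residual assumptions
z <- rstandard(fit)       # standardized residuals
hist(z)                   # normality: should look roughly bell-shaped
ks.test(z, "pnorm")       # K-S test: p > 0.05 => normality not rejected
plot(fitted(fit), z)      # equal variance: spread should be similar across fitted values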
TUTORIAL
Step 1 check the VIFs: vnames <- colnames(train)[2:5], then vif(vnames, train)
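The vif() call above looks like a course-provided helper; a hedged alternative uses the car package's vif(), which takes a fitted model instead (column names IV1-IV4 are placeholders):
# Sketch: VIFs via car::vif() on a fitted model
library(car)
m <- lm(Sales ~ IV1 + IV2 + IV3 + IV4, data = train)
vif(m)    # rule of thumb: VIF > 10 signals high collinearity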
Step 2 formulate the model: Sales = β0 + β1·IV1 + β2·IV2 + β3·IV3 + β4·IV4 + ε
Step 3 estimate the model: model <- lm(Sales ~ IV1 + IV2 + IV3 + IV4, data = train), then summary(model)
Step 4 validate model: check significance of overall model and coefficients
o Test H0: β1 = β2 = … = 0 and H0: βk = 0 | when p < 0.05 reject H0: predictive value
o Check R-squared = % of variation explained by the model – what counts as high depends on the context (>90% for sales)
Step 5 make predictions: test <- set[76:100,], then str(test) and model2 <- predict(model, newdata = test)
o model2 <- as.data.frame(model2), then model2$week <- 76:100, then plot with ggplot
Comparison: Repeat steps without IV Brand Equity and compare the two models
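A hedged sketch of that comparison, assuming placeholder column names for the case data (the actual names may differ):
# Sketch: full model vs. model without Brand Equity (variable names are placeholders)
model_full <- lm(Sales ~ Price + Promotion + Distribution + BrandEquity, data = train)
model_red  <- lm(Sales ~ Price + Promotion + Distribution, data = train)
summary(model_full)$adj.r.squared            # compare in-sample fit
summary(model_red)$adj.r.squared
pred_full <- predict(model_full, newdata = test)
pred_red  <- predict(model_red, newdata = test)
sqrt(mean((test$Sales - pred_full)^2))       # hold-out RMSE, full model
sqrt(mean((test$Sales - pred_red)^2))        # hold-out RMSE, reduced model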
Risk control: violations lead to biased estimates and bad predictions
o Normality assumption: K-S test with H0: the variable follows a normal distribution
H0 should not be rejected, so the K-S test should NOT be significant: p > 0.05
o Equal variance assumption: plot the residuals and check whether the spread/ranges are similar
Categorical variables cannot go into a regression directly: first transform them to factors – setting a baseline
o Interpretation is tricky: always relative to the baseline
CASE
Product line cannibalization = older lines no longer selling after new ones are introduced
Objective: to find possible cannibalization effects
o How the introduction and sales of new styles influence sales of the previous line
Causation ≠ correlation – different causal structures can produce the same correlation pattern
o Confounder variables: contaminate the causal effect
Add confounders to the regression as control variables to kill lurking variables
Week as a control variable - week 1 is the baseline
Simply running a regression gives you correlation: causation is difficult to get in practice
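A hedged sketch of the control-variable idea with hypothetical names (old_sales, new_sales, week, sales_data are placeholders); note this still recovers a correlation, not proven causation:
# Sketch: control for week (week 1 = baseline) to absorb lurking seasonal effects
cannib <- lm(old_sales ~ new_sales + factor(week), data = sales_data)
summary(cannib)   # coefficient on new_sales = cannibalization estimate, net of week effects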
WEEK 3 | CONJOINT ANALYSIS
LECTURE
To understand preferences = the holy grail – voting, consumption, social life
Product is a combination of attributes & levels – e.g. decide the product attributes of a laptop
o Manager = chef, attributes = ingredients, conjoint analysis = recipe