Business Analytics HBS CORe Latest Update - Actual Exam Questions and 100% Verified Correct Answers Guaranteed A+
A/B test - CORRECT ANSWER: An experiment that compares the value of a specified dependent variable (such as the likelihood that a web site visitor purchases an item) across two different groups (usually a control group and a treatment group). The members of each group must be randomly selected to ensure that the only difference between the groups is the "manipulated" independent variable (for example, the size of the font on two otherwise-identical web sites). An A/B test is a hypothesis test that tests whether the means of the dependent variable are the same across the two groups. (An A/B test can also be used to test whether another parameter, such as a standard deviation, is the same across two groups.)
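As a rough illustration of the hypothesis test behind an A/B test, the sketch below compares purchase rates in a control and a treatment group with a two-proportion z-test. The function name and the visitor counts are hypothetical, and the z-test is only one common choice; it assumes both groups are large and that the pooled rate is strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    rate_a = conversions_a / n_a
    rate_b = conversions_b / n_b
    # Under the null hypothesis the two groups share one rate, so pool them.
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 100 of 1,000 control visitors purchase,
# versus 130 of 1,000 treatment visitors.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would lead us to reject the null hypothesis that the two groups have the same purchase rate.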
adjusted R-squared - CORRECT ANSWER: A measure of the explanatory power of a regression analysis. Adjusted R-squared is equal to R-squared reduced by a penalty that grows slightly as each independent variable is added to a regression model. Unlike R-squared, which can never decrease when a new independent variable is added to a regression model, Adjusted R-squared drops when an independent variable is added that does not improve the model's true explanatory power. Adjusted R-squared should always be used when comparing the explanatory power of regression models that have different numbers of independent variables.
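The glossary entry does not give a formula, so the sketch below assumes the standard textbook one: Adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The function name and the numbers are hypothetical.

```python
def adjusted_r_squared(r_squared, n_observations, n_independent_vars):
    """Standard adjusted R-squared; assumes n_observations > n_independent_vars + 1."""
    penalty_ratio = (n_observations - 1) / (n_observations - n_independent_vars - 1)
    return 1 - (1 - r_squared) * penalty_ratio


# Hypothetical comparison: a fourth variable that leaves R-squared at 0.80
# lowers the adjusted value, signalling no real gain in explanatory power.
three_vars = adjusted_r_squared(0.80, 50, 3)
four_vars = adjusted_r_squared(0.80, 50, 4)
print(f"{three_vars:.4f} vs {four_vars:.4f}")
```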
alternative hypothesis - CORRECT ANSWER: An alternative hypothesis is the theory or claim we are trying to substantiate, and is stated as the opposite of a null hypothesis. When our data allow us to nullify the null hypothesis, we substantiate the alternative hypothesis.
asymmetric distribution - CORRECT ANSWER: A probability distribution that is not symmetric around the mean.
average - CORRECT ANSWER: The most common statistic used to describe the center of the values in a data set. The mean is also known as the average. For a distribution that has discrete values, the mean is equal to the sum of the values of all the data points in the set, divided by the number of data points.
base case - CORRECT ANSWER: The category of a categorical variable for which a dummy variable is NOT included in a regression model. A regression model with a categorical variable that has n categories should have n-1 dummy variables. The coefficients of the dummy variables included in the regression model are interpreted in relation to the base case. The analyst can select any category to be excluded from the regression model; however, different base cases lead to different interpretations of the dummy variables' coefficients. For example, suppose we are trying to determine the average difference in height between men and women in a sample, and suppose that on average men are 5 inches taller than women in the sample. If we use Female as the base case then the coefficient for the dummy variable for Male would be +5. If we use Male as the base case, the coefficient for the dummy variable for Female would be -5.
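The height example can be sketched in a few lines. When a regression contains only one two-category dummy variable, the dummy's coefficient equals the difference between the two group means, so no regression library is needed here. The height values and the function name are hypothetical and chosen so that men average 5 inches taller.

```python
from statistics import mean

# Hypothetical heights in inches; the group means are 70 and 65.
heights = {
    "Male": [68.0, 70.0, 72.0],
    "Female": [63.0, 65.0, 67.0],
}


def dummy_coefficient(groups, base_case, included_category):
    """Coefficient of the dummy for included_category, measured against base_case."""
    return mean(groups[included_category]) - mean(groups[base_case])


# Female as base case: the Male dummy picks up the +5 difference.
print(dummy_coefficient(heights, base_case="Female", included_category="Male"))
# Male as base case: the Female dummy picks up the -5 difference.
print(dummy_coefficient(heights, base_case="Male", included_category="Female"))
```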
bias - CORRECT ANSWER: The tendency of a measurement process to over- or under-estimate the value of a population parameter. Although a sample statistic will almost always differ from the population parameter, for an unbiased sample, the difference will be random. In contrast, for a biased sample, the statistic will differ in a systematic way (e.g., tend to be too high). Some common reasons for bias include non-random sampling methods and non-neutral question phrasing.
biased sample - CORRECT ANSWER: A sample that is not representative of the population from which it is collected. Sampling practices that can introduce bias include poorly phrased survey questions and non-random sampling.
bimodal distribution - CORRECT ANSWER: A multi-modal distribution with two clearly discernible peaks. The two peaks may be of the same height (that is, have equal frequency), or one may be the true mode while the other has a very high (but not the highest) frequency.
bin - CORRECT ANSWER: A range of values used to categorize data. In a histogram, observations are divided into a set of non-overlapping bins, each corresponding to a range of values. The bins are constructed to ensure that the set of bins contains all observations in the data set. The height of the bar corresponding to a bin is equal to the number of observations in the data set that fall within that bin's range. Typically, all bins in a given histogram are the same width (i.e., the difference between the largest value and the smallest value is the same for each bin). In an Excel histogram, each bin is labeled by the value of the upper boundary of the bin's range. For example, in a histogram with three bins (each of width 1), labeled 1, 2, and 3, the bin labeled 2 contains all observations greater than 1 and less than or equal to 2. See histogram.
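The upper-boundary labeling rule can be mimicked as follows. This is a sketch of the rule as described above, not of Excel's internals; the function name and data are hypothetical, and values above the last boundary are collected under a "More" label as an assumption.

```python
from bisect import bisect_left


def count_by_upper_bound(values, upper_bounds):
    """Count values per bin, where each bin is labeled by its upper boundary.

    A value v lands in the first bin whose upper boundary is >= v, so the bin
    labeled 2 holds values greater than 1 and less than or equal to 2.
    upper_bounds must be sorted in ascending order.
    """
    counts = {bound: 0 for bound in upper_bounds}
    counts["More"] = 0  # assumed overflow bin for values above the last boundary
    for value in values:
        index = bisect_left(upper_bounds, value)
        label = upper_bounds[index] if index < len(upper_bounds) else "More"
        counts[label] += 1
    return counts


print(count_by_upper_bound([0.5, 1.0, 1.5, 2.0, 2.5, 3.0], [1, 2, 3]))
```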
binomial distribution - CORRECT ANSWER: A distribution of the possible successful outcomes in a given number of trials, where there are only two possible outcomes for each trial, and each trial has the same probability of success (e.g., flipping a coin). For example, the binomial distribution for the number of "heads" that result from flipping a coin 50 times specifies the probability for each possible outcome, from observing 0 "heads" to observing 50 "heads". The binomial distribution is used to create confidence intervals for proportions.
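The coin-flip example can be made concrete with the binomial probability formula, P(k successes) = C(n, k) p^k (1 - p)^(n - k). The function name below is hypothetical.

```python
from math import comb


def binomial_pmf(successes, trials, p_success):
    """Probability of exactly `successes` successes in `trials` independent trials."""
    return (
        comb(trials, successes)
        * p_success ** successes
        * (1 - p_success) ** (trials - successes)
    )


# Distribution of the number of heads in 50 flips of a fair coin.
distribution = [binomial_pmf(k, 50, 0.5) for k in range(51)]
print(f"P(25 heads) = {distribution[25]:.4f}")
print(f"Sum over all outcomes = {sum(distribution):.6f}")
```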
Central Limit Theorem - CORRECT ANSWER: A theorem stating that if we take sufficiently large randomly selected samples from a population, the means of these samples will be approximately normally distributed regardless of the shape of the underlying population. (Technically, the underlying population must have a finite variance.)
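A small simulation can make the theorem tangible. The sketch below draws samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely for illustration) and checks that the sample means cluster around the population mean with spread close to 1/sqrt(n). The function name, sample sizes, and seed are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw num_samples samples from an exponential population; return their means."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


means = simulate_sample_means(sample_size=50, num_samples=2000)
print(f"Mean of sample means:  {mean(means):.3f} (population mean is 1)")
print(f"Stdev of sample means: {stdev(means):.3f} (theory: {1 / sqrt(50):.3f})")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the underlying population is heavily right-skewed.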
coefficient of variation (CV) - CORRECT ANSWER: A measure of a data set's variability relative to its mean. The coefficient of variation (CV) is particularly helpful when comparing the variability of two data sets with different means. Calculated as the standard deviation divided by the mean, the CV is typically expressed as a percentage. For example, the CV of a data set with mean = 100 hours and standard deviation = 15 hours is 15 hours/100 hours = 15%.
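The calculation in the example is a single division. The sketch below reproduces it and also computes the CV directly from raw data; the function names and the sample data are hypothetical, and the sample standard deviation is assumed for the raw-data version.

```python
from statistics import mean, stdev


def cv_from_summary(std_dev, mean_value):
    """Coefficient of variation from a known standard deviation and mean."""
    return std_dev / mean_value


def cv_from_data(values):
    """Coefficient of variation of raw data, using the sample standard deviation."""
    return stdev(values) / mean(values)


# The glossary example: standard deviation 15 hours, mean 100 hours.
print(f"{cv_from_summary(15, 100):.0%}")
# Hypothetical raw data in hours.
print(f"{cv_from_data([90, 95, 100, 105, 110]):.1%}")
```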
conditional mean - CORRECT ANSWER: A conditional mean is the mean (average) of a subset of data. We apply a condition and calculate the mean for values that meet that condition. For example, in a data set that contains data on both males and females, a conditional mean might be the mean of the data pertaining to only the females in the data set.
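In code, a conditional mean is just a filter followed by an average. The records, field names, and function name below are hypothetical.

```python
from statistics import mean

# Hypothetical data set with one record per person.
records = [
    {"sex": "F", "height_in": 64.0},
    {"sex": "M", "height_in": 70.0},
    {"sex": "F", "height_in": 66.0},
    {"sex": "M", "height_in": 72.0},
]


def conditional_mean(rows, field, condition):
    """Mean of `field` over only the rows for which `condition(row)` is true."""
    return mean(row[field] for row in rows if condition(row))


# Mean height of the females only, compared with the overall mean.
print(conditional_mean(records, "height_in", lambda row: row["sex"] == "F"))
print(mean(row["height_in"] for row in records))
```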
confidence interval for a population mean - CORRECT ANSWER: A range constructed around a sample mean that estimates the true population mean. The confidence level of a confidence interval indicates how confident we are that the range contains the true population mean. For example, we are 95% confident that a 95% confidence interval contains the true population mean. The confidence level is equal to 1 - significance level.
confidence level - CORRECT ANSWER: The percentage of all possible samples that can be expected to include the true population parameter. For example, for a 95% confidence level, the intervals should be constructed so that, on average, for 95 out of 100 samples, the confidence interval will contain the true population mean. Note that this does not mean that for any given sample, there is a 95% chance that the population mean is in the interval; each confidence interval either contains the true mean or it does not.
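The "95 out of 100 samples" interpretation can be checked by simulation. The sketch below repeatedly samples from a hypothetical normal population (mean 50, standard deviation 10), builds a 95% interval around each sample mean, and reports the share of intervals that contain the true mean. For simplicity it assumes the population standard deviation is known and uses a z-value; the function name, parameters, and seed are all illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage_rate(num_samples=1000, sample_size=40, confidence_level=0.95, seed=0):
    """Share of simulated confidence intervals that contain the true population mean."""
    rng = random.Random(seed)
    population_mean, population_sd = 50.0, 10.0  # hypothetical population
    # z-value leaving (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * population_sd / sqrt(sample_size)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        sample_mean = mean(sample)
        # Each individual interval either contains the true mean or it does not.
        if sample_mean - margin <= population_mean <= sample_mean + margin:
            hits += 1
    return hits / num_samples


print(f"Share of intervals containing the true mean: {coverage_rate():.3f}")
```

The reported share should land near 0.95, while any single interval is simply a hit or a miss.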
control group - CORRECT ANSWER: One of two or more groups (typically a control group and one or more treatment groups) in an experiment. The control group either is not manipulated in any way or is treated in the way the population has historically been treated (e.g., exposed to traditional advertising rather than proposed advertising), whereas the treatment group(s) is (are) manipulated. Ideally, participants should be randomly assigned to groups so that there are no systematic differences between the members of the control and the treatment groups.
correlation coefficient - CORRECT ANSWER: A measure of the strength of a linear relationship between two variables. The correlation coefficient can range from -1 to +1. A correlation coefficient of -1 indicates