Step-by-Step Instructions
Calculate the Mean of x and y
Calculate the mean of the x and y variables using the formulas μx = (Σx) / n and μy = (Σy) / n
Calculate the Deviations from the Mean
Calculate the deviations from the mean for x and y by subtracting the mean from each individual data point
Calculate the Slope (b1)
Calculate the slope of the regression line using the formula b1 = Σ[(xi - μx)(yi - μy)] / Σ(xi - μx)^2
Calculate the Intercept (b0)
Calculate the intercept of the regression line using the formula b0 = μy - b1 * μx
Calculate the Correlation Coefficient (r)
Calculate the correlation coefficient using the formula r = Σ[(xi - μx)(yi - μy)] / sqrt[Σ(xi - μx)^2 * Σ(yi - μy)^2]
Calculate the Residuals
Calculate the residuals by subtracting the predicted y values from the observed y values
Introduction to Linear Regression
Simple linear regression is a statistical method used to model the relationship between two variables, x and y. The goal is to find the straight line y = b0 + b1 * x that best predicts y from x, where "best" means the line that minimizes the sum of squared residuals (the least-squares line). In this guide, we will walk through the steps to calculate the regression slope, intercept, correlation coefficient, and residuals by hand.
Prerequisites
Before starting, ensure you have a basic understanding of algebra and statistical concepts. You will need a dataset of n paired observations of two variables, x and y, with at least two distinct x values (otherwise the slope is undefined).
Step-by-Step Calculations
To calculate the linear regression equation, follow these steps:
Step 1: Calculate the Mean of x and y
First, calculate the mean of the x and y variables. The mean is the sum of all the values divided by the total number of observations (n). The formulas for the mean of x and y are:

μx = (Σx) / n

μy = (Σy) / n
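As a minimal sketch of this step in Python, assuming the data is held in two equal-length lists (the names `x`, `y`, `mean_x`, and `mean_y` are illustrative choices, and the values are the ones used in the worked example later in this guide):

```python
# Sample data taken from the worked example in this guide.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

n = len(x)            # number of paired observations
mean_x = sum(x) / n   # μx = (Σx) / n
mean_y = sum(y) / n   # μy = (Σy) / n

print(mean_x, mean_y)  # 3.0 5.6
```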
Step 2: Calculate the Deviations from the Mean
Next, calculate the deviations from the mean for x and y. These deviations are calculated by subtracting the mean from each individual data point:

(x1 - μx), (x2 - μx), ..., (xn - μx)

(y1 - μy), (y2 - μy), ..., (yn - μy)
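The same subtraction can be sketched in Python with list comprehensions. The variable names `dev_x` and `dev_y` are illustrative, and the data is the set used in the worked example later in this guide. Deviations from the mean always sum to zero (up to floating-point rounding), which gives a quick check on this step:

```python
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Subtract the mean from every data point.
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

print(dev_x)  # [-2.0, -1.0, 0.0, 1.0, 2.0]

# Sanity check: each set of deviations should sum to (approximately) zero.
print(abs(sum(dev_x)) < 1e-9, abs(sum(dev_y)) < 1e-9)  # True True
```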
Step 3: Calculate the Slope (b1)
The slope of the regression line (b1) is calculated using the formula: b1 = Σ[(xi - μx)(yi - μy)] / Σ(xi - μx)^2
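One way to express the slope formula in code is the sketch below. The function name `slope` and the intermediate names `sxy` and `sxx` are assumptions made for illustration; the guard against identical x values is an addition, since the denominator would otherwise be zero.

```python
def slope(x, y):
    """Return b1 = Σ[(xi - μx)(yi - μy)] / Σ(xi - μx)^2 for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of paired deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: sum of the squared x deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    return sxy / sxx


# Points that lie exactly on y = 1 + 2x, so the slope is 2.
print(slope([0, 1, 2], [1, 3, 5]))  # 2.0
```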
Step 4: Calculate the Intercept (b0)
The intercept of the regression line (b0) is calculated using the formula: b0 = μy - b1 * μx
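A short sketch of the intercept step, assuming the slope has already been computed and is passed in as `b1` (the function name `intercept` is illustrative):

```python
def intercept(x, y, b1):
    """Return b0 = μy - b1 * μx, given the slope b1 from the previous step."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return mean_y - b1 * mean_x


# Points that lie exactly on y = 1 + 2x, so with b1 = 2 the intercept is 1.
print(intercept([0, 1, 2], [1, 3, 5], 2.0))  # 1.0
```

Rearranging the formula gives μy = b0 + b1 * μx, which says that the fitted line always passes through the point (μx, μy).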
Step 5: Calculate the Correlation Coefficient (r)
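The correlation coefficient (r) measures the strength and direction of the linear relationship between x and y. It ranges from -1 (a perfect negative linear relationship) to 1 (a perfect positive linear relationship), with 0 indicating no linear relationship. The formula for r is:

r = Σ[(xi - μx)(yi - μy)] / sqrt[Σ(xi - μx)^2 * Σ(yi - μy)^2]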
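The formula for r can be sketched as follows. The function name `correlation` and the names `sxy`, `sxx`, and `syy` are illustrative, and the check for constant data is an added safeguard against dividing by zero.

```python
import math


def correlation(x, y):
    """Return r = Σ[(xi - μx)(yi - μy)] / sqrt[Σ(xi - μx)^2 * Σ(yi - μy)^2]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # squared x deviations
    syy = sum((yi - mean_y) ** 2 for yi in y)  # squared y deviations
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return sxy / math.sqrt(sxx * syy)


print(correlation([0, 1, 2], [1, 3, 5]))  # points on a rising line: 1.0
print(correlation([0, 1, 2], [5, 3, 1]))  # points on a falling line: -1.0
```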
Step 6: Calculate the Residuals
The residuals are the differences between the observed y values and the predicted y values. The predicted y values are calculated using the regression equation:

y_pred = b0 + b1 * xi

The residuals are calculated by subtracting the predicted y values from the observed y values:

residual = yi - y_pred
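A minimal sketch of the residual calculation, assuming `b0` and `b1` are already known (the function name `residuals` is illustrative, and the line used in the usage example is chosen for easy arithmetic rather than fitted to the points):

```python
def residuals(x, y, b0, b1):
    """Return the list of yi - (b0 + b1 * xi) for paired lists x and y."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# The first two points lie on the line y = 1 + 2x; the last sits 0.5 above it.
print(residuals([0, 1, 2], [1, 3, 5.5], 1.0, 2.0))  # [0.0, 0.0, 0.5]
```

A positive residual means the observed value lies above the line, and a negative residual means it lies below.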
Worked Example
Suppose we have the following dataset:
| x | y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
| 4 | 7 |
| 5 | 11 |
First, calculate the mean of x and y:

μx = (1 + 2 + 3 + 4 + 5) / 5 = 3

μy = (2 + 3 + 5 + 7 + 11) / 5 = 5.6
Next, calculate the deviations from the mean. For example, x1 - μx = 1 - 3 = -2 and y1 - μy = 2 - 5.6 = -3.6. The full set is:

| x | y | x - μx | y - μy |
|---|---|---|---|
| 1 | 2 | -2 | -3.6 |
| 2 | 3 | -1 | -2.6 |
| 3 | 5 | 0 | -0.6 |
| 4 | 7 | 1 | 1.4 |
| 5 | 11 | 2 | 5.4 |
Then, calculate the slope (b1):

b1 = [(-2 * -3.6) + (-1 * -2.6) + (0 * -0.6) + (1 * 1.4) + (2 * 5.4)] / [(-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2]

b1 = (7.2 + 2.6 + 0 + 1.4 + 10.8) / (4 + 1 + 0 + 1 + 4)

b1 = 22 / 10

b1 = 2.2

Next, calculate the intercept (b0):

b0 = 5.6 - 2.2 * 3

b0 = 5.6 - 6.6

b0 = -1.0

Now, calculate the correlation coefficient (r):

r = [(-2 * -3.6) + (-1 * -2.6) + (0 * -0.6) + (1 * 1.4) + (2 * 5.4)] / sqrt[(4 + 1 + 0 + 1 + 4) * (12.96 + 6.76 + 0.36 + 1.96 + 29.16)]

r = 22 / sqrt(10 * 51.2)

r = 22 / sqrt(512)

r ≈ 22 / 22.63

r ≈ 0.97

Finally, calculate the residuals using the regression equation y_pred = -1.0 + 2.2 * x:

| x | y | y_pred | residual |
|---|---|---|---|
| 1 | 2 | -1.0 + 2.2 * 1 = 1.2 | 2 - 1.2 = 0.8 |
| 2 | 3 | -1.0 + 2.2 * 2 = 3.4 | 3 - 3.4 = -0.4 |
| 3 | 5 | -1.0 + 2.2 * 3 = 5.6 | 5 - 5.6 = -0.6 |
| 4 | 7 | -1.0 + 2.2 * 4 = 7.8 | 7 - 7.8 = -0.8 |
| 5 | 11 | -1.0 + 2.2 * 5 = 10.0 | 11 - 10.0 = 1.0 |

As a check, the residuals sum to zero (0.8 - 0.4 - 0.6 - 0.8 + 1.0 = 0), as they should for a least-squares line with an intercept.
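The arithmetic in this example can be double-checked with a short script that follows the six steps in order. This is a sketch rather than a definitive implementation; all variable names are illustrative. Running it and comparing each printed value against the hand calculation is a quick way to catch slips in the sums.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
n = len(x)

# Step 1: means.
mean_x = sum(x) / n
mean_y = sum(y) / n

# Step 2: deviations from the mean.
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# Sums that appear in the slope and correlation formulas.
sxy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
sxx = sum(dx ** 2 for dx in dev_x)
syy = sum(dy ** 2 for dy in dev_y)

# Steps 3 to 6: slope, intercept, correlation coefficient, residuals.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
r = sxy / math.sqrt(sxx * syy)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"sum of products = {sxy:.2f}, sum of squares: x = {sxx:.2f}, y = {syy:.2f}")
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}, r = {r:.2f}")
print("residuals:", [round(e, 2) for e in resid])
```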
Common Mistakes to Avoid
When performing linear regression calculations by hand, watch for these common mistakes:
- Forgetting to subtract the mean, so that raw values are used in place of deviations
- Incorrectly calculating the slope or intercept, for example by dropping a negative sign in a product of deviations or by dividing by Σ(yi - μy)^2 instead of Σ(xi - μx)^2
- Failing to square the deviations when calculating the correlation coefficient
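Mistakes like these can often be caught with two properties that any least-squares line with an intercept satisfies: the residuals sum to zero, and the residuals multiplied by their x values also sum to zero. The sketch below checks both for a hand-computed `b0` and `b1`. The function name `check_fit` and the default tolerance `tol` are assumptions for illustration; a looser tolerance may be needed when the hand-computed values have been rounded.

```python
def check_fit(x, y, b0, b1, tol=1e-6):
    """Return a list of warnings for a hand-computed line y = b0 + b1 * x."""
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    problems = []
    # Property 1: residuals sum to zero (fails if b0 is inconsistent with b1).
    if abs(sum(resid)) > tol:
        problems.append("residuals do not sum to zero; recheck the intercept")
    # Property 2: Σ(xi * residual_i) is zero (fails if the slope is off).
    if abs(sum(xi * e for xi, e in zip(x, resid))) > tol:
        problems.append("residuals are related to x; recheck the slope")
    return problems


x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
print(check_fit(x, y, -1.0, 2.2))  # correct values: []
print(check_fit(x, y, 0.0, 2.2))   # wrong intercept: both checks report a problem
```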
Using a Calculator for Convenience
While it is possible to perform linear regression calculations by hand, it can be time-consuming and prone to errors. For larger datasets or more complex analyses, it is recommended to use a calculator or statistical software to perform the calculations.