The concept of the sum of squares plays a pivotal role in regression analysis, a statistical technique widely used in economics, finance, and various other fields to model relationships between variables. In this context, the sum of squares can be understood as a critical tool for assessing a regression model’s goodness of fit.
Sum of squares formula:

SS = Σᵢ₌₁ⁿ (Xᵢ − X̅)²

where
- Σ represents the sum for all observations from 1 to n.
- n is the sample size.
- Xᵢ is an individual data point.
- X̅ is the mean of the data points.
This formula provides a measure of variability or dispersion in a data set. Calculating the sum of squares involves the following steps:
- Take each data point and subtract the mean from it.
- Square that difference.
- Add all the squared values to the running total.
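The steps above can be sketched in a few lines of Python (the data values here are illustrative, not from the original text):

```python
# Compute the sum of squares: subtract the mean from each point,
# square the difference, and accumulate the squared values.
data = [4.0, 7.0, 6.0, 3.0, 5.0]  # illustrative sample
n = len(data)
mean = sum(data) / n              # X̅, the sample mean

total = 0.0
for x in data:
    diff = x - mean               # step 1: subtract the mean
    total += diff ** 2            # steps 2-3: square and add to the running total

print(total)  # → 10.0 for this sample
```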
A regression model divides the sum of squares into two components:
- ESS – the explained sum of squares, and
- RSS – the residual sum of squares.
The ESS quantifies the variance the regression model explains, indicating how well the model fits the data. Conversely, the RSS represents the unexplained variance or the disparity between the model’s predictions and the actual data.
The relationship between these two components is encapsulated in the coefficient of determination, R-squared, obtained by dividing ESS by the total sum of squares (TSS). R-squared ranges from 0 to 1, with higher values indicating a better fit. It tells researchers the proportion of variance in the dependent variable that can be attributed to the independent variable(s).
Researchers also use the sum of squares to assess the statistical significance of regression models. By comparing the F-statistic, calculated as the ratio of the explained to the residual mean squares (ESS and RSS each divided by their degrees of freedom), to a critical value, they can determine whether the model provides a statistically significant improvement over a simple mean-based model.
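A minimal sketch of this decomposition for a simple linear regression fitted by ordinary least squares (the data values are illustrative assumptions, not from the original text):

```python
# Decompose total variability into explained (ESS) and residual (RSS)
# components, then derive R-squared and the F-statistic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative predictor
y = [2.1, 3.9, 6.2, 8.1, 9.8]      # illustrative response
n = len(x)

mx = sum(x) / n
my = sum(y) / n

# Ordinary least squares slope and intercept
beta1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
beta0 = my - beta1 * mx
y_hat = [beta0 + beta1 * xi for xi in x]

tss = sum((yi - my) ** 2 for yi in y)                   # total sum of squares
ess = sum((yh - my) ** 2 for yh in y_hat)               # explained sum of squares
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares

r_squared = ess / tss
# F-statistic: explained mean square over residual mean square,
# with 1 and n - 2 degrees of freedom for one predictor
f_stat = (ess / 1) / (rss / (n - 2))
```

Note that TSS = ESS + RSS holds exactly for OLS with an intercept, which is what makes the R-squared decomposition meaningful.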
List of recommended resources #
For a broad overview #
Introduction to REGRESSION! | SSE, SSR, SST | R-squared | Errors (ε vs. e)
This video tutorial by zedstatistics provides a basic introduction to the topics of regression, SSE, SSR and SST.
Sum of Squares: Calculation, Types, and Examples
The Investopedia article by Rajeev Dhir gives an overview of the sum of squares as a statistical technique, the steps to calculate it, and its various types.
For in-depth understanding #
Linear Models in Statistics, 2nd Edition
Written by Alvin C. Rencher and G. Bruce Schaalje, this book provides an essential introduction to the theory and application of linear models and discusses the theories related to it.
Sum of Squares: Theory and Applications
This American Mathematical Society book edited by Pablo A. Parrilo and Rekha R. Thomas provides a concise state-of-the-art overview of the theory and applications of polynomials that are sums of squares.
Case study #
This Policy Research Paper by Gianni Betti, Vasco Molini, and Lorenzo Mori adds to the debate on ways to improve the calculation of inequality measures in developing countries experiencing severe budget constraints. Using data from Moroccan Household Budget Surveys and Labor Force Surveys, the paper proposes a method for overcoming these limitations based on an algorithm that minimizes the sum of squared differences between a certain number of direct estimates of an index and its empirical version obtained from the predicted values.