Regression analysis is the mathematical process of using observations to find the line of best fit through the data, in order to make estimates and predictions about the behaviour of the variables. This line of best fit may be linear (straight) or curvilinear, following some mathematical formula.
Correlation analysis is the process of finding how well (or badly) the line fits the observations. If all the observations lie exactly on the line of best fit, the correlation is considered to be 1, or unity (or -1 where the line slopes downward).
When observations are made and recorded, they are usually scattered across the plot in such a way that little usable information can be gained from the individual points. More can be gained from assessing the general direction of the mass of points, known as the trend. The purpose of regression analysis is to generate this trend line through the data.
The better the line represents the observations, the nearer the correlation approaches 1.0. With complete randomness the correlation is zero. The graph illustrates two of these situations.
Generally, in business we are concerned with linear regression. The usual way of calculating such a regression line is the "method of least squares", so called because it minimizes the sum of the squares of the vertical distances from each observation to the line of best fit. This produces a line of best fit, or "model" of the situation, in the form y = b.x + a (sometimes expressed as y = m.x + c), where:
y is the dependent variable (the result being estimated)
x is the independent variable
b (or m) is the slope of the line
a (or c) is the intercept, the value of y when x is zero
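The method of least squares described above can be sketched in a few lines of Python. This is only an illustration using the standard textbook formulas for the slope and intercept; the function name least_squares and the sample observations are assumptions made for the example, not data from this article.

```python
def least_squares(xs, ys):
    """Return (b, a) for the line y = b*x + a that minimises the sum of
    the squared vertical distances from the observations to the line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return b, a


# Hypothetical observations: parcels loaded and minutes taken
parcels = [10, 20, 30, 40]
minutes = [6.5, 9.2, 12.6, 15.3]
b, a = least_squares(parcels, minutes)
print(f"y = {b:.3f}.x + {a:.3f}")
```

A spreadsheet or scientific calculator performs the same calculation internally; writing it out simply makes the two quantities b and a explicit.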
The equation or model can be derived by manual calculation using a regression formula, or by using a scientific calculator or a spreadsheet (Excel or Lotus 1-2-3).
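A simple example is a linear regression equation for loading parcels onto a delivery lorry. If b is calculated to be 0.3 minutes per parcel and a is equal to 3.4 minutes, the equation would be:
y = 0.3x + 3.4, where x is the number of parcels and y is the loading time in minutes.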
Application: so, if there were 32 parcels to load, the time should be:
y = (0.3 × 32) + 3.4 = 9.6 + 3.4 = 13.0 minutes
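The same calculation can be expressed as a small Python function, which is convenient when the model has to be applied to many loads. The function name loading_time is an assumption made for this sketch; the values of b and a are those given in the example.

```python
def loading_time(parcels, b=0.3, a=3.4):
    """Estimated loading time in minutes from the model y = b.x + a."""
    return b * parcels + a


print(f"{loading_time(32):.1f} minutes")  # prints "13.0 minutes"
```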
So far we have looked at a simple two-dimensional example with one independent variable. Where more than one variable is present (in the example above, besides the number of parcels we might have (a) total weight carried and (b) total volume), an extension of the equation, called multiple regression, is appropriate. A description of multiple regression analysis can be found elsewhere on this website.
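As a rough illustration of how the equation extends, the sketch below fits y = a + b1.x1 + b2.x2 by least squares, solving the so-called normal equations with simple Gaussian elimination. The function name fit_multiple and the observations (parcels, total weight, minutes) are hypothetical, and in practice a spreadsheet or statistics package would normally be used instead.

```python
def fit_multiple(rows, ys):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk.

    rows: one [x1, ..., xk] list per observation; ys: the observed results.
    Returns [a, b1, ..., bk].
    """
    X = [[1.0] + list(r) for r in rows]   # leading 1 gives the constant a
    k = len(X[0])
    # Augmented matrix of the normal equations: [X'X | X'y]
    M = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Hypothetical observations: [number of parcels, total weight in kg]
loads = [[10, 50], [20, 80], [30, 200], [40, 120], [25, 60]]
minutes = [10.2, 15.8, 31.1, 25.9, 15.6]
a, b_parcels, b_weight = fit_multiple(loads, minutes)
print(f"y = {a:.2f} + {b_parcels:.3f}.parcels + {b_weight:.3f}.weight")
```

Each additional variable, such as total volume, adds one more entry to every observation and one more coefficient to the result.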
Trend analysis is another application. It produces a trend line through past data, such as sales over five years, which can be extrapolated to estimate the possible sales over the next five years. The trend might also be curvilinear rather than a straight line. Extrapolation must be used with great care, because what happened over the past years may not continue over the following years.
Once the formula or model has been obtained, it can be validated using correlation analysis. This again is best tackled using a scientific calculator or a spreadsheet. A correlation value of zero indicates no linear relationship (randomness) between the x variable(s) and the result, or y, variable, whereas a value of 1 indicates perfect correlation, i.e. the values of y all lie on the line (or plane) of the regression model.
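For a single x variable, the usual measure is the Pearson correlation coefficient, and the sketch below shows one way it might be calculated. The function name pearson_r and the observations are assumptions made for the example. Note that this coefficient can also be negative, reaching -1 when the points lie exactly on a downward-sloping line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical observations: parcels loaded and minutes taken
parcels = [10, 20, 30, 40]
minutes = [6.5, 9.2, 12.6, 15.3]
print(f"correlation = {pearson_r(parcels, minutes):.3f}")
```

A result close to 1 suggests that the straight-line model represents the observations well; a result near zero suggests that it does not.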
Regression and correlation can be used wherever it is necessary to study the behaviour of one or more variables and how they affect the final result. Typical applications include:
Work measurement computer modelling (to set times for jobs, tasks and whole projects)
Trend analyses for assessing future sales, fluctuations in turnover and forecasting in general
Validating the data collected by a method of sampling a situation with two or more independent variables