Why is it important that the relationship between the explanatory and response variable be linear when performing a linear regression analysis?

Generate predictions more easily

You can perform linear regression in Microsoft Excel or use statistical software packages such as IBM SPSS® Statistics that greatly simplify the process of using linear-regression equations, linear-regression models and linear-regression formula. SPSS Statistics can be leveraged in techniques such as simple linear regression and multiple linear regression.

You can perform the linear regression method in a variety of programs and environments, including:

  • R linear regression
  • MATLAB linear regression
  • Sklearn linear regression
  • Linear regression Python
  • Excel linear regression

Why linear regression is important

Linear-regression models are relatively simple and provide an easy-to-interpret mathematical formula that can generate predictions. Linear regression can be applied to various areas in business and academic study.

You’ll find that linear regression is used in everything from biological, behavioral, environmental and social sciences to business. Linear-regression models have become a proven way to scientifically and reliably predict the future. Because linear regression is a long-established statistical procedure, the properties of linear-regression models are well understood and can be trained very quickly.

A proven way to scientifically and reliably predict the future

Business and organizational leaders can make better decisions by using linear regression techniques. Organizations collect masses of data, and linear regression helps them use that data to better manage reality — instead of relying on experience and intuition. You can take large amounts of raw data and transform it into actionable information.

You can also use linear regression to provide better insights by uncovering patterns and relationships that your business colleagues might have previously seen and thought they already understood. For example, performing an analysis of sales and purchase data can help you uncover specific purchasing patterns on particular days or at certain times. Insights gathered from regression analysis can help business leaders anticipate times when their company’s products will be in high demand.

Learn more about linear regression at the IBM Knowledge Center

Key assumptions of effective linear regression

Assumptions to be considered for success with linear-regression analysis:

  • For each variable: Consider the number of valid cases, mean and standard deviation. 
  • For each model: Consider regression coefficients, correlation matrix, part and partial correlations, multiple R, R2, adjusted R2, change in R2, standard error of the estimate, analysis-of-variance table, predicted values and residuals. Also, consider 95-percent-confidence intervals for each regression coefficient, variance-covariance matrix, variance inflation factor, tolerance, Durbin-Watson test, distance measures (Mahalanobis, Cook and leverage values), DfBeta, DfFit, prediction intervals and case-wise diagnostic information. 
  • Plots: Consider scatterplots, partial plots, histograms and normal probability plots.
  • Data: Dependent and independent variables should be quantitative. Categorical variables, such as religion, major field of study or region of residence, need to be recoded to binary (dummy) variables or other types of contrast variables.  
  • Other assumptions: For each value of the independent variable, the distribution of the dependent variable must be normal. The variance of the distribution of the dependent variable should be constant for all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear and all observations should be independent.

Try SPSS Statistics for free

Make sure your data meets linear-regression assumptions

Before you attempt to perform linear regression, you need to make sure that your data can be analyzed using this procedure. Your data must pass through certain required assumptions.

Here’s how you can check for these assumptions:

  1. The variables should be measured at a continuous level. Examples of continuous variables are time, sales, weight and test scores. 
  2. Use a scatterplot to find out quickly if there is a linear relationship between those two variables.
  3. The observations should be independent of each other (that is, there should be no dependency).
  4. Your data should have no significant outliers. 
  5. Check for homoscedasticity — a statistical concept in which the variances along the best-fit linear-regression line remain similar all through that line.
  6. The residuals (errors) of the best-fit regression line follow normal distribution.

Use this hands-on tutorial to learn more about linear regression data assumptions

Examples of linear-regression success

Evaluating trends and sales estimates

You can also use linear-regression analysis to try to predict a salesperson’s total yearly sales (the dependent variable) from independent variables such as age, education and years of experience.

Analyze pricing elasticity

Changes in pricing often impact consumer behavior — and linear regression can help you analyze how. For instance, if the price of a particular product keeps changing, you can use regression analysis to see whether consumption drops as the price increases. What if consumption does not drop significantly as the price increases? At what price point do buyers stop purchasing the product? This information would be very helpful for leaders in a retail business.

Assess risk in an insurance company

Linear regression techniques can be used to analyze risk. For example, an insurance company might have limited resources with which to investigate homeowners’ insurance claims; with linear regression, the company’s team can build a model for estimating claims costs. The analysis could help company leaders make important business decisions about what risks to take.

Sports analysis

Linear regression isn’t always about business. It’s also important in sports. For instance, you might wonder if the number of games won by a basketball team in a season is related to the average number of points the team scores per game. A scatterplot indicates that these variables are linearly related. The number of games won and the average number of points scored by the opponent are also linearly related. These variables have a negative relationship. As the number of games won increases, the average number of points scored by the opponent decreases. With linear regression, you can model the relationship of these variables. A good model can be used to predict how many games teams will win.

Why is linear correlation important?

A correlation analysis provides information on the strength and direction of the linear relationship between two variables, while a simple linear regression analysis estimates parameters in a linear equation that can be used to predict values of one variable based on the other.

What is relationship between response variable and explanatory variable?

The difference between explanatory and response variables is simple: An explanatory variable is the expected cause, and it explains the results. A response variable is the expected effect, and it responds to explanatory variables.

Why is linear regression so important?

Understanding linear regression is important because it provides a scientific calculation for identifying and predicting future outcomes.

Why is it important to know the correlation coefficient for a linear model?

The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample.