Correlation and Regression

  1. Describe in words and statistical terms the relationship between the number of years in education and corresponding salary earned. Discuss the results in terms of this sample and the population in general.

From the graph, the data points lie close to the regression line (line of best fit), implying that a strong correlation exists between the two variables. The relationship can be explained further by the value of the correlation coefficient, denoted 'r'. According to Breheny (n.d.), the correlation coefficient is a value that describes the direction and strength of the linear association between the two variables under study, and it ranges from -1 to 1. The magnitude of this value indicates how strong the relationship between the variables is: when the value is close to 1 or -1, a strong correlation exists. For the sample given, the calculated correlation coefficient (r) is 0.92331335, which is close to 1 and indicates a strong positive correlation. A change in the number of years of education is therefore associated with a substantial change in the income earned.
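To make the definition of r concrete, the following is a minimal sketch of how a Pearson correlation coefficient can be computed by hand. The function name `pearson_r` and the data values are assumptions made purely for illustration; they are not the sample analysed in this report, so the printed value will not match 0.92331335.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL illustration data (not the report's sample).
education_years = [8, 10, 12, 12, 14, 16, 16, 18]
salary = [22000, 26000, 31000, 29000, 38000, 45000, 43000, 52000]

r = pearson_r(education_years, salary)
print(f"r = {r:.4f}")
```

A value near 1, as here, means the points cluster tightly around an upward-sloping line; a value near -1 would mean the same for a downward-sloping line.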

Analyzing the data using the coefficient of determination (r²) also indicates a strong relationship. The value of r² ranges from 0 to 1. According to Weiss (2016), the coefficient of determination is the proportion of variation in the observed values of the response variable that is explained by the regression. It indicates how useful the regression equation is for making predictions. The calculated r² value for the sample is 0.85250754, implying that the regression equation explains about 85% of the total variation in salary earned. The regression line can therefore be said to represent the data well and, as previously stated, a strong positive correlation exists. To further justify the relationship between income and years of education, the t-test can be applied. The t-test is used to determine whether the relationship is merely apparent and could have arisen by chance (Draper & Smith, 2014), and it yields the level of significance, denoted 'p'. From the calculations, the level of significance (p) is 0.0001, which is less than 0.001. The correlation coefficient can therefore be regarded as highly significant, which suggests that the relationship is not confined to this sample but also holds in the population from which it was drawn.
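The link between r, r² and the t-test can be sketched in a few lines. The value of r is the one reported above; the sample size `n` is an assumption chosen only for illustration, because the report does not state it, so the resulting t statistic is not the one behind the reported p-value. The formula used is the standard test statistic for a correlation coefficient, t = r·sqrt((n − 2) / (1 − r²)), with n − 2 degrees of freedom.

```python
import math

r = 0.92331335        # correlation coefficient reported in the text
r_squared = r ** 2    # coefficient of determination

# ASSUMPTION: hypothetical sample size; the report does not give n.
n = 20

# t statistic for testing whether the population correlation is zero.
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))

print(f"r^2 = {r_squared:.8f}")
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

Squaring the reported r reproduces the reported r² to eight decimal places, which confirms that the two figures are consistent with each other.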

  2. If William has 16 years of education, what would be the expected yearly salary?

Generally, the regression equation is represented as (Weiss, 2016):

y = a + bx

where a = y-intercept, b = slope, x = value of the independent variable (in this case, years of education), and y = dependent variable (income/salary).
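As a rough illustration of where a and b come from, the sketch below estimates them by ordinary least squares: the slope is the sum of cross-products divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The function name `fit_line` and the data are hypothetical and are not the figures used in this report.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: line passes through the means
    return a, b

# HYPOTHETICAL, exactly linear data chosen so the result is easy to check.
education_years = [8, 10, 12, 14, 16]
salary = [20000, 25000, 30000, 35000, 40000]

a, b = fit_line(education_years, salary)
print(f"y = {a:.2f} + {b:.2f}x")
```

In this made-up data each additional year of education adds the same amount to salary, so the fitted slope equals that amount and the intercept is zero.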

The regression equation =

If William has 16 years in education, the expected yearly salary =

  3. If John has only 5 years of education, what salary would he expect on average?

Using the regression equation, the estimated salary will be
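Both predictions follow the same mechanical step: substitute the number of years for x in y = a + bx. The sketch below shows that substitution. The coefficients `a_hyp` and `b_hyp` are placeholders invented for illustration only; they are not the fitted values for this sample, so the printed salaries are not the answers to the questions above.

```python
def predict_salary(a, b, years):
    """Expected salary from the regression line y = a + b*x."""
    return a + b * years

# HYPOTHETICAL coefficients (placeholders, not this report's fitted values).
a_hyp = 5000.0   # intercept
b_hyp = 2500.0   # slope: change in salary per additional year of education

for name, years in [("William", 16), ("John", 5)]:
    estimate = predict_salary(a_hyp, b_hyp, years)
    print(f"{name}: {years} years -> {estimate:.2f}")
```

A prediction is most trustworthy when the x value lies within the range of education years observed in the sample; outside that range the line is being extrapolated.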


Breheny, P. (n.d.). Descriptive statistics: Correlation and regression. Retrieved November 7, 2016, from

Draper, N. R., & Smith, H. (2014). Applied regression analysis. John Wiley & Sons.

Weiss, N. A. (2016). Introductory statistics (10th ed.). New York, NY: Pearson Education.