Scatter Diagrams and Regression Lines

Scatter Diagrams

If data is given in pairs then the scatter diagram of the data is just the points plotted on the xy-plane.  The scatter plot is used to visually identify relationships between the first and the second entries of paired data.

Example

The scatter plot above represents the age vs. size of a plant.  It is clear from the scatter plot that as the plant ages, its size tends to increase.  If the points closely follow a linear pattern, we say that there is a high linear correlation; if the data do not follow a linear pattern, we say that there is no linear correlation.  If the data somewhat follow a linear path, we say that there is a moderate linear correlation.

A bivariate sample consists of pairs of data (x,y).  If we plot these pairs on the xy-plane then we have a scatter diagram.

Given a scatter plot, we can draw the line that best fits the data.  This line is called the regression line.

Recall that to find the equation of a line, we need the slope and the y-intercept.  We will write the equation of the line as

 y = a + bx

where a is the y-intercept and b is the slope.  Here x is the independent or predictor variable and y is the dependent or response variable.  To find a and b we follow these steps:

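1. List the following sums, where n is the number of data pairs:

   1. The sum of the x values: $\sum x$

   2. The sum of the y values: $\sum y$

   3. The sum of the squares of the x values: $\sum x^2$

   4. The sum of the products of x and y: $\sum xy$

2. Calculate the slope:

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$

3. Calculate the y-intercept:

$$a = \bar{y} - b\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the x and y values.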
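The steps above can be sketched in a few lines of Python.  This is a minimal illustration, not a definitive implementation: the function name fit_line and the sample data are made up for this sketch, and the slope is computed with the standard least-squares formula built from the four sums listed in step 1.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Step 1: the four sums
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Step 2: the slope
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom

    # Step 3: the y-intercept, a = (mean of y) - b * (mean of x)
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical data chosen to lie exactly on y = 1 + 2x (not from the text)
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the sample points lie exactly on a line, the fit recovers that line's intercept and slope; with real data the points scatter around the fitted line instead.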

Interpretations

We can interpret a as the predicted value of y when x is zero, and we can interpret b as the amount that y changes, on average, when x increases by one.

Example

Suppose that a study was done to determine the weight loss after taking various amounts of a diet pill in combination with exercise.  If the regression line was

y = 3 + 2x

where x denotes the grams of the pill per day and y represents the weight loss in pounds, then we can say that with only the exercise and no pill the average weight loss is 3 pounds.  We can also say that if a person takes an additional gram of the pill, then on average that person should expect to lose an additional 2 pounds.  If a person takes 5 grams, then that person can expect to lose an average of 3 + 2(5) = 13 pounds.

Example

Data was collected to compare the length of time x (in months) couples have been in a relationship to the amount of money y that is spent when they go out.  The equation of the regression line was found to be

y  =  70 - 5x

The y-intercept tells us that at the beginning of the relationship, the average date costs \$70.  The slope tells us that for each additional month the relationship lasts, the average date costs \$5 less.  We can use the regression line to predict the amount of money that a date costs when the relationship has lasted, for example, six months.  We have

y(6)  =  70 - 5(6)  =  40
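Both examples come down to evaluating the line y = a + bx at a chosen x.  As a small sketch (the function name predict is ours, not part of the text), the two predictions can be reproduced as follows:

```python
def predict(a, b, x):
    """Evaluate the regression line y = a + b*x at x."""
    return a + b * x


# Diet pill example: y = 3 + 2x at x = 5 grams per day
print(predict(3, 2, 5))    # 13

# Dating example: y = 70 - 5x at x = 6 months
print(predict(70, -5, 6))  # 40
```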

Estimating the Mean Value of y for a Particular Value of x

Suppose that you own a pizza restaurant and are interested in sending out menus to local residents.  You research what your 8 competitors have done to find the relationship between the number of mailings x and the number of pizzas y bought per week.  You find that the equation of the regression line is

y = 100 + .2x.

You calculate the standard error of the estimate Se to be 4, the mean number of mailings to be 990, and SSx  =  73.

Next week you plan an advertising blitz of 1000 mailings.  How many pizzas do you expect to sell, and what is a 95% confidence interval for this estimate?

Solution

We will use the main theorem that states that an unbiased estimate for the value of y given a fixed value of x is

a + bx

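The standard deviation is

$$S_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$$

where n is the number of data pairs and $\bar{x}$ is the mean of the x values.  Hence we predict that we will sell about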

100 + .2(1000) = 300 pizzas.

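We find the standard deviation

$$4 \sqrt{1 + \frac{1}{8} + \frac{(1000 - 990)^2}{73}} \approx 6.32$$

With n - 2 = 6 degrees of freedom, the table gives

tc  =  2.447

so that a 95% confidence interval is

300 ± 2.447(6.32)

or approximately

[285, 315]

Hence we expect between about 285 and 315 pizzas to be sold.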
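The interval calculation can be checked with a short Python sketch.  Several things here are assumptions rather than part of the text: the function name prediction_interval, the form of the standard deviation (with the leading 1 under the square root, which is the form that reproduces a value near 6.3 from the numbers given), and the critical value 2.447, which is the 95% two-sided t value for n - 2 = 6 degrees of freedom.  The critical value is passed in as a parameter so that a different table value can be substituted.

```python
import math


def prediction_interval(a, b, x, s_e, n, x_bar, ss_x, t_c):
    """Return (estimate, standard deviation, lower, upper) at x.

    Assumes the standard deviation has the form
    s_e * sqrt(1 + 1/n + (x - x_bar)^2 / ss_x).
    """
    y_hat = a + b * x
    sd = s_e * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ss_x)
    margin = t_c * sd
    return y_hat, sd, y_hat - margin, y_hat + margin


# Pizza example: y = 100 + .2x, Se = 4, n = 8, mean of x = 990, SSx = 73
# t_c = 2.447 is an assumed table value for 6 degrees of freedom
y_hat, sd, low, high = prediction_interval(
    a=100, b=0.2, x=1000, s_e=4, n=8, x_bar=990, ss_x=73, t_c=2.447
)
print(round(y_hat), round(sd, 2), round(low), round(high))
```

Keeping the critical value as an argument avoids a dependency on a statistics library; it can be read from a t table as in the solution above.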