Scatter Diagrams and Regression Lines

Scatter Diagrams

If data is given in pairs then the scatter diagram of the data is just the points plotted on the xy-plane. The scatter plot is used to visually identify relationships between the first and the second entries of paired data.

Example

[Scatter plot: size of a plant plotted against its age]
The scatter plot above shows the size of a plant against its age. It is clear from the scatter plot that as the plant ages, its size tends to increase. If the points follow a linear pattern closely, we say that there is a high linear correlation. If the data only somewhat follow a linear path, we say that there is a moderate linear correlation. If the data do not follow a linear pattern at all, we say that there is no linear correlation.
A bivariate sample consists of pairs of data (x,y). If we plot these pairs on the xy-plane then we have a scatter diagram.
Given a scatter plot, we can draw the line that best fits the data. Recall that to find the equation of a line, we need the slope and the y-intercept. We will write the equation of the line as

y = a + bx
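where a is the y-intercept and b is the slope. Here x is the independent or predictor variable and y is the dependent or response variable. To find a and b we follow the standard least-squares steps:

1. Compute the means $\bar{x}$ and $\bar{y}$ of the x-values and the y-values.
2. Compute $SS_x = \sum (x - \bar{x})^2$ and $SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$.
3. Compute the slope $b = SS_{xy} / SS_x$.
4. Compute the y-intercept $a = \bar{y} - b\bar{x}$.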
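The computation of a and b can be sketched in Python. This is a minimal sketch that assumes the standard least-squares formulas b = SSxy / SSx and a = ȳ − b·x̄. The function name `fit_line` and the sample data are illustrative assumptions, not taken from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Means of the predictor and response values
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sum of squares of x and sum of cross products
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = ss_xy / ss_x          # slope
    a = y_bar - b * x_bar     # y-intercept
    return a, b


# Hypothetical data that lie exactly on the line y = 3 + 2x
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

Because the sample points were chosen to lie exactly on a line, the fitted intercept and slope recover that line. With real data the points scatter around the fitted line instead.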
Interpretations

We can interpret a as the predicted value of y when x is zero, and we can interpret b as the average amount by which y changes when x increases by one.
Example

Suppose that a study was done to determine the weight loss after taking various amounts of a diet pill in combination with exercise. If the regression line was y = 3 + 2x, where x denotes the grams of the pill per day and y represents the weight loss in pounds, then we can say that with only the exercise and no pill the average weight loss is 3 pounds. We can also say that for each additional gram of the pill, a person should expect to lose an additional 2 pounds on average. If a person takes 5 grams, then that person can expect to lose an average of 3 + 2(5) = 13 pounds.
Example

Data was collected to compare the length of time x (in months) couples have been in a relationship to the amount of money y (in dollars) that is spent when they go out. The equation of the regression line was found to be

y = 70 - 5x

The y-intercept tells us that at the beginning of the relationship, the average date costs $70. The slope tells us that for each additional month the relationship lasts, the average date costs $5 less. We can use the regression line to predict the amount of money that a date costs when the relationship has lasted, for example, six months. We have

y(6) = 70 - 5(6) = 40

so the predicted cost of a date is $40.
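The predictions in the two examples can be checked with a short Python sketch. The helper name `predict` is an illustrative assumption. The coefficients are the ones given in the examples.

```python
def predict(a, b, x):
    """Evaluate the regression line y = a + b*x at a given x."""
    return a + b * x


# Diet pill example: y = 3 + 2x, evaluated at 5 grams per day
weight_loss = predict(3, 2, 5)

# Dating example: y = 70 - 5x, evaluated at 6 months
date_cost = predict(70, -5, 6)

print(weight_loss, date_cost)  # 13 40
```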
Estimating the Mean Value of y for a Particular Value of x

Suppose that you own a pizza restaurant and are interested in sending out menus to local residents. You research what your 8 competitors have done to find the relationship between the number of mailings x and the number of pizzas y bought per week. You find that the equation of the regression line is y = 100 + 0.2x. You calculate Se to be 4, the mean number of mailings to be 990, and SSx = 73. Next week you plan an advertising blitz of 1000 mailings. How many pizzas do you expect to sell, and what is a 95% confidence interval for this estimate?

Solution

We will use the main theorem that states that an unbiased estimate for the value of y given a fixed value of x is a + bx. The standard deviation is

$$s = S_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$$
Hence we predict that we will sell about 100 + 0.2(1000) = 300 pizzas. We find the standard deviation

$$s = 4 \sqrt{1 + \frac{1}{8} + \frac{(1000 - 990)^2}{73}} \approx 6.32$$

From the table, we have tc = 2.365 so that a 95% confidence interval is 300 ± 2.365(6.32) ≈ 300 ± 14.9, or

[285, 315]
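The interval calculation can be sketched in Python. This sketch assumes that the standard deviation has the form s = Se·sqrt(1 + 1/n + (x0 − x̄)²/SSx), that 990 is the mean number of mailings, and that n = 8 is the number of competitors. It takes tc = 2.365 as the table value quoted in the text. The function name `interval_for_y` is hypothetical.

```python
import math


def interval_for_y(a, b, x0, s_e, n, x_bar, ss_x, t_c):
    """Return (estimate, s, lower, upper) for y at x = x0.

    Assumes s = s_e * sqrt(1 + 1/n + (x0 - x_bar)^2 / ss_x),
    with t_c supplied by the caller from a t table.
    """
    y_hat = a + b * x0
    s = s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    margin = t_c * s
    return y_hat, s, y_hat - margin, y_hat + margin


# Values from the pizza example; t_c is the table value quoted in the text
y_hat, s, low, high = interval_for_y(
    a=100, b=0.2, x0=1000, s_e=4, n=8, x_bar=990, ss_x=73, t_c=2.365
)
print(f"estimate = {y_hat:.0f}, s = {s:.2f}, interval = [{low:.1f}, {high:.1f}]")
```

Passing tc as a parameter keeps the sketch independent of any particular t table. A different confidence level or sample size only changes that one argument.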