Math 201 Practice Exam 3

Please work out each of the given problems.  Credit will be based on the steps that you show towards the final answer.  Show your work.


Problem 1

Go to the link below and practice the True or False applet.  Be sure to check "Hypothesis Testing", "Regression Analysis", "Chi-Square", and "ANOVA".

http://www.ltcconline.net/greenl/java/Statistics/TrueFalse/statTrueFalseWithChoices2.html


Problem 2

Thirteen males and fourteen females participated in a study of leg strength.  Right leg strength (in Newtons) was recorded for each participant, resulting in the table below.  Is there a difference between strength in men and women?  Use a 5% level of significance.  Give the P-value and interpret what it means.  State any assumptions needed.

Gender   n    x̄      s
Male     13   2127   513
Female   14   1643   446

 

Solution

There are 2 samples given, the measured variable (leg strength) is quantitative so the test is for means, the samples are independent, and the population standard deviations are unknown, so we use the Student's T-Distribution.  Since we want to see if there is a "difference", this is a two tailed test.  The null and alternative hypotheses are:

H0:  μ1  =  μ2

H1:  μ1  ≠  μ2

We use the calculator with 2-SampTTest with STATS to get

t = 2.60758

P-Value = 0.015

Since the P-Value is less than the level of significance (0.05), we reject the null hypothesis and accept the alternative hypothesis.  At the 5% level of significance, there is sufficient evidence to conclude that the mean leg strength for men is not the same as the mean leg strength for women.  If the mean leg strengths were the same and another study was done with sample sizes of 13 males and 14 females, then there would be a 1.5% chance that the study would result in a difference between the male and female sample mean leg strengths at least as large as 484 Newtons.  Since the sample sizes are less than 30, we must assume that both populations' distributions are approximately normal.
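
The calculator output can be cross-checked from the summary statistics alone.  The sketch below is one way to do that in Python.  It assumes the 2-SampTTest was run without pooling (the unpooled statistic reproduces t = 2.60758); the function names are our own, and the P-Value is approximated by numerically integrating the t density rather than by any calculator routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value; integrates the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central_half

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Unpooled t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the table: males first, then females.
t_stat, df = welch_t(2127, 513, 13, 1643, 446, 14)
p_value = t_two_tailed_p(t_stat, df)
print(f"t = {t_stat:.5f}, df = {df:.2f}, P-Value = {p_value:.4f}")
```

The degrees of freedom are not a whole number here because the unpooled test uses the Welch-Satterthwaite approximation.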


Problem 3

Is one ski resort better than another?  Data was collected to determine whether the ski resort that was visited had a bearing on how much enjoyment the skier had.  The following table shows the data that was collected.  What can you conclude at the 5% level?  

Resort            Bored   Had an OK Time   Had a Great Time   The Best Experience Ever
Heavenly          7       25               42                 4
Sierra-at-Tahoe   5       20               30                 1
Kirkwood          9       12               30                 15

 

Solution

Since there are two categorical variables here and we want to see if they are independent or dependent, we will use a χ² test for independence.  We first state the null and alternative hypotheses

        H0:  Ski resort and enjoyment are independent

        H1:  Ski resort and enjoyment are dependent

We enter the data as a 3x4 matrix in the calculator.  Then we use the χ²-Test to get:

χ²  =  21.67

P-Value  =  0.0014

Since the P-Value is less than the level of significance (0.05), we reject the null hypothesis and accept the alternative hypothesis.  There is statistically significant evidence to conclude that the ski resort people go to and their enjoyment are dependent.
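
For readers who want to see what the calculator does with the matrix, here is a minimal Python sketch.  Each expected count is (row total)(column total)/(grand total), and the statistic adds up (observed - expected)²/expected over all twelve cells.  The function names are our own, and the P-Value helper uses a closed form that only works when the degrees of freedom are even, which is the case here (df = 6).

```python
import math

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def chi_square_p_even_df(chi2, df):
    """Upper-tail probability; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = chi2 / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

# Rows: Heavenly, Sierra-at-Tahoe, Kirkwood.  Columns: the four enjoyment levels.
ski = [
    [7, 25, 42, 4],
    [5, 20, 30, 1],
    [9, 12, 30, 15],
]
chi2, df = chi_square_statistic(ski)
p_value = chi_square_p_even_df(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, P-Value = {p_value:.4f}")
```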


Problem 4

You are the owner of an automobile dealership and have done research on the relationship between the cost of the clothes (x) that a potential buyer wears and the price of the car (y) that the person will buy.  The table below gives the data that you have collected.

Clothes ($)      35   45   60   80   95   120   140   150
Car (in $1000)   15   18   14   22   25   33    30    75

 

A.  Find the equation of the regression line.

We put the data into L1 and L2 in the calculator and use the LinRegTTest to get that the equation of the regression line is

ŷ  =  -3.55 +0.36x

B.  A man walks into your dealership sporting a $100 outfit.  What is your prediction for the price of the car that this man will buy?

Solution

We just plug 100 in for x into the regression equation to get

ŷ  =  -3.55 +0.36 (100)

   =  32.45

The best prediction for this man wearing the $100 outfit is that his car will be worth around $32,450.
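
The regression line in part A comes from the usual least squares formulas: slope = Sxy/Sxx and intercept = ȳ - (slope)(x̄).  The following Python sketch applies those formulas to the data; the function and variable names are our own, not anything from the calculator.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

clothes = [35, 45, 60, 80, 95, 120, 140, 150]   # x, in dollars
cars = [15, 18, 14, 22, 25, 33, 30, 75]         # y, in thousands of dollars

intercept, slope = least_squares(clothes, cars)
predicted = intercept + slope * 100             # part B: a $100 outfit
print(f"y-hat = {intercept:.2f} + {slope:.2f}x")
print(f"prediction at x = 100: {predicted:.2f} (thousand dollars)")
```

With unrounded coefficients the prediction comes out near 32.4 (about $32,400); the figure in part B is slightly higher because it plugs into the equation after rounding the coefficients to two decimal places.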

 

C.  Interpret the slope in the context of the study.

Solution

The slope is 0.36.  This means that on average for every additional dollar that people spend on their outfits, they are predicted to spend an additional $360 on their car.

D.  Interpret the y-intercept if applicable.  If not, explain why it does not have meaning.

Solution

The y-intercept here does not have any meaning.  People do not come to a car dealership naked (wearing a $0 outfit).

E.  Find the correlation.

Solution

The calculator gives that the correlation is r = 0.78

F.  Find r² and give its interpretation in terms of explained variation in the context of the question.

Solution

The calculator gives that r² = 0.62.  We can say that 62% of the variation in the price of the car that customers buy can be explained by the linear relationship between the price of the outfit and the price of the car the customer buys.  38% of the variation in the price of the car that customers buy cannot be explained by this line.

 

G.  Conduct the appropriate hypothesis test to see if there is a correlation and interpret your results using a complete sentence.

Solution

We have

H0:  ρ  =  0

H1:  ρ  ≠  0

The calculator gives that

P-Value = 0.02115

Since the P-Value is small (below the usual 0.05 level of significance), we reject the null hypothesis.  There is significant evidence to support the claim that there is a correlation between the cost of the outfit that a person wears and the price of the car that the person purchases.
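
Parts E through G can be reproduced without LinRegTTest.  The sketch below computes r and r² from their definitions and then uses the standard test statistic t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom.  The helper names are our own, and the P-Value is approximated by numerically integrating the t density, so treat it as a check on the calculator rather than a replacement for it.

```python
import math

clothes = [35, 45, 60, 80, 95, 120, 140, 150]   # x, in dollars
cars = [15, 18, 14, 22, 25, 33, 30, 75]         # y, in thousands of dollars

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value; integrates the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

r = correlation(clothes, cars)
r_squared = r * r
n = len(clothes)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
p_value = t_two_tailed_p(t_stat, n - 2)
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, t = {t_stat:.3f}, P-Value = {p_value:.5f}")
```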

H.  What assumptions are needed for the test from part G?

Solution

We need to assume that, for each fixed outfit cost, the population of car prices is approximately normally distributed.


Problem 5

You are the owner of the Tahoe Inn Motel and are interested in how the price per room is related to the number of units that are occupied.  The SPSS readout produced from motels throughout the Tahoe area appears at the end of this problem.

A.  What is the equation of the regression line?  Interpret the slope of the regression line for this study.  Interpret the y-intercept.

Solution

The equation is

        ŷ  =  91.4 - 0.52x

The slope tells us that for every $1 that the price is raised, we expect about 0.52 fewer units to be occupied.

The y-intercept tells us that if we allow people to stay in our rooms for free, then we can expect about 91 of our units to be occupied.

 

B.  Use your regression line to provide a point estimate for the number of units occupied when the price per room is $100.

Solution

We plug 100 into the equation

        ŷ  =  91.4 - 0.52(100)  =  39.4

so we estimate that about 39 units will be occupied.

 

C.  What is the correlation coefficient?  Interpret this coefficient.  

Solution

The correlation coefficient is r  =  -0.69.

We can say that there is a moderate negative correlation between the price per room and the number of units occupied.

D.  Construct a possible scatter plot for this data and explain using a complete sentence or two your reasoning in constructing the scatter plot the way you did it.

Solution

       

There is a general trend downward, but the data do not perfectly fit the regression line. 


Simple linear regression results:
Dependent Variable: Units
Independent Variable: Price

Sample size: 26
Correlation coefficient: -0.69
Estimate of sigma: 20.581867

Parameter   Estimate     Std. Err.     DF   T-Stat      P-Value
Intercept   91.39109     6.9450126     24   13.159241   <0.0001
Slope       -0.5198245   0.111319035   24   -4.669682   <0.0001




Problem 6

You are interested in seeing whether the soil quality in the Tahoe Basin has changed since 1970.  Soil samples were taken then and this year.  The table below shows the results of this study.

Year        Good   Fair   Poor   Very Poor
1970        45     80     36     50
This Year   30     40     60     75

What can be concluded at the 0.05 level of significance?

Solution

Since we want to see if there is a difference between two distributions and we do not know either of the population distributions, we conduct a χ² test for homogeneity.  We have

H0:  The distributions of soil quality are the same for 1970 and this year.

H1:  The distributions of soil quality are different for 1970 and this year.

We enter the data as a 2x4 matrix in the calculator.  Then we use the χ²-Test to get:

χ²  =  27.25

P-Value  =  0.00000521

Since the P-Value is smaller than the level of significance (0.05), we reject the null hypothesis.  There is sufficient evidence to conclude that the distribution of soil quality this year is different from the distribution of soil quality in 1970.
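
The test for homogeneity uses the same arithmetic as the test for independence, so the result is easy to recompute.  The Python sketch below does so for the 2x4 soil table.  The variable names are our own, and the P-Value line uses a closed form that holds only for exactly 3 degrees of freedom, which is what (2 - 1)(4 - 1) gives here.

```python
import math

soil = [
    [45, 80, 36, 50],   # 1970
    [30, 40, 60, 75],   # this year
]

row_totals = [sum(row) for row in soil]
col_totals = [sum(col) for col in zip(*soil)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(soil):
    for j, observed in enumerate(row):
        # Expected count if both years shared one distribution.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

df = (len(soil) - 1) * (len(soil[0]) - 1)   # (2 - 1)(4 - 1) = 3

# Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom.
half = chi2 / 2
p_value = math.erfc(math.sqrt(half)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-half)

print(f"chi-square = {chi2:.2f}, df = {df}, P-Value = {p_value:.3e}")
```

Printing the P-Value in scientific notation makes the number of leading zeros easy to read when the value is this small.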


Problem 7

The percent of community college students in California by their main goals is given in the table below.

Goal      Transfer   Job Skills   Personal Development   Improve Basic Skills   Certification
Percent   48         23           12                     8                      9

A study was done to see if this distribution is different at LTCC.  The table below shows the findings of this study.

Goal            Transfer   Job Skills   Personal Development   Improve Basic Skills   Certification
# of Students   72         40           25                     20                     15

What can be concluded at the 0.05 level of significance? 

Solution

Since we want to see if the distribution of LTCC students fits a known population distribution, we use the χ² Goodness of Fit Test.  We have

H0:  The distribution of goals for LTCC students is the same as the distribution of goals for all California Community College students.

H1:  The distribution of goals for LTCC students is not the same as the distribution of goals for all California Community College students.

We put the LTCC data values into L1.  Notice that the total sample size of LTCC students is n = 172.  Thus the expected count for the goal of transfer is 0.48*172.  Similarly, to get the rest of the expected counts, we multiply each percent by 0.01 and by 172.  We then put these values into L2.  Now use the χ²GOF-Test with 4 degrees of freedom (df) to get

χ²  =  5.12

P-Value = 0.275 

Since the P-Value is larger than 0.05, we fail to reject the null hypothesis.  There is insufficient evidence to conclude that the distribution of LTCC students' goals is different from the distribution of all California Community College students' goals.
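
The L1 and L2 steps translate directly into code.  In this Python sketch the observed list plays the role of L1 and the expected list plays the role of L2; the variable names are our own.  The P-Value line uses a closed form that holds only for exactly 4 degrees of freedom, which matches this problem (5 categories minus 1).

```python
import math

observed = [72, 40, 25, 20, 15]    # LTCC counts (L1)
percents = [48, 23, 12, 8, 9]      # statewide percents, same goal order

n = sum(observed)                                    # total sample size, 172
expected = [pct * 0.01 * n for pct in percents]      # expected counts (L2)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Upper-tail probability of a chi-square variable with exactly 4 degrees of freedom.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"n = {n}, chi-square = {chi2:.2f}, df = {df}, P-Value = {p_value:.3f}")
```

Every expected count here is above 5 (the smallest is 0.08*172 = 13.76), which is the usual rule of thumb for trusting the χ² approximation.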


Problem 8

A researcher is interested in determining whether there is a difference between the mean amount of money spent on textbooks in the fall at the three California public higher education systems.  Ten randomly selected students from UC campuses, 8 randomly selected students from Cal State campuses, and 12 randomly selected students from community colleges were surveyed.  Below is the StatCrunch readout for this survey.

Analysis of Variance results:
Data stored in separate columns.

Column means

Column      n    Mean        Std. Error
UC          10   234.9       23.159088
Cal State   8    201.75      11.846865
Comm Col    12   183.16667   13.26983

ANOVA table

Source       df   SS         MS          F-Stat      P-value
Treatments   2    14740.9    7370.45     2.5071433   0.1003
Error        27   79374.07   2939.7803
Total        29   94114.97

 

 

 

 

A.  What assumptions have we made about the data to apply a single-factor ANOVA test?

Solution

Since the data come from independent random samples, we need only assume that each group of data came from a normal distribution and that all the groups came from distributions with about the same standard deviation.

B.  What can be concluded at the 0.05 level of significance?

Solution

Since the P-value of 0.1003 is greater than the level of significance of 0.05, we do not have significant evidence to conclude that there is a difference in the mean amount of money spent on textbooks by students at the three systems.
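
The F-Stat and P-value columns of the ANOVA table follow from the df and SS columns: each mean square is SS/df, and F is the ratio of the two mean squares.  The Python sketch below redoes that arithmetic.  The variable names are our own, and the P-value line relies on a closed form that is valid only when the treatments row has exactly 2 degrees of freedom, as it does with three groups.

```python
# Values copied from the ANOVA table in the readout.
ss_treatments, df_treatments = 14740.9, 2
ss_error, df_error = 79374.07, 27

ms_treatments = ss_treatments / df_treatments   # mean square between groups
ms_error = ss_error / df_error                  # mean square within groups
f_stat = ms_treatments / ms_error

# Upper-tail probability of an F(2, df_error) variable.
# This shortcut holds only for a numerator df of exactly 2.
p_value = (1 + df_treatments * f_stat / df_error) ** (-df_error / 2)

print(f"MS treatments = {ms_treatments:.2f}, MS error = {ms_error:.4f}")
print(f"F = {f_stat:.4f}, P-value = {p_value:.4f}")
```

The degrees of freedom also give a quick consistency check on the sample sizes: the total df of 29 means 30 students were surveyed in all.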


Problem 9

A study was done to see if Caucasians have a lower pass rate than Latinos in their statistics class.  192 Caucasians and 83 Latinos were considered.  135 of the Caucasians and 65 of the Latinos passed the course. 

A.  Conduct the appropriate hypothesis test and state your conclusion in the context of the problem using a 0.05 level of significance.

Solution

Since the survey question, "Did you pass your statistics class?" is a Yes or No question, the test is for proportions.  There are two samples, and we want to see if the pass rate for Caucasians (p1) is lower than the pass rate for Latinos (p2), so we have

H0:  p1  =  p2

H1:  p1  <  p2

We use the calculator with 2-PropZTest.  We get

Z  =  -1.37

P-Value = 0.0857

Since the P-Value is greater than the level of significance, there is insufficient evidence to conclude that the pass rate for all Caucasians is lower than the pass rate for all Latinos.
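
The 2-PropZTest result can be checked with a few lines of code.  This Python sketch uses the pooled sample proportion in the standard error, which is the standard approach when the null hypothesis says the two proportions are equal.  The function name is our own; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """z statistic and left-tailed P-value for H1: p1 < p2, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_left = NormalDist().cdf(z)   # area to the left of z under the standard normal
    return z, p_left

# Sample 1: Caucasians (135 of 192 passed).  Sample 2: Latinos (65 of 83 passed).
z, p_value = two_prop_z_test(135, 192, 65, 83)
print(f"Z = {z:.2f}, P-Value = {p_value:.4f}")
```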

B.  Find the appropriate 95% confidence interval and explain in a complete sentence what it means.

Solution

We use the calculator with a 2-PropZInt to get

(-0.1897,0.02971)

With 95% confidence, the difference between the pass rate for all Caucasians and the pass rate for all Latinos (Caucasian minus Latino) is between -0.19 and 0.03.  Since the interval contains both negative and positive values, it is plausible that Caucasians have a lower pass rate than Latinos and it is also plausible that they have a higher pass rate.  Therefore we cannot say whether the Caucasian pass rate is lower than the Latino pass rate.
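
Unlike the hypothesis test, the confidence interval does not pool the two samples, because it makes no assumption that the proportions are equal.  The Python sketch below shows the computation; the function name is our own, and the critical value comes from `NormalDist().inv_cdf` in the standard `statistics` module.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Sample 1: Caucasians (135 of 192 passed).  Sample 2: Latinos (65 of 83 passed).
low, high = two_prop_z_interval(135, 192, 65, 83)
print(f"({low:.4f}, {high:.5f})")
```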


Problem 10

A biologist measured the muscle masses in grams of eight laboratory rats before and after putting them on a high protein diet to see if the mean muscle mass increases with a high protein diet.  The results are shown in the table below.

Before   4   6   2   3   4   5   3   2
After    5   8   3   3   5   4   5   4

A.  What can be concluded at the 0.05 level of significance?

Solution

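Since each rat is measured before and after the diet, this is a two sample test with paired (dependent) samples.  With μd denoting the mean difference (before minus after), an increase in muscle mass corresponds to a negative difference, so the hypotheses are

H0:  μd  =  0

H1:  μd  <  0

We put the data into L1 (Before) and L2 (After) and then store the differences into L3: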

L1-L2 STO -> L3

Since the population standard deviation is unknown, we use a T-Test to get

T = -2.65

P-Value = 0.0166

Since the P-Value is less than 0.05, there is sufficient evidence to conclude that the mean muscle mass of all rats after being given the high protein diet is larger than the mean muscle mass before being given the high protein diet.
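
The paired test is just a one sample t-test on the list of differences, which the following Python sketch makes explicit.  It mirrors the L1 - L2 step, so the differences are before minus after and the test is left-tailed.  The helper names are our own, and the P-Value is approximated by numerically integrating the t density.

```python
import math
from statistics import mean, stdev

before = [4, 6, 2, 3, 4, 5, 3, 2]   # L1
after = [5, 8, 3, 3, 5, 4, 5, 4]    # L2
diffs = [b - a for b, a in zip(before, after)]   # L3 = L1 - L2

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_left_tail_p(t, df, steps=2000):
    """P(T <= t); integrates the density from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 < T < |t|)
    return 0.5 + math.copysign(area, t)

n = len(diffs)
d_bar = mean(diffs)                      # sample mean of the differences
s_d = stdev(diffs)                       # sample standard deviation of the differences
t_stat = d_bar / (s_d / math.sqrt(n))
p_value = t_left_tail_p(t_stat, n - 1)
print(f"n = {n}, mean difference = {d_bar}, T = {t_stat:.2f}, P-Value = {p_value:.4f}")
```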

B.  Construct and interpret the appropriate 95% confidence interval.

Solution

We use a T-Interval to get

(-1.894,-0.1063)

With a 95% level of confidence we can conclude that the mean muscle mass of all rats after the high protein diet is between 0.1 g and 1.9 g more than the mean muscle mass before the high protein diet.