Problem 11_20, pg.590
The manager of a certain commuter rail system wants to determine which factors have a significant impact on the demand for rides in the large city served by the transportation network. The variables that he wants to relate to the number of weekly riders on the city's rail system are price per ride, population of the city, disposable income per capita of the citizens, and the parking rate in the city parking lots.
As I look at each variable, I expect some of them, by their nature, to be positively related to the number of riders per week. Others are not, or seem not to be, related to weekly ridership.
First, examining price per ride compared to the number of weekly riders, one would expect a positive linear relationship, and therefore a positive sign for the coefficient. The second variable, population of the city, is one I would expect to have a negative relationship in one sense and a positive relationship in another. At first glance at the data, the expected sign of the coefficient seems to be negative. In 1966, over half of the population is riding the rail system on a weekly basis; as the years go by, the share is first a little less, and then a little more, than half the population. This change could be attributable more to the increased popularity and affordability of cars than to a direct relationship between population and the number of weekly riders. Next, the data on income suggest a negative linear relationship between the variables: as disposable income per capita rises, the number of weekly rail riders decreases, again implying that, possibly with the introduction of affordable cars or other forms of transportation, fewer people rode the city rail system. The last variable is the parking rate in city parking lots. In the data, as the price of parking goes up, fewer citizens ride the rail. This drop is probably not caused by the parking rate itself and can again be attributed to other, more affordable modes of transportation, so I still expect a positive sign for the coefficient.
Results of multiple regression for Weekly_Riders

Summary measures
| Multiple R | 0.9776 |
| R-Square | 0.9557 |
| Adj R-Square | 0.9477 |
| StErr of Est | 21.4867 |

ANOVA Table
| Source | df | SS | MS | F | p-value |
| Explained | 4 | 219260.4797 | 54815.1199 | 118.7301 | 0.0000 |
| Unexplained | 22 | 10156.9277 | 461.6785 | | |

Regression coefficients
| | Coefficient | Std Err | t-value | p-value | Lower limit | Upper limit |
| Constant | 124.4269 | 516.7803 | 0.2408 | 0.8120 | -947.3109 | 1196.1648 |
| Price_per_Ride | -166.9641 | 52.0106 | -3.2102 | 0.0040 | -274.8275 | -59.1006 |
| Population | 0.6210 | 0.2751 | 2.2570 | 0.0343 | 0.0504 | 1.1915 |
| Income | -0.0472 | 0.0129 | -3.6572 | 0.0014 | -0.0740 | -0.0204 |
| Parking_Rate | 194.6798 | 36.6143 | 5.3170 | 0.0000 | 118.7463 | 270.6133 |

Results of multiple regression for Weekly_Riders

Summary measures
| Multiple R | 0.9724 |
| R-Square | 0.9455 |
| Adj R-Square | 0.9384 |
| StErr of Est | 23.3208 |

ANOVA Table
| Source | df | SS | MS | F | p-value |
| Explained | 3 | 216908.6379 | 72302.8793 | 132.9440 | 0.0000 |
| Unexplained | 23 | 12508.7695 | 543.8595 | | |

Regression coefficients
| | Coefficient | Std Err | t-value | p-value | Lower limit | Upper limit |
| Constant | 1289.6821 | 24.6300 | 52.3622 | 0.0000 | 1238.7311 | 1340.6331 |
| Price_per_Ride | -203.7555 | 53.6060 | -3.8010 | 0.0009 | -314.6477 | -92.8632 |
| Income | -0.0691 | 0.0093 | -7.4576 | 0.0000 | -0.0882 | -0.0499 |
| Parking_Rate | 212.0409 | 38.8528 | 5.4575 | 0.0000 | 131.6679 | 292.4140 |

For the first regression table:
Weekly Riders = 124.43 - 166.96(Price per Ride) + 0.621(Population) - 0.047(Income) + 194.68(Parking Rate)
For the second regression table:
Weekly Riders = 1289.68 - 203.76(Price per Ride) - 0.069(Income) + 212.04(Parking Rate)
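
To make the two fitted equations concrete, here is a minimal Python sketch that evaluates them and shows how the predicted number of weekly riders changes when the price per ride goes up by 0.10 while everything else is held fixed. The coefficients are copied from the regression tables; the input values in `baseline`, the function name, and the variable names are hypothetical and chosen only for illustration, since the units of the original data are not stated here.

```python
# Coefficients copied from the two regression tables.
MODELS = {
    "first": {  # includes Population
        "Constant": 124.4269,
        "Price_per_Ride": -166.9641,
        "Population": 0.6210,
        "Income": -0.0472,
        "Parking_Rate": 194.6798,
    },
    "second": {  # Population excluded
        "Constant": 1289.6821,
        "Price_per_Ride": -203.7555,
        "Income": -0.0691,
        "Parking_Rate": 212.0409,
    },
}


def predict_weekly_riders(model, inputs):
    """Evaluate a fitted linear equation: constant + sum(coefficient * value)."""
    return model["Constant"] + sum(
        coef * inputs[name] for name, coef in model.items() if name != "Constant"
    )


# HYPOTHETICAL inputs for illustration only; they are not taken from the data set.
baseline = {
    "Price_per_Ride": 1.00,
    "Population": 1700.0,
    "Income": 10000.0,
    "Parking_Rate": 1.50,
}
higher_price = dict(baseline, Price_per_Ride=1.10)  # same inputs, price + 0.10

price_effects = {}
for label, model in MODELS.items():
    change = predict_weekly_riders(model, higher_price) - predict_weekly_riders(
        model, baseline
    )
    price_effects[label] = change
    print(f"{label} model: change in predicted weekly riders = {change:.2f}")
```

Because both models are linear, the change depends only on the price coefficient (0.10 times the coefficient), not on the hypothetical baseline values.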
Table of correlations
| | Weekly_Riders | Price_per_Ride | Population | Income | Parking_Rate |
| Weekly_Riders | 1.000 | | | | |
| Price_per_Ride | -0.896 | 1.000 | | | |
| Population | 0.946 | -0.936 | 1.000 | | |
| Income | -0.934 | 0.944 | -0.971 | 1.000 | |
| Parking_Rate | -0.825 | 0.955 | -0.919 | 0.949 | 1.000 |

After getting the regression coefficients and the correlations, I graphed the relationships to see how they presented themselves.
After examining the variables in graph form, I found that the coefficients did not resemble what I thought the data suggested. I ran multiple regression analyses with and without the population variable. In the first, which includes population, population and parking rate are the only variables with positive coefficients; the other two, price per ride and income, have negative coefficients. When I excluded the population variable (because I thought it was the variable that would have no relationship with the dependent variable, number of weekly riders), I found that price per ride and income still have negative coefficients. The p-values of the remaining coefficients are all less than .05, which suggests that these variables should stay in the analysis.
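
As a rough check on the significance claim, each t-value can be recomputed as the coefficient divided by its standard error and compared with a critical value, and the same critical value gives the confidence limits. The sketch below does this for both regression tables. The critical values 2.0739 (22 degrees of freedom) and 2.0687 (23 degrees of freedom) are two-sided 5% values taken from a standard t table; they are assumptions and not part of the regression output, and all names are illustrative.

```python
# (coefficient, standard error) pairs copied from the regression tables.
MODELS = {
    "first": {
        "t_critical": 2.0739,  # ASSUMPTION: two-sided 5% level, 22 df (t table)
        "terms": {
            "Constant": (124.4269, 516.7803),
            "Price_per_Ride": (-166.9641, 52.0106),
            "Population": (0.6210, 0.2751),
            "Income": (-0.0472, 0.0129),
            "Parking_Rate": (194.6798, 36.6143),
        },
    },
    "second": {
        "t_critical": 2.0687,  # ASSUMPTION: two-sided 5% level, 23 df (t table)
        "terms": {
            "Constant": (1289.6821, 24.6300),
            "Price_per_Ride": (-203.7555, 53.6060),
            "Income": (-0.0691, 0.0093),
            "Parking_Rate": (212.0409, 38.8528),
        },
    },
}

results = {}
for label, model in MODELS.items():
    results[label] = {}
    for name, (coef, std_err) in model["terms"].items():
        t = coef / std_err                      # t-value of the coefficient
        margin = model["t_critical"] * std_err  # half-width of the 95% interval
        results[label][name] = {
            "t": t,
            "significant": abs(t) > model["t_critical"],
            "interval": (coef - margin, coef + margin),
        }
        print(f"{label:6s} {name:15s} t = {t:8.4f}  "
              f"significant at 5%: {abs(t) > model['t_critical']}")
```

Small differences from the printed t-values and limits are expected, because the table entries are rounded to four decimal places.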
The correlations between the number of weekly riders and price per ride, population, income, and parking rate are -0.896, 0.946, -0.934, and -0.825, respectively. These all suggest a strong linear relationship between the dependent variable and each independent variable.
Although the results do not show at all what was expected from the variables, only about 4.4% and 5.5% of the total variation in the number of weekly riders is left unexplained by the first and second estimated multiple regression models, respectively (R-Square values of 0.9557 and 0.9455).
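
The share of unexplained variation can be recomputed directly from the ANOVA tables. The following Python sketch rebuilds R-Square, adjusted R-Square, and the F statistic from the sums of squares and degrees of freedom; the function and variable names are illustrative, and the formulas are the standard ones for a regression with an intercept.

```python
def summary_from_anova(ss_explained, ss_unexplained, df_explained, df_unexplained):
    """Rebuild summary measures from the rows of an ANOVA table."""
    ss_total = ss_explained + ss_unexplained
    df_total = df_explained + df_unexplained          # n - 1
    ms_explained = ss_explained / df_explained
    ms_unexplained = ss_unexplained / df_unexplained
    return {
        "r_square": ss_explained / ss_total,
        "unexplained_share": ss_unexplained / ss_total,
        "adj_r_square": 1 - ms_unexplained / (ss_total / df_total),
        "f_statistic": ms_explained / ms_unexplained,
    }


# Sums of squares and degrees of freedom copied from the two ANOVA tables.
first = summary_from_anova(219260.4797, 10156.9277, 4, 22)
second = summary_from_anova(216908.6379, 12508.7695, 3, 23)

for label, summary in (("first", first), ("second", second)):
    print(
        f"{label} model: R-Square = {summary['r_square']:.4f}, "
        f"Adj R-Square = {summary['adj_r_square']:.4f}, "
        f"F = {summary['f_statistic']:.4f}, "
        f"unexplained = {summary['unexplained_share']:.1%}"
    )
```

Both tables share the same total sum of squares, so dropping a variable can only raise the unexplained share; comparing adjusted R-Square is one way to judge whether the extra variable is worth keeping.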