3. Building and Studying a Model of China's Civil Aviation Passenger Volume
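To study the trend in China's civil aviation passenger volume and its causes, we take civil aviation passenger volume as the dependent variable Y and treat national income, consumption, railway passenger volume, civil aviation route mileage, and the number of inbound tourists as its main influencing factors. Y denotes civil aviation passenger volume (10,000 persons), X1 national income (100 million yuan), X2 consumption (100 million yuan), X3 railway passenger volume (10,000 persons), X4 civil aviation route mileage (10,000 km), and X5 the number of inbound tourists (10,000 persons). The data are as follows: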
Year   y      x1      x2      x3       x4      x5
1978   231    3010    1888    81491    14.     180.92
1979   298    3350    2195    863      16.00   420.39
1980   343    3688    2531    92204    19.53   570.25
1981   401    3941    2799    95300    21.82   776.71
1982   445    4258    3054    99922    23.27   792.43
1983   391    4736    3358    106044   22.91   947.70
1984   554    5652    3905    110353   26.02   1285.22
1985   744    7020    4879    112110   27.72   1783.30
1986   997    7859    5552    108579   32.43   2281.95
1987   1310   9313    6386    112429   38.91   2690.23
1988   1442   11738   8038    1225     37.38   3169.48
1989   1283   13176   9005    113807   47.19   2450.14
1990   1660   14384   9663    95712    50.68   2746.20
1991   2178   16557   10969   95081    55.91   3335.65
1992   2886   20223   12985   99693    83.66   3311.50
1993   3383   24882   15949   105458   96.08   4152.70
The data are analyzed as follows.
Regression analysis
Correlations
(Pearson correlation, with the two-tailed significance in parentheses; N = 16 for every pair; the matrix is symmetric, so only the lower triangle is shown)

      y               x1              x2              x3              x4              x5
y     1
x1    .9** (.000)     1
x2    .985** (.000)   .999** (.000)   1
x3    .227 (.398)     .258 (.335)     .2 (.278)       1
x4    .987** (.000)   .984** (.000)   .978** (.000)   .213 (.428)     1
x5    .924** (.000)   .930** (.000)   .942** (.000)   .504* (.046)    .882** (.000)   1

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

(1) The correlation matrix shows that the correlation coefficients between y and x1, x2, x4, x5 are all above 0.9, so the chosen independent variables are highly linearly correlated with y and a multiple linear regression of y on them is appropriate.
(2) The correlation between y and x3 is small, ry3 = 0.227, with P = 0.398. x3 is railway passenger volume, so taken on its own, railway passenger volume shows no significant linear relationship with civil aviation passenger volume.
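The significance levels in the correlation table can be checked by hand. Under the null hypothesis of zero correlation, t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees of freedom. The sketch below is only an illustration: the function name corr_t_stat is our own, and the critical value 2.145 (two-tailed 5% level, 14 degrees of freedom) is taken from a standard t table rather than from the SPSS output.

```python
import math


def corr_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given a sample correlation r from n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


N = 16                # number of yearly observations
T_CRIT_5PCT = 2.145   # assumed: two-tailed 5% critical value of t with 14 df

t_y_x3 = corr_t_stat(0.227, N)    # y vs. x3 (reported Sig. = .398)
t_x3_x5 = corr_t_stat(0.504, N)   # x3 vs. x5 (reported Sig. = .046)

for label, t in [("y-x3", t_y_x3), ("x3-x5", t_x3_x5)]:
    print(f"{label}: t = {t:.3f}, significant at 5%: {abs(t) > T_CRIT_5PCT}")
```

The first pair falls well below the critical value and the second just above it, which agrees with the starred entries in the table.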
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .999a   .998       .997                49.49240

a. Predictors: (Constant), x5, x3, x4, x2, x1
ANOVAb

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   1.382E7          5    2763775.354   1.128E3   .000a
Residual     24494.981        10   2449.498
Total        1.384E7          15

a. Predictors: (Constant), x5, x3, x4, x2, x1
b. Dependent Variable: y
Coefficientsa

             Unstandardized Coefficients   Standardized Coefficients
Model 1      B          Std. Error         Beta                        t        Sig.
(Constant)   450.909    178.078                                        2.532    .030
x1           .354       .085               2.447                       4.152    .002
x2           -.561      .125               -2.485                      -4.478   .001
x3           -.007      .002               -.083                       -3.510   .006
x4           21.578     4.030              .531                        5.354    .000
x5           .435       .052               .5                          8.440    .000

a. Dependent Variable: y
(3) From the coefficients table above, the regression equation is
y=450.909+0.354x1-0.561x2-0.007x3+21.578x4+0.435x5
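(4) The multiple correlation coefficient is R = 0.999 and the coefficient of determination is R^2 = 0.998; judged by R^2, the regression equation fits the data very closely.
(5) From the ANOVA table, F = 1128.303 with P = 0.000, so the regression equation is highly significant: x1, x2, x3, x4, x5 taken together have a highly significant linear effect on y.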
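The summary statistics quoted in (4) and (5) all follow from three entries of the ANOVA table. As a small sketch (variable names are our own; only the numbers come from the SPSS output), the F statistic, R^2, adjusted R^2 and the standard error of the estimate can be recomputed like this:

```python
import math

# Values copied from the ANOVA table (five predictors, 16 observations).
ms_regression = 2763775.354
ss_residual = 24494.981
df_regression, df_residual = 5, 10

ss_regression = ms_regression * df_regression
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual                       # overall F test
r_squared = 1.0 - ss_residual / ss_total                   # coefficient of determination
adj_r_squared = 1.0 - ms_residual / (ss_total / df_total)  # penalizes extra predictors
std_error = math.sqrt(ms_residual)                         # std. error of the estimate

print(f"F = {f_stat:.3f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"std. error of the estimate = {std_error:.4f}")
```

The recomputed values agree with the Model Summary and ANOVA tables to the precision SPSS prints.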
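(6) Significance tests of the regression coefficients. The independent variables x1, x2, x3, x4, x5 all have a significant effect on y. Railway passenger volume x3 has the largest P value, 0.006, but it is still highly significant at the 1% level. This shows clearly that in multiple linear regression a variable should not be kept or dropped on the basis of its simple correlation coefficient alone.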
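(7) A concern: the regression coefficient of x2 is -0.561. x2 is consumption, and a negative coefficient would mean that consumption is negatively related to y, which is clearly unreasonable. The correlation table points to the cause: collinearity among the independent variables (x1 is national income and x2 is consumption, and their simple correlation r12 = 0.999 is extremely high). The model is therefore refined further in the multicollinearity section.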
Multicollinearity analysis
Coefficientsa

             Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1      B          Std. Error         Beta                        t        Sig.     Tolerance   VIF
(Constant)   450.909    178.078                                        2.532    .030
x1           .354       .085               2.447                       4.152    .002     .001        1.963E3
x2           -.561      .125               -2.485                      -4.478   .001     .001        1.741E3
x3           -.007      .002               -.083                       -3.510   .006     .315        3.17
x4           21.578     4.030              .531                        5.354    .000     .018        55.48
x5           .435       .052               .5                          8.440    .000     .040        25.19

a. Dependent Variable: y

Collinearity Diagnosticsa

                                                Variance Proportions
Dimension   Eigenvalue   Condition Index   (Constant)   x1    x2    x3    x4    x5
1           5.578        1.000             .00          .00   .00   .00   .00
2           .378         3.842             .00          .00   .00   .00   .00
3           .037         12.205            .01          .00   .00   .00   .03
4           .004         36.431            .17          .00   .01   .09   .50
5           .002         53.3              .72          .00   .01   .66   .15
6           8.080E-5     262.762           .10          .99   .99   .25   .31

a. Dependent Variable: y

(8) From the tables above, VIF1 = 1963 is the largest variance inflation factor and far exceeds 10, so x1 is removed and the regression of y on the four remaining variables x2, x3, x4, x5 is fitted.

Coefficientsa

             Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1      B          Std. Error         Beta                        t        Sig.     Tolerance   VIF
(Constant)   695.039    2.525                                          2.627    .024
x2           -.053      .042               -.233                       -1.262   .233     .013        77.54
x3           -.012      .003               -.134                       -4.207   .001     .431        2.31
x4           32.037     4.951              .788                        6.471    .000     .030        33.81
x5           .399       .080               .517                        4.988    .000     .041        24.46

a. Dependent Variable: y

(9) From this output, the variance inflation factor of x2, VIF = 77.5, is the largest and far exceeds 10, and the regression coefficient of x2, B2 = -0.053, is still negative. The model therefore still suffers from strong multicollinearity and another variable must be removed. We drop x2 and fit the regression of y on the three variables x3, x4, x5.
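The two diagnostics used in this screening can be reproduced from quantities already reported. The variance inflation factors are the diagonal entries of the inverse of the predictors' correlation matrix, and each condition index is the square root of the largest eigenvalue divided by the eigenvalue in question. The sketch below is illustrative only: the helper names are our own, the small Gauss-Jordan routine stands in for whatever SPSS does internally, and the inputs are the rounded values printed in the tables, so results match only approximately.

```python
import math


def invert(matrix):
    """Invert a small non-singular square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlations(corr):
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(corr))]


# Pairwise correlations among x3, x4, x5, copied from the correlation table.
corr_x3_x4_x5 = [
    [1.0,   0.213, 0.504],
    [0.213, 1.0,   0.882],
    [0.504, 0.882, 1.0],
]
vifs = vif_from_correlations(corr_x3_x4_x5)
print("VIF for x3, x4, x5:", [round(v, 2) for v in vifs])

# Eigenvalues from the collinearity diagnostics of the five-predictor model.
eigenvalues = [5.578, 0.378, 0.037, 0.004, 0.002, 8.080e-5]
condition_indices = [math.sqrt(eigenvalues[0] / ev) for ev in eigenvalues]
print("condition indices:", [round(c, 2) for c in condition_indices])
```

With only x3, x4 and x5 as predictors, every VIF computed this way stays below 10, whereas the last condition index of the five-predictor model is above 260, far beyond the usual warning levels.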
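Coefficientsa

             Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model 1      B          Std. Error         Beta                        t        Sig.     Tolerance   VIF
(Constant)   591.876    257.730                                        2.296    .040
x3           -.010      .003               -.119                       -3.934   .002     .504        1.98
x4           26.436     2.249              .650                        11.754   .000     .150        6.65
x5           .317       .048               .411                        6.568    .000     .117        8.51

a. Dependent Variable: y

(10) The table shows that all three variance inflation factors are below 10 and that every regression coefficient has a reasonable economic interpretation, so this model is free of strong multicollinearity and can serve as the final regression model.
(11) The final regression equation is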
y=591.876-0.01037x3+26.436x4+0.317x5
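To remove the effect of the units of measurement, the variables are standardized. Writing y*, x3*, x4*, x5* for the standardized variables, the standardized regression equation is
y* = -0.119x3* + 0.650x4* + 0.411x5*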
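A standardized coefficient is the unstandardized slope rescaled by the ratio of the sample standard deviations, beta_j = B_j * s_xj / s_y. The sketch below checks this for x5, using the y and x5 columns of the data table and the slope 0.317 from the final model. The function name is our own, and because B is rounded to three decimals the result only approximates the reported 0.411.

```python
import statistics

# y and x5 series copied from the data table (1978-1993).
y = [231, 298, 343, 401, 445, 391, 554, 744,
     997, 1310, 1442, 1283, 1660, 2178, 2886, 3383]
x5 = [180.92, 420.39, 570.25, 776.71, 792.43, 947.70, 1285.22, 1783.30,
      2281.95, 2690.23, 3169.48, 2450.14, 2746.20, 3335.65, 3311.50, 4152.70]


def standardized_coefficient(b, x, response):
    """Rescale an unstandardized slope b by the ratio of sample standard deviations."""
    return b * statistics.stdev(x) / statistics.stdev(response)


beta_x5 = standardized_coefficient(0.317, x5, y)
print(f"standardized coefficient of x5: {beta_x5:.3f}")
```

The same rescaling applies to x3 and x4 once their standard deviations are available.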
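(12) The standardized regression coefficients show that civil aviation route mileage has the largest effect on civil aviation passenger volume, followed by the number of inbound tourists. Because the variables are standardized, the coefficients are read in standard-deviation units, not in percentages: with the other variables held fixed, a one-standard-deviation increase in route mileage is associated with a 0.650-standard-deviation increase in civil aviation passenger volume, and a one-standard-deviation increase in inbound tourists with a 0.411-standard-deviation increase. Railway passenger volume has a smaller effect: a one-standard-deviation increase is associated with a 0.119-standard-deviation decrease in civil aviation passenger volume.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .997a   .994       .993                79.78835

a. Predictors: (Constant), x5, x3, x4

(13) From the table above, the sample coefficient of determination of this regression is R^2 = 0.994 and the adjusted coefficient of determination is 0.993. The goodness of fit therefore remains very high, and the regression coefficients have a reasonable economic interpretation.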
Factor analysis
Total Variance Explained

            Initial Eigenvalues                        Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %       Total   % of Variance   Cumulative %
1           3.991   79.826          79.826             3.991   79.826          79.826
2           .932    18.642          98.468             .932    18.642          98.468
3           .065    1.303           99.771             .065    1.303           99.771
4           .011    .224            99.995             .011    .224            99.995
5           .000    .005            100.000            .000    .005            100.000

Extraction Method: Principal Component Analysis.

(14) From the table, the first principal component accounts for 79.826% of the variance, that is, it carries nearly 80% of the information in the five original variables. The second principal component accounts for 18.642% (the difference between the cumulative 98.468% and 79.826%), nearly 19%. Together the two components carry 98.468% of the information in the five original variables, so retaining these two components is sufficient.

Component Matrixa

      Component
      1       2       3       4       5
x1    .985    -.165   .018    .047    .012
x2    .990    -.132   .000    .055    -.011
x3    .413    .908    .066    .007    .000
x4    .963    -.214   .150    -.0     -.001
x5    .972    .128    -.195   -.043   .000

Extraction Method: Principal Component Analysis.
a. 5 components extracted.

(15) From the output above:
Y1 = -2.4 + 0.00003714x1 + 0.00005831x2 + 0.000009394x3 + 0.01022x4 + 0.0001957x5
Y2 = -8.426 - 0.00002672x1 - 0.00003332x2 + 0.00008851x3 - 0.009708x4 + 0.0001105x5
y = 1159.125 + 936.781Y1 - 185.876Y2
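Because Y1 and Y2 are linear in x1, ..., x5, the regression of y on the two component scores can be rewritten in terms of the original variables: the slope of y on xj is 936.781 times the coefficient of xj in Y1 minus 185.876 times its coefficient in Y2. The sketch below carries out that arithmetic for the slopes only, using the coefficients as printed in the three equations; the variable names are our own, and the intercept is left out because it also depends on the constants of Y1 and Y2.

```python
# Coefficients of x1..x5 in the two component scores, copied from Y1 and Y2.
Y1_COEF = [0.00003714, 0.00005831, 0.000009394, 0.01022, 0.0001957]
Y2_COEF = [-0.00002672, -0.00003332, 0.00008851, -0.009708, 0.0001105]

# Slopes of y on the component scores: y = 1159.125 + 936.781*Y1 - 185.876*Y2.
B_Y1, B_Y2 = 936.781, -185.876

# Slope of y on each original variable after substitution.
slopes = [B_Y1 * a + B_Y2 * b for a, b in zip(Y1_COEF, Y2_COEF)]

for name, slope in zip(["x1", "x2", "x3", "x4", "x5"], slopes):
    print(f"{name}: {slope:.4g}")
```

Note that every slope except that of x3 comes out positive, including the slope of consumption x2.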
Substituting Y1 and Y2 gives: y = 416 + 0.03976x1 + 0.06082x2 - 0.007652x3 + 11.37x4 + 0.1628x5.
(16) Compared with the multiple linear regression analysis above, the factors affect y to a similar degree, and the coefficients also have a reasonable economic interpretation.

Communalities

      Initial   Extraction
x1    1.000     .997
x2    1.000     .997
x3    1.000     .996
x4    1.000     .973
x5    1.000     .960

Extraction Method: Principal Component Analysis.

(17) From the table, the specific-factor variances (one minus the communality) are 0.003, 0.003, 0.004, 0.027 and 0.040 respectively, so the effect of the specific factors can be ignored.

Component Matrixa

      Component
      1       2
x1    .985    -.165
x2    .990    -.132
x3    .413    .908
x4    .963    -.214
x5    .972    .128

Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Component Score Coefficient Matrix

      Component
      1       2
x1    .247    -.178
x2    .248    -.142
x3    .103    .975
x4    .241    -.229
x5    .243    .137

Extraction Method: Principal Component Analysis.
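For an unrotated principal component solution, the tables above are tied together by two simple relations: the share of variance explained by a component is its eigenvalue divided by the number of variables, and each component score coefficient is the corresponding loading divided by that component's eigenvalue. The following sketch recomputes both from the printed eigenvalues and loadings; the names are our own and agreement is limited by the three-decimal rounding of the SPSS output.

```python
# Eigenvalues of the two retained components (Total Variance Explained table).
EIGENVALUES = (3.991, 0.932)

# Loadings on components 1 and 2 (Component Matrix, two components extracted).
LOADINGS = {
    "x1": (0.985, -0.165),
    "x2": (0.990, -0.132),
    "x3": (0.413, 0.908),
    "x4": (0.963, -0.214),
    "x5": (0.972, 0.128),
}
n_vars = len(LOADINGS)

# Proportion of total variance carried by each retained component.
explained = [ev / n_vars for ev in EIGENVALUES]

# Component score coefficient = loading / eigenvalue of that component.
score_coef = {
    name: tuple(loading / ev for loading, ev in zip(row, EIGENVALUES))
    for name, row in LOADINGS.items()
}

print("explained variance:", [f"{100 * p:.2f}%" for p in explained])
for name, (c1, c2) in score_coef.items():
    print(f"{name}: {c1:.3f} {c2:.3f}")
```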
(18) The factor model is
X1 = 0.985F1 - 0.165F2
X2 = 0.990F1 - 0.132F2
X3 = 0.413F1 + 0.908F2
X4 = 0.963F1 - 0.214F2
X5 = 0.972F1 + 0.128F2
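In a two-factor model of this form, the communality of each variable is the sum of its squared loadings on F1 and F2, and the specific-factor variance is one minus the communality. A minimal sketch, using the loadings from the two-component Component Matrix (names are our own, and results differ from the Communalities table only by rounding):

```python
# Loadings on F1 and F2, copied from the two-component Component Matrix.
LOADINGS = {
    "x1": (0.985, -0.165),
    "x2": (0.990, -0.132),
    "x3": (0.413, 0.908),
    "x4": (0.963, -0.214),
    "x5": (0.972, 0.128),
}

# Communality: share of each variable's variance explained by F1 and F2 together.
communality = {name: f1 ** 2 + f2 ** 2 for name, (f1, f2) in LOADINGS.items()}

# Specific variance: what the two common factors leave unexplained.
specific_variance = {name: 1.0 - h2 for name, h2 in communality.items()}

for name in LOADINGS:
    print(f"{name}: communality = {communality[name]:.3f}, "
          f"specific variance = {specific_variance[name]:.3f}")
```

Every specific variance is below 0.05, which is why the specific factors can be neglected here.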
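(19) The factor model shows that the first common factor F1 is driven mainly by national income, consumption, civil aviation route mileage and the number of inbound tourists, while the second common factor F2 is driven mainly by railway passenger volume.
(20) From the factor loading matrix we can further obtain
F1 = 0.985X1 + 0.990X2 + 0.413X3 + 0.963X4 + 0.972X5;
F2 = -0.165X1 - 0.132X2 + 0.908X3 - 0.214X4 + 0.128X5.
Substituting the corresponding X values gives a series of values for F1 and F2, which are then used to build a regression equation with y.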