The Lewbel Instrumental Variable Method

This command combines the Lewbel method with ivreghdfe and reghdfe, and adds a heteroskedasticity test.

1 Command syntax

lewbel varlist [if] [in], Absorb(string) [VCE(string) CLuster(string) Z(string) BY(string) first keep opt(string)]
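  • absorb(): same as in reghdfe; list the fixed effects here.
  • vce() & cluster(): cluster-robust standard errors; specify only one of the two. With older versions of ivreghdfe, cluster() is recommended. The format follows reghdfe and ivreghdfe.
  • z(): the exogenous variables to use. They may be a subset of the control variables or new variables. If omitted, all control variables are used.
  • by(): the grouping used to compute the means for centering. The default is the full-sample mean (as ivreg2h does).
  • first: report the first-stage regression results.
  • keep: keep the generated instruments, whose names end in _g.
  • opt(): any other ivreghdfe options; rarely needed.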
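
The options above can be combined as in the following sketch. It assumes the lewbel command is installed and that the example data set from section 3 (with variables y, x1, x2-x9, Country, Industry and Ind_year) is in memory. That by() accepts a variable name is an assumption based on the option's description, not something confirmed by the help file.

```stata
* Default: all controls x2-x9 serve as Z, centred at their full-sample means.
lewbel y x1 x2-x9, absorb(Country Ind_year) cluster(Ind_year)

* Restrict Z to a subset of the controls and print the first stage.
lewbel y x1 x2-x9, absorb(Country Ind_year) cluster(Ind_year) z(x2 x3) first

* Centre Z within groups rather than over the full sample
* (assumption: by() takes the name of the grouping variable).
lewbel y x1 x2-x9, absorb(Country Ind_year) cluster(Ind_year) by(Industry)
```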

2 How it works

  • Estimating equation

$$
Y_{it} = \alpha_0 + \alpha_1 X_{it} + \eta' Controls_{i, t} + \delta_i + \lambda_t + \varepsilon_{i, t} \tag{1}
$$

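  • Compute the residuals

$$
X_{it} = \beta_0 + \gamma' Controls_{i, t} + \delta_i + \lambda_t + \mu_{i, t} \tag{2}
$$

Choose a subset of the control variables (fixed effects included) as the exogenous variables $Z$. Here, all of $Controls$ are used as $Z$.

Subtract from each variable in the vector $Z$ its full-sample mean, then multiply by the estimated residual from equation (2): $Z_{IV} = (Z - \overline{Z}) \times \hat{\mu}_{it}$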
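
For reference, the conditions under which $Z_{IV}$ can serve as an instrument can be written in the notation of equations (1) and (2). This restates the key assumptions of Lewbel (2012) and is not specific to this command:

$$
\operatorname{Cov}(Z, \varepsilon_{it}\,\mu_{it}) = 0, \qquad \operatorname{Cov}(Z, \mu_{it}^2) \neq 0
$$

The first condition makes $Z_{IV}$ uncorrelated with the structural error. The second requires the error in equation (2) to be heteroskedastic with respect to $Z$, which is why a heteroskedasticity test is a natural companion to the estimator: without heteroskedasticity the generated instruments are weak.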

  • 2SLS

Use $Z_{IV}$ and $Controls$ as instruments in a two-stage regression of equation (1):

First stage
$$
X_{it} = \theta_0 + \theta_1 Z_{IV} + \Phi' Controls_{i, t} + \delta_i + \lambda_t + \sigma_{i, t} \tag{3}
$$
Using the fitted values from (3), estimate the second stage
$$
Y_{it} = \psi_0 + \psi_1 \hat{X}_{it} + \Omega' Controls_{i, t} + \delta_i + \lambda_t + \epsilon_{i, t} \tag{4}
$$
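
The steps in equations (2) to (4) can be sketched with built-in Stata commands only. This is a minimal illustration on simulated data without fixed effects, not the command's actual implementation. All variable names (z1, z2, u, x, y, mu_hat, x_hat) and the data-generating process are assumptions made for the example.

```stata
* Simulated data: u is an unobserved confounder shared by x and y.
clear
set seed 12345
set obs 5000
generate double z1 = rnormal()
generate double z2 = rnormal()
generate double u  = rnormal()
* The error in the x equation is heteroskedastic in z1 (scale exp(0.5*z1)).
generate double x  = 0.5*z1 + 0.3*z2 + u + exp(0.5*z1)*rnormal()
generate double y  = 1.0*x + 0.4*z1 - 0.2*z2 + u + rnormal()

* Equation (2): regress x on the exogenous variables and keep the residuals.
quietly regress x z1 z2
predict double mu_hat, residuals

* Z_IV = (Z - mean(Z)) * mu_hat, using full-sample means.
foreach v of varlist z1 z2 {
    quietly summarize `v'
    generate double `v'_g = (`v' - r(mean)) * mu_hat
}

* Equations (3) and (4) in one step with the generated instruments.
ivregress 2sls y z1 z2 (x = z1_g z2_g), vce(robust)
scalar b_iv = _b[x]

* The same point estimate by hand: first stage, fitted values, second stage.
quietly regress x z1_g z2_g z1 z2
predict double x_hat, xb
quietly regress y x_hat z1 z2
scalar b_manual = _b[x_hat]
display "2SLS: " b_iv "   two-step OLS: " b_manual
```

The two-step regression reproduces the 2SLS point estimate, but its standard errors ignore that x_hat is itself estimated, so inference should come from the IV command. The same pattern is visible in the section 3 output, where the coefficient on x1 and x1_p is identical while the standard errors differ.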

3 Example

use "lewbel test.dta", clear

xtset id year
gen Ind_year = string(Industry) + " $ " + string(year)

* Use lewbel command
lewbel y x1 x2-x9 , a(Country Ind_year) cl(Ind_year) keep first
est store m3

* First stage
reghdfe x1 x2_g-x9_g x2-x9, a(Country Ind_year) cl(Ind_year)
est store m1

* Predict
qui predict x1_p

* Second stage
reghdfe y x1_p x2-x9, a(Country Ind_year) cl(Ind_year)
est store m2

reg2docx m1 m2 m3 using "lewbel.docx", replace ///
b(%20.4f) t(%20.4f) ///
scalars(N(%20.0fc) r2_a(%20.4f) HetLM(%20.4f) HetLMp(%20.4f) ///
Kleibergen_Paap_rk_LM(%20.4f) Kleibergen_Paap_rk_LM_p(%20.4f) ///
Cragg_Donald_Wald_F(%20.4f) Kleibergen_Paap_rk_Wald_F(%20.4f) ///
Hansen_J(%20.4f) Hansen_J_p(%20.4f)) ///
order(x1 x1_p) ///
addfe("Country=YES" "Industry*Year=YES") ///
mtitles() ///
font("Times New Roman", 6.5) ///
margin(top, 3.17cm) margin(bottom, 3.17cm)

Stata output:

. lewbel y x1 $controls , a(Country Ind_year) cl(Ind_year) keep first
*=============================================================================*
* Part 1 Heteroskedasticity test *
*=============================================================================*
Breusch–Pagan test for heteroskedasticity
H0: Constant variance
Chi2(2651) = 4792.7716
Prob > chi2 = 0.0000

*=============================================================================*
* Part 2 2SLS regression *
*=============================================================================*
(MWFE estimator converged in 8 iterations)

First-stage regressions
-----------------------


First-stage regression of x1:

Statistics robust to heteroskedasticity and clustering on Ind_year
Number of obs = 20537
Number of clusters (Ind_year) = 2629
------------------------------------------------------------------------------
| Robust
x1 | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x2_g | .3403562 .030574 11.13 0.000 .2804287 .4002837
x3_g | .0209829 .0210201 1.00 0.318 -.0202182 .0621839
x4_g | -.1824731 .0274428 -6.65 0.000 -.2362632 -.128683
x5_g | -.0202362 .0187698 -1.08 0.281 -.0570265 .0165541
x6_g | .0300076 .0200092 1.50 0.134 -.0092121 .0692273
x7_g | .0264974 .0227246 1.17 0.244 -.0180446 .0710393
x8_g | .0312092 .0336754 0.93 0.354 -.0347972 .0972156
x9_g | .1917831 .0199648 9.61 0.000 .1526505 .2309157
x2 | .5211517 .0056266 92.62 0.000 .5101231 .5321804
x3 | .0167608 .0049286 3.40 0.001 .0071003 .0264213
x4 | .0049114 .0058779 0.84 0.403 -.0066098 .0164326
x5 | .0670301 .0052558 12.75 0.000 .0567283 .0773319
x6 | -.0271468 .0044682 -6.08 0.000 -.0359047 -.0183888
x7 | .0478391 .0057346 8.34 0.000 .0365988 .0590794
x8 | -.0415018 .0050295 -8.25 0.000 -.0513599 -.0316436
x9 | -.0220721 .0042422 -5.20 0.000 -.0303873 -.013757
------------------------------------------------------------------------------
F test of excluded instruments:
F( 8, 2628) = 78.98
Prob > F = 0.0000
Sanderson-Windmeijer multivariate F test of excluded instruments:
F( 8, 2628) = 78.98
Prob > F = 0.0000



Summary results for first-stage regressions
-------------------------------------------

(Underid) (Weak id)
Variable | F( 8, 2628) P-val | SW Chi-sq( 8) P-val | SW F( 8, 2628)
x1 | 78.98 0.0000 | 633.01 0.0000 | 78.98

NB: first-stage test statistics cluster-robust

Stock-Yogo weak ID F test critical values for single endogenous regressor:
5% maximal IV relative bias 20.25
10% maximal IV relative bias 11.39
20% maximal IV relative bias 6.69
30% maximal IV relative bias 4.99
10% maximal IV size 33.84
15% maximal IV size 18.54
20% maximal IV size 13.24
25% maximal IV size 10.50
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for i.i.d. errors only.

Underidentification test
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic Chi-sq(8)=111.38 P-val=0.0000

Weak identification test
Ho: equation is weakly identified
Cragg-Donald Wald F statistic 455.55
Kleibergen-Paap Wald rk F statistic 78.98

Stock-Yogo weak ID test critical values for K1=1 and L1=8:
5% maximal IV relative bias 20.25
10% maximal IV relative bias 11.39
20% maximal IV relative bias 6.69
30% maximal IV relative bias 4.99
10% maximal IV size 33.84
15% maximal IV size 18.54
20% maximal IV size 13.24
25% maximal IV size 10.50
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and orthogonality conditions are valid
Anderson-Rubin Wald test F(8,2628)= 4.74 P-val=0.0000
Anderson-Rubin Wald test Chi-sq(8)= 38.02 P-val=0.0000
Stock-Wright LM S statistic Chi-sq(8)= 38.09 P-val=0.0000

NB: Underidentification, weak identification and weak-identification-robust
test statistics cluster-robust

Number of clusters N_clust = 2629
Number of observations N = 20537
Number of regressors K = 9
Number of endogenous regressors K1 = 1
Number of instruments L = 16
Number of excluded instruments L1 = 8

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on Ind_year

Number of clusters (Ind_year) = 2629 Number of obs = 20537
F( 9, 2628) = 422.69
Prob > F = 0.0000
Total (centered) SS = 14633.8475 Centered R2 = 0.2665
Total (uncentered) SS = 14633.8475 Uncentered R2 = 0.2665
Residual SS = 10733.40994 Root MSE = .7233

------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1 | -.0958878 .0328708 -2.92 0.004 -.1603431 -.0314325
x2 | .3531729 .0213118 16.57 0.000 .3113833 .3949625
x3 | -.0767336 .007376 -10.40 0.000 -.0911971 -.0622702
x4 | -.0302413 .0083046 -3.64 0.000 -.0465255 -.0139571
x5 | .0566474 .0073346 7.72 0.000 .0422652 .0710295
x6 | -.134026 .0080818 -16.58 0.000 -.1498734 -.1181787
x7 | .0218564 .0087345 2.50 0.012 .0047292 .0389836
x8 | .042576 .0087935 4.84 0.000 .0253332 .0598188
x9 | -.3142137 .0076184 -41.24 0.000 -.3291524 -.299275
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 111.379
Chi-sq(8) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 455.554
(Kleibergen-Paap rk Wald F statistic): 78.984
Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 20.25
10% maximal IV relative bias 11.39
20% maximal IV relative bias 6.69
30% maximal IV relative bias 4.99
10% maximal IV size 33.84
15% maximal IV size 18.54
20% maximal IV size 13.24
25% maximal IV size 10.50
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 30.013
Chi-sq(7) P-val = 0.0001
------------------------------------------------------------------------------
Instrumented: x1
Included instruments: x2 x3 x4 x5 x6 x7 x8 x9
Excluded instruments: x2_g x3_g x4_g x5_g x6_g x7_g x8_g x9_g
Partialled-out: _cons
nb: total SS, model F and R2s are after partialling-out;
any small-sample adjustments include partialled-out
variables in regressor count K
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
Country | 15 1 14 |
Ind_year | 2629 2629 0 *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. est store m3

.
. * First stage
. reghdfe x1 x2_g-x9_g x2-x9, a(Country Ind_year) cl(Ind_year)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression Number of obs = 20,537
Absorbing 2 HDFE groups F( 16, 2628) = 909.28
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.8245
Adj R-squared = 0.7984
Within R-sq. = 0.5411
Number of clusters (Ind_year) = 2,629 Root MSE = 0.4494

(Std. err. adjusted for 2,629 clusters in Ind_year)
------------------------------------------------------------------------------
| Robust
x1 | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x2_g | .3403562 .0305747 11.13 0.000 .2804032 .4003092
x3_g | .0209829 .0210206 1.00 0.318 -.0202358 .0622015
x4_g | -.1824731 .0274435 -6.65 0.000 -.2362861 -.12866
x5_g | -.0202362 .0187703 -1.08 0.281 -.0570422 .0165698
x6_g | .0300076 .0200097 1.50 0.134 -.0092288 .069244
x7_g | .0264974 .0227251 1.17 0.244 -.0180636 .0710583
x8_g | .0312092 .0336762 0.93 0.354 -.0348254 .0972437
x9_g | .1917831 .0199653 9.61 0.000 .1526338 .2309324
x2 | .5211517 .0056268 92.62 0.000 .5101184 .5321851
x3 | .0167608 .0049288 3.40 0.001 .0070962 .0264255
x4 | .0049114 .0058781 0.84 0.403 -.0066147 .0164375
x5 | .0670301 .0052559 12.75 0.000 .056724 .0773363
x6 | -.0271468 .0044683 -6.08 0.000 -.0359084 -.0183851
x7 | .0478391 .0057348 8.34 0.000 .036594 .0590842
x8 | -.0415018 .0050296 -8.25 0.000 -.0513641 -.0316394
x9 | -.0220721 .0042423 -5.20 0.000 -.0303908 -.0137535
_cons | -3.630724 .0837772 -43.34 0.000 -3.795 -3.466448
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
Country | 15 1 14 |
Ind_year | 2629 2629 0 *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. est store m1

.
. * Predict
. qui predict x1_p

.
. * Second stage
. reghdfe y x1_p x2-x9, a(Country Ind_year) cl(Ind_year)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression Number of obs = 20,537
Absorbing 2 HDFE groups F( 9, 2628) = 423.63
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.4743
Adj R-squared = 0.3964
Within R-sq. = 0.2652
Number of clusters (Ind_year) = 2,629 Root MSE = 0.7754

(Std. err. adjusted for 2,629 clusters in Ind_year)
------------------------------------------------------------------------------
| Robust
y | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
x1_p | -.0958878 .0339678 -2.82 0.005 -.1624942 -.0292814
x2 | .3531729 .0214141 16.49 0.000 .3111827 .3951632
x3 | -.0767336 .0073749 -10.40 0.000 -.0911947 -.0622725
x4 | -.0302413 .0082708 -3.66 0.000 -.0464593 -.0140233
x5 | .0566474 .007391 7.66 0.000 .0421546 .0711401
x6 | -.134026 .0080965 -16.55 0.000 -.1499023 -.1181498
x7 | .0218564 .0088265 2.48 0.013 .0045489 .0391639
x8 | .042576 .0087784 4.85 0.000 .0253627 .0597892
x9 | -.3142137 .0076214 -41.23 0.000 -.3291582 -.2992692
_cons | -3.162962 .208469 -15.17 0.000 -3.571741 -2.754182
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
Country | 15 1 14 |
Ind_year | 2629 2629 0 *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Exported table:

[Image: exported regression table]

4 Command & data download

GitHub repository: https://github.com/codefoxs/Stata-personal

* Install
net install command, from("https://raw.githubusercontent.com/codefoxs/Stata-personal/main/lewbel/") replace

* Version
which lewbel

5 References

Lewbel, A. (2012). Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models. Journal of Business & Economic Statistics, 30(1), 67–80. https://doi.org/10.1080/07350015.2012.643126

Author: CodeFox

Published: 2024-05-17

Updated: 2024-08-27
