Published 2024-05-17 · Updated 2024-08-27 · Category: Programming Study

The Lewbel Instrumental Variable Method

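This command combines the Lewbel (2012) heteroskedasticity-based instrumental variable approach with ivreghdfe and reghdfe, and adds a heteroskedasticity test.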

1 Command syntax

lewbel varlist [if] [in], Absorb(string) [VCE(string) CLuster(string) Z(string) BY(string) first keep opt(string)]

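absorb(): same as in reghdfe; list the fixed effects to absorb.
vce() & cluster(): cluster-robust standard errors. Specify only one of the two; with older versions of ivreghdfe, cluster() is the recommended choice. The syntax follows reghdfe and ivreghdfe.
z(): the exogenous variables used to build the instruments. They can be a subset of the controls or new variables. If omitted, all controls are used.
by(): the grouping used to compute the means for centering. By default the full-sample mean is used (as ivreg2h does).
first: report the first-stage regression results.
keep: keep the generated instruments, whose names end in _g.
opt(): any other ivreghdfe options to pass through; rarely needed.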
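The options above can be combined as in the following sketch. It assumes that lewbel is already installed and that the variables from the example in section 3 (y, x1, x2-x9, Country, Industry, Ind_year) are in memory. Passing a single grouping variable to by() is an assumption about how that option is meant to be used.

```stata
* Illustrative calls only; they assume lewbel is installed and that the
* variables from the example in section 3 are in memory.

* Default: all controls (x2-x9) serve as Z, centred on full-sample means
lewbel y x1 x2-x9, absorb(Country Ind_year) cluster(Ind_year)

* Same model with vce() instead of cluster(), written reghdfe-style
lewbel y x1 x2-x9, absorb(Country Ind_year) vce(cluster Ind_year)

* Centre Z within groups rather than on the full sample
* (assumption: by() accepts a grouping variable name such as Industry)
lewbel y x1 x2-x9, absorb(Country Ind_year) cluster(Ind_year) by(Industry)

* Build instruments from a subset of the controls, show the first stage,
* and keep the generated variables (here x2_g and x3_g)
lewbel y x1 x2-x9, absorb(Country Ind_year) cluster(Ind_year) z(x2 x3) first keep
```

The call with keep is placed last so that the stored _g variables cannot clash with instruments generated by a later call.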

2 How it works

The equation to be estimated

$$
Y_{it} = \alpha_0 + \alpha_1 X_{it} + \eta' Controls_{i, t} + \delta_i + \lambda_t + \varepsilon_{i, t} \tag{1}
$$

Compute the residuals by regressing the endogenous variable $X$ on the controls and fixed effects

$$
X_{it} = \beta_0 + \gamma' Controls_{i, t} + \delta_i + \lambda_t + \mu_{i, t} \tag{2}
$$

Choose some of the control variables (the fixed effects are also candidates) as the exogenous variables $Z$. Here, all of the $Controls$ are used as $Z$.
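Lewbel (2012) identification requires the error in equation (2) to be heteroskedastic in $Z$, that is $Cov(Z, \mu^2) \neq 0$, which is why a heteroskedasticity test is worth running before anything else. Below is a minimal sketch of such a check using built-in commands and Stata's bundled auto data. The variable roles (mpg as $X$; weight, length and turn as $Z$) are illustrative assumptions, there are no fixed effects, and the lewbel command itself may compute its test differently.

```stata
sysuse auto, clear

* Equation (2) without fixed effects: X on the candidate Z variables
quietly regress mpg weight length turn

* Breusch-Pagan / Cook-Weisberg test of H0: constant variance,
* using the Z variables as the drivers of the variance
estat hettest weight length turn
display "chi2 = " r(chi2) ",  p = " r(p)
```

A small p-value rejects constant variance, which is a necessary ingredient for the generated instruments to be relevant, though it does not by itself establish that they are strong.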

Subtract from every variable in the vector $Z$ its own full-sample mean, then multiply by the estimated residual from equation (2), that is: $Z_{IV} = (Z - \overline{Z}) \times \hat{\mu}_{it}$
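The construction of the generated instruments can be sketched by hand as follows. This is an illustration on Stata's bundled auto data with plain regress, not the command's actual code. The variable roles are assumptions, fixed effects are omitted, and the _g suffix simply mirrors the naming described for the keep option. The last two lines show a group-centered variant, which is an assumed analogue of what by() is described as doing.

```stata
* Sketch of the instrument construction on Stata's bundled auto data.
* Variable roles are illustrative assumptions, not the post's data.
sysuse auto, clear
local x  mpg                   // treated as the endogenous regressor X
local z  weight length turn    // exogenous variables Z (here: all controls)

* Equation (2): regress X on the controls and keep the residuals
quietly regress `x' `z'
predict double mu_hat, residuals

* Z_IV = (Z - mean(Z)) * mu_hat, one generated instrument per Z variable
foreach v of local z {
    quietly summarize `v' if e(sample), meanonly
    generate double `v'_g = (`v' - r(mean)) * mu_hat
}
summarize *_g

* Variant: centre within groups instead of on the full sample
bysort foreign: egen double weight_bar = mean(weight)
generate double weight_gby = (weight - weight_bar) * mu_hat
```

Because OLS residuals are orthogonal to the regressors and to the constant, each full-sample `_g` variable has a sample mean of zero by construction, which the `summarize` output should confirm up to rounding.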

2SLS

Use $Z_{IV}$ and $Controls$ as instruments in a two-stage least squares regression of equation (1), that is:

First stage
$$
X_{it} = \theta_0 + \theta_1' Z_{IV} + \Phi' Controls_{i, t} + \delta_i + \lambda_t + \sigma_{i, t} \tag{3}
$$
Using the fitted values from (3), estimate the second stage
$$
Y_{it} = \psi_0 + \psi_1 \hat{X}_{it} + \Omega' Controls_{i, t} + \delta_i + \lambda_t + \epsilon_{i, t} \tag{4}
$$
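The two stages can be reproduced with built-in commands, as in the sketch below. It again uses the bundled auto data with assumed variable roles (price as $Y$, mpg as $X$; weight, length and turn as both $Controls$ and $Z$) and omits fixed effects, so it illustrates equations (3) and (4) rather than the lewbel command itself.

```stata
sysuse auto, clear

* Generated instruments: (Z - mean(Z)) * residual from equation (2)
quietly regress mpg weight length turn
predict double mu_hat, residuals
foreach v of varlist weight length turn {
    quietly summarize `v', meanonly
    generate double `v'_g = (`v' - r(mean)) * mu_hat
}

* One-step 2SLS: equations (3) and (4) estimated together
ivregress 2sls price weight length turn (mpg = weight_g length_g turn_g), vce(robust)
scalar b_iv = _b[mpg]
estat firststage      // strength of the generated instruments
estat overid          // overidentification test (3 instruments, 1 endogenous)

* Manual two-step version: first stage (3), then second stage (4)
quietly regress mpg weight_g length_g turn_g weight length turn
predict double mpg_hat, xb
regress price mpg_hat weight length turn, vce(robust)
display "2SLS: " b_iv "   manual second stage: " _b[mpg_hat]
```

The two point estimates coincide, but the manual second stage treats the fitted values as data, so its standard errors are not the 2SLS standard errors. For inference, use the one-step estimate.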

3 Example

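* Load the example data
use "lewbel test.dta", clear

xtset id year
* Industry-by-year identifier, used as a fixed effect and as the cluster variable
gen Ind_year = string(Industry) + " $ " + string(year)

* Lewbel 2SLS in one step; keep the generated instruments (*_g) and show the first stage
lewbel y x1 x2-x9 , a(Country Ind_year) cl(Ind_year) keep first
est store m3

* Manual first stage: x1 on the generated instruments and the controls
reghdfe x1 x2_g-x9_g x2-x9, a(Country Ind_year) cl(Ind_year)
est store m1

* Fitted values of x1 from the first stage
qui predict x1_p

* Manual second stage: reproduces the point estimate on x1 from lewbel,
* but its standard errors are not the 2SLS standard errors
reghdfe y x1_p x2-x9, a(Country Ind_year) cl(Ind_year)
est store m2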

reg2docx m1 m2 m3 using "lewbel.docx", replace ///
                 b(%20.4f) t(%20.4f) ///
                 scalars(N(%20.0fc) r2_a(%20.4f) HetLM(%20.4f) HetLMp(%20.4f) ///
                 Kleibergen_Paap_rk_LM(%20.4f) Kleibergen_Paap_rk_LM_p(%20.4f) ///
                 Cragg_Donald_Wald_F(%20.4f) Kleibergen_Paap_rk_Wald_F(%20.4f) ///
                 Hansen_J(%20.4f) Hansen_J_p(%20.4f)) ///
                 order(x1 x1_p) ///
                 addfe("Country=YES" "Industry*Year=YES") ///
                 mtitles() ///
                 font("Times New Roman", 6.5) ///
                 margin(top, 3.17cm) margin(bottom, 3.17cm)

Stata output:

. lewbel y x1 $controls , a(Country Ind_year) cl(Ind_year) keep first
*=============================================================================*
*                       Part 1 Heteroskedasticity test                        *
*=============================================================================*
Breusch–Pagan test for heteroskedasticity
H0: Constant variance
  Chi2(2651) = 4792.7716
    Prob > chi2 = 0.0000

*=============================================================================*
*                          Part 2 2SLS regression                             *
*=============================================================================*
(MWFE estimator converged in 8 iterations)

First-stage regressions
-----------------------


First-stage regression of x1:

Statistics robust to heteroskedasticity and clustering on Ind_year
Number of obs =                  20537
Number of clusters (Ind_year) =   2629
------------------------------------------------------------------------------
             |               Robust
          x1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        x2_g |   .3403562    .030574    11.13   0.000     .2804287    .4002837
        x3_g |   .0209829   .0210201     1.00   0.318    -.0202182    .0621839
        x4_g |  -.1824731   .0274428    -6.65   0.000    -.2362632    -.128683
        x5_g |  -.0202362   .0187698    -1.08   0.281    -.0570265    .0165541
        x6_g |   .0300076   .0200092     1.50   0.134    -.0092121    .0692273
        x7_g |   .0264974   .0227246     1.17   0.244    -.0180446    .0710393
        x8_g |   .0312092   .0336754     0.93   0.354    -.0347972    .0972156
        x9_g |   .1917831   .0199648     9.61   0.000     .1526505    .2309157
          x2 |   .5211517   .0056266    92.62   0.000     .5101231    .5321804
          x3 |   .0167608   .0049286     3.40   0.001     .0071003    .0264213
          x4 |   .0049114   .0058779     0.84   0.403    -.0066098    .0164326
          x5 |   .0670301   .0052558    12.75   0.000     .0567283    .0773319
          x6 |  -.0271468   .0044682    -6.08   0.000    -.0359047   -.0183888
          x7 |   .0478391   .0057346     8.34   0.000     .0365988    .0590794
          x8 |  -.0415018   .0050295    -8.25   0.000    -.0513599   -.0316436
          x9 |  -.0220721   .0042422    -5.20   0.000    -.0303873    -.013757
------------------------------------------------------------------------------
F test of excluded instruments:
  F(  8,  2628) =    78.98
  Prob > F      =   0.0000
Sanderson-Windmeijer multivariate F test of excluded instruments:
  F(  8,  2628) =    78.98
  Prob > F      =   0.0000



Summary results for first-stage regressions
-------------------------------------------

                                           (Underid)            (Weak id)
Variable     | F(  8,  2628)  P-val | SW Chi-sq(  8) P-val | SW F(  8,  2628)
x1           |      78.98    0.0000 |      633.01   0.0000 |       78.98

NB: first-stage test statistics cluster-robust

Stock-Yogo weak ID F test critical values for single endogenous regressor:
                                    5% maximal IV relative bias    20.25
                                   10% maximal IV relative bias    11.39
                                   20% maximal IV relative bias     6.69
                                   30% maximal IV relative bias     4.99
                                   10% maximal IV size             33.84
                                   15% maximal IV size             18.54
                                   20% maximal IV size             13.24
                                   25% maximal IV size             10.50
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for i.i.d. errors only.

Underidentification test
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic          Chi-sq(8)=111.38   P-val=0.0000

Weak identification test
Ho: equation is weakly identified
Cragg-Donald Wald F statistic                                     455.55
Kleibergen-Paap Wald rk F statistic                                78.98

Stock-Yogo weak ID test critical values for K1=1 and L1=8:
                                    5% maximal IV relative bias    20.25
                                   10% maximal IV relative bias    11.39
                                   20% maximal IV relative bias     6.69
                                   30% maximal IV relative bias     4.99
                                   10% maximal IV size             33.84
                                   15% maximal IV size             18.54
                                   20% maximal IV size             13.24
                                   25% maximal IV size             10.50
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.

Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and orthogonality conditions are valid
Anderson-Rubin Wald test           F(8,2628)=      4.74     P-val=0.0000
Anderson-Rubin Wald test           Chi-sq(8)=     38.02     P-val=0.0000
Stock-Wright LM S statistic        Chi-sq(8)=     38.09     P-val=0.0000

NB: Underidentification, weak identification and weak-identification-robust
    test statistics cluster-robust

Number of clusters             N_clust  =       2629
Number of observations               N  =      20537
Number of regressors                 K  =          9
Number of endogenous regressors      K1 =          1
Number of instruments                L  =         16
Number of excluded instruments       L1 =          8

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on Ind_year

Number of clusters (Ind_year) =   2629                Number of obs =    20537
                                                      F(  9,  2628) =   422.69
                                                      Prob > F      =   0.0000
Total (centered) SS     =   14633.8475                Centered R2   =   0.2665
Total (uncentered) SS   =   14633.8475                Uncentered R2 =   0.2665
Residual SS             =  10733.40994                Root MSE      =    .7233

------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |  -.0958878   .0328708    -2.92   0.004    -.1603431   -.0314325
          x2 |   .3531729   .0213118    16.57   0.000     .3113833    .3949625
          x3 |  -.0767336    .007376   -10.40   0.000    -.0911971   -.0622702
          x4 |  -.0302413   .0083046    -3.64   0.000    -.0465255   -.0139571
          x5 |   .0566474   .0073346     7.72   0.000     .0422652    .0710295
          x6 |   -.134026   .0080818   -16.58   0.000    -.1498734   -.1181787
          x7 |   .0218564   .0087345     2.50   0.012     .0047292    .0389836
          x8 |    .042576   .0087935     4.84   0.000     .0253332    .0598188
          x9 |  -.3142137   .0076184   -41.24   0.000    -.3291524    -.299275
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):            111.379
                                                   Chi-sq(8) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              455.554
                         (Kleibergen-Paap rk Wald F statistic):         78.984
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    20.25
                                         10% maximal IV relative bias    11.39
                                         20% maximal IV relative bias     6.69
                                         30% maximal IV relative bias     4.99
                                         10% maximal IV size             33.84
                                         15% maximal IV size             18.54
                                         20% maximal IV size             13.24
                                         25% maximal IV size             10.50
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):        30.013
                                                   Chi-sq(7) P-val =    0.0001
------------------------------------------------------------------------------
Instrumented:         x1
Included instruments: x2 x3 x4 x5 x6 x7 x8 x9
Excluded instruments: x2_g x3_g x4_g x5_g x6_g x7_g x8_g x9_g
Partialled-out:       _cons
                      nb: total SS, model F and R2s are after partialling-out;
                          any small-sample adjustments include partialled-out
                          variables in regressor count K
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     Country |        15           1          14     |
    Ind_year |      2629        2629           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. est store m3

. 
. * First stage
. reghdfe x1 x2_g-x9_g x2-x9, a(Country Ind_year) cl(Ind_year)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression                            Number of obs   =     20,537
Absorbing 2 HDFE groups                           F(  16,   2628) =     909.28
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.8245
                                                  Adj R-squared   =     0.7984
                                                  Within R-sq.    =     0.5411
Number of clusters (Ind_year) =      2,629        Root MSE        =     0.4494

                           (Std. err. adjusted for 2,629 clusters in Ind_year)
------------------------------------------------------------------------------
             |               Robust
          x1 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        x2_g |   .3403562   .0305747    11.13   0.000     .2804032    .4003092
        x3_g |   .0209829   .0210206     1.00   0.318    -.0202358    .0622015
        x4_g |  -.1824731   .0274435    -6.65   0.000    -.2362861     -.12866
        x5_g |  -.0202362   .0187703    -1.08   0.281    -.0570422    .0165698
        x6_g |   .0300076   .0200097     1.50   0.134    -.0092288     .069244
        x7_g |   .0264974   .0227251     1.17   0.244    -.0180636    .0710583
        x8_g |   .0312092   .0336762     0.93   0.354    -.0348254    .0972437
        x9_g |   .1917831   .0199653     9.61   0.000     .1526338    .2309324
          x2 |   .5211517   .0056268    92.62   0.000     .5101184    .5321851
          x3 |   .0167608   .0049288     3.40   0.001     .0070962    .0264255
          x4 |   .0049114   .0058781     0.84   0.403    -.0066147    .0164375
          x5 |   .0670301   .0052559    12.75   0.000      .056724    .0773363
          x6 |  -.0271468   .0044683    -6.08   0.000    -.0359084   -.0183851
          x7 |   .0478391   .0057348     8.34   0.000      .036594    .0590842
          x8 |  -.0415018   .0050296    -8.25   0.000    -.0513641   -.0316394
          x9 |  -.0220721   .0042423    -5.20   0.000    -.0303908   -.0137535
       _cons |  -3.630724   .0837772   -43.34   0.000       -3.795   -3.466448
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     Country |        15           1          14     |
    Ind_year |      2629        2629           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. est store m1

. 
. * Predict
. qui predict x1_p

. 
. * Second stage
. reghdfe y x1_p x2-x9, a(Country Ind_year) cl(Ind_year)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression                            Number of obs   =     20,537
Absorbing 2 HDFE groups                           F(   9,   2628) =     423.63
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.4743
                                                  Adj R-squared   =     0.3964
                                                  Within R-sq.    =     0.2652
Number of clusters (Ind_year) =      2,629        Root MSE        =     0.7754

                           (Std. err. adjusted for 2,629 clusters in Ind_year)
------------------------------------------------------------------------------
             |               Robust
           y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        x1_p |  -.0958878   .0339678    -2.82   0.005    -.1624942   -.0292814
          x2 |   .3531729   .0214141    16.49   0.000     .3111827    .3951632
          x3 |  -.0767336   .0073749   -10.40   0.000    -.0911947   -.0622725
          x4 |  -.0302413   .0082708    -3.66   0.000    -.0464593   -.0140233
          x5 |   .0566474    .007391     7.66   0.000     .0421546    .0711401
          x6 |   -.134026   .0080965   -16.55   0.000    -.1499023   -.1181498
          x7 |   .0218564   .0088265     2.48   0.013     .0045489    .0391639
          x8 |    .042576   .0087784     4.85   0.000     .0253627    .0597892
          x9 |  -.3142137   .0076214   -41.23   0.000    -.3291582   -.2992692
       _cons |  -3.162962    .208469   -15.17   0.000    -3.571741   -2.754182
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     Country |        15           1          14     |
    Ind_year |      2629        2629           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Exported table (image not included):

4 Command & data download

GitHub repository: https://github.com/codefoxs/Stata-personal

* Install
net install command, from("https://raw.githubusercontent.com/codefoxs/Stata-personal/main/lewbel/") replace

* Version
which lewbel
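
Since the command builds on reghdfe and ivreghdfe, those packages need to be present as well. The list below is an assumption about the dependencies (the packages named in this post plus the ones ivreghdfe and reghdfe usually rely on), all installed from SSC. Check it against the repository before relying on it.

```stata
* Assumed dependencies, installed from SSC; already-installed packages are left as is
ssc install ftools
ssc install reghdfe
ssc install ivreg2
ssc install ranktest
ssc install ivreghdfe
ssc install reg2docx   // only needed for the table export in section 3

* Confirm that each command is found on the adopath
foreach cmd in reghdfe ivreg2 ivreghdfe {
    which `cmd'
}
```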

5 References

Lewbel, A. (2012). Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models. Journal of Business & Economic Statistics, 30(1), 67–80. https://doi.org/10.1080/07350015.2012.643126

Author: CodeFox

Permalink: https://codefoxs.github.io/2024/05/17/3-Lewbel-工具变量法/
