Simulation Setup: Non-Normally Distributed Errors






True DGP:

Estimated Model:




Simulation Results

Set values at left, click 'Simulate!' button to run, and wait 3-15 seconds for results to appear (see loading bar at upper or lower right).



Why do non-normally distributed errors matter?

OLS assumes normally distributed errors for hypothesis testing purposes. If the errors are not normal and we incorrectly assume they are, the estimates will be **inefficient**. An estimator that correctly models the non-normal errors will be more efficient, asymptotically (Greene 2012, 73-75).


What should I see?

This is a naive model, as if you opened the dataset and ran an OLS regression on the duration without transforming it at all. The estimates will be inefficient for the reasons noted above. In addition, the estimates will be biased because the true DGP is non-linear in parameters. OLS requires linearity in parameters to obtain unbiased estimates; see the OLS assumption simulation here.
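
A minimal sketch of the naive regression, using only Python's standard library. The DGP is not shown in the setup text above, so this assumes a Weibull DGP with scale exp(b0 + b1*x) and shape p; the values of `B0`, `B1`, `P` and the helper names are hypothetical and chosen only for illustration.

```python
import math
import random

def simple_ols(x, y):
    """Return (intercept, slope) for a one-covariate OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def draw_weibull_data(n, b0, b1, p, rng):
    """Draw (x, t) where t is Weibull with scale exp(b0 + b1*x) and shape p (assumed DGP)."""
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    ts = [rng.weibullvariate(math.exp(b0 + b1 * x), p) for x in xs]
    return xs, ts

# Hypothetical true values, not taken from the app.
B0, B1, P = 1.0, 0.5, 1.5

rng = random.Random(42)
xs, ts = draw_weibull_data(20000, B0, B1, P, rng)

# Regress the untransformed duration on x, as the naive model does.
naive_intercept, naive_slope = simple_ols(xs, ts)
print("true b1:", B1, "naive OLS slope:", round(naive_slope, 3))
```

Under these assumptions the slope from the untransformed regression lands far from `B1` even with a large sample, because the conditional mean of the duration is exponential in b0 + b1*x rather than linear in it.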


Where should I look to see that?

Biased: Top two rows won't match for all columns; top row won't fall in between values in 3rd and 4th rows for all columns.
Inefficient: 5th row's values will be larger than the Weibull's 5th row (for a very rough heuristic; the Weibull constitutes the 'No Violations' scenario).



Why do non-normally distributed errors matter?

OLS assumes normally distributed errors for hypothesis testing purposes. If the errors are not normal and we incorrectly assume they are, the estimates will be **inefficient**. An estimator that correctly models the non-normal errors will be more efficient, asymptotically (Greene 2012, 73-75).


What should I see?

We have transformed the duration by taking its natural log, which removes the non-linearity in parameters. As a consequence, the OLS estimates will now be unbiased. The exception is the constant term, which will still be biased because OLS does not *explicitly* model the shape parameter. (OLS' closest approximation is the RMSE, the inverse of which is reported here.) However, OLS still assumes the stochastic error is normally distributed, so the estimates will still be inefficient.
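
A rough sketch of the log-transformed regression, again assuming (the setup text does not confirm this) that the DGP can be written as ln t = b0 + b1*x + e/p with e drawn from a standard Type I Extreme Value (minimum) distribution. That distribution has mean equal to minus Euler's constant (about -0.5772) and standard deviation pi/sqrt(6), so under this assumption the OLS intercept should settle near b0 - 0.5772/p, and 1/RMSE near p*sqrt(6)/pi rather than p itself. All names and values are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of a standard TIEV (minimum) draw is -EULER_GAMMA

def ols_with_rmse(x, y):
    """Return (intercept, slope, rmse) for a one-covariate OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, math.sqrt(ssr / (n - 2))

# Hypothetical true values, not taken from the app.
B0, B1, P = 1.0, 0.5, 1.5

rng = random.Random(42)
xs = [rng.uniform(0.0, 2.0) for _ in range(20000)]
# Weibull with scale exp(b0 + b1*x) and shape p, then log-transformed.
log_ts = [math.log(rng.weibullvariate(math.exp(B0 + B1 * x), P)) for x in xs]

intercept, slope, rmse = ols_with_rmse(xs, log_ts)
print("slope:", round(slope, 3), "true b1:", B1)
print("intercept:", round(intercept, 3), "true b0:", B0,
      "b0 - gamma/p:", round(B0 - EULER_GAMMA / P, 3))
print("1/RMSE:", round(1.0 / rmse, 3), "true shape:", P)
```

The slope is recovered, while the intercept is shifted by an amount that depends on the shape parameter, which is one way to read the statement that the constant stays biased because OLS does not model the shape explicitly.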


Where should I look to see that?

Unbiased: Top two rows should match/be close for all columns except intercept and shape; top row should fall in between values in 3rd and 4th rows for all columns.
Inefficient: 5th row's values will be larger than the Weibull's 5th row (for a very rough heuristic; the Weibull constitutes the 'No Violations' scenario).



Why do non-normally distributed errors matter?

OLS assumes normally distributed errors for hypothesis testing purposes. If the errors are not normal and we incorrectly assume they are, the estimates will be **inefficient**. An estimator that correctly models the non-normal errors will be more efficient, asymptotically (Greene 2012, 73-75).


What should I see?

The Weibull parametric duration model assumes non-normal errors. Specifically, it assumes Type I Extreme Value (minimum) errors--the same as the simulation's true DGP. As a result, the Weibull estimates will now be unbiased and efficient.
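
The link between Type I Extreme Value (minimum) errors and the Weibull can be checked numerically. The sketch below assumes an intercept-only log-linear form, ln t = b0 + e/p, and compares the empirical survival share against the Weibull survival function exp(-(t/scale)^p) with scale = exp(b0). The values of `B0` and `P` and the function names are hypothetical.

```python
import math
import random

def draw_tiev_min(rng):
    """Draw from the standard Type I Extreme Value (minimum) distribution."""
    u = rng.random()
    while u == 0.0:  # redraw to avoid log(0)
        u = rng.random()
    return math.log(-math.log(u))

def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

# Hypothetical true values, not taken from the app.
B0, P = 1.0, 1.5

rng = random.Random(42)
# Durations built from a log-linear model with TIEV (minimum) errors.
ts = [math.exp(B0 + draw_tiev_min(rng) / P) for _ in range(20000)]

scale = math.exp(B0)
for t in (0.5 * scale, scale, 2.0 * scale):
    empirical = sum(1 for ti in ts if ti > t) / len(ts)
    print("t =", round(t, 2),
          "empirical S(t):", round(empirical, 3),
          "Weibull S(t):", round(weibull_survival(t, scale, P), 3))
```

If the two columns agree at every t, the simulated durations behave like Weibull draws, which is why a Weibull model's error assumption matches a DGP of this form.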


Where should I look to see that?

Unbiased: Top two rows should match/be close for all columns; top row should fall in between values in 3rd and 4th rows for all columns.
Efficient: 5th row will be smallest of all three models' fifth rows (for a very rough heuristic). Additionally, 5th and 6th rows will match/be close for all columns.

NOTE: must click 'Simulate!' on 'Main' tab first.


Simulation Setup: Right Censoring






True DGP:

Estimated Model:




Simulation Results

Set values at left, click 'Simulate!' button to run, and wait 3-15 seconds for results to appear (see loading bar at upper or lower right).



Why does right-censored data matter?

Censoring is one of the most frequently mentioned reasons for OLS' inappropriateness. A (right-)censored duration is one where a subject does not fail before our observation period ends. As a result, we do not observe their actual failure time. Instead, we only know they survived *up to* the end of our observation period. Right censoring creates a problem for OLS because the estimator cannot handle it. Instead, OLS treats all subjects as failing at the time we record, which is clearly not true.
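
A small sketch of what treating censored durations as failures does to OLS. It assumes a Weibull DGP with scale exp(b0 + b1*x) and shape p, and a single fixed end of observation `CENSOR_TIME`; none of these values come from the app, and the names are hypothetical.

```python
import math
import random

def simple_ols(x, y):
    """Return (intercept, slope) for a one-covariate OLS fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical true values and observation window, not taken from the app.
B0, B1, P, CENSOR_TIME = 1.0, 0.5, 1.5, 4.0

rng = random.Random(42)
xs = [rng.uniform(0.0, 2.0) for _ in range(20000)]
true_ts = [rng.weibullvariate(math.exp(B0 + B1 * x), P) for x in xs]

# What the analyst records: the failure time, or the end of observation if no failure yet.
recorded_ts = [min(t, CENSOR_TIME) for t in true_ts]
share_censored = sum(1 for t in true_ts if t > CENSOR_TIME) / len(true_ts)

_, slope_full = simple_ols(xs, [math.log(t) for t in true_ts])
_, slope_recorded = simple_ols(xs, [math.log(t) for t in recorded_ts])

print("share censored:", round(share_censored, 3))
print("slope, true failure times:", round(slope_full, 3))
print("slope, recorded times treated as failures:", round(slope_recorded, 3))
```

Because subjects with longer expected durations are censored more often, their recorded times are pulled down the most, and the slope on the recorded times shrinks toward zero relative to the slope on the true failure times.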


What should I see?

From the 'Non-Normal' tab, we've already established that OLS with the untransformed duration performs poorly because of non-linearity in parameters. Right-censored data makes OLS' performance even worse, because of OLS' inability to model right censoring properly. As a consequence, the estimates will be biased and inefficient.


Where should I look to see that?

Biased: Top two rows won't match for all columns; top row won't fall in between values in 3rd and 4th rows for all columns. Additionally, these estimates will also differ from the OLS estimates on the 'Non-Normal' tab (provided that all common slider/field values are identical).
Inefficient: 5th row's values will be larger than the Weibull's 5th row (for a very rough heuristic; the Weibull constitutes the 'No Violations' scenario).



Why does right-censored data matter?

Censoring is one of the most frequently mentioned reasons for OLS' inappropriateness. A (right-)censored duration is one where a subject does not fail before our observation period ends. As a result, we do not observe their actual failure time. Instead, we only know they survived *up to* the end of our observation period. Right censoring creates a problem for OLS because the estimator cannot handle it. Instead, OLS treats all subjects as failing at the time we record, which is clearly not true.


What should I see?

Despite our transformation of t to remove the non-linearity in parameters, these OLS estimates will still be biased and inefficient, because of OLS' inability to properly model right-censored durations.


Where should I look to see that?

Biased: Top two rows won't match for all columns; top row won't fall in between values in 3rd and 4th rows for all columns.
Inefficient: 5th row's values will be larger than the Weibull's 5th row (for a very rough heuristic; the Weibull constitutes the 'No Violations' scenario).



Why does right-censored data matter?

Censoring is one of the most frequently mentioned reasons for OLS' inappropriateness. A (right-)censored duration is one where a subject does not fail before our observation period ends. As a result, we do not observe their actual failure time. Instead, we only know they survived *up to* the end of our observation period. Right censoring creates a problem for OLS because the estimator cannot handle it. Instead, OLS treats all subjects as failing at the time we record, which is clearly not true.


What should I see?

Censored linear regression is a special type of linear regression that can handle right censoring. Therefore, the estimates will now be unbiased except for the constant term, which will still be biased because censored linear regression, like OLS, does not *explicitly* model the shape parameter. However, censored linear regression assumes the errors are normally distributed, same as OLS (which is potentially problematic; see 'Non-Normal Errors' tab). As a result, the estimates will still be inefficient.
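
One common way to write a censored linear regression is through its log-likelihood: an observed failure contributes the normal density at the recorded value, while a right-censored observation contributes only the probability of surviving past it. The sketch below assumes that formulation with a single covariate; it is not necessarily the routine the app uses, and the function name and example numbers are hypothetical.

```python
import math

def censored_normal_loglik(y, x, censored, b0, b1, sigma):
    """Log-likelihood of y = b0 + b1*x + normal error, with right censoring.

    censored[i] is True when y[i] is only a lower bound on the outcome.
    """
    total = 0.0
    for yi, xi, ci in zip(y, x, censored):
        z = (yi - b0 - b1 * xi) / sigma
        if ci:
            # Censored: probability that the outcome exceeds the recorded value.
            surv = 0.5 * math.erfc(z / math.sqrt(2.0))
            total += math.log(max(surv, 1e-300))  # floor guards against log(0)
        else:
            # Observed failure: normal log-density at the recorded value.
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return total

# Hypothetical log-durations; the last two were still running at log(4).
y = [0.2, 1.1, math.log(4.0), math.log(4.0)]
x = [0.0, 1.0, 1.5, 2.0]
censored = [False, False, True, True]

aware = censored_normal_loglik(y, x, censored, 0.6, 0.5, 0.85)
as_failures = censored_normal_loglik(y, x, [False] * len(y), 0.6, 0.5, 0.85)
print("censoring-aware log-likelihood:", round(aware, 3))
print("all treated as failures:", round(as_failures, 3))
```

Maximizing the first quantity over b0, b1, and sigma gives the censored regression estimates; maximizing the second is equivalent to ordinary OLS on the recorded values.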


Where should I look to see that?

Unbiased: Top two rows should match/be close for all columns except intercept and shape; top row should fall in between values in 3rd and 4th rows for all columns.
Inefficient: 5th row's values will be larger than the Weibull's 5th row (for a very rough heuristic; the Weibull constitutes the 'No Violations' scenario).



Why does right-censored data matter?

Censoring is one of the most frequently mentioned reasons for OLS' inappropriateness. A (right-)censored duration is one where a subject does not fail before our observation period ends. As a result, we do not observe their actual failure time. Instead, we only know they survived *up to* the end of our observation period. Right censoring creates a problem for OLS because the estimator cannot handle it. Instead, OLS treats all subjects as failing at the time we record, which is clearly not true.


What should I see?

We already know the Weibull duration model assumes non-normal errors (Type I Extreme Value (minimum), or TIEVmin, specifically; see 'Non-Normal Errors' tab). In addition, though, all duration models can handle right-censored data by modeling it properly (i.e., not treating the censored observations as if they are observed failure times). The Weibull is no exception. As a result, the Weibull estimates will be both unbiased and efficient.
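
To make "modeling it properly" concrete: in a Weibull likelihood, a failure contributes the log-density log f(t) and a censored observation contributes the log-survival log S(t) = -(t/scale)^p. As a deliberate simplification (the app estimates the shape and has covariates; this sketch does neither), suppose the shape p is known and there are no covariates. The scale that maximizes the likelihood is then (sum of t^p / number of failures)^(1/p). All names and values below are hypothetical.

```python
import random

def weibull_scale_mle(times, failed, shape):
    """Closed-form MLE of the Weibull scale when the shape is known.

    times  : recorded durations (failure time or censoring time)
    failed : True where the recorded duration is an observed failure
    """
    n_failures = sum(1 for f in failed if f)
    total = sum(t ** shape for t in times)
    # Every subject adds exposure to the numerator; only failures count below.
    return (total / n_failures) ** (1.0 / shape)

# Hypothetical true values and observation window, not taken from the app.
SCALE, SHAPE, CENSOR_TIME = 5.0, 1.5, 4.0

rng = random.Random(42)
true_ts = [rng.weibullvariate(SCALE, SHAPE) for _ in range(20000)]
recorded_ts = [min(t, CENSOR_TIME) for t in true_ts]
failed = [t <= CENSOR_TIME for t in true_ts]

aware = weibull_scale_mle(recorded_ts, failed, SHAPE)
# Same formula, but pretending every recorded time is a failure.
naive = weibull_scale_mle(recorded_ts, [True] * len(recorded_ts), SHAPE)

print("true scale:", SCALE)
print("censoring-aware estimate:", round(aware, 3))
print("estimate treating all as failures:", round(naive, 3))
```

The censoring-aware estimate stays close to the true scale, while counting censored subjects as failures pulls the estimate downward.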


Where should I look to see that?

Unbiased: Top two rows should match/be close for all columns; top row should fall in between values in 3rd and 4th rows for all columns.
Efficient: 5th row will be smallest of all four models' fifth rows (for a very rough heuristic). Additionally, 5th and 6th rows will match/be close for all columns.

NOTE: must click 'Simulate!' on 'Main' tab first.
