<- read.table("http://cameron.econ.ucdavis.edu/mmabook/mma12p2mslmsm.asc")
data <- data[, 1]
u <- data[, 2]
e <- data[, 3]
y <- length(u)
numobs <- 10000 simreps
Simulated Maximum Likelihood with R
This document details section 12.4.5. Unobserved Heterogeneity Example from Cameron and Trivedi’s book - MICROECONOMETRICS: Methods and Applications. The original source code giving the results from table 12.2 are available from the authors' site here and written for Stata. This is an attempt to translate the code to R. I’d like to thank Reddit user anonemouse2010 for his advice which helped me write the function.
Consult the original source code if you want to read the authors' comments. If you want the R source code without all the commentaries, grab it here. This is not guaranteed to work, nor to be correct. It could set your pet on fire and/or eat your first born. Use at your own risk. I may, or may not, expand this example. Corrections, constructive criticism are welcome.
The model is \(y=\theta+u+\varepsilon\) where \(\theta\) is a scalar parameter equal to 1. \(u\) is extreme value type 1 (Gumbel distribution), \(\varepsilon \leadsto \mathbb{N}(0,1)\). For more details, consult the book.
Import the data
You can consult the original source code to see how the authors simulated the data. To get the same results, and verify that I didn't make mistakes I prefer importing their data directly from their website.
Simulation
In the code below, the following likelihood function:
\[\log{\hat{L}_N(\theta)} = \dfrac{1}{N} \sum_{i=1}^N\log{\big( \dfrac{1}{S}\sum_{s=1}^S \dfrac{1}{\sqrt{2\pi}} \exp \{ -(-y_i-\theta-u_i^s)^2/2 \}\big)}\]
which can be found on page 397 is programmed using the functionsapply
.
<- function(theta) {
denssim <- mean(sapply(y, function(y) log(mean((1/sqrt(2 * pi)) * exp(-(y - theta + log(-log(runif(simreps))))^2/2)))))
loglik return(-loglik)
}
This likelihood is then maximized:
system.time(res <- optim(0.1, denssim, method = "BFGS", control = list(maxit = simreps)))
user system elapsed
1.780 0.231 2.010
Convergence is achieved pretty rapidly, to
res
$par
[1] 1.180305
$value
[1] 1.920577
$counts
function gradient
76 8
$convergence
[1] 0
$message
NULL
which is close to the true value of the parameter 1 (which was used to generate the data).
Let's try again with another parameter value, for example ( ). We have to generate y again:
<- 2.5 + u + e y2
and slightly modify the likelihood:
<- function(theta) {
denssim2 <- mean(
loglik sapply(
y2,function(y2) log(mean((1/sqrt(2 * pi)) * exp(-(y2 - theta + log(-log(runif(simreps))))^2/2)))))
return(-loglik)
}
which can then be maximized:
system.time(res2 <- optim(0.1, denssim2, method = "BFGS", control = list(maxit = simreps)))
user system elapsed
3.164 0.337 3.500
The value that maximizes the likelihood is:
res2
$par
[1] 2.654767
$value
[1] 1.920779
$counts
function gradient
129 18
$convergence
[1] 0
$message
NULL
which is close to the true value of the parameter 2.5 (which was used to generate the data).