Website - Youtube - About - Talks - Books - Packages - RSS

Simulated Maximum Likelihood with R

R
econometrics
Published

January 16, 2013

This document details section 12.4.5. Unobserved Heterogeneity Example from Cameron and Trivedi’s book - MICROECONOMETRICS: Methods and Applications. The original source code giving the results from table 12.2 are available from the authors' site here and written for Stata. This is an attempt to translate the code to R. I’d like to thank Reddit user anonemouse2010 for his advice which helped me write the function.

Consult the original source code if you want to read the authors' comments. If you want the R source code without all the commentaries, grab it here. This is not guaranteed to work, nor to be correct. It could set your pet on fire and/or eat your first born. Use at your own risk. I may, or may not, expand this example. Corrections, constructive criticism are welcome.

The model is \(y=\theta+u+\varepsilon\) where \(\theta\) is a scalar parameter equal to 1. \(u\) is extreme value type 1 (Gumbel distribution), \(\varepsilon \leadsto \mathbb{N}(0,1)\). For more details, consult the book.

Import the data

You can consult the original source code to see how the authors simulated the data. To get the same results, and verify that I didn't make mistakes I prefer importing their data directly from their website.

data <- read.table("http://cameron.econ.ucdavis.edu/mmabook/mma12p2mslmsm.asc")
u <- data[, 1]
e <- data[, 2]
y <- data[, 3]
numobs <- length(u)
simreps <- 10000

Simulation

In the code below, the following likelihood function:

\[\log{\hat{L}_N(\theta)} = \dfrac{1}{N} \sum_{i=1}^N\log{\big( \dfrac{1}{S}\sum_{s=1}^S \dfrac{1}{\sqrt{2\pi}} \exp \{ -(-y_i-\theta-u_i^s)^2/2 \}\big)}\]

which can be found on page 397 is programmed using the function sapply.

denssim <- function(theta) {
    loglik <- mean(sapply(y, function(y) log(mean((1/sqrt(2 * pi)) * exp(-(y - theta + log(-log(runif(simreps))))^2/2)))))
    return(-loglik)
}

This likelihood is then maximized:

system.time(res <- optim(0.1, denssim, method = "BFGS", control = list(maxit = simreps)))
   user  system elapsed 
  1.780   0.231   2.010 

Convergence is achieved pretty rapidly, to

res
$par
[1] 1.180305

$value
[1] 1.920577

$counts
function gradient 
      76        8 

$convergence
[1] 0

$message
NULL

which is close to the true value of the parameter 1 (which was used to generate the data).

Let's try again with another parameter value, for example ( ). We have to generate y again:

y2 <- 2.5 + u + e

and slightly modify the likelihood:

denssim2 <- function(theta) {
  loglik <- mean(
    sapply(
      y2,
      function(y2) log(mean((1/sqrt(2 * pi)) * exp(-(y2 - theta + log(-log(runif(simreps))))^2/2)))))
  return(-loglik)
}

which can then be maximized:

system.time(res2 <- optim(0.1, denssim2, method = "BFGS", control = list(maxit = simreps)))
   user  system elapsed 
  3.164   0.337   3.500 

The value that maximizes the likelihood is:

res2
$par
[1] 2.654767

$value
[1] 1.920779

$counts
function gradient 
     129       18 

$convergence
[1] 0

$message
NULL

which is close to the true value of the parameter 2.5 (which was used to generate the data).