Fit the MacKenzie et al. (2002) Occupancy Model with the penalized likelihood methods of Hutchinson et al. (2015) using cross-validation

This function fits the occupancy model of MacKenzie et al (2002) with the penalized methods of Hutchinson et al (2015) using k-fold cross-validation to choose the penalty weight.

occuPEN_CV(formula, data, knownOcc=numeric(0), starts, method="BFGS",
    engine=c("C", "R"), lambdaVec=c(0,2^seq(-4,4)),
    pen.type = c("Bayes","Ridge"), k = 5, foldAssignments = NA,
    ...)

Arguments

formula	Double right-hand side formula describing covariates of detection and occupancy in that order.
data	An `unmarkedFrameOccu` object
knownOcc	Vector of sites that are known to be occupied. These should be supplied as row numbers of the y matrix, eg, c(3,8) if sites 3 and 8 were known to be occupied a priori.
starts	Vector of parameter starting values.
method	Optimization method used by `optim`.
engine	Either "C" or "R" to use fast C++ code or native R code during the optimization.
lambdaVec	Vector of values to try for lambda.
pen.type	Which form of penalty to use.
k	Number of folds for k-fold cross-validation.
foldAssignments	Vector containing the number of the fold that each site falls into. Length of the vector should be equal to the number of sites, and the vector should contain k unique values. E.g. for 9 sites and 3 folds, c(1,2,3,1,2,3,1,2,3) or c(1,1,1,2,2,2,3,3,3).
...	Additional arguments to optim, such as lower and upper bounds

Details

See unmarkedFrame and unmarkedFrameOccu for a description of how to supply data to the data argument.

This function wraps k-fold cross-validation around occuPEN_CV for the "Bayes" and "Ridge" penalties of Hutchinson et al. (2015). The user may specify the number of folds (k), the values to try (lambdaVec), and the assignments of sites to folds (foldAssignments). If foldAssignments is not provided, the assignments are done pseudo-randomly, and the function attempts to put some sites with and without positive detections in each fold. This randomness introduces variability into the results of this function across runs; to eliminate the randomness, supply foldAssignments.

Value

unmarkedFitOccuPEN_CV object describing the model fit.

References

Hutchinson, R. A., J. V. Valente, S. C. Emerson, M. G. Betts, and T. G. Dietterich. 2015. Penalized Likelihood Methods Improve Parameter Estimates in Occupancy Models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12368

MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.

Author

Rebecca A. Hutchinson

Examples


# Simulate occupancy data
set.seed(646)
nSites <- 60
nReps <- 2
covariates <- data.frame(veght=rnorm(nSites),
    habitat=factor(c(rep('A', 30), rep('B', 30))))

psipars <- c(-1, 1, -1)
ppars <- c(1, -1, 0)
X <- model.matrix(~veght+habitat, covariates) # design matrix
psi <- plogis(X %*% psipars)
p <- plogis(X %*% ppars)

y <- matrix(NA, nSites, nReps)
z <- rbinom(nSites, 1, psi)       # true occupancy state
for(i in 1:nSites) {
    y[i,] <- rbinom(nReps, 1, z[i]*p[i])
    }

# Organize data and look at it
umf <- unmarkedFrameOccu(y = y, siteCovs = covariates)
obsCovs(umf) <- covariates
head(umf)
#> Data frame representation of unmarkedFrame object.
#>    y.1 y.2      veght habitat     veght.1    veght.2 habitat.1 habitat.2
#> 1    0   0 -1.6687888       A -1.66878885  0.7455212         A         A
#> 2    0   0  0.7455212       A  1.24762342  0.8403205         A         A
#> 3    0   0  1.2476234       A -1.19034635  0.1656374         A         A
#> 4    0   0  0.8403205       A -0.73071649 -1.0039114         A         A
#> 5    0   0 -1.1903463       A  0.21088185 -0.3374724         A         A
#> 6    0   0  0.1656374       A  0.46944148 -0.8216920         A         A
#> 7    0   0 -0.7307165       A  0.14141572 -2.5152728         A         A
#> 8    0   0 -1.0039114       A  1.99828108  1.4945894         A         A
#> 9    0   0  0.2108819       A  0.06644414  0.7487672         A         A
#> 10   0   0 -0.3374724       A  1.27194112  0.7402943         A         A
summary(umf)
#> unmarkedFrame Object
#> 
#> 60 sites
#> Maximum number of observations per site: 2 
#> Mean number of observations per site: 2 
#> Sites with at least one detection: 6 
#> 
#> Tabulation of y observations:
#>   0   1 
#> 112   8 
#> 
#> Site-level covariates:
#>      veght          habitat
#>  Min.   :-2.51527   A:30   
#>  1st Qu.:-0.66953   B:30   
#>  Median : 0.13211          
#>  Mean   : 0.00694          
#>  3rd Qu.: 0.70320          
#>  Max.   : 1.99828          
#> 
#> Observation-level covariates:
#>      veght          habitat
#>  Min.   :-2.51527   A:30   
#>  1st Qu.:-0.66953   B:30   
#>  Median : 0.13211          
#>  Mean   : 0.00694          
#>  3rd Qu.: 0.70320          
#>  Max.   : 1.99828          

if (FALSE) {

# Fit some models
fmMLE <- occu(~veght+habitat ~veght+habitat, umf)
fmMLE@estimates

fm1penCV <- occuPEN_CV(~veght+habitat ~veght+habitat,
 umf,pen.type="Ridge", foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites])
fm1penCV@lambdaVec
fm1penCV@chosenLambda
fm1penCV@estimates

fm2penCV <- occuPEN_CV(~veght+habitat ~veght+habitat,
umf,pen.type="Bayes",foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites])
fm2penCV@lambdaVec
fm2penCV@chosenLambda
fm2penCV@estimates

# nonparametric bootstrap for uncertainty analysis:
# bootstrap is wrapped around the cross-validation
fm2penCV <- nonparboot(fm2penCV,B=10) # should use more samples
vcov(fm2penCV,method="nonparboot")

# Mean squared error of parameters:
mean((c(psipars,ppars)-c(fmMLE[1]@estimates,fmMLE[2]@estimates))^2)
mean((c(psipars,ppars)-c(fm1penCV[1]@estimates,fm1penCV[2]@estimates))^2)
mean((c(psipars,ppars)-c(fm2penCV[1]@estimates,fm2penCV[2]@estimates))^2)
}