occuPEN_CV.Rd
This function fits the occupancy model of MacKenzie et al (2002) with the penalized methods of Hutchinson et al (2015) using k-fold cross-validation to choose the penalty weight.
occuPEN_CV(formula, data, knownOcc=numeric(0), starts, method="BFGS", engine=c("C", "R"), lambdaVec=c(0,2^seq(-4,4)), pen.type = c("Bayes","Ridge"), k = 5, foldAssignments = NA, ...)
formula | Double right-hand side formula describing covariates of detection and occupancy in that order. |
---|---|
data | An |
knownOcc | Vector of sites that are known to be occupied. These should be supplied as row numbers of the y matrix, eg, c(3,8) if sites 3 and 8 were known to be occupied a priori. |
starts | Vector of parameter starting values. |
method | Optimization method used by |
engine | Either "C" or "R" to use fast C++ code or native R code during the optimization. |
lambdaVec | Vector of values to try for lambda. |
pen.type | Which form of penalty to use. |
k | Number of folds for k-fold cross-validation. |
foldAssignments | Vector containing the number of the fold that each site falls into. Length of the vector should be equal to the number of sites, and the vector should contain k unique values. E.g. for 9 sites and 3 folds, c(1,2,3,1,2,3,1,2,3) or c(1,1,1,2,2,2,3,3,3). |
... | Additional arguments to optim, such as lower and upper bounds |
See unmarkedFrame
and unmarkedFrameOccu
for a
description of how to supply data to the data
argument.
This function wraps k-fold cross-validation around occuPEN_CV
for the "Bayes" and "Ridge" penalties of Hutchinson et al. (2015). The
user may specify the number of folds (k
), the values to try
(lambdaVec
), and the assignments of sites to folds
(foldAssignments
). If foldAssignments
is not provided,
the assignments are done pseudo-randomly, and the function attempts to
put some sites with and without positive detections in each fold. This
randomness introduces variability into the results of this function
across runs; to eliminate the randomness, supply foldAssignments.
unmarkedFitOccuPEN_CV object describing the model fit.
Hutchinson, R. A., J. V. Valente, S. C. Emerson, M. G. Betts, and T. G. Dietterich. 2015. Penalized Likelihood Methods Improve Parameter Estimates in Occupancy Models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12368
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
Rebecca A. Hutchinson
# Simulate occupancy data set.seed(646) nSites <- 60 nReps <- 2 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', 30), rep('B', 30)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) obsCovs(umf) <- covariates head(umf)#> Data frame representation of unmarkedFrame object. #> y.1 y.2 veght habitat veght.1 veght.2 habitat.1 habitat.2 #> 1 0 0 -1.6687888 A -1.66878885 0.7455212 A A #> 2 0 0 0.7455212 A 1.24762342 0.8403205 A A #> 3 0 0 1.2476234 A -1.19034635 0.1656374 A A #> 4 0 0 0.8403205 A -0.73071649 -1.0039114 A A #> 5 0 0 -1.1903463 A 0.21088185 -0.3374724 A A #> 6 0 0 0.1656374 A 0.46944148 -0.8216920 A A #> 7 0 0 -0.7307165 A 0.14141572 -2.5152728 A A #> 8 0 0 -1.0039114 A 1.99828108 1.4945894 A A #> 9 0 0 0.2108819 A 0.06644414 0.7487672 A A #> 10 0 0 -0.3374724 A 1.27194112 0.7402943 A A#> unmarkedFrame Object #> #> 60 sites #> Maximum number of observations per site: 2 #> Mean number of observations per site: 2 #> Sites with at least one detection: 6 #> #> Tabulation of y observations: #> 0 1 #> 112 8 #> #> Site-level covariates: #> veght habitat #> Min. :-2.51527 A:30 #> 1st Qu.:-0.66953 B:30 #> Median : 0.13211 #> Mean : 0.00694 #> 3rd Qu.: 0.70320 #> Max. : 1.99828 #> #> Observation-level covariates: #> veght habitat #> Min. :-2.51527 A:30 #> 1st Qu.:-0.66953 B:30 #> Median : 0.13211 #> Mean : 0.00694 #> 3rd Qu.: 0.70320 #> Max. : 1.99828if (FALSE) { # Fit some models fmMLE <- occu(~veght+habitat ~veght+habitat, umf) fmMLE@estimates fm1penCV <- occuPEN_CV(~veght+habitat ~veght+habitat, umf,pen.type="Ridge", foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites]) fm1penCV@lambdaVec fm1penCV@chosenLambda fm1penCV@estimates fm2penCV <- occuPEN_CV(~veght+habitat ~veght+habitat, umf,pen.type="Bayes",foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites]) fm2penCV@lambdaVec fm2penCV@chosenLambda fm2penCV@estimates # nonparametric bootstrap for uncertainty analysis: # bootstrap is wrapped around the cross-validation fm2penCV <- nonparboot(fm2penCV,B=10) # should use more samples vcov(fm2penCV,method="nonparboot") # Mean squared error of parameters: mean((c(psipars,ppars)-c(fmMLE[1]@estimates,fmMLE[2]@estimates))^2) mean((c(psipars,ppars)-c(fm1penCV[1]@estimates,fm1penCV[2]@estimates))^2) mean((c(psipars,ppars)-c(fm2penCV[1]@estimates,fm2penCV[2]@estimates))^2) }