Fits hierarchical models of animal occurrence and abundance to data collected on species that may be detected imperfectly. Models include single- and multi-season site occupancy models, binomial N-mixture models, and multinomial N-mixture models. The data can arise from survey methods such as occurrence sampling, temporally replicated counts, removal sampling, double observer sampling, and distance sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. General treatment of these models can be found in MacKenzie et al. (2006) and Royle and Dorazio (2008). The primary reference for the package is Fiske and Chandler (2011).


Overview of Model-fitting Functions:

occu fits occurrence models with no linkage between abundance and detection (MacKenzie et al. 2002).

occuRN fits abundance models to presence/absence data by exploiting the link between detection probability and abundance (Royle and Nichols 2003).

occuFP fits occupancy models to data characterized by false negatives and false positive detections (e.g., Royle and Link [2006] and Miller et al. [2011]).

occuMulti fits multi-species occupancy model of Rota et al. [2016].

colext fits the mutli-season occupancy model of MacKenzie et al. (2003).

pcount fits N-mixture models (aka binomial mixture models) to repeated count data (Royle 2004a, Kery et al 2005).

distsamp fits the distance sampling model of Royle et al. (2004) to distance data recorded in discrete intervals.

gdistsamp fits the generalized distance sampling model described by Chandler et al. (2011) to distance data recorded in discrete intervals.

gpcount fits the generalized N-mixture model described by Chandler et al. (2011) to repeated count data collected using the robust design.

multinomPois fits the multinomial-Poisson model of Royle (2004b) to data collected using methods such as removal sampling or double observer sampling.

gmultmix fits a generalized form of the multinomial-mixture model of Royle (2004b) that allows for estimating availability and detection probability.

pcountOpen fits the open population model of Dail and Madsen (2011) to repeated count data. This is a genearlized form of the Royle (2004a) N-mixture model that includes parameters for recruitment and apparent survival.

Data: All data are passed to unmarked's estimation functions as a formal S4 class called an unmarkedFrame, which has child classes for each model type. This allows metadata (eg as distance interval cut points, measurement units, etc...) to be stored with the response and covariate data. See unmarkedFrame for a detailed description of unmarkedFrames and how to create them.

Model Specification: unmarked's model-fitting functions allow specification of covariates for both the state process and the detection process. For two-level hierarchical models, (eg occu, occuRN, pcount, multinomPois, distsamp) covariates for the detection process (at the site or observation level) and the state process (at the site level) are specified with a double right-hand sided formula, in that order. Such a formula looks like

\(~ x1 + x2 + \ldots + x_n ~ x_1 + x_2 + \ldots + x_n\)

where \(x_1\) through \(x_n\) are additive covariates of the process of interest. Using two tildes in a single formula differs from standard R convention, but it is informative about the model being fit. The meaning of these covariates, or what they model, is full described in the help files for the individual functions and is not the same for all functions. For models with more than two processes (eg colext, gmultmix, pcountOpen), single right-hand sided formulas (only one tilde) are used to model each parameter.

Utility Functions: unmarked contains several utility functions for organizing data into the form required by its model-fitting functions. csvToUMF converts an appropriately formated comma-separated values (.csv) file to a list containing the components required by model-fitting functions.


Chandler, R. B., J. A. Royle, and D. I. King. 2011. Inference about density and temporary emigration in unmarked populations. Ecology 92:1429-1435.

Dail, D. and L. Madsen. 2011. Models for estimating abundance from repeated counts of an open metapopulation. Biometrics 67:577-587.

Fiske, I. and R. B. Chandler. 2011. unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. Journal of Statistical Software 43:1--23.

Kery, M., Royle, J. A., and Schmid, H. 2005 Modeling avian abundance from replicated counts using binomial mixture models. Ecological Applications 15:1450--1461.

MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle, and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248--2255.

MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson, and A. B. Franklin. 2003. Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology 84:2200--2207.

MacKenzie, D. I., J. D. Nichols, J. A. Royle, K. H. Pollock, L. L. Bailey, and J. E. Hines. 2006. Occupancy Estimation and Modeling. Amsterdam: Academic Press.

Miller, D.A., J.D. Nichols, B.T. McClintock, E.H.C. Grant, L.L. Bailey, and L.A. Weir. 2011. Improving occupancy estimation when two types of observational error occur: non-detection and species misidentification. Ecology 92:1422-1428.

Rota, C.T., et al. 2016. A multi-species occupancy model for two or more interacting species. Methods in Ecology and Evolution 7: 1164-1173.

Royle, J. A. 2004a. N-Mixture models for estimating population size from spatially replicated counts. Biometrics 60:108--105.

Royle, J. A. 2004b. Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation 27:375--386.

Royle, J. A., D. K. Dawson, and S. Bates. 2004. Modeling abundance effects in distance sampling. Ecology 85:1591--1597.

Royle, J. A., and R. M. Dorazio. 2006. Hierarchical models of animal abundance and occurrence. Journal Of Agricultural Biological And Environmental Statistics 11:249--263.

Royle, J.A., and W.A. Link. 2006. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87:835-841.

Royle, J. A. and R. M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology. Academic Press.

Royle, J. A. and J. D. Nichols. 2003. Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology, 84:777--790.

Sillett, S. and Chandler, R.B. and Royle, J.A. and Kery, M. and Morrison, S.A. In Press. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications


Ian Fiske, Richard Chandler, Andy Royle, Marc K\'ery, David Miller, and Rebecca Hutchinson


## An example site-occupancy analysis # Simulate occupancy data set.seed(344) nSites <- 100 nReps <- 5 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', 50), rep('B', 50)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) head(umf)
#> Data frame representation of unmarkedFrame object. #> y.1 y.2 y.3 y.4 y.5 veght habitat #> 1 0 0 0 0 1 1.0733096 A #> 2 0 0 0 0 0 3.3024986 A #> 3 0 0 0 0 0 -0.7308712 A #> 4 0 0 0 0 0 1.1855582 A #> 5 1 1 1 1 1 0.3037686 A #> 6 0 1 1 1 1 0.3758026 A #> 7 0 0 0 0 0 -1.0129665 A #> 8 0 0 0 0 0 -1.6261948 A #> 9 0 0 0 0 0 -0.7386708 A #> 10 0 0 0 0 0 -0.4478443 A
#> unmarkedFrame Object #> #> 100 sites #> Maximum number of observations per site: 5 #> Mean number of observations per site: 5 #> Sites with at least one detection: 25 #> #> Tabulation of y observations: #> 0 1 #> 415 85 #> #> Site-level covariates: #> veght habitat #> Min. :-1.80746 A:50 #> 1st Qu.:-0.75747 B:50 #> Median :-0.05057 #> Mean : 0.07205 #> 3rd Qu.: 0.62237 #> Max. : 3.32295
# Fit some models fm1 <- occu(~1 ~1, umf) fm2 <- occu(~veght+habitat ~veght+habitat, umf) fm3 <- occu(~veght ~veght+habitat, umf) # Model selection fms <- fitList(m1=fm1, m2=fm2, m3=fm3) modSel(fms)
#> nPars AIC delta AICwt cumltvWt #> m3 5 244.81 0.00 6.1e-01 0.61 #> m2 6 245.68 0.87 3.9e-01 1.00 #> m1 2 273.01 28.20 4.6e-07 1.00
# Empirical Bayes estimates of the number of sites occupied sum(bup(ranef(fm3), stat="mode")) # Sum of posterior modes
#> [1] 29
sum(z) # Actual
#> [1] 29
# Model-averaged prediction and plots # psi in each habitat type newdata1 <- data.frame(habitat=c('A', 'B'), veght=0) Epsi1 <- predict(fms, type="state", newdata=newdata1) with(Epsi1, { plot(1:2, Predicted, xaxt="n", xlim=c(0.5, 2.5), ylim=c(0, 0.5), xlab="Habitat", ylab=expression(paste("Probability of occurrence (", psi, ")")), cex.lab=1.2, pch=16, cex=1.5) axis(1, 1:2, c('A', 'B')) arrows(1:2, Predicted-SE, 1:2, Predicted+SE, angle=90, code=3, length=0.05) })
# psi and p as functions of vegetation height newdata2 <- data.frame(habitat=factor('A', levels=c('A','B')), veght=seq(-2, 2, length=50)) Epsi2 <- predict(fms, type="state", newdata=newdata2, appendData=TRUE) Ep <- predict(fms, type="det", newdata=newdata2, appendData=TRUE) op <- par(mfrow=c(2, 1), mai=c(0.9, 0.8, 0.2, 0.2)) plot(Predicted~veght, Epsi2, type="l", lwd=2, ylim=c(0,1), xlab="Vegetation height (standardized)", ylab=expression(paste("Probability of occurrence (", psi, ")"))) lines(lower ~ veght, Epsi2, col=gray(0.7)) lines(upper ~ veght, Epsi2, col=gray(0.7)) plot(Predicted~veght, Ep, type="l", lwd=2, ylim=c(0,1), xlab="Vegetation height (standardized)", ylab=expression(paste("Detection probability (", italic(p), ")")))
lines(lower~veght, Ep, col=gray(0.7))
lines(upper~veght, Ep, col=gray(0.7))