#Introduction

Creel surveys allow fisheries scientists and managers to collect data on catch and harvest, an angler population (including effort expended), and, depending on survey design, biological data on fish populations. Though important methods of collecting data on the user base of the fishery, creel surveys are difficult to implement and, in graduate fisheries programs, creel surveys are paid little attention. As a result, fisheries managers–the first job for many fisheries-program graduates–often inherit old surveys or are told to institute new surveys with little knowledge of how to do so.

Fisheries can cover large spatial extents: large reservoirs, coast-lines, and river systems. A creel survey has to be statistically valid, adaptable to the geographic challenges of the fishery, and cost efficient. Limited budgets can prevent agencies from implementing creel surveys; the AnglerCreelSurveySimulation was designed to help managers explore the type of creel survey that is most appropriate for their fishery, including fisheries with multiple access points, access points that are more popular than others, variation in catch rate, the number of surveyors, and seasonal variation in day-lengths.

The `AnglerCreelSurveySimulation`

package does require
that users know something about their fishery and the human dimensions
of that fishery. *A prior* knowledge includes mean trip length
for a party (or individual), the mean catch rate of the

The `AnglerCreelSurveySimulation`

package is simple, but
powerful. Four functions provide the means for users to create a
population of anglers, limit the length of the fishing day to any value,
and provide a mean trip length for the population. Ultimately, the user
only needs to know the final function
`ConductMultipleSurveys`

but because I’d rather this
*not* be a *black box* of functions, this brief
introduction will be a step-by-step process through the package.

##A walk-through of the package

This tutorial assumes that we have a very simple, small fishery with only one access point that, on any given day, is visited by 100 anglers. The fishing day length for our theoretical fishery is 12 hours (say, from 6 am to 6pm) and all anglers are required to have completed their trip by 6pm. Lastly, the mean trip length is known to be 3.5 hours.

*For the purposes of this package, all times are functions of the
fishing day. In other words, if a fishing day length is 12 hours (e.g.,
from 6 am to 6pm) and an angler starts their trip at 2 and
ends at 4 that means that they started their trip at 8 am
and ended at 10 am.*

The `make_anglers()`

function builds a population of
anglers:

```
library(AnglerCreelSurveySimulation)
anglers <- make_anglers(n_anglers = 100, mean_trip_length = 3.5, fishing_day_length = 12)
```

`make_anglers()`

returns a dataframe with
`start_time`

, `trip_length`

, and
`departure_time`

for all anglers.

```
head(anglers)
#> start_time trip_length departure_time
#> 1 7.238159 1.325826 8.563985
#> 2 2.953622 2.235586 5.189208
#> 3 7.676914 3.436153 11.113067
#> 4 3.097302 4.638618 7.735920
#> 5 1.284245 3.321098 4.605344
#> 6 7.974742 1.633840 9.608582
```

In the `head(anglers)`

statement, you can see that
`starttime`

, `triplength`

, and
`departureTime`

are all available for each angler. The first
angler started their trip roughly 7.24 hours into the fishing day,
continued to fish for 1.33 hours, and left the access point at 8.56
hours into the fishing day. Angler start times are assigned by the
`uniform`

distribution and trip lengths are assigned by the
`gamma`

distribution. To get true effort of all the anglers
for this angler population, summing `trip_length`

is all
that’s needed: 0.

The distribution of angler trip lengths can be easily visualized:

```
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
# Histogram overlaid with kernel density curve
anglers %>%
ggplot(aes(x=trip_length)) +
geom_histogram(aes(y=..density..),
binwidth=.1,
colour="black", fill="white") +
geom_density(alpha=.2, fill="#FF6666")
```

Once the population of anglers has been created, the next function to
apply is the `get_total_values()`

function. In
`get_total_values()`

, the user specifies the start time of
the creel surveyor, the end time of the surveyor, and the wait time of
the surveyor. Here is where the user also specifies the sampling
probability of the anglers (in most cases, equal to \(\frac{waitTime}{fishingDayLength}\)) and
the mean catch rate of the fishery. There are a number of a default
settings in the `get_total_values()`

function; see
`?get_total_values`

for a description of how the function
handles `NULL`

values for `startTime`

,
`endTime`

, and `waitTime`

. `startTime`

and `waitTime`

are the times that the surveyor started and
waited at the access point. `totalCatch`

and
`trueEffort`

are the total (or *real*) values for
catch and effort. `meanLambda`

is the mean catch rate for all
anglers. Even though we assigned `meanCatchRate`

to
`get_total_values()`

, individual mean catch rates are
simulated by `rgamma()`

with shape equal to
`meanCatchRate`

and rate equal to `1`

.

For this walk through, we’ll schedule the surveyor to work for a total of eight hours at the sole access point in our fishery:

```
anglers %>%
get_total_values(start_time = 0, wait_time = 8, sampling_prob = 8/12, mean_catch_rate = 2.5)
#> n_observed_trips total_observed_trip_effort n_completed_trips
#> 1 91 378.9607 61
#> total_completed_trip_effort total_completed_trip_catch start_time wait_time
#> 1 275.0163 670.2935 0 8
#> total_catch true_effort mean_lambda
#> 1 900.8161 318.3179 2.700503
```

`get_total_values()`

returns a single row data frame with
several columns. The output of `get_total_values()`

is the
catch and effort data observed by the surveyor during their wait at the
access point along with the “true” values for catch and effort.
(Obviously, we can’t simulate biological data but, if an agency’s
protocol directed the surveyor to collect biological data, that could be
analyzed with other `R`

functions.)

In the output from `get_total_values()`

,
`n_observed_trips`

is the number of trips that the surveyor
observed, including anglers that arrived after she started her day and
anglers that were there for the duration of her time at the access
point. `total_observed_trip_effort`

is the effort expended by
those parties; because the observed trips were not complete, she did not
count their catch. `n_completed_trips`

is the number of
anglers that completed their trips while she was onsite,
`total_completed_trip_effort`

is the effort expended by those
anglers, and `total_completed_trip_catch`

is the number of
fish caught by those parties. Catch is both the number of fish harvested
and those caught and released.

Effort and catch are estimated from the Bus Route Estimator:

\[ \widehat{E} = T\sum\limits_{i=1}^n{\frac{1}{w_{i}}}\sum\limits_{j=1}^m{\frac{e_{ij}}{\pi_{j}}} \]

where

*E*= estimated total party-hours of effort;

*T*= total time to complete a full circuit of the route, including traveling and waiting;

*w*= waiting time at the_{i}*i*site (where^{th}*i*= 1, …,*n*sites);

and

*e*= total time that the_{ij}*j*car (or trailer) is parked at the^{th}*i*site while the agent is at that site (where^{th}*j*= 1, …,*n*sites).

Catch rate is calculated from the Ratio of Means equation:

\[ \widehat{R_1} = \frac{\sum\limits_{i=1}^n{c_i/n}}{\sum\limits_{i=1}^n{L_i/n}} \]

where

*c*is the catch for the_{i}*i*sampling unit^{th}

and

* *L _{i}* is the length of the fishing trip at the tie of
the interview.

For incomplete surveys, *L _{i}* represents an
incomplete trip.

`simulate_bus_route()`

calculates effort and catch based
upon these equations. See `?simulate_bus_route`

for
references that include a more detailed discussion of these
equations.

`simulate_bus_route()`

calls `make_anglers()`

and `get_total_values()`

so many of the same arguments we
passed in the previous functions will need to be passed to
`simulate_bus_route()`

. The new argument,
`nsites`

, is the number of sites visited by the surveyor. In
more advanced simulations (see the examples in
`?simulate_bus_route`

), you can pass strings of values for
`startTime`

, `waitTime`

, `nsites`

, and
`nanglers`

to simulate a bus route-type survey rather than
just a single access-point survey.

```
sim <- simulate_bus_route(start_time = 0, wait_time = 8, n_sites = 1, n_anglers = 100,
sampling_prob = 8/12, mean_catch_rate = 2.5, fishing_day_length = 12)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 409.936 2.297963 891.6496 346.0502 2.613442
```

The output from `simulate_bus_route()`

is a dataframe with
values for `Ehat`

, `catchRateROM`

(the ratio of
means catch rate), `trueCatch`

, `trueEffort`

, and
`meanLambda`

. `Ehat`

is the estimated total effort
from the Bus Route Estimator above and `catchRateROM`

is
catch rate estimated from the Ratio of Means equation.
`trueCatch`

, `trueEffort`

, and
`meanLambda`

are the same as before. Multiplying
`Ehat`

by `catchRateROM`

gives an estimate of
total catch: 942.0178635.

###Conducting multiple simulations

With information about the fishery, the start and wait times of the
surveyor, the sampling probability, mean catch rate, and fishing day
length, we can run multiple simulations with
`conduct_multiple_surveys()`

.
`conduct_multiple_surveys()`

is a wrapper that calls the
other three functions in turn and compiles the values into a data frame
for easy plotting or analysis. The only additional argument needed is
the `nsims`

value which tells the function how many
simulations to conduct. For the sake of this simple simulation, let’s
assume that the creel survey works five days a week for four weeks
(i.e. 20 days):

```
sim <- conduct_multiple_surveys(n_sims = 20, start_time = 0, wait_time = 8, n_sites = 1,
n_anglers = 100, sampling_prob = 8/12,
mean_catch_rate = 2.5, fishing_day_length = 12)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 376.6182 2.519242 846.7200 344.3344 2.472871
#> 2 389.6775 2.601564 854.0753 357.5443 2.383879
#> 3 384.5418 2.328529 762.5164 355.8506 2.238003
#> 4 388.2216 2.436591 861.6960 366.4591 2.354099
#> 5 371.1078 2.477710 814.2975 331.1943 2.340942
#> 6 370.7990 2.658424 903.5185 331.8732 2.625328
#> 7 384.1850 2.612345 906.0439 354.0076 2.578438
#> 8 407.0480 2.607874 948.4486 377.0258 2.550014
#> 9 361.4820 2.483589 858.0846 334.9782 2.564525
#> 10 375.5050 2.380012 956.6329 332.0240 2.811077
#> 11 341.9553 2.172521 910.1723 323.4882 2.643072
#> 12 386.8595 2.416742 915.9138 351.6712 2.603038
#> 13 389.1150 2.312806 854.3322 343.8923 2.520057
#> 14 409.8340 2.698748 1017.0248 376.4917 2.739232
#> 15 348.4828 2.288811 826.0451 326.5319 2.552013
#> 16 360.1970 2.405245 688.5132 313.6174 2.265548
#> 17 319.0463 3.075032 889.4199 321.9518 2.715001
#> 18 402.2221 2.758371 920.2525 356.7515 2.559893
#> 19 429.0461 2.642780 1034.2358 371.3453 2.782377
#> 20 400.7477 2.478225 958.3914 353.0512 2.764598
```

With the output from multiple simulations, an analyst can evaluate
how closely the creel survey they’ve designed mirrors reality. A
`lm()`

of estimated catch as a function of
`trueCatch`

can tell us if the survey will over or under
estimate reality:

```
mod <-
sim %>%
lm((Ehat * catch_rate_ROM) ~ true_catch, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = (Ehat * catch_rate_ROM) ~ true_catch, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -231.14 -34.39 18.92 53.95 128.04
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 306.7721 210.0432 1.461 0.1614
#> true_catch 0.7331 0.2361 3.106 0.0061 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 83.31 on 18 degrees of freedom
#> Multiple R-squared: 0.3489, Adjusted R-squared: 0.3127
#> F-statistic: 9.646 on 1 and 18 DF, p-value: 0.006101
```

Plotting the data and the model provide a good visual means of evaluating how close our estimates are to reality:

```
#Create a new vector of the estimated effort multiplied by estimated catch rate
sim <-
sim %>%
mutate(est_catch = Ehat * catch_rate_ROM)
sim %>%
ggplot(aes(x = true_catch, y = est_catch)) +
geom_point() +
geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2],
colour = "red", size = 1.01)
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
```

The closer the slope parameter estimate is to 1 and the intercept parameter estimate is to 0, the closer our estimate of catch is to reality.

We can create a model and plot of our effort estimates, too:

```
mod <-
sim %>%
lm(Ehat ~ true_effort, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -32.311 -7.028 -2.247 10.501 19.691
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -26.6792 55.8474 -0.478 0.639
#> true_effort 1.1742 0.1611 7.289 8.98e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 13.19 on 18 degrees of freedom
#> Multiple R-squared: 0.7469, Adjusted R-squared: 0.7329
#> F-statistic: 53.13 on 1 and 18 DF, p-value: 8.982e-07
#Create a new vector of the estimated effort multiplied by estimated catch rate
sim %>%
ggplot(aes(x = true_effort, y = Ehat)) +
geom_point() +
geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2],
colour = "red", size = 1.01)
```

If the start and wait time equals 0 and the length of the fishing day, respectively, the creel surveyor can observe all completed trips, though she’d likely be unhappy having to work 12 hours. The inputs have to be adjusted to allow her to arrive at time 0, stay for all 12 hours, and have a probability of 1.0 at catching everyone:

```
start_time <- 0
wait_time <- 12
sampling_prob <- 1
sim <- conduct_multiple_surveys(n_sims = 20, start_time = start_time, wait_time = wait_time,
n_sites = 1, n_anglers = 100, sampling_prob = 1,
mean_catch_rate = 2.5, fishing_day_length = 12)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 331.5126 2.258124 748.5965 331.5126 2.226366
#> 2 335.3444 2.284950 766.2451 335.3444 2.418218
#> 3 333.6399 2.478427 826.9022 333.6399 2.440359
#> 4 317.0329 2.325350 737.2123 317.0329 2.357007
#> 5 297.0303 2.117019 628.8189 297.0303 2.143518
#> 6 324.3639 2.299249 745.7935 324.3639 2.378997
#> 7 322.9534 2.628027 848.7302 322.9534 2.683393
#> 8 340.1983 2.882697 980.6887 340.1983 2.686978
#> 9 370.5029 2.368574 877.5637 370.5029 2.436295
#> 10 344.4279 2.440527 840.5856 344.4279 2.305974
#> 11 336.1316 2.317123 778.8581 336.1316 2.203803
#> 12 356.4815 2.628166 936.8928 356.4815 2.489206
#> 13 323.4428 2.317091 749.4464 323.4428 2.336160
#> 14 332.7596 2.510335 835.3380 332.7596 2.552702
#> 15 330.2695 2.576715 851.0104 330.2695 2.616000
#> 16 326.6618 2.331027 761.4574 326.6618 2.240301
#> 17 336.2574 2.279683 766.5603 336.2574 2.311405
#> 18 351.5997 2.561353 900.5711 351.5997 2.507581
#> 19 351.4495 2.212520 777.5890 351.4495 2.193988
#> 20 319.4902 2.598769 830.2811 319.4902 2.577867
```

```
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.668e-13 -2.000e-15 2.544e-14 4.251e-14 8.211e-14
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -4.576e-13 5.617e-13 -8.150e-01 0.426
#> true_effort 1.000e+00 1.679e-15 5.954e+14 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.173e-13 on 18 degrees of freedom
#> Multiple R-squared: 1, Adjusted R-squared: 1
#> F-statistic: 3.545e+29 on 1 and 18 DF, p-value: < 2.2e-16
```

If our hypothetical fishery suddenly gained another access point and the original 100 anglers were split between the two access points equally, what kind of information would a creel survey capture? We could ask our surveyor to split her eight-hour work day between both access points, but she’ll have to drive for 0.5 hours to get from one to another. Of course, that 0.5 hour of drive time will be a part of her work day so she’ll effectively have 7.5 hours to spend at access points counting anglers and collecting data.

```
start_time <- c(0, 4.5)
wait_time <- c(4, 3.5)
n_sites = 2
n_anglers <- c(50, 50)
fishing_day_length <- 12
sampling_prob <- sum(wait_time)/fishing_day_length
sim <- conduct_multiple_surveys(n_sims = 20, start_time = start_time, wait_time = wait_time,
n_sites = n_sites, n_anglers = n_anglers,
sampling_prob = sampling_prob, mean_catch_rate = 2.5,
fishing_day_length = fishing_day_length)
sim
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 492.8639 1.722646 715.5206 329.7062 2.294514
#> 2 528.2219 2.519027 800.9620 348.0565 2.309510
#> 3 499.8952 2.689286 792.5064 329.5931 2.426154
#> 4 511.1506 2.264724 822.7999 338.4422 2.507654
#> 5 563.1956 2.537964 827.5075 356.8933 2.256187
#> 6 530.4618 2.721845 900.7757 354.8364 2.618954
#> 7 531.3834 2.564764 922.5139 334.9887 2.728519
#> 8 466.0805 2.553252 792.5782 323.5327 2.496906
#> 9 455.9357 2.632786 916.5352 346.0207 2.700286
#> 10 493.1137 2.051125 874.1247 344.6554 2.563315
#> 11 550.3458 2.304537 860.5337 349.7014 2.322299
#> 12 610.5815 2.582332 827.6661 342.4361 2.471889
#> 13 528.3076 2.117544 965.1213 353.1666 2.676699
#> 14 599.3215 2.707865 904.5747 357.7262 2.553898
#> 15 634.6256 1.912975 947.1593 374.6526 2.510612
#> 16 456.4338 2.544216 829.7986 349.9623 2.279834
#> 17 598.5900 2.847658 869.0655 352.3398 2.483211
#> 18 561.8719 2.807523 872.9477 339.3835 2.503308
#> 19 492.0098 3.105794 924.5582 378.8442 2.386399
#> 20 399.4990 1.967737 753.2112 314.8620 2.365910
```

```
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -100.297 -17.269 1.153 34.035 92.647
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -181.5670 261.5396 -0.694 0.4964
#> true_effort 2.0427 0.7552 2.705 0.0145 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 51.25 on 18 degrees of freedom
#> Multiple R-squared: 0.289, Adjusted R-squared: 0.2495
#> F-statistic: 7.317 on 1 and 18 DF, p-value: 0.0145
```

Ultimately, the creel survey simulation can be as complicated as a creel survey. If a survey requires multiple clerks, several simulations can be coupled together to act as multiple surveyors. To accommodate weekends or holidays (i.e., increased angler pressure), additional simulations with different wait times and more anglers (to simulate higher pressure) can be built into the simulation. For example, if we know that angler pressure is 50% higher at the two access points on weekends, we can hire a second clerk to sample 8 hours a day on the weekends–one day at each access point–and add the weekend data to the weekday data.

```
#Weekend clerks
start_time_w <- 2
wait_time_w <- 10
n_sites <- 1
n_anglers_w <- 75
fishing_day_length <- 12
sampling_prob <- 8/12
sim_w <- conduct_multiple_surveys(n_sims = 8, start_time = start_time_w,
wait_time = wait_time_w, n_sites = n_sites,
n_anglers = n_anglers_w, sampling_prob = sampling_prob,
mean_catch_rate = 2.5, fishing_day_length = fishing_day_length)
sim_w
#> Ehat catch_rate_ROM true_catch true_effort mean_lambda
#> 1 388.6103 2.667906 692.7806 260.4893 2.623400
#> 2 369.6578 2.410091 595.8737 247.4235 2.453790
#> 3 384.4043 2.516820 644.9844 256.2695 2.540813
#> 4 363.7742 2.508243 608.2893 242.5161 2.558225
#> 5 387.8980 2.826352 732.8723 259.5569 2.788018
#> 6 411.5632 2.243179 616.6394 275.0917 2.221008
#> 7 327.9949 2.410986 529.8627 220.0829 2.331251
#> 8 337.2557 2.694240 614.5451 229.6741 2.578197
#Add the weekday survey and weekend surveys to the same data frame
mon_survey <-
sim_w %>%
bind_rows(sim)
mod <-
mon_survey %>%
lm(Ehat ~ true_effort, data = .)
summary(mod)
#>
#> Call:
#> lm(formula = Ehat ~ true_effort, data = .)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -87.279 -9.576 0.543 18.509 90.195
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -33.6183 56.2029 -0.598 0.555
#> true_effort 1.6178 0.1747 9.258 1.03e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 43.06 on 26 degrees of freedom
#> Multiple R-squared: 0.7673, Adjusted R-squared: 0.7583
#> F-statistic: 85.72 on 1 and 26 DF, p-value: 1.028e-09
```

Hopefully, this vignette has shown you how to build and simulate your
own creel survey. It’s flexible enough to estimate monthly or seasonal
changes in fishing day length, changes in the mean catch rate, increased
angler pressure on weekends, and any number of access sites, start
times, wait times, and sampling probabilities. The output from
`conduct_multiple_surveys()`

allows the user to estimate
variability in the catch and effort estimates (e.g., relative standard
error) to evaluate the most efficient creel survey for *their*
fishery.