This vignette describes a scoring method introduced by Greenwald, Nosek, and Banaji (2003): the improved d-score for Implicit Association Tasks (IATs) that require a correct response in order to continue to the next trial. This version of the d-score algorithm sums the response times (RTs) of all responses within a trial. As the algorithm also specifies which participants to keep and which to drop, functions from the dplyr package are used to produce the relevant summary statistics. Note that this vignette is more advanced than the others included in the splithalfr package, so it is not recommended as a first introduction to the splithalfr.

Load the included IAT dataset and inspect its documentation.

```
data("ds_iat", package = "splithalfr")
?ds_iat
```

The columns used in this example are:

- UserID, which identifies participants
- block_type, which specifies IAT blocks relevant to calculate the d-score
- attempt, in order to add RTs for trials
- response, in order to identify timeout responses
- rt, in order to identify participants with too many fast responses, drop RTs above 10,000 ms, and calculate mean RTs per block

First, the RT of timeout responses is set to the IAT timeout of 4000 ms.

`ds_iat[ds_iat$response == 3, ]$rt <- 4000`

The improved d-score algorithm specifies that participants whose RTs are below 300 ms for over 10% of responses should be dropped. The R script below identifies such participants. Note that none of the participants meet this criterion, so all can be kept for subsequent analyses.

```
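# Load dplyr so that the pipe operator (%>%) is available
library(dplyr)
# Flag participants with more than 10% of RTs below 300 ms
ds_summary <- ds_iat %>%
  dplyr::group_by(UserID) %>%
  dplyr::summarize(
    too_fast = sum(rt < 300) / dplyr::n() > 0.1
  )
# Number of flagged participants
sum(ds_summary$too_fast)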
```
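To make the 10% criterion concrete, the sketch below applies the same rule to a small toy data frame using base R only. The data frame `toy` and its values are hypothetical and are not taken from ds_iat.

```r
# Toy data: two hypothetical participants with ten responses each
toy <- data.frame(
  UserID = rep(c(1, 2), each = 10),
  rt = c(
    250, 280, 650, 700, 540, 610, 720, 580, 690, 630,  # participant 1: two RTs < 300
    500, 620, 390, 810, 640, 700, 560, 590, 730, 610   # participant 2: no RTs < 300
  )
)

# Proportion of responses faster than 300 ms, per participant
prop_fast <- tapply(toy$rt < 300, toy$UserID, mean)

# Participants exceeding the 10% criterion would be dropped
too_fast_ids <- names(prop_fast)[prop_fast > 0.1]

print(prop_fast)
print(too_fast_ids)
```

In this toy example only participant 1 is flagged. With the objects from this vignette, `subset(ds_summary, too_fast)` should list any flagged participants in the same way.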

Delete any attempts with RTs > 10,000 ms. These do not exist in the JASMIN2 IAT because a response window of 4000 ms was used, but the R-script is still added below for demonstration purposes.

`ds_iat <- subset(ds_iat, rt <= 10000)`

Finally, RTs for each participant, block, and trial are summed. The block_type variable is also included, since it is required for further processing.

```
# Sum the RTs of all attempts within each trial
ds_iat <- ds_iat %>%
  dplyr::group_by(UserID, block, trial) %>%
  dplyr::summarise(
    block_type = dplyr::first(block_type),
    rt = sum(rt)
  )
```
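As an illustration of what summing RTs per trial does, the sketch below uses base R on a hypothetical participant who needed two attempts on the first trial. The toy values and the attempt numbering are assumptions made for this example only.

```r
# Toy data: trial 1 took two attempts, trials 2 and 3 took one attempt each
toy <- data.frame(
  UserID     = c(1, 1, 1, 1),
  block      = c(1, 1, 1, 1),
  trial      = c(1, 1, 2, 3),
  attempt    = c(1, 2, 1, 1),
  block_type = "tar1att1_1",
  rt         = c(900, 400, 650, 700)
)

# Sum the RTs of all attempts that belong to the same trial
toy_summed <- aggregate(
  rt ~ UserID + block + trial + block_type,
  data = toy,
  FUN = sum
)

print(toy_summed)
```

The two attempts of trial 1 (900 ms and 400 ms) collapse into a single row with an RT of 1300 ms, while the single-attempt trials keep their original RTs.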

Writing a scoring method for the splithalfr requires implementing two functions: a **sets** function that describes which sets of data should be split into halves, and a **score** function that calculates a score.

The sets function receives data from a single participant and returns a list of datasets, one for each condition. In this case, we will generate four data frames, containing the trials from:

- target 1 with attribute 1, practice block
- target 1 with attribute 1, test block
- target 1 with attribute 2, practice block
- target 1 with attribute 2, test block

```
iat_fn_sets <- function(ds) {
  return(list(
    tar1att1_1 = subset(ds, block_type == "tar1att1_1"),
    tar1att1_2 = subset(ds, block_type == "tar1att1_2"),
    tar1att2_1 = subset(ds, block_type == "tar1att2_1"),
    tar1att2_2 = subset(ds, block_type == "tar1att2_2")
  ))
}
```

The score function receives these four data frames from a single participant. For each pair of blocks (the practice pair and the test pair), the following ‘block score’ is calculated:

- Mean RT of target 1 with attribute 1 is calculated
- Mean RT of target 1 with attribute 2 is calculated
- The difference between these two mean RTs is divided by the inclusive standard deviation (SD), which is the SD of the RTs of both blocks combined

The d-score is the mean of the block scores for practice and test blocks.

As the block score needs to be calculated twice, we first write a function for it.

```
iat_fn_block <- function(ds_tar1att1, ds_tar1att2) {
  # Mean RT per block
  m_tar1att1 <- mean(ds_tar1att1$rt)
  m_tar1att2 <- mean(ds_tar1att2$rt)
  # SD of the RTs of both blocks combined
  inclusive_sd <- sd(c(ds_tar1att1$rt, ds_tar1att2$rt))
  return((m_tar1att2 - m_tar1att1) / inclusive_sd)
}
```
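A small worked example may help to check the arithmetic of the block score. The sketch below restates the same calculation under a hypothetical name, `toy_block_score`, and applies it to made-up RTs, so it runs without the IAT dataset.

```r
# Restatement of the block score logic for plain RT vectors (hypothetical helper)
toy_block_score <- function(rt_att1, rt_att2) {
  # SD of the RTs of both blocks combined
  inclusive_sd <- sd(c(rt_att1, rt_att2))
  (mean(rt_att2) - mean(rt_att1)) / inclusive_sd
}

# Made-up RTs (ms) for the two blocks of one pair
rt_att1 <- c(600, 700, 800)   # mean 700
rt_att2 <- c(800, 900, 1000)  # mean 900

d <- toy_block_score(rt_att1, rt_att2)
print(d)
```

Here the difference in means is 200 ms and the inclusive SD is the square root of 20000 (about 141.4 ms), so the block score equals the square root of 2, roughly 1.41.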

Next, we write the score function for the splithalfr.

```
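iat_fn_score <- function(sets) {
  # Block scores for the first and second pair of blocks
  d1 <- iat_fn_block(sets$tar1att1_1, sets$tar1att2_1)
  d2 <- iat_fn_block(sets$tar1att1_2, sets$tar1att2_2)
  # The d-score is the mean of both block scores
  return(mean(c(d1, d2)))
}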
```

By combining the sets and score functions, a score for a single participant can be calculated. For instance, the score of UserID 1 can be calculated via the statement below.

`iat_fn_score(iat_fn_sets(subset(ds_iat, UserID == 1)))`

To calculate scores for each participant, call sh_apply with four arguments:

- the dataset
- the column that identifies participants in the dataset
- the sets function
- the score function

The sh_apply function will return a data frame with one row per participant and two columns: one that identifies participants (“UserID” in this example) and a column “score”, which contains the output of the score function.

`iat_scores <- sh_apply(ds_iat, "UserID", iat_fn_sets, iat_fn_score)`

It is recommended to check your scoring method by calculating the score of a representative participant via a different approach. For splithalfr tests, the author has done so via Excel.
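Such a cross-check can also be done within R, as long as the second calculation follows a genuinely different route. The sketch below is one possible way to do this on made-up data: route 1 restates the sets and score logic under hypothetical names, and route 2 recomputes the score from per-block means with `tapply`. All names and values are assumptions for illustration.

```r
# Made-up trial-level data for one participant, three trials per block
toy <- data.frame(
  block_type = rep(
    c("tar1att1_1", "tar1att2_1", "tar1att1_2", "tar1att2_2"),
    each = 3
  ),
  rt = c(600, 700, 800, 800, 900, 1000, 500, 600, 700, 650, 700, 900)
)

# Route 1: subset per block, then block scores, then their mean
score_route_1 <- function(ds) {
  block <- function(a, b) (mean(b$rt) - mean(a$rt)) / sd(c(a$rt, b$rt))
  d1 <- block(
    subset(ds, block_type == "tar1att1_1"),
    subset(ds, block_type == "tar1att2_1")
  )
  d2 <- block(
    subset(ds, block_type == "tar1att1_2"),
    subset(ds, block_type == "tar1att2_2")
  )
  mean(c(d1, d2))
}

# Route 2: per-block means via tapply, inclusive SDs via logical indexing
m <- tapply(toy$rt, toy$block_type, mean)
sd_pair_1 <- sd(toy$rt[toy$block_type %in% c("tar1att1_1", "tar1att2_1")])
sd_pair_2 <- sd(toy$rt[toy$block_type %in% c("tar1att1_2", "tar1att2_2")])
score_route_2 <- mean(c(
  (m[["tar1att2_1"]] - m[["tar1att1_1"]]) / sd_pair_1,
  (m[["tar1att2_2"]] - m[["tar1att1_2"]]) / sd_pair_2
))

# Both routes should agree up to floating point tolerance
stopifnot(isTRUE(all.equal(score_route_1(toy), score_route_2)))
```

If the final statement raises an error, the two routes disagree and at least one of them contains a mistake.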

To calculate split-half scores for each participant, call sh_apply with an additional split_count argument, which specifies how many splits should be calculated. For each participant and split, the splithalfr will randomly divide the dataset of each element of sets into two halves that differ at most by one in size. When called with a split_count argument that is higher than zero, sh_apply returns a data frame with the following columns:

- UserID, which identifies participants
- split, which counts splits
- score_1 and score_2, which are the scores calculated for each of the split datasets

`iat_splits <- sh_apply(ds_iat, "UserID", iat_fn_sets, iat_fn_score, 1000)`
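To illustrate what a single random split of one set looks like, the sketch below divides a vector of made-up RTs into two halves that differ by at most one in size. It is a conceptual illustration in base R and makes no claim about how the splithalfr implements splitting internally.

```r
set.seed(1)  # fixed seed so that the example is reproducible

# Seven made-up trial RTs (ms) from one set of one participant
trial_rts <- c(610, 720, 580, 690, 630, 700, 655)

# Shuffle the trial indexes, then cut the shuffled order in two
shuffled <- sample(seq_along(trial_rts))
half_size <- floor(length(trial_rts) / 2)
half_1 <- trial_rts[shuffled[seq_len(half_size)]]
half_2 <- trial_rts[shuffled[-seq_len(half_size)]]

print(length(half_1))  # 3
print(length(half_2))  # 4
```

With an odd number of trials, one half receives one trial more than the other. Each trial ends up in exactly one half, and the score function is then applied to each half separately.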

Next, the output of sh_apply can be analyzed in order to estimate reliability. By default, functions are provided that automatically calculate mean Spearman-Brown (mean_sb_by_split) and Flanagan-Rulon (mean_fr_by_split) coefficients. If any missing values were encountered in the data provided to these functions, they give a warning, and then pairwise remove the missing data before calculating reliability.

```
# Spearman-Brown
mean_sb_by_split(iat_splits)
# Flanagan-Rulon
mean_fr_by_split(iat_splits)
```
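For readers who want to see what these coefficients express, the sketch below computes them by hand for a toy data frame with the same column layout as the split-half output. The helper names `spearman_brown` and `flanagan_rulon` and all values are hypothetical. The formulas are the standard textbook definitions, and this is not the code used by the splithalfr.

```r
# Toy split-half scores: four participants, two splits (made-up values)
toy_splits <- data.frame(
  UserID  = rep(1:4, times = 2),
  split   = rep(1:2, each = 4),
  score_1 = c(0.2, 0.5, 0.7, 0.9, 0.3, 0.4, 0.8, 1.0),
  score_2 = c(0.3, 0.4, 0.8, 1.0, 0.1, 0.6, 0.6, 0.9)
)

# Spearman-Brown: correlation between halves, corrected to full test length
spearman_brown <- function(x, y) {
  r <- cor(x, y)
  2 * r / (1 + r)
}

# Flanagan-Rulon: one minus variance of differences over variance of sums
flanagan_rulon <- function(x, y) {
  1 - var(x - y) / var(x + y)
}

# One coefficient per split, then the mean across splits
per_split <- split(toy_splits, toy_splits$split)
sb_by_split <- sapply(per_split, function(d) spearman_brown(d$score_1, d$score_2))
fr_by_split <- sapply(per_split, function(d) flanagan_rulon(d$score_1, d$score_2))

mean_sb <- mean(sb_by_split)
mean_fr <- mean(fr_by_split)
print(c(mean_sb = mean_sb, mean_fr = mean_fr))
```

Both coefficients equal 1 when the two halves yield identical scores for every participant, and they decrease as the halves disagree more. This sketch does not handle missing values.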