Codebook Demo

library(faux)

Codebooks in faux follow the Psych-DS 0.1.0 format, which is still in development.

Simulated Data

When you simulate data with sim_design, the codebook function reads some parameters of the design.

pet_data <- sim_design(
  between = list(pet = c(cat = "Cat Owners", 
                         dog = "Dog Owners")),
  n = c(4, 6),
  dv = list(happy = "Happiness Score"),
  id = list(id = "Subject ID"),
  mu = c(10, 12),
  sd = 4
)

You can set up a codebook with the function codebook(). If you don’t specify the name, it defaults to the variable name (pet_data).

cb <- codebook(pet_data)
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float

If you just type the codebook object into the console, you’ll see the info in JSON format like this.

cb
#> {
#>     "@context": "https://schema.org/",
#>     "@type": "Dataset",
#>     "name": "pet_data",
#>     "schemaVersion": "Psych-DS 0.1.0",
#>     "variableMeasured": [
#>         {
#>             "@type": "PropertyValue",
#>             "name": "id",
#>             "description": "Subject ID",
#>             "dataType": "string"
#>         },
#>         {
#>             "@type": "PropertyValue",
#>             "name": "pet",
#>             "description": "pet",
#>             "levels": {
#>                 "cat": "Cat Owners",
#>                 "dog": "Dog Owners"
#>             },
#>             "dataType": "string",
#>             "levelsOrdered": false
#>         },
#>         {
#>             "@type": "PropertyValue",
#>             "name": "happy",
#>             "description": "Happiness Score",
#>             "dataType": "float"
#>         }
#>     ]
#> }
#> 

If you set return to “list”, you get the codebook in an R list that prints like this.

cb <- codebook(pet_data, return = "list")
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
cb
#> Codebook for pet_data (Psych-DS 0.1.0)
#> 
#> Dataset Parameters
#> 
#> * name: pet_data
#> * schemaVersion: Psych-DS 0.1.0
#> 
#> Column Parameters
#> 
#> * id (string): Subject ID
#> * pet (string)
#>   * Levels
#>     * cat: Cat Owners
#>     * dog: Dog Owners
#>   * Ordered: FALSE
#> * happy (float): Happiness Score

But the codebook is actually a nested list formatted like this:

str(cb)
#> List of 5
#>  $ @context        : chr "https://schema.org/"
#>  $ @type           : chr "Dataset"
#>  $ name            : chr "pet_data"
#>  $ schemaVersion   : chr "Psych-DS 0.1.0"
#>  $ variableMeasured:List of 3
#>   ..$ :List of 4
#>   .. ..$ @type      : chr "PropertyValue"
#>   .. ..$ name       : chr "id"
#>   .. ..$ description: chr "Subject ID"
#>   .. ..$ dataType   : chr "string"
#>   ..$ :List of 6
#>   .. ..$ @type        : chr "PropertyValue"
#>   .. ..$ name         : chr "pet"
#>   .. ..$ description  : chr "pet"
#>   .. ..$ levels       :List of 2
#>   .. .. ..$ cat: chr "Cat Owners"
#>   .. .. ..$ dog: chr "Dog Owners"
#>   .. ..$ dataType     : chr "string"
#>   .. ..$ levelsOrdered: logi FALSE
#>   ..$ :List of 4
#>   .. ..$ @type      : chr "PropertyValue"
#>   .. ..$ name       : chr "happy"
#>   .. ..$ description: chr "Happiness Score"
#>   .. ..$ dataType   : chr "float"
#>  - attr(*, "class")= chr [1:2] "psychds_codebook" "list"
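
Because the codebook is an ordinary nested list, you can pull values out of it with base R. The sketch below is only an illustration: `cb_sketch` is a hand-built stand-in that mirrors the str() output above, so it runs without faux; with faux loaded you would presumably use the `cb` object in its place.

```r
# Hand-built stand-in mirroring the structure shown by str(cb) above
# (assumption: the real cb object can be indexed the same way)
cb_sketch <- list(
  "@context" = "https://schema.org/",
  "@type" = "Dataset",
  name = "pet_data",
  schemaVersion = "Psych-DS 0.1.0",
  variableMeasured = list(
    list("@type" = "PropertyValue", name = "id",
         description = "Subject ID", dataType = "string"),
    list("@type" = "PropertyValue", name = "pet",
         description = "pet",
         levels = list(cat = "Cat Owners", dog = "Dog Owners"),
         dataType = "string", levelsOrdered = FALSE),
    list("@type" = "PropertyValue", name = "happy",
         description = "Happiness Score", dataType = "float")
  )
)

# Collect the name and dataType of every column
var_names <- vapply(cb_sketch$variableMeasured, function(v) v$name, character(1))
var_types <- vapply(cb_sketch$variableMeasured, function(v) v$dataType, character(1))
col_summary <- data.frame(name = var_names, dataType = var_types)

# Level labels for the second column (pet) as a named character vector
pet_levels <- unlist(cb_sketch$variableMeasured[[2]]$levels)
```

Indexing by position (`[[2]]`) works here because variableMeasured is an unnamed list; matching on `var_names` is safer if the column order might change.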

Variable parameters

Above you saw messages about the data type that codebook guesses for each column. You can override these guesses by setting the values manually. Below, we’ll create a new column called followup consisting of 0 and 1 values, change the type of the column from integer to boolean (T/F), and set descriptions for id, pet and followup. The id column had a description of “Subject ID” from the sim_design function, but properties set using vardesc will override this. You can also add unobserved levels to a factor.


pet_data$followup <- sample(0:1, nrow(pet_data), replace = TRUE)

vardesc <- list(
  type = list(followup = "b"),
  description = c(id = "Pet ID",
                  pet = "Pet Type",
                  followup = "Followed up 2 weeks later"
                  ),
  levels = list(pet = c(cat = "Cat Owners",
                        dog = "Dog Owners", 
                        ferret = "Ferret Owners"),
                followup = c("0" = "No", "1" = "Yes")
            )
)
cb <- codebook(pet_data, name = "pets", vardesc, return = "list")
#> Warning in codebook(pet_data, name = "pets", vardesc, return = "list"): The following variable properties are not standard: type
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> followup set to dataType int

cb
#> Codebook for pets (Psych-DS 0.1.0)
#> 
#> Dataset Parameters
#> 
#> * name: pets
#> * schemaVersion: Psych-DS 0.1.0
#> 
#> Column Parameters
#> 
#> * id (string): Pet ID
#> * pet (string): Pet Type
#>   * Levels
#>     * cat: Cat Owners
#>     * dog: Dog Owners
#>     * ferret: Ferret Owners
#>   * Ordered: FALSE
#> * happy (float): Happiness Score
#> * followup (int): Followed up 2 weeks later
#>   * Levels
#>     * 0: No
#>     * 1: Yes
#>   * Ordered: FALSE

You can change the type of a column in the codebook, but this won’t affect the data itself unless you set the return argument to “data”. This runs type conversion on each column and gives you a warning if a type can’t be converted.

Note how we had to change the levels for the variable followup because we’re converting them to boolean (logical) values and how the names in the levels vector have to be strings.

vardesc$levels$followup <- c("FALSE" = "No", "TRUE" = "Yes")
converted_data <- codebook(pet_data, "pets", vardesc, return = "data")
#> Warning in codebook(pet_data, "pets", vardesc, return = "data"): The following variable properties are not standard: type
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> followup set to dataType int
#> Converting pet from int to string

head(converted_data)
#>      id pet     happy followup
#> 1.1 S01 cat  9.646541        0
#> 1.2 S02 cat 10.324561        0
#> 1.3 S03 cat 15.552581        0
#> 1.4 S04 cat  9.906358        1
#> 1.5 S05 dog 15.950371        1
#> 1.6 S06 dog 11.421215        0
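
The reason the names in a levels vector must be strings can be seen in base R alone. The following sketch does not call faux; `followup` is a small made-up vector used only to show how data values map to level names before and after conversion to logical.

```r
# Made-up example values (not the simulated data above)
followup <- c(0L, 0L, 1L)

# Names of a vector are always character, so 0/1 must be written as "0"/"1"
int_levels <- c("0" = "No", "1" = "Yes")
int_labels <- unname(int_levels[as.character(followup)])

# After conversion to logical, the values print as FALSE/TRUE,
# so the level names have to be "FALSE" and "TRUE" instead
followup_lgl <- as.logical(followup)
lgl_levels <- c("FALSE" = "No", "TRUE" = "Yes")
lgl_labels <- unname(lgl_levels[as.character(followup_lgl)])
```

Both lookups give the same labels; only the keys differ, which is why the levels had to be rewritten when the column type changed.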

The codebook is attached to the returned converted data as an attribute and can be accessed as follows.

attr(converted_data, "codebook")
#> Codebook for pets (Psych-DS 0.1.0)
#> 
#> Dataset Parameters
#> 
#> * name: pets
#> * schemaVersion: Psych-DS 0.1.0
#> 
#> Column Parameters
#> 
#> * id (string): Pet ID
#> * pet (string): Pet Type
#>   * Levels
#>     * cat: Cat Owners
#>     * dog: Dog Owners
#>     * ferret: Ferret Owners
#>   * Ordered: FALSE
#> * happy (float): Happiness Score
#> * followup (int): Followed up 2 weeks later
#>   * Levels
#>     * FALSE: No
#>     * TRUE: Yes
#>   * Ordered: FALSE

You can set variable parameters other than name, type, description, and levels. The variable parameters that Psych-DS currently supports are: “description”, “privacy”, “type”, “propertyID”, “minValue”, “maxValue”, “levels”, “ordered”, “na”, “naValues”, “alternateName”, and “unitCode”. See the technical specs for descriptions of these properties. You can also add custom parameters, but you will get a warning.

cb <- codebook(pet_data, vardesc = list(new_param = c(id = "YES")))
#> Warning in codebook(pet_data, vardesc = list(new_param = c(id = "YES"))): The following variable properties are not standard: new_param
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> followup set to dataType int

If you have a column that is an ordered factor, the codebook will look like this:

dat <- data.frame(
  initial = sample(LETTERS, 10)
)

dat$initial <- factor(dat$initial, levels = LETTERS, ordered = TRUE)

alevels <- paste("The letter", LETTERS)
names(alevels) <- LETTERS

codebook(dat, vardesc = list(levels = list(initial = alevels)), return = "list")
#> initial set to dataType string
#> Codebook for dat (Psych-DS 0.1.0)
#> 
#> Dataset Parameters
#> 
#> * name: dat
#> * schemaVersion: Psych-DS 0.1.0
#> 
#> Column Parameters
#> 
#> * initial (string)
#>   * Levels
#>     * A: The letter A
#>     * B: The letter B
#>     * C: The letter C
#>     * D: The letter D
#>     * E: The letter E
#>     * F: The letter F
#>     * G: The letter G
#>     * H: The letter H
#>     * I: The letter I
#>     * J: The letter J
#>     * K: The letter K
#>     * L: The letter L
#>     * M: The letter M
#>     * N: The letter N
#>     * O: The letter O
#>     * P: The letter P
#>     * Q: The letter Q
#>     * R: The letter R
#>     * S: The letter S
#>     * T: The letter T
#>     * U: The letter U
#>     * V: The letter V
#>     * W: The letter W
#>     * X: The letter X
#>     * Y: The letter Y
#>     * Z: The letter Z
#>   * Ordered: TRUE
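
As a side note on the setup code above, the named levels vector can be built in one step with setNames(), and you can confirm that a factor is ordered before passing it on. This is a base R sketch with a short made-up `initial` vector rather than the random sample used above.

```r
# One-step equivalent of paste() followed by names<-
alevels <- setNames(paste("The letter", LETTERS), LETTERS)

# Fixed example values so the result is reproducible
initial <- factor(c("C", "A", "B"), levels = LETTERS, ordered = TRUE)

# TRUE for ordered factors, FALSE for plain factors
ordered_flag <- is.ordered(initial)

# Sorting follows the level order, not the order of appearance
sorted_initial <- as.character(sort(initial))
```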

Dataset Parameters

You can add extra parameters to the whole data set. Psych-DS supports the following: “license”, “author”, “citation”, “funder”, “url”, “sameAs”, “keywords”, “temporalCoverage”, “spatialCoverage”, “datePublished”, “dateCreated”. As with variable parameters, you can add custom parameters and you will just get a warning.

codebook(pet_data, license = "CC-BY 3.0", author = "Lisa DeBruine", source = "faux")
#> Warning in codebook(pet_data, license = "CC-BY 3.0", author = "Lisa DeBruine", : The following dataset properties are not standard: source
#> id set to dataType string
#> pet set to dataType string
#> happy set to dataType float
#> followup set to dataType int
#> {
#>     "@context": "https://schema.org/",
#>     "@type": "Dataset",
#>     "name": "pet_data",
#>     "schemaVersion": "Psych-DS 0.1.0",
#>     "license": "CC-BY 3.0",
#>     "author": "Lisa DeBruine",
#>     "source": "faux",
#>     "variableMeasured": [
#>         {
#>             "@type": "PropertyValue",
#>             "name": "id",
#>             "description": "Subject ID",
#>             "dataType": "string"
#>         },
#>         {
#>             "@type": "PropertyValue",
#>             "name": "pet",
#>             "description": "pet",
#>             "levels": {
#>                 "cat": "Cat Owners",
#>                 "dog": "Dog Owners"
#>             },
#>             "dataType": "string",
#>             "levelsOrdered": false
#>         },
#>         {
#>             "@type": "PropertyValue",
#>             "name": "happy",
#>             "description": "Happiness Score",
#>             "dataType": "float"
#>         },
#>         {
#>             "@type": "PropertyValue",
#>             "name": "followup",
#>             "description": "followup",
#>             "dataType": "int"
#>         }
#>     ]
#> }
#> 

You can also add parameters as lists.

dat <- sim_design(plot = FALSE)

author_list <- list(
  list(
    "@type" = "Person",
    "name" = "Melissa Kline"
  ),
  list(
    "@type" = "Person",
    "name" = "Lisa DeBruine"
  )
)


codebook(dat, return = "list", 
         author = author_list,
         keywords = c("test", "demo"))
#> id set to dataType string
#> y set to dataType float
#> Codebook for dat (Psych-DS 0.1.0)
#> 
#> Dataset Parameters
#> 
#> * name: dat
#> * schemaVersion: Psych-DS 0.1.0
#> * author: 
#>   * 1: 
#>      * @type: Person
#>      * name: Melissa Kline
#>   * 2: 
#>      * @type: Person
#>      * name: Lisa DeBruine
#> * keywords: 
#>   * test
#>   * demo
#> 
#> Column Parameters
#> 
#> * id (string)
#> * y (float): value
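
If you have many authors, typing each nested list by hand gets tedious. One possible shortcut, sketched in base R, is to generate the same structure from a character vector of names; `author_names` is a hypothetical helper variable introduced for this illustration.

```r
# Hypothetical input: a plain vector of author names
author_names <- c("Melissa Kline", "Lisa DeBruine")

# Build one list(@type, name) entry per author,
# matching the shape of author_list above
author_list <- lapply(author_names, function(nm) {
  list("@type" = "Person", name = nm)
})
```

The result has the same shape as the hand-written author_list, so it should be usable in the same way in the author argument.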

Existing Data

You can run the codebook function on existing data not created in faux, but you will need to manually input column descriptions and factor levels.


vardesc <- list(
  description = list(
    mpg = "Miles/(US) gallon",
    cyl = "Number of cylinders",
    disp = "Displacement (cu.in.)",
    hp = "Gross horsepower",
    drat = "Rear axle ratio",
    wt = "Weight (1000 lbs)",
    qsec = "1/4 mile time",
    vs = "Engine",
    am = "Transmission",
    gear = "Number of forward gears",
    carb = "Number of carburetors"
  ),
  # min and max values can be set manually or from data
  # min and max are often outside the observed range
  minValue = list(mpg = 0, cyl = min(mtcars$cyl)),
  maxValue = list(cyl = max(mtcars$cyl)),
  type = list(
    cyl = "integer",
    hp = "integer",
    vs = "integer",
    am = "integer", 
    gear = "integer",
    carb = "integer"
  ),
  # supply levels to mark factors
  levels = list(
    vs = c("0" = "V-shaped", "1" = "straight"),
    am = c("0" = "automatic", "1" = "manual")
  )
)

codebook(mtcars, "Motor Trend Car Road Tests", 
         vardesc, return = "list")
#> Warning in codebook(mtcars, "Motor Trend Car Road Tests", vardesc, return = "list"): The following variable properties are not standard: type
#> mpg set to dataType float
#> cyl set to dataType int
#> disp set to dataType float
#> hp set to dataType int
#> drat set to dataType float
#> wt set to dataType float
#> qsec set to dataType float
#> vs set to dataType int
#> am set to dataType int
#> gear set to dataType int
#> carb set to dataType int
#> Codebook for Motor Trend Car Road Tests (Psych-DS 0.1.0)
#> 
#> Dataset Parameters
#> 
#> * name: Motor Trend Car Road Tests
#> * schemaVersion: Psych-DS 0.1.0
#> 
#> Column Parameters
#> 
#> * mpg (float): Miles/(US) gallon
#> * cyl (int): Number of cylinders
#> * disp (float): Displacement (cu.in.)
#> * hp (int): Gross horsepower
#> * drat (float): Rear axle ratio
#> * wt (float): Weight (1000 lbs)
#> * qsec (float): 1/4 mile time
#> * vs (int): Engine
#>   * Levels
#>     * 0: V-shaped
#>     * 1: straight
#>   * Ordered: FALSE
#> * am (int): Transmission
#>   * Levels
#>     * 0: automatic
#>     * 1: manual
#>   * Ordered: FALSE
#> * gear (int): Number of forward gears
#> * carb (int): Number of carburetors
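
Parts of a vardesc list like the one above can be derived from the data instead of typed out. The sketch below uses only base R and the built-in mtcars data; treating every whole-number column as an integer is a heuristic assumed for this example, not something codebook does for you.

```r
# Heuristic: a column is an integer candidate if all values are whole numbers
is_whole <- vapply(mtcars, function(x) all(x == round(x)), logical(1))
int_cols <- names(mtcars)[is_whole]

# Named list in the same shape as the type entry above
type_list <- as.list(setNames(rep("integer", length(int_cols)), int_cols))

# Observed range for cyl; mpg gets a theoretical minimum of 0
cyl_range <- range(mtcars$cyl)
min_list <- list(mpg = 0, cyl = cyl_range[1])
max_list <- list(cyl = cyl_range[2])
```

Check the result before using it: a column can contain only whole numbers in one sample and still be conceptually continuous.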

Interactive Editing

There is an experimental argument to edit the codebook interactively. It runs the codebook function, then asks you to confirm types and edit descriptions. Only run this in the console, not in an RMarkdown file or a script meant to be run non-interactively.


cb <- codebook(mtcars, interactive = TRUE)
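
If the same code might end up in a script or an RMarkdown file, one way to keep it from blocking is to guard the call with base R's interactive(). This is a minimal sketch that assumes faux is installed; the message text is made up for the example.

```r
if (interactive()) {
  # Only reached in a live console session (assumes faux is installed)
  cb <- faux::codebook(mtcars, interactive = TRUE)
} else {
  # Scripts and knitted documents land here and skip the prompts
  message("Skipping interactive codebook editing in a non-interactive session")
}
```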