How to use the volkeR package?

First, load the package, set the plot theme and get some data.

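# Load the package, plus dplyr and ggplot2, which provide
# filter(), %>% and theme_set() used in the examples below
library(dplyr)
library(ggplot2)
library(volker)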

# Set the basic plot theme
theme_set(theme_vlkr())

# Load an example dataset ds from the package
ds <- volker::chatgpt

How to generate tables and plots?

Decide whether your data is categorical or metric and choose the appropriate function: tab_counts() for categorical variables, tab_metrics() for metric variables.

The column selection determines whether you analyse single variables or item lists, or compare and correlate multiple variables.

Try it out!

Categorical variables

# A single variable
tab_counts(ds, use_private)
Usage: in private context n p
never 12 12%
rarely 40 40%
several times a month 30 30%
several times a week 15 15%
almost daily 4 4%
total 101 100%

# A list of variables
tab_counts(ds, c(use_private, use_work))
Usage never rarely several times a month several times a week almost daily total
in private context 12% (12) 40% (40) 30% (30) 15% (15) 4% (4) 100% (101)
in professional context 38% (38) 21% (21) 15% (15) 17% (17) 10% (10) 100% (101)

# Variables matched by a pattern
tab_counts(ds, starts_with("use_"))
Usage never rarely several times a month several times a week almost daily total
in private context 12% (12) 40% (40) 30% (30) 15% (15) 4% (4) 100% (101)
in professional context 38% (38) 21% (21) 15% (15) 17% (17) 10% (10) 100% (101)

Metric variables

# One metric variable
tab_metrics(ds, sd_age)
Age value
min 18
q1 27
median 38
q3 52
max 68
mean 39.7
sd 13.8
n 101

# Multiple metric items
tab_metrics(ds, starts_with("cg_adoption_"))
Expectations min q1 median q3 max mean sd n
ChatGPT has clear advantages compared to similar offerings. 1 3 4 4 5 3.4 1.0 97
Using ChatGPT brings financial benefits. 1 2 3 4 5 2.7 1.2 97
Using ChatGPT is advantageous in many tasks. 1 3 4 4 5 3.6 1.1 97
Compared to other systems, using ChatGPT is more fun. 1 3 4 4 5 3.5 1.0 97
Much can go wrong when using ChatGPT. 1 2 3 4 5 3.1 1.1 97
There are legal issues with using ChatGPT. 1 2 3 4 5 3.1 1.2 97
The security of user data is not guaranteed with ChatGPT. 1 3 3 4 5 3.2 1.0 97
Using ChatGPT could bring personal disadvantages. 1 2 3 3 5 2.7 1.1 97
In my environment, using ChatGPT is standard. 1 2 2 3 5 2.5 1.1 97
Almost everyone in my environment uses ChatGPT. 1 1 2 3 5 2.4 1.2 97
Not using ChatGPT is considered being an outsider. 1 1 2 3 5 2.0 1.2 97
Using ChatGPT brings me recognition from my environment. 1 1 2 3 5 2.3 1.2 97

5 missing case(s) omitted.

Cross tabulation and group comparison

Provide a grouping column in the third parameter to compare different groups.

tab_counts(ds, adopter, sd_gender)
Innovator type total female male diverse
I try new offers immediately 15% (15) 2% (2) 12% (12) 1% (1)
I try new offers rather quickly 62% (63) 25% (25) 38% (38) 0% (0)
I wait until offers establish themselves 22% (22) 13% (13) 9% (9) 0% (0)
I only use new offers when I have no other choice 1% (1) 0% (0) 1% (1) 0% (0)
total 100% (101) 40% (40) 59% (60) 1% (1)

For metric variables, you can compare the mean values.

# Compare the means across the groups of one grouping variable (including the confidence interval)
tab_metrics(ds, sd_age, sd_gender, ci = TRUE)
Gender min q1 median q3 max mean sd ci.low ci.high n
female 18 25.8 38.0 44.2 63 37.5 13.4 33.2 41.8 40
male 19 32.5 38.5 52.0 68 41.2 14.0 37.6 44.8 60
diverse 33 33.0 33.0 33.0 33 33.0 1
total 18 27.0 38.0 52.0 68 39.7 13.8 37.0 42.4 101

By default, the crossing variable is treated as categorical. You can change this behaviour with the metric parameter to calculate correlations:

# Correlate two metric variables
tab_metrics(ds, sd_age, use_work, metric = TRUE, ci = TRUE)
Item 1 Item 2 n Pearson’s r ci.low ci.high
Age Usage: in professional context 101 -0.2 -0.38 0
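
To illustrate what such a correlation row contains, here is a minimal sketch using base R's cor.test() on made-up vectors. It assumes a standard Pearson correlation with a 95% confidence interval. Whether tab_metrics() calculates its values in exactly this way is an assumption, and the vectors age and usage are hypothetical.

```r
# Hypothetical example data (not taken from the chatgpt dataset)
age   <- c(18, 25, 31, 38, 44, 52, 60, 68)
usage <- c(5, 4, 4, 3, 3, 2, 2, 1)

# Pearson correlation with a 95% confidence interval
res <- cor.test(age, usage, method = "pearson", conf.level = 0.95)

# Collect the values in one row, similar to the table above
data.frame(
  n       = length(age),
  r       = round(unname(res$estimate), 2),
  ci.low  = round(res$conf.int[1], 2),
  ci.high = round(res$conf.int[2], 2)
)
```

The confidence interval always encloses the estimate. If it also includes 0, as in the table above, the direction of the relationship is uncertain.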

Each table function has a corresponding plot function with parameters to pimp the result. See the function help (F1 key) to learn the options.

For example, you can use the prop parameter to scale the bars to 100%. The numbers parameter prints the percentages onto the bars.

ds |> 
  filter(sd_gender != "diverse") |> 
  plot_counts(adopter, sd_gender, prop="rows", numbers="p")

Further, the effect-functions implement statistical tests:

ds |> 
  filter(sd_gender != "diverse") |> 
  effect_counts(adopter, sd_gender)
Statistic Value
Number of cases 100
Cramer’s V 0.05
Degrees of freedom
Chi-squared 7.87
p value 0.030
stars *
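
For readers unfamiliar with these statistics, the following base R sketch shows how a chi-squared test and Cramér's V can be obtained from a contingency table. The counts are made up, and the textbook formula for Cramér's V is used. How effect_counts() calculates its values internally (for example, whether p values are simulated) is not covered here and may differ.

```r
# Hypothetical 2 x 3 contingency table (made-up counts)
counts <- matrix(
  c(30, 20, 10,
    15, 25, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("a", "b"), answer = c("low", "mid", "high"))
)

# Chi-squared test of independence
test <- chisq.test(counts)

# Cramér's V by the textbook formula:
# sqrt(chi-squared / (n * (min(rows, cols) - 1)))
n <- sum(counts)
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(counts)) - 1)))

round(c(
  chi2 = unname(test$statistic),
  df   = unname(test$parameter),
  p    = test$p.value,
  V    = cramers_v
), 3)
```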

Automatically generate reports

Step by step

Reports combine plots, tables and effect calculations. Optionally, for item batteries, an index is calculated and reported.

To see an example or to develop your own reports, use the volker report template in RStudio:

  • Create a new R Markdown document from the main menu
  • In the popup select the “From Template” option
  • Select the volker template.
  • The template contains a working example. Just click knit to see the result.

Have fun developing your own reports!

Alternatively, manually add volker::html_report to the output options of your Markdown document:

---
title: "How to create reports?"
output: 
  volker::html_report
---

Then, you can generate combined outputs using the report-functions. One advantage of the report-functions is that plots are automatically scaled to fit the page.

The report functions are the main entry point for reports. See the function help (F1 key) for further options.

ds %>% 
  filter(sd_gender != "diverse") %>% 
  report_metrics(starts_with("cg_adoption_"), sd_gender, index=TRUE, box=TRUE, ci=TRUE)

Expectations

Plot

5 missing case(s) omitted.

Table
Expectations total female male
ChatGPT has clear advantages compared to similar offerings. 3.4 (1.0) 3.6 (1.0) 3.3 (1.0)
Using ChatGPT brings financial benefits. 2.7 (1.2) 2.6 (1.2) 2.7 (1.2)
Using ChatGPT is advantageous in many tasks. 3.6 (1.1) 3.7 (1.0) 3.5 (1.1)
Compared to other systems, using ChatGPT is more fun. 3.5 (1.0) 3.6 (1.0) 3.5 (1.0)
Much can go wrong when using ChatGPT. 3.1 (1.1) 3.1 (1.0) 3.1 (1.2)
There are legal issues with using ChatGPT. 3.1 (1.2) 3.0 (1.0) 3.1 (1.3)
The security of user data is not guaranteed with ChatGPT. 3.2 (1.0) 3.0 (1.0) 3.3 (1.1)
Using ChatGPT could bring personal disadvantages. 2.7 (1.1) 2.5 (0.9) 2.8 (1.2)
In my environment, using ChatGPT is standard. 2.5 (1.1) 2.5 (0.9) 2.5 (1.3)
Almost everyone in my environment uses ChatGPT. 2.4 (1.2) 2.4 (1.0) 2.3 (1.3)
Not using ChatGPT is considered being an outsider. 2.0 (1.2) 1.8 (1.0) 2.1 (1.3)
Using ChatGPT brings me recognition from my environment. 2.3 (1.2) 2.4 (1.2) 2.3 (1.3)

5 missing case(s) omitted.

Index: Plot

5 missing case(s) omitted.

Index: Table
Gender min q1 median q3 max mean sd ci.low ci.high n items alpha
female 2 2.4 2.9 3.1 3.8 2.9 0.5 2.7 3.0 37 12 0.81
male 1 2.5 2.8 3.2 5.0 2.9 0.7 2.7 3.1 59 12 0.81
total 1 2.4 2.8 3.2 5.0 2.9 0.6 2.7 3.0 96 12 0.81

5 missing case(s) omitted.

Custom content

By default, a header and tabsheets are automatically created. You can mix in custom content.

  • If you want to add content before the report outputs, set the title parameter to FALSE and add your own title.
  • A good place for methodological details is a custom tabsheet next to the “Plot” and the “Table” buttons. You can add a tab by setting the close-parameter to FALSE and adding a new header on the fifth level (5 x # followed by the tab name). Close your custom new tabsheet with #### {-} (4 x #).

Altogether, the following pattern generates the report output below:

#> ### Adoption types
#> 
#> ```{r echo=FALSE}
#> ds %>% 
#>   filter(sd_gender != "diverse") %>% 
#>   report_counts(adopter, sd_gender, prop="rows", title=FALSE, close=FALSE, box=TRUE, ci=TRUE)
#> ```
#>
#> ##### Method
#> Basis: Only male and female respondents.
#> 
#> #### {-}

Adoption types

Plot

Table
Innovator type total female male
I try new offers immediately 100% (14) 14% (2) 86% (12)
I try new offers rather quickly 100% (63) 40% (25) 60% (38)
I wait until offers establish themselves 100% (22) 59% (13) 41% (9)
I only use new offers when I have no other choice 100% (1) 0% (0) 100% (1)
total 100% (100) 40% (40) 60% (60)

Method

Basis: Only male and female respondents.

Customizing outputs

Plot and table functions share a number of parameters that can be used to customize the outputs. Look up the available parameters in the help of the specific function.

The theme_vlkr()-function lets you customise colors:

theme_set(theme_vlkr(
  base_fill = c("#F0983A","#3ABEF0","#95EF39","#E35FF5","#7A9B59"),
  base_gradient = c("#FAE2C4","#F0983A")
))

Custom labels: Where do they come from?

Labels used in plots and tables are stored in the comment attribute of the variable. You can inspect all labels using the codebook()-function:

codebook(ds)
# A tibble: 94 × 6
   item_name     item_group item_class item_label         value_name value_label
   <chr>         <chr>      <chr>      <chr>              <chr>      <chr>      
 1 case          case       <NA>       case               <NA>       <NA>       
 2 sd_age        sd         <NA>       Age                <NA>       <NA>       
 3 cg_activities cg         <NA>       Activities with C… <NA>       <NA>       
 4 use_private   use        <NA>       Usage: in private… 1          never      
 5 use_private   use        <NA>       Usage: in private… 2          rarely     
 6 use_private   use        <NA>       Usage: in private… 3          several ti…
 7 use_private   use        <NA>       Usage: in private… 4          several ti…
 8 use_private   use        <NA>       Usage: in private… 5          almost dai…
 9 use_work      use        <NA>       Usage: in profess… 1          never      
10 use_work      use        <NA>       Usage: in profess… 2          rarely     
# ℹ 84 more rows
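
Since the labels live in the comment attribute, they can also be read and written with base R. The following sketch uses a made-up data frame. The column name sd_age mirrors the dataset above, but the values are hypothetical.

```r
# Hypothetical data frame with one unlabelled column
df <- data.frame(sd_age = c(23, 35, 47))

# Store a label in the comment attribute of the column
attr(df$sd_age, "comment") <- "Age"

# Read the label back
attr(df$sd_age, "comment")
```

Note that R does not show the comment attribute when printing a column, so inspect it explicitly as in the last line.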

You can set custom or new labels with labs_apply() by providing a tibble with item names in the first column and item labels in the second column.

newlabels <- tribble(
  ~item_name, ~item_label,
  "cg_adoption_advantage_01", "Allgemeine Vorteile",
  "cg_adoption_advantage_02", "Finanzielle Vorteile",
  "cg_adoption_advantage_03", "Vorteile bei der Arbeit",
  "cg_adoption_advantage_04", "Macht mehr Spaß"
)

ds %>%
  labs_apply(newlabels) %>%
  tab_metrics_items(starts_with("cg_adoption_advantage_"))
Item min q1 median q3 max mean sd n
Allgemeine Vorteile 1 3 4 4 5 3.5 1.0 99
Finanzielle Vorteile 1 2 3 4 5 2.7 1.2 99
Vorteile bei der Arbeit 1 3 4 4 5 3.6 1.1 99
Macht mehr Spaß 1 3 4 4 5 3.5 1.0 99

2 missing case(s) omitted.

Alternatively, save the result of codebook(ds) to an Excel file, change the labels and then call labs_apply() with your new codebook.

Index calculation for item batteries

You can calculate mean indexes from a bunch of items using idx_add(). A new column is created with the average value of all selected columns for each case.

Reliability and number of items are calculated with psych::alpha() and stored as a column attribute named “psych.alpha”. The reliability values are printed by tab_metrics().
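
To make the idea of a mean index concrete, here is a minimal base R sketch on made-up data. It is not the implementation of idx_add(): the item names and values are hypothetical, missing values are not handled, and Cronbach's alpha is computed with the textbook formula instead of psych::alpha(), so results may differ in detail.

```r
# Hypothetical answers of five cases to three items (scale 1 to 5)
items <- data.frame(
  item_01 = c(4, 3, 5, 2, 4),
  item_02 = c(5, 3, 4, 2, 3),
  item_03 = c(4, 2, 5, 1, 4)
)

# Keep the raw item values separate from the index column added below
vals <- items[, c("item_01", "item_02", "item_03")]

# Mean index: average of all selected columns for each case
items$idx <- rowMeans(vals)

# Cronbach's alpha by the textbook formula:
# k / (k - 1) * (1 - sum of item variances / variance of the sum score)
k     <- ncol(vals)
alpha <- k / (k - 1) * (1 - sum(apply(vals, 2, var)) / var(rowSums(vals)))

items$idx
round(alpha, 2)
```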

Add a single index

ds %>%
  idx_add(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption)
Index: cg_adoption value
min 1
q1 2.4
median 2.8
q3 3.2
max 5
mean 2.9
sd 0.6
n 97
items 12
alpha 0.81

5 missing case(s) omitted.

Compare the index values by group

ds %>%
  idx_add(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption, adopter)
Innovator type min q1 median q3 max mean sd n items alpha
I try new offers immediately 1.5 3.2 3.3 4.1 5.0 3.5 0.9 15 12 0.81
I try new offers rather quickly 1.8 2.5 2.8 3.1 3.8 2.8 0.5 61 12 0.81
I wait until offers establish themselves 1.0 2.4 2.7 3.0 3.8 2.7 0.6 20 12 0.81
I only use new offers when I have no other choice 2.4 2.4 2.4 2.4 2.4 2.4 1 12 0.81
total 1.0 2.4 2.8 3.2 5.0 2.9 0.6 97 12 0.81

5 missing case(s) omitted.

Add multiple indices and summarize them

ds %>%
  idx_add(starts_with("cg_adoption_")) %>%
  idx_add(starts_with("cg_adoption_advantage")) %>%
  idx_add(starts_with("cg_adoption_fearofuse")) %>%
  idx_add(starts_with("cg_adoption_social")) %>%
  tab_metrics(starts_with("idx_cg_adoption"))
Item min q1 median q3 max mean sd n items alpha
Index: cg_adoption 1 2.4 2.8 3.2 5 2.9 0.6 97 12 0.81
Index: cg_adoption_advantage_0 1 3.0 3.5 3.8 5 3.3 0.9 97 4 0.8
Index: cg_adoption_fearofuse_0 1 2.5 3.0 3.5 5 3.0 0.8 97 4 0.7
Index: cg_adoption_social_0 1 1.5 2.0 3.0 5 2.3 1.0 97 4 0.84

5 missing case(s) omitted.

What’s behind the scenes?

The volker package is based on standard methods for data handling and visualisation. You can produce all outputs with a handful of functions. The package just keeps your code DRY (don’t repeat yourself) and wraps frequently used snippets into a simple interface.

Basically, all table values are calculated by two tidyverse functions:

To shape the data frames, two essential functions come into play:

Plots are generated by ggplot().
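
As a rough illustration of the table calculations mentioned above, the following sketch builds a counts table and a grouped metrics table with plain dplyr on made-up data. Which functions the package calls internally is not shown here. The use of count(), group_by() and summarise() is an assumption chosen for illustration, and the data frame df is hypothetical.

```r
library(dplyr)

# Hypothetical example data
df <- tibble(
  gender = c("female", "male", "male", "female", "male"),
  age    = c(25, 41, 38, 52, 33)
)

# Counts and shares, similar in spirit to a counts table
counts <- df %>%
  count(gender) %>%
  mutate(p = n / sum(n))

# Grouped descriptive statistics, similar in spirit to a metrics table
metrics <- df %>%
  group_by(gender) %>%
  summarise(mean = mean(age), sd = sd(age), n = n())

counts
metrics
```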

The package provides print- and knit-functions that pimp console and markdown output. To make this work, the cleaned data, produced plots, tables and markdown snippets gain new classes (vlkr_df, vlkr_plt, vlkr_tbl, vlkr_list, vlkr_rprt).