How to use the volkeR package?

First, load the package, set the plot theme and get some data.

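# Load the package, plus dplyr (pipes, filter) and ggplot2 (theme_set),
# which the examples below rely on
library(dplyr)
library(ggplot2)
library(volker)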

# Set the basic plot theme
theme_set(theme_vlkr())

# Load an example dataset ds from the package
ds <- volker::chatgpt

How to generate tables and plots?

Decide whether your data is categorical or metric and choose the appropriate function:

tab_counts() shows frequency tables.
tab_metrics() creates tables with distribution parameters.
plot_counts() generates simple and stacked bar charts.
plot_metrics() visualises distributions in density plots, box plots or scatter plots.
effect_counts() calculates test statistics for categorical data.
effect_metrics() calculates test statistics for metric data.

The column selection determines whether to analyse single variables, item lists or to compare and correlate multiple variables.
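The selection syntax in the examples that follow (a bare column name, c(...) and starts_with()) looks like the tidyselect syntax known from dplyr. As a minimal sketch under that assumption, here is what each selection style resolves to on a hypothetical toy data frame (column names borrowed from the dataset, values invented):

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration
toy <- tibble(
  use_private = c(1, 2, 3),
  use_work    = c(2, 2, 5),
  sd_age      = c(25, 40, 61)
)

# A single column, an explicit list, and a pattern-based selection
single  <- names(select(toy, use_private))
listed  <- names(select(toy, c(use_private, use_work)))
pattern <- names(select(toy, starts_with("use_")))

print(pattern)
```

The explicit list and the pattern pick the same two columns here, which is why the corresponding tables below are identical.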

Try it out!

Categorical variables

# A single variable
tab_counts(ds, use_private)

Usage: in private context	n	p
never	12	12%
rarely	40	40%
several times a month	30	30%
several times a week	15	15%
almost daily	4	4%
total	101	100%

# A list of variables
tab_counts(ds, c(use_private, use_work))

Usage	never	rarely	several times a month	several times a week	almost daily	total
in private context	12% (12)	40% (40)	30% (30)	15% (15)	4% (4)	100% (101)
in professional context	38% (38)	21% (21)	15% (15)	17% (17)	10% (10)	100% (101)

# Variables matched by a pattern
tab_counts(ds, starts_with("use_"))

Usage	never	rarely	several times a month	several times a week	almost daily	total
in private context	12% (12)	40% (40)	30% (30)	15% (15)	4% (4)	100% (101)
in professional context	38% (38)	21% (21)	15% (15)	17% (17)	10% (10)	100% (101)

Metric variables

# One metric variable
tab_metrics(ds, sd_age)

Age	value
min	18
q1	27
median	38
q3	52
max	68
mean	39.7
sd	13.8
n	101

# Multiple metric items
tab_metrics(ds, starts_with("cg_adoption_"))

Expectations	min	q1	median	q3	max	mean	sd	n
ChatGPT has clear advantages compared to similar offerings.	1	3	4	4	5	3.4	1.0	97
Using ChatGPT brings financial benefits.	1	2	3	4	5	2.7	1.2	97
Using ChatGPT is advantageous in many tasks.	1	3	4	4	5	3.6	1.1	97
Compared to other systems, using ChatGPT is more fun.	1	3	4	4	5	3.5	1.0	97
Much can go wrong when using ChatGPT.	1	2	3	4	5	3.1	1.1	97
There are legal issues with using ChatGPT.	1	2	3	4	5	3.1	1.2	97
The security of user data is not guaranteed with ChatGPT.	1	3	3	4	5	3.2	1.0	97
Using ChatGPT could bring personal disadvantages.	1	2	3	3	5	2.7	1.1	97
In my environment, using ChatGPT is standard.	1	2	2	3	5	2.5	1.1	97
Almost everyone in my environment uses ChatGPT.	1	1	2	3	5	2.4	1.2	97
Not using ChatGPT is considered being an outsider.	1	1	2	3	5	2.0	1.2	97
Using ChatGPT brings me recognition from my environment.	1	1	2	3	5	2.3	1.2	97

5 missing case(s) ommited.

Cross tabulation and group comparison

Provide a grouping column in the third parameter to compare different groups.

tab_counts(ds, adopter, sd_gender)

Innovator type	total	female	male	diverse
I try new offers immediately	15% (15)	2% (2)	12% (12)	1% (1)
I try new offers rather quickly	62% (63)	25% (25)	38% (38)	0% (0)
I wait until offers establish themselves	22% (22)	13% (13)	9% (9)	0% (0)
I only use new offers when I have no other choice	1% (1)	0% (0)	1% (1)	0% (0)
total	100% (101)	40% (40)	59% (60)	1% (1)
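In the table above, each cell is reported as a share of all cases, while the prop = "rows" option used further down reports shares within each row. The difference can be sketched in base R with prop.table(); the counts here are invented, and this is only an illustration of the two percentage bases, not the package's internal code:

```r
# Hypothetical counts, invented for illustration
tab <- matrix(
  c( 2, 12,
    25, 38),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    type   = c("immediately", "rather quickly"),
    gender = c("female", "male")
  )
)

# Share of each cell in the grand total (all cells sum to 1)
p_total <- prop.table(tab)

# Share of each cell within its row (each row sums to 1)
p_rows <- prop.table(tab, margin = 1)

print(round(p_rows, 2))
```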

For metric variables, you can compare the mean values.

# Compare the means of one grouping variable  (including the confidence interval)
tab_metrics(ds, sd_age, sd_gender, ci = TRUE)

Gender	min	q1	median	q3	max	mean	sd	ci.low	ci.high	n
female	18	25.8	38.0	44.2	63	37.5	13.4	33.2	41.8	40
male	19	32.5	38.5	52.0	68	41.2	14.0	37.6	44.8	60
diverse	33	33.0	33.0	33.0	33	33.0				1
total	18	27.0	38.0	52.0	68	39.7	13.8	37.0	42.4	101
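The printed limits are consistent with a conventional t-based 95% interval around the mean (for the female group: 37.5 ± 2.02 × 13.4 / √40 gives roughly 33.2 to 41.8), although the method is not stated here. A minimal base R sketch of that calculation, using invented ages:

```r
# Hypothetical ages, invented for illustration
age <- c(18, 25, 31, 38, 44, 52, 63)

n  <- length(age)
m  <- mean(age)
se <- sd(age) / sqrt(n)

# Two-sided 95% interval based on the t distribution with n - 1 degrees of freedom
margin  <- qt(0.975, df = n - 1) * se
ci_low  <- m - margin
ci_high <- m + margin

print(c(mean = m, ci.low = ci_low, ci.high = ci_high))
```

With a single case, as in the "diverse" row, neither a standard deviation nor an interval can be computed, which explains the empty cells.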

By default, the crossing variable is treated as categorical. Set the metric parameter to TRUE to treat it as metric and calculate correlations instead:

# Correlate two metric variables
tab_metrics(ds, sd_age, use_work, metric = TRUE, ci = TRUE)

Item 1	Item 2	n	Pearson’s r	ci.low	ci.high
Age	Usage: in professional context	101	-0.2	-0.38	0
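For comparison, Pearson's r and its confidence interval can be obtained in base R with cor.test(). This is a sketch on invented data and does not claim to reproduce how the package computes the values above:

```r
# Hypothetical data, invented for illustration
age   <- c(18, 25, 31, 38, 44, 52, 63, 68)
usage <- c(5, 4, 4, 3, 3, 2, 2, 1)

res <- cor.test(age, usage, method = "pearson")

r  <- unname(res$estimate)       # Pearson's r
ci <- as.numeric(res$conf.int)   # 95% confidence interval

print(round(c(r = r, ci.low = ci[1], ci.high = ci[2]), 2))
```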

Each table function has a corresponding plot function with parameters to customise the result. See the function help (F1 key) to learn the options.

For example, you can use the prop parameter to grow bars to 100%. The numbers parameter prints the percentages onto the bars.

ds |> 
  filter(sd_gender != "diverse") |> 
  plot_counts(adopter, sd_gender, prop="rows", numbers="p")

Further, the effect functions implement statistical tests:

ds |> 
  filter(sd_gender != "diverse") |> 
  effect_counts(adopter, sd_gender)

Statistic	Value
Number of cases	100
Cramer’s V	0.05
Degrees of freedom
Chi-squared	7.87
p value	0.030
stars	*
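The statistics in this table can be related to the textbook definitions: a chi-squared test on the cross table, and Cramér's V as sqrt(chi² / (n × (k − 1))) with k the smaller table dimension. The following base R sketch uses invented counts and is an assumption about the general technique, not a reproduction of the package's implementation:

```r
# Hypothetical cross table, counts invented for illustration
tab <- matrix(
  c(10, 20,
    30, 25,
    20, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    type   = c("a", "b", "c"),
    gender = c("female", "male")
  )
)

# Pearson's chi-squared test without continuity correction
res  <- chisq.test(tab, correct = FALSE)
chi2 <- unname(res$statistic)
dof  <- unname(res$parameter)

# Cramér's V: chi-squared scaled by sample size and the smaller table dimension
n <- sum(tab)
k <- min(dim(tab))
v <- sqrt(chi2 / (n * (k - 1)))

print(round(c(chi2 = chi2, df = dof, p = res$p.value, v = v), 3))
```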

Automatically generate reports

Step by step

Reports combine plots, tables and effect calculations. Optionally, for item batteries, an index is calculated and reported.

To see an example or develop your own reports, use the volker report template in RStudio:

Create a new R Markdown document from the main menu
In the popup select the “From Template” option
Select the volker template.
The template contains a working example. Just click knit to see the result.

Have fun developing your own reports!

Alternatively, manually add volker::html_report to the output options of your Markdown document:

---
title: "How to create reports?"
output: 
  volker::html_report
---

Then you can generate combined outputs using the report functions, which are the main entry point for reports. One advantage of the report functions is that plots are automatically scaled to fit the page. See the function help for further options (F1 key).

ds %>% 
  filter(sd_gender != "diverse") %>% 
  report_metrics(starts_with("cg_adoption_"), sd_gender, index=TRUE, box=TRUE, ci=TRUE)

Expectations

Plot

5 missing case(s) ommited.

Table

Expectations	total	female	male
ChatGPT has clear advantages compared to similar offerings.	3.4 (1.0)	3.6 (1.0)	3.3 (1.0)
Using ChatGPT brings financial benefits.	2.7 (1.2)	2.6 (1.2)	2.7 (1.2)
Using ChatGPT is advantageous in many tasks.	3.6 (1.1)	3.7 (1.0)	3.5 (1.1)
Compared to other systems, using ChatGPT is more fun.	3.5 (1.0)	3.6 (1.0)	3.5 (1.0)
Much can go wrong when using ChatGPT.	3.1 (1.1)	3.1 (1.0)	3.1 (1.2)
There are legal issues with using ChatGPT.	3.1 (1.2)	3.0 (1.0)	3.1 (1.3)
The security of user data is not guaranteed with ChatGPT.	3.2 (1.0)	3.0 (1.0)	3.3 (1.1)
Using ChatGPT could bring personal disadvantages.	2.7 (1.1)	2.5 (0.9)	2.8 (1.2)
In my environment, using ChatGPT is standard.	2.5 (1.1)	2.5 (0.9)	2.5 (1.3)
Almost everyone in my environment uses ChatGPT.	2.4 (1.2)	2.4 (1.0)	2.3 (1.3)
Not using ChatGPT is considered being an outsider.	2.0 (1.2)	1.8 (1.0)	2.1 (1.3)
Using ChatGPT brings me recognition from my environment.	2.3 (1.2)	2.4 (1.2)	2.3 (1.3)

5 missing case(s) ommited.

Index: Plot

5 missing case(s) ommited.

Index: Table

Gender	min	q1	median	q3	max	mean	sd	ci.low	ci.high	n	items	alpha
female	2	2.4	2.9	3.1	3.8	2.9	0.5	2.7	3.0	37	12	0.81
male	1	2.5	2.8	3.2	5.0	2.9	0.7	2.7	3.1	59	12	0.81
total	1	2.4	2.8	3.2	5.0	2.9	0.6	2.7	3.0	96	12	0.81

5 missing case(s) ommited.

Custom content

By default, a header and tabsheets are automatically created. You can mix in custom content.

If you want to add content before the report outputs, set the title parameter to FALSE and add your own title.
A good place for methodological details is a custom tabsheet next to the “Plot” and the “Table” buttons. You can add a tab by setting the close parameter to FALSE and adding a new fifth-level header (five # characters followed by the tab name). Close your custom tabsheet with #### {-} (four # characters).

Altogether, the following pattern generates the report output shown below it:

#> ### Adoption types
#> 
#> ```{r echo=FALSE}
#> ds %>% 
#>   filter(sd_gender != "diverse") %>% 
#>   report_counts(adopter, sd_gender, prop="rows", title=FALSE, close=FALSE, box=TRUE, ci=TRUE)
#> ```
#>
#> ##### Method
#> Basis: Only male and female respondents.
#> 
#> #### {-}

Adoption types

Plot

Table

Innovator type	total	female	male
I try new offers immediately	100% (14)	14% (2)	86% (12)
I try new offers rather quickly	100% (63)	40% (25)	60% (38)
I wait until offers establish themselves	100% (22)	59% (13)	41% (9)
I only use new offers when I have no other choice	100% (1)	0% (0)	100% (1)
total	100% (100)	40% (40)	60% (60)

Method

Basis: Only male and female respondents.

Customizing outputs

Plot and table functions share a number of parameters that can be used to customise the outputs. Look up the available parameters in the help of the specific function.

The theme_vlkr() function lets you customise colours:

theme_set(theme_vlkr(
  base_fill = c("#F0983A","#3ABEF0","#95EF39","#E35FF5","#7A9B59"),
  base_gradient = c("#FAE2C4","#F0983A")
))

Custom labels: Where do they come from?

Labels used in plots and tables are stored in the comment attribute of the variable. You can inspect all labels using the codebook() function:

codebook(ds)

# A tibble: 94 × 6
   item_name     item_group item_class item_label         value_name value_label
   <chr>         <chr>      <chr>      <chr>              <chr>      <chr>      
 1 case          case       <NA>       case               <NA>       <NA>       
 2 sd_age        sd         <NA>       Age                <NA>       <NA>       
 3 cg_activities cg         <NA>       Activities with C… <NA>       <NA>       
 4 use_private   use        <NA>       Usage: in private… 1          never      
 5 use_private   use        <NA>       Usage: in private… 2          rarely     
 6 use_private   use        <NA>       Usage: in private… 3          several ti…
 7 use_private   use        <NA>       Usage: in private… 4          several ti…
 8 use_private   use        <NA>       Usage: in private… 5          almost dai…
 9 use_work      use        <NA>       Usage: in profess… 1          never      
10 use_work      use        <NA>       Usage: in profess… 2          rarely     
# ℹ 84 more rows
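Since the labels live in the comment attribute, they can also be inspected or set with base R alone. A minimal sketch on a hypothetical vector (the variable and its values are invented; only the attribute name comes from the description above):

```r
# Hypothetical variable, invented for illustration
use_private <- c(1, 2, 2, 5)

# Attach a label via the comment attribute
comment(use_private) <- "Usage: in private context"

# Read the label back; attr() and comment() access the same attribute
label <- attr(use_private, "comment")

print(label)
```

Note that base R does not show the comment attribute when printing the vector itself, so it has to be queried explicitly.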

You can set custom or new labels with labs_apply() by providing a tibble with item names in the first column and item labels in the second column.

newlabels <- tribble(
  ~item_name, ~item_label,
  "cg_adoption_advantage_01", "Allgemeine Vorteile",
  "cg_adoption_advantage_02", "Finanzielle Vorteile",
  "cg_adoption_advantage_03", "Vorteile bei der Arbeit",
  "cg_adoption_advantage_04", "Macht mehr Spaß"
)

ds %>%
  labs_apply(newlabels) %>%
  tab_metrics_items(starts_with("cg_adoption_advantage_"))

Item	min	q1	median	q3	max	mean	sd	n
Allgemeine Vorteile	1	3	4	4	5	3.5	1.0	99
Finanzielle Vorteile	1	2	3	4	5	2.7	1.2	99
Vorteile bei der Arbeit	1	3	4	4	5	3.6	1.1	99
Macht mehr Spaß	1	3	4	4	5	3.5	1.0	99

2 missing case(s) ommited.

Alternatively, save the result of codebook(ds) to an Excel file, change the labels and then call labs_apply() with your new codebook.

Index calculation for item batteries

You can calculate mean indices from a set of items using idx_add(). A new column is created with the average value of all selected columns for each case.

Reliability and number of items are calculated with psych::alpha() and stored as a column attribute named “psych.alpha”. The reliability values are printed by tab_metrics().
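To make the two quantities concrete, here is a base R sketch of a mean index and of raw Cronbach's alpha computed from its textbook formula. It uses an invented item battery and deliberately avoids psych::alpha(), which reports additional statistics; how the package treats missing values is not covered by this sketch.

```r
# Hypothetical item battery: 5 cases, 3 items on a 1-5 scale (invented values)
items <- data.frame(
  item_01 = c(1, 2, 3, 4, 5),
  item_02 = c(2, 2, 3, 5, 4),
  item_03 = c(1, 3, 3, 4, 5)
)

# Mean index: average of all items for each case
idx <- rowMeans(items)

# Raw Cronbach's alpha from the item variances and the variance of the sum score
k         <- ncol(items)
item_var  <- sapply(items, var)
total_var <- var(rowSums(items))
alpha     <- k / (k - 1) * (1 - sum(item_var) / total_var)

print(round(idx, 2))
print(round(alpha, 2))
```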

Add a single index

ds %>%
  idx_add(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption)

Index: cg_adoption	value
min	1
q1	2.4
median	2.8
q3	3.2
max	5
mean	2.9
sd	0.6
n	97
items	12
alpha	0.81

5 missing case(s) ommited.

Compare the index values by group

ds %>%
  idx_add(starts_with("cg_adoption_")) %>%
  tab_metrics(idx_cg_adoption, adopter)

Innovator type	min	q1	median	q3	max	mean	sd	n	items	alpha
I try new offers immediately	1.5	3.2	3.3	4.1	5.0	3.5	0.9	15	12	0.81
I try new offers rather quickly	1.8	2.5	2.8	3.1	3.8	2.8	0.5	61	12	0.81
I wait until offers establish themselves	1.0	2.4	2.7	3.0	3.8	2.7	0.6	20	12	0.81
I only use new offers when I have no other choice	2.4	2.4	2.4	2.4	2.4	2.4		1	12	0.81
total	1.0	2.4	2.8	3.2	5.0	2.9	0.6	97	12	0.81

5 missing case(s) ommited.

Add multiple indices and summarize them

ds %>%
  idx_add(starts_with("cg_adoption_")) %>%
  idx_add(starts_with("cg_adoption_advantage")) %>%
  idx_add(starts_with("cg_adoption_fearofuse")) %>%
  idx_add(starts_with("cg_adoption_social")) %>%
  tab_metrics(starts_with("idx_cg_adoption"))

Item	min	q1	median	q3	max	mean	sd	n	items	alpha
Index: cg_adoption	1	2.4	2.8	3.2	5	2.9	0.6	97	12	0.81
Index: cg_adoption_advantage_0	1	3.0	3.5	3.8	5	3.3	0.9	97	4	0.8
Index: cg_adoption_fearofuse_0	1	2.5	3.0	3.5	5	3.0	0.8	97	4	0.7
Index: cg_adoption_social_0	1	1.5	2.0	3.0	5	2.3	1.0	97	4	0.84

5 missing case(s) ommited.

What’s behind the scenes?

The volker package is based on standard methods for data handling and visualisation. You can produce all outputs with a handful of functions. The package just keeps your code DRY (don’t repeat yourself) and wraps frequently used snippets into a simple interface.

Basically, all table values are calculated by two functions (count() comes from dplyr, skim() from the skimr package):

count() is used to produce counts
skim() is used to produce metrics
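As a rough illustration of the counting step, the following sketch builds a frequency table with count() and adds proportions by hand. The toy data are invented, and the package's actual code will differ in detail:

```r
library(dplyr)

# Hypothetical answers, invented for illustration
toy <- tibble(use_private = c("never", "rarely", "rarely", "never", "rarely"))

# Count each category and add its proportion
freq <- toy |>
  count(use_private) |>
  mutate(p = n / sum(n))

print(freq)
```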

To shape the data frames, two essential functions come into play:

group_by() is used to calculate grouped outputs
pivot_longer() brings multiple items into a format where the item name becomes a grouping variable.
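The reshaping step can be sketched as follows: pivot_longer() stacks the item columns so that the item name becomes a regular column, and group_by() then yields one summary row per item and group. Column names and values are hypothetical, and summarise() with mean() merely stands in for the richer metrics the package reports:

```r
library(dplyr)
library(tidyr)

# Hypothetical item battery with a grouping column (invented values)
toy <- tibble(
  sd_gender = c("female", "male", "female", "male"),
  item_01   = c(4, 3, 5, 2),
  item_02   = c(2, 3, 4, 1)
)

# One row per case and item; the item name becomes a grouping variable
long <- toy |>
  pivot_longer(starts_with("item_"), names_to = "item", values_to = "value")

# Grouped means per item and gender
means <- long |>
  group_by(item, sd_gender) |>
  summarise(mean = mean(value), .groups = "drop")

print(means)
```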

Plots are generated by ggplot().
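For orientation, a stacked bar chart whose bars are stretched to 100% can be built directly in ggplot2 with position = "fill". This is a sketch on invented data showing the general technique; it is an assumption that the package's bar charts are constructed along similar lines:

```r
library(ggplot2)

# Hypothetical cases, invented for illustration
toy <- data.frame(
  adopter   = c("early", "early", "early", "late", "late"),
  sd_gender = c("female", "male", "male", "female", "female")
)

# position = "fill" rescales each bar so that its segments sum to 100%
p <- ggplot(toy, aes(y = adopter, fill = sd_gender)) +
  geom_bar(position = "fill") +
  labs(x = "Share", y = NULL, fill = "Gender")

print(p)
```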

The package provides print and knit functions that polish console and markdown output. To make this work, the cleaned data, produced plots, tables and markdown snippets gain new classes (vlkr_df, vlkr_plt, vlkr_tbl, vlkr_list, vlkr_rprt).
