callback: getting started

We will use the callback package to analyze hiring discrimination based on gender and origin in France. The experiment was conducted in 2009 on job offers for software developers (Petit et al., 2013). The data are available in the data frame inter1. Let’s examine its contents.

library(callback)
data(inter1)
str(inter1)
#> 'data.frame':    2480 obs. of  11 variables:
#>  $ offer    : Factor w/ 310 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
#>  $ firstn   : Factor w/ 8 levels "Abdallah","Amadou",..: 7 4 5 6 1 2 3 8 1 2 ...
#>  $ lastn    : Factor w/ 8 levels "Bertrand","Diallo",..: 5 3 4 7 8 2 1 6 8 2 ...
#>  $ origin   : Factor w/ 4 levels "F","M","S","V": 1 3 2 4 2 3 1 4 2 3 ...
#>  $ sentorder: int  3 7 6 2 1 5 4 8 8 4 ...
#>  $ gender   : Factor w/ 2 levels "Man","Woman": 2 2 2 2 1 1 1 1 1 1 ...
#>  $ callback : logi  TRUE TRUE TRUE TRUE FALSE FALSE ...
#>  $ paris    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ cont     : Factor w/ 2 levels "LTC","STC": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ ansorder : int  1 2 3 4 9 9 9 9 9 9 ...
#>  $ date     : Factor w/ 3 levels "April 2009","February 2009",..: 2 2 2 2 2 2 2 2 2 2 ...

The first variable, offer, is the most important one. It identifies the job offer. To test for discrimination, the candidates must apply to the same job offer. This is the cluster parameter of the callback() function. With cluster="offer", all the computations are paired, which means that we always compare the candidates on the very same job offer. This is essential to produce meaningful results: otherwise, the difference in answers could come from differences between recruiters rather than from differences in gender or origin.

The second set of important variables defines the candidates. Here, there are two variables: gender and origin. These are factors, and their reference levels define a reference candidate. Which one? By convention, it is the candidate least likely to be discriminated against. Here, the reference candidate is a man of French origin: men should not be discriminated against because of their gender, and candidates of French origin should not be discriminated against because of their origin in the French labour market. In practice, we will check that this candidate really had the highest callback rate. We can find the reference levels of our two factors by looking at the first level given by the levels() function.

levels(inter1$gender)
#> [1] "Man"   "Woman"
levels(inter1$origin)
#> [1] "F" "M" "S" "V"

There are two genders: “Man” (reference) and “Woman”. There are four origins: French (F, reference), Moroccan (M), Senegalese (S) and Vietnamese (V). You do not need to combine the two candidate variables gender and origin before using callback(); it will do it for you.
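If the first level of a factor in your own data were not the intended reference, base R's relevel() can move a chosen level to the first position. The sketch below uses a small made-up factor (origin_toy is a hypothetical name, not part of inter1), since the levels of inter1 are already in the desired order.

```r
# Toy factor, for illustration only (not taken from inter1)
origin_toy <- factor(c("M", "F", "S", "V", "F"))

# Levels are sorted alphabetically by default, so "F" comes first
levels(origin_toy)

# Move "M" to the first position to make it the reference level
origin_m <- relevel(origin_toy, ref = "M")
levels(origin_m)
```

The remaining levels keep their original relative order after relevel().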

The last element we need is the outcome of the job application. It is given by the callback variable. It is a Boolean variable, TRUE when the recruiter gives a non-negative callback, and FALSE otherwise.

We can now launch the callback() function, which prepares the data for statistical analysis. Here we need to choose the comp parameter. There are 8 candidates, so that \(8\times 7/2=28\) pairwise comparisons are possible. This is a large number, which is why callback() performs the statistical analysis relative to the reference candidate by default, with comp="ref". This reduces our analysis to 7 comparisons. You can get the 28 comparisons by choosing comp="all" instead.
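The counting argument can be checked in base R without the package. This is only a sketch: the candidate labels are rebuilt by hand from the factor levels shown above, and the names candidates, all_pairs and ref_pairs are introduced here for illustration.

```r
gender <- c("Man", "Woman")
origin <- c("F", "M", "S", "V")

# Cross the two factors to obtain the 8 candidates (Man.F, Woman.F, Man.M, ...)
candidates <- as.vector(outer(gender, origin, paste, sep = "."))

# All unordered pairs of candidates: 8 * 7 / 2
all_pairs <- combn(candidates, 2)
ncol(all_pairs)    # 28

# Comparisons against the reference candidate only
ref_pairs <- paste0(candidates[1], ".vs.", candidates[-1])
length(ref_pairs)  # 7
```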

m <- callback(data = inter1, cluster = "offer", candid = c("gender","origin"), callback = "callback")

The m object contains the formatted data needed for the analysis. Using print() gives the main characteristics of the experiment:

print(m)
#> 
#>  Structure of the experiment 
#>  ---------------------------
#>  
#>  Candidates defined by: gender origin 
#>  Callback variable: callback 
#>  
#>  Number of tests for each candidate:
#> 
#>   Man.F Woman.F   Man.M Woman.M   Man.S Woman.S   Man.V Woman.V 
#>     310     310     310     310     310     310     310     310 
#> 
#>  
#>  Number of tests for each pair of candidates:
#> 
#>  Man.F.vs.Woman.F Man.F.vs.Man.M Man.F.vs.Woman.M Man.F.vs.Man.S
#>               310            310              310            310
#>  Man.F.vs.Woman.S Man.F.vs.Man.V Man.F.vs.Woman.V
#>               310            310              310

We find that the experiment is standard, in the sense that all the candidates applied to all the job offers. Notice that this is not required to use callback().

In order to get the results of the discrimination tests, we will use the stat_count() function. Its output can be saved into an object for further export, or printed. The following instruction:

s <- stat_count(m)

does not produce any printed output, but saves a stat_count object into s. We can get the statistics with:

print(s)
#> 
#>  Callback counts:
#>  ----------------
#>                  tests callback callback1 callback2 Neither Only 1 Only 2 Both
#> Man.F.vs.Woman.F   310      113        86        70     197     43     27   43
#> Man.F.vs.Man.M     310      106        86        65     204     41     20   45
#> Man.F.vs.Woman.M   310       96        86        32     214     64     10   22
#> Man.F.vs.Man.S     310      100        86        43     210     57     14   29
#> Man.F.vs.Woman.S   310       97        86        26     213     71     11   15
#> Man.F.vs.Man.V     310       97        86        38     213     59     11   27
#> Man.F.vs.Woman.V   310      111        86        62     199     49     25   37
#>                  Difference
#> Man.F.vs.Woman.F         16
#> Man.F.vs.Man.M           21
#> Man.F.vs.Woman.M         54
#> Man.F.vs.Man.S           43
#> Man.F.vs.Woman.S         60
#> Man.F.vs.Man.V           48
#> Man.F.vs.Woman.V         24

The callback counts describe the results of the paired experiments. The first column defines the comparison in the form “candidate 1 vs candidate 2”. Here “Man.F vs Woman.F” means that we compare men and women of French origin. Out of \(310\) tests, \(113\) got at least one callback. The men got \(86\) callbacks and the women \(70\). The difference, called net discrimination, equals \(16\) callbacks. The next block gives more detail. Out of \(310\) job offers, \(197\) called back neither candidate, \(43\) called back only the man, \(27\) called back only the woman and \(43\) called back both. Discrimination only occurs when a single candidate is called back. The net discrimination is thus \(43-27=16\).
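The relationships between the columns of the table can be verified by hand. The following sketch copies the "Only 1", "Only 2" and "Both" columns from the printed output and rebuilds the other columns from them; the vector names are chosen here for illustration and are not accessors of the stat_count object.

```r
# Counts copied from the printed table, one entry per comparison
only1 <- c(43, 41, 64, 57, 71, 59, 49)  # only candidate 1 called back
only2 <- c(27, 20, 10, 14, 11, 11, 25)  # only candidate 2 called back
both  <- c(43, 45, 22, 29, 15, 27, 37)  # both candidates called back

callback1 <- only1 + both           # callbacks of candidate 1
callback2 <- only2 + both           # callbacks of candidate 2
at_least_one <- only1 + only2 + both

# Net discrimination: the "Both" tests cancel out of the difference
net <- callback1 - callback2
net                                 # 16 21 54 43 60 48 24
all(net == only1 - only2)           # TRUE
```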

Now we can move on to the analysis of proportions. We can save the output or print it, as in the previous example. Printing is the default. There are two ways to compute proportions in discrimination studies. First, you can divide the number of callbacks by the number of tests. Second, you can restrict your analysis to the tests where only one candidate has been called back, and give each candidate's share of these callbacks. We start with the first convention.

stat_prop(m)
#> 
#>  
#>  Analysis of proportions - fractions of tests:
#>  --------------------------------------------- 
#>  
#>  At least one callback:
#> 
#>                  tests L95_p_callback p_callback U95_p_callback
#> Man.F.vs.Woman.F   310          0.311      0.365          0.421
#> Man.F.vs.Man.M     310          0.289      0.342          0.398
#> Man.F.vs.Woman.M   310          0.259      0.310          0.364
#> Man.F.vs.Man.S     310          0.271      0.323          0.378
#> Man.F.vs.Woman.S   310          0.262      0.313          0.368
#> Man.F.vs.Man.V     310          0.262      0.313          0.368
#> Man.F.vs.Woman.V   310          0.305      0.358          0.414
#> (Clopper-Pearson confidence intervals at the  95 % level)
#> 
#> 
#>  Callback rates of the candidates (cand1 vs cand2): 
#> 
#>                  L95_p_cand1 p_cand1 U95_p_cand1 L95_p_cand2 p_cand2
#> Man.F.vs.Woman.F       0.228   0.277       0.331      0.1805  0.2258
#> Man.F.vs.Man.M         0.228   0.277       0.331      0.1657  0.2097
#> Man.F.vs.Woman.M       0.228   0.277       0.331      0.0717  0.1032
#> Man.F.vs.Man.S         0.228   0.277       0.331      0.1022  0.1387
#> Man.F.vs.Woman.S       0.228   0.277       0.331      0.0555  0.0839
#> Man.F.vs.Man.V         0.228   0.277       0.331      0.0882  0.1226
#> Man.F.vs.Woman.V       0.228   0.277       0.331      0.1569  0.2000
#>                  U95_p_cand2
#> Man.F.vs.Woman.F       0.276
#> Man.F.vs.Man.M         0.259
#> Man.F.vs.Woman.M       0.143
#> Man.F.vs.Man.S         0.182
#> Man.F.vs.Woman.S       0.120
#> Man.F.vs.Man.V         0.164
#> Man.F.vs.Woman.V       0.249
#> (Clopper-Pearson confidence intervals at the  95 % level)
#> 
#> 
#>  Analysis of proportions - fractions of exclusive callbacks:
#>  -----------------------------------------------------------
#> 
#>  Callback rates of the candidates (cand1 vs cand2): 
#> 
#>                  callback L95_r_cand1 r_cand_1 U95_r_cand1 L95_r_cand2 r_cand_2
#> Man.F.vs.Woman.F      113       0.490    0.614       0.728      0.2717    0.386
#> Man.F.vs.Man.M        106       0.540    0.672       0.787      0.2131    0.328
#> Man.F.vs.Woman.M       96       0.765    0.865       0.933      0.0668    0.135
#> Man.F.vs.Man.S        100       0.691    0.803       0.888      0.1122    0.197
#> Man.F.vs.Woman.S       97       0.773    0.866       0.931      0.0689    0.134
#> Man.F.vs.Man.V         97       0.736    0.843       0.919      0.0811    0.157
#> Man.F.vs.Woman.V      111       0.543    0.662       0.768      0.2319    0.338
#>                  U95_r_cand2
#> Man.F.vs.Woman.F       0.510
#> Man.F.vs.Man.M         0.460
#> Man.F.vs.Woman.M       0.235
#> Man.F.vs.Man.S         0.309
#> Man.F.vs.Woman.S       0.227
#> Man.F.vs.Man.V         0.264
#> Man.F.vs.Woman.V       0.457
#> (Clopper-Pearson confidence intervals at the  95 % level)

The overall callback rate measures the tension in the labour market: the more firms need workers, the higher it is. For software developers, we are above \(30\%\), which is a high callback rate. The next block gives the callback rates of each pair of candidates with their confidence intervals at the \(95\%\) level (you can change it with the level option). French origin men were called back in \(27.7\%\) of the 310 tests (\(95\%\) interval: \(22.8\%-33.1\%\)), while French origin women were called back in \(22.6\%\) of the tests (\(95\%\) interval: \(18.1\%-27.6\%\)).
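As a cross-check, the callback rate of French origin men and its interval can be recomputed in base R. This sketch relies on binom.test(), whose confidence interval is the Clopper-Pearson interval; it is one way to reproduce the figures, not necessarily the code path used inside stat_prop().

```r
# French origin men: 86 callbacks out of 310 tests (from the counts table)
res <- binom.test(x = 86, n = 310, conf.level = 0.95)

p_hat <- unname(res$estimate)   # point estimate, 86 / 310
ci <- as.numeric(res$conf.int)  # Clopper-Pearson bounds

# Compare with p_cand1, L95_p_cand1 and U95_p_cand1 in the table above
round(c(lower = ci[1], estimate = p_hat, upper = ci[2]), 3)
```

Changing conf.level here plays the same role as changing the confidence level in the package output.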

Let’s apply the second convention. Out of the 310 tests, \(43+27=70\) were discriminatory (only one of the two candidates had a callback). French origin men got \(43/70=61.4\%\) of these callbacks and French origin women \(1-61.4\%=38.6\%\). The confidence intervals are also provided.
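The shares of exclusive callbacks follow directly from the "Only 1" and "Only 2" counts. The sketch below recomputes them for the French origin pair in base R, again using binom.test() for a Clopper-Pearson interval; the variable names are illustrative.

```r
only_man   <- 43  # offers that called back only the French origin man
only_woman <- 27  # offers that called back only the French origin woman
exclusive  <- only_man + only_woman   # 70 exclusive callbacks

r_man   <- only_man / exclusive       # share of the man
r_woman <- 1 - r_man                  # share of the woman
round(c(r_man, r_woman), 3)           # 0.614 0.386

# Clopper-Pearson interval for the man's share,
# to compare with L95_r_cand1 and U95_r_cand1 in the table above
ci_man <- as.numeric(binom.test(only_man, exclusive)$conf.int)
round(ci_man, 3)
```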

We may then wonder: are these differences statistically significant? We enter:

stat_comp(m)
#> 
#>  
#>  Equality of proportions - fractions of tests:
#>  --------------------------------------------- 
#>                  p_callback1-p_callback2 Student    p-value
#> Man.F vs Woman.F                0.051613  1.9206 5.5697e-02
#> Man.F vs Man.M                  0.067742  2.7163 6.9735e-03
#> Man.F vs Woman.M                0.174194  6.7081 9.4042e-11
#> Man.F vs Man.S                  0.138710  5.3234 1.9619e-07
#> Man.F vs Woman.S                0.193548  7.1401 6.7049e-12
#> Man.F vs Man.V                  0.154839  6.0585 3.9907e-09
#> Man.F vs Woman.V                0.077419  2.8211 5.0960e-03
#> (Asymptotic Student tests)
#> 
#>  
#>  Equality of proportions - fractions of exclusive callbacks:
#>  ----------------------------------------------------------- 
#>                  r_cand1 r_cand1-0.5    p-value
#> Man.F vs Woman.F 0.61429     0.11429 7.2238e-02
#> Man.F vs Man.M   0.67213     0.17213 9.8534e-03
#> Man.F vs Woman.M 0.86486     0.36486 8.9574e-11
#> Man.F vs Man.S   0.80282     0.30282 2.6659e-07
#> Man.F vs Woman.S 0.86585     0.36585 6.8138e-12
#> Man.F vs Man.V   0.84286     0.34286 4.4662e-09
#> Man.F vs Woman.V 0.66216     0.16216 7.0843e-03
#> (Exact binomial tests)

Considering the first convention, we give the Student test of equality of proportions. In our example, we compare \(27.7\%\) with \(22.6\%\). The Student statistic is \(1.92\) and its p-value is \(5.6\%\), so that the difference is not significant at the \(5\%\) level. There would be no significant discrimination between the French origin candidates. However, a quick look at the other p-values reveals that discrimination is significant at the \(5\%\) level when we compare the French origin man to each of the candidates of another origin. This confirms the choice of our reference candidate as the least likely to be discriminated against.
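The first line of the table can be reproduced with a one-sample Student test on the paired differences. This is a sketch under an assumption: each job offer is coded \(+1\) when only the man is called back, \(-1\) when only the woman is, and \(0\) otherwise. The coding is chosen here to illustrate the paired logic and may differ from the internals of stat_comp().

```r
# Paired differences for Man.F vs Woman.F, rebuilt from the counts table:
# 43 offers called only the man, 27 only the woman, 197 + 43 treated both alike
d <- c(rep(1, 43), rep(-1, 27), rep(0, 197 + 43))

length(d)  # 310 job offers
mean(d)    # 16 / 310, the difference in callback rates

# One-sample Student test of a zero mean difference
tt <- t.test(d, mu = 0)
unname(tt$statistic)  # about 1.92
tt$p.value            # about 0.056
```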

Do these conclusions extend to the second convention? Here we compare \(61.4\%\) to \(50\%\) because, in the latter case, both candidates receive the same share of callbacks and there is no discrimination. The p-value of the exact binomial test is \(7.2\%\), so that the difference between the French origin candidates is not significant at the \(5\%\) level. However, discrimination against all the candidates of another origin is again significant at the \(5\%\) level. The existence of discrimination is confirmed.
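The exact binomial test for the French origin pair can be checked in base R. The sketch tests whether the man's share of the 70 exclusive callbacks differs from \(50\%\); it uses binom.test() with its default two-sided alternative, which is assumed here to correspond to the test reported above.

```r
# 43 of the 70 exclusive callbacks went to the French origin man
bt <- binom.test(x = 43, n = 43 + 27, p = 0.5)

unname(bt$estimate)  # share of the man, 43 / 70
bt$p.value           # about 0.072, above the 5% threshold
```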