The German Zweites Juristisches Staatsexamen (Second State Examination in Law) is said to be tough. Let’s look at how hard it really is by visualising the distribution of grades from the Berlin 2017/IV campaign. The written part of the final exam consists of seven handwritten exams of five hours each.

Note that each exam is scored on a scale of 0 to 18 points, where a final score of 8 allows you to become a judge and 10 means outstanding…

1 2017/IV Campaign

1.1 Import the Data

Let’s fetch the data from the official page of the Berlin Senate. You’ll get a PDF, which you have to distill with the tabulizer package (or by hand) in order to get a CSV. I will follow up with a post on using tabulizer from within RStudio soon.

library(tidyverse)
# Data source: https://www.berlin.de/sen/justiz/juristenausbildung/juristische-pruefungen/artikel.264039.php
data_path <- here::here("static", "data", "/")
noten_raw <- read_csv2(str_c(data_path, "noten_201704.csv")) # CSV obtained via PDF -> tabulizer
head(noten_raw) %>% knitr::kable("html", 2)
AZ Z_I Z_II S_I S_II OR_I OR_II WPF Dur
1014/17 8 10 11 15 10 14 9 11.00
1039/17 8 11 13 7 12 11 13 10.71
1055/17 11 11 10 16 6 8 13 10.71
1097/17 15 6 7 16 6 8 13 10.14
0983/17 10 7 8 11 13 10 11 10.00
1008/17 6 14 9 10 11 7 13 10.00
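
Judging from these six rows, Dur looks like the plain average of the seven written exams. Here is a quick sanity check in base R, with the rows retyped by hand into a hypothetical data frame called noten_head (with the real data you would use noten_raw directly):

```r
# First six rows of the 2017/IV table, retyped by hand from the output above
noten_head <- data.frame(
  AZ    = c("1014/17", "1039/17", "1055/17", "1097/17", "0983/17", "1008/17"),
  Z_I   = c(8, 8, 11, 15, 10, 6),
  Z_II  = c(10, 11, 11, 6, 7, 14),
  S_I   = c(11, 13, 10, 7, 8, 9),
  S_II  = c(15, 7, 16, 16, 11, 10),
  OR_I  = c(10, 12, 6, 6, 13, 11),
  OR_II = c(14, 11, 8, 8, 10, 7),
  WPF   = c(9, 13, 13, 13, 11, 13),
  Dur   = c(11.00, 10.71, 10.71, 10.14, 10.00, 10.00)
)

# Average of the seven written exams per candidate
exam_cols <- c("Z_I", "Z_II", "S_I", "S_II", "OR_I", "OR_II", "WPF")
dur_check <- rowMeans(noten_head[, exam_cols])

# Largest deviation between the recomputed mean and the published Dur
max_diff <- max(abs(dur_check - noten_head$Dur))
max_diff
```

For these rows the deviation stays below 0.01, which is consistent with Dur being the mean of the seven exams reported with two decimals.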

1.2 Skim / Preview the Data

The skimr package, among others, is great for quickly inspecting what kind of data (variables, data types, NAs etc.) you have.

skimr::skim_to_wide(noten_raw) %>% knitr::kable("html", 2)
type variable missing complete n min max empty n_unique mean sd p0 p25 p50 p75 p100 hist
character AZ 0 263 263 7 7 0 263 NA NA NA NA NA NA NA NA
numeric Dur 0 263 263 NA NA NA NA 5.83 2.01 1.42 4.28 5.57 7.28 11 ▁▅▇▆▆▅▂▁
numeric OR_I 0 263 263 NA NA NA NA 5.92 2.77 2 4 5 8 14 ▆▇▃▅▂▂▁▁
numeric OR_II 0 263 263 NA NA NA NA 5.57 2.57 1 4 5 7 14 ▃▇▅▇▅▁▁▁
numeric S_I 0 263 263 NA NA NA NA 5.9 2.91 1 3 6 8 15 ▂▇▆▅▂▃▁▁
numeric S_II 0 263 263 NA NA NA NA 5.58 2.95 0 3 5 7 16 ▃▇▆▅▂▁▁▁
numeric WPF 0 263 263 NA NA NA NA 5.95 2.77 1 4 6 8 14 ▂▇▃▆▅▁▁▁
numeric Z_I 0 263 263 NA NA NA NA 6.18 2.87 1 4 6 8 15 ▃▆▇▆▂▂▁▁
numeric Z_II 0 263 263 NA NA NA NA 5.72 2.77 1 4 5 7 14 ▃▇▅▆▃▁▂▁

(In RStudio / R Markdown the hist column is rendered properly. You get a nice histogram per (numeric) variable. There seems to be an issue with Knitr & UTF-8 encoding on MS Windows systems.)

(Screenshot of skimr from RStudio)

1.3 Wide -> Long with gather()

Now we need to tidy the data. We first drop the AZ column (student ID) and then “pivot” all exam subjects, plus the overall result Dur, into a single column (= variable) named “Fach” (German for “subject”). The scores end up in a second column named “Punkte” (points).

median_2017 <- median(noten_raw$Dur, na.rm = TRUE)
noten_long <- noten_raw %>%
  select(-AZ) %>% 
  # gather all remaining columns: column names go to "Fach", scores to "Punkte"
  gather(key = "Fach", value = "Punkte")

head(noten_long) %>% knitr::kable("html", 2)
Fach Punkte
Z_I 8
Z_I 8
Z_I 11
Z_I 15
Z_I 10
Z_I 6
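
gather() still works, but the tidyr documentation now recommends pivot_longer() (available from tidyr 1.0.0) for this kind of reshaping. Below is a minimal sketch on a hypothetical two-row excerpt called noten_wide, assuming a recent tidyr is installed. Note that pivot_longer() returns the rows candidate by candidate, whereas gather() stacks column by column, so the row order differs from the preview above.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0

# Hypothetical excerpt: one row per candidate, one column per result
noten_wide <- data.frame(
  Z_I  = c(8, 8),
  Z_II = c(10, 11),
  Dur  = c(11.00, 10.71)
)

# Wide -> long: column names go to "Fach", values go to "Punkte"
noten_long_demo <- pivot_longer(
  noten_wide,
  cols      = everything(),
  names_to  = "Fach",
  values_to = "Punkte"
)

noten_long_demo
```

The result has one row per candidate and subject, which is the shape ggplot2 expects when a variable is mapped to an axis or to fill.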

1.4 2017/IV Joyplot

1.4.1 ggridges-Pkg + colors

We load the ggridges package and the beautiful viridis color palette:

library(ggridges)
library(viridis)

1.4.2 Labels

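# Labels (kept in German, as they appear in the plots). The subtitle reads:
# "Grade distribution, campaign 4/17; 'Dur' = average exam grade; lines: red = passed
# (sufficient), blue = fully satisfactory, black = median of the overall grade"
title_a <- "2. Juristisches Staatsexamen, GJPA Berlin/Brandenburg"
subtitle_a <- paste0("Notenverteilung Kampagne 4/17; n = ", nrow(noten_raw),
                     "; \"Dur\" = durchschnittl. Examensnote\nLinien: rot  = \"bestanden (ausreichend)\", blau  = \"vollbefriedigend\",\nschwarz  = Median Gesamtnote (", median_2017, ")")
caption_a <- "@fubits; Daten: GJPA 2018"
# Plot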
noten_long %>% 
  ggplot() +
  geom_density_ridges(aes(x = Punkte, y = Fach, fill = Fach),
                      rel_min_height = 0.025,
                      scale = 1.75) +
  # Line: "vollbefriedigend" (fully satisfactory)
  geom_vline(xintercept = 10, color = "blue", linetype = 4, size = 1) + 
  # Line: passed
  geom_vline(xintercept = 4, color = "red", linetype = 4, size = 1) +
  # Line: median of the overall grade
  geom_vline(xintercept = median_2017, color = "black", size = 1) +
  labs(title = title_a, subtitle = subtitle_a, caption = caption_a) +
  scale_x_continuous(breaks = 0:18, limits = c(0, 18)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(option = "D", name = "Frequency n",
                     direction = -1, discrete = TRUE) +
  # theme(legend.position = "none")
  theme_minimal() +
  # hide the fill legend ("none" replaces the deprecated FALSE)
  guides(fill = "none")

1.5 2017/IV Boxplot

(Dur = overall result / final grade)

noten_long %>% 
  ggplot() +
  geom_boxplot(aes(x = Fach, y = Punkte, fill = Fach)) +
  scale_y_continuous(breaks = 0:18, limits = c(0, 18)) +
  scale_fill_viridis(
    option = "C",
    direction = -1, discrete = TRUE
  ) +
  labs(title = title_a, subtitle = subtitle_a, caption = caption_a) +
  # theme(legend.position = "none")
  theme_minimal() +
  # hide the fill legend ("none" replaces the deprecated FALSE)
  guides(fill = "none") +
  # Reference lines: blue = "vollbefriedigend", red = passed, black = median
  geom_hline(yintercept = 10, color = "blue", linetype = 4, size = 1) +
  geom_hline(yintercept = 4, color = "red", linetype = 4, size = 1) +
  geom_hline(yintercept = median_2017, color = "black", size = 1)

2 Update: 2018/I Campaign

Grades from the 2018/I campaign have just been released. Let’s plot them for comparison:

# Data source: https://www.berlin.de/sen/justiz/juristenausbildung/juristische-pruefungen/artikel.264039.php
noten_raw_2018 <- read_csv2(str_c(data_path, "noten_201801.csv")) # CSV obtained via PDF -> tabulizer
head(noten_raw_2018) %>% knitr::kable("html", 2)
AZ Z_I Z_II S_I S_II ÖR_I ÖR_II WPF Dur
0874/17 7 7 7 4 5 6 10 6.57
0959/17 3 6 2 4 2 4 2 3.28
0968/17 1 3 4 2 3 8 6 3.85
1001/17 3 3 4 2 2 2 2 2.57
1012/17 3 4 2 2 3 2 5 3.00
1058/17 6 10 6 5 7 5 7 6.57

skimr::skim_to_wide(noten_raw_2018) %>% knitr::kable("html", 2)
type variable missing complete n min max empty n_unique mean sd p0 p25 p50 p75 p100 hist
character AZ 0 277 277 7 7 0 277 NA NA NA NA NA NA NA NA
numeric Dur 1 276 277 NA NA NA NA 6.01 1.9 2.42 4.57 5.85 7.28 12.14 ▃▇▇▇▅▂▁▁
numeric ÖR_I 0 277 277 NA NA NA NA 6.3 2.93 2 4 6 8 16 ▆▇▇▆▂▂▁▁
numeric ÖR_II 0 277 277 NA NA NA NA 5.97 2.33 1 4 6 8 13 ▂▆▆▇▃▃▁▁
numeric S_I 1 276 277 NA NA NA NA 5.81 2.49 1 4 6 7 14 ▂▆▃▇▃▁▁▁
numeric S_II 0 277 277 NA NA NA NA 6.12 2.82 1 4 6 8 14 ▂▇▅▇▆▂▁▁
numeric WPF 0 277 277 NA NA NA NA 6 2.91 1 4 6 8 16 ▃▇▇▇▅▁▁▁
numeric Z_I 0 277 277 NA NA NA NA 5.54 2.71 1 3 5 7 14 ▃▇▃▆▃▁▁▁
numeric Z_II 0 277 277 NA NA NA NA 6.32 3.15 0 4 6 8 14 ▁▆▇▇▃▃▃▁

median_2018 <- median(noten_raw_2018$Dur, na.rm = TRUE)
noten_long_2018 <- noten_raw_2018 %>%
  select(-AZ) %>% 
  # gather all remaining columns: column names go to "Fach", scores to "Punkte"
  gather(key = "Fach", value = "Punkte")

head(noten_long_2018, 1) %>% knitr::kable("html", 2)
Fach Punkte
Z_I 7

# Labels (kept in German, as they appear in the plots); same wording as for 2017/IV,
# now for campaign 1/18
title_a <- "2. Juristisches Staatsexamen, GJPA Berlin/Brandenburg"
subtitle_a <- paste0("Notenverteilung Kampagne 1/18; n = ", nrow(noten_raw_2018),
                     "; \"Dur\" = durchschnittl. Examensnote\nLinien: rot  = \"bestanden (ausreichend)\", blau  = \"vollbefriedigend\",\nschwarz  = Median Gesamtnote (", median_2018, ")")
caption_a <- "@fubits; Daten: GJPA 2018"

2.1 2018/I Joyplot

(Dur = overall result / final grade)

# Plot
noten_long_2018 %>% 
  ggplot() +
  geom_density_ridges(aes(x = Punkte, y = Fach, fill = Fach),
                      rel_min_height = 0.025,
                      scale = 1.75) +
  # Line: "vollbefriedigend" (fully satisfactory)
  geom_vline(xintercept = 10, color = "blue", linetype = 4, size = 1) + 
  # Line: passed
  geom_vline(xintercept = 4, color = "red", linetype = 4, size = 1) +
  # Line: median of the overall grade
  geom_vline(xintercept = median_2018, color = "black", size = 1) +
  labs(title = title_a, subtitle = subtitle_a, caption = caption_a) +
  scale_x_continuous(breaks = 0:18, limits = c(0, 18)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(option = "D", name = "Frequency n",
                     direction = -1, discrete = TRUE) +
  # theme(legend.position = "none")
  theme_minimal() +
  # hide the fill legend ("none" replaces the deprecated FALSE)
  guides(fill = "none")
## Warning: Removed 2 rows containing non-finite values (stat_density_ridges).
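
The warning matches the skimr summary above: S_I and Dur each have one missing value, so two rows of the long table carry NA in Punkte and are dropped before the densities are estimated. Below is a small base R sketch of how one might count (and, if desired, drop) them explicitly. noten_demo is a hypothetical stand-in for noten_long_2018, and the placement of the NAs in it is made up.

```r
# Hypothetical stand-in for noten_long_2018 with one NA per affected subject
noten_demo <- data.frame(
  Fach   = c("S_I", "S_I", "Dur", "Dur"),
  Punkte = c(7, NA, 6.57, NA)
)

# Number of missing scores per subject
na_per_fach <- tapply(is.na(noten_demo$Punkte), noten_demo$Fach, sum)
na_per_fach

# Dropping the incomplete rows up front silences the ggplot2 warning
noten_clean <- noten_demo[!is.na(noten_demo$Punkte), ]
nrow(noten_clean)
```

Filtering beforehand does not change the plot; it only makes the removal explicit instead of leaving it to ggplot2.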

2.2 2018/I Boxplot

noten_long_2018 %>% 
  ggplot() +
  geom_boxplot(aes(x = Fach, y = Punkte, fill = Fach)) +
  scale_y_continuous(breaks = 0:18, limits = c(0, 18)) +
  scale_fill_viridis(
    option = "C",
    direction = -1, discrete = TRUE
  ) +
  labs(title = title_a, subtitle = subtitle_a, caption = caption_a) +
  # theme(legend.position = "none")
  theme_minimal() +
  # hide the fill legend ("none" replaces the deprecated FALSE)
  guides(fill = "none") +
  # Reference lines: blue = "vollbefriedigend", red = passed, black = median
  geom_hline(yintercept = 10, color = "blue", linetype = 4, size = 1) +
  geom_hline(yintercept = 4, color = "red", linetype = 4, size = 1) +
  geom_hline(yintercept = median_2018, color = "black", size = 1)
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
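
To put numbers on the reference lines, one can compute the share of results at or above each threshold. The sketch below uses only the six Dur values printed in the 2018/I preview (a hypothetical vector dur_head), so the shares are purely illustrative and say nothing about the full campaign. With the real data you would pass noten_raw_2018$Dur instead. share_at_least() is a helper name made up for this example.

```r
# Dur values of the six rows shown in the 2018/I preview (not the full data set)
dur_head <- c(6.57, 3.28, 3.85, 2.57, 3.00, 6.57)

# Share of values at or above a threshold, ignoring missing values
share_at_least <- function(x, threshold) {
  mean(x >= threshold, na.rm = TRUE)
}

share_passed <- share_at_least(dur_head, 4)   # red line in the plots
share_vb     <- share_at_least(dur_head, 10)  # blue line in the plots

share_passed
share_vb
```

Taking mean() of a logical vector gives the proportion of TRUE values, which keeps the helper to a single line.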