Sandbox Document

Author

Jason Locklin

This example is taken from DevOps for Data Science.

Penguin Size and Mass by Sex and Species

library(palmerpenguins)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
log <- log4r::logger()

df <- palmerpenguins::penguins
log4r::info(log, "data loaded")
INFO  [2025-01-20 18:55:09] data loaded
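The logger above uses log4r's defaults, which (as the output shows) print INFO-level messages to the console. A deployed job usually also keeps a log file. Below is a minimal sketch of that setup. It assumes a temporary file stands in for a real log location, and the names `log_file` and `file_log` are illustrative only.

```r
# Sketch: send log4r messages to both the console and a file.
# `log_file` and `file_log` are illustrative names; tempfile() stands in
# for a real, persistent log location.
log_file <- tempfile(fileext = ".log")

file_log <- log4r::logger(
  threshold = "INFO",                  # drop anything below INFO
  appenders = list(
    log4r::console_appender(),         # same console output as the default logger
    log4r::file_appender(log_file)     # also append each line to the file
  )
)

log4r::info(file_log, "data loaded")
log4r::debug(file_log, "below the INFO threshold, so never written")

readLines(log_file)
```

Only the INFO line reaches the file. Raising or lowering `threshold` is how you would make a job chattier or quieter without touching the logging calls themselves.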
df %>%
  group_by(species, sex) %>%
  summarise(
    across(where(is.numeric), \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"  # return an ungrouped result; silences the grouping message
  ) %>%
  knitr::kable()
|species   |sex    | bill_length_mm| bill_depth_mm| flipper_length_mm| body_mass_g|     year|
|:---------|:------|--------------:|-------------:|-----------------:|-----------:|--------:|
|Adelie    |female |       37.25753|      17.62192|          187.7945|    3368.836| 2008.055|
|Adelie    |male   |       40.39041|      19.07260|          192.4110|    4043.493| 2008.055|
|Adelie    |NA     |       37.84000|      18.32000|          185.6000|    3540.000| 2007.000|
|Chinstrap |female |       46.57353|      17.58824|          191.7353|    3527.206| 2007.971|
|Chinstrap |male   |       51.09412|      19.25294|          199.9118|    3938.971| 2007.971|
|Gentoo    |female |       45.56379|      14.23793|          212.7069|    4679.741| 2008.069|
|Gentoo    |male   |       49.47377|      15.71803|          221.5410|    5484.836| 2008.066|
|Gentoo    |NA     |       45.62500|      14.55000|          215.7500|    4587.500| 2008.400|
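The summary keeps birds with a missing `sex` as their own group (the `NA` rows in the table). Within each group it relies on `na.rm = TRUE` to ignore missing measurements. The sketch below runs the same pattern on a small hypothetical data set so both behaviours are easy to check. It also shows one way to leave `year` out, since a mean calendar year says little. The `toy` data and its values are invented for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: one bird with unknown sex, one missing mass
toy <- data.frame(
  species     = c("A", "A", "A", "B"),
  sex         = c("female", "female", NA, "male"),
  body_mass_g = c(3000, NA, 3500, 5000),
  year        = c(2007L, 2008L, 2009L, 2008L)
)

toy_means <- toy %>%
  group_by(species, sex) %>%
  summarise(
    # every numeric column except year; NAs are dropped before averaging
    across(c(where(is.numeric), -year), \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
  )

toy_means
```

The female group of species A averages to 3000 because its missing mass is skipped. The bird with unknown sex forms a separate `NA` group, listed after the named groups, which matches the Adelie and Gentoo `NA` rows in the table.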
log4r::info(log, "table produced")
INFO  [2025-01-20 18:55:09] table produced

Penguin Bill Length vs Body Mass by Species

df %>%
  ggplot(aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
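Mapping `color = species` also groups the data, so `geom_smooth(method = "lm")` fits one ordinary least-squares line per species. Both layers drop the two penguins that have no bill length or body mass recorded, which is what the warnings report. The sketch below reproduces those per-species fits in base R, assuming the same `palmerpenguins::penguins` data. The names `incomplete` and `fits` are illustrative.

```r
df <- palmerpenguins::penguins

# Rows missing either plotted variable: the ones ggplot2 reports as removed
incomplete <- !complete.cases(df[, c("bill_length_mm", "body_mass_g")])
sum(incomplete)

# One linear model of body mass on bill length per species, mirroring
# the per-colour lines drawn by geom_smooth(method = "lm", formula = y ~ x)
fits <- lapply(
  split(df, df$species),
  \(d) coef(lm(body_mass_g ~ bill_length_mm, data = d))
)

fits
```

`lm()` omits incomplete rows by default, so these coefficients use the same 342 birds as the plotted lines. The slopes are in grams of body mass per millimetre of bill length.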

log4r::info(log, "plot produced")
INFO  [2025-01-20 18:55:11] plot produced