This example is taken from DevOps for Data Science.
Penguin Size and Mass by Sex and Species
library(palmerpenguins)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)
log <- log4r::logger()
df <- palmerpenguins::penguins
log4r::info(log, "data loaded")
INFO [2025-01-20 18:55:09] data loaded
df %>%
  group_by(species, sex) %>%
  summarise(
    across(
      where(is.numeric),
      \(x) mean(x, na.rm = TRUE)
    )
  ) %>%
  knitr::kable()
`summarise()` has grouped output by 'species'. You can override using the
`.groups` argument.
| species   | sex    | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | year     |
|:----------|:-------|---------------:|--------------:|------------------:|------------:|---------:|
| Adelie    | female |       37.25753 |      17.62192 |          187.7945 |    3368.836 | 2008.055 |
| Adelie    | male   |       40.39041 |      19.07260 |          192.4110 |    4043.493 | 2008.055 |
| Adelie    | NA     |       37.84000 |      18.32000 |          185.6000 |    3540.000 | 2007.000 |
| Chinstrap | female |       46.57353 |      17.58824 |          191.7353 |    3527.206 | 2007.971 |
| Chinstrap | male   |       51.09412 |      19.25294 |          199.9118 |    3938.971 | 2007.971 |
| Gentoo    | female |       45.56379 |      14.23793 |          212.7069 |    4679.741 | 2008.069 |
| Gentoo    | male   |       49.47377 |      15.71803 |          221.5410 |    5484.836 | 2008.066 |
| Gentoo    | NA     |       45.62500 |      14.55000 |          215.7500 |    4587.500 | 2008.400 |
log4r::info(log, "Table produced")
INFO [2025-01-20 18:55:09] Table produced
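The `summarise()` message above notes that the result is still grouped by `species` and that the `.groups` argument can override this. A minimal sketch of one way to do that, assuming an ungrouped result is what is wanted; the name `summary_tbl` is hypothetical and not part of the original example:

```r
library(dplyr)

df <- palmerpenguins::penguins

# Same summary as above, with the grouping dropped explicitly.
# `.groups = "drop"` returns an ungrouped tibble and suppresses the message.
summary_tbl <- df %>%
  group_by(species, sex) %>%
  summarise(
    across(where(is.numeric), \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
  )
```

`summary_tbl` can then be passed to `knitr::kable()` as before. The values in the table should not change, only the grouping metadata attached to the result.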
Penguin Size vs Mass by Species
df %>%
  ggplot(aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
log4r::info(log, "plot produced")
INFO [2025-01-20 18:55:11] plot produced
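The two warnings above come from rows where `bill_length_mm` or `body_mass_g` is missing, and the `geom_smooth()` message reports the formula it chose by default. A hedged sketch of one way to make both explicit, assuming it is acceptable to drop the incomplete rows before plotting; the names `plot_df` and `p` are hypothetical:

```r
library(dplyr)
library(ggplot2)

df <- palmerpenguins::penguins

# Keep only rows where both plotted variables are present.
plot_df <- df %>%
  filter(!is.na(bill_length_mm), !is.na(body_mass_g))

# Stating the formula explicitly avoids the `geom_smooth()` message.
p <- plot_df %>%
  ggplot(aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

print(p)
```

Filtering up front records the decision to drop incomplete rows in the code itself, where a reader can see it, instead of leaving it to a rendering warning.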