---
title: "Migrating from ggmosaic to marimekko"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Migrating from ggmosaic to marimekko}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)
```

## Why migrate?

[ggmosaic](https://github.com/haleyjeppson/ggmosaic) was archived from CRAN
on 2025-11-10 due to uncorrected issues. Key problems included reliance on
internal ggplot2 APIs that broke on updates and a dependency on deprecated
tidyr functions.

**marimekko** is a from-scratch replacement that uses only public ggplot2
APIs, depends only on ggplot2 and rlang, and follows standard ggplot2
aesthetic conventions.

The examples below are adapted from
[Jeppson & Hofmann (2023)](https://journal.r-project.org/articles/RJ-2023-013/),
showing each ggmosaic pattern and its marimekko equivalent.

## Function mapping

| ggmosaic | marimekko | Notes |
|----------|-----------|-------|
| `geom_mosaic()` | `geom_marimekko()` | Standard `aes()`, no `product()` |
| `geom_mosaic_text()` | `geom_marimekko_text()` | |
| `theme_mosaic()` | `theme_marimekko()` | |
| `scale_x_productlist()` | *not needed* | Axis labels are automatic; use `show_percentages = TRUE` in `geom_marimekko()` |
| `scale_y_productlist()` | *not needed* | Axis labels are automatic |
| `product()` | *not needed* | Use `formula = ~ a \| b` |
| — | `geom_marimekko_label()` | New: text with background box |
| — | `fortify_marimekko()` | New: extract tile data as data frame |

## Side-by-side examples

The ggmosaic paper (Jeppson & Hofmann, 2023, *The R Journal* 14(4)) uses the
`fly` dataset from ggmosaic and the built-in `Titanic` dataset. Since ggmosaic
is no longer on CRAN, we recreate the `fly`-based examples using `Titanic` and
`HairEyeColor` — both built-in R datasets that need no extra packages.
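
The examples assume both tables have already been flattened into long data frames with one row per cell and a `Freq` count column. A minimal sketch of that preparation in base R, using the same object names (`titanic`, `hair`) as the chunks in this vignette:

```r
# Flatten the built-in contingency tables into long data frames:
# one row per combination of factor levels, plus a Freq count column.
titanic <- as.data.frame(Titanic)
hair <- as.data.frame(HairEyeColor)

names(titanic) # Class, Sex, Age, Survived, Freq
names(hair)    # Hair, Eye, Sex, Freq

# Freq holds the cell counts, which is why the examples map weight = Freq.
sum(titanic$Freq)
sum(hair$Freq)
```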

### One dimensional — spine plot and bar chart

The paper starts with single-variable plots showing different divider types
(Figure 1 in Jeppson & Hofmann, 2023):

```r
# ggmosaic — spine plot (one variable, default divider)
ggplot(data = titanic) +
  geom_mosaic(aes(x = product(Class), fill = Class, weight = Freq))
```

```{r one-var}
# marimekko — one variable
ggplot(titanic) +
  geom_marimekko(aes(fill = Class, weight = Freq), formula = ~Class) +
  theme_marimekko() +
  labs(title = "One variable: f(Class)")
```


```r
# ggmosaic — bar chart (one variable, hbar divider)
ggplot(data = titanic) +
  geom_mosaic(
    aes(x = product(Class), fill = Class, weight = Freq),
    divider = "hbar"
  )
```

```{r one-var-bar}
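# ggplot2: bar chart with proportions on the y axis
# (marimekko has no `divider` argument, so plain geom_bar() is used here)
ggplot(titanic) +
  geom_bar(aes(Class,
    y = after_stat(count / sum(count)),
    weight = Freq, fill = Class
  )) +
  theme_marimekko() +
  labs(title = "One variable: f(Class)", y = "Proportion")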
```


The main difference: **no `product()` wrapper**; use `formula = ~ Var`
instead. marimekko has no `divider` argument: the first formula variable is
always laid out as columns along the x-axis, so the bar-chart variant above
uses plain `geom_bar()`.

### Two dimensional — mosaic plot and stacked bar chart

The paper shows two-dimensional mosaic plots where variable order in `product()`
controls the partitioning hierarchy (Figure 2):

```r
# ggmosaic — two variables
ggplot(data = titanic) +
  geom_mosaic(aes(
    x = product(Class),
    fill = Survived,
    weight = Freq
  ))
```

```{r two-var}
# marimekko — two variables via formula
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  theme_marimekko() +
  labs(title = "Two variables: f(Survived | Class) f(Class)")
```
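
The plot title reads the mosaic as f(Survived | Class) f(Class). Assuming the usual mosaic construction, where column widths follow the marginal distribution and tile heights follow the conditional one, the underlying numbers can be checked in base R without any plotting package. The object names below are illustrative only:

```r
titanic <- as.data.frame(Titanic)

# Two-way table of counts: Class in rows, Survived in columns
tab <- xtabs(Freq ~ Class + Survived, data = titanic)

# f(Class): share of all people on board in each class (column widths)
class_share <- prop.table(margin.table(tab, 1))

# f(Survived | Class): survival split within each class (tile heights)
survived_given_class <- prop.table(tab, margin = 1)

round(class_share, 3)
round(survived_given_class, 3)
```

Each row of `survived_given_class` sums to 1, which is why every column in the plot fills the full height regardless of how wide it is.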

### Three dimensional

The paper shows a double-decker plot of three variables, built with
`divider = ddecker()` (Figure 3):

```r
# ggmosaic: double-decker plot of three variables
ggplot(data = titanic) +
  geom_mosaic(aes(
    x = product(Survived, Class, Sex),
    fill = Survived,
    weight = Freq
  ), divider = ddecker())
```

```{r three-var-multi}
# marimekko — three variables via formula
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Sex + Class | Survived
  ) +
  theme_marimekko() +
  labs(title = "Three variables: f(Survived | Class, Sex)")
```


### Three variables — nested mosaic

The paper demonstrates three-variable mosaics where a third variable adds
another level of partitioning (Figure 4):

```r
# ggmosaic — three variables via product()
ggplot(data = titanic) +
  geom_mosaic(aes(
    x = product(Sex, Survived, Class),
    fill = Survived,
    weight = Freq
  ))
```

```{r three-var}
# marimekko — three variables via formula
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived | Sex
  ) +
  theme_marimekko() +
  labs(title = "Three variables: Class / Survived / Sex")
```

### Conditioning with `conds`

The paper shows how `conds` creates conditional distributions (Section 3.2):

```r
# ggmosaic — conditioning
ggplot(data = titanic) +
  geom_mosaic(aes(
    x = product(Class),
    fill = Survived,
    weight = Freq,
    conds = product(Sex)
  ))
```

marimekko does not have a `conds` aesthetic. Use `facet_wrap()` instead:

```{r conds}
# marimekko — conditioning via faceting
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  facet_wrap(~Sex) +
  theme_marimekko() +
  labs(title = "f(Survived | Class) conditioned on Sex")
```

### Faceting

The paper demonstrates `facet_grid()` as an alternative to conditioning
(Section 3.3):

```r
# ggmosaic — faceting
ggplot(data = titanic) +
  geom_mosaic(aes(
    x = product(Class), fill = Survived, weight = Freq
  )) +
  facet_grid(~Sex)
```

```{r facet}
# marimekko — faceting works the same way
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  facet_wrap(~Sex) +
  theme_marimekko() +
  labs(title = "Mosaic faceted by Sex")
```

### Divider types and `standardize`

The paper describes four fundamental partition types: `hspine`, `vspine`,
`hbar`, and `vbar` (Section 4, Figure 5). These are combined to produce
spine plots, stacked bar charts, mosaic plots, and double decker plots.

marimekko simplifies this — `geom_marimekko()` always starts with
horizontal splits and alternates direction with each `|` in the formula.
Use `coord_flip()` if you need vertical-first orientation.

For equal-width columns (spine-plot style), `standardize = TRUE` is
available in `geom_marimekko_text()` and `fortify_marimekko()`.

```r
# ggmosaic — spine plot via divider
ggplot(titanic) +
  geom_mosaic(aes(x = product(Class), fill = Survived, weight = Freq),
    divider = c("vspine", "hspine")
  )
```

```{r default-width}
# marimekko — proportional-width columns (default)
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  theme_marimekko() +
  labs(title = "Mosaic plot (proportional-width columns)")
```
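
For the vertical-first orientation mentioned above, one hedged sketch is to append ggplot2's `coord_flip()` to the default plot. This assumes `geom_marimekko()` behaves like an ordinary ggplot2 layer under flipped coordinates, as the text suggests; the arguments are the same ones used in the preceding chunk, and `p_flipped` is just an illustrative name.

```r
library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)

# Same formula as before; coord_flip() swaps the x and y axes so the
# first split (Class) runs along the vertical axis instead.
p_flipped <- ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  coord_flip() +
  theme_marimekko()

p_flipped
```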

### Offset / spacing

The paper shows how `offset` controls gaps between tiles (Section 5):

```r
# ggmosaic — single offset parameter
ggplot(data = titanic) +
  geom_mosaic(aes(
    x = product(Class), fill = Survived, weight = Freq
  ), offset = 0.02)
```

marimekko provides independent `gap_x` and `gap_y` parameters:

```{r gaps}
# marimekko: separate horizontal and vertical gaps
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap_x = 0.02, gap_y = 0.02
  ) +
  theme_marimekko() +
  labs(title = "gap_x = 0.02, gap_y = 0.02")
```

### Text labels

The paper shows `geom_mosaic_text()` for adding labels (Section 7):

```r
# ggmosaic — automatic text labels
ggplot(titanic) +
  geom_mosaic(aes(x = product(Class), fill = Survived, weight = Freq)) +
  geom_mosaic_text(aes(x = product(Class), fill = Survived, weight = Freq))
```

```{r text}
# marimekko — automatic tile positions, only label needed
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  geom_marimekko_text(aes(
    label = after_stat(paste(Class, Survived, sep = "\n"))
  )) +
  theme_marimekko()
```

`geom_marimekko_text()` reads tile positions from the preceding
`geom_marimekko()` layer — only the `label` aesthetic is needed.
Use `after_stat()` for computed variables like `weight` (count),
`.proportion`, or the original variable columns.

### Axis labels and percentages

The paper uses `scale_x_productlist()` for axis labels:

```r
# ggmosaic
ggplot(titanic) +
  geom_mosaic(aes(x = product(Class), fill = Survived, weight = Freq)) +
  scale_x_productlist()
```

```{r axis}
# marimekko — with optional marginal percentages
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived,
    show_percentages = TRUE
  ) +
  theme_marimekko()
```

### HairEyeColor — a larger example

The paper uses multi-variable examples to show how mosaic plots reveal
associations. Here we use the built-in `HairEyeColor` dataset:

```{r hair-setup, include = FALSE}
hair <- as.data.frame(HairEyeColor)
```

```{r hair-mosaic}
ggplot(hair) +
  geom_marimekko(aes(fill = Eye, weight = Freq),
    formula = ~ Hair | Eye,
    show_percentages = TRUE
  ) +
  theme_marimekko() +
  labs(title = "Hair color vs Eye color")
```

Three-variable version with Sex:

```{r hair-multi}
ggplot(hair) +
  geom_marimekko(aes(fill = Eye, weight = Freq),
    formula = ~ Hair | Sex | Eye
  ) +
  theme_marimekko() +
  labs(title = "Hair / Sex / Eye")
```

## Key differences to be aware of

### 1. No `product()` wrapper

ggmosaic required wrapping variables in `product()`. marimekko uses a
formula-based API:

- `formula = ~ a | b` — specifies variable hierarchy (`|` alternates direction, `+` groups)
- `fill` — tile colour (defaults to last formula variable)
- `weight` — observation weights / counts

### 2. Automatic axis labels

ggmosaic handled axis labels internally. marimekko also adds axis labels
automatically via `geom_marimekko()`. Use `show_percentages = TRUE` in
`geom_marimekko()` to append marginal percentages to the x-axis labels.

### 3. No `divider` argument

ggmosaic used `divider = c("vspine", "hspine", ...)` to control partitioning
direction. marimekko encodes direction in the formula — `|` alternates
between horizontal and vertical, always starting horizontal. Use
`coord_flip()` for vertical-first orientation.

### 4. `inherit.aes` defaults to `TRUE`

ggmosaic set `inherit.aes = FALSE` by default, requiring you to repeat
aesthetics in every layer. marimekko follows ggplot2 convention —
`inherit.aes = TRUE` — so aesthetics set in `ggplot(aes(...))` are inherited.
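
A sketch of what this allows, assuming both marimekko layers inherit from the plot-level mapping as described: `fill` and `weight` are declared once in `ggplot()`, and the text layer only adds its `label`. The name `p_inherit` is illustrative.

```r
library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)

# fill and weight are set once at the plot level and inherited by both layers
p_inherit <- ggplot(titanic, aes(fill = Survived, weight = Freq)) +
  geom_marimekko(formula = ~ Class | Survived) +
  geom_marimekko_text(aes(label = after_stat(weight))) +
  theme_marimekko()

p_inherit
```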

### 5. Text labels need explicit `label` aesthetic

`geom_marimekko_text()` requires `label = after_stat(weight)` or similar.
ggmosaic's `geom_mosaic_text()` auto-generated labels.

## plotly support

marimekko plots work with `plotly::ggplotly()` out of the box — simply
pass your plot object and get an interactive version:

```{r plotly, eval=FALSE}
library(plotly)

p <- ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  )

ggplotly(p)
```

This also works with `geom_marimekko_text()` and
`geom_marimekko_label()`.

## Search-and-replace checklist

For a quick mechanical migration, apply these replacements:

```
geom_mosaic(          →  geom_marimekko(
geom_mosaic_text(     →  geom_marimekko_text(
theme_mosaic(         →  theme_marimekko(
library(ggmosaic)     →  library(marimekko)
```
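
If many scripts need the mechanical step, the four replacements can be applied programmatically. The helper below is a hypothetical sketch in base R (`migrate_calls()` is not part of marimekko). It uses fixed-string matching so the parentheses are not treated as regular-expression syntax, and it leaves `product()`, `conds`, and the other manual items untouched.

```r
# Hypothetical helper: apply the mechanical renames to a character vector
# of source lines. Not part of marimekko or ggmosaic.
migrate_calls <- function(lines) {
  replacements <- c(
    "geom_mosaic_text(" = "geom_marimekko_text(",
    "geom_mosaic("      = "geom_marimekko(",
    "theme_mosaic("     = "theme_marimekko(",
    "library(ggmosaic)" = "library(marimekko)"
  )
  for (old in names(replacements)) {
    # fixed = TRUE matches the text literally, including "(" and ")"
    lines <- gsub(old, replacements[[old]], lines, fixed = TRUE)
  }
  lines
}

old_code <- c(
  "library(ggmosaic)",
  "ggplot(d) + geom_mosaic(aes(x = product(a), fill = b)) + theme_mosaic()"
)
migrate_calls(old_code)
```

To rewrite a file, something like `writeLines(migrate_calls(readLines(path)), path)` would work; reviewing the diff before committing is advisable, since the result still contains `product()` calls that need manual conversion.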

Then manually:

1. Remove all `product()` wrappers -- add `formula = ~ Var1 | Var2` instead
2. Remove `conds` aesthetic -- use `facet_wrap()` or a multi-variable formula
3. Remove `scale_x_productlist()` / `scale_y_productlist()` -- axis labels are now automatic
4. Remove `divider = ...` -- direction is encoded in the formula
5. Replace `offset = ...` with `gap` / `gap_x` / `gap_y`
6. Add explicit `label = after_stat(weight)` to text layers
7. Set `inherit.aes = FALSE` only where deliberately needed
8. Move `show_percentages = TRUE` from scale to `geom_marimekko(..., show_percentages = TRUE)`

## References

Jeppson, H. and Hofmann, H. (2023). Generalized Mosaic Plots in the ggplot2
Framework. *The R Journal*, 14(4), 50–73.
[doi:10.32614/RJ-2023-013](https://doi.org/10.32614/RJ-2023-013)
