---
title: "Analyzing Health Data from POF with healthbR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Analyzing Health Data from POF with healthbR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

The **POF (Pesquisa de Orçamentos Familiares)** is a household budget survey run by IBGE (Instituto Brasileiro de Geografia e Estatística) in partnership with the Ministry of Health. It investigates household expenditures, living conditions, and the nutritional profile of the Brazilian population.

The `healthbR` package provides access to POF microdata with a focus on **health-related data**:

| Module | Description | Available editions |
|--------|-------------|-------------------|
| **Food Security (EBIA)** | Brazilian Food Insecurity Scale | 2017-2018 |
| **Food Consumption** | Detailed personal food intake | 2008-2009, 2017-2018 |
| **Anthropometry** | Weight, height, BMI | 2008-2009 |
| **Health Expenses** | Medications, insurance, consultations | All editions |

## Getting started

```{r setup}
library(healthbR)
library(dplyr)
```

### Check available editions

```{r}
pof_years()
#> [1] "2002-2003" "2008-2009" "2017-2018"
```

### Survey information

Use `pof_info()` to see which health modules are available for each edition:

```{r}
pof_info("2017-2018")
```

### List available registers

Each POF edition contains multiple data registers. Use `pof_registers()` to see them:

```{r}
# all registers
pof_registers("2017-2018")

# only health-related registers
pof_registers("2017-2018", health_only = TRUE)
```

### Explore variables

Before downloading data, you can browse available variables:

```{r}
# list all variables in the domicilio register
pof_variables("2017-2018", "domicilio")

# search for food security variables
pof_variables("2017-2018", search = "ebia")

# search for weight-related variables
pof_variables("2017-2018", "morador", search = "peso")
```

## Food Security Analysis (EBIA)

The **EBIA (Escala Brasileira de Insegurança Alimentar)** is available in the 2017-2018 edition through the `domicilio` register. The variable `V6199` contains the food security classification.

### Download domicilio data

```{r}
domicilio <- pof_data("2017-2018", "domicilio")
```

### EBIA classification

The EBIA classifies households into four levels:

| Code | Classification |
|------|---------------|
| 1 | Food security |
| 2 | Mild food insecurity |
| 3 | Moderate food insecurity |
| 4 | Severe food insecurity |

### Create EBIA categories

```{r}
domicilio <- domicilio |>
  mutate(
    ebia = factor(
      V6199,
      levels = 1:4,
      labels = c(
        "Food security",
        "Mild insecurity",
        "Moderate insecurity",
        "Severe insecurity"
      )
    )
  )

# unweighted frequency table (sample counts, not population estimates)
domicilio |>
  count(ebia) |>
  mutate(pct = n / sum(n) * 100)
```

### Weighted estimates with survey design

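A raw frequency table describes the sample, not the population. For population estimates, load the register as a survey design object with `as_survey = TRUE` and summarize it with `srvyr`: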

```{r}
library(srvyr)

domicilio_svy <- pof_data("2017-2018", "domicilio", as_survey = TRUE)

# add EBIA categories
domicilio_svy <- domicilio_svy |>
  mutate(
    ebia = factor(
      V6199,
      levels = 1:4,
      labels = c(
        "Food security",
        "Mild insecurity",
        "Moderate insecurity",
        "Severe insecurity"
      )
    )
  )

# weighted prevalence
domicilio_svy |>
  group_by(ebia) |>
  summarize(
    prevalence = survey_mean(na.rm = TRUE, vartype = "ci"),
    n = unweighted(n())
  )
```
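
If you start from a plain data frame instead, a design object can be built by hand with `srvyr::as_survey_design()`. The sketch below runs on made-up toy data and assumes the design columns are named `ESTRATO_POF` (stratum), `COD_UPA` (primary sampling unit) and `PESO_FINAL` (final weight). Only `COD_UPA` appears elsewhere in this vignette, so check the other names with `pof_variables()` first. It is not meant to reproduce whatever `as_survey = TRUE` does internally.

```{r}
library(dplyr)
library(srvyr)

# toy data standing in for the domicilio register (values are made up)
toy <- data.frame(
  ESTRATO_POF = c(1, 1, 1, 1, 2, 2, 2, 2),          # assumed stratum column
  COD_UPA     = c(11, 11, 12, 12, 21, 21, 22, 22),  # primary sampling unit
  PESO_FINAL  = c(100, 150, 120, 130, 300, 250, 280, 270),  # assumed weight column
  V6199       = c(1, 2, 1, 4, 1, 1, 3, 2)           # EBIA code
)

# stratified cluster design: PSUs nested within strata, weighted by PESO_FINAL
toy_svy <- toy |>
  as_survey_design(
    ids = COD_UPA,
    strata = ESTRATO_POF,
    weights = PESO_FINAL,
    nest = TRUE
  )

# weighted share of households with any food insecurity (codes 2 to 4)
est <- toy_svy |>
  mutate(insecure = as.numeric(V6199 > 1)) |>
  summarize(prevalence = survey_mean(insecure, vartype = "se"))

est
```

The point estimate equals the weighted mean of the indicator; the design information (strata and PSUs) only affects the standard error.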

### EBIA by state (UF)

```{r}
# food insecurity by state
domicilio_svy |>
  group_by(UF, ebia) |>
  summarize(
    prevalence = survey_mean(na.rm = TRUE, vartype = "ci"),
    n = unweighted(n())
  ) |>
  filter(ebia == "Severe insecurity") |>
  arrange(desc(prevalence))
```
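
To aggregate states into Brazil's five macro-regions, the first digit of the two-digit IBGE state code can be used (1 = North, 2 = Northeast, 3 = Southeast, 4 = South, 5 = Central-West). The helper below is a hypothetical convenience function, not part of `healthbR`, and assumes `UF` holds the IBGE numeric code rather than a state abbreviation.

```{r}
# map a two-digit IBGE state code to its macro-region via the first digit
uf_to_region <- function(uf) {
  regions <- c(
    "1" = "North",
    "2" = "Northeast",
    "3" = "Southeast",
    "4" = "South",
    "5" = "Central-West"
  )
  first_digit <- substr(as.character(uf), 1, 1)
  unname(regions[first_digit])
}

# works for numeric or character codes
uf_to_region(c(11, 29, 35, 43, 53))
uf_to_region("26")
```

With such a helper, a `region` column could be added via `mutate(region = uf_to_region(UF))` and used in `group_by()` in place of `UF`.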

## Food Consumption Analysis

The `consumo_alimentar` register contains detailed personal food intake data from a subsample. This data is available for the 2008-2009 and 2017-2018 editions.

### Download food consumption data

```{r}
consumo <- pof_data("2017-2018", "consumo_alimentar")
```

### Key variables

| Variable | Description |
|----------|-------------|
| `V9001` | Food item code |
| `V9005` | Amount consumed |
| `V9007` | Unit of measure |
| `ENERGIA_KCAL` | Energy (kcal) |
| `PROTEINA` | Protein (g) |
| `CARBOIDRATO` | Carbohydrate (g) |
| `LIPIDIO` | Total lipids (g) |

### Average caloric intake

```{r}
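# total caloric intake per person (unweighted): sums every food record
# for the person, so if more than one recall day is present, divide by
# the number of days to get a daily value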
consumo |>
  group_by(COD_UPA, NUM_DOM, NUM_UC, COD_INFORMANTE) |>
  summarize(
    total_kcal = sum(ENERGIA_KCAL, na.rm = TRUE),
    total_protein = sum(PROTEINA, na.rm = TRUE),
    total_carb = sum(CARBOIDRATO, na.rm = TRUE),
    total_fat = sum(LIPIDIO, na.rm = TRUE),
    .groups = "drop"
  ) |>
  summarize(
    mean_kcal = mean(total_kcal, na.rm = TRUE),
    mean_protein = mean(total_protein, na.rm = TRUE),
    mean_carb = mean(total_carb, na.rm = TRUE),
    mean_fat = mean(total_fat, na.rm = TRUE)
  )
```
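
Macronutrient totals are often easier to interpret as shares of energy. The sketch below applies the general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat) to toy per-person totals. The column names mirror the `total_*` names used above, the values are made up, and alcohol and fiber are ignored, so the shares are relative to macronutrient energy rather than to `ENERGIA_KCAL`.

```{r}
library(dplyr)

# toy per-person totals in grams (values are made up)
intake <- data.frame(
  person        = c("a", "b"),
  total_protein = c(80, 60),
  total_carb    = c(250, 300),
  total_fat     = c(70, 50)
)

shares <- intake |>
  mutate(
    # general Atwater factors: 4, 4 and 9 kcal per gram
    kcal_protein = total_protein * 4,
    kcal_carb    = total_carb * 4,
    kcal_fat     = total_fat * 9,
    kcal_macro   = kcal_protein + kcal_carb + kcal_fat,
    pct_protein  = kcal_protein / kcal_macro * 100,
    pct_carb     = kcal_carb / kcal_macro * 100,
    pct_fat      = kcal_fat / kcal_macro * 100
  ) |>
  select(person, kcal_macro, pct_protein, pct_carb, pct_fat)

shares
```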

## Health Expenses

The `despesa_individual` register contains individual expenses, including health-related spending such as medications, health insurance, and medical consultations.

### Download expense data

```{r}
despesas <- pof_data("2017-2018", "despesa_individual")
```

### Filter health expenses

Expense records are grouped by questionnaire block (`QUADRO`). Listing the blocks present in the data is a first step toward identifying the health-related ones:

```{r}
# explore expense categories
despesas |>
  count(QUADRO) |>
  arrange(desc(n))
```
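
Once the relevant blocks are known, health spending can be isolated with a filter on `QUADRO`. The sketch below uses toy data: the amount column `valor` is hypothetical, and the block codes in `health_quadros` are assumptions for illustration that should be confirmed against the POF data dictionary before use.

```{r}
library(dplyr)

# toy expense records; `valor` is a hypothetical amount column
despesas_toy <- data.frame(
  COD_UPA = c(1, 1, 1, 2, 2),
  NUM_DOM = c(1, 1, 1, 1, 1),
  NUM_UC  = c(1, 1, 1, 1, 1),
  QUADRO  = c(29, 42, 23, 29, 31),
  valor   = c(50, 200, 80, 30, 120)
)

# assumed health-related block codes; confirm them in the POF data dictionary
health_quadros <- c(29, 42)

# keep health records and total them per consumption unit
health_by_uc <- despesas_toy |>
  filter(QUADRO %in% health_quadros) |>
  group_by(COD_UPA, NUM_DOM, NUM_UC) |>
  summarize(
    health_spending = sum(valor),
    n_items = n(),
    .groups = "drop"
  )

health_by_uc
```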

## Combining registers

For many analyses you need to combine data from multiple registers. Use the household identifier variables (`COD_UPA`, `NUM_DOM`, `NUM_UC`) to merge:

```{r}
# download morador (demographic data) and domicilio (household data)
morador <- pof_data("2017-2018", "morador")
domicilio <- pof_data("2017-2018", "domicilio")

# merge: add household-level EBIA to individual-level data
morador_ebia <- morador |>
  left_join(
    domicilio |> select(COD_UPA, NUM_DOM, NUM_UC, V6199),
    by = c("COD_UPA", "NUM_DOM", "NUM_UC")
  ) |>
  mutate(
    ebia = factor(
      V6199,
      levels = 1:4,
      labels = c(
        "Food security",
        "Mild insecurity",
        "Moderate insecurity",
        "Severe insecurity"
      )
    )
  )

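# food insecurity by age group (unweighted)
morador_ebia |>
  mutate(
    # right = FALSE gives [0,5), [5,12), ... so age 0 is not dropped as NA
    age_group = cut(V0403, breaks = c(0, 5, 12, 18, 30, 60, Inf), right = FALSE)
  ) |>
  count(age_group, ebia) |>
  group_by(age_group) |>
  mutate(pct = n / sum(n) * 100) |>
  ungroup()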
```
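
Before a merge like the one above, it is worth checking two things: that the key combination is unique on the household side (otherwise the join duplicates individuals), and that every individual finds a matching household. A minimal sketch on toy data, assuming the same three key columns exist in both registers:

```{r}
library(dplyr)

keys <- c("COD_UPA", "NUM_DOM", "NUM_UC")

# toy stand-ins for the two registers (values are made up)
morador_toy <- data.frame(
  COD_UPA = c(1, 1, 2, 3),
  NUM_DOM = c(1, 1, 1, 1),
  NUM_UC  = c(1, 1, 1, 1),
  V0403   = c(34, 2, 61, 19)
)
domicilio_toy <- data.frame(
  COD_UPA = c(1, 2),
  NUM_DOM = c(1, 1),
  NUM_UC  = c(1, 1),
  V6199   = c(1, 3)
)

# 1. key combinations that occur more than once on the household side
dup_keys <- domicilio_toy |>
  count(COD_UPA, NUM_DOM, NUM_UC) |>
  filter(n > 1)

# 2. individuals with no matching household (V6199 would be NA after the join)
unmatched <- morador_toy |>
  anti_join(domicilio_toy, by = keys)

nrow(dup_keys)
nrow(unmatched)
```

A non-zero count in either check is a signal to inspect the keys before interpreting the merged data.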

## Comparing editions

Each POF edition has its own data structure and set of health modules. Use `pof_info()` to check what is available in each edition:

```{r}
# check health modules by edition
pof_info("2017-2018")  # EBIA + food consumption
pof_info("2008-2009")  # anthropometry + food consumption
pof_info("2002-2003")  # expenses only
```

## Cache management

POF data files are large. `healthbR` caches downloaded files locally, so each file is downloaded only once:

```{r}
# check cached files
pof_cache_status()

# clear cache if needed
pof_clear_cache()
```

If the `arrow` package is installed, data is cached in Parquet format for faster loading:

```{r}
# install arrow for optimized caching (recommended)
install.packages("arrow")
```

## Additional resources

- POF official page (`www.ibge.gov.br/estatisticas/sociais/saude/24786-pesquisa-de-orcamentos-familiares-2`)
- POF 2017-2018 Food Security publication (`biblioteca.ibge.gov.br`)
- POF 2017-2018 Food Consumption publication (`biblioteca.ibge.gov.br`)
- [srvyr package documentation](https://cran.r-project.org/package=srvyr)
