---
title: "Investigating Inmate Incidents"
output: rmarkdown::html_vignette
author: "Isley Jean-Pierre"
vignette: >
  %\VignetteIndexEntry{Investigating Inmate Incidents}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(nycOpenData)
library(ggplot2)
library(dplyr)
```

## Introduction

This vignette uses the [Inmate Incidents - Slashing and Stabbing](https://data.cityofnewyork.us/Public-Safety/Inmate-Incidents-Slashing-and-Stabbing/gakf-suji/about_data) dataset from the NYC Open Data portal. Each row records the incident ID, the date the incident was reported, the incident type, and the facility where it occurred. A researcher might use this dataset to investigate slashing and stabbing incidents among inmates while they are incarcerated.

The `nycOpenData` package provides a streamlined interface for accessing New York City's vast open data resources. It connects directly to the NYC Open Data Portal. It is currently utilized as a primary tool for teaching data acquisition in [Reproducible Research Using R](https://martinezc1-reproducible-research-using-r.share.connect.posit.cloud/), helping students bridge the gap between raw city APIs and tidy data analysis.

## Let's take a look at the dataset

To start, let's pull a small sample to see what the data looks like. By default, `nyc_pull_dataset()` returns the *10,000 most recent* records. To see only the latest 3, we set `limit = 3`.

```{r small-sample}
small_sample <- nyc_pull_dataset(dataset = "gakf-suji", limit = 3)
small_sample

# Seeing what columns are in the dataset
names(small_sample)
```

We can filter on any column in the dataset by passing a named list to `filters`, where each name is a column and each value is the value to match. The `names()` call above shows a column called `incident_type`, so we can use it to keep only stabbings.

```{r filter-incident}
# Pull the 3 most recent incidents recorded as "Stabbing"
stabbings <- nyc_pull_dataset(
  "gakf-suji",
  limit = 3,
  filters = list(incident_type = "Stabbing")
)
head(stabbings)

# Check that the filter worked: only "Stabbing" should appear
stabbings |>
  distinct(incident_type)
```
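The `distinct()` check above works well interactively. When the same check is repeated for several pulls, it can be wrapped in a small helper that stops with an error instead. The sketch below defines a hypothetical helper, `check_filter()`, which is not part of `nycOpenData`, and demonstrates it on a few made-up rows rather than on real portal data.

```{r check-filter-helper}
library(dplyr)

# Hypothetical helper, not part of nycOpenData: stops with an error if
# `column` is missing from `df` or contains any value outside `expected`;
# otherwise returns `df` invisibly so it can sit inside a pipeline.
check_filter <- function(df, column, expected) {
  if (!column %in% names(df)) {
    stop("Column `", column, "` not found", call. = FALSE)
  }
  unexpected <- setdiff(unique(df[[column]]), expected)
  if (length(unexpected) > 0) {
    stop("Unexpected values in `", column, "`: ",
         paste(unexpected, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Made-up rows in the shape of a filtered pull, for illustration only
example_pull <- tibble(
  incident_type = c("Stabbing", "Stabbing"),
  facility = c("AMKC", "AMKC")
)

example_pull |>
  check_filter("incident_type", "Stabbing") |>
  nrow()
```

Because the helper returns its input invisibly, it can be placed between a pull and a later step such as `count()` without changing the result.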

## Further investigation

This section looks at the two incident types separately for a single facility, AMKC, pulling up to 50 records of each. The same pattern can be adapted for future inquiries into other facilities or incident types.

```{r slashing-stabbing}
# Pull up to 50 slashing and 50 stabbing incidents at the AMKC facility
slash <- nyc_pull_dataset(
  "gakf-suji",
  limit = 50,
  filters = list(facility = "AMKC", incident_type = "Slashing")
)

stab <- nyc_pull_dataset(
  "gakf-suji",
  limit = 50,
  filters = list(facility = "AMKC", incident_type = "Stabbing")
)

# Preview the first six rows of each pull
slash |>
  slice_head(n = 6)

stab |>
  slice_head(n = 6)

# Check the filters: each pull should contain a single facility and a
# single incident type, and `n` shows how many rows came back (at most 50)
slash |>
  count(facility, incident_type)

stab |>
  count(facility, incident_type)
```

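Swapping a different facility code into `filters` repeats the same pulls for another facility, so this pattern can be used to see how slashing and stabbing incidents vary across facilities.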
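To compare the two pulls directly, we can stack them with `dplyr::bind_rows()` and count incidents by facility and type. The sketch below uses two small made-up tibbles, `slash_example` and `stab_example`, as stand-ins for `slash` and `stab`, so it runs without a network connection.

```{r combine-pulls}
library(dplyr)

# Made-up stand-ins for the `slash` and `stab` pulls, for illustration
# only; the real pulls have more columns and more rows.
slash_example <- tibble(facility = "AMKC", incident_type = rep("Slashing", 3))
stab_example  <- tibble(facility = "AMKC", incident_type = rep("Stabbing", 2))

# Stack the two pulls, then count incidents per facility and type
combined_counts <- bind_rows(slash_example, stab_example) |>
  count(facility, incident_type, name = "incidents")

combined_counts
```

With the real data, `bind_rows(slash, stab)` takes the place of the stand-ins. Because each pull is capped at `limit = 50`, the counts describe the rows pulled, not every incident ever recorded at the facility.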

## Mini analysis

As an example of exploratory analysis, the code below pulls the 100 most recent incidents, keeps slashings and stabbings, counts them by facility and incident type, and plots the counts side by side. Because it uses only a recent sample, the chart shows recent patterns across facilities rather than long-run totals.

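```{r incidents-by-facility, fig.cap="This figure shows incident types by facility."}
# Pull the 100 most recent incidents, keep slashings and stabbings,
# and count them by facility and incident type
incident_counts <- nyc_pull_dataset("gakf-suji", limit = 100) |>
  filter(incident_type %in% c("Slashing", "Stabbing")) |>
  count(facility, incident_type, name = "count")

# Side-by-side bars: one group per facility, one bar per incident type;
# preserve = "single" keeps bar widths equal when a facility has one type
ggplot(incident_counts, aes(x = facility, y = count, fill = incident_type)) +
  geom_col(position = position_dodge(preserve = "single")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Slashing vs Stabbing Incidents by Facility",
    x = "Facility",
    y = "Number of Incidents",
    fill = "Incident Type"
  )
```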
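Raw counts may partly reflect differences in facility size and in how many of each facility's records fall within the 100-row pull, so comparing each incident type's share within its facility can be easier to read. The hypothetical helper `incident_shares()` below (not part of `nycOpenData`) computes those shares; it is shown on made-up rows with placeholder facility names.

```{r incident-shares}
library(dplyr)

# Hypothetical helper: given one row per incident with `facility` and
# `incident_type` columns, return counts and each type's share of the
# slashing + stabbing incidents within its facility.
incident_shares <- function(df) {
  df |>
    filter(incident_type %in% c("Slashing", "Stabbing")) |>
    count(facility, incident_type, name = "count") |>
    group_by(facility) |>
    mutate(share = count / sum(count)) |>
    ungroup()
}

# Made-up rows for illustration only; "A" and "B" are placeholder facilities
example_incidents <- tibble(
  facility = c("A", "A", "A", "A", "B", "B"),
  incident_type = c("Slashing", "Slashing", "Slashing", "Stabbing",
                    "Stabbing", "Stabbing")
)

incident_shares(example_incidents)
```

With real data, `incident_shares(nyc_pull_dataset("gakf-suji", limit = 100))` would return one row per facility and incident type, with shares that sum to 1 within each facility.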

## Summary

The `nycOpenData` package serves as a robust interface for the NYC Open Data portal, streamlining the path from raw city APIs to actionable insights. By abstracting the complexities of data acquisition—such as pagination, type-casting, and complex filtering—it allows users to focus on analysis rather than data engineering.

## How to Cite

If you use this package for research or educational purposes, please cite it as follows:

Martinez C (2026). nycOpenData: Convenient Access to NYC Open Data API Endpoints. R package version 0.1.6, <https://martinezc1.github.io/nycOpenData/>.
