%\VignetteIndexEntry{Introduction to the ScottKnott Package (PDF)}
%\VignetteEngine{knitr::knitr}
%\VignetteEncoding{UTF-8}

\documentclass[11pt,a4paper]{article}

\usepackage[english]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[sc]{mathpazo}
\usepackage{microtype}
\usepackage{geometry}
\usepackage{titlesec}
\usepackage{xcolor}

\geometry{
  a4paper,
  tmargin=2.5cm,
  bmargin=2.5cm,
  lmargin=2.5cm,
  rmargin=2.5cm
}

%% Colour palette
\definecolor{accent}{RGB}{26, 82, 118}
\definecolor{rulegray}{RGB}{180, 180, 180}
\definecolor{titlemuted}{RGB}{90, 90, 90}

\usepackage[
  colorlinks=true,
  linkcolor=accent,
  citecolor=accent,
  urlcolor=accent
]{hyperref}
\hypersetup{
  pdftitle  = {Introduction to the ScottKnott Package},
  pdfauthor = {Faria, J. C.; Jelihovschi, E. G.; Allaman, I. B.}
}

\usepackage{url}
\usepackage{float}
\usepackage{parskip}
\usepackage{booktabs}

\setlength{\parindent}{0pt}
\setlength{\parskip}{0.65em}

\titleformat{\section}
  {\normalfont\Large\bfseries\color{accent}}
  {\thesection}{0.75em}{}
\titlespacing*{\section}{0pt}{2.25ex plus .5ex minus .2ex}{1.25ex plus .2ex}

\titleformat{\subsection}
  {\normalfont\large\bfseries\color{titlemuted}}
  {\thesubsection}{0.65em}{}

\newcommand{\HRule}{%
  \leavevmode\leaders\hrule height 0.6pt\hfill\kern0pt\relax}
\newcommand{\titleaccentline}{%
  {\color{accent}\rule{\linewidth}{1.4pt}}}

% ============================================================
\begin{document}

\begin{titlepage}
  \centering
  \vspace*{1.2cm}

  {\color{rulegray}\HRule}
  \vspace{0.55cm}

  {\Huge\bfseries\color{accent}%
    Introduction to the\\[0.3em]
    \texttt{ScottKnott} Package}

  \vspace{0.35cm}
  {\large\color{titlemuted}%
    Multiple Comparisons Using the Scott \& Knott Algorithm}

  \vspace{0.55cm}
  {\color{rulegray}\HRule}

  \vspace{2.4cm}

  \begin{minipage}[t]{0.48\textwidth}
    \raggedright
    \textbf{\color{titlemuted}Authors}\\[0.35em]
    J.~C.\ \textsc{Faria}\\
    E.~G.\ \textsc{Jelihovschi}\\
    I.~B.\ \textsc{Allaman}
  \end{minipage}%
  \hfill
  \begin{minipage}[t]{0.48\textwidth}
    \raggedleft
    \textbf{\color{titlemuted}Institution}\\[0.35em]
    Universidade Estadual\\
    de Santa Cruz --- UESC\\
    Ilh\'{e}us, Bahia, Brasil
  \end{minipage}

  \vfill

  \titleaccentline
  \vspace{0.6cm}
  {\large\today}
\end{titlepage}

% knitr global options
<<setup, include=FALSE>>=
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = '#>',
  fig.width = 6,
  fig.height = 4,
  fig.align = 'center'
)
@

\vspace{1.25cm}
\tableofcontents
\newpage

% ============================================================
\section*{Overview}
\addcontentsline{toc}{section}{Overview}

The \textbf{ScottKnott} package implements the Scott \& Knott (1974) clustering
algorithm~\cite{scottknott1974} as a multiple comparison method in the context
of Analysis of Variance (ANOVA). Classic procedures such as Tukey, Duncan, and
Newman--Keuls can place one mean in several overlapping groups. The
Scott \& Knott method instead forms \textbf{non-overlapping groups} of
treatment means: each mean belongs to exactly one group, so the grouping is
never ambiguous.

The algorithm sorts the observed treatment means in decreasing order and then
partitions them recursively. At each step the ordered means are split into the
two sub-groups that maximise the between-group sum of squares, and a
likelihood-ratio test decides whether the split is significant. Splitting
continues within each sub-group until no further significant partition is
found. The result is a complete, disjoint labelling of the treatment means that
is easy to interpret regardless of the number of treatments.
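
For reference, the test applied at each split is usually written as follows.
The notation follows Scott \& Knott (1974); the symbols $k$, $r$, $\nu$ and
$s^2$ are introduced here for illustration and are not names used by the
package. Let $\bar{y}_1 \ge \dots \ge \bar{y}_k$ be the ordered means of the
group being split, with overall mean $\bar{y}$, each based on $r$ replicates,
and let $s^2 = \mathrm{MSE}/r$ be the estimated variance of a mean with $\nu$
residual degrees of freedom. With $B_0$ the largest between-group sum of
squares over the $k-1$ ordered two-group partitions,
\[
  \hat{\sigma}_0^2 = \frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2 + \nu s^2}{k + \nu},
  \qquad
  \lambda = \frac{\pi}{2(\pi - 2)} \, \frac{B_0}{\hat{\sigma}_0^2}.
\]
Under the hypothesis of a single homogeneous group, $\lambda$ is approximately
$\chi^2$ distributed with $k/(\pi - 2)$ degrees of freedom, and the split is
kept when $\lambda$ exceeds the critical value at the chosen significance
level.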

<<library>>=
library(ScottKnott)
@
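
A single split of the kind described above can be sketched in a few lines of
base R. This only illustrates the published statistic and is not the
package's internal code; the helper \texttt{sk\_split()}, its argument names
and the toy means are hypothetical and exist only for this example.

<<split-sketch>>=
## Hand-rolled sketch of ONE Scott & Knott split (illustration only).
## means: treatment means; mse: residual mean square;
## r: replicates per mean; dfr: residual degrees of freedom.
sk_split <- function(means, mse, r, dfr) {
  m <- sort(means, decreasing = TRUE)
  k <- length(m)
  stopifnot(k >= 2)
  ## Between-group sum of squares for each of the k - 1 ordered cuts
  b <- sapply(seq_len(k - 1), function(i) {
    t1 <- sum(m[1:i])
    t2 <- sum(m[(i + 1):k])
    t1^2 / i + t2^2 / (k - i) - (t1 + t2)^2 / k
  })
  s2     <- mse / r                                  # variance of a mean
  sigma2 <- (sum((m - mean(m))^2) + dfr * s2) / (k + dfr)
  lambda <- pi / (2 * (pi - 2)) * max(b) / sigma2
  p      <- pchisq(lambda, df = k / (pi - 2), lower.tail = FALSE)
  list(cut = which.max(b), lambda = lambda, p.value = p)
}

## Toy means, made up for illustration
res_split <- sk_split(c(10, 10.5, 20, 21), mse = 1, r = 4, dfr = 12)
res_split
@

The \texttt{cut} element is the number of (largest) means placed in the first
sub-group; a full implementation would repeat the same step inside each
sub-group whenever the split is significant.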

% ============================================================
\section{Quick Start --- Completely Randomized Design (CRD)}

\texttt{CRD1} contains simulated data for a balanced CRD with \textbf{4
treatment levels} and \textbf{6 replicates} per treatment. The main function
\texttt{SK()} accepts a model formula, an \texttt{aov} object, or an
\texttt{lm} object. The \texttt{which} argument names the factor to be
compared.

<<crd-quick>>=
data(CRD1)

sk1 <- with(CRD1,
            SK(y ~ x,
               data = dfm,
               which = 'x'))
summary(sk1)
@

The summary shows, for each level, the mean and the group letter assigned by the
algorithm. Levels sharing the same letter do not differ significantly at the
default 5\,\% level.

A single call to \texttt{plot()} produces the canonical dot plot with group
letters displayed above each point:

<<crd-plot, fig.cap="CRD1: treatment means with min-max dispersion bars and SK groups.">>=
plot(sk1,
     dispersion = 'mm',
     d.col = 'steelblue')
@
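
The object returned by \texttt{SK()} can also be inspected with generic base R
tools. The sketch below assumes only that \texttt{sk1} from the chunk above
exists; it does not rely on any particular component name, since those may
differ between package versions.

<<sk-structure>>=
## Class vector and top-level components of the fitted object
class(sk1)
str(sk1, max.level = 1, give.attr = FALSE)
@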

% ============================================================
\section{Accepted Input Classes}

\texttt{SK()} dispatches on the class of its first argument. The same grouping
can be obtained from a \texttt{formula}, an \texttt{aov} object, or an
\texttt{lm} object.

<<input-classes>>=
## From: aov
av1 <- with(CRD1, aov(y ~ x, data = dfm))
sk2 <- SK(av1, which = 'x')
summary(sk2)

## From: lm
lm1 <- with(CRD1, lm(y ~ x, data = dfm))
sk3 <- SK(lm1, which = 'x')
summary(sk3)
@
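
One rough way to check that the three input classes lead to the same grouping
is to compare the printed summaries. This is a sketch that assumes
\texttt{summary()} prints plain text for \texttt{SK} objects and that
\texttt{sk1}, \texttt{sk2} and \texttt{sk3} were created as above; the
\texttt{out\_*} names are introduced here.

<<input-compare>>=
## Capture the printed summaries and compare them line by line
out_formula <- capture.output(summary(sk1))
out_aov     <- capture.output(summary(sk2))
out_lm      <- capture.output(summary(sk3))

identical(out_formula, out_aov)
identical(out_formula, out_lm)
@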

% ============================================================
\section{Unbalanced Data}

When observations are missing, \texttt{SK()} automatically adjusts the means
using least-squares means (estimated marginal means, computed via the
\textbf{emmeans} package). The call is written exactly as in the balanced case.

<<crd-unbalanced>>=
## Remove the first observation to create an unbalanced dataset
u_sk1 <- with(CRD1,
              SK(y ~ x,
                 data = dfm[-1, ],
                 which = 'x'))
summary(u_sk1)
@
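
The imbalance can be confirmed with base R. In a one-factor model such as this
one, least-squares means coincide with the arithmetic treatment means, so the
second call below gives a simple cross-check against the summary. The name
\texttt{dfm\_u} is introduced here for illustration.

<<unbal-counts>>=
## Same subset as used in the SK() call above
dfm_u <- CRD1$dfm[-1, ]

table(dfm_u$x)                              # replicates per level
aggregate(y ~ x, data = dfm_u, FUN = mean)  # arithmetic means per level
@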

The number of replicates shown at the bottom of the plot reflects the actual
(unequal) sample sizes:

<<plot-unbal, fig.cap="CRD1 (unbalanced): adjusted means with SD bars.">>=
plot(u_sk1, dispersion = 'sd', d.col = 'tomato')
@

% ============================================================
\section{Randomized Complete Block Design (RCBD)}

\texttt{RCBD} contains simulated data for a design with \textbf{5 treatment
levels} and \textbf{4 blocks}. The blocking factor \texttt{blk} is included in
the formula; \texttt{which} selects the factor of interest for the comparison.

<<rcbd>>=
data(RCBD)

sk4 <- with(RCBD,
            SK(y ~ blk + tra,
               data = dfm,
               which = 'tra'))
summary(sk4)
@
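
For context, the ANOVA table behind this comparison can be obtained with base
R. It shows the residual mean square and its degrees of freedom once the block
effect has been removed, which is the error a comparison of \texttt{tra} in a
fixed-effects fit would normally draw on. The name \texttt{av\_rcbd} is
introduced here.

<<rcbd-anova>>=
## Fixed-effects ANOVA for the RCBD: blocks first, then treatments
av_rcbd <- aov(y ~ blk + tra, data = RCBD$dfm)
summary(av_rcbd)
df.residual(av_rcbd)   # residual degrees of freedom
@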

<<rcbd-plot, fig.cap="RCBD: treatment means with individual CI bars.">>=
plot(sk4,
     dispersion = 'ci',
     d.col = 'darkgreen',
     d.lty = 2)
@

% ============================================================
\section{Significance Level}

The default significance level is \texttt{sig.level = 0.05}. A stricter
(smaller) level makes each split harder to declare significant and so tends to
yield fewer groups; a looser (larger) level tends to yield more.

<<sig-level>>=
## alpha = 0.01 (stricter)
sk_01 <- with(RCBD,
              SK(y ~ blk + tra,
                 data = dfm,
                 which = 'tra',
                 sig.level = 0.01))

## alpha = 0.10 (looser)
sk_10 <- with(RCBD,
              SK(y ~ blk + tra,
                 data = dfm,
                 which = 'tra',
                 sig.level = 0.10))

cat('--- sig.level = 0.01 ---\n')
summary(sk_01)

cat('--- sig.level = 0.10 ---\n')
summary(sk_10)
@

% ============================================================
\section{Factorial Experiment (FE)}

\texttt{FE} contains simulated data for a \textbf{3-factor factorial} design
(N, P, K), each at 2 levels, in 4 blocks. \texttt{SK()} supports both
main-effect and nested comparisons. For a nested comparison, \texttt{which}
lists the factors in colon notation (for example \texttt{'P:N'}), and the
\texttt{fl1} and \texttt{fl2} arguments select the level of the first and
second nesting factor, respectively.

<<fe-main>>=
data(FE)

## Main effect: factor N
sk5 <- with(FE,
            SK(y ~ blk + N*P*K,
               data = dfm,
               which = 'N'))
summary(sk5)
@

<<fe-nested>>=
## Nested: levels of N within level 1 of P
sk6 <- with(FE,
            SK(y ~ blk + N*P*K,
               data = dfm,
               which = 'P:N',
               fl1 = 1))
summary(sk6)

## Nested: levels of N within level 2 of P
sk7 <- with(FE,
            SK(y ~ blk + N*P*K,
               data = dfm,
               which = 'P:N',
               fl1 = 2))
summary(sk7)
@
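
Before choosing \texttt{fl1}, it can help to look at the layout of the data.
The sketch below uses base R only. It assumes, as the examples above suggest,
that \texttt{fl1 = 1} refers to the first level of P in the order returned by
\texttt{levels()}.

<<fe-layout>>=
## Number of observations in each N x P cell
xtabs(~ N + P, data = FE$dfm)

## Level order of the nesting factor (assumed to be what fl1 indexes)
levels(factor(FE$dfm$P))
@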

% ============================================================
\section{Split-Plot Experiment (SPE)}

\texttt{SPE} contains simulated data for a split-plot design with a whole-plot
factor P at \textbf{3 levels} and a sub-plot factor SP at \textbf{4 levels}.
The whole-plot factor is tested against the \texttt{blk:P} stratum rather than
the residual, so the appropriate error term must be specified via the
\texttt{error} argument.

<<spe>>=
data(SPE)

## Sub-plot factor SP (residual error, default)
sk8 <- with(SPE,
            SK(y ~ blk + P*SP + Error(blk/P),
               data = dfm,
               which = 'SP'))
summary(sk8)

## Whole-plot factor P (must specify the blk:P error term)
sk9 <- with(SPE,
            SK(y ~ blk + P*SP + Error(blk/P),
               data = dfm,
               which = 'P',
               error = 'blk:P'))
summary(sk9)
@
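
The error strata implied by the \texttt{Error(blk/P)} term can be listed with
base R, which shows where the name \texttt{'blk:P'} comes from. This is a
sketch using \texttt{aov()} directly; the name \texttt{av\_spe} is introduced
here.

<<spe-strata>>=
## Multi-stratum ANOVA for the split-plot design
av_spe <- aov(y ~ blk + P*SP + Error(blk/P), data = SPE$dfm)

names(av_spe)     # strata; should include 'blk:P'
summary(av_spe)   # P appears in the blk:P stratum, SP and P:SP in Within
@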

% ============================================================
\section{Visualisation Options}

\subsection{Dispersion bars}

Four dispersion options are available for \texttt{plot.SK()}, as summarised in
Table~\ref{tab:disp}.

\begin{table}[H]
\centering
\caption{Dispersion options available for \texttt{plot.SK()}.}
\label{tab:disp}
\begin{tabular}{ll}
\toprule
Option           & Description \\
\midrule
\texttt{'mm'}  & Min-max range (default) \\
\texttt{'sd'}  & $\pm 1$ standard deviation \\
\texttt{'ci'}  & Individual 95\,\% confidence interval \\
\texttt{'cip'} & Pooled 95\,\% confidence interval (uses MSE) \\
\bottomrule
\end{tabular}
\end{table}
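
The quantities behind these options can be approximated by hand with base R.
The sketch below uses the usual textbook definitions and a 95\,\% level; the
package's exact computation may differ in detail, and all object names are
introduced here for illustration. The \texttt{ci} and \texttt{cip} columns are
half-widths, to be added to and subtracted from the mean.

<<disp-by-hand>>=
d_crd1  <- CRD1$dfm
fit_crd <- aov(y ~ x, data = d_crd1)
mse     <- deviance(fit_crd) / df.residual(fit_crd)  # pooled mean square
n       <- tapply(d_crd1$y, d_crd1$x, length)        # replicates per level
s       <- tapply(d_crd1$y, d_crd1$x, sd)            # per-level SD

round(cbind(
  mean = tapply(d_crd1$y, d_crd1$x, mean),
  min  = tapply(d_crd1$y, d_crd1$x, min),
  max  = tapply(d_crd1$y, d_crd1$x, max),
  sd   = s,
  ci   = qt(0.975, n - 1) * s / sqrt(n),
  cip  = qt(0.975, df.residual(fit_crd)) * sqrt(mse / n)
), 3)
@

With equal replication the pooled half-width is the same for every level,
whereas the individual half-width varies with each level's own standard
deviation.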

\texttt{CRD2} provides a more visually rich example with \textbf{45 treatment
levels}:

<<plot-crd2, fig.width=8, fig.height=5, fig.cap="CRD2: 45 treatment means with pooled CI bars.">>=
data(CRD2)

sk10 <- with(CRD2,
             SK(y ~ x,
                data = dfm,
                which = 'x'))

## One colour per plotted mean: 6 + 36 + 1 + 2 = 45 values
grp_col <- c(rep(2, 6),
             rep(3, 36),
             rep(4, 1),
             rep(5, 2))

plot(sk10,
     dispersion = 'cip',
     yl = FALSE,
     id.las = 2,
     col = grp_col,
     d.col = grp_col)
@

\subsection{Comparing all four options}

<<plot-four, fig.width=8, fig.height=7, fig.cap="The four dispersion options applied to CRD1. (A) mm: min-max range; (B) sd: standard deviation; (C) ci: individual confidence interval; (D) cip: pooled confidence interval.">>=
op <- par(mfrow = c(2, 2), mar = c(4, 3, 4, 1))

plot(sk1, dispersion = 'mm',  d.col = 'steelblue')
mtext('(A)', side = 3, adj = 0, line = 2, font = 2)

plot(sk1, dispersion = 'sd',  d.col = 'tomato')
mtext('(B)', side = 3, adj = 0, line = 2, font = 2)

plot(sk1, dispersion = 'ci',  d.col = 'darkgreen')
mtext('(C)', side = 3, adj = 0, line = 2, font = 2)

plot(sk1, dispersion = 'cip', d.col = 'purple')
mtext('(D)', side = 3, adj = 0, line = 2, font = 2)

par(op)
@

\subsection{Boxplot}

\texttt{boxplot.SK()} extends the standard boxplot by overlaying the SK group
letters above the frame and drawing the treatment mean inside each box.

<<boxplot, fig.cap="CRD1: boxplot with SK group labels and means (red line).">>=
## boxplot.SK re-evaluates the data argument from the original call;
## pass CRD1$dfm directly so it is findable in any environment.
sk1_bp <- SK(y ~ x,
             data = CRD1$dfm,
             which = 'x')

boxplot(sk1_bp,
        mean.col = 'red',
        mean.lwd = 2,
        args.legend = list(x = 'topright'))
@

% ============================================================
\section{Tabular Output}

\texttt{xtable()} converts an \texttt{SK} result to an \texttt{xtable} object
for inclusion in \LaTeX{} or HTML documents. Table~\ref{tab:rcbd} shows the
Scott \& Knott grouping for the RCBD example.

<<xtable, results='asis'>>=
library(xtable)

tb <- xtable(sk4,
             caption = 'RCBD: Scott \\& Knott grouping of treatment means.',
             label = 'tab:rcbd',
             digits = 3)
print(tb,
      type = 'latex',
      caption.placement = 'top',
      include.rownames = FALSE,
      booktabs = TRUE,
      table.placement = 'H')
@

% ============================================================
\section{Mixed Models with lme4}

\texttt{SK()} also accepts \texttt{lmerMod} objects from the \textbf{lme4}
package, useful when random effects need to be modelled explicitly.

<<lmer, eval=requireNamespace('lme4', quietly=TRUE)>>=
library(lme4)

data(RCBD)

lmer1 <- with(RCBD,
              lmer(y ~ (1|blk) + tra,
                   data = dfm))

sk11 <- SK(lmer1, which = 'tra')
summary(sk11)
@
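
The variance components estimated by \texttt{lmer()} can be displayed with
\texttt{VarCorr()} from \textbf{lme4}. The sketch assumes that \texttt{lmer1}
was fitted in the chunk above, so it carries the same \texttt{eval} guard.

<<lmer-varcorr, eval=requireNamespace('lme4', quietly=TRUE)>>=
## Standard deviations of the block effect and of the residual
VarCorr(lmer1)
@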

% ============================================================
\section*{References}
\addcontentsline{toc}{section}{References}

\begin{thebibliography}{9}

\bibitem{scottknott1974}
Scott, R. J. and Knott, M. (1974).
A cluster analysis method for grouping means in the analysis of variance.
\textit{Biometrics}, \textbf{30}, 507--512.

\bibitem{jelihovschi2014}
Jelihovschi, E. G., Faria, J. C., and Allaman, I. B. (2014).
ScottKnott: A package for performing the Scott-Knott clustering algorithm in R.
\textit{Trends in Applied and Computational Mathematics}, \textbf{15}(1), 3--17.

\bibitem{conrado2017}
Conrado, T. V., Ferreira, D. F., Scapim, C. A., and Maluf, W. R. (2017).
Adjusting the Scott-Knott cluster analyses for unbalanced designs.
\textit{Crop Breeding and Applied Biotechnology}, \textbf{17}(1), 1--9.
\href{https://doi.org/10.1590/1984-70332017v17n1a1}{doi:10.1590/1984-70332017v17n1a1}

\end{thebibliography}

\end{document}
