textTinyR: Text Processing for Small or Big Data Files

The package provides functions for splitting, parsing, and tokenizing big text data files and for creating a vocabulary from them. It also includes functions for building a document-term matrix and extracting information from it (term associations, most frequent terms), for calculating token statistics (collocations, look-up tables, string dissimilarities), and for working with sparse matrices. Lastly, it includes functions for word vector representations (i.e. 'GloVe', 'fasttext') and for calculating (pairwise) text document dissimilarities. The source code is written in 'C++11' and exported to R through the 'Rcpp', 'RcppArmadillo' and 'BH' packages.

Version: 1.1.8
Depends: R (≥ 3.2.3), Matrix
Imports: Rcpp (≥ 0.12.10), R6, data.table, utils
LinkingTo: Rcpp, RcppArmadillo (≥ 0.7.8), BH
Suggests: testthat, covr, knitr, rmarkdown
Published: 2023-12-04
Author: Lampros Mouselimis [aut, cre]
Maintainer: Lampros Mouselimis <mouselimislampros at gmail.com>
BugReports: https://github.com/mlampros/textTinyR/issues
License: GPL-3
Copyright: inst/COPYRIGHTS (textTinyR copyright details)
URL: https://github.com/mlampros/textTinyR
NeedsCompilation: yes
SystemRequirements: libarmadillo: apt-get install -y libarmadillo-dev (deb)
Citation: textTinyR citation info
Materials: README NEWS
CRAN checks: textTinyR results

Documentation:

Reference manual: textTinyR.pdf
Vignettes: Functionality of the textTinyR package; Word vectors - doc2vec - text clustering

Downloads:

Package source: textTinyR_1.1.8.tar.gz
Windows binaries: r-devel: textTinyR_1.1.8.zip, r-release: textTinyR_1.1.8.zip, r-oldrel: textTinyR_1.1.8.zip
macOS binaries: r-release (arm64): textTinyR_1.1.8.tgz, r-oldrel (arm64): textTinyR_1.1.8.tgz, r-release (x86_64): textTinyR_1.1.8.tgz
Old sources: textTinyR archive

Reverse dependencies:

Reverse imports: tsrobprep

Linking:

Please use the canonical form https://CRAN.R-project.org/package=textTinyR to link to this page.