A Lightweight and Versatile NLP Toolkit



Documentation for package ‘textpress’ version 1.1.0

Help Pages

abbreviations          Common Abbreviations for Linguistic Processing
dict_generations       Demo dictionary of generation-name variants for NER
dict_political         Demo dictionary of political / partisan term variants for NER
fetch_urls             Fetch URLs from a search engine
fetch_wiki_refs        Fetch external citation URLs from Wikipedia
fetch_wiki_urls        Fetch Wikipedia page URLs by search query
get_search_urls        Get the search URL(s) used by fetch_urls (for debugging or browser use)
nlp_cast_tokens        Convert Token List to Data Frame
nlp_index_tokens       Create a BM25 Search Index
nlp_roll_chunks        Roll units into fixed-size chunks with optional context
nlp_split_paragraphs   Split Text into Paragraphs
nlp_split_sentences    Split Text into Sentences
nlp_tokenize_text      Tokenize Text Data (mostly) Non-Destructively
read_urls              Read content from URLs
search_dict            Exact n-gram matcher (vector of terms)
search_index           Search the BM25 Index
search_regex           Search corpus via regex
search_vector          Vector search by cosine similarity
util_fetch_embeddings  Fetch embeddings (Hugging Face utility)