
Extract and Analyze Key Terms from Research Groups
Source:R/sniff_groups_terms.R
sniff_groups_terms.Rd
Identifies and extracts key terms from titles and abstracts of publications within different research groups using natural language processing techniques, and computes term statistics including TF-IDF scores.
Usage
sniff_groups_terms(
net_groups,
algorithm = "rake",
phrase_pattern = "(A|N)*N(P+D*(A|N)*N)*",
n_cores = 1,
show_progress = TRUE,
n_terms = 15,
min_freq = 2,
digits = 4
)
Arguments
- net_groups
A list containing network data with publication information. Must include elements:
network
(with vertex attributes 'group', 'TI', 'AB'),pubs_by_year
, andaggregate
.- algorithm
Term extraction algorithm to use. Options are:
"rake" - Rapid Automatic Keyword Extraction (default)
"pointwise" - Pointwise Mutual Information
"phrase" - Phrase pattern matching
- phrase_pattern
Regular expression pattern for phrase extraction when algorithm = "phrase" (default: "(A|N)N(P+D(A|N)N)")
- n_cores
Number of CPU cores to use for parallel processing (default: 1)
- show_progress
Logical indicating whether to show progress bar (default: TRUE)
- n_terms
Number of top terms to return in summary table (default: 15)
- min_freq
Minimum frequency threshold for terms (default: 2)
- digits
Number of decimal places to round numerical values (default: 4)
Value
A list with two components:
terms_by_group
: A named list (by group) of data frames containing extracted terms with statisticsterms_table
: A summary tibble with top terms by frequency and TF-IDF for each group
Details
This function performs the following steps:
Validates input structure and parameters
Loads the UDPipe language model from package extdata
Processes text data (titles and abstracts) for each group
Applies the selected term extraction algorithm (RAKE, PMI, or phrase patterns)
Computes term frequencies and TF-IDF scores
Returns ranked terms for each research group with comprehensive statistics
The function uses UDPipe for tokenization, lemmatization and POS tagging before term extraction. For phrase extraction, the default pattern finds noun phrases.
Examples
if (FALSE) { # \dontrun{
# Assuming groups is output from sniff_groups()
terms <- sniff_groups_terms(groups, algorithm = "rake")
# View terms for first group
head(terms$terms_by_group[[1]])
# View summary table
print(terms$terms_table)
# Customized extraction
net_groups_terms <- sniff_groups_terms(net_groups,
algorithm = "phrase",
n_terms = 10,
min_freq = 3,
n_cores = 4
)
} # }