Computes the entropy of keyword distributions from scientific publications over a specified time range. Entropy measures the diversity and evenness of keyword usage within research groups or across the entire network.
Arguments
- network
A network object to analyze. For scope = "groups", this should be the output of sniff_groups(). For scope = "network", this should be a tbl_graph or igraph object from sniff_network().
- scope
Character specifying the analysis scope: "groups" for multiple groups or "network" for the entire network (default: "groups").
- start_year
Starting year for entropy calculation. If NULL, uses the minimum publication year found in the network data.
- end_year
Ending year for entropy calculation. If NULL, uses the maximum publication year found in the network data.
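When both year arguments are left at NULL, the range is derived from the data. A minimal sketch of that fallback logic, assuming publication years are stored in a node attribute named "year" (both the helper and the attribute name are illustrative, not part of the package API):

library(igraph)

# Hypothetical helper mirroring the documented NULL behaviour: fall back to
# the observed publication-year range when start_year / end_year are NULL.
resolve_year_range <- function(network, start_year = NULL, end_year = NULL) {
  years <- vertex_attr(network, "year")  # assumed attribute name
  c(
    start = if (is.null(start_year)) min(years, na.rm = TRUE) else start_year,
    end   = if (is.null(end_year)) max(years, na.rm = TRUE) else end_year
  )
}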
Value
A list with three components:
- data
A tibble containing entropy values for each group and year
- plots
A list of plotly objects visualizing entropy trends for each group
- years_range
A vector with the start_year and end_year used in calculations
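For orientation, a minimal sketch of inspecting this structure (column names of data beyond group and index are inferred from the Examples section and may differ):

res <- sniff_entropy(groups_data, scope = "groups")
str(res, max.level = 1)  # list of 3: data, plots, years_range
head(res$data)           # per-group, per-year entropy values (columns include group and index)
res$plots[[1]]           # render the entropy trend plot for the first group
res$years_range          # the start_year and end_year actually used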
Details
The function calculates normalized entropy based on Shannon's information theory (Shannon, 1948). Entropy quantifies the average level of uncertainty in keyword distributions and measures the randomness or disorder within research groups.
The normalized entropy is a scale-independent measure of uncertainty that can be used to compare groups with different numbers of keywords. It is calculated using the formula: $$H = -\frac{\sum_{i=1}^{n} p_i \log_2 p_i}{\log_2 n}$$ where \(p_i\) is the relative frequency of keyword \(i\) and \(n\) is the number of unique keywords. Dividing by \(\log_2 n\), the maximum attainable entropy, rescales the value to the interval [0, 1].
Entropy values range from 0 to 1, where:
- 0 indicates minimal diversity (a few dominant keywords, low uncertainty)
- 1 indicates maximal diversity (a uniform keyword distribution, high uncertainty)
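A minimal sketch of this calculation in base R, assuming keyword counts are supplied as a named numeric vector (the helper is illustrative and not part of the package API):

# Normalized Shannon entropy of a vector of keyword counts.
# Returns a value in [0, 1]; defined as 0 when only one keyword is present.
normalized_entropy <- function(counts) {
  counts <- counts[counts > 0]
  n <- length(counts)            # number of unique keywords
  if (n <= 1) return(0)
  p <- counts / sum(counts)      # relative keyword frequencies
  -sum(p * log2(p)) / log2(n)    # rescale by the maximum entropy, log2(n)
}

normalized_entropy(c(network = 5, entropy = 5, keyword = 5))   # uniform -> 1
normalized_entropy(c(network = 98, entropy = 1, keyword = 1))  # dominated -> ~0.10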
References
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
In this context, entropy captures the randomness or disorder within a group, offering insight into the diversity of research topics and the degree of thematic concentration within scientific communities.
Examples
if (FALSE) { # \dontrun{
# Calculate entropy for groups from sniff_groups() output
groups_data <- sniff_groups(your_network_data)
entropy_results <- sniff_entropy(groups_data, scope = "groups")
# Calculate entropy for entire network from sniff_network() output
network_data <- sniff_network(your_network_data)
entropy_results <- sniff_entropy(network_data, scope = "network")
# Specify custom year range
entropy_results <- sniff_entropy(
  groups_data,
  scope = "groups",
  start_year = 2010,
  end_year = 2020
)
# Access results
entropy_data <- entropy_results$data
entropy_plots <- entropy_results$plots
year_range <- entropy_results$years_range
# Interpret results: higher entropy indicates greater keyword diversity
library(dplyr)  # for the pipe and the verbs below
high_entropy_groups <- entropy_data %>%
  group_by(group) %>%
  summarise(mean_entropy = mean(index, na.rm = TRUE)) %>%
  arrange(desc(mean_entropy))
} # }