Alex J. Bowers

Teachers College, Columbia University

Yingqi Huan

Teachers College, Columbia University

Jonathan Williams

Teachers College, Columbia University

Maeghan Sill

Teachers College, Columbia University

November 1, 2025


Online Supplement 11 Appendix for Bowers et al. (2025) Mapping Public Open Access K-12 State Education Indicator Data Across 7 States and Washington D.C. Using the FAIR Data Principles: Social Network Analysis Visualizations and Code

Alex J. Bowers, Yingqi Huan, Jonathan Williams, Maeghan Sill
Teachers College, Columbia University

This document is the online R markdown supplement to the report titled “Mapping Public Open Access K-12 State Education Indicator Data Across 7 States and Washington D.C. Using the FAIR Data Principles” https://doi.org/10.7916/c0jk-5e64.

This document contains the R code and visualizations for each of the Social Network Analysis (SNA) figures in the report.

Note: For more information on each of the visualizations, please refer to the full report published online.

This research was made possible by support from The Wallace Foundation. The opinions described herein are solely those of the authors.

Online Supplement 11 Appendix for Bowers et al. (2025) Mapping Public Open Access K-12 State Education Indicator Data Across 7 States and Washington D.C. Using the FAIR Data Principles: Social Network Analysis Visualizations and Code. Teachers College, Columbia University. New York, NY. https://doi.org/10.7916/9m7q-sq86

Copyright © The Authors, 2025.

Text, images, and data in this work are licensed under Creative Commons BY 4.0 (CC-BY:4.0)
https://creativecommons.org/licenses/by/4.0/

Code, markdowns, and scripts are released under the MIT License
https://opensource.org/licenses/MIT


Social Network Analysis of Datasets

This Markdown

This R Markdown document serves as a comprehensive supplementary appendix, providing both static and interactive social network analysis (SNA) visualizations for educational equity data across eight jurisdictions (seven states plus Washington, D.C.). The document is particularly valuable because it makes the complex analytical methodology transparent and reproducible by including all the R code necessary to generate both publication-ready static network plots and interactive web-based visualizations. Each jurisdiction’s analysis examines how datasets connect to the 16 National Academies of Sciences, Engineering, and Medicine (NASEM) educational equity indicators and to one another. The code systematically processes metadata from the Metadata Megatable (MDMT) files to construct networks showing these relationships. The document enables readers to reproduce the exact static visualizations from the main report as well as to reproduce and explore dynamic versions of each network. It serves as a template for similar analyses of educational data availability and alignment with equity frameworks, making it an essential resource for researchers, policymakers, and data scientists.

The file begins with the code for each state's or district's static and dynamic social network analysis. Each of these sections loads the necessary packages, reads the respective MDMT file from the Downloads folder, and then creates the plot. The individual plots are followed by the code for calculating the social network descriptive statistics, such as average indegree, average closeness, and average betweenness, along with the corresponding visualizations. The file ends with the code for creating the panel plot of all 8 static social network analysis diagrams.

Data

The MDMT files serve as the foundational data structure for these social network analyses. There are 8 files in total, one for each of the jurisdictions from the “Mapping Public Open Access K-12 State Education Indicator Data Across 7 States and Washington D.C. Using the FAIR Data Principles” report. Each file contains comprehensive metadata about datasets found on its respective education agency’s websites, the Stanford Education Data Archive, and the Civil Rights Data Collection. These files systematically catalog each dataset’s alignment with the 16 NASEM educational equity indicators. Each row represents a unique dataset, with columns capturing essential metadata, including file identifiers, academic years, data systems, and, crucially, coding (0-4 scale) indicating whether and to what degree each dataset addresses specific equity indicators such as access to effective teaching, school climate, or postsecondary readiness. More information about the files’ contents can be found in the full report. The datasets can be downloaded from the appendix of the main report and have the following actual file names:
- 20250501_CALL-ECL-MEI_California_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/jqkg-2j17
- 20250501_CALL-ECL-MEI_Ohio_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/wv0f-jq15
- 20250501_CALL-ECL-MEI_Texas_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/1hx6-9013
- 20250501_CALL-ECL-MEI_NorthCarolina_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/mr7d-zc90
- 20250501_CALL-ECL-MEI_WashingtonDC_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/zzfq-7212
- 20250501_CALL-ECL-MEI_Kentucky_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/8gj9-ce21
- 20250501_CALL-ECL-MEI_Maryland_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/gn8t-dp57
- 20250501_CALL-ECL-MEI_Oregon_v1.3_Metadata_Megatable.csv https://doi.org/10.7916/wgbv-mx11
- MDMT data dictionary: https://doi.org/10.7916/ydwn-1s20
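
As a minimal sketch of how this coding becomes network edges, the toy example below builds a three-row stand-in for an MDMT file and converts it to an edge list. The ids, the two indicator columns shown, and all values are hypothetical. The rule that codes 1 through 4 create an edge, and that a code of 0 does not, follows the filter used in the code in this document.

```r
# Toy stand-in for an MDMT file (hypothetical ids and values;
# the real files carry 16 indicator columns plus other metadata)
toy_mdmt <- data.frame(
  id = c("file_a", "file_b", "file_c"),
  nasem_k_12_school_climate = c(3, 0, 0),
  nasem_k_12_performance_on_tests = c(4, 2, 0),
  stringsAsFactors = FALSE
)
toy_indicator_cols <- setdiff(names(toy_mdmt), "id")

# Long format: one row per (dataset, indicator) pair
toy_long <- data.frame(
  id = rep(toy_mdmt$id, times = length(toy_indicator_cols)),
  Indicator = rep(toy_indicator_cols, each = nrow(toy_mdmt)),
  Value = unlist(toy_mdmt[toy_indicator_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only coded alignments (1-4); a code of 0 produces no edge
toy_edges <- toy_long[toy_long$Value %in% 1:4, c("id", "Indicator")]
names(toy_edges) <- c("from", "to")
toy_edges
```

In this toy case `file_c` has no nonzero code, so it contributes no edge and would appear as an isolated dataset.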

Packages

This R Markdown document uses R v. 4.4.3 [@base] and several key packages for network analysis and data visualization. The primary packages include tidyverse v. 2.0.0 [@tidyverse] for data manipulation and ggplot2 (part of the tidyverse v. 2.0.0 [@tidyverse]) for static visualizations, igraph v. 2.1.4 [@igraph2006; @igraph2025] and tidygraph v. 1.3.1 [@tidygraph] for network construction and analysis, ggraph v. 2.2.1 [@ggraph] for static network plotting, and visNetwork v. 2.1.2 [@visNetwork] for interactive network visualizations. Additional packages like ggrepel v. 0.9.6 [@ggrepel] handle label positioning, magick v. 2.9.0 [@magick] processes images, stringr (part of the tidyverse v. 2.0.0 [@tidyverse]) manages text operations, and htmlwidgets v. 1.6.4 [@htmlwidgets] enables interactive web-based outputs. To support figure assembly and export, cowplot v. 1.1.3 [@cowplot] arranges multi-panel composites, patchwork v. 1.3.0 [@patchwork] combines ggplot-based visualizations into aligned layouts, and svglite v. 2.2.1 [@svglite] provides high-resolution SVG rendering for publication-quality graphics. Layout algorithms are enhanced with graphlayouts v. 1.2.2 [@graphlayouts], while RColorBrewer v. 1.1.3 [@RColorBrewer] provides color schemes. The packages work together to create both static and dynamic social network analysis visualizations, with ggraph v. 2.2.1 [@ggraph] generating publication-ready static plots and visNetwork v. 2.1.2 [@visNetwork] creating interactive web-based networks with hover functionality and dynamic layouts. All packages are available on CRAN and represent current best practices for network visualization in R.
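
Before running the code, it may help to confirm that the packages named above are installed. The check below is a sketch that uses only base R. The vector name `required_pkgs` is an assumption introduced here, and the check does not verify package versions.

```r
# Packages named in this section (versions are not checked here)
required_pkgs <- c(
  "tidyverse", "igraph", "tidygraph", "ggraph", "ggrepel", "magick",
  "visNetwork", "RColorBrewer", "htmlwidgets", "cowplot", "patchwork",
  "svglite", "graphlayouts"
)

# requireNamespace() returns FALSE, rather than erroring, when a package is absent
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

Any packages reported as missing can be installed from CRAN with `install.packages(missing_pkgs)`.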


library(grateful)

California

Static

# Load packages
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggrepel)
library(magick)
library(grid)

# Load data
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_California_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_California_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

# Define indicator columns
indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)

# Build edges (keep indicator codes 1-4; a code of 0 creates no edge)
connections_long <- mdmt_data %>%
  select(all_of(c("id", indicator_cols))) %>%
  pivot_longer(cols = all_of(indicator_cols), names_to = "Indicator", values_to = "Value") %>%
  mutate(Value = as.numeric(as.character(Value))) %>%
  filter(Value %in% c(1, 2, 3, 4))
edges <- connections_long %>% select(from = id, to = Indicator)

# Build node table
all_nodes <- bind_rows(
  data.frame(id = unique(connections_long$id), group = "File"),
  data.frame(id = indicator_cols, group = "Indicator")
)

# Build graph
g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = TRUE)
connected_datasets <- unique(edges$from)
connected_indicators <- unique(edges$to)
isolated_datasets <- setdiff(unique(mdmt_data$id), connected_datasets)
isolated_indicators <- setdiff(indicator_cols, connected_indicators)

# Colors
indicator_colors <- c(
  'nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs' = '#BC808D',
  'nasem_pre_k_academic_readiness' = '#9C9A2A',
  'nasem_pre_k_self_regulation_and_attention_skills' = '#9F94C8',
  'nasem_k_12_access_to_effective_teaching' = '#FB8072',
  'nasem_k_12_access_to_rigorous_coursework' = '#556b2f',
  'nasem_k_12_curricular_breadth' = '#FDB462',
  'nasem_k_12_access_to_high_quality_academic_supports' = '#20b2aa',
  'nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation' = '#E67AB8',
  'nasem_k_12_school_climate' = '#4443D9',
  'nasem_k_12_nonexclusionary_discipline_practices' = '#BC80BD',
  'nasem_k_12_nonacademic_supports_for_student_success' = '#993300',
  'nasem_k_12_engagement_in_schooling' = '#F31D6F',
  'nasem_k_12_performance_in_coursework' = '#2E86AB',
  'nasem_k_12_performance_on_tests' = 'darkblue',
  'nasem_ea_on_time_graduation' = '#3FA9A1',
  'nasem_ea_postsecondary_readiness' = '#339966'
)
dataset_color <- "#A7BED3"

# Build tidygraph
tg <- as_tbl_graph(g) %>%
  mutate(
    degree = degree(g)[name],
    color = ifelse(group == "Indicator", indicator_colors[name], dataset_color),
    size = ifelse(group == "Indicator", log1p(degree) * 3 + 3, 1.1)
  )
set.seed(123)
layout_df <- create_layout(
  tg, layout = "fr", niter = 5000, area = vcount(tg)^3.3, repulserad = vcount(tg)^3
)
layout_df <- layout_df %>%
  mutate(
    name = str_replace_all(name, "nasem_", ""),
    name = str_replace_all(name, "ea_", ""),
    name = str_replace_all(name, "pre_k_", "pre_k_12_")
  )

# Compute metrics for annotations
g_undirected <- as.undirected(g, mode = "collapse")
options(scipen = 999)
avg_indegree <- round(mean(degree(g, mode = "in"), na.rm = TRUE), 2)
avg_closeness <- round(mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE), 4)
avg_eigen <- round(mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE), 2)
avg_betweenness <- round(mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE), 2)
ca_metrics <- data.frame(
  State = "California",
  Indegree = avg_indegree,
  Closeness = avg_closeness,
  Eigenvector = avg_eigen,
  Betweenness = avg_betweenness
)
avg_indegree <- sprintf("%.2f", mean(degree(g, mode = "in"), na.rm = TRUE))
avg_closeness <- sprintf("%.4f", mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE))
avg_eigen <- sprintf("%.2f", mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE))
avg_betweenness <- sprintf("%.2f", mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE))

total_datasets <- n_distinct(mdmt_data$id)
total_indicators <- n_distinct(edges$to)
x_center <- mean(layout_df$x, na.rm = TRUE)
y_top <- max(layout_df$y, na.rm = TRUE)

# Plot (main)
p_ca <- ggraph(layout_df) +
  geom_edge_link(color = "gray60", width = 0.25, alpha = 0.6) +
  geom_node_point(
    aes(fill = I(color), size = size),
    shape = 21, color = "white", stroke = 0.6, alpha = 1, show.legend = FALSE
  ) +
  geom_text_repel(
    data = subset(layout_df, group == "Indicator"),
    aes(x = x, y = y, label = name, color = I(color)),
    size = 2.3, fontface = "bold", alpha = 0.95,
    box.padding = 0.35, point.padding = 0.2,
    segment.color = "gray70", segment.size = 0.18,
    force = 2.5, max.overlaps = Inf, seed = 123
  ) +
  coord_fixed(ratio = 1, clip = "off") +
  theme_void(base_size = 13) +
  annotate("text", x = x_center, y = y_top + 9, label = "California",
           fontface = "bold", size = 6.5, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 7,
           label = paste0("Total N of Datasets: ", total_datasets,
                          " | Equity Indicators: N = ", total_indicators, " out of 16"),
           size = 3, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 5.8,
           label = paste0("Average Indegree: ", avg_indegree,
                          " | Average Closeness: ", avg_closeness,
                          " | Average Betweenness: ", avg_betweenness),
           size = 3, hjust = 0.5)
p_ca

p <- p_ca
# Export
desktop_path <- file.path(Sys.getenv("HOME"), "Desktop")
output_path <- file.path(desktop_path, "SNA_outputs")
if (!dir.exists(output_path)) dir.create(output_path, recursive = TRUE)
state <- "CA"
ggsave(file.path(output_path, paste0(state, "_SNA.png")),
       plot = p, width = 5.5, height = 5.5, dpi = 300, units = "in", bg = "white")
ggsave(file.path(output_path, paste0(state, "_SNA.jpg")),
       plot = p, width = 5.5, height = 5.5, dpi = 300, units = "in", bg = "white")
ggsave(file.path(output_path, paste0(state, "_SNA.svg")),
       plot = p, width = 5.5, height = 5.5, units = "in", bg = "white")
img <- image_read(file.path(output_path, paste0(state, "_SNA.png")))
image_write(img, file.path(output_path, paste0(state, "_SNA.gif")))
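
The average indegree shown in the plot annotation can be read as a simple ratio. In the static graph, edges run from datasets to indicators, and the node table holds the connected datasets plus all 16 indicators, so the mean indegree over all nodes equals the number of edges divided by the number of nodes. The toy sketch below illustrates this arithmetic in base R with hypothetical ids. For a directed graph, `degree(g, mode = "in")` in igraph returns the same per-node counts.

```r
# Toy edge list (hypothetical ids): datasets point to the indicators they address
toy_edges <- data.frame(
  from = c("file_a", "file_a", "file_b"),
  to   = c("ind_1", "ind_2", "ind_2"),
  stringsAsFactors = FALSE
)

# Node set built as in the static code: connected datasets plus every
# indicator, including one (ind_3) that no dataset reaches
toy_nodes <- c(unique(toy_edges$from), "ind_1", "ind_2", "ind_3")

# Indegree = number of incoming edges per node; dataset nodes always get 0
toy_indegree <- vapply(toy_nodes, function(n) sum(toy_edges$to == n), integer(1))

# The mean over all nodes reduces to (number of edges) / (number of nodes)
toy_avg_indegree <- mean(toy_indegree)
all.equal(toy_avg_indegree, nrow(toy_edges) / length(toy_nodes))
```

Because dataset nodes and unreached indicators both contribute zeros, a jurisdiction with many datasets that each address few indicators will show a low average indegree even when individual indicators are well covered.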



Dynamic

# Load packages
library(tidyverse)
library(igraph)
library(visNetwork)
library(RColorBrewer)
library(htmlwidgets)
set.seed(123)

# Data Loading and Initial Setup
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_California_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_California_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)
file_attr_cols <- c("id", "actual_file_name", "academic_year", "data_system_for_download")

# Color Schemes and Mapping
COLOR_INDICATOR_PALETTE <- c(
    "#BC808D", "#bbb434", "#BEBADA", "#FB8072", "#556b2f", "#FDB462",
    "#20b2aa", "#FCCDE5", "#4443D9", "#BC80BD", "#993300", "#F31D6F",
    "#440154", "#234151", "#390000", "#5DC863"
)
names(COLOR_INDICATOR_PALETTE) <- indicator_cols 

COLOR_FILE_NEW_LIGHT <- "#81D4C6"
COLOR_FILE_BORDER_LIGHT <- "#60B8AE"
WHITE_BORDER <- "#FFFFFF"

COLOR_EDGE_NEUTRAL <- "#D3D3D3" 
COLOR_EDGE_HIGHLIGHT <- "#A9A9A9" 

indicator_map <- data.frame(
    id = indicator_cols,
    full_label = c(
        "Pre-K Access to and Participation in High-Quality Pre-K Programs",
        "Pre-K Academic Readiness",
        "Pre-K Self-Regulation and Attention Skills",
        "K-12 Access to Effective Teaching",
        "K-12 Access to Rigorous Coursework",
        "K-12 Curricular Breadth",
        "K-12 Access to High-Quality Academic Supports",
        "K-12 Students’ Exposure to Racial, Ethnic, and Economic Segregation",
        "K-12 School Climate",
        "K-12 Nonexclusionary Discipline Practices",
        "K-12 Nonacademic Supports for Student Success",
        "K-12 Engagement in Schooling",
        "K-12 Performance in Coursework",
        "K-12 Performance on Tests",
        "On-Time Graduation",
        "Postsecondary Readiness"
    ),
    color_bg = COLOR_INDICATOR_PALETTE,
    stringsAsFactors = FALSE
)

# Data Preprocessing and Network Construction
file_attributes <- mdmt_data %>%
    select(all_of(file_attr_cols)) %>%
    mutate(id = as.character(id)) %>%
    rename(label = id, title_file = actual_file_name)

connections_long <- mdmt_data %>%
    select(all_of(c("id", indicator_cols))) %>%
    pivot_longer(
        cols = all_of(indicator_cols),
        names_to = "Indicator",
        values_to = "Value"
    ) %>%
    mutate(
        id = as.character(id),
        Value = as.numeric(as.character(Value))
    ) %>%
    filter(Value %in% c(1, 2, 3, 4))

edges <- connections_long %>%
    select(from = id, to = Indicator)

all_nodes <- bind_rows(
    data.frame(id = file_attributes$label, group = "File", stringsAsFactors = FALSE),
    data.frame(id = indicator_cols, group = "Indicator", stringsAsFactors = FALSE)
) %>%
    distinct(id, .keep_all = TRUE)

g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = FALSE)
indicator_degrees <- degree(g, v = V(g)[V(g)$group == "Indicator"], mode = "all")

# Year Range Logic
get_compact_year_range <- function(years) {
    years <- years[!is.na(years)]
    years <- as.integer(years)
    years <- years[!is.na(years)]

    if (length(years) == 0) {
        return("No valid year data")
    }

    years <- sort(unique(years))
    min_year <- min(years)
    max_year <- max(years)

    if (min_year == max_year) {
        return(as.character(min_year))
    }

    full_range <- seq(min_year, max_year)
    missing_years <- setdiff(full_range, years)

    year_range_text <- paste0(min_year, " - ", max_year)

    if (length(missing_years) > 0) {
        missing_text <- paste(missing_years, collapse = ", ")
        return(paste0(year_range_text, " (Missing: ", missing_text, ")"))
    } else {
        return(year_range_text)
    }
}

# Prepare visNetwork Node Data
# Setting node sizes and colors
nodes_vis <- all_nodes %>%
    left_join(file_attributes, by = c("id" = "label")) %>%
    left_join(indicator_map %>% select(id, full_label, color_bg), by = "id") %>%
    distinct(id, .keep_all = TRUE) %>%
    mutate(
        # Node colors and white border
        color.background = case_when(
            group == "File" ~ COLOR_FILE_NEW_LIGHT,
            group == "Indicator" ~ color_bg
        ),
        color.border = WHITE_BORDER, 
        borderWidth = 3,
        
        # Highlight colors 
        color.highlight.background = color.background,
        color.highlight.border = WHITE_BORDER,
        
        # Node size
        value = case_when(
            group == "Indicator" ~ indicator_degrees[id] * 7 + 20,
            group == "File" ~ 15
        ),
        
        # Labels
        label = case_when(
            group == "Indicator" ~ full_label,
            group == "File" ~ ""
        ),
        
        font.color = case_when(
            group == "Indicator" ~ color_bg,
            TRUE ~ "transparent"
        ),
        font.size = ifelse(group == "Indicator", 18, 12), 

        dropdown_label = ifelse(group == "Indicator", full_label, "") 
    )

# Calculate Indicator Hover Info
for (ind in indicator_cols) {
    connected_files_id <- connections_long %>%
      filter(Indicator == ind) %>%
      pull(id) %>% 
      unique()

    connected_files <- mdmt_data %>%
        mutate(id = as.character(id)) %>%
        filter(id %in% connected_files_id) %>%
        distinct(id, data_system_for_download, .keep_all = TRUE) %>%
        select(Year_Str = academic_year, data_system_for_download) %>%
        mutate(
            Year = as.integer(str_extract(Year_Str, "^\\d{4}"))
        )

    ds_stats <- connected_files %>% group_by(data_system_for_download) %>% summarise(Count = n())

    year_vector <- connected_files %>% pull(Year) %>% unique()
    compact_year_text <- get_compact_year_range(year_vector)

    ds_text <- paste(ds_stats$data_system_for_download, ds_stats$Count, sep = ": ", collapse = "<br>")
    final_title <- paste0(
        "<b>Indicator Name:</b> ", indicator_map$full_label[indicator_map$id == ind],
        "<br><b>Total Connected Files:</b> ", length(connected_files_id),
        "<br><b>Year Span:</b> ", compact_year_text,
        "<br><b>Data System Distribution:</b><br>", ds_text
    )

    nodes_vis[nodes_vis$id == ind, "title"] <- final_title
}

# Handle File Node Title
nodes_vis <- nodes_vis %>%
    mutate(
        title = ifelse(group == "File",
                       paste0("<b>File ID:</b> ", id,
                              "<br><b>Name:</b> ", title_file,
                              "<br><b>Year:</b> ", academic_year,
                              "<br><b>System:</b> ", data_system_for_download),
                       title)
    ) %>%
    select(-title_file, -full_label, -color_bg, -academic_year)

# Prepare visNetwork edge data
edges_vis_colored <- edges %>%
    mutate(
        id = 1:n(),
        color = list(list(color = COLOR_EDGE_NEUTRAL, highlight = COLOR_EDGE_HIGHLIGHT))
    ) %>%
    select(from, to, id, color)

# Prepare the number of files data
total_files <- length(unique(mdmt_data$id))
connected_files <- length(unique(edges$from))

# Only keep connected nodes
nodes_vis <- nodes_vis %>%
  filter(id %in% unique(c(edges$from, edges$to)))

# Generate visNetwork Visualization
visNetwork(nodes_vis, edges_vis_colored,
           main = paste0(
    "<div style='text-align:center;'>",
    "<span style='font-size:20px; font-weight:bold;'>California SNA</span><br>",
    "<span style='font-size:13px; font-weight:normal;'>",
    "Total N of Datasets: ", total_files,
    "</span></div>"
  ))  %>%
  visIgraphLayout(randomSeed = 123) %>%
  visOptions(
    highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)
  ) %>%
  visPhysics(
    solver = "barnesHut",
    barnesHut = list(
      gravitationalConstant = -15000,
      springLength = 100
    )
  ) %>%
  visNodes(
    shape = "dot",
    font = list(
      color = nodes_vis$font.color,
      size = nodes_vis$font.size,
      face = "arial",
      vadjust = 0, 
      strokeWidth = 1, 
      strokeColor = "white"
    ),
    scaling = list(
      min = 15,
      max = 60,
      label = list(
        enabled = TRUE,
        min = 19,
        max = 30,
        maxVisible = 26
      )
    )
  ) %>%
  visEdges(
    color = list(inherit = FALSE),
    smooth = FALSE,
    hover = list(color = COLOR_EDGE_HIGHLIGHT),
    selection = list(color = COLOR_EDGE_HIGHLIGHT)
  )
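
The hover text for each indicator node reports a compact year span built by `get_compact_year_range()`. The sketch below walks through the same logic on toy input using base R only. The `academic_year` strings are hypothetical, and the real MDMT format may differ. The code above extracts the leading four digits with a regular expression, whereas this sketch simply takes the first four characters.

```r
# Hypothetical academic_year strings; the real MDMT format may differ
toy_years_raw <- c("2017-2018", "2018-2019", "2021-2022", NA)

# Starting year of each entry; NA entries stay NA and are dropped
toy_years <- as.integer(substr(toy_years_raw, 1, 4))
toy_years <- sort(unique(toy_years[!is.na(toy_years)]))

# Years inside the span that have no connected file
toy_missing <- setdiff(seq(min(toy_years), max(toy_years)), toy_years)

# Span label, with gaps listed when any exist
toy_label <- paste0(min(toy_years), " - ", max(toy_years))
if (length(toy_missing) > 0) {
  toy_label <- paste0(
    toy_label, " (Missing: ", paste(toy_missing, collapse = ", "), ")"
  )
}
toy_label
```

Listing the missing years alongside the span makes gaps in coverage visible in the tooltip without enumerating every available year.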




Ohio

Static

# Load packages
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggrepel)
library(magick)
library(grid)

# Load data and initial setup
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_Ohio_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_Ohio_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

# Define indicator columns
indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)

# Build edges
connections_long <- mdmt_data %>%
  select(all_of(c("id", indicator_cols))) %>%
  pivot_longer(cols = all_of(indicator_cols), names_to = "Indicator", values_to = "Value") %>%
  mutate(Value = as.numeric(as.character(Value))) %>%
  filter(Value %in% c(1, 2, 3, 4))
edges <- connections_long %>% select(from = id, to = Indicator)

# Build node table
all_nodes <- bind_rows(
  data.frame(id = unique(connections_long$id), group = "File"),
  data.frame(id = indicator_cols, group = "Indicator")
)

# Build graph
g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = TRUE)
connected_datasets <- unique(edges$from)
connected_indicators <- unique(edges$to)
isolated_datasets <- setdiff(unique(mdmt_data$id), connected_datasets)
isolated_indicators <- setdiff(indicator_cols, connected_indicators)

# Colors
indicator_colors <- c(
  'nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs' = '#BC808D',
  'nasem_pre_k_academic_readiness' = '#9C9A2A',
  'nasem_pre_k_self_regulation_and_attention_skills' = '#9F94C8',
  'nasem_k_12_access_to_effective_teaching' = '#FB8072',
  'nasem_k_12_access_to_rigorous_coursework' = '#556b2f',
  'nasem_k_12_curricular_breadth' = '#FDB462',
  'nasem_k_12_access_to_high_quality_academic_supports' = '#20b2aa',
  'nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation' = '#E67AB8',
  'nasem_k_12_school_climate' = '#4443D9',
  'nasem_k_12_nonexclusionary_discipline_practices' = '#BC80BD',
  'nasem_k_12_nonacademic_supports_for_student_success' = '#993300',
  'nasem_k_12_engagement_in_schooling' = '#F31D6F',
  'nasem_k_12_performance_in_coursework' = '#2E86AB',
  'nasem_k_12_performance_on_tests' = 'darkblue',
  'nasem_ea_on_time_graduation' = '#3FA9A1',
  'nasem_ea_postsecondary_readiness' = '#339966'
)
dataset_color <- "#A7BED3"

# Build tidygraph
tg <- as_tbl_graph(g) %>%
  mutate(
    degree = degree(g)[name],
    color = ifelse(group == "Indicator", indicator_colors[name], dataset_color),
    size = ifelse(group == "Indicator", log1p(degree) * 3 + 3, 1.1)
  )
set.seed(123)
layout_df <- create_layout(
  tg, layout = "fr", niter = 5000, area = vcount(tg)^3.3, repulserad = vcount(tg)^3
)
layout_df <- layout_df %>%
  mutate(
    name = str_replace_all(name, "nasem_", ""),
    name = str_replace_all(name, "ea_", ""),
    name = str_replace_all(name, "pre_k_", "pre_k_12_")
  )

# Compute metrics for annotations
g_undirected <- as.undirected(g, mode = "collapse")
options(scipen = 999)
avg_indegree <- round(mean(degree(g, mode = "in"), na.rm = TRUE), 2)
avg_closeness <- round(mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE), 4)
avg_eigen <- round(mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE), 2)
avg_betweenness <- round(mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE), 2)
oh_metrics <- data.frame(
  State = "Ohio",
  Indegree = avg_indegree,
  Closeness = avg_closeness,
  Eigenvector = avg_eigen,
  Betweenness = avg_betweenness
)
avg_indegree <- sprintf("%.2f", mean(degree(g, mode = "in"), na.rm = TRUE))
avg_closeness <- sprintf("%.4f", mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE))
avg_eigen <- sprintf("%.2f", mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE))
avg_betweenness <- sprintf("%.2f", mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE))

total_datasets <- n_distinct(mdmt_data$id)
total_indicators <- n_distinct(edges$to)
x_center <- mean(layout_df$x, na.rm = TRUE)
y_top <- max(layout_df$y, na.rm = TRUE)

# Plot (main)
p_oh <- ggraph(layout_df) +
  geom_edge_link(color = "gray60", width = 0.25, alpha = 0.6) +
  geom_node_point(
    aes(fill = I(color), size = size),
    shape = 21, color = "white", stroke = 0.6, alpha = 1, show.legend = FALSE
  ) +
  geom_text_repel(
    data = subset(layout_df, group == "Indicator"),
    aes(x = x, y = y, label = name, color = I(color)),
    size = 2.3, fontface = "bold", alpha = 0.95,
    box.padding = 0.35, point.padding = 0.2,
    segment.color = "gray70", segment.size = 0.18,
    force = 15, max.overlaps = Inf, seed = 123
  ) +
  coord_fixed(ratio = 1, clip = "off") +
  theme_void(base_size = 13) +
  annotate("text", x = x_center, y = y_top + 9, label = "Ohio",
           fontface = "bold", size = 6.5, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 7,
           label = paste0("Total N of Datasets: ", total_datasets,
                          " | Equity Indicators: N = ", total_indicators, " out of 16"),
           size = 3, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 5.8,
           label = paste0("Average Indegree: ", avg_indegree,
                          " | Average Closeness: ", avg_closeness,
                          " | Average Betweenness: ", avg_betweenness),
           size = 3, hjust = 0.5)
p_oh

p <- p_oh
# Export
desktop_path <- file.path(Sys.getenv("HOME"), "Desktop")
output_path <- file.path(desktop_path, "SNA_outputs")
if (!dir.exists(output_path)) dir.create(output_path, recursive = TRUE)
state <- "OH"
ggsave(file.path(output_path, paste0(state, "_SNA.png")),
       plot = p, width = 5.5, height = 5.5, dpi = 300, units = "in", bg = "white")
ggsave(file.path(output_path, paste0(state, "_SNA.jpg")),
       plot = p, width = 5.5, height = 5.5, dpi = 300, units = "in", bg = "white")
ggsave(file.path(output_path, paste0(state, "_SNA.svg")),
       plot = p, width = 5.5, height = 5.5, units = "in", bg = "white")
img <- image_read(file.path(output_path, paste0(state, "_SNA.png")))
image_write(img, file.path(output_path, paste0(state, "_SNA.gif")))



Dynamic

# Load packages
library(tidyverse)
library(igraph)
library(visNetwork)
library(RColorBrewer)
library(htmlwidgets)
set.seed(123)

# Data Loading and Initial Setup
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_Ohio_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_Ohio_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)
file_attr_cols <- c("id", "actual_file_name", "academic_year", "data_system_for_download")

# Color Schemes and Mapping
COLOR_INDICATOR_PALETTE <- c(
    "#BC808D", "#bbb434", "#BEBADA", "#FB8072", "#556b2f", "#FDB462",
    "#20b2aa", "#FCCDE5", "#4443D9", "#BC80BD", "#993300", "#F31D6F",
    "#440154", "#234151", "#390000", "#5DC863"
)
names(COLOR_INDICATOR_PALETTE) <- indicator_cols 

COLOR_FILE_NEW_LIGHT <- "#81D4C6"
COLOR_FILE_BORDER_LIGHT <- "#60B8AE"
WHITE_BORDER <- "#FFFFFF"

COLOR_EDGE_NEUTRAL <- "#D3D3D3" 
COLOR_EDGE_HIGHLIGHT <- "#A9A9A9" 

indicator_map <- data.frame(
    id = indicator_cols,
    full_label = c(
        "Pre-K Access to and Participation in High-Quality Pre-K Programs",
        "Pre-K Academic Readiness",
        "Pre-K Self-Regulation and Attention Skills",
        "K-12 Access to Effective Teaching",
        "K-12 Access to Rigorous Coursework",
        "K-12 Curricular Breadth",
        "K-12 Access to High-Quality Academic Supports",
        "K-12 Students’ Exposure to Racial, Ethnic, and Economic Segregation",
        "K-12 School Climate",
        "K-12 Nonexclusionary Discipline Practices",
        "K-12 Nonacademic Supports for Student Success",
        "K-12 Engagement in Schooling",
        "K-12 Performance in Coursework",
        "K-12 Performance on Tests",
        "On-Time Graduation",
        "Postsecondary Readiness"
    ),
    color_bg = COLOR_INDICATOR_PALETTE,
    stringsAsFactors = FALSE
)

# Data Preprocessing and Network Construction
file_attributes <- mdmt_data %>%
    select(all_of(file_attr_cols)) %>%
    mutate(id = as.character(id)) %>%
    rename(label = id, title_file = actual_file_name)

connections_long <- mdmt_data %>%
    select(all_of(c("id", indicator_cols))) %>%
    pivot_longer(
        cols = all_of(indicator_cols),
        names_to = "Indicator",
        values_to = "Value"
    ) %>%
    mutate(
        id = as.character(id),
        Value = as.numeric(as.character(Value))
    ) %>%
    filter(Value %in% c(1, 2, 3, 4))

edges <- connections_long %>%
    select(from = id, to = Indicator)

all_nodes <- bind_rows(
    data.frame(id = file_attributes$label, group = "File", stringsAsFactors = FALSE),
    data.frame(id = indicator_cols, group = "Indicator", stringsAsFactors = FALSE)
) %>%
    distinct(id, .keep_all = TRUE)

g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = FALSE)
indicator_degrees <- degree(g, v = V(g)[V(g)$group == "Indicator"], mode = "all")

# Year Range Logic
get_compact_year_range <- function(years) {
    years <- years[!is.na(years)]
    years <- as.integer(years)
    years <- years[!is.na(years)]

    if (length(years) == 0) {
        return("No valid year data")
    }

    years <- sort(unique(years))
    min_year <- min(years)
    max_year <- max(years)

    if (min_year == max_year) {
        return(as.character(min_year))
    }

    full_range <- seq(min_year, max_year)
    missing_years <- setdiff(full_range, years)

    year_range_text <- paste0(min_year, " - ", max_year)

    if (length(missing_years) > 0) {
        missing_text <- paste(missing_years, collapse = ", ")
        return(paste0(year_range_text, " (Missing: ", missing_text, ")"))
    } else {
        return(year_range_text)
    }
}

# Prepare visNetwork Node Data
# Setting node sizes and colors
nodes_vis <- all_nodes %>%
    left_join(file_attributes, by = c("id" = "label")) %>%
    left_join(indicator_map %>% select(id, full_label, color_bg), by = "id") %>%
    distinct(id, .keep_all = TRUE) %>%
    mutate(
        # Node colors and white border
        color.background = case_when(
            group == "File" ~ COLOR_FILE_NEW_LIGHT,
            group == "Indicator" ~ color_bg
        ),
        color.border = WHITE_BORDER, 
        borderWidth = 3,
        
        # Highlight colors 
        color.highlight.background = color.background,
        color.highlight.border = WHITE_BORDER,
        
        # Node size
        value = case_when(
            group == "Indicator" ~ indicator_degrees[id] * 7 + 20,
            group == "File" ~ 15
        ),
        
        # Labels
        label = case_when(
            group == "Indicator" ~ full_label,
            group == "File" ~ ""
        ),
        
        font.color = case_when(
            group == "Indicator" ~ color_bg,
            TRUE ~ "transparent"
        ),
        font.size = ifelse(group == "Indicator", 18, 12), 

        dropdown_label = ifelse(group == "Indicator", full_label, "") 
    )

# Calculate Indicator Hover Info
for (ind in indicator_cols) {
    connected_files_id <- connections_long %>%
      filter(Indicator == ind) %>%
      pull(id) %>% 
      unique()

    connected_files <- mdmt_data %>%
        mutate(id = as.character(id)) %>%
        filter(id %in% connected_files_id) %>%
        distinct(id, data_system_for_download, .keep_all = TRUE) %>%
        select(Year_Str = academic_year, data_system_for_download) %>%
        mutate(
            Year = as.integer(str_extract(Year_Str, "^\\d{4}"))
        )

    ds_stats <- connected_files %>% group_by(data_system_for_download) %>% summarise(Count = n())

    year_vector <- connected_files %>% pull(Year) %>% unique()
    compact_year_text <- get_compact_year_range(year_vector)

    ds_text <- paste(ds_stats$data_system_for_download, ds_stats$Count, sep = ": ", collapse = "<br>")
    final_title <- paste0(
        "<b>Indicator Name:</b> ", indicator_map$full_label[indicator_map$id == ind],
        "<br><b>Total Connected Files:</b> ", length(connected_files_id),
        "<br><b>Year Span:</b> ", compact_year_text,
        "<br><b>Data System Distribution:</b><br>", ds_text
    )

    nodes_vis[nodes_vis$id == ind, "title"] <- final_title
}

# Handle File Node Title
nodes_vis <- nodes_vis %>%
    mutate(
        title = ifelse(group == "File",
                       paste0("<b>File ID:</b> ", id,
                              "<br><b>Name:</b> ", title_file,
                              "<br><b>Year:</b> ", academic_year,
                              "<br><b>System:</b> ", data_system_for_download),
                       title)
    ) %>%
    select(-title_file, -full_label, -color_bg, -academic_year)

# Prepare visNetwork edge data
edges_vis_colored <- edges %>%
    mutate(
        id = 1:n(),
        color = list(list(color = COLOR_EDGE_NEUTRAL, highlight = COLOR_EDGE_HIGHLIGHT))
    ) %>%
    select(from, to, id, color)

# Count all files and the files linked to at least one indicator
total_files <- length(unique(mdmt_data$id))
connected_files <- length(unique(edges$from))

# Keep only nodes that appear in at least one edge
nodes_vis <- nodes_vis %>%
  filter(id %in% unique(c(edges$from, edges$to)))

# Generate visNetwork Visualization
visNetwork(
  nodes_vis, edges_vis_colored,
  main = paste0(
    "<div style='text-align:center;'>",
    "<span style='font-size:20px; font-weight:bold;'>Ohio SNA</span><br>",
    "<span style='font-size:13px; font-weight:normal;'>",
    "Total N of Datasets: ", total_files,
    "</span></div>"
  )
) %>%
  visIgraphLayout(randomSeed = 123) %>%
  visOptions(
    highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)
  ) %>%
  visPhysics(
    solver = "barnesHut",
    barnesHut = list(
      gravitationalConstant = -15000,
      springLength = 100
    )
  ) %>%
  visNodes(
    shape = "dot",
    font = list(
      color = nodes_vis$font.color,
      size = nodes_vis$font.size,
      face = "arial",
      vadjust = 0, 
      strokeWidth = 1, 
      strokeColor = "white"
    ),
    scaling = list(
      min = 15,
      max = 60,
      label = list(
        enabled = TRUE,
        min = 19,
        max = 30,
        maxVisible = 26
      )
    )
  ) %>%
  visEdges(
    color = list(inherit = FALSE),
    smooth = FALSE,
    hover = list(color = COLOR_EDGE_HIGHLIGHT),
    selection = list(color = COLOR_EDGE_HIGHLIGHT)
  )
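
The "Year Span" line in each indicator's hover text comes from two steps in the chunk above: taking the leading four digits of `academic_year`, then reporting the span and any gaps. The following is a minimal sketch of that logic on made-up values. The `academic_year` strings here are hypothetical, and the "YYYY-YYYY" format is an assumption rather than something confirmed from the Metadata Megatable files.

```r
library(stringr)

# Hypothetical academic-year strings (format assumed, not taken from the MDMT files)
academic_year <- c("2018-2019", "2019-2020", "2021-2022", NA, "unknown")

# Same parsing step as in the hover-text loop: keep the leading four digits
start_year <- as.integer(str_extract(academic_year, "^\\d{4}"))
start_year <- sort(unique(start_year[!is.na(start_year)]))

# Years inside the observed span that have no file
missing_years <- setdiff(seq(min(start_year), max(start_year)), start_year)

year_span <- paste0(min(start_year), " - ", max(start_year))
if (length(missing_years) > 0) {
  year_span <- paste0(
    year_span, " (Missing: ", paste(missing_years, collapse = ", "), ")"
  )
}
print(year_span)
# "2018 - 2021 (Missing: 2020)"
```

Entries that do not start with four digits become `NA` and are dropped, so they never show up as a missing year.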




Texas

Static

# Load packages
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggrepel)
library(graphlayouts)
library(magick)
library(grid)

# Load data
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_Texas_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_Texas_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

# Define indicator columns
indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)

# Build edges
connections_long <- mdmt_data %>%
  select(all_of(c("id", indicator_cols))) %>%
  pivot_longer(cols = all_of(indicator_cols), names_to = "Indicator", values_to = "Value") %>%
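  # Cast id to character (as in the dynamic chunk) so file and indicator ids share one type
  mutate(id = as.character(id), Value = as.numeric(as.character(Value))) %>%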
  filter(Value %in% c(1, 2, 3, 4))
edges <- connections_long %>% select(from = id, to = Indicator)

# Build node table
all_nodes <- bind_rows(
  data.frame(id = unique(connections_long$id), group = "File"),
  data.frame(id = indicator_cols, group = "Indicator")
)

# Build graph
g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = TRUE)
connected_datasets <- unique(edges$from)
connected_indicators <- unique(edges$to)
isolated_datasets <- setdiff(unique(mdmt_data$id), connected_datasets)
isolated_indicators <- setdiff(indicator_cols, connected_indicators)

# Colors
indicator_colors <- c(
  'nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs' = '#BC808D',
  'nasem_pre_k_academic_readiness' = '#9C9A2A',
  'nasem_pre_k_self_regulation_and_attention_skills' = '#9F94C8',
  'nasem_k_12_access_to_effective_teaching' = '#FB8072',
  'nasem_k_12_access_to_rigorous_coursework' = '#556b2f',
  'nasem_k_12_curricular_breadth' = '#FDB462',
  'nasem_k_12_access_to_high_quality_academic_supports' = '#20b2aa',
  'nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation' = '#E67AB8',
  'nasem_k_12_school_climate' = '#4443D9',
  'nasem_k_12_nonexclusionary_discipline_practices' = '#BC80BD',
  'nasem_k_12_nonacademic_supports_for_student_success' = '#993300',
  'nasem_k_12_engagement_in_schooling' = '#F31D6F',
  'nasem_k_12_performance_in_coursework' = '#2E86AB',
  'nasem_k_12_performance_on_tests' = 'darkblue',
  'nasem_ea_on_time_graduation' = '#3FA9A1',
  'nasem_ea_postsecondary_readiness' = '#339966'
)
dataset_color <- "#A7BED3"

# Build tidygraph
tg <- as_tbl_graph(g) %>%
  mutate(
    degree = degree(g)[name],
    color = ifelse(group == "Indicator", indicator_colors[name], dataset_color),
    size = ifelse(group == "Indicator", log1p(degree) * 3 + 3, 1.1)
  )
set.seed(123)
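# Alternative layouts kept for reference:
# layout_df <- create_layout(tg, layout = "fr", niter = 5000, area = vcount(tg)^3.3, repulserad = vcount(tg)^3)
# layout_df <- create_layout(tg, layout = "stress")
# Stress layout from graphlayouts, passed to ggraph as manual coordinates
layout_matrix <- layout_with_stress(tg)
layout_df <- create_layout(
  graph = tg, layout = "manual",
  x = layout_matrix[, 1], y = layout_matrix[, 2]
)
# Shorten labels for plotting: drop the "nasem_" and "ea_" prefixes.
# Note that the last rule rewrites "pre_k_" as "pre_k_12_".
layout_df <- layout_df %>%
  mutate(
    name = str_replace_all(name, "nasem_", ""),
    name = str_replace_all(name, "ea_", ""),
    name = str_replace_all(name, "pre_k_", "pre_k_12_")
  )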

# Compute metrics for annotations
g_undirected <- as.undirected(g, mode = "collapse")
options(scipen = 999)
avg_indegree <- round(mean(degree(g, mode = "in"), na.rm = TRUE), 2)
avg_closeness <- round(mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE), 4)
avg_eigen <- round(mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE), 2)
avg_betweenness <- round(mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE), 2)
tx_metrics <- data.frame(
  State = "Texas",
  Indegree = avg_indegree,
  Closeness = avg_closeness,
  Eigenvector = avg_eigen,
  Betweenness = avg_betweenness
)
# Re-compute as formatted strings so trailing zeros are kept in the plot annotation
avg_indegree <- sprintf("%.2f", mean(degree(g, mode = "in"), na.rm = TRUE))
avg_closeness <- sprintf("%.4f", mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE))
avg_eigen <- sprintf("%.2f", mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE))
avg_betweenness <- sprintf("%.2f", mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE))

total_datasets <- n_distinct(mdmt_data$id)
total_indicators <- n_distinct(edges$to)
x_center <- mean(layout_df$x, na.rm = TRUE)
y_top <- max(layout_df$y, na.rm = TRUE)

# Plot (main)
p_tx <- ggraph(layout_df) +
  geom_edge_link(color = "gray60", width = 0.25, alpha = 0.6) +
  geom_node_point(
    aes(fill = I(color), size = size),
    shape = 21, color = "white", stroke = 0.6, alpha = 1, show.legend = FALSE
  ) +
  geom_text_repel(
    data = subset(layout_df, group == "Indicator"),
    aes(x = x, y = y, label = name, color = I(color)),
    size = 2.3, fontface = "bold", alpha = 0.95,
    box.padding = 0.35, point.padding = 0.2,
    segment.color = "gray70", segment.size = 0.18,
    force = 2.5, max.overlaps = Inf, seed = 123
  ) +
  coord_fixed(ratio = 1, clip = "off") +
  theme_void(base_size = 13) +
  annotate("text", x = x_center, y = y_top + 3, label = "Texas",
           fontface = "bold", size = 6.5, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 2.5,
           label = paste0("Total N of Datasets: ", total_datasets,
                          " | Equity Indicators: N = ", total_indicators, " out of 16"),
           size = 3, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 2.2,
           label = paste0("Average Indegree: ", avg_indegree,
                          " | Average Closeness: ", avg_closeness,
                          " | Average Betweenness: ", avg_betweenness),
           size = 3, hjust = 0.5)

p_tx

p <- p_tx
# Export
desktop_path <- file.path(Sys.getenv("HOME"), "Desktop")
output_path <- file.path(desktop_path, "SNA_outputs")
if (!dir.exists(output_path)) dir.create(output_path, recursive = TRUE)
state <- "TX"
ggsave(file.path(output_path, paste0(state, "_SNA.png")),
       plot = p, width = 5.7, height = 5.5, dpi = 300, units = "in", bg = "white")
ggsave(file.path(output_path, paste0(state, "_SNA.jpg")),
       plot = p, width = 5.7, height = 5.5, dpi = 300, units = "in", bg = "white")
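# SVG export relies on the svglite package being installed
ggsave(file.path(output_path, paste0(state, "_SNA.svg")),
       plot = p, width = 5.7, height = 5.5, units = "in", bg = "white")
# Convert the PNG to GIF; set the format explicitly rather than relying on the file extension
img <- image_read(file.path(output_path, paste0(state, "_SNA.png")))
image_write(img, file.path(output_path, paste0(state, "_SNA.gif")), format = "gif")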
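
To make the edge construction and the "Average Indegree" annotation easier to follow, here is a minimal sketch on a made-up table. The data frame `toy` and its columns `ind_a` and `ind_b` are hypothetical stand-ins for the megatable and its 16 indicator columns. The steps mirror the static chunk above: reshape to long form, keep codes 1 to 4, and build a directed file-to-indicator graph.

```r
library(dplyr)
library(tidyr)
library(igraph)

# Hypothetical miniature megatable: 3 files, 2 indicator columns
toy <- data.frame(
  id = c("f1", "f2", "f3"),
  ind_a = c(1, 0, 3),
  ind_b = c(0, 0, 2)
)

# One row per (file, indicator) pair, keeping only codes 1-4
toy_edges <- toy %>%
  pivot_longer(cols = c(ind_a, ind_b), names_to = "Indicator", values_to = "Value") %>%
  filter(Value %in% 1:4) %>%
  select(from = id, to = Indicator)

# Files with no qualifying code (here f2) never enter the node table
toy_nodes <- data.frame(id = c(unique(toy_edges$from), "ind_a", "ind_b"))
toy_g <- graph_from_data_frame(toy_edges, vertices = toy_nodes, directed = TRUE)

# Edges point from files to indicators, so only indicators have indegree > 0
in_deg <- degree(toy_g, mode = "in")
print(in_deg)
print(mean(in_deg))
# 0.75
```

In this sketch the mean is taken over all four nodes in the graph, including the two file nodes whose indegree is zero, which is also how `mean(degree(g, mode = "in"))` behaves in the chunk above.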



Dynamic

# Load packages
library(tidyverse)
library(igraph)
library(visNetwork)
library(RColorBrewer)
library(htmlwidgets)
set.seed(2025)

# Data Loading and Initial Setup
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_Texas_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_Texas_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)
file_attr_cols <- c("id", "actual_file_name", "academic_year", "data_system_for_download")

# Color Schemes and Mapping
COLOR_INDICATOR_PALETTE <- c(
    "#BC808D", "#bbb434", "#BEBADA", "#FB8072", "#556b2f", "#FDB462",
    "#20b2aa", "#FCCDE5", "#4443D9", "#BC80BD", "#993300", "#F31D6F",
    "#440154", "#234151", "#390000", "#5DC863"
)
names(COLOR_INDICATOR_PALETTE) <- indicator_cols 

COLOR_FILE_NEW_LIGHT <- "#81D4C6"
COLOR_FILE_BORDER_LIGHT <- "#60B8AE"
WHITE_BORDER <- "#FFFFFF"

COLOR_EDGE_NEUTRAL <- "#D3D3D3" 
COLOR_EDGE_HIGHLIGHT <- "#A9A9A9" 

indicator_map <- data.frame(
    id = indicator_cols,
    full_label = c(
        "Pre-K Access to and Participation in High-Quality Pre-K Programs",
        "Pre-K Academic Readiness",
        "Pre-K Self-Regulation and Attention Skills",
        "K-12 Access to Effective Teaching",
        "K-12 Access to Rigorous Coursework",
        "K-12 Curricular Breadth",
        "K-12 Access to High-Quality Academic Supports",
        "K-12 Students’ Exposure to Racial, Ethnic, and Economic Segregation",
        "K-12 School Climate",
        "K-12 Nonexclusionary Discipline Practices",
        "K-12 Nonacademic Supports for Student Success",
        "K-12 Engagement in Schooling",
        "K-12 Performance in Coursework",
        "K-12 Performance on Tests",
        "On-Time Graduation",
        "Postsecondary Readiness"
    ),
    color_bg = COLOR_INDICATOR_PALETTE,
    stringsAsFactors = FALSE
)

# Data Preprocessing and Network Construction
file_attributes <- mdmt_data %>%
    select(all_of(file_attr_cols)) %>%
    mutate(id = as.character(id)) %>%
    rename(label = id, title_file = actual_file_name)

connections_long <- mdmt_data %>%
    select(all_of(c("id", indicator_cols))) %>%
    pivot_longer(
        cols = all_of(indicator_cols),
        names_to = "Indicator",
        values_to = "Value"
    ) %>%
    mutate(
        id = as.character(id),
        Value = as.numeric(as.character(Value))
    ) %>%
    filter(Value %in% c(1, 2, 3, 4))

edges <- connections_long %>%
    select(from = id, to = Indicator)

all_nodes <- bind_rows(
    data.frame(id = file_attributes$label, group = "File", stringsAsFactors = FALSE),
    data.frame(id = indicator_cols, group = "Indicator", stringsAsFactors = FALSE)
) %>%
    distinct(id, .keep_all = TRUE)

g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = FALSE)
indicator_degrees <- degree(g, v = V(g)[V(g)$group == "Indicator"], mode = "all")

# Year Range Logic
get_compact_year_range <- function(years) {
    years <- years[!is.na(years)]
    years <- as.integer(years)
    years <- years[!is.na(years)]

    if (length(years) == 0) {
        return("No valid year data")
    }

    years <- sort(unique(years))
    min_year <- min(years)
    max_year <- max(years)

    if (min_year == max_year) {
        return(as.character(min_year))
    }

    full_range <- seq(min_year, max_year)
    missing_years <- setdiff(full_range, years)

    year_range_text <- paste0(min_year, " - ", max_year)

    if (length(missing_years) > 0) {
        missing_text <- paste(missing_years, collapse = ", ")
        return(paste0(year_range_text, " (Missing: ", missing_text, ")"))
    } else {
        return(year_range_text)
    }
}

# Prepare visNetwork Node Data
# Setting node sizes and colors
nodes_vis <- all_nodes %>%
    left_join(file_attributes, by = c("id" = "label")) %>%
    left_join(indicator_map %>% select(id, full_label, color_bg), by = "id") %>%
    distinct(id, .keep_all = TRUE) %>%
    mutate(
        # Node colors and white border
        color.background = case_when(
            group == "File" ~ COLOR_FILE_NEW_LIGHT,
            group == "Indicator" ~ color_bg
        ),
        color.border = WHITE_BORDER, 
        borderWidth = 3,
        
        # Highlight colors 
        color.highlight.background = color.background,
        color.highlight.border = WHITE_BORDER,
        
        # Node size
        value = case_when(
            group == "Indicator" ~ indicator_degrees[id] * 7 + 20,
            group == "File" ~ 15
        ),
        
        # Labels
        label = case_when(
            group == "Indicator" ~ full_label,
            group == "File" ~ ""
        ),
        
        font.color = case_when(
            group == "Indicator" ~ color_bg,
            TRUE ~ "transparent"
        ),
        font.size = ifelse(group == "Indicator", 18, 12), 

        dropdown_label = ifelse(group == "Indicator", full_label, "") 
    )

# Calculate Indicator Hover Info
for (ind in indicator_cols) {
    connected_files_id <- connections_long %>%
      filter(Indicator == ind) %>%
      pull(id) %>% 
      unique()

    connected_files <- mdmt_data %>%
        mutate(id = as.character(id)) %>%
        filter(id %in% connected_files_id) %>%
        distinct(id, data_system_for_download, .keep_all = TRUE) %>%
        select(Year_Str = academic_year, data_system_for_download) %>%
        mutate(
            Year = as.integer(str_extract(Year_Str, "^\\d{4}"))
        )

    ds_stats <- connected_files %>% group_by(data_system_for_download) %>% summarise(Count = n())

    year_vector <- connected_files %>% pull(Year) %>% unique()
    compact_year_text <- get_compact_year_range(year_vector)

    ds_text <- paste(ds_stats$data_system_for_download, ds_stats$Count, sep = ": ", collapse = "<br>")
    final_title <- paste0(
        "<b>Indicator Name:</b> ", indicator_map$full_label[indicator_map$id == ind],
        "<br><b>Total Connected Files:</b> ", length(connected_files_id),
        "<br><b>Year Span:</b> ", compact_year_text,
        "<br><b>Data System Distribution:</b><br>", ds_text
    )

    nodes_vis[nodes_vis$id == ind, "title"] <- final_title
}

# Handle File Node Title
nodes_vis <- nodes_vis %>%
    mutate(
        title = ifelse(group == "File",
                       paste0("<b>File ID:</b> ", id,
                              "<br><b>Name:</b> ", title_file,
                              "<br><b>Year:</b> ", academic_year,
                              "<br><b>System:</b> ", data_system_for_download),
                       title)
    ) %>%
    select(-title_file, -full_label, -color_bg, -academic_year)

# Prepare visNetwork edge data
edges_vis_colored <- edges %>%
    mutate(
        id = 1:n(),
        color = list(list(color = COLOR_EDGE_NEUTRAL, highlight = COLOR_EDGE_HIGHLIGHT))
    ) %>%
    select(from, to, id, color)

# Count all files and the files linked to at least one indicator
total_files <- length(unique(mdmt_data$id))
connected_files <- length(unique(edges$from))

# Keep only nodes that appear in at least one edge
nodes_vis <- nodes_vis %>%
  filter(id %in% unique(c(edges$from, edges$to)))

# Generate visNetwork Visualization
visNetwork(
  nodes_vis, edges_vis_colored,
  main = paste0(
    "<div style='text-align:center;'>",
    "<span style='font-size:20px; font-weight:bold;'>Texas SNA</span><br>",
    "<span style='font-size:13px; font-weight:normal;'>",
    "Total N of Datasets: ", total_files,
    "</span></div>"
  )
) %>%
  visIgraphLayout(randomSeed = 123) %>%
  visOptions(
    highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)
  ) %>%
  visPhysics(
    solver = "barnesHut",
    barnesHut = list(
      gravitationalConstant = -15000,
      springLength = 100
    )
  ) %>%
  visNodes(
    shape = "dot",
    font = list(
      color = nodes_vis$font.color,
      size = nodes_vis$font.size,
      face = "arial",
      vadjust = 0, 
      strokeWidth = 1, 
      strokeColor = "white"
    ),
    scaling = list(
      min = 15,
      max = 60,
      label = list(
        enabled = TRUE,
        min = 19,
        max = 30,
        maxVisible = 26
      )
    )
  ) %>%
  visEdges(
    color = list(inherit = FALSE),
    smooth = FALSE,
    hover = list(color = COLOR_EDGE_HIGHLIGHT),
    selection = list(color = COLOR_EDGE_HIGHLIGHT)
  )
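
The dynamic chunks load `htmlwidgets` but only display the network inline. If a standalone HTML file is wanted, one possible approach is sketched below with `htmlwidgets::saveWidget()`. The two-node network, the object names `toy_nodes`, `toy_edges`, and `widget`, and the output file name are all hypothetical; in practice the widget would be the result of the `visNetwork(...)` pipeline above.

```r
library(visNetwork)
library(htmlwidgets)

# Hypothetical two-node network standing in for nodes_vis / edges_vis_colored
toy_nodes <- data.frame(
  id = c("file_1", "indicator_1"),
  label = c("", "Indicator 1")
)
toy_edges <- data.frame(from = "file_1", to = "indicator_1")

widget <- visNetwork(toy_nodes, toy_edges, main = "Toy SNA")

# Write the widget to an HTML file in a temporary folder.
# selfcontained = FALSE avoids the pandoc dependency and places the
# JavaScript libraries in a sibling "_files" folder next to the HTML.
out_file <- file.path(tempdir(), "toy_SNA.html")
saveWidget(widget, file = out_file, selfcontained = FALSE)
print(file.exists(out_file))
```

Setting `selfcontained = TRUE` instead should bundle everything into a single HTML file, at the cost of requiring pandoc on the machine that runs the code.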




North Carolina

Static

# Load packages
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggrepel)
library(magick)
library(grid)

# Load data
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_NorthCarolina_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_NorthCarolina_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

# Define indicator columns
indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)

# Build edges
connections_long <- mdmt_data %>%
  select(all_of(c("id", indicator_cols))) %>%
  pivot_longer(cols = all_of(indicator_cols), names_to = "Indicator", values_to = "Value") %>%
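  # Cast id to character (as in the dynamic chunk) so file and indicator ids share one type
  mutate(id = as.character(id), Value = as.numeric(as.character(Value))) %>%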
  filter(Value %in% c(1, 2, 3, 4))
edges <- connections_long %>% select(from = id, to = Indicator)

# Build node table
all_nodes <- bind_rows(
  data.frame(id = unique(connections_long$id), group = "File"),
  data.frame(id = indicator_cols, group = "Indicator")
)

# Build graph
g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = TRUE)
connected_datasets <- unique(edges$from)
connected_indicators <- unique(edges$to)
isolated_datasets <- setdiff(unique(mdmt_data$id), connected_datasets)
isolated_indicators <- setdiff(indicator_cols, connected_indicators)

# Colors
indicator_colors <- c(
  'nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs' = '#BC808D',
  'nasem_pre_k_academic_readiness' = '#9C9A2A',
  'nasem_pre_k_self_regulation_and_attention_skills' = '#9F94C8',
  'nasem_k_12_access_to_effective_teaching' = '#FB8072',
  'nasem_k_12_access_to_rigorous_coursework' = '#556b2f',
  'nasem_k_12_curricular_breadth' = '#FDB462',
  'nasem_k_12_access_to_high_quality_academic_supports' = '#20b2aa',
  'nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation' = '#E67AB8',
  'nasem_k_12_school_climate' = '#4443D9',
  'nasem_k_12_nonexclusionary_discipline_practices' = '#BC80BD',
  'nasem_k_12_nonacademic_supports_for_student_success' = '#993300',
  'nasem_k_12_engagement_in_schooling' = '#F31D6F',
  'nasem_k_12_performance_in_coursework' = '#2E86AB',
  'nasem_k_12_performance_on_tests' = 'darkblue',
  'nasem_ea_on_time_graduation' = '#3FA9A1',
  'nasem_ea_postsecondary_readiness' = '#339966'
)
dataset_color <- "#A7BED3"

# Build tidygraph
tg <- as_tbl_graph(g) %>%
  mutate(
    degree = degree(g)[name],
    color = ifelse(group == "Indicator", indicator_colors[name], dataset_color),
    size = ifelse(group == "Indicator", log1p(degree) * 3 + 3, 1.1)
  )
set.seed(123)
layout_df <- create_layout(
  tg, layout = "fr", niter = 5000, area = vcount(tg)^3.3, repulserad = vcount(tg)^3
)
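# Shorten labels for plotting: drop the "nasem_" and "ea_" prefixes.
# Note that the last rule rewrites "pre_k_" as "pre_k_12_".
layout_df <- layout_df %>%
  mutate(
    name = str_replace_all(name, "nasem_", ""),
    name = str_replace_all(name, "ea_", ""),
    name = str_replace_all(name, "pre_k_", "pre_k_12_")
  )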

# Compute metrics for annotations
g_undirected <- as.undirected(g, mode = "collapse")
options(scipen = 999)
avg_indegree <- round(mean(degree(g, mode = "in"), na.rm = TRUE), 2)
avg_closeness <- round(mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE), 4)
avg_eigen <- round(mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE), 2)
avg_betweenness <- round(mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE), 2)
nc_metrics <- data.frame(
  State = "North Carolina",
  Indegree = avg_indegree,
  Closeness = avg_closeness,
  Eigenvector = avg_eigen,
  Betweenness = avg_betweenness
)
# Re-compute as formatted strings so trailing zeros are kept in the plot annotation
avg_indegree <- sprintf("%.2f", mean(degree(g, mode = "in"), na.rm = TRUE))
avg_closeness <- sprintf("%.4f", mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE))
avg_eigen <- sprintf("%.2f", mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE))
avg_betweenness <- sprintf("%.2f", mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE))

total_datasets <- n_distinct(mdmt_data$id)
total_indicators <- n_distinct(edges$to)
x_center <- mean(layout_df$x, na.rm = TRUE)
y_top <- max(layout_df$y, na.rm = TRUE)

# Plot (main)
p_nc <- ggraph(layout_df) +
  geom_edge_link(color = "gray60", width = 0.25, alpha = 0.6) +
  geom_node_point(
    aes(fill = I(color), size = size),
    shape = 21, color = "white", stroke = 0.6, alpha = 1, show.legend = FALSE
  ) +
  geom_text_repel(
    data = subset(layout_df, group == "Indicator"),
    aes(x = x, y = y, label = name, color = I(color)),
    size = 2.3, fontface = "bold", alpha = 0.95,
    box.padding = 0.35, point.padding = 0.2,
    segment.color = "gray70", segment.size = 0.18,
    force = 2.5, max.overlaps = Inf, seed = 123
  ) +
  coord_fixed(ratio = 1, clip = "off") +
  theme_void(base_size = 13) +
  annotate("text", x = x_center, y = y_top + 9, label = "North Carolina",
           fontface = "bold", size = 6.5, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 7,
           label = paste0("Total N of Datasets: ", total_datasets,
                          " | Equity Indicators: N = ", total_indicators, " out of 16"),
           size = 3, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 5.8,
           label = paste0("Average Indegree: ", avg_indegree,
                          " | Average Closeness: ", avg_closeness,
                          " | Average Betweenness: ", avg_betweenness),
           size = 3, hjust = 0.5)

p_nc

p <- p_nc
# Export
desktop_path <- file.path(Sys.getenv("HOME"), "Desktop")
output_path <- file.path(desktop_path, "SNA_outputs")
if (!dir.exists(output_path)) dir.create(output_path, recursive = TRUE)
state <- "NC"
ggsave(file.path(output_path, paste0(state, "_SNA.png")),
       plot = p, width = 5.5, height = 5.5, dpi = 300, units = "in", bg = "white")
ggsave(file.path(output_path, paste0(state, "_SNA.jpg")),
       plot = p, width = 5.5, height = 5.5, dpi = 300, units = "in", bg = "white")
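# SVG export relies on the svglite package being installed
ggsave(file.path(output_path, paste0(state, "_SNA.svg")),
       plot = p, width = 5.5, height = 5.5, units = "in", bg = "white")
# Convert the PNG to GIF; set the format explicitly rather than relying on the file extension
img <- image_read(file.path(output_path, paste0(state, "_SNA.png")))
image_write(img, file.path(output_path, paste0(state, "_SNA.gif")), format = "gif")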



Dynamic

# Load packages
library(tidyverse)
library(igraph)
library(visNetwork)
library(RColorBrewer)
library(htmlwidgets)
set.seed(123)

# Data Loading and Initial Setup
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_NorthCarolina_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_NorthCarolina_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)
file_attr_cols <- c("id", "actual_file_name", "academic_year", "data_system_for_download")

# Color Schemes and Mapping
COLOR_INDICATOR_PALETTE <- c(
    "#BC808D", "#bbb434", "#BEBADA", "#FB8072", "#556b2f", "#FDB462",
    "#20b2aa", "#FCCDE5", "#4443D9", "#BC80BD", "#993300", "#F31D6F",
    "#440154", "#234151", "#390000", "#5DC863"
)
names(COLOR_INDICATOR_PALETTE) <- indicator_cols 

COLOR_FILE_NEW_LIGHT <- "#81D4C6"
COLOR_FILE_BORDER_LIGHT <- "#60B8AE"
WHITE_BORDER <- "#FFFFFF"

COLOR_EDGE_NEUTRAL <- "#D3D3D3" 
COLOR_EDGE_HIGHLIGHT <- "#A9A9A9" 

indicator_map <- data.frame(
    id = indicator_cols,
    full_label = c(
        "Pre-K Access to and Participation in High-Quality Pre-K Programs",
        "Pre-K Academic Readiness",
        "Pre-K Self-Regulation and Attention Skills",
        "K-12 Access to Effective Teaching",
        "K-12 Access to Rigorous Coursework",
        "K-12 Curricular Breadth",
        "K-12 Access to High-Quality Academic Supports",
        "K-12 Students’ Exposure to Racial, Ethnic, and Economic Segregation",
        "K-12 School Climate",
        "K-12 Nonexclusionary Discipline Practices",
        "K-12 Nonacademic Supports for Student Success",
        "K-12 Engagement in Schooling",
        "K-12 Performance in Coursework",
        "K-12 Performance on Tests",
        "On-Time Graduation",
        "Postsecondary Readiness"
    ),
    color_bg = COLOR_INDICATOR_PALETTE,
    stringsAsFactors = FALSE
)

# Data Preprocessing and Network Construction
file_attributes <- mdmt_data %>%
    select(all_of(file_attr_cols)) %>%
    mutate(id = as.character(id)) %>%
    rename(label = id, title_file = actual_file_name)

connections_long <- mdmt_data %>%
    select(all_of(c("id", indicator_cols))) %>%
    pivot_longer(
        cols = all_of(indicator_cols),
        names_to = "Indicator",
        values_to = "Value"
    ) %>%
    mutate(
        id = as.character(id),
        Value = as.numeric(as.character(Value))
    ) %>%
    filter(Value %in% c(1, 2, 3, 4))

edges <- connections_long %>%
    select(from = id, to = Indicator)

all_nodes <- bind_rows(
    data.frame(id = file_attributes$label, group = "File", stringsAsFactors = FALSE),
    data.frame(id = indicator_cols, group = "Indicator", stringsAsFactors = FALSE)
) %>%
    distinct(id, .keep_all = TRUE)

g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = FALSE)
indicator_degrees <- degree(g, v = V(g)[V(g)$group == "Indicator"], mode = "all")

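# Year Range Logic
# Collapse a vector of years into "min - max" and list any gap years,
# e.g. c(2018, 2019, 2021) gives "2018 - 2021 (Missing: 2020)"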
get_compact_year_range <- function(years) {
    years <- years[!is.na(years)]
    years <- as.integer(years)
    years <- years[!is.na(years)]

    if (length(years) == 0) {
        return("No valid year data")
    }

    years <- sort(unique(years))
    min_year <- min(years)
    max_year <- max(years)

    if (min_year == max_year) {
        return(as.character(min_year))
    }

    full_range <- seq(min_year, max_year)
    missing_years <- setdiff(full_range, years)

    year_range_text <- paste0(min_year, " - ", max_year)

    if (length(missing_years) > 0) {
        missing_text <- paste(missing_years, collapse = ", ")
        return(paste0(year_range_text, " (Missing: ", missing_text, ")"))
    } else {
        return(year_range_text)
    }
}

# Prepare visNetwork Node Data
# Setting node sizes and colors
nodes_vis <- all_nodes %>%
    left_join(file_attributes, by = c("id" = "label")) %>%
    left_join(indicator_map %>% select(id, full_label, color_bg), by = "id") %>%
    distinct(id, .keep_all = TRUE) %>%
    mutate(
        # Node colors and white border
        color.background = case_when(
            group == "File" ~ COLOR_FILE_NEW_LIGHT,
            group == "Indicator" ~ color_bg
        ),
        color.border = WHITE_BORDER, 
        borderWidth = 3,
        
        # Highlight colors 
        color.highlight.background = color.background,
        color.highlight.border = WHITE_BORDER,
        
        # Node size
        value = case_when(
            group == "Indicator" ~ indicator_degrees[id] * 7 + 20,
            group == "File" ~ 15
        ),
        
        # Labels
        label = case_when(
            group == "Indicator" ~ full_label,
            group == "File" ~ ""
        ),
        
        font.color = case_when(
            group == "Indicator" ~ color_bg,
            TRUE ~ "transparent"
        ),
        font.size = ifelse(group == "Indicator", 18, 12), 

        dropdown_label = ifelse(group == "Indicator", full_label, "") 
    )

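# Calculate Indicator Hover Info
# For each indicator node, build an HTML tooltip with the number of connected
# files, the year span, and the count of files per data system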
for (ind in indicator_cols) {
    connected_files_id <- connections_long %>%
      filter(Indicator == ind) %>%
      pull(id) %>% 
      unique()

    connected_files <- mdmt_data %>%
        mutate(id = as.character(id)) %>%
        filter(id %in% connected_files_id) %>%
        distinct(id, data_system_for_download, .keep_all = TRUE) %>%
        select(Year_Str = academic_year, data_system_for_download) %>%
        mutate(
            Year = as.integer(str_extract(Year_Str, "^\\d{4}"))
        )

    ds_stats <- connected_files %>% group_by(data_system_for_download) %>% summarise(Count = n())

    year_vector <- connected_files %>% pull(Year) %>% unique()
    compact_year_text <- get_compact_year_range(year_vector)

    ds_text <- paste(ds_stats$data_system_for_download, ds_stats$Count, sep = ": ", collapse = "<br>")
    final_title <- paste0(
        "<b>Indicator Name:</b> ", indicator_map$full_label[indicator_map$id == ind],
        "<br><b>Total Connected Files:</b> ", length(connected_files_id),
        "<br><b>Year Span:</b> ", compact_year_text,
        "<br><b>Data System Distribution:</b><br>", ds_text
    )

    nodes_vis[nodes_vis$id == ind, "title"] <- final_title
}

# Handle File Node Title
nodes_vis <- nodes_vis %>%
    mutate(
        title = ifelse(group == "File",
                       paste0("<b>File ID:</b> ", id,
                              "<br><b>Name:</b> ", title_file,
                              "<br><b>Year:</b> ", academic_year,
                              "<br><b>System:</b> ", data_system_for_download),
                       title)
    ) %>%
    select(-title_file, -full_label, -color_bg, -academic_year)

# Prepare visNetwork edge data
edges_vis_colored <- edges %>%
    mutate(
        id = 1:n(),
        color = list(list(color = COLOR_EDGE_NEUTRAL, highlight = COLOR_EDGE_HIGHLIGHT))
    ) %>%
    select(from, to, id, color)

# Count all files in the MDMT and the files linked to at least one indicator
total_files <- length(unique(mdmt_data$id))
connected_files <- length(unique(edges$from))

# Keep only connected nodes
nodes_vis <- nodes_vis %>%
  filter(id %in% unique(c(edges$from, edges$to)))

# Generate visNetwork Visualization
visNetwork(
  nodes_vis, edges_vis_colored,
  main = paste0(
    "<div style='text-align:center;'>",
    "<span style='font-size:20px; font-weight:bold;'>North Carolina SNA</span><br>",
    "<span style='font-size:13px; font-weight:normal;'>",
    "Total N of Datasets: ", total_files,
    "</span></div>"
  )
) %>%
  visIgraphLayout(randomSeed = 123) %>%
  visOptions(
    highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE)
  ) %>%
  visPhysics(
    solver = "barnesHut",
    barnesHut = list(
      gravitationalConstant = -15000,
      springLength = 100
    )
  ) %>%
  visNodes(
    shape = "dot",
    font = list(
      color = nodes_vis$font.color,
      size = nodes_vis$font.size,
      face = "arial",
      vadjust = 0, 
      strokeWidth = 1, 
      strokeColor = "white"
    ),
    scaling = list(
      min = 15,
      max = 60,
      label = list(
        enabled = TRUE,
        min = 19,
        max = 30,
        maxVisible = 26
      )
    )
  ) %>%
  visEdges(
    color = list(inherit = FALSE),
    smooth = FALSE,
    hover = list(color = COLOR_EDGE_HIGHLIGHT),
    selection = list(color = COLOR_EDGE_HIGHLIGHT)
  )
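
The dynamic code above loads `htmlwidgets` and prints the network, but does not write it to disk. The following is a minimal sketch, under stated assumptions, of how an interactive network could be saved as an HTML file for sharing. The objects `toy_nodes`, `toy_edges`, `toy_widget`, and the output file name are hypothetical stand-ins; in practice the widget returned by the `visNetwork()` pipeline would be assigned to an object and passed to `saveWidget()` in the same way.

```r
library(visNetwork)
library(htmlwidgets)

# Hypothetical two-node network standing in for nodes_vis / edges_vis_colored
toy_nodes <- data.frame(
  id = c("file_1", "indicator_1"),
  label = c("", "Indicator 1"),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(from = "file_1", to = "indicator_1",
                        stringsAsFactors = FALSE)

# Assign the widget to an object instead of only printing it
toy_widget <- visNetwork(toy_nodes, toy_edges, main = "Toy SNA")

# Hypothetical output location; a temporary folder keeps the example free of
# side effects on the Desktop or Downloads folders
out_file <- file.path(tempdir(), "toy_SNA.html")

# selfcontained = FALSE writes the JavaScript dependencies to a sibling folder
# and does not need pandoc; selfcontained = TRUE produces a single portable
# file but requires pandoc to be installed
saveWidget(toy_widget, file = out_file, selfcontained = FALSE)
file.exists(out_file)
```

A single self-contained file is usually easier to email or archive, while the two-part output avoids the pandoc dependency, so the choice depends on how the visualization will be distributed.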




Washington, D.C.

Static

# Load packages
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggrepel)
library(magick)
library(grid)

# Load data
downloads_path <- file.path(Sys.getenv("HOME"), "Downloads")
csv_file <- file.path(downloads_path, "20250501_CALL-ECL-MEI_WashingtonDC_v1.3_Metadata_Megatable.csv")
if (!file.exists(csv_file)) {
  stop(paste0(
    "Cannot find the CSV file.\n",
    "Please make sure you have downloaded '20250501_CALL-ECL-MEI_WashingtonDC_v1.3_Metadata_Megatable.csv'\n",
    "and that it is located in your Downloads folder."
  ))
}
mdmt_data <- read.csv(csv_file)

# Define indicator columns
indicator_cols <- c(
  "nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs",
  "nasem_pre_k_academic_readiness",
  "nasem_pre_k_self_regulation_and_attention_skills",
  "nasem_k_12_access_to_effective_teaching",
  "nasem_k_12_access_to_rigorous_coursework",
  "nasem_k_12_curricular_breadth",
  "nasem_k_12_access_to_high_quality_academic_supports",
  "nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation",
  "nasem_k_12_school_climate",
  "nasem_k_12_nonexclusionary_discipline_practices",
  "nasem_k_12_nonacademic_supports_for_student_success",
  "nasem_k_12_engagement_in_schooling",
  "nasem_k_12_performance_in_coursework",
  "nasem_k_12_performance_on_tests",
  "nasem_ea_on_time_graduation",
  "nasem_ea_postsecondary_readiness"
)

# Build edges
connections_long <- mdmt_data %>%
  select(all_of(c("id", indicator_cols))) %>%
  pivot_longer(cols = all_of(indicator_cols), names_to = "Indicator", values_to = "Value") %>%
  mutate(Value = as.numeric(as.character(Value))) %>%
  filter(Value %in% c(1, 2, 3, 4))
edges <- connections_long %>% select(from = id, to = Indicator)

# Build node table
all_nodes <- bind_rows(
  data.frame(id = as.character(unique(connections_long$id)), group = "File"),
  data.frame(id = indicator_cols, group = "Indicator")
)

# Build graph
g <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = TRUE)
connected_datasets <- unique(edges$from)
connected_indicators <- unique(edges$to)
isolated_datasets <- setdiff(unique(mdmt_data$id), connected_datasets)
isolated_indicators <- setdiff(indicator_cols, connected_indicators)

# Colors
indicator_colors <- c(
  'nasem_pre_k_access_to_and_participation_in_high_quality_pre_k_programs' = '#BC808D',
  'nasem_pre_k_academic_readiness' = '#9C9A2A',
  'nasem_pre_k_self_regulation_and_attention_skills' = '#9F94C8',
  'nasem_k_12_access_to_effective_teaching' = '#FB8072',
  'nasem_k_12_access_to_rigorous_coursework' = '#556b2f',
  'nasem_k_12_curricular_breadth' = '#FDB462',
  'nasem_k_12_access_to_high_quality_academic_supports' = '#20b2aa',
  'nasem_k_12_students_exposure_to_racial_ethnic_and_economic_segregation' = '#E67AB8',
  'nasem_k_12_school_climate' = '#4443D9',
  'nasem_k_12_nonexclusionary_discipline_practices' = '#BC80BD',
  'nasem_k_12_nonacademic_supports_for_student_success' = '#993300',
  'nasem_k_12_engagement_in_schooling' = '#F31D6F',
  'nasem_k_12_performance_in_coursework' = '#2E86AB',
  'nasem_k_12_performance_on_tests' = 'darkblue',
  'nasem_ea_on_time_graduation' = '#3FA9A1',
  'nasem_ea_postsecondary_readiness' = '#339966'
)
dataset_color <- "#A7BED3"

# Build tidygraph
tg <- as_tbl_graph(g) %>%
  mutate(
    degree = degree(g)[name],
    color = ifelse(group == "Indicator", indicator_colors[name], dataset_color),
    size = ifelse(group == "Indicator", log1p(degree) * 3 + 3, 1.1)
  )
set.seed(123)
layout_df <- create_layout(
  tg, layout = "fr", niter = 5000, area = vcount(tg)^3.3, repulserad = vcount(tg)^3
)
layout_df <- layout_df %>%
  mutate(
    name = str_replace_all(name, "nasem_", ""),
    name = str_replace_all(name, "ea_", ""),
    name = str_replace_all(name, "pre_k_", "pre_k_12_")
  )

# Compute metrics for annotations
g_undirected <- as.undirected(g, mode = "collapse")
options(scipen = 999)
avg_indegree <- round(mean(degree(g, mode = "in"), na.rm = TRUE), 2)
avg_closeness <- round(mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE), 4)
avg_eigen <- round(mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE), 2)
avg_betweenness <- round(mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE), 2)
dc_metrics <- data.frame(
  State = "Washington, D.C.",
  Indegree = avg_indegree,
  Closeness = avg_closeness,
  Eigenvector = avg_eigen,
  Betweenness = avg_betweenness
)
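# Re-format the same averages as fixed-width strings so the plot annotations
# keep trailing zeros (the rounded numeric values are stored in dc_metrics above)
avg_indegree <- sprintf("%.2f", mean(degree(g, mode = "in"), na.rm = TRUE))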
avg_closeness <- sprintf("%.4f", mean(closeness(g_undirected, normalized = FALSE), na.rm = TRUE))
avg_eigen <- sprintf("%.2f", mean(eigen_centrality(g_undirected)$vector, na.rm = TRUE))
avg_betweenness <- sprintf("%.2f", mean(betweenness(g_undirected, normalized = FALSE), na.rm = TRUE))

total_datasets <- n_distinct(mdmt_data$id)
total_indicators <- n_distinct(edges$to)
x_center <- mean(layout_df$x, na.rm = TRUE)
y_top <- max(layout_df$y, na.rm = TRUE)

# Plot (main)
p_dc <- ggraph(layout_df) +
  geom_edge_link(color = "gray60", width = 0.25, alpha = 0.6) +
  geom_node_point(
    aes(fill = I(color), size = size),
    shape = 21, color = "white", stroke = 0.6, alpha = 1, show.legend = FALSE
  ) +
  geom_text_repel(
    data = subset(layout_df, group == "Indicator"),
    aes(x = x, y = y, label = name, color = I(color)),
    size = 2.3, fontface = "bold", alpha = 0.95,
    box.padding = 0.35, point.padding = 0.2,
    segment.color = "gray70", segment.size = 0.18,
    force = 2.5, max.overlaps = Inf, seed = 123
  ) +
  coord_fixed(ratio = 1, clip = "off") +
  theme_void(base_size = 13) +
  annotate("text", x = x_center, y = y_top + 9, label = "Washington, D.C.",
           fontface = "bold", size = 6.5, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 7,
           label = paste0("Total N of Datasets: ", total_datasets,
                          " | Equity Indicators: N = ", total_indicators, " out of 16"),
           size = 3, hjust = 0.5) +
  annotate("text", x = x_center, y = y_top + 5.8,
           label = paste0("Average Indegree: ", avg_indegree,
                          " | Average Closeness: ", avg_closeness,
                          " | Average Betweenness: ", avg_betweenness),
           size = 3, hjust = 0.5)

p_dc