Contact information: Website: https://www.tc.columbia.edu/faculty/ab3764/

Citation for this R Markdown:

Online Supplement 2 Appendix for the article published in The High School Journal:

Bowers, A.J., Zhao, Y., Ho, E. (2023) Towards Hierarchical Cluster Analysis Heatmaps as Visual Data Analysis of Entire Student Cohort Longitudinal Trajectories and Outcomes from Grade 9 through College. The High School Journal.

Note: Please see the full paper for details.

Background reading on HCA Heatmaps, with application in Education Research:

This Online Supplement 2 Appendix overviews the background and reasoning behind HCA heatmaps for their usefulness in education research with an example: Bowers, A.J. (2010) Analyzing the Longitudinal K-12 Grading Histories of Entire Cohorts of Students: Grades, Data Driven Decision Making, Dropping Out and Hierarchical Cluster Analysis. Practical Assessment, Research & Evaluation (PARE), 15(7), 1-18. https://doi.org/10.7275/r4zq-9c31

This paper is a good example of how cluster analysis heatmaps are useful for learning analytics clickstream logfile data, especially from Learning Management Systems (LMSs): Lee, J., Recker, M., Bowers, A.J., Yuan, M. (2016). Hierarchical Cluster Analysis Heatmaps and Pattern Analysis: An Approach for Visualizing Learning Management System Interaction Data. Presented at the annual International Conference on Educational Data Mining (EDM), Raleigh, NC: June 2016. https://www.educationaldatamining.org/EDM2016/proceedings/paper_34.pdf

This paper gives an example of using cluster analysis heatmaps to visualize how students interact and collaboratively build together in a hybrid digital/physical educational game museum exhibit at the New York Hall of Science: Jorion, N., Roberts, J., Bowers, A.J., Tissenbaum, M., Lyons, L., Kuma, V., Berland, M. (2020) Uncovering Patterns in Constructionist Collaborative Learning Activities via Cluster Analysis of Museum Exhibit Log Files. Frontline Learning Research, 8(6), p.77-87. https://doi.org/10.14786/flr.v8i6.597

A tutorial on creating HCA Heatmaps in R

This document relies heavily on the ComplexHeatmap package by Zuguang Gu. Please refer to the package's reference documentation for more information: https://jokergoo.github.io/ComplexHeatmap-reference/book/

Copy and paste the code below to install the packages directly from GitHub if they are not available on CRAN.

library(devtools)
install_github("jokergoo/ComplexHeatmap")

install_github("cran/hopach")

install.packages("circlize")
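As an alternative to the GitHub route, ComplexHeatmap and hopach are both distributed through Bioconductor, so they can also be installed with the BiocManager package. The sketch below is one possible way to do that, not the approach used in this document. The helper names `missing_pkgs()` and `install_missing()` are hypothetical and are defined here only for illustration. Nothing is installed until `install_missing()` is called.

```r
# Packages this tutorial loads, grouped by where they are distributed.
# ComplexHeatmap and hopach are on Bioconductor; circlize is on CRAN.
bioc_pkgs <- c("ComplexHeatmap", "hopach")
cran_pkgs <- c("circlize")

# Hypothetical helper: return the names of packages that are not installed yet.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only what is missing.
install_missing <- function() {
  need_cran <- missing_pkgs(cran_pkgs)
  need_bioc <- missing_pkgs(bioc_pkgs)

  if (length(need_cran) > 0) {
    install.packages(need_cran)
  }
  if (length(need_bioc) > 0) {
    # BiocManager itself lives on CRAN.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(need_bioc)
  }
  invisible(c(need_cran, need_bioc))
}
```

Run `install_missing()` once in an interactive session. Packages that are already installed are skipped.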

Load packages

library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.12.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
library(circlize)
## ========================================
## circlize version 0.4.15
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
##   in R. Bioinformatics 2014.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(circlize))
## ========================================
library(hopach)
## Loading required package: cluster
## Loading required package: Biobase
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
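The startup banners shown above can be silenced, as the messages themselves suggest, by wrapping each call in `suppressPackageStartupMessages()`. The sketch below assumes the three packages may or may not be installed, so it uses `require()` and records which ones attached successfully instead of stopping with an error. The variable names `pkgs` and `loaded` are illustrative only.

```r
# The three packages used in this tutorial.
pkgs <- c("ComplexHeatmap", "circlize", "hopach")

# Attach each package quietly; TRUE means it was attached, FALSE means it
# is not installed (or failed to load).
loaded <- vapply(pkgs, function(p) {
  suppressPackageStartupMessages(
    require(p, character.only = TRUE, quietly = TRUE)
  )
}, logical(1))

print(loaded)
```

Any package reported as `FALSE` still needs to be installed before the rest of the tutorial will run.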

Read in the “mtcars” dataset

We’ll use “mtcars”, a dataset included with R. It comes from Motor Trend magazine and describes different types of cars from 1974, the era of the “muscle car” in the USA and, with the first shocks of the energy crisis, the beginning of the trend toward compact, fuel-efficient cars.

This dataset is interesting because it has a lot of variance: it includes large “gas guzzling” muscle cars, economical and eco-friendly cars, and a few in between. See https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/mtcars
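A quick way to see that spread is to load the dataset and look at its shape and per-column variation. This is a minimal sketch using only base R. The variable name `col_sd` is introduced here for illustration.

```r
# mtcars ships with base R (the datasets package), so no file needs to be read.
data(mtcars)

# Rows are cars and columns are measured variables.
dim(mtcars)   # 32 rows, 11 columns

# First few cars, with the car model as the row name.
head(mtcars)

# Fuel economy (miles per gallon) across all the cars.
summary(mtcars$mpg)

# Standard deviation of every column. The columns are measured in different
# units (for example, disp in cubic inches and wt in 1000 lbs), so their
# spreads are not directly comparable.
col_sd <- sapply(mtcars, sd)
round(col_sd, 2)
```

Because the variables are on very different scales, comparing raw values across columns can be misleading. This is worth keeping in mind before clustering.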