When to use each method:
PCA — Best for exploring linear relationships. Fast, deterministic. Shows which variables (loadings) drive separation. Use when groups separate along linear axes.
t-SNE — Reveals clusters in complex, non-linear data. Good for visualizing distinct groups. Stochastic: re-runs may differ. Adjust Perplexity (lower = tighter clusters, higher = more global view). Try values between 5 and 50.
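UMAP — Similar to t-SNE but better preserves global structure. Faster on large datasets. Adjust Neighbors (how much local detail versus overall structure is kept) and Min Dist (cluster tightness). Lower Neighbors values emphasize fine local structure; lower Min Dist values pack points into tighter, more separated clusters.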
Tip: There is no single “correct” parameter value. Try different settings and pick the one that gives the clearest group separation — this is standard practice.
Statistics used:
PCA: Eigendecomposition of the covariance matrix. Data is centered. Variance explained (%) shown per component. Loadings show eigenvector coefficients.
t-SNE: Converts pairwise distances to probabilities, then optimizes a 2D layout where similar points stay close. Stochastic optimization.
UMAP: Builds a nearest-neighbor graph and optimizes a 2D layout preserving both local and global structure. Initialized from PCA coordinates.
Loadings (t-SNE/UMAP): Shows Pearson correlations between original features and embedding coordinates. Note: These are less reliable than PCA loadings because t-SNE/UMAP distort distances non-linearly. Arrows often cluster in similar directions and should be interpreted with caution. PCA loadings are generally preferred for identifying driving variables.
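The PCA and loading calculations described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the code in pca.js: the function names are hypothetical, the covariance uses the sample (n - 1) divisor, and power iteration stands in for a full eigendecomposition, so only the first component is recovered.

```javascript
// Hypothetical helpers for illustration; not taken from pca.js.

// Subtract each column's mean so every feature is centered at zero.
function centerColumns(rows) {
  const n = rows.length;
  const p = rows[0].length;
  const means = new Array(p).fill(0);
  for (const row of rows) row.forEach((v, j) => { means[j] += v; });
  for (let j = 0; j < p; j++) means[j] /= n;
  return rows.map(row => row.map((v, j) => v - means[j]));
}

// Sample covariance matrix (assumed n - 1 divisor) of already-centered rows.
function covarianceMatrix(centered) {
  const n = centered.length;
  const p = centered[0].length;
  const cov = Array.from({ length: p }, () => new Array(p).fill(0));
  for (const row of centered) {
    for (let i = 0; i < p; i++) {
      for (let j = 0; j < p; j++) cov[i][j] += (row[i] * row[j]) / (n - 1);
    }
  }
  return cov;
}

// Power iteration: leading eigenvalue and unit eigenvector of a symmetric matrix.
// Limitation: the uniform starting vector must not be orthogonal to the
// leading eigenvector, otherwise the iteration cannot find it.
function leadingEigenpair(matrix, iterations = 200) {
  const p = matrix.length;
  const multiply = v => matrix.map(row => row.reduce((s, m, j) => s + m * v[j], 0));
  let vector = new Array(p).fill(1 / Math.sqrt(p));
  for (let it = 0; it < iterations; it++) {
    const next = multiply(vector);
    const norm = Math.hypot(...next);
    if (norm === 0) break; // degenerate case: nothing to normalize
    vector = next.map(v => v / norm);
  }
  // Rayleigh quotient v' M v gives the eigenvalue for a unit vector v.
  const value = multiply(vector).reduce((s, v, j) => s + v * vector[j], 0);
  return { value, vector };
}

// Fraction of total variance carried by one component: eigenvalue / trace.
function varianceExplained(cov, eigenvalue) {
  const trace = cov.reduce((s, row, i) => s + row[i], 0);
  return eigenvalue / trace;
}

// Pearson correlation between one original feature and one embedding axis,
// as used for the t-SNE/UMAP loading arrows.
function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

For PCA, the entries of the returned vector play the role of loadings for the first component. For t-SNE or UMAP, pearson would be called once per feature and embedding axis, which is why those arrows describe correlation with the final layout and not the directions that produced it.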
Implementation:
All methods are implemented in client-side JavaScript (pca.js). Statistical tests in other modes use jStat.
Methods text for manuscripts:
“Dimensionality reduction was performed using [PCA / t-SNE (perplexity = X) / UMAP (n_neighbors = X, min_dist = X)]. Data were z-score standardized prior to embedding. Visualizations were generated using Visualize (client-side JavaScript).”
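The z-score standardization named in the methods text can be sketched as below. This is a minimal illustration, not the tool's own code: the function name is hypothetical, and both the sample standard deviation (n - 1 divisor) and the handling of constant columns are assumptions.

```javascript
// Hypothetical helper for illustration; not taken from pca.js.
// Rescales every column to mean 0 and standard deviation 1.
function zScoreColumns(rows) {
  const n = rows.length;
  const p = rows[0].length;

  // Column means: sum first, divide once, to limit rounding error.
  const means = new Array(p).fill(0);
  for (const row of rows) row.forEach((v, j) => { means[j] += v; });
  for (let j = 0; j < p; j++) means[j] /= n;

  // Column standard deviations with the assumed n - 1 divisor.
  const sds = new Array(p).fill(0);
  for (const row of rows) row.forEach((v, j) => { sds[j] += (v - means[j]) ** 2; });
  for (let j = 0; j < p; j++) sds[j] = Math.sqrt(sds[j] / (n - 1));

  // Constant columns have zero spread; map them to 0 instead of dividing by zero.
  return rows.map(row =>
    row.map((v, j) => (sds[j] === 0 ? 0 : (v - means[j]) / sds[j]))
  );
}
```

Standardizing first keeps features measured on large scales from dominating the distances that all three methods rely on.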