About Ohio Transparency Maps
Overview
Ohio Transparency Maps tracks how money flows through Ohio state politics. The site combines campaign finance filings, legislative records, election results, and machine learning analytics into an interactive dashboard covering Ohio's 132 state legislative districts (33 Senate + 99 House).
Pages
| Page | What it shows |
|---|---|
| Home | Interactive choropleth map, KPI cards, top recipients, compare views, donation flow arcs, and analytics highlights |
| Dashboard | Full analysis with filters, detail panel, and 12 chart types across contributions, networks, industry, and spending |
| Network | WebGL graph of 2,500+ donors and candidates, donor co-funding projections, industry influence map, GNN predictions |
| Legislation | Bills, vote alignment heatmaps, co-sponsorship networks, and donation-legislation correlation |
| Analysis | ML-powered insights: SHAP explainability, UMAP voting clusters, DIME ideology scores, PAC alignment, association rules |
| Explore | Drag-and-drop dashboard builder with configurable chart widgets |
| Playground | SQL console, chart builder, and D3 sketch pad for ad-hoc exploration |
Data Sources
- Campaign Finance: Ohio Secretary of State Campaign Finance Search (2022–2025)
  - Filing types: Candidate Contributions (CAC_CON), PAC Expenditures (PAC_EXP), Candidate Expenditures (CAC_EXP), PPC Contributions (PPC_CON)
- Legislative Data: LegiScan API — bills, votes, sponsors, committees (bulk dataset download)
- Election Results: MIT Election Data + Science Lab (MEDSL) — 2018, 2022, and 2024 state legislative results
- Ideology Scores: DIME (Database on Ideology, Money in Politics, and Elections) — CFscores for campaign finance-based ideology estimation
- District Boundaries: Ohio Redistricting Commission adopted maps (2023/2024)
- Census Data: American Community Survey 5-Year Estimates (2022)
- County Boundaries: U.S. Census TIGER/Line Shapefiles
Methodology
Contribution Tiers
Districts are colored by total campaign contributions received:
| Tier | Range | Color |
|---|---|---|
| None | $0 | Light gray |
| Low | $1–$100K | Light blue |
| Medium | $100K–$300K | Medium blue |
| High | $300K–$600K | Dark blue |
| Very High | $600K+ | Darkest blue |
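The tier lookup is a simple threshold search. A minimal sketch follows, assuming each range includes its lower bound and excludes its upper bound (so a total of exactly $100K is Medium); the table does not say how boundary values are assigned, and `contribution_tier` is a hypothetical helper name.

```python
from bisect import bisect_right

# Tier boundaries in dollars. Assumption: a total equal to a boundary
# belongs to the higher tier (lower bound inclusive, upper bound exclusive).
TIER_BOUNDS = [1, 100_000, 300_000, 600_000]
TIER_NAMES = ["None", "Low", "Medium", "High", "Very High"]


def contribution_tier(total: float) -> str:
    """Return the map tier for a district's total contributions (hypothetical helper)."""
    if total < 0:
        raise ValueError("contribution totals cannot be negative")
    # bisect_right counts how many boundaries are <= total, which is the tier index.
    return TIER_NAMES[bisect_right(TIER_BOUNDS, total)]


print(contribution_tier(250_000))  # Medium
```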
Donor Name Normalization
Organizations often appear under multiple name variants in filings. We maintain a curated alias table that maps ~300 raw name variants to canonical forms (e.g., "OHIO EDUCATION ASSN" and "OEA" both map to "Ohio Education Association").
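A curated alias table works best when trivial differences (case, punctuation, spacing) are removed before lookup, so one alias entry covers many spellings. The sketch below assumes such a cleaning step, which the source does not describe; `ALIASES`, `normalize_key`, and `canonical_donor` are hypothetical names, and the two alias entries are just the example above.

```python
import re

# Hypothetical excerpt of the alias table. Keys are stored in the
# cleaned form produced by normalize_key().
ALIASES = {
    "OHIO EDUCATION ASSN": "Ohio Education Association",
    "OEA": "Ohio Education Association",
}


def normalize_key(raw: str) -> str:
    """Uppercase, replace punctuation (except '&') with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9& ]+", " ", raw.upper())
    return re.sub(r"\s+", " ", cleaned).strip()


def canonical_donor(raw: str) -> str:
    """Return the canonical name if an alias exists, else the cleaned raw name."""
    key = normalize_key(raw)
    return ALIASES.get(key, key)


print(canonical_donor("Ohio Education Assn."))  # Ohio Education Association
```

Unmapped names come back in their cleaned uppercase form, which makes them easy to group when extending the table.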
Donor Industry Classification
Donor organizations are classified into 16 industry sectors (Energy, Healthcare, Labor/Unions, Real Estate, Legal, Finance & Insurance, Construction, Manufacturing, Technology, Agriculture, Education, Transportation, Telecom, Retail & Services, Lobbying & Gov Affairs, Other). Classification was performed using an LLM (Claude) based on organization names, then manually reviewed for accuracy.
Expenditure Category Classification
Campaign expenditure purposes are classified into 12 spending categories (Media & Advertising, Consulting, Events & Fundraising, Travel, Staff & Payroll, Office & Operations, Legal & Compliance, Printing & Mail, Polling & Research, Direct Voter Contact, Contributions to Others, Other). Classification was performed using an LLM (Claude) based on expenditure purpose descriptions, then manually reviewed.
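Because LLM output is free text, labels can drift outside the fixed taxonomy. One way to keep the manual review pass focused is to accept only labels that exactly match an allowed category and queue everything else for a human. This is an illustrative sketch, not the documented workflow; `triage_labels` is a hypothetical helper, and the category sets simply restate the lists above.

```python
SECTORS = frozenset({
    "Energy", "Healthcare", "Labor/Unions", "Real Estate", "Legal",
    "Finance & Insurance", "Construction", "Manufacturing", "Technology",
    "Agriculture", "Education", "Transportation", "Telecom",
    "Retail & Services", "Lobbying & Gov Affairs", "Other",
})

EXPENSE_CATEGORIES = frozenset({
    "Media & Advertising", "Consulting", "Events & Fundraising", "Travel",
    "Staff & Payroll", "Office & Operations", "Legal & Compliance",
    "Printing & Mail", "Polling & Research", "Direct Voter Contact",
    "Contributions to Others", "Other",
})


def triage_labels(
    llm_labels: dict[str, str], allowed: frozenset[str]
) -> tuple[dict[str, str], list[str]]:
    """Split LLM output into in-taxonomy labels and items that need human review."""
    accepted: dict[str, str] = {}
    needs_review: list[str] = []
    for item, label in llm_labels.items():
        label = label.strip()
        if label in allowed:
            accepted[item] = label
        else:
            needs_review.append(item)
    return accepted, needs_review
```

The same function serves both classifiers: pass `SECTORS` for donor organizations and `EXPENSE_CATEGORIES` for expenditure purposes.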
Interactive Filtering
The dashboard sidebar supports real-time filtering across all visualizations. Filters include donor name search, contribution amount range, date range, party, and industry sector. When filters are active, charts re-aggregate from raw contribution and expenditure records (~28K contributions + ~16K expenditures) in the browser via DuckDB-WASM. Filter state is synced to the URL hash for shareable filtered views.
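On the site, filtering runs as SQL over Parquet files in DuckDB-WASM. The sketch below restates the same filter-then-aggregate logic and the URL-hash round trip in plain Python so the semantics are easy to check. The `Filters` fields, record keys, and hash format are assumptions, not the site's actual schema, and the date-range filter is omitted for brevity.

```python
from dataclasses import dataclass, fields
from urllib.parse import parse_qsl, urlencode


@dataclass(frozen=True)
class Filters:
    """Hypothetical subset of the sidebar filters."""
    donor: str = ""
    min_amount: float = 0.0
    party: str = ""
    industry: str = ""


def to_hash(f: Filters) -> str:
    """Serialize only the non-default filters into a URL fragment."""
    defaults = Filters()
    active = {
        fl.name: getattr(f, fl.name)
        for fl in fields(Filters)
        if getattr(f, fl.name) != getattr(defaults, fl.name)
    }
    return "#" + urlencode(active) if active else ""


def from_hash(fragment: str) -> Filters:
    """Parse a URL fragment back into Filters, ignoring unknown keys."""
    known = {fl.name for fl in fields(Filters)}
    params = {k: v for k, v in parse_qsl(fragment.lstrip("#")) if k in known}
    if "min_amount" in params:
        params["min_amount"] = float(params["min_amount"])
    return Filters(**params)


def totals_by_district(records: list[dict], f: Filters) -> dict[str, float]:
    """Re-aggregate raw contribution records per district under the active filters."""
    totals: dict[str, float] = {}
    for r in records:
        if f.donor and f.donor.lower() not in r["donor"].lower():
            continue
        if r["amount"] < f.min_amount:
            continue
        if f.party and r["party"] != f.party:
            continue
        if f.industry and r["industry"] != f.industry:
            continue
        totals[r["district"]] = totals.get(r["district"], 0.0) + r["amount"]
    return totals
```

Writing only non-default fields keeps shared URLs short, and an empty hash restores the unfiltered view.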
Network Analysis
The force-directed network, Sankey diagram, and chord diagram all show the top 20 donors (by total amount) connected to legislative candidates. Links represent aggregated donation totals across all filings.
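The top-20 restriction amounts to two aggregations: rank donors by their overall total, then keep the summed donor-to-candidate links for those donors only. A minimal sketch, assuming contributions arrive as `(donor, candidate, amount)` tuples; `top_donor_links` is a hypothetical name, and tie-breaking between equal donor totals is arbitrary here.

```python
from collections import defaultdict
from collections.abc import Iterable


def top_donor_links(
    contributions: Iterable[tuple[str, str, float]], n: int = 20
) -> dict[tuple[str, str], float]:
    """Sum contributions into donor->candidate links, keeping only the top-n donors."""
    donor_totals: dict[str, float] = defaultdict(float)
    link_totals: dict[tuple[str, str], float] = defaultdict(float)
    for donor, candidate, amount in contributions:
        donor_totals[donor] += amount
        link_totals[(donor, candidate)] += amount
    top = set(sorted(donor_totals, key=donor_totals.get, reverse=True)[:n])
    return {pair: total for pair, total in link_totals.items() if pair[0] in top}
```

The resulting link dictionary can feed the force-directed graph, Sankey, and chord views alike, since all three use the same donor set.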
The full network explorer (Network page) renders 2,500+ nodes using Sigma.js (WebGL). Graph metrics are pre-computed at build time with NetworkX: Louvain community detection for color grouping, PageRank for node sizing, and betweenness centrality for identifying bridge nodes. Bipartite projections show donor co-funding patterns and candidate co-donor networks.
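The donor co-funding projection links two donors whenever they gave to the same candidate. The source does not state the edge weighting; this pure-Python sketch assumes the weight is the number of shared candidates, which is the default behavior of NetworkX's `bipartite.weighted_projected_graph`. `donor_cofunding` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def donor_cofunding(edges: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Project donor->candidate edges onto donors; weight = number of shared candidates."""
    donors_by_candidate: dict[str, set[str]] = defaultdict(set)
    for donor, candidate in edges:
        donors_by_candidate[candidate].add(donor)  # set() ignores repeat donations
    weights: dict[tuple[str, str], int] = defaultdict(int)
    for donors in donors_by_candidate.values():
        # Every pair of donors to the same candidate shares that candidate.
        for a, b in combinations(sorted(donors), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

Swapping the roles of donor and candidate in the input gives the candidate co-donor projection. The inner loop is quadratic in each candidate's donor count, which is acceptable at this graph size.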
Graph Neural Network Analytics
A 2-layer heterogeneous GraphSAGE encoder is trained on the donor-candidate bipartite graph to learn node embeddings that capture structural patterns in campaign finance flows. The model operates on a HeteroData graph with two node types (donors and candidates) and bidirectional donation edges.
Node features:
- Donors: log-scaled total contributions, PageRank, betweenness centrality, and a one-hot industry sector vector (16 sectors)
- Candidates: log-scaled total contributions, PageRank, betweenness centrality, and a one-hot party vector
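The two feature layouts above can be assembled as plain vectors. A minimal sketch, assuming "log-scaled" means `log1p` (so a $0 total maps to 0) and assuming a three-value party vocabulary, since the source does not list the party categories; the function names are hypothetical.

```python
import math

# Sector order follows the list in this document; the real index order is an assumption.
SECTORS = [
    "Energy", "Healthcare", "Labor/Unions", "Real Estate", "Legal",
    "Finance & Insurance", "Construction", "Manufacturing", "Technology",
    "Agriculture", "Education", "Transportation", "Telecom",
    "Retail & Services", "Lobbying & Gov Affairs", "Other",
]
# Hypothetical party vocabulary.
PARTIES = ["D", "R", "Other"]


def one_hot(value: str, vocab: list[str]) -> list[float]:
    return [1.0 if value == v else 0.0 for v in vocab]


def donor_features(total: float, pagerank: float, betweenness: float, sector: str) -> list[float]:
    # log1p compresses the heavy right tail of donation totals and keeps $0 at 0.
    return [math.log1p(total), pagerank, betweenness, *one_hot(sector, SECTORS)]


def candidate_features(total: float, pagerank: float, betweenness: float, party: str) -> list[float]:
    return [math.log1p(total), pagerank, betweenness, *one_hot(party, PARTIES)]
```

Donor vectors (19 values) and candidate vectors (6 values under the assumed party list) have different widths. A heterogeneous graph permits this: in PyTorch Geometric each node type keeps its own `x` tensor, and `SAGEConv((-1, -1), out_channels)` infers input sizes lazily.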
Training: The model is trained with a link prediction objective (binary cross-entropy) using an 80/10/10 train/val/test split via RandomLinkSplit. Training runs for 100 epochs with the Adam optimizer on a single GPU, with the best checkpoint selected by validation loss.
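The link prediction objective scores a donor-candidate pair by the dot product of their embeddings and applies binary cross-entropy against a label of 1 (observed donation) or 0 (sampled non-edge). The pure-Python sketch below shows just that loss; how negatives are sampled is not stated in the source and is left to the caller. In PyTorch the same quantity is `torch.nn.functional.binary_cross_entropy_with_logits`, and the 80/10/10 split would correspond to `RandomLinkSplit(num_val=0.1, num_test=0.1, ...)`.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def link_bce_loss(donor_emb, cand_emb, pos_pairs, neg_pairs) -> float:
    """Mean binary cross-entropy of dot-product link logits.

    pos_pairs: observed (donor, candidate) edges, label 1.
    neg_pairs: sampled non-edges, label 0.
    """
    total = 0.0
    for pairs, label in ((pos_pairs, 1.0), (neg_pairs, 0.0)):
        for d, c in pairs:
            z = dot(donor_emb[d], cand_emb[c])
            # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|)).
            total += max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))
    return total / (len(pos_pairs) + len(neg_pairs))
```

With all-zero embeddings every logit is 0 and the loss is ln 2, the value of an uninformed coin flip; training pushes positive logits up and negative logits down.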
Link prediction: After training, every donor-candidate pair not already connected by a donation is scored by the dot product of its two embeddings. The 500 highest-scoring pairs are reported as the most likely future donation connections. GNN scores are supplemented by three classical baselines: Jaccard coefficient, Adamic-Adar index, and preferential attachment.
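Scoring can be done by brute force, since a graph of a few thousand nodes has at most a few million donor-candidate pairs. The sketch below ranks unlinked pairs by dot product and includes the preferential-attachment baseline (product of degrees). It leaves out Jaccard and Adamic-Adar on purpose: on the raw bipartite graph a donor's neighbors are all candidates and a candidate's neighbors are all donors, so the two neighborhoods never overlap and both scores are always zero. The source does not say how those baselines were adapted. Function names are hypothetical.

```python
import heapq


def top_predicted_links(donor_emb, cand_emb, existing, k=500):
    """Score every donor-candidate pair without an existing edge by dot product; keep the top k."""
    scored = (
        (sum(a * b for a, b in zip(du, cv)), d, c)
        for d, du in donor_emb.items()
        for c, cv in cand_emb.items()
        if (d, c) not in existing
    )
    # nlargest streams the generator, so memory stays O(k) rather than O(#pairs).
    return [(d, c, s) for s, d, c in heapq.nlargest(k, scored)]


def preferential_attachment(degree: dict[str, int], donor: str, candidate: str) -> int:
    """Classical baseline: well-connected nodes are assumed likely to connect again."""
    return degree[donor] * degree[candidate]
```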
Anomaly detection: An Isolation Forest (100 estimators, 5% contamination) is fitted on the 32-dimensional GNN embeddings to flag structurally unusual nodes — donors or candidates whose network neighborhood patterns are statistical outliers.
Visualization: Node embeddings are projected to 2D using UMAP (15 neighbors) for the embedding scatter plot. Anomalies are highlighted in red; other nodes are colored by Louvain community assignment.
Legislative Data
Legislative bill, vote, and sponsorship data is sourced from the LegiScan API via bulk dataset downloads. Legislators are matched to campaign finance candidates by last name, district number, and chamber. The Legislation page shows vote alignment, co-sponsorship networks, and correlations between campaign donations and bill sponsorship.
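Matching on (chamber, district, last name) needs a consistent last-name key, because filings and LegiScan may format names differently. The sketch below assumes filings can use "LAST, FIRST" order, strips accents and common suffixes, and leaves ambiguous matches unresolved rather than guessing. The record fields, chamber codes, and helper names are all hypothetical.

```python
import unicodedata

SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def last_name_key(full_name: str) -> str:
    """Return an uppercase, accent-free last name ('SMITH, JOHN' and 'John Smith Jr.' -> 'SMITH')."""
    ascii_name = unicodedata.normalize("NFKD", full_name).encode("ascii", "ignore").decode()
    # "LAST, FIRST": the surname is the text before the first comma.
    part = ascii_name.split(",")[0] if "," in ascii_name else ascii_name
    tokens = [t.strip(".").upper() for t in part.split()]
    tokens = [t for t in tokens if t and t not in SUFFIXES]
    return tokens[-1] if tokens else ""


def match_legislators(legislators: list[dict], candidates: list[dict]) -> dict[str, str]:
    """Map legislator id -> candidate id when exactly one candidate shares the key."""
    index: dict[tuple, list[str]] = {}
    for cand in candidates:
        key = (cand["chamber"], cand["district"], last_name_key(cand["name"]))
        index.setdefault(key, []).append(cand["id"])
    matches: dict[str, str] = {}
    for leg in legislators:
        hits = index.get((leg["chamber"], leg["district"], last_name_key(leg["name"])), [])
        if len(hits) == 1:  # skip ambiguous keys instead of guessing
            matches[leg["id"]] = hits[0]
    return matches
```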
Bill Similarity
Bill similarity is computed using Jaccard indices on two dimensions: shared subject tags and shared sponsors. The combined similarity score is a weighted average of subject similarity (0.6) and sponsor similarity (0.4).
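In symbols, for bills a and b with subject sets S and sponsor sets P, the score is 0.6·J(S_a, S_b) + 0.4·J(P_a, P_b), where J(X, Y) = |X ∩ Y| / |X ∪ Y|. Because the weights sum to 1, the weighted average is just this weighted sum. A minimal sketch follows; treating two empty sets as similarity 0 is an assumption, since the source does not cover that case.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as dissimilar (assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bill_similarity(
    subjects_a: set[str], subjects_b: set[str],
    sponsors_a: set[str], sponsors_b: set[str],
    w_subject: float = 0.6, w_sponsor: float = 0.4,
) -> float:
    return w_subject * jaccard(subjects_a, subjects_b) + w_sponsor * jaccard(sponsors_a, sponsors_b)
```

For example, two bills sharing one of three distinct subject tags (J = 1/3) and an identical sponsor list (J = 1) score 0.6 × 1/3 + 0.4 × 1 = 0.6.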
SHAP Explainability
SHAP (SHapley Additive exPlanations) values are computed for a vote prediction model to identify which features most influence how legislators vote. Global importance shows the top features across all legislators; per-legislator breakdowns show individual vote drivers.
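The source does not name the vote model or the SHAP explainer used. The sketch below computes exact Shapley values by enumerating coalitions. This is the quantity SHAP approximates, not the `shap` library's method: the library uses faster estimators and averages over a background dataset rather than the single baseline assumed here. Global importance is taken as the mean absolute value per feature, the usual SHAP summary. Function names are hypothetical.

```python
from collections.abc import Callable
from itertools import combinations
from math import factorial


def shapley_values(model: Callable[[list[float]], float], x: list[float], baseline: list[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to one baseline input.

    Features outside a coalition take their baseline value. Costs O(2^n) model
    calls, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def global_importance(phis: list[list[float]]) -> list[float]:
    """Mean absolute Shapley value per feature across rows (e.g. legislators)."""
    return [sum(abs(row[j]) for row in phis) / len(phis) for j in range(len(phis[0]))]
```

For a linear model each value reduces to weight × (feature − baseline), and in general the values sum to model(x) − model(baseline), which is a handy sanity check.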
DIME Ideology Scores
CFscores from the Database on Ideology, Money in Politics, and Elections (DIME) provide campaign finance-based ideology estimates for Ohio legislators. Scores are displayed as a beeswarm plot on a liberal-to-conservative spectrum.
Technical Stack
- Data Pipeline: Python 3.12, DuckDB (spatial), GeoPandas, NetworkX, pandas
- Graph Neural Network: PyTorch, PyTorch Geometric (GraphSAGE, HeteroData)
- Anomaly Detection: scikit-learn (Isolation Forest), UMAP for dimensionality reduction
- Legislative Data: LegiScan API (bulk dataset download)
- Election Data: MIT Election Data + Science Lab (MEDSL)
- Maps: MapLibre GL JS (WebGL), deck.gl (ArcLayer for donation flows), OpenFreeMap vector tiles
- Visualization: Observable Framework, D3.js v7, Sigma.js (WebGL), Cytoscape.js, Observable Plot
- Browser Data: DuckDB-WASM (in-browser SQL queries on Parquet files)
- Dashboard Builder: GridStack (drag-and-drop widget grid)
- Hosting: GitHub Pages via GitHub Actions CI/CD