AI projects
Tools for tracking education policy debates across multiple spaces, making them easier to follow and engage with. Building them has also shown how analytical choices shape what becomes visible.
AtlasED
Live dashboard →
NLP · Policy analysis · Open source
An end-to-end NLP system that maps how education policy problems are framed across jurisdictions, and makes the assumptions inside that mapping visible and open to challenge.
AtlasED analyses 4,100+ policy documents from government departments, think tanks, research bodies, and civil society organisations across England, Scotland, and Ireland. It tracks issue attention, organisational influence, and shifts in policy agendas through automated pipelines for weekly inference and monitoring.
The system uses a two-layer architecture. A cross-national model identifies shared policy domains, while within-country models extract nationally specific framing patterns. Each stage is designed to make modelling choices visible, including sensitivity analysis that shows which topics are stable and which depend on parameter settings, and divergence measures that reveal how framings differ across contexts.
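The divergence step can be sketched as a pairwise comparison of topic-probability profiles. This is a minimal illustration, not AtlasED's actual implementation: the country profiles are invented numbers, and Jensen–Shannon distance is one plausible choice of divergence measure.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical topic-probability profiles for one shared policy domain,
# as estimated by within-country models (illustrative numbers only).
framing = {
    "england":  np.array([0.40, 0.30, 0.20, 0.10]),
    "scotland": np.array([0.25, 0.35, 0.15, 0.25]),
    "ireland":  np.array([0.30, 0.20, 0.35, 0.15]),
}

def framing_divergence(profiles):
    """Pairwise Jensen-Shannon distance (base 2, so values lie in [0, 1])
    between country-level framing profiles for one policy domain."""
    countries = sorted(profiles)
    return {
        (a, b): float(jensenshannon(profiles[a], profiles[b], base=2))
        for i, a in enumerate(countries)
        for b in countries[i + 1:]
    }

for pair, d in framing_divergence(framing).items():
    print(pair, round(d, 3))
```

A symmetric, bounded measure like this makes cross-country comparisons easy to rank: higher values flag domains where national framings diverge most.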
The interface includes interactive topic exploration, framing comparison tools, and features that allow users to interrogate and challenge how the model represents policy problems.
Alongside the technical work, I am running cross-national workshops with policymakers in England, Scotland, and Ireland to test, interpret, and challenge model outputs, and to co-design governance approaches for responsible AI use in policy.
Newsletter classification pipeline
Live dashboard →
NLP · Text classification · Specification sensitivity
A system that helps curate the ESRC Education Research Programme's weekly newsletter: identifying, scraping, and categorising education policy articles from 350+ sources to make the policy landscape easier to follow and engage with.
The pipeline, AM2, processes articles from 350+ policy sources and assigns them to six newsletter categories. It is trained on 104 newsletters (1,109 articles), with curator decisions as ground-truth labels. A sentence-transformer model runs in production, validated against LLM-based classification as a benchmark.
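The production pattern of embedding articles and fitting a classifier on curator labels can be sketched as below. Everything here is illustrative: the category names and example articles are invented, and a deterministic stand-in replaces the real embedder so the sketch runs without downloading a model (in practice this would be something like `SentenceTransformer.encode`).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in embedder so the sketch runs offline; the real pipeline would use
# a sentence-transformer model's encode() here instead.
def embed(texts, dim=64):
    vecs = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        vecs.append(rng.standard_normal(dim))
    return np.stack(vecs)

# Illustrative category labels; the newsletter's six real categories differ.
CATEGORIES = ["news", "research", "opinion", "events", "reports", "data"]

def train_classifier(articles, labels):
    """Fit a linear classifier on article embeddings, using curator
    decisions as ground-truth labels."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embed(articles), labels)
    return clf

clf = train_classifier(
    ["Ofsted report on school inspections", "New attainment-gap study published"],
    ["reports", "research"],
)
print(clf.predict(embed(["Curriculum review announced"])))
```

Keeping the classifier linear on top of fixed embeddings keeps weekly retraining cheap as new curated newsletters arrive.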
The project extends the specification sensitivity framework from AtlasED into supervised classification. Five models trained on the same data disagree on 65% of articles: transformer models classify by topic vocabulary, while LLMs classify by document type. These differences reflect competing assumptions about what each category represents. Per-class variance analysis, proxy concentration audits, and frame sensitivity testing make these differences visible and measurable.
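One way to quantify cross-model disagreement is the share of articles on which the models fail to agree unanimously. This is a sketch under assumptions: the model names, predicted labels, and definition of "disagreement" are all hypothetical, not AM2's actual measurement.

```python
# Predicted categories from five hypothetical models on six articles
# (illustrative labels; the real models, categories, and counts differ).
predictions = {
    "transformer_a": ["research", "news", "opinion", "reports", "news", "events"],
    "transformer_b": ["research", "news", "reports", "reports", "opinion", "events"],
    "transformer_c": ["research", "news", "opinion", "research", "news", "events"],
    "llm_a":         ["research", "opinion", "opinion", "reports", "news", "news"],
    "llm_b":         ["research", "news", "opinion", "research", "opinion", "events"],
}

def disagreement_rate(preds):
    """Share of articles on which at least two models assign different labels."""
    n = len(next(iter(preds.values())))
    disputed = sum(
        1 for i in range(n)
        if len({p[i] for p in preds.values()}) > 1
    )
    return disputed / n

print(f"{disagreement_rate(predictions):.0%} of articles are disputed")
```

A per-class version of the same count (restricting `i` to articles a reference curator placed in one category) would give the per-class variance view the text describes.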