Governments invest billions in cyber defence. AI is transforming both the threat landscape and our tools to fight it. Yet almost no one can prove what works. Here is why Monitoring, Evaluation and Learning is the missing piece — and how to build it.
The cybersecurity industry has a measurement problem — and artificial intelligence is making it both worse and more urgent. We can map adversary tactics with precision, deploy AI-powered threat detection across critical infrastructure, and share intelligence across borders in near real-time. Yet when a government asks, “Is our cyber programme actually working?” the honest answer, globally, remains: “We don’t have a consistent way to tell.”
This article argues that Monitoring, Evaluation and Learning (MEL) is not a bureaucratic afterthought in cybersecurity. It is the missing structural component that determines whether billions in public investment produce real, measurable security improvements. AI amplifies this urgency in two directions: adversaries are using generative AI, deepfakes, and machine learning to scale attacks at unprecedented speed, while defenders are deploying AI tools whose own effectiveness and risks must themselves be measured and governed. Drawing on global frameworks, comparisons across six regions, and practitioner experience, I set out a practical approach any government can adapt.
THE PROBLEM
Billions Spent, Impact Unknown — Now Compounded by AI
Governments are not short on cybersecurity activity. The global cybersecurity market is projected to exceed $300 billion by 2030, while cybercrime costs are expected to reach $13.82 trillion annually by 2028. The CyBIL Portal catalogues hundreds of international cyber capacity-building initiatives. The Oxford GCSCC has deployed its Cybersecurity Capacity Maturity Model (CMM) for Nations in over 95 countries. NIST released its landmark CSF 2.0 framework in 2024, and the EU’s NIS2 Directive expanded cybersecurity obligations to an estimated 300,000 entities.
- 95+: Countries assessed by Oxford CMM
- $13.8T: Projected annual cybercrime cost by 2028
- 50%: Rise in AI-powered cyberattacks (CSET est.)
- 60%: Recipients deceived by AI-generated phishing
- 76%: Phishing campaigns using polymorphic AI tactics
Yet the evidence base for what actually works remains remarkably thin. A 2023 study by the EU Institute for Security Studies found that international cyber capacity-building efforts suffer from insufficient coordination, limited evidence of impact, and supply-driven rather than demand-responsive programming. National audit offices across multiple countries have reached similar conclusions: strategies are launched with insufficient baselines, weak outcome metrics, and no systematic way to determine which interventions produce which results.
AI has compounded this challenge. Harvard Business Review research shows AI-generated phishing emails deceive up to 60% of recipients — far exceeding traditional phishing. Malware-as-a-Service kits using polymorphic AI techniques are now present in an estimated 76% of phishing campaigns. Deepfake incidents in the first quarter of 2025 alone surpassed the total for all of 2024. Meanwhile, defenders are deploying AI-powered threat detection, anomaly identification, and automated incident response — but rarely measuring whether these AI tools themselves are performing as expected, or introducing new risks.
THE AI FACTOR
How AI Is Reshaping Both Threat and Defence
Understanding the dual role of AI in cybersecurity is essential before designing any MEL framework. The landscape is no longer just “humans attacking systems” — it is increasingly “AI attacking AI-defended systems,” with humans struggling to measure what is happening at machine speed.
AI as Threat Multiplier
AI as Defence Enabler
The critical MEL question is no longer just “Are our cyber programmes working?” — it is also “Are the AI tools within those programmes performing as intended, and are we governing their risks?” This requires new evaluation frameworks that span both traditional cyber MEL and AI governance.
TWO WORLDS, ONE PROBLEM
Why This Requires Both Cyber and MEL Expertise
The cybersecurity community and the evaluation community each bring essential capabilities — and each has a critical blind spot without the other.
The cyber threat intelligence community understands MITRE ATT&CK, adversary TTPs, state-linked actor analysis, and the operational realities of threat hunting and incident response. They can tell you exactly what behaviours adversaries exhibit. What they typically cannot do is connect that understanding to structured programme design, evidence-based prioritisation, or credible impact assessment.
The MEL and evaluation community understands results frameworks, contribution analysis, theory of change, and how to build evidence-based learning systems. What they often lack is the operational understanding of cybersecurity: classification constraints, threat intelligence tradecraft, and the reality that “beneficiaries” might be CSIRTs and “outcomes” might be an adversary’s degraded capability.
AI governance introduces a third requirement: the ability to evaluate AI systems themselves — their performance, fairness, security, and trustworthiness — within the broader cyber programme context.
GLOBAL LANDSCAPE
How Different Regions Approach Cyber MEL and AI
The maturity of cybersecurity MEL — and the integration of AI governance into it — varies dramatically across regions. Understanding these differences is essential for designing context-appropriate approaches.
United States
The US leads globally in cybersecurity and AI governance frameworks. NIST CSF 2.0 (2024) expanded to all organisations, introducing a sixth core function — Govern — alongside Identify, Protect, Detect, Respond, and Recover, with four Implementation Tiers for maturity benchmarking. CISA’s ATT&CK Mapping guidance provides the most granular TTP-to-framework linkage globally. On the AI side, the NIST AI Risk Management Framework (AI RMF, 2023) introduces four functions — Govern, Map, Measure, Manage — specifically for AI trustworthiness, while the NIST Adversarial Machine Learning taxonomy (AI 100-2e2025) provides the world’s most detailed classification of AI-specific attacks. ISO/IEC 42001:2023 adoption is accelerating. However, the US approach remains heavily compliance-oriented and organisational rather than programmatic — strong at measuring whether entities meet baselines, weaker at evaluating collective impact of national cyber investments.
Strong frameworks, weaker programme-level evaluation
European Union
The EU has taken a regulatory-first approach with the NIS2 Directive (2022/2555), expanding scope to ~300,000 entities with mandatory risk management and incident reporting, supported by ENISA technical guidance. On AI, the EU AI Act (2024) is the world’s first comprehensive AI regulation, classifying AI systems by risk level and mandating conformity assessments for high-risk applications. ENISA’s Threat Landscape reports provide annual baselines. Peer review mechanisms between member states offer rudimentary MEL. However, the EU focuses overwhelmingly on compliance measurement (Are entities meeting NIS2 obligations? Are AI systems classified correctly?) rather than outcome measurement (Is the EU actually more resilient?). With 13 of 27 member states still not having transposed NIS2 as of mid-2025, even compliance measurement faces challenges.
Compliance-heavy; outcome measurement still emerging
United Kingdom
Self-aware of gaps; strongest government evaluation culture
Asia-Pacific (ASEAN Focus)
Output-focused; outcome measurement nascent
Australia
Most MEL-intentional; still building evidence base
Africa
Foundational; donor-dependent; nationally fragmented
KEY INSIGHT
Across all six regions, the same structural gap persists: governments have invested in building cybersecurity technical capacity but have not systematically invested in measuring whether that capacity produces intended outcomes. AI governance adds a new dimension: almost no government has integrated evaluation of AI-powered cyber tools into their existing MEL frameworks. The opportunity is to build this integrated approach now, before the gap between investment and evidence widens further.
AI GOVERNANCE FOR CYBER
Governing AI Within Cybersecurity: The Emerging Framework Landscape
Governments deploying AI within cybersecurity programmes face a governance challenge that sits at the intersection of two rapidly evolving domains. Several frameworks have emerged to guide this, and any serious cyber MEL approach must integrate them.
NIST AI RMF
Four functions: Govern, Map, Measure, Manage. Voluntary US standard for AI trustworthiness across the full lifecycle.
ISO/IEC 42001
First international AI Management System standard (2023). Certifiable. 38 controls across 9 objectives. Plan-Do-Check-Act cycle.
EU AI Act
World’s first comprehensive AI regulation (2024). Risk-based classification with mandatory conformity assessment for high-risk systems.
NIST AI 100-2e2025
Adversarial ML taxonomy: evasion, poisoning, privacy attacks, availability breakdown. Essential for evaluating defensive AI security.
MITRE ATLAS
Adversarial Threat Landscape for AI Systems. Like ATT&CK but for AI-specific attacks. Maps real-world case studies of AI compromise.
OWASP Top 10 for LLMs
Identifies critical vulnerabilities in LLM applications: prompt injection, insecure output handling, training data poisoning, and more.
PRACTITIONER GUIDANCE
For governments embedding AI within cyber programmes, the minimum viable governance stack is: NIST CSF 2.0 for overall cyber risk management, NIST AI RMF for AI-specific trustworthiness evaluation, and MITRE ATT&CK + ATLAS for threat mapping of both conventional and AI-targeted attacks. Organisations seeking certification should additionally align with ISO/IEC 42001. All of these should feed into the broader MEL framework so that AI governance is not a standalone exercise but an integrated part of programme evaluation.
A PRACTICAL FRAMEWORK
Six Stages for Integrating MEL into Cyber & AI Programmes
Regardless of regional context, the pathway from threat understanding to measurable impact follows the same logic — now extended to include AI governance. What changes is the starting point, scale, and sophistication.
IDENTIFY: Map the Threat Landscape
Anchor analysis in observable behaviours using MITRE ATT&CK (14 tactical categories, 190+ techniques). Include AI-specific threat vectors using MITRE ATLAS. Profile prevalent TTPs including AI-powered attack techniques. Update quarterly.
PRIORITISE: Decide Where to Engage
Apply multi-criteria decision analysis across threat severity, intervention feasibility, partner readiness (CMM or equivalent), disruption potential, and risk of inaction. Factor in AI maturity of both threat actors and partner defences.
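One way to make this stage concrete is a weighted-sum score, the simplest form of multi-criteria decision analysis. The sketch below is illustrative only: the criteria are the five named above, but the weights, the 1 to 5 scoring scale, and the two candidate engagements are assumptions made up for the example, not values drawn from any framework cited in this article.

```python
from dataclasses import dataclass

# Criteria come from the PRIORITISE stage; the weights are illustrative assumptions.
WEIGHTS = {
    "threat_severity": 0.30,
    "intervention_feasibility": 0.20,
    "partner_readiness": 0.20,
    "disruption_potential": 0.15,
    "risk_of_inaction": 0.15,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> score on an assumed 1-5 scale


def weighted_score(candidate, weights=WEIGHTS):
    """Weighted sum of criterion scores; fails loudly if a criterion is unscored."""
    missing = sorted(set(weights) - set(candidate.scores))
    if missing:
        raise ValueError(f"{candidate.name} has no score for: {missing}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates):
    """Return (name, score) pairs, highest score first."""
    scored = [(c.name, round(weighted_score(c), 2)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate engagements with made-up scores.
candidates = [
    Candidate("CSIRT uplift, partner A", {
        "threat_severity": 4, "intervention_feasibility": 5,
        "partner_readiness": 4, "disruption_potential": 2,
        "risk_of_inaction": 3,
    }),
    Candidate("Joint threat hunting, partner B", {
        "threat_severity": 5, "intervention_feasibility": 2,
        "partner_readiness": 2, "disruption_potential": 4,
        "risk_of_inaction": 5,
    }),
]
ranking = rank(candidates)
```

With these made-up numbers the two options land within a tenth of a point of each other, which is the useful signal: when a ranking is that close, the weights deserve scrutiny before the decision does. Agreeing the weights before scoring, and recording who set them, keeps the exercise auditable.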
INTERVENE: Select from a Structured Menu
Three tiers: foundational (CERT establishment, workforce, strategy), operational (threat sharing, joint hunting, incident response), and assertive (coordinated attribution, disruption). Specify AI-specific components within each tier.
MEASURE: Apply Tiered Benchmarking
Three measurement levels: process indicators (real-time), outcome markers (mid-term capacity shifts via CMM/NIST Tiers), and strategic indicators (adversary behaviour change, deterrence effects). Add AI model performance and trustworthiness metrics.
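A tiered indicator set can be held in a very small data structure. The following is a minimal sketch, assuming each indicator carries a baseline, a target, and a latest reading; the class names, field names, and sample indicators are hypothetical and exist only to show the shape.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PROCESS = "process"
    OUTCOME = "outcome"
    STRATEGIC = "strategic"


@dataclass
class Indicator:
    name: str
    tier: Tier
    baseline: float
    target: float
    latest: float

    def progress(self):
        """Share of the baseline-to-target distance covered so far.

        Works for targets that go down as well as up; the result can fall
        below 0 (regression) or above 1 (target exceeded).
        """
        span = self.target - self.baseline
        if span == 0:
            raise ValueError(f"{self.name}: target equals baseline")
        return (self.latest - self.baseline) / span


def by_tier(indicators):
    """Group indicator names under their tier label."""
    grouped = {tier.value: [] for tier in Tier}
    for ind in indicators:
        grouped[ind.tier.value].append(ind.name)
    return grouped


# Hypothetical indicators with made-up values.
indicators = [
    Indicator("Delivery milestones met", Tier.PROCESS, 0, 10, 7),
    Indicator("Median incident response time (hours)", Tier.OUTCOME, 72, 24, 48),
    Indicator("AI detector false positive rate", Tier.OUTCOME, 0.08, 0.02, 0.05),
]
```

Storing the baseline next to the target is deliberate: an indicator without a recorded baseline cannot show change, which is the gap audit offices keep finding.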
ADAPT: Feed Evidence Back
Feedback loops connecting evidence to identification and prioritisation. Explicit scaling rules: expand, replicate, pivot, or discontinue based on evidence. Include AI model retraining and drift monitoring triggers.
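Scaling rules are easier to defend when they are written down before the evidence arrives. The sketch below shows one possible shape for such a rule. Everything in it is an assumption for illustration: the 0.7 and 0.2 thresholds, the three evidence grades, and the extra "hold" outcome, which is not one of the four options listed above and is added here only so that weak evidence does not force a decision.

```python
EVIDENCE_GRADES = ("weak", "moderate", "strong")


def scaling_decision(outcome_progress, evidence_strength, transferable):
    """Return one of: expand, replicate, pivot, discontinue, hold.

    outcome_progress: share of the outcome target reached (0.0 to 1.0).
    evidence_strength: "weak", "moderate", or "strong" (assumed grading).
    transferable: True if the result plausibly holds in a new context.
    """
    if evidence_strength not in EVIDENCE_GRADES:
        raise ValueError(f"unknown evidence grade: {evidence_strength!r}")

    # Assumed rule: weak evidence never justifies scaling or stopping.
    if evidence_strength == "weak":
        return "hold"
    # Strong progress: take it elsewhere if it transfers, else grow in place.
    if outcome_progress >= 0.7:
        return "replicate" if transferable else "expand"
    # Little movement despite credible evidence: stop.
    if outcome_progress < 0.2:
        return "discontinue"
    # Partial progress: keep the goal, change the design.
    return "pivot"


decisions = {
    "strong result, transferable": scaling_decision(0.8, "strong", True),
    "strong result, context-bound": scaling_decision(0.8, "moderate", False),
    "stalled": scaling_decision(0.1, "strong", False),
    "partial": scaling_decision(0.4, "moderate", True),
    "unclear": scaling_decision(0.9, "weak", True),
}
```

The specific thresholds matter less than the fact that they are fixed in advance, so that a disappointing result triggers the agreed response rather than a renegotiation of what success meant.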
GOVERN AI: Evaluate AI Tools Themselves
Apply NIST AI RMF functions (Govern, Map, Measure, Manage) to all AI-powered components. Monitor for adversarial ML attacks, data poisoning, model drift, and bias. Ensure AI tools do not introduce new risks while addressing existing ones.
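Model drift is one of the few items in this stage that can be checked with a few lines of arithmetic. The sketch below uses the Population Stability Index (PSI), a common drift statistic that compares the distribution of a model input or output score at deployment time with the distribution seen today. It is one option among several, not something prescribed by NIST AI RMF. The bin count, the smoothing constant, and the 0.1 and 0.25 cut-offs are widely quoted rules of thumb and should be treated as assumptions to calibrate locally.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of a numeric score.

    Bins are equal-width over the range of the reference sample; values in
    the current sample that fall outside that range go in the end bins.
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_status(value, watch=0.1, act=0.25):
    """Map a PSI value to a label using assumed rule-of-thumb cut-offs."""
    if value >= act:
        return "retrain or investigate"
    if value >= watch:
        return "watch"
    return "stable"


# Made-up anomaly scores: the current sample is shifted upwards by 0.3.
reference_scores = [i / 100 for i in range(100)]
current_scores = [s + 0.3 for s in reference_scores]

status = drift_status(psi(reference_scores, current_scores))
```

A statistic like this covers only distribution shift. Adversarial manipulation, data poisoning, and bias need their own tests, which is why the stage lists them separately.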
MEASUREMENT IN PRACTICE
What to Measure and How
The biggest practical challenge is designing measurement that is rigorous enough to be credible but lightweight enough to work in classification-sensitive, fast-moving environments where AI is accelerating both attack and defence tempos.
Contribution analysis (Mayne, 2012) remains the most appropriate evaluation methodology. It builds and tests plausible causal pathways without requiring a counterfactual, and it is endorsed by OECD DAC and HM Treasury’s Magenta Book for complex, multi-actor environments. For AI components specifically, NIST AI RMF’s “Measure” function provides structured guidance on assessing model performance, bias, robustness, and trustworthiness.
Are We Doing What We Planned?
Delivery milestones, partner engagement quality, resource utilisation. For AI: model deployment status, training data quality checks, integration testing completion.
Is Capacity Actually Changing?
CMM dimension shifts, NIST Tier progression, detection rates, response times. For AI: threat detection accuracy, false positive rates, adversarial robustness scores.
Is the Threat Landscape Shifting?
Adversary behaviour change, incident reduction, deterrence evidence. For AI: model performance vs. evolving attack techniques, governance compliance trajectory.
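The AI capacity metrics named above, detection accuracy and false positive rates, reduce to ratios over a confusion matrix, and comparing them between a baseline and a follow-up period is what turns them into outcome evidence. A minimal sketch follows; the alert counts are invented for illustration and the function name is hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Basic detector metrics from confusion-matrix counts.

    tp: attacks correctly flagged      fp: benign events wrongly flagged
    tn: benign events correctly passed fn: attacks missed
    Returns None for any ratio whose denominator is zero.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "detection_rate": ratio(tp, tp + fn),       # also called recall
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
    }


# Made-up counts for the same detector in two review periods.
baseline = detection_metrics(tp=60, fp=400, tn=9500, fn=40)
follow_up = detection_metrics(tp=85, fp=150, tn=9750, fn=15)

# Positive change is good for detection_rate and precision,
# negative change is good for false_positive_rate.
change = {name: follow_up[name] - baseline[name] for name in baseline}
```

Reporting all three together guards against a familiar failure: a detection rate that rises only because the tool flags far more benign traffic, which analysts then pay for in triage time.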
TOOLKIT
A Practitioner's Toolkit: Key Global Frameworks
| Framework | Purpose | MEL Application |
|---|---|---|
| MITRE ATT&CK | 190+ adversary techniques across 14 tactical categories based on real-world observations | Anchors threat identification by behaviour; enables consistent baseline measurement |
| MITRE ATLAS | Adversarial threat landscape specifically for AI/ML systems, with real-world case studies | Maps AI-specific attack vectors; enables evaluation of defensive AI security posture |
| NIST CSF 2.0 | Six-function framework (Govern, Identify, Protect, Detect, Respond, Recover) with four Tiers | Maturity Tiers as outcome indicators; Organisational Profiles for before/after benchmarking |
| NIST AI RMF | Four-function AI risk management (Govern, Map, Measure, Manage) for trustworthy AI | Structured evaluation of AI tools within cyber programmes; maps to ISO 42001 |
| ISO/IEC 42001 | First international AI Management System standard; certifiable; 38 controls, 9 objectives | Formal governance of AI systems; Plan-Do-Check-Act cycle with audit readiness |
| Oxford CMM | National cybersecurity capacity across 5 dimensions and 5 maturity stages; 95+ deployments | Strongest tool for measuring partner readiness and tracking national capacity shifts |
| OECD DAC Criteria | Six evaluation criteria: relevance, coherence, effectiveness, efficiency, impact, sustainability | Global standard for programme evaluation; apply to cyber and AI programme assessment |
| Contribution Analysis | Theory-based evaluation tracing plausible causal pathways (Mayne, 2012) | Primary methodology for cyber impact evaluation in complex, attribution-constrained domains |
CHECKLISTS
Government Readiness Assessment
Institutional Prerequisites
- Clear mandate and budget: Multi-year political backing and dedicated resources for cyber MEL and AI governance — not just implementation.
- Cross-agency coordination: Mechanisms connecting cyber, foreign policy, defence, intelligence, and AI development agencies around shared frameworks.
- Threat intelligence access: Ability to consume and translate both open-source and classified threat intelligence — including AI-specific threat reporting.
- Evaluation capability: Evaluators with MEL expertise and sufficient security clearances, plus AI governance knowledge or access to it.
- AI inventory: A current catalogue of all AI tools deployed within cyber programmes, with documented purposes, data sources, and risk profiles.
- Data infrastructure: Systems for securely storing and analysing programme data, model performance data, and threat intelligence across classification levels.
- Learning culture: Willingness to treat both cyber strategies and AI tools as living systems to be refined through evidence, not documents to be filed.
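The AI inventory prerequisite in the list above can start as a very plain record per tool. The sketch below assumes the three documented attributes named there (purpose, data sources, risk profile) plus a last-reviewed date, which is an addition for illustration; the tool names, the 180-day review window, and the field names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AIToolRecord:
    name: str
    purpose: str = ""
    data_sources: List[str] = field(default_factory=list)
    risk_profile: str = ""                 # e.g. "low" / "medium" / "high" (assumed scale)
    last_reviewed: Optional[date] = None   # assumed field, not named in the checklist


def inventory_gaps(record, today, max_age_days=180):
    """List what is missing or stale in one inventory record."""
    gaps = []
    if not record.purpose.strip():
        gaps.append("purpose")
    if not record.data_sources:
        gaps.append("data sources")
    if not record.risk_profile.strip():
        gaps.append("risk profile")
    if record.last_reviewed is None:
        gaps.append("never reviewed")
    elif (today - record.last_reviewed).days > max_age_days:
        gaps.append("review overdue")
    return gaps


# Hypothetical entries.
inventory = [
    AIToolRecord("Network anomaly detector", "Flag unusual traffic for analysts",
                 ["netflow logs"], "medium", date(2025, 5, 1)),
    AIToolRecord("Phishing triage classifier", "Rank reported emails",
                 [], "high", date(2024, 1, 15)),
]
today = date(2025, 6, 30)
report = {rec.name: inventory_gaps(rec, today) for rec in inventory}
```

Even a register this thin answers the first evaluation question about any AI tool, namely whether anyone has written down what it is for and what it is fed.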
MEL + AI Governance Design Checklist
- Baseline established: Current posture documented using a recognised maturity model (CMM, NIST CSF, Essential Eight) before intervention.
- Theory of change articulated: Clear, testable causal pathway linking activities to expected outcomes, with explicit assumptions — including AI components.
- Indicators tiered: Process, outcome, and strategic indicators defined for each intervention with collection methods assigned. AI model KPIs included.
- AI risk assessment completed: NIST AI RMF Map function applied; MITRE ATLAS threats profiled; adversarial robustness tested.
- Evaluation methodology selected: Contribution analysis or equivalent theory-based approach, with alternative explanations pre-identified.
- AI model monitoring designed: Continuous monitoring for model drift, performance degradation, adversarial exploitation, and bias amplification.
- Adaptation protocol defined: Decision rules for scaling, pivoting, or discontinuing — including AI model retraining or replacement triggers.
- Scenario tested: At minimum two real-world scenarios run through the complete framework, including at least one AI-augmented attack scenario.
Common Pitfalls to Avoid
- Measuring activity, not outcomes: “20 officials trained” says nothing about whether capability changed. Track capability shifts, not headcounts.
- Bolting MEL on after design: Evaluation must be built in from inception. Retrofitting baselines is expensive and unreliable.
- Treating AI as a magic solution: AI tools require their own evaluation, governance, and risk management — not blind trust in vendor claims.
- Ignoring adversarial AI risks: If you deploy AI for defence but don’t evaluate it against adversarial ML attacks, you’ve created a new vulnerability.
- Confusing compliance with impact: NIS2 compliance, Essential Eight maturity, and NIST Tier progression are proxies, not proof of reduced risk.
- Siloing AI governance from cyber MEL: AI governance and cyber evaluation must be integrated — separate reporting lines produce blind spots.
IMPLEMENTATION
A Phased Approach for Any Government
- Inception and Scoping (2–3 weeks): Convene cross-agency stakeholders including AI specialists. Confirm scope, map existing programmes and AI deployments, establish data access, and identify the evaluation questions that matter most. Produce an inception report.
- Evidence Synthesis (3–4 weeks): Benchmark existing interventions against international comparators (CyBIL Portal, CMM data, NIST community profiles). Include AI effectiveness evidence from published evaluations. Conduct key informant interviews. Identify what works, and under what conditions.
- Framework Design (3–4 weeks): Operationalise each pathway stage: threat mapping templates, prioritisation matrix, intervention menu, tiered measurement framework, AI governance integration, and adaptation protocol. This phase runs iteratively with evidence synthesis.
- Validation and Handover (2–3 weeks): Stress-test with real scenarios including AI-augmented attack scenarios. Present to stakeholders. Conduct peer review. Transfer ownership to internal teams with a sustainability plan.
CONTEXT ADAPTATION
A government with mature institutions (US, UK, Australia, Singapore) might compress this to 8–10 weeks, down from the roughly 10–14 weeks the four phases add up to when run in sequence. A government building foundational capacity might need 16–20 weeks with heavier emphasis on inception and evidence synthesis. The principle is constant: ground the framework in evidence, test it against reality, design for adaptation, and integrate AI governance from the start.
CONCLUSION
The Case for Acting Now
The threat landscape is not waiting. AI-powered attacks are scaling faster than most defences can adapt. Ransomware-as-a-Service, deepfake fraud, adversarial machine learning, and the commercial proliferation of cyber intrusion tools are all accelerating. Meanwhile, governments deploying AI in their own cyber defences face an accountability gap: they cannot demonstrate that these tools work as intended, or that they are not introducing new risks.
The analytical building blocks already exist. MITRE ATT&CK and ATLAS provide threat taxonomies. NIST CSF 2.0 and AI RMF provide maturity and trustworthiness frameworks. Oxford CMM provides national capacity benchmarks. ISO/IEC 42001 provides AI governance structure. Contribution analysis provides evaluation methodology. The OECD DAC criteria provide assessment standards. What has been missing is the deliberate integration of these tools into a coherent decision pathway that moves from observed threats, through programme design, through AI governance, to credible evidence of impact.
Building that integration requires a specific combination of expertise: professionals who understand both the operational realities of cybersecurity and the methodological demands of credible evaluation — now with the additional layer of AI governance. That combination is rare, but it is precisely what this moment demands. The governments that invest in it now will be the ones that can answer the questions that, today, almost no one can: Is our cyber programme actually working? Are our AI tools performing as intended? And are we governing both responsibly?
REFERENCES
- NIST (2024). The NIST Cybersecurity Framework (CSF) 2.0. CSWP 29. nist.gov/cyberframework
- NIST (2023). AI Risk Management Framework (AI RMF 1.0). nist.gov/itl/ai-risk-management-framework
- NIST (2025). Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. AI 100-2e2025. nvlpubs.nist.gov
- MITRE Corporation. MITRE ATT&CK. attack.mitre.org
- MITRE Corporation. ATLAS: Adversarial Threat Landscape for AI Systems. atlas.mitre.org
- CISA (2023). Best Practices for MITRE ATT&CK Mapping. cisa.gov
- ISO/IEC (2023). ISO/IEC 42001:2023 — AI Management Systems.
- GCSCC (2021). CMM for Nations, 2021 Edition. University of Oxford. gcscc.ox.ac.uk
- European Parliament (2022). Directive (EU) 2022/2555 (NIS2). digital-strategy.ec.europa.eu
- ENISA (2025). NIS2 Technical Implementation Guidance. enisa.europa.eu
- European Parliament (2024). Regulation (EU) 2024/1689 (EU AI Act).
- Australian Government (2023). 2023–2030 Australian Cyber Security Strategy. homeaffairs.gov.au
- Australian Government (2025). Horizon 2 Strategy Evaluation Model Consultation. homeaffairs.gov.au
- ASEAN (2021). Cybersecurity Cooperation Strategy 2021–2025. asean.org
- CSA Singapore (2024). ASEAN Regional CERT Launch. csa.gov.sg
- African Union (2014). Malabo Convention on Cyber Security and Personal Data Protection. au.int
- Chatham House (2024). The AU and cybersecurity at its 2024 summit. chathamhouse.org
- Chatham House (2025). The AU and the UN Cybercrime Convention. chathamhouse.org
- UK Government (2022). National Cyber Strategy 2022. gov.uk
- National Cyber Force (2023). Responsible Cyber Power in Practice. gov.uk
- UK–France (2024). The Pall Mall Process Declaration. gov.uk
- National Audit Office (2019). Progress of the 2016–2021 National Cyber Security Programme.
- Mayne, J. (2012). Contribution Analysis: Coming of Age? Evaluation, 18(3), 270–280.
- OECD DAC (2019). Revised Evaluation Criteria Definitions and Principles for Use.
- HM Treasury (2020). The Magenta Book: Guidance for Evaluation.
- EUISS (2023). International Cyber Capacity Building: Global Trends.
- ERIA (2024). Strengthening ASEAN’s Cybersecurity. Policy Brief 2024-06. eria.org
- CyBIL Portal. Cyber Capacity Building Inventory. cybilportal.org
- Palo Alto Networks (2025). Cybersecurity Predictions 2025. paloaltonetworks.com
- IBM (2024). Security Roundup: Top AI Stories in 2024. ibm.com
- OWASP. Top 10 for Large Language Model Applications. owasp.org
- Caltagirone, S., Pendergast, A. & Betz, C. (2013). The Diamond Model of Intrusion Analysis. apps.dtic.mil

