
Construction AI Brief

13 May 2026

Construction AI: the defender models arrive, and the reasoning ceiling shows itself

OpenAI's Daybreak landed this week as a direct answer to Anthropic's Mythos/Glasswing - frontier AI is now a security category. Meanwhile, every major model scored 0% on the new ARC-AGI 3 reasoning benchmark.

Today’s context: This brief covers the latest movements in AI tooling, adoption, and signals for construction teams. Read on for what matters and what to focus on.

Industry Readiness

UKCW London - day two of three at ExCeL

UK Construction Week London continues today (Wednesday 13 May) from 10:00 to 17:00, with the ConTech & AI Hub running across all three days. Discussion on the first day focused on AI-driven decision-making, generative design and automated estimating, along with the cross-over with the Marketing & Procurement Hub. Today's programme has the densest concentration of agentic AI and digital twin sessions, and Thursday closes with the Women in Construction sessions and a final pass over the AI-readiness panels.

If you have not been yet, today is the day. The hub is small enough to navigate in a focused two-hour visit if your team has clear questions on procurement AI, document automation or digital twin tooling.

Why it matters

Use the second day to follow up on conversations started yesterday rather than collecting more cards. Three or four solid 30-minute vendor scoping conversations are worth more than a full lap of the floor.

Sources:

UKCW London 2026 - ExCeL →

UKCW - ConTech & AI →

Specification Online - Six inspiring Women in Construction sessions →

Microsoft signs up the construction trades for AI literacy at scale

Microsoft has partnered with North America's Building Trades Unions (NABTU) to offer free AI literacy courses and industry-recognised credentials to millions of skilled craft professionals across North America. The framing is workforce development for the AI build-out - both for trades who will build the data centres and for the wider construction workforce, where AI is increasingly embedded in project controls.

For UK construction leaders, the takeaway is twofold. First, expect similar UK-side partnerships to follow - CITB, Constructionarium and the Federation of Master Builders are all natural counterparties. Second, your AI tooling decisions will increasingly be judged on whether they are backed by structured upskilling, not just licences.

Why it matters

AI literacy is moving from a "nice to have" upskilling line item to a workforce-strategy obligation. Tie your tooling investments to a published, measurable upskilling commitment.

Sources:

Axios - Microsoft partners with construction unions on AI boom →

OpenAI - Building the compute infrastructure for the Intelligence Age →

Tools & Platforms

Hermes Agent ships a desktop app - always-on, self-evolving, local

Nous Research's Hermes Agent dropped a desktop app on 9 May, taking the open-source autonomous agent from CLI/server experiment to a long-lived application running locally on the user's machine. The pitch is an always-on agent that can "self-evolve" by editing its own skill set, watching inboxes and queues continuously, and persisting across sessions. Combined with last week's 0.13 "Tenacity" release (multi-agent kanban, /goal, ElevenLabs voice, DeepSeek v4 Pro), Hermes is now a credible self-hosted option for organisations that don't want a cloud-only agent stack.

For UK construction firms with client confidentiality clauses, this matters. An agent that runs locally, persists across sessions and orchestrates multi-step workflows without sending project data to a vendor is a different procurement conversation from a cloud SaaS agent.

Why it matters

If your data residency and confidentiality story currently blocks cloud-only AI tools, Hermes Desktop is worth a focused two-week evaluation.

Sources:

GitHub - Hermes Agent releases →

Hermes Agent - Nous Research →

Claude for Word: tracked changes are the audit trail you've been missing

Worth a fresh look this week: Claude for Word, launched in April and now available on Pro and Max plans, proposes every edit as a tracked change in Microsoft Word's native review pane. The original text is shown as a deletion, the new text as an insertion, with each change individually acceptable or rejectable. Claude can also read comment threads, understand what they anchor to, edit the anchored passage and reply to the thread explaining what it did.

For construction document workflows - contracts, RFIs, design narratives, JCT correspondence, BSA-related submissions - that is the audit trail that has been missing from most AI-assisted editing. It also pairs cleanly with the CDM/PI accountability conversation: every AI change is attributed, timestamped and reviewable.

Security & Governance

OpenAI Daybreak vs Anthropic Mythos/Glasswing - frontier AI is now an enterprise security category

OpenAI launched Daybreak this week (11 May) as a cybersecurity platform built on GPT-5.5 and the Codex Security agent. Daybreak connects to codebases and infrastructure, simulates attack routes, surfaces vulnerabilities, generates and tests patches inside repositories, and produces audit-ready validation. It is positioned as a direct counter to Anthropic's Claude Mythos (announced April) and Project Glasswing - a partnership that already includes AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA and Palo Alto Networks.

The strategic read for construction is sharper than it looks. Last week's Beazley poll ranked construction the least prepared industry for cyber threats. The same week, Palisade Research showed frontier models can self-replicate across vulnerable networks at up to 81 per cent success. Now both major frontier labs are racing to build defender programmes - but those programmes are gated to enterprises in their partner ecosystems. If your stack is Microsoft 365 / Anthropic, you'll want to understand Glasswing. If it's OpenAI / Microsoft Foundry, Daybreak is your route in.

Why it matters

Pick an ecosystem and get on the right defender table early. The "AI defender" capability your insurers and clients will start asking about in 2026/27 is being decided in these alliances now.

Sources:

Anthropic - Project Glasswing →

Anthropic - Claude Mythos Preview →

Help Net Security - OpenAI's Daybreak uses Codex Security →

Gizmodo - Daybreak: OpenAI's answer to Project Glasswing →

PYMNTS - OpenAI debuts Daybreak to counter Anthropic's Mythos →

Adoption & Evidence

Every frontier model scored 0% on the new ARC-AGI 3 reasoning benchmark

GPT-5.4, Claude Opus 4.6 and Gemini 3.1 all scored exactly 0 per cent on ARC-AGI 3, the latest version of the Abstraction and Reasoning Corpus benchmark created by François Chollet to test general reasoning that cannot be cracked with memorised training data. Untrained humans score 100 per cent on the same tasks. The same models write substantially better code, score higher on professional exams and handle longer multi-step tasks than their predecessors, so the zero points to a structural limitation in novel reasoning rather than a regression in model quality.

The construction read-through is straightforward. Frontier AI is genuinely useful for repeatable, pattern-rich, well-documented work - estimating, scheduling, RFIs, document review. Yet it is still beaten by an untrained twelve-year-old on novel abstract reasoning. That is the right line to use in any boardroom conversation about where AI should and should not be making decisions on a project.

Why it matters

Use this number deliberately. It is the single most credible counterweight to "AI can do anything" claims, and it makes the case for human-in-the-loop on judgement-heavy decisions far more defensible than vague risk talk.

Sources:

MindStudio - Why GPT-5.4, Claude 4.6 and Gemini 3.1 all scored 0% on ARC-AGI 3 →

LM Council - AI Model Benchmarks May 2026 →

What matters most

→Pick a defender - if you're a Microsoft 365 / Anthropic shop, Glasswing applies; if you're an OpenAI shop, Daybreak applies. Get on the right table early.
→Frontier models are still beatable on novel reasoning by an untrained 12-year-old. Use this when defending which decisions should not be AI-only.
→UK construction-specific tools (NavLive, MyQS.ai, BRCKS, ProcurePro) are punching above their weight - buy local where the use case is local.

Ready to put AI to work on your projects?

50 free Intelligence Units. Set up your first project in under 20 minutes. No credit card needed.

Get 50 free Intelligence Units

Why PlanOps publishes this

Related issues

Construction AI: the Building Safety Regulator's backlog finally shifts, and a central bank flags the bill behind the data centre boom

Pomelli - Google's marketing AI for the smaller end of the construction supply chain

Construction AI: McLaren puts robot dogs on its sites at scale, and the office finally captures what gets said on Teams

Construction AI: NG Bailey puts a chief AI officer in the boardroom, and the data centre becomes a cyber-security problem
