- Tech Insights 2026 Week 22 by Johan Sanneblad
AI is getting smart in ways few people expected.
For a long time, one of the standard criticisms of large language models was that they are just next-token prediction models. The argument was that they could only remix material they had already seen during training, and therefore could not produce genuinely novel content.
Last week, we had two announcements that challenge this assumption on a deep level.
First, Anthropic announced an initial update from Project Glasswing. They collaborated with 50 companies that used their unreleased model Claude Mythos Preview to scan over 1,000 open-source projects, where it identified 6,202 high- or critical-severity vulnerabilities. Out of these, 1,752 were assessed by independent security firms, and 90.6% of these vulnerabilities were confirmed as valid true positives. It achieved this by creating novel test cases exploring paths human developers did not consider. It thought “out of the box” and created new solutions.
AI models have in 2026 reached a level of source code understanding that is far beyond what humans are capable of. This means that even if you are not using AI to write and secure your codebase, you should still assume that someone else will use AI to find weaknesses in it. Using AI to write and secure source code will no longer be an option for most companies. It will be a requirement.
Secondly, OpenAI has a new internal general-purpose reasoning model that disproved a central conjecture in the planar unit distance problem, a discrete geometry problem posed by Paul Erdős in 1946. Many human mathematicians have tried to make progress on this problem over the past 80 years without success, and now we have not a special-purpose math model, but a general-purpose reasoning model producing a valid proof. It solved the problem by reaching into algebraic number theory, a field not typically associated with geometry.
It’s hard to overstate just how big of an achievement this is. Disproving this conjecture means thinking outside the box and creating new mathematical knowledge. A model acting strictly by copying existing knowledge would never have solved this.
As a human race, we have now officially begun our journey into unknown waters. We have created an intelligence that in some areas greatly surpasses our own, and that will keep getting smarter and more creative as the months go by. There is so much AI news every week that it is almost hard to keep up, but disproving a central conjecture in the planar unit distance problem is a big step not just for AI but for humanity as a whole, because it gives a slight glimpse into a future where AI is able to create magical things we could only dream of.
Thank you for being a Tech Insights subscriber!
Listen to Tech Insights on Spotify: Tech Insights 2026 Week 22 on Spotify
Notable model releases last week:
- Composer 2.5, the latest proprietary, agentic AI coding model by Cursor.
- Lance by ByteDance. Open-source 3B model “that supports image and video understanding, generation, and editing within a single framework”.
- QWEN 3.7-MAX by Alibaba. Flagship proprietary agent-foundation model.
- Trellis 2 by Microsoft. “State-of-the-art large 3D generative model (4B parameters) designed for high-fidelity image-to-3D generation”.
THIS WEEK’S NEWS:
- OpenAI Model Disproves Erdős Unit Distance Conjecture
- PwC Study Finds Grep Beats Vector Search in Agentic Memory Retrieval
- Anthropic Acquires the SDK Factory That Built OpenAI’s Developer Tools
- Project Glasswing: Anthropic’s Mythos Model Finds Over 10,000 Vulnerabilities in One Month
- Google Launches Gemini Spark, a 24/7 Cloud-Based AI Agent
- Google Releases Gemini 3.5 Flash and Gemini Omni at I/O 2026
- Google Antigravity 2.0: Standalone Agent Platform with CLI, SDK, and New Pricing
- Stability AI Releases Stable Audio 3.0 With Four Models and Six-Minute Compositions
- OpenAI Releases ChatGPT for PowerPoint as a Native Add-In
OpenAI Model Disproves Erdős Unit Distance Conjecture
https://openai.com/index/model-disproves-discrete-geometry-conjecture

The News:
- On May 20, 2026, OpenAI announced that an internal general-purpose reasoning model disproved the planar unit distance conjecture, a discrete geometry problem posed by Paul Erdős in 1946.
- The conjecture concerns the maximum number of pairs of points exactly one unit apart, given n points on a plane. For nearly 80 years, known constructions produced unit-distance counts of n^(1+C/loglog(n)), growing only slightly faster than linear, and mathematicians widely assumed the upper bound was n^(1+o(1)).
- The model found an infinite family of point configurations that produce at least n^(1+δ) unit-distance pairs for a fixed positive δ, breaking the conjecture. Princeton mathematician Will Sawin refined the construction to show that δ can be taken as 0.014.
- The proof does not use geometry directly. It draws on algebraic number theory, specifically infinite class field towers and the Golod-Shafarevich theorem, tools that had not previously been applied to this problem.
- External mathematicians including Fields Medalist Tim Gowers, Noga Alon, Thomas Bloom, and Melanie Wood independently verified the proof. The result updates the lower bound (i.e., the best-known construction) for the first time since Erdős’s original 1946 paper. The upper bound was separately improved by Spencer, Szemerédi, and Trotter in 1984.
“This paper demonstrates that current AI models go beyond just helpers to human mathematicians – they are capable of having original ingenious ideas, and then carrying them out to fruition.” Arul Shankar, leading number theorist.
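To make the counted quantity concrete, here is a minimal Python sketch that counts unit-distance pairs by brute force. It is only an illustration of what the conjecture measures, not the construction from the proof. The helper names `count_unit_pairs` and `square_grid` are hypothetical, and the sketch assumes integer coordinates so that the distance check is exact.

```python
from itertools import combinations


def count_unit_pairs(points):
    """Count unordered pairs of points that are exactly one unit apart.

    Assumes integer coordinates, so comparing squared distances is exact.
    """
    count = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1:
            count += 1
    return count


def square_grid(side):
    """Return the side x side grid of integer points (n = side * side)."""
    return [(x, y) for x in range(side) for y in range(side)]


for side in (2, 3, 10):
    pts = square_grid(side)
    print(len(pts), count_unit_pairs(pts))
# prints:
# 4 4
# 9 12
# 100 180
```

A plain unit-spaced grid gives roughly 2n pairs, which is only linear growth. The constructions discussed above are about how far beyond linear the count can be pushed.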
My take: This might be one of the greatest achievements we have seen so far for any AI model. What makes this extraordinary is that this work does not come from a mathematics-specific system like Google AlphaProof, but from a general-purpose reasoning model. Some might call this AGI, while others view the results from tests like ARC-AGI-3 and declare AGI to be far away. I think the answer lies somewhere in the middle. For some well-defined tasks, the best AI models of today will show clear signs of intelligence and will be able to reason in a somewhat creative way to solve problems. But for tasks where they have no specific training data, models will fail catastrophically, as in ARC-AGI-3. As models get larger, the number of areas where they fall short will shrink, and my gut feeling is that within two years we should have general-purpose reasoning models that can do basically any task a human can do (fine-tuned versions of models running on Vera Rubin-generation hardware are rolling out next year).
Read more:
- Arxiv: Remarks on the disproof of the unit distance conjecture
- AI just solved an 80-year-old ‘Erdős problem,’ and mathematicians are amazed | Scientific American
- OpenAI’s internal model disproves Unit Distance Conjecture of Erdos : r/math
PwC Study Finds Grep Beats Vector Search in Agentic Memory Retrieval
https://arxiv.org/abs/2605.15184

The News:
- Researchers at PricewaterhouseCoopers U.S. published a study on May 14, 2026 comparing lexical (grep) and vector (semantic) retrieval inside LLM agent workflows, using 116 questions from the LongMemEval benchmark across four agent harnesses: a custom harness (Chronos), Claude Code, Codex CLI, and Gemini CLI.
- With inline tool delivery, grep outperforms vector retrieval on every tested harness-model combination. The largest gap is Chronos with Gemini 3.1 Flash-Lite (86.2% grep vs. 62.9% vector). The narrowest is Claude Code with Claude Opus 4.6 (76.7% vs. 75.0%).
- The agent harness affects accuracy by margins comparable to swapping retrievers entirely: Claude Opus 4.6 reaches 93.1% accuracy under the Chronos harness with inline grep, but only 76.7% under Claude Code with the same retrieval method.
- Switching from inline to file-based (programmatic) tool delivery reverses the pattern in several configurations. Codex CLI with GPT-5.4 drops from 93.1% under inline grep to 55.2% under programmatic grep, while programmatic vector reaches 67.2% for the same pairing.
- In the noise scaling experiment (Experiment 2), vector retrieval tends to score higher at small session counts, while grep can match or overtake it as more irrelevant sessions accumulate, depending on the harness and model backbone.
My take: Most developers building agentic systems today default to vector search. Embeddings feel like the modern answer, vector databases have excellent tooling, and semantic similarity sounds more capable than pattern matching. The core finding of this paper is that when an agent searches through conversational memory, session history, or accumulated user context, a simple grep over stored text consistently outperforms vector retrieval, at least when results are returned inline to the model. The reason is that agents retrieving session state are usually searching for concrete facts, a filename agreed upon two sessions ago, a budget the user stated, or a preference logged earlier in the conversation. Those are exact-match problems and semantic similarity just adds noise rather than signal when the target is a literal string.
The benchmark in this study is quite narrow, just 116 conversational memory questions, and the task type structurally favors grep since the questions are typically answered by literal spans like exact dates, names, or stated preferences rather than fuzzy semantic matches. The authors are clear that their results might not generalize to enterprise document search or dense texts. But if you are building agentic systems, the practical takeaway from this article is to always start with a simple retriever, understand exactly how your chosen harness delivers context to the model, and only add embedding infrastructure when you have a clear reason for it.
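As a rough illustration of what a "simple retriever" can look like, here is a minimal Python sketch of grep-style lexical search over stored session text. It is not the harness or tooling used in the PwC study. The function name `grep_sessions`, the session IDs, and the sample conversations are all hypothetical, and the sketch assumes memory is kept as plain text per session.

```python
import re


def grep_sessions(sessions, pattern, ignore_case=True):
    """Return (session_id, line_number, line) for every line matching pattern.

    `sessions` maps a session ID to that session's stored text.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for session_id, text in sessions.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((session_id, line_no, line.strip()))
    return hits


# Hypothetical stored memory: one text blob per past session.
sessions = {
    "session-01": "User: Let's name the report q3_forecast.xlsx\nAgent: Noted.",
    "session-02": "User: My budget is 4,500 EUR for the trip.\nAgent: Got it.",
    "session-03": "User: I prefer window seats.\nAgent: Saved.",
}

for hit in grep_sessions(sessions, r"budget"):
    print(hit)
```

The matching lines can be returned inline to the model as-is. An embedding index would only be layered on top once literal matching demonstrably misses what the agent needs.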
Anthropic Acquires the SDK Factory That Built OpenAI’s Developer Tools
https://www.anthropic.com/news/anthropic-acquires-stainless

The News:
- Anthropic acquired Stainless on May 18, a New York-based startup founded in 2022 by former Stripe engineer Alex Rattray. Stainless takes an API specification and outputs SDKs, CLIs, and MCP servers across multiple languages.
- Stainless built every official Anthropic SDK since the Claude API launched, and also built SDKs for OpenAI, Google, Meta, Cloudflare, and Runway.
- Supported languages include TypeScript, Python, Go, Java, Kotlin, and Ruby. Generated SDKs include handling for retries, streaming, pagination, authentication, and automatic syncing when the underlying API changes.
- Stainless also built dedicated MCP server tooling, directly supporting Anthropic’s Model Context Protocol, introduced in November 2024 to standardize connections between AI agents and external systems.
- Stainless is winding down all hosted products. New signups, projects, and SDK generation are no longer available. Existing customers retain full rights to previously generated SDKs and can modify them freely.
- The deal is reportedly valued at over $300 million, more than double Stainless’s $150 million Series A valuation from December 2024, according to The Information. Anthropic has not officially confirmed the price.
My take: This is Anthropic’s second developer infrastructure acquisition in five months, following the December 2025 acquisition of Bun, the JavaScript runtime. And the strategy behind this is clear: frontier AI labs now acquire tools instead of building them. In this case, however, the acquisition will have quite a lot of consequences. Every single SDK published by OpenAI is generated by Stainless, and, effective immediately, OpenAI can no longer use it to generate new ones. So this was a double win for Anthropic. Was it worth $300 million? For the platform, maybe not. But being able to slow down OpenAI for just a tiny bit? Definitely.
Read more:
- Anthropic acquires Bun as Claude Code reaches $1B milestone \ Anthropic
- Stainless – OpenAI makes artificial intelligence accessible to millions
Project Glasswing: Anthropic’s Mythos Model Finds Over 10,000 Vulnerabilities in One Month
https://www.anthropic.com/research/glasswing-initial-update

The News:
- Project Glasswing is Anthropic’s restricted cybersecurity initiative, launched in April, using Claude Mythos Preview, a non-public model, to scan critical software for vulnerabilities at scale.
- After one month, approximately 50 partner organizations collectively found more than 10,000 high- or critical-severity vulnerabilities. Most partners each found hundreds. Several report bug-finding rates more than ten times higher than before.
- Cloudflare found 2,000 bugs across critical-path systems, 400 rated high- or critical-severity, with a false positive rate that Cloudflare’s team considers better than human testers.
- Mozilla found and fixed 271 vulnerabilities in Firefox 150 while testing Mythos Preview, over ten times more than found in Firefox 148 using Claude Opus 4.6. The UK’s AI Security Institute reports Mythos Preview is the first model to solve both of its cyber range simulations end to end.
- Mythos Preview scanned more than 1,000 open-source projects and identified 6,202 high- or critical-severity vulnerabilities out of 23,019 total. Of 1,752 assessed by independent security firms, 90.6% were confirmed as valid true positives, and 62.4% were confirmed at high- or critical-severity.
- At one Glasswing partner bank, Mythos Preview helped detect and prevent a fraudulent $1.5 million wire transfer after a threat actor compromised a customer’s email account and made spoof phone calls.
- Anthropic is now making available to qualifying customers’ security teams on request the tools used with Mythos Preview: a codebase scanning harness, a threat model builder, and pre-built skills (custom instructions for repeated work).
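The percentages in the open-source scan can be turned into approximate counts with a few lines of Python. This is a back-of-the-envelope sketch: the inputs are the figures quoted above, the helper name `share_to_count` is hypothetical, and because the published percentages are rounded, the resulting counts are estimates rather than reported numbers.

```python
def share_to_count(total, share):
    """Convert a rounded share (0..1) of `total` into an approximate count."""
    return round(total * share)


assessed = 1752          # findings reviewed by independent security firms
total_findings = 23019   # all findings in the open-source scan
high_or_critical = 6202  # findings rated high- or critical-severity

approx_valid = share_to_count(assessed, 0.906)  # about 1,587 true positives
approx_high = share_to_count(assessed, 0.624)   # about 1,093 confirmed high/critical
high_share_pct = round(100 * high_or_critical / total_findings, 1)  # about 26.9%

print(approx_valid, approx_high, high_share_pct)
```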
My take: This changes our baseline for software security. If an AI can now scan thousands of codebases and find real high-severity vulnerabilities at a rate far beyond traditional human testing, then every company with software needs to assume two things: first, defenders must use AI to secure code faster than ever before, and second, attackers will use the same kind of capability to try to find weaknesses before you do. The only way to fully secure your codebase going forward is to switch to AI-based code generation and testing as quickly as possible.
Google Launches Gemini Spark, a 24/7 Cloud-Based AI Agent

The News:
- Google announced Gemini Spark at Google I/O on May 19: a background AI agent that runs on dedicated Google Cloud virtual machines, executing tasks without requiring the user’s device to be on.
- Spark runs on Gemini 3.5 Flash and is built on the Antigravity agent harness, which handles long-running and multi-step task execution.
- It integrates out-of-the-box with Gmail, Google Docs, Slides, and Calendar, and at launch connects to three third-party apps via MCP: Canva, OpenTable, and Instacart, with more partner integrations announced as in progress.
- Users can interact with Spark via a dedicated Gmail address, and on Android, task progress is visible through the new Android Halo system.
- Spark requires approval before “high-stakes actions like spending money or sending emails”, and all app connections are opt-in.
- Upcoming features include the ability to send texts and emails through Spark, and to control a browser. Google plans to bring Spark to the Gemini desktop app this summer, adding local file access.
- Spark is currently in beta, available to Google AI Ultra subscribers in the US starting the week of May 26, with no confirmed rollout date for the EU, UK, or other markets.
“It’s your personal AI agent that helps you navigate your digital life, taking action on your behalf and under your direction.” Sundar Pichai
My take: OpenAI has Codex, Anthropic has Claude Cowork, and now Google has Gemini Spark. Gemini Spark is the most expensive of the three services and requires a $100/month AI Ultra plan, but Google promises that in the future this will be your fully automated digital assistant that can act directly on all your cloud data and do all your errands. Right now, however, Gemini Spark is quite limited and can only run tasks on a schedule or trigger, or perform workflows. Google promises they have “a packed roadmap of features” coming in the next few months, so let’s keep an eye out for this one and see what’s coming.
Google Releases Gemini 3.5 Flash and Gemini Omni at I/O 2026
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/
https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/
The News:
- Google released two models at I/O 2026 on May 19, 2026: Gemini 3.5 Flash, targeting agentic and coding workloads, and Gemini Omni Flash, a multimodal model that generates and edits video from any combination of text, image, audio, and video inputs.
- Gemini 3.5 Flash scores 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, 83.6% on MCP Atlas, and 84.2% on CharXiv Reasoning, outperforming Gemini 3.1 Pro on all four benchmarks.
- Google states output tokens per second for 3.5 Flash are four times faster than other frontier models, and Artificial Analysis places it in the top-right “Leader” quadrant of its index.
- Gemini 3.5 Flash is available today via the Gemini app, AI Mode in Google Search, Google Antigravity, Gemini API in Google AI Studio and Android Studio, Gemini Enterprise Agent Platform, and Gemini Enterprise. Gemini 3.5 Pro is in internal use and planned for public release in June 2026.
- Gemini Omni Flash is rolling out to all Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow, and at no cost to YouTube Shorts and YouTube Create App users. API access for developers and enterprise customers is scheduled for the coming weeks.
- Gemini 3.5 Flash is also the default model powering Gemini Spark, a personal AI agent announced at I/O 2026 that runs 24/7 under user direction, currently rolling out to trusted testers with a broader Beta for Google AI Ultra subscribers in the US planned for the following week.
- All videos created with Gemini Omni include an imperceptible SynthID digital watermark, verifiable through the Gemini app, Gemini in Chrome, and Google Search.
My take: Demis Hassabis keeps talking about the singularity, and how we are close to this “profound moment for humanity” when technology exceeds human control. But the models they released last week didn’t really back this up. Gemini 3.5 Flash is worse than Gemini 3.1 Pro for coding, and Gemini 3.1 Pro is worse than GPT-5.5 for coding. I see no other company representative talk as much about AGI and the singularity as Demis Hassabis, yet Google is the one with the poorest-performing agentic model for coding right now. I would really like to see them solve programming first, then they can go ahead and plan for the singularity after that.
Gemini Omni Flash, however, is an interesting model since it can create videos from any source material, meaning you can send in a video and ask it to augment or change things in it. I definitely see this one becoming a hit on social media in the next few weeks. If you have a minute, check out these 10 wild examples of Gemini Omni in use. It’s really something else!
Read more:
- Introducing Gemini Omni: Create Anything from Anything – YouTube
- Min Choi on X: “Ok Gemini Omni is insane. 10 wild examples.” / X
- Gemini 3.5 flash is not that great at coding : r/singularity
- Gemini 3.5 Flash is nowhere near Gemini 3.1 pro for coding : r/GeminiAI
Google Antigravity 2.0: Standalone Agent Platform with CLI, SDK, and New Pricing
https://antigravity.google/blog/introducing-google-antigravity-2-0

The News:
- Google Antigravity 2.0, announced at Google I/O 2026 on May 19, is a redesigned desktop application that adds multi-agent orchestration, a new CLI, and an SDK to what originally launched as an agent-first IDE in November 2025.
- The desktop app lets developers run multiple AI agents simultaneously, design custom subagent workflows, and schedule tasks to run automatically in the background.
- Antigravity 2.0 connects natively with Google AI Studio, Firebase, and Android; developers can export AI Studio projects directly to their local Antigravity instance with full context carried over.
- Google ships a new Antigravity CLI built in Go, described by the company as faster and more responsive than the Gemini CLI it replaces, alongside an SDK for building custom agents on Google’s coding infrastructure.
- The update adds native voice command support, consistent with Google’s rollout of voice input to Gmail and Docs.
- The default model is Gemini 3.5 Flash, which Google says outperforms Gemini 3.1 Pro on coding and agentic benchmarks and runs four times faster than competing frontier models; the model was itself co-developed using Antigravity.
- Consumer access to Gemini CLI and Gemini Code Assist IDE extensions ends June 18, 2026 for AI Pro, AI Ultra, and free-tier users; Enterprise Gemini Code Assist Standard and Enterprise licenses are exempt.
- Pricing adds a new AI Ultra tier at $100 per month (5x Pro limits); the existing top-tier AI Ultra drops from $250 to $200 per month (20x Pro limits), matching Anthropic’s Claude Max and OpenAI’s ChatGPT Pro tiered pricing.
My take: I’m not really sure what the target group is for this tool. Do you want to use an agentic environment where you need to pay $200 per month and have to use models that are worse than Opus 4.7 or GPT-5.5? If this were free or very cheap, maybe it would have a market, but paying $200 per month for Gemini 3.5 Flash and Opus 4.6? The only reason I can think of for not supporting Opus 4.7 is that Google is locked in to a cheaper API licensing deal with Anthropic that only covers 4.6, but that is just speculation on my side. Let’s revisit this once Google launches Gemini 3.5 Pro in a week or so.
Stability AI Releases Stable Audio 3.0 With Four Models and Six-Minute Compositions

The News:
- Stability AI released Stable Audio 3.0 on May 20, a family of four text-to-audio models for music and sound effects generation, all trained on fully licensed data. The company has partnerships with Universal Music Group and Warner Music Group to co-develop AI music tools.
- The family includes four models: Small SFX (459M parameters) for sound effects, Small (459M parameters) for music, Medium (1.4B parameters), and Large (2.7B parameters); Small SFX and Small generate up to 2 minutes; Medium and Large generate up to 6 minutes 20 seconds.
- Small runs on a MacBook Pro M4, generating 120 seconds of audio in 5.92 seconds; with CoreML acceleration that drops to 3 seconds. Medium requires approximately 6.5 GB VRAM, fits on an RTX 3060 or RTX 4060, and generates 6 minutes of audio in a few seconds.
- Small SFX, Small, and Medium are open-weight and free to download on Hugging Face; Large is available only via the Stability AI API and self-hosted paid services, with an enterprise license required for organizations exceeding $1 million in revenue.
- All models support audio inpainting in three modes (single segment, multi-segment, and continuation), as well as LoRA fine-tuning. Compute scales with requested duration, so a 9-second request runs only 9 seconds of compute rather than padding to the maximum length.
My take: If you need cheap ambient beats or background music and want to use AI for it, this model should be at the top of your list. The models are totally free for personal and creative use and can easily run on your laptop. Just don’t expect any Suno-level quality out of it.
OpenAI Releases ChatGPT for PowerPoint as a Native Add-In
https://chatgpt.com/apps/powerpoint

The News:
- OpenAI released a beta version of ChatGPT for PowerPoint on May 21, a sidebar add-in for Microsoft PowerPoint that accepts natural language prompts to create and edit presentation decks directly inside the application.
- Users can generate slides from scratch, revise existing slides, add sections, reformat content for specific audiences such as executive or customer-facing decks, and turn long-form documents or notes into structured presentations.
- The tool supports “Skills,” which are reusable prompt playbooks encoding style rules and formatting workflows, and can connect to external data sources via apps linked to a user’s ChatGPT account, subject to admin controls and permissions.
- ChatGPT for PowerPoint can analyze a finished deck to identify narrative gaps and surface likely audience questions.
- The add-in prompts users for confirmation before applying significant changes.
- Free and Go tier users receive limited usage access; Plus, Pro, Business, Enterprise, Edu, and K-12 accounts receive access subject to each plan’s agentic usage limits.
- This release follows the March 2026 launch of ChatGPT for Excel, which accompanied the GPT-5.4 release, extending OpenAI’s Office integration to a second Microsoft application.
My take: I have been using Claude for PowerPoint a lot the past few weeks, and I really like it. Typically I use it with my own templates and ask it to just fill in the text, but for smaller presentations it’s actually quite good at creating the design itself. GPT-5.5 is not really up to par with Claude when it comes to design, but I know OpenAI is working on this, and it might even be in the GPT-5.6 release coming soon. I have never enjoyed working in PowerPoint as a tool. It’s clunky, and the Mac version lacks several shortcuts that power users depend on. Let’s all hope both Opus and GPT quickly get to the level of PowerPoint skills where none of us ever have to touch that piece of software by hand again.

