• Tech Insights 2025 Week 38

    On Friday last week, Anthropic posted a status message saying: “We’ve identified the root causes of the reported quality issues and deployed mitigations for each. A technical post-mortem will be published on our engineering blog next week.” If you, like me, have been using Claude Code extensively over the past months, you know that it has behaved extremely inconsistently over the past three weeks. Sometimes it performed OK, but most of the time it performed really badly.

    For me this was not a showstopper, since I could quickly switch to OpenAI Codex. When Anthropic went bad, OpenAI Codex with GPT-5 High went the other way and started to perform exceptionally well. When it comes to raw coding skills I would currently rate Codex with GPT-5 High (only the top model works well here) at maybe 9/10. It still has a tendency to overcomplicate things and it is fairly slow, but it is exceptionally good at solving complex issues and creating clever solutions to difficult problems. The quality increase and feature development of OpenAI Codex over the past four weeks have been nothing short of outstanding!

    The current Claude Code with Claude Opus 4.1 I would rate at maybe 3/10, where Claude Code with Opus 4 in July/August was maybe 8/10. The only thing Claude Code is better at right now compared to OpenAI Codex is writing good inline comments and solid documentation. Based on my tests over the weekend, Claude Code with Opus 4.1 is still nowhere near the performance it showed a month ago – core skills like managing todo lists, adhering to rules files and coordinating agents seem long forgotten.

    The reason I am posting this is simple: in a world where every month more parts of your company become dependent on emerging AI technologies to increase quality and productivity, you need to ensure you have the organizational structure in place to support it. If your employees depend on Claude Code and they suddenly report that it does not perform well, you must be able to quickly switch to another provider to keep momentum. Someone needs to have the mandate to choose and change models and providers quickly, and that person needs to be well informed on exactly what is happening within the AI world, so they can be not only reactive but proactive.

    Thank you for being a Tech Insights subscriber!

    Listen to Tech Insights on Spotify: Tech Insights 2025 Week 38 on Spotify

    THIS WEEK’S NEWS:

    1. Anthropic Claude Can Now Create Microsoft Office Files
    2. Microsoft Partners with Anthropic for Office 365 AI Features
    3. OpenAI Adds Full MCP Tool Support and Conversation Branching to ChatGPT
    4. ByteDance Launches Seedream 4.0 to Challenge Google’s Nano Banana
    5. Swedish Music Rights Group STIM Launches AI Licensing Framework
    6. ElevenLabs Launches Voice Remixing Alpha Feature
    7. Alibaba Releases Qwen3-Next Architecture and Trillion-Parameter Model
    8. Google Launches A2A Extensions for Agent-to-Agent Protocol Customization
    9. Stability AI Launches Enterprise-Focused Audio Generation Mode

    Anthropic Claude Can Now Create Microsoft Office Files

    https://www.anthropic.com/news/create-files

    The News:

    • Claude can now create and modify Excel spreadsheets, Word documents, PowerPoint presentations, and PDFs through conversational prompts within Claude.ai and the desktop application. Users receive downloadable files instead of text responses, converting raw data into polished outputs with formulas, charts, and multiple sheets in minutes.
    • The feature runs in a private computer environment where Claude executes code behind the scenes to produce files. Users can upload data or documents and instruct Claude to transform them into different formats, such as converting PDF reports into PowerPoint slides or meeting notes into formatted documents.
    • File creation requires enabling “Upgraded file creation and analysis” in Settings > Features and is available to Max, Team, and Enterprise subscribers. Pro subscribers will get access over the coming weeks.
    • Claude creates financial models with scenario analysis, project trackers with automated dashboards, and budget templates with variance calculations. The system handles cross-format conversions and produces spreadsheets with working formulas from conversational instructions.

     “We’ve given Claude access to a private computer environment where it can write code and run programs to produce the files and analyses you need.”

    My take: Did you know that if you take a Word .docx file, an Excel .xlsx file, or a PowerPoint .pptx file and rename it to “.zip”, you can decompress it and view what’s inside? The office document formats we use today are compressed zip archives with complex proprietary XML and metadata files stored within. To produce the highest-quality office documents you need to run the actual Office programs, and this is exactly what Anthropic has done here: they run the full office suite in a private computer environment and remote-control it with scripts to produce the documents. It works very well, and before we get full computer access (maybe within 1-2 years) this setup will work wonders for producing office documents in the classic .docx/.xlsx/.pptx formats. I haven’t written a line of software code in the past year but I have produced hundreds of thousands of lines of code, and I cannot wait until I no longer have to start Word, Excel and PowerPoint but can still produce hundreds of high-quality reports and presentations every year.
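    You can verify the zip claim yourself in a few lines. The sketch below builds a minimal stand-in for an Office file (the file name and XML contents are illustrative placeholders, not a valid Word document) and shows that any zip tool can list what's inside:

```python
import zipfile

# OOXML documents (.docx/.xlsx/.pptx) are ordinary zip archives with
# XML files inside. Build a tiny illustrative stand-in for one:
with zipfile.ZipFile("demo.docx", "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")      # placeholder XML
    zf.writestr("word/document.xml", "<document/>")     # placeholder XML

# Any zip tool can now open it -- no renaming even required.
with zipfile.ZipFile("demo.docx") as zf:
    names = zf.namelist()
print(names)  # ['[Content_Types].xml', 'word/document.xml']
```

    A real .docx contains the same kind of structure, just with many more parts (styles, relationships, media), which is why driving the actual Office applications still produces the most faithful files.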

    Microsoft Partners with Anthropic for Office 365 AI Features

    https://www.theinformation.com/articles/microsoft-buy-ai-anthropic-shift-openai

    The News:

    • Microsoft will integrate Anthropic’s Claude Sonnet 4 into Office 365 applications including Word, Excel, Outlook, and PowerPoint, ending exclusive reliance on OpenAI for AI features in its productivity suite.
    • The move comes after internal testing showed Claude Sonnet 4 outperforms OpenAI’s models in specific tasks like creating PowerPoint presentations and automating Excel financial functions.
    • Microsoft will pay Amazon Web Services for access to Anthropic’s models, creating an unusual arrangement where Microsoft purchases AI from a cloud competitor that invested heavily in Anthropic.
    • Office 365 Copilot pricing remains at $30 per user per month despite the dual-model integration, with OpenAI continuing to power some features while Anthropic handles more advanced tasks.
    • The partnership reaches Microsoft’s 430 million Office 365 subscribers, providing Anthropic significant new distribution compared to OpenAI’s consumer-focused reach.

    My take: If you have used both Microsoft 365 Copilot and OpenAI ChatGPT you know how different the responses can be even when you ask both the same question (which is interesting since they both use GPT-5). There are many reasons for this, but one is that a lot of “magic” actually happens before the model itself processes the request and sends back the text. Microsoft has done quite a lot of tweaking, and the quality of Copilot has increased noticeably over the past six months, but in my experience people who have access to both services (ChatGPT and M365 Copilot) still tend to prefer ChatGPT, mainly because they know how to prompt it. It will be interesting to see how this works with Claude Sonnet 4. I use Claude Sonnet 4 and Opus for most of my technical writing (except this newsletter, which is hand-written), and if I get the same results with M365 Copilot as with Claude then this could be a potential game-changer for Microsoft (going from a model that’s slightly worse than ChatGPT for technical documents to one that’s clearly better).

    OpenAI Adds Full MCP Tool Support and Conversation Branching to ChatGPT

    https://platform.openai.com/docs/guides/developer-mode

    The News:

    • OpenAI introduced full Model Context Protocol (MCP) support in ChatGPT Developer Mode, expanding beyond read-only operations to include write actions that can modify external systems and trigger workflows.
    • The feature enables developers to build custom connectors that interact with tools like Jira, Zapier, GitHub, and CRMs directly through chat conversations, turning ChatGPT from a query interface into an automation platform.
    • Developer Mode requires Plus or Pro accounts and is currently in beta, with OpenAI warning of security risks including prompt injection attacks and potential data access by compromised MCP servers.
    • ChatGPT also now supports conversation branching, allowing users to fork discussions at any point while preserving the original thread and context.
    • Users can hover over any message, select “More actions,” and choose “Branch in new chat” to create parallel exploration paths without losing their original conversation.

    My take: I think we can see a clear direction in where these tools are heading. Microsoft is going the strict office-worker route, where M365 Copilot is built straight into the Office applications and you do not get integrations like MCP. On the other hand you have ChatGPT and Claude, which allow advanced users to automate their work by connecting the AI models to basically any tool you can imagine. We are moving further and further away from a situation where one tool fits all, so if you have not yet locked all your employees into a single tool, I strongly recommend you at least evaluate rolling out more than one chat client. Some users are simply more comfortable with more advanced AI tools, especially if they use them in their spare time.

    ByteDance Launches Seedream 4.0 to Challenge Google’s Nano Banana

    https://seed.bytedance.com/en/seedream4_0

    The News:

    • ByteDance released Seedream 4.0, combining text-to-image generation and image editing capabilities in a unified AI tool that competes directly with Google’s Gemini 2.5 Flash Image (Nano Banana).
    • The model generates 2K resolution images in approximately 1.8 seconds using a new architecture that accelerates image inference by more than 10 times compared to previous versions.
    • ByteDance claims Seedream 4.0 outperformed Gemini 2.5 Flash Image on its internal MagicBench evaluation for prompt adherence, alignment, and aesthetics, though these results were not published in an official technical report.
    • Seedream 4.0 costs $0.03 per image on Fal.ai compared to Gemini 2.5 Flash Image’s $0.039 per image, with bulk pricing at $30 per 1,000 generations.
    • The tool merges capabilities from Seedream 3.0 (text-to-image) and SeedEdit 3.0 (image editing) while maintaining the same pricing as the previous generation.

    My take: If you have a few minutes, I really recommend you visit the ByteDance Seedream 4.0 web page to get a feeling for just how far AI-generated images have come. The main thing here is not the raw quality of the images, which in itself is insanely good, but the way you can instruct these models with prompts like “here is a photo of me sitting on a bench, render a new photo from the back” and they simply produce it. As a photographer, I find these new AI engines both feel and behave almost like magic, and it will be very interesting to see where they evolve from here.


    Swedish Music Rights Group STIM Launches AI Licensing Framework

    https://www.reuters.com/business/media-telecom/sweden-launches-ai-music-licence-protect-songwriters-2025-09-09

    The News:

    • STIM, representing 100,000 Swedish songwriters and composers, introduced the world’s first collective AI licensing system that allows AI companies to train on copyrighted music while paying royalties to creators.
    • The framework requires mandatory attribution technology (Sureel) to track AI-generated outputs in real time, ensuring transparency and proper compensation for artists whose works are used in training data.
    • Stockholm-based startup Songfox became the first company to operate under the license, enabling users to create legal AI-generated songs and covers through a controlled pilot program.
    • AI companies pay through a mix of licensing fees and revenue shares, with compensation flowing during model training and downstream consumption of AI outputs.
    • CISAC estimates that AI could reduce music creators’ income by 24% by 2028, while generative AI outputs in music could reach $17 billion annually by the same year.

    My take: A while ago Songfox was recruiting for a CTO, and the ad stated that Songfox has “a board and existing investors from Universal, Live Nation, the founder of Betsson, and a full-scale team within growth, AI, product, and deep tech”. What they don’t have is an actual company: if you try to look them up on allabolag.se, there is no registered company in Sweden called Songfox. So I am not sure I would call this an actual launch; it is more a small, limited-usage experiment with a virtual organization that might become a real company if it attracts enough interest and investment money.

    ElevenLabs Launches Voice Remixing Alpha Feature

    https://www.youtube.com/watch?v=wjnT5NahQY0

    The News:

    • ElevenLabs released Voice Remixing in alpha, which modifies core attributes of existing voices while preserving their unique identity.
    • Users can adjust gender, accent, speaking style, pacing, and audio quality through natural language prompts on voices they own.
    • The feature includes four prompt strength levels from subtle changes (Low) to complete transformation (Max), allowing precise control over modifications.
    • Voice Remixing supports iterative editing, where users can continue refining voices based on previously remixed versions.
    • The tool works with cloned voices (Instant Voice Clone and Professional Voice Clone) and voices created through Voice Design.

    My take: ElevenLabs never stops surprising! This new Voice Remixing tool “transforms existing voices by allowing you to modify their core attributes while maintaining the unique characteristics that make them recognizable”. The main use case is audiobook producers who want to fine-tune the voice of each character to match the plot. It’s great fun to see all these new services becoming available as AI technology advances, especially in video and audio generation.


    Alibaba Releases Qwen3-Next Architecture and Trillion-Parameter Model

    https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list

    The News:

    • Alibaba unveiled Qwen3-Next-80B-A3B-Base, an 80-billion-parameter model that activates only 3 billion parameters during inference while matching Qwen3-32B performance. The sparse Mixture of Experts architecture reduces training costs by 90% and delivers 10x faster inference throughput.
    • The company released two post-trained versions: Qwen3-Next-80B-A3B-Instruct for general conversational tasks and Qwen3-Next-80B-A3B-Thinking for complex reasoning chains. The thinking variant outperforms Gemini-2.5-Flash-Thinking on multiple benchmarks and approaches their flagship Qwen3-235B-A22B-Thinking-2507 performance.
    • Alibaba also introduced Qwen3-Max-Preview with over 1 trillion parameters and a 262,144-token context window. The model outperforms Qwen3-235B-A22B-2507 across benchmarks including Arena-Hard v2 (86.1 vs 79.2) and AIME25 (80.6 vs 70.3).
    • Qwen3-ASR supports 11 languages with automatic detection and maintains under 8% word error rate in noisy environments. The model accurately transcribes songs with background music and accepts contextual text input to bias transcription toward specific vocabulary.
    • The Qwen3-Next architecture combines gated attention with DeltaNet layers using 512 experts, routing 10 per token plus one shared expert. Multi-token prediction capabilities accelerate speculative decoding for faster response generation.
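    The routing numbers in the last bullet can be made concrete with a toy sketch: 512 experts, the top 10 selected per token by router scores, plus one always-active shared expert. The random-projection router below is purely illustrative (real routers are learned), and the dimensions are made up:

```python
import numpy as np

# Toy sketch of sparse MoE routing: 512 experts, top-10 routed per
# token, plus one shared expert that every token visits.
rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D_MODEL = 512, 10, 64

# Stand-in router: a fixed random projection from hidden state to
# per-expert scores (a real model learns this matrix).
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def route(token):
    """Return indices of the experts this token is dispatched to."""
    logits = token @ router_w                    # (512,) routing scores
    top = np.argsort(logits)[-TOP_K:]            # indices of top-10 experts
    return sorted(top.tolist()) + [NUM_EXPERTS]  # index 512 = shared expert

experts = route(rng.standard_normal(D_MODEL))
print(len(experts))  # 11 experts active per token (10 routed + 1 shared)
```

    This is the trick behind "80B total, 3B active": every token only touches a small fraction of the experts, so inference cost tracks the active parameters, not the total.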

    My take: The driving factors behind the Qwen3-Next architecture are a larger context window and total parameter scaling. This is the first time we have a Chinese model surpassing 1 trillion parameters with a 262k context window, but user feedback so far has been very mixed. It’s one thing to increase the context window and parameter count, and another to make a model that can actually make good use of them. A common theme across Alibaba’s models seems to be driving them toward maximum efficiency, which is probably required due to the limited access to NVIDIA GPUs within mainland China.

    Google Launches A2A Extensions for Agent-to-Agent Protocol Customization

    https://developers.googleblog.com/en/a2a-extensions-empowering-custom-agent-functionality

    The News:

    • Google announced A2A Extensions, enabling developers to add custom functionality to the Agent-to-Agent protocol beyond core communication features.
    • Extensions work through Agent Cards (JSON files) where agents declare supported custom methods and requirements, identified by unique URIs.
    • The system supports three extension types: data-only extensions for compliance information, profile extensions for protocol requirements, and method extensions for new RPC functions.
    • Twilio created a latency extension allowing agents to broadcast response times, enabling intelligent routing to the most responsive agent available.
    • Adobe adopted A2A for agent interoperability across enterprise systems, while S&P Global Market Intelligence uses it for scalable agent communication.
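    To make the Agent Card mechanism concrete, here is a hypothetical sketch of how an agent might declare a latency extension like Twilio's. The field names and structure are assumptions for illustration, not the official A2A schema; only the URI-based identification is taken from the description above:

```python
import json

# Hypothetical Agent Card declaring one custom extension. Field names
# ("capabilities", "required", "params") are illustrative assumptions.
agent_card = {
    "name": "latency-aware-agent",
    "url": "https://agent.example.com/a2a",
    "capabilities": {
        "extensions": [
            {
                # Extensions are identified by unique URIs (per the A2A docs)
                "uri": "https://example.com/ext/latency/v1",
                "description": "Broadcasts typical response latency",
                "required": False,
                "params": {"p50_ms": 120},
            }
        ]
    },
}

# A client inspects the declared URIs to decide which extensions it
# can use when talking to this agent.
supported = [e["uri"] for e in agent_card["capabilities"]["extensions"]]
print(json.dumps(supported))
```

    The point of the design is discoverability: a router can fetch the card, see the latency extension's URI, and prefer agents that advertise it, without the core protocol knowing anything about latency.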

    My take: A2A drew criticism after its launch in April for its one-size-fits-all approach, and this new A2A Extensions layer aims to fix that. Developer response has so far been mixed, and even though some companies say they are using it, very few actual production examples exist so far. It’s just a very complicated architecture for very rare use cases.

    Stability AI Launches Enterprise-Focused Audio Generation Mode

    https://stability.ai/news/stability-ai-introduces-stable-audio-25-the-first-audio-model-built-for-enterprise-sound-production-at-scale

    The News:

    • Stability AI released Stable Audio 2.5, an audio generation model designed specifically for enterprise-grade sound production and music creation, addressing the gap where only 6% of creative campaigns incorporate custom audio despite custom sound making brands eight times more memorable (according to Ipsos research).
    • The model generates tracks up to three minutes in length with inference speeds under two seconds on H100 GPUs, using the Adversarial Relativistic-Contrastive (ARC) training method developed by Stability AI’s research team.
    • Audio inpainting capability allows users to upload existing audio files, select a starting point, and have the model generate the continuation using contextual information to complete tracks.
    • The system produces multi-part musical compositions with structured intro, development, and outro sections, while responding to mood descriptors like “uplifting” and musical terminology such as “lush synthesizers”.
    • Stability AI partnered with WPP’s audio agency Amp to provide custom sound identity services for enterprise clients, with the model trained on fully licensed datasets to ensure commercial compliance.

    My take: As soon as you read “enterprise-focused” you know it probably means higher prices, slower release cycles and lower quality than the consumer-focused options, and in this case that is spot on. If you are looking for bland, generic audio snippets to paste into your internal corporate presentations then this is the audio engine for you. For almost everyone else there are much better alternatives out there.
