Tech Insights 2026 Week 4

Last week Anthropic launched a new feature called Cowork: “Give Claude access to a folder, set a task, and let it work”. You can ask Claude to go through your receipts and give you a summary, sort your files, find specific text in a note you have somewhere on your desktop, create a presentation based on your draft documents, and so on. Instead of having to upload documents and files through a chat window, Claude can now work directly on your computer, with your files.

This specific feature, Claude Cowork, was fully developed in just one and a half weeks. And what’s even more interesting is that it was 100% written by AI. All of it. And this is for a product with over 30 million active users, for software that works with files on the user’s local hard drive.

People I talk to have a really hard time understanding how this can be possible.

This is not vibe-coding. This is agentic coding, and it’s something completely different. You define strict rules for how your code should be created, then enforce even stricter constraints that the AI must follow. It’s a completely new world opening up for those who know how to use it, and I see an enormous gap growing between those who adopt this technology and those who don’t even understand how it’s possible.

If you are one of those people who have a hard time understanding how this is possible – book three hours with yourself and download both OpenAI Codex and Anthropic Claude Code. Ask Claude to create something visual, like a web page. Then ask Codex to add logic to it. Make sure you are using Opus 4.5 and GPT-5.2-High. I promise that you will see the world slightly differently after these three hours.

Thank you for being a Tech Insights subscriber!

Listen to Tech Insights on Spotify: Tech Insights 2026 Week 4 on Spotify

THIS WEEK’S NEWS:

  1. Chinese AI Developers Say They Can’t Beat America Without Better Chips
  2. OpenAI Partners with Cerebras for Low-Latency AI Inference
  3. OpenAI Launches Open Responses Specification
  4. Anthropic Releases Cowork for Non-Coding AI Agent Tasks
  5. Anthropic Launches Claude for Healthcare
  6. Google Launches Universal Commerce Protocol and Agentic Shopping Tools
  7. Replit Launches Native Mobile App Development with Natural Language Prompts
  8. TranslateGemma: A new suite of open translation models
  9. Google Gemini Adds Personal Intelligence with Connected Apps
  10. Black Forest Labs Releases FLUX.2 [klein] Image Generation Models

Chinese AI Developers Say They Can’t Beat America Without Better Chips

https://www.wsj.com/tech/ai/china-ai-race-us-chips-9e74b957

The News:

  • Chinese AI researchers stated at a Beijing conference that the technology gap with the United States is widening, not closing, due to severe chip shortages caused by US export restrictions.
  • Tang Jie, founder of Zhipu AI, said at the AGI-Next summit on January 10 that “the reality might be that the divide is actually increasing” despite China’s progress in specific sectors.
  • Justin Lin, technical lead of Alibaba’s Qwen AI models, estimated Chinese companies have less than 20% probability of matching OpenAI and Anthropic within three to five years.
  • Nvidia launched its Rubin hardware in January without including any Chinese AI developers among its clients, as US regulations prevent direct sales to China.
  • Chinese firms are considering leasing computing power from data centers in Southeast Asia and the Middle East to access Rubin chips, following similar workarounds attempted for Nvidia’s Blackwell series.
  • Lin noted that “a massive amount of OpenAI’s compute is dedicated to next-generation research, whereas we are stretched thin, just meeting delivery demands consumes most of our resources”.
  • The approved H200 chip for China is two generations behind the Rubin series and lacks sufficient power for training top-tier AI models.

My take: The new NVIDIA Vera Rubin series I wrote about in my last newsletter is a massive performance leap for training next-generation AI models in the 10-trillion-parameter range. Current state-of-the-art Chinese large language models, such as DeepSeek, are in the 0.7-trillion-parameter range, and I don’t see how they can push these models up to 10+ trillion parameters without new hardware. It would simply be too slow.

It’s the same with open source models. You could theoretically run the 400 billion parameter models on your own expensive hardware setups, but going above that requires massive investments that few companies are willing to make. My personal prediction is that 2027 will belong to the massive closed source models developed by Anthropic, Google and OpenAI.

OpenAI Partners with Cerebras for Low-Latency AI Inference

https://openai.com/index/cerebras-partnership

The News:

  • OpenAI partnered with Cerebras to deploy 750 megawatts of AI compute capacity through 2028, valued at approximately $10 billion according to sources.
  • Cerebras builds wafer-scale processors that place compute, memory, and bandwidth on a single dinner-plate-sized chip, eliminating the bottlenecks of conventional GPU clusters.
  • The deployment will roll out in multiple phases beginning in 2026, targeting workloads that require faster response times such as code generation, image creation, and AI agents.
  • Cerebras claims up to 15x faster inference compared to traditional GPU systems. Benchmarks from Artificial Analysis show Llama 4 Maverick (400B parameters) generating 2,522 tokens per second on Cerebras versus 1,038 tokens per second on Nvidia Blackwell, roughly a 2.4x advantage.
  • For GPT-OSS-120B, Cerebras reaches 3,000+ tokens per second compared to Blackwell’s 650 tokens per second, approximately 4.6x faster.
  • OpenAI’s Sachin Katti stated the partnership adds “dedicated low-latency inference” to their compute portfolio, which includes existing agreements with Nvidia, AMD, and Broadcom.
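
The throughput figures in the bullets above are easy to sanity-check. Here is a minimal Python sketch that computes the Cerebras-vs-Blackwell speedup ratios directly from the quoted tokens-per-second numbers:

```python
# Compute speedup ratios from the benchmark figures quoted above
# (tokens per second, Cerebras vs. Nvidia Blackwell).
benchmarks = {
    "Llama 4 Maverick 400B": (2522, 1038),
    "GPT-OSS-120B": (3000, 650),
}

for model, (cerebras_tps, blackwell_tps) in benchmarks.items():
    speedup = cerebras_tps / blackwell_tps
    print(f"{model}: {speedup:.1f}x faster on Cerebras")
```

To one decimal place the ratios come out at 2.4x and 4.6x, so the headline multiples you see quoted elsewhere are rounded generously.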

My take: About the only downside with OpenAI GPT-5.2 in Codex is the speed. It is extremely slow compared to something like Claude Opus 4.5, and this partnership should improve things considerably. OpenAI Codex with GPT-5.2-High is already good enough for almost every programming task you can throw at it today; it just needs to be a little bit quicker. I am very much looking forward to Cerebras capacity rolling out to OpenAI users during 2026.

OpenAI Launches Open Responses Specification

https://www.openresponses.org

The News:

  • OpenAI released Open Responses, an open specification for building multi-provider LLM interfaces based on the OpenAI Responses API launched in March 2025.
  • The specification standardizes primitives like messages, tool calls, and streaming events across providers including OpenAI, Anthropic, Gemini, and local models.
  • Streaming follows semantic events rather than raw text deltas, with 23 defined event types that provide structured updates during model execution.
  • The specification formalizes agentic loops where models autonomously execute tool calls, receive results, and continue reasoning within a single request.
  • Open Responses exposes three reasoning content fields: raw reasoning traces, encrypted content, and summaries, allowing open weight models to stream full reasoning while proprietary models can continue using encrypted content.
  • Providers can include model-specific configuration parameters through extensible fields without fragmenting the core specification.
  • OpenRouter, vLLM, and Ollama announced support at launch, with Hugging Face providing early access routing endpoints.

My take: If you have been building AI agents, you know that you need to customize the calls you make to each AI provider to handle different message formats, streaming events, and tool calling patterns. This can make it difficult to compare models or use multiple providers. Open Responses defines a shared schema that maps to multiple providers. You write your application once using the Open Responses format, and routing services like OpenRouter translate it to work with OpenAI, Anthropic, or local models without changing your code. The specification standardizes 23 streaming event types, tool invocation patterns, and response structures that previously varied between providers.

Let’s all hope this becomes a true global standard since it would make the lives of everyone writing AI agents so much easier. Companies notably absent from the launch include Anthropic and Google DeepMind.

Anthropic Releases Cowork for Non-Coding AI Agent Tasks

https://claude.com/blog/cowork-research-preview

The News:

  • Anthropic released Cowork on January 12, 2026, as a research preview for Claude Max subscribers on macOS. Cowork operates as an AI agent that can read, edit, and create files within user-designated folders.
  • The tool was built on the same foundation as Claude Code but targets non-coding workflows. Users grant Claude access to a specific folder, and Claude completes tasks with minimal back-and-forth interaction.
  • Cowork handles tasks such as reorganizing downloads by sorting and renaming files, creating spreadsheets from screenshot collections, and drafting reports from scattered notes.
  • Users can queue multiple tasks for Claude to complete in parallel. Cowork integrates with existing Claude connectors for external tools and supports new skills for document and presentation creation.
  • When paired with Claude in Chrome, Cowork can complete browser-based tasks by following links and acting within web applications.
  • The tool asks for approval before significant actions, but can execute potentially destructive operations like file deletion if instructed. Anthropic notes prompt injection risks remain an active concern.
  • Anthropic built Cowork in approximately one and a half weeks using Claude Code itself.

My take: This really feels like a good compromise between full computer use and a chatbot. I have used Claude Code for similar tasks over the past year from the terminal, launching it in a specific folder and asking it to organize, summarize, and structure items. It’s perfect for that. Claude Cowork just makes this workflow even more accessible and easier to use. If you still haven’t tried Claude yourself, do yourself a favor: download it and ask it to write some texts for you, or let it work as an agent in a controlled folder. It’s a very different experience compared to something like Copilot, and I think you will like it.

Anthropic Launches Claude for Healthcare

https://www.anthropic.com/news/healthcare-life-sciences

The News:

  • Anthropic introduced Claude for Healthcare, a HIPAA-ready enterprise AI platform that connects to Medicare databases, ICD-10 codes, and provider registries for administrative and clinical workflows.
  • The system accesses the CMS Coverage Database to verify local and national coverage determinations, supports prior authorization reviews by pulling coverage requirements and checking clinical criteria against patient records, and generates draft determinations with supporting materials.
  • Claude connects to the National Provider Identifier Registry for credentialing and network directory management, ICD-10 for diagnosis and procedure code lookups, and PubMed for biomedical literature retrieval.
  • New Agent Skills include FHIR development tools to reduce interoperability errors when connecting healthcare systems and a sample prior authorization review skill.
  • For consumer use, Claude Pro and Max subscribers in the US can connect personal health data through HealthEx and Function connectors (in beta), with Apple Health and Android Health Connect integrations coming.
  • Healthcare organizations including Banner Health, Stanford Healthcare, Novo Nordisk, Sanofi, and AbbVie are using Claude for administrative automation, regulatory documentation, and clinical trial analysis.

My take: Just one week after OpenAI launched ChatGPT Health, Anthropic launches Claude for Healthcare. Both platforms connect to similar health data sources (Apple Health, Function, MyFitnessPal) and target the same workflows. Choice is good, and my own experience with both GPT and Claude is that they are good at different things, so in the best of worlds medical staff will use both in their daily work to get different perspectives on the patient data.

Google Launches Universal Commerce Protocol and Agentic Shopping Tools

https://blog.google/products/ads-commerce/agentic-commerce-ai-tools-protocol-retailers-platforms

The News:

  • Google introduced the Universal Commerce Protocol (UCP), an open standard that creates a shared language for AI agents to execute commerce tasks across discovery, purchasing, and post-purchase support. UCP was co-developed with Shopify, Etsy, Wayfair, Target, and Walmart, and endorsed by over 20 companies including Adyen, American Express, Best Buy, Mastercard, Stripe, and Visa.
  • UCP works across multiple existing protocols including Agent2Agent (A2A), Agent Payments Protocol (AP2), and Model Context Protocol (MCP). The protocol eliminates the need for unique connections between each individual agent by providing standardized interfaces for checkout, identity verification, and order management.
  • Google launched Business Agent, which lets shoppers chat directly with brands on Search. The feature went live January 12 with Lowe’s, Michael’s, Poshmark, and Reebok. Retailers can activate and customize the agent through Merchant Center, and will later be able to train it on their own data and enable direct purchases within the chat.
  • UCP now powers native checkout within AI Mode in Search and the Gemini app for eligible U.S. retailers. Shoppers can complete purchases using Google Pay or PayPal without leaving the Google interface, while retailers remain the seller of record and can customize the integration.
  • Google introduced Direct Offers, a Google Ads pilot that displays retailer-specific discounts directly in AI Mode when Google’s AI determines a shopper has high purchase intent. The pilot currently features percentage discounts and will expand to include bundles and free shipping, with initial partners including Petco, e.l.f. Cosmetics, Samsonite, and Rugs USA.
  • Google added dozens of new data attributes to Merchant Center for conversational commerce, going beyond traditional keywords to include product question answers, compatible accessories, and substitutes.

My take: Google UCP builds on three other standards which I have mentioned earlier: product discovery via MCP, agent coordination via A2A, and secure payment via AP2. The main difference between UCP and, for example, OpenAI’s “Buy it in ChatGPT” is that UCP is an open standard, explicitly designed for interoperability across all AI platforms. Many things are going to change in 2026, and e-commerce is one of them. If no one except AI agents visits your website, how will that affect your business?
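
To see how the three layers could divide up one agentic purchase, here is a purely hypothetical Python sketch. None of the function or field names below come from the actual UCP, MCP, A2A, or AP2 specifications; this is only meant to show where each protocol sits in the flow:

```python
# Hypothetical sketch of the protocol layering behind an agentic purchase.
# All names are illustrative and NOT taken from the real specifications.

def discover_product(query: str) -> dict:
    """MCP layer: the shopper's agent queries a merchant's catalog tool."""
    return {"sku": "RUG-123", "price_usd": 149.00, "merchant": "example-retailer"}

def negotiate_order(product: dict) -> dict:
    """A2A layer: the shopper's agent coordinates with the merchant's agent."""
    return {"order": product, "status": "ready_for_payment"}

def authorize_payment(order: dict) -> dict:
    """AP2 layer: a payment mandate is authorized on the shopper's behalf."""
    return {**order, "status": "paid"}

# UCP's role is to standardize the interfaces between these steps, so any
# agent can drive any merchant's checkout without a bespoke integration.
receipt = authorize_payment(negotiate_order(discover_product("area rug")))
assert receipt["status"] == "paid"
```

The interesting design choice is that UCP does not replace the underlying protocols; it defines the common contract that lets them compose, which is why the endorsement list spans both retailers and payment networks.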

Replit Launches Native Mobile App Development with Natural Language Prompts

https://blog.replit.com/mobile-apps

The News:

  • Replit announced Mobile Apps on Replit on January 14, 2026, enabling users to create and publish native iOS applications using natural language prompts without mobile development experience.
  • The platform builds React Native apps through Replit Agent, integrates with Expo Go for instant preview via QR code scanning, and publishes to the App Store with one click.
  • Stripe integration is included, allowing creators to monetize their applications directly.
  • Users can add databases, payment systems, and API integrations like OpenAI during development by describing requirements in plain language.
  • Replit launched a $15,000 Mobile Buildathon competition alongside the announcement to encourage developers to build mobile apps on the platform.
  • The announcement coincides with reports that Replit is raising a $400 million funding round at a $9 billion valuation.

My take: This is like Lovable but for mobile apps. You can build complete full-stack mobile apps with databases, payment systems, and API integrations. One side of me likes it, and another side is terrified of the App Store literally exploding with millions of buggy apps hacked together over a pizza.

TranslateGemma: A new suite of open translation models

https://blog.google/innovation-and-ai/technology/developers-tools/translategemma

The News:

  • Google released TranslateGemma, an open translation model suite built on Gemma 3, available in 4B, 12B, and 27B parameter sizes that covers 55 languages.
  • The 12B model outperforms the Gemma 3 27B baseline on WMT24++ benchmarks using MetricX, achieving higher translation quality with less than half the parameters.
  • The 4B model matches the performance of the baseline 12B model, designed for mobile inference.
  • TranslateGemma uses a two-stage training process that combines supervised fine-tuning on human-translated and synthetic Gemini-generated texts, followed by reinforcement learning using reward models including MetricX-QE and AutoMQM.
  • The models retain multimodal capabilities from Gemma 3, translating text within images on the Vistra benchmark without specific multimodal fine-tuning.
  • The team trained the models on nearly 500 language pairs beyond the 55 evaluated pairs, though these extended pairs lack confirmed evaluation metrics.
  • Human evaluation on the WMT25 test set across 10 language pairs and automatic evaluation on WMT24++ across 55 pairs showed substantial gains over baseline Gemma 3 models.

My take: I love these small, optimized models, and the way they are trained is state-of-the-art. TranslateGemma is trained using a combination of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL): Google used a mix of human-translated texts and synthetic translations generated by Gemini 3 Pro for the SFT stage, and an ensemble of reward models for the RL stage. If you are creating applications for mobile or desktop that need instant real-time translation without cloud access, this is the model you have been waiting for.

Google Gemini Adds Personal Intelligence with Connected Apps

https://blog.google/innovation-and-ai/products/gemini-app/personal-intelligence

The News:

  • Google launched Personal Intelligence as a beta feature in the U.S., connecting Gmail, Google Photos, YouTube, and Search to provide personalized responses across user data.
  • The feature remains off by default. Users select which apps to connect and can disconnect or delete chat history at any time.
  • Gemini retrieves specific details from emails, photos, and videos while reasoning across multiple data sources. One example: finding a license plate number from a photo, tire specifications from email records, and vehicle trim information to complete a transaction at a service shop.
  • The system does not train directly on Gmail inboxes or Google Photos libraries. Training uses specific prompts and responses after filtering or obfuscating personal data from conversations.
  • Personal Intelligence rolls out to Google AI Pro and AI Ultra subscribers in the U.S. over one week, with plans to expand to free tier users and additional countries.

My take: This is a very strong offering by Google, and while some people would never want an AI assistant to reason across their search, photo, YouTube and Gmail history, I believe most people will love it. All iOS users will get the same experience later this year when Apple integrates Gemini into Siri, and I’m quite sure it will do the same thing – reason across videos, photos, emails and notes. This is the future, and it will make those who adopt it super efficient.

Black Forest Labs Releases FLUX.2 [klein] Image Generation Models

https://bfl.ai/blog/flux2-klein-towards-interactive-visual-intelligence

The News:

  • Black Forest Labs released FLUX.2 [klein] on January 15, 2026, a family of compact image generation models that unifies text-to-image, image editing, and multi-reference generation in a single architecture with inference under one second.
  • The family includes 4B and 9B parameter variants in distilled and base versions. The distilled models use 4 inference steps and generate images in 0.5 seconds on modern hardware. The base versions preserve complete training signal for fine-tuning and research.
  • FLUX.2 [klein] 4B runs on consumer GPUs with 13GB VRAM (RTX 3090/4070) and is released under Apache 2.0 license. The 9B variant requires more VRAM and uses FLUX Non-Commercial License.
  • Black Forest Labs collaborated with NVIDIA to release FP8 and NVFP4 quantized versions. FP8 quantization achieves up to 1.6x faster inference with 40% less VRAM, while NVFP4 delivers up to 2.7x speedup with 55% less VRAM on RTX GPUs.
  • The 9B model uses an 8B Qwen3 text embedder and supports single-reference editing, multi-reference generation, and complex composition tasks at sub-second speed.

My take: If you want an AI model for image editing that you can use without paying API costs, this is a very good candidate. FLUX.2 [klein] is quick, produces high-quality output, and runs locally on a regular consumer GPU. Every release from BFL is better and faster than the last, and it will be interesting to see just how far they can push things in 2026.