“My name is Steve Chen. I was the CTO and one of the co-founders of YouTube. I love what you’ve built with Notebook Navigator 2.2! I wanted to send out a note of gratitude.” Steve contacted me last Tuesday, and is just one of hundreds of users that have reached out to me saying that Notebook Navigator + Obsidian fundamentally changed the way they are writing and organizing notes. This is how you get “paid” as an open source developer, and it’s one of the best feelings in the world to have created something that other people love to use.
We are witnessing the rebirth of the “single-person developer”. Just as Dennis Caswell wrote the game Impossible Mission in 10 months back in 1984, and Jordan Mechner spent three years creating Prince of Persia before its release in 1989, AI is now making it possible for single-person developers to build incredibly advanced software and release it to millions of users.
The best recent example of this is OpenClaw (previously called Clawdbot and Moltbot). OpenClaw was built by Peter Steinberger and was first released in November 2025. By January 2026 the repository had surpassed 100,000 stars, becoming one of the fastest-growing GitHub repositories in history. You can read more about OpenClaw in the newsletter below.
The creative power of individual programming experts augmented with the latest AI models is an amazing tour de force. There are no limitations other than creativity, and a single expert developer can produce as much code as a 30-person expert team, given the complexities of scaling software teams in size. If you have studied software development methodologies you know the challenges. I think we will see many more exciting single-person releases like OpenClaw in 2026, and they have the potential to shake things up considerably.
Thank you for being a Tech Insights subscriber!
Listen to Tech Insights on Spotify: Tech Insights 2026 Week 6 on Spotify
THIS WEEK’S NEWS:
- Clawdbot Reaches 100,000+ GitHub Stars, Rebrands Twice to OpenClaw
- OpenAI Launches Prism, Free LaTeX Editor Powered by GPT-5.2
- Google Launches Gemini 3-Powered Browser Agent in Chrome
- Google Introduces Agentic Vision in Gemini 3 Flash
- Google Launches Project Genie for Interactive World Creation
- Google Invests in Sakana AI, Forms Strategic Partnership
- DeepSeek Launches OCR 2 with Visual Causal Flow Technology
- MCP Apps Brings Interactive Dashboards and Forms Directly Into AI Conversations
- Microsoft Maia 200: AI Inference Accelerator with 30% Cost Improvement
- NVIDIA Releases Earth-2 Family of Open Weather Models
- Alibaba Releases Qwen3-Max-Thinking Reasoning Model
- Moonshot AI releases Kimi K2.5 With Agent Swarm and Multimodal Capabilities
- Vercel Launches React Native Skills Package for AI Coding Agents
- Mistral Vibe 2.0 Adds Custom Subagents and Clarification Prompts
- xAI’s Grok Imagine Takes Top Spot in Video Generation Benchmarks
Clawdbot Reaches 100,000+ GitHub Stars, Rebrands Twice to OpenClaw
https://github.com/openclaw/openclaw

The News:
- Clawdbot launched in late November 2025 as a self-hosted AI assistant that runs on personal hardware and integrates with messaging platforms including WhatsApp, Telegram, Discord, Slack, Signal, and iMessage.
- The project remained relatively obscure until January 26, 2026, when it gained 9,000 GitHub stars in a single day during a viral surge, reaching 10,200 stars by that evening.
- The repository crossed 100,000 stars on January 29, 2026, just days after going viral, surpassing the growth rate of legendary projects like the Linux kernel.
- Core capabilities include browser control via Chrome/Chromium with CDP, shell command execution, file system operations, persistent memory stored locally in Markdown format, and proactive notifications that reach out to users.
- The system supports local models via Ollama and cloud models from multiple providers, with monthly API costs ranging from free for local-only setups to $70-150 for heavy automation use.
- On January 27, 2026, Anthropic issued a trademark request due to the similarity between “Clawd” and “Claude”, forcing a rebrand to Moltbot, and then another rebrand to OpenClaw.
My take: If you missed Clawdbot, which was renamed to Moltbot and then to OpenClaw, here’s a quick summary. Clawdbot is an AI agent that acts, learns and adapts over time. You install it on your computer and hook it up to any LLM service through an API, or run a local model. You then connect it to a messaging service like iMessage, Telegram, Slack or Discord, and to tools like terminal commands and a web browser. After that you can basically ask it to do things for you, and it will do them, on your computer: search for things on the Internet, check how a stock is performing during the day, or remind you about things at specific times. OpenClaw remembers and learns, and this is probably the closest you can get to a personal virtual AI assistant today. Just don’t install it on your main computer.
If you’re interested in the technical details of OpenClaw, I highly recommend [this article about the OpenClaw memory system](Manthan Gupta on X: “How Clawdbot Remembers Everything” / X) – it is quite advanced and very efficient.
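To make that loop concrete, here is a heavily simplified conceptual sketch of what a personal agent like this does under the hood. This is not OpenClaw’s actual code – the helper names are made up for illustration – but it captures the message-in, tool-call, memory-update, message-out cycle described above.

```python
# Conceptual sketch of a personal-assistant agent loop (not OpenClaw's
# actual implementation; all helper names are hypothetical). The agent
# reads a message, lets an LLM decide on tool calls (shell, browser,
# memory), executes them locally, and replies over the messaging channel.
def agent_loop(llm, tools, memory, channel):
    for message in channel.incoming():            # e.g. Telegram or iMessage
        context = memory.load_relevant(message)   # local Markdown notes
        plan = llm.respond(message, context=context, tools=tools.spec())
        while plan.wants_tool_call():
            result = tools.run(plan.tool_call)     # shell command, browser, file op
            plan = llm.continue_with(result)
        memory.append(message, plan.text)          # remember what happened
        channel.send(plan.text)
```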

Read more:
- Fireship: The wild rise of OpenClaw… – YouTube
- Forbes: Moltbot Gets Another New Name, OpenClaw, And Triggers Security Fears And Scams
- Manthan Gupta on X: “How Clawdbot Remembers Everything” / X
- Clawdbot bought me a car | AJ Stuyvenberg
- Alex Finn on X: “Just to see what would happen I texted Henry my Clawdbot to make a reservation for me next Saturday at a restaurant” / X
- Shruti on X: “I Spent 40 Hours Researching Clawdbot. Here’s Everything They’re Not Telling You.” / X
OpenAI Launches Prism, Free LaTeX Editor Powered by GPT-5.2
https://openai.com/index/introducing-prism

The News:
- OpenAI released Prism on January 27, 2026, a free cloud-based LaTeX workspace with GPT-5.2 integrated directly into the writing workflow. It addresses scientific writing fragmentation by consolidating drafting, citations, equations, and collaboration in one environment.
- Prism builds on Crixet, a cloud-based LaTeX platform that OpenAI acquired. The acquisition provided a mature writing and collaboration foundation that OpenAI integrated with GPT-5.2 Thinking.
- The platform supports unlimited projects and collaborators at no cost. ChatGPT personal account holders can access Prism immediately without LaTeX installation or environment setup.
- GPT-5.2 works within the project context, accessing paper structure, equations, references, and surrounding text. Users can search arXiv literature, convert whiteboard equations to LaTeX, and make direct in-place document edits through voice or text commands.
- OpenAI demonstrated Prism generating a graduate-level general relativity lesson plan and problem sets. The model can revise text based on newly identified related work and refactor equations, citations, and figures across the paper.
My take: The top online LaTeX editing tool today is Overleaf, and early user feedback on Prism indicates that Overleaf still wins on stability and maturity, while OpenAI Prism is best for AI-assisted generation and automation. Having a tool like Prism available for free is amazing, and I am glad to see that LaTeX is still going strong with tools like this being actively developed. If you are a university student you definitely want to try out Prism and see how it works for you.
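For readers who have not touched LaTeX in a while, the output an assistant-driven editor produces for the general relativity demo mentioned above is ordinary LaTeX. For example, the Einstein field equations in a numbered, labeled equation environment would look like this:

```latex
% Einstein field equations as a numbered equation with a label,
% the sort of snippet an AI-assisted LaTeX editor would insert
% into a general relativity lesson plan.
\begin{equation}
  G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^{4}} T_{\mu\nu}
  \label{eq:einstein-field}
\end{equation}
```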
Google Launches Gemini 3-Powered Browser Agent in Chrome
https://blog.google/products-and-platforms/products/chrome/gemini-3-auto-browse

The News:
- Google released Gemini in Chrome with auto browse capabilities on macOS, Windows, and Chromebook Plus. Built on Gemini 3, the integration adds an AI assistant in a side panel that handles multi-step web tasks without requiring users to switch tabs.
- Auto browse is available to AI Pro and Ultra subscribers in the U.S. The feature autonomously completes tasks such as researching flight and hotel costs across date ranges, scheduling appointments, filling forms, collecting tax documents, obtaining contractor quotes, and managing subscriptions.
- The system includes multimodal capabilities that identify objects in images, search for similar items, add them to shopping carts within budget constraints, and apply discount codes. When users grant permission, auto browse accesses Google Password Manager to handle sign-in requirements.
- Gemini in Chrome connects with Gmail, Calendar, YouTube, Maps, Google Shopping, and Google Flights through Connected Apps integrations. For example, the assistant retrieves event details from emails, references Google Flights for recommendations, and drafts arrival confirmation messages to colleagues.
- Chrome supports the Universal Commerce Protocol, an open standard co-developed with Shopify, Etsy, Wayfair, and Target. Announced on January 11, 2026, at the National Retail Federation conference, UCP enables AI agents to discover merchant capabilities, negotiate supported functions, and complete transactions.
- The browser includes Nano Banana, an image transformation tool that modifies images directly in the current window without requiring downloads or re-uploads. Auto browse pauses before sensitive actions such as purchases or social media posts to request explicit user confirmation.
My take: In December Anthropic launched Claude for Chrome, and now we have Gemini in Chrome. Both use similar permission-based architectures where the AI observes browser content and requests approval before taking action.
Now, if you have tried Claude for Chrome you know how extremely slow it is before it takes action, and my personal opinion is that it’s still way too early to roll out features like this to end-users. Current models like Gemini 3 and Sonnet 4.5 are simply not good at browsing the web. Sonnet has just reached decent proficiency with the command line terminal, something Gemini 3 still struggles with. In practice I think it will take at least a year before most of us allow AI models to browse the Internet for us, probably longer.
Google Introduces Agentic Vision in Gemini 3 Flash
https://blog.google/innovation-and-ai/technology/developers-tools/agentic-vision-gemini-3-flash

The News:
- Agentic Vision in Gemini 3 Flash converts image understanding from a static process into an active investigation using code execution.
- The model follows a Think, Act, Observe loop where it analyzes the query, generates and executes Python code to manipulate images (cropping, rotating, annotating), and appends transformed images back into its context window.
- Enabling code execution delivers a consistent 5-10% quality boost across most vision benchmarks.
- PlanCheckSolver.com, a building plan validation platform, improved accuracy by 5% by using Gemini 3 Flash to iteratively crop and analyze high-resolution building plans to verify compliance with building codes.
- The model can annotate images by drawing bounding boxes and labels, parse high-density tables and execute Python code to create visualizations, and implicitly zoom when detecting fine-grained details like serial numbers or distant text.
- While zooming is currently implicit, other capabilities like rotating images or performing visual math require explicit prompt nudges but will become fully implicit in future updates.
- Agentic Vision is available via the Gemini API in Google AI Studio and Vertex AI, and is rolling out in the Gemini app under the Thinking model selection.
My take: If you just read the news it can be a bit hard to understand exactly what this does, so let me break it down. When you send an image and a question to Agentic Vision in Gemini 3 Flash, instead of just looking at the entire image, the model can now write Python code to manipulate the image (crop it, rotate it, draw on it, etc), create a new version of the image, and then look at the new image to answer your question.
For example: you upload a photo of a microchip. The model sees there is a serial number, but it’s tiny, so it writes Python code to crop just that area (zooming in), runs the code, then reads the serial number from the zoomed crop. Or you show it a table with numbers: it writes Python code to extract the numbers, perform calculations (like normalizing data) and even generate a matplotlib chart, then looks at the chart it created to verify the results. Or you upload a large architectural drawing: the model writes code to crop specific sections (like roof edges), analyzes each crop separately, then combines the findings to check code compliance. If you have ever worked with adaptive crop regions in OpenCV processing chains you will quickly see the benefits of this.
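To illustrate the first case, here is a minimal sketch of the kind of crop-and-zoom code the model might generate for itself; the file name and crop coordinates are made up for illustration.

```python
from PIL import Image

# Hypothetical example: the input photo and the region that contains the
# tiny serial number are assumptions for illustration only.
image = Image.open("microchip.jpg")

# Crop the region the model believes holds the serial number
# (left, upper, right, lower) in pixel coordinates.
serial_region = image.crop((1240, 860, 1480, 920))

# Upscale the crop so the fine detail is legible, then save it so the
# transformed image can be appended back into the model's context window.
zoomed = serial_region.resize(
    (serial_region.width * 4, serial_region.height * 4),
    Image.LANCZOS,
)
zoomed.save("serial_number_zoomed.png")
```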
Agentic Vision also shows that we now have another scaling parameter to consider when we benchmark how good LLMs are at specific tasks: the agentic scale factor. The better a model is at performing agentic tasks (like doing the proper image preprocessing before analyzing an image), the better the end results will be. Just as most AI models are quickly moving to thinking variants, I believe most AI models will become agentic-first over the coming year: models that are highly skilled at knowing how to use whatever tools are available to produce better results.
Google Launches Project Genie for Interactive World Creation
https://blog.google/innovation-and-ai/models-and-research/google-deepmind/project-genie

The News:
- Google DeepMind released Project Genie, a web prototype that lets Google AI Ultra subscribers in the U.S. create and explore interactive worlds using text prompts and images.
- The system uses Genie 3, a world model that generates environments in real time at 720p and 24fps as users navigate through them.
- World Sketching integrates with Nano Banana Pro, allowing users to preview and modify their world before entering, and to select first-person or third-person perspective.
- Users can remix existing worlds from a curated gallery, adjust the camera during exploration, and download videos of their sessions.
- Sessions are limited to 60 seconds, and Google notes that generated worlds may not always adhere to prompts or real-world physics, with characters experiencing occasional control latency.
My take: When you look at the introduction video for Project Genie, it looks almost magical. Describe your scene, look at a 2D render of it, adjust it as needed, then kick off Genie 3 and you will be able to walk around in a virtual world that looks just the way you described it. Genie 3 uses an autoregressive frame-by-frame approach, generating the next frame based on the current frame and your actions. The main question here is whether this will ever be good enough to create interactive “holodecks” that look exactly like the real world with persistent temporal memory, or whether it will end up like AR glasses, which never really got to the point of being comfortable enough, cheap enough and good enough for mass-market use.
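In pseudocode, that autoregressive loop is conceptually very simple. The sketch below is not Genie 3’s actual interface – the model call and action source are placeholders – it just illustrates next-frame prediction conditioned on recent frames and the latest user action, at the 24fps and 60-second session length mentioned above.

```python
# Conceptual sketch of an autoregressive world-model loop (placeholder
# API, not Genie 3's real interface): each frame is predicted from the
# most recent frames plus the user's action, then fed back as context.
def explore_world(model, first_frame, get_user_action, fps=24, seconds=60):
    frames = [first_frame]
    for _ in range(fps * seconds):
        action = get_user_action()            # e.g. "move forward", "turn left"
        next_frame = model.predict_next_frame(
            context_frames=frames[-16:],      # bounded rolling context
            action=action,
        )
        frames.append(next_frame)
        yield next_frame                      # streamed to the client in real time
```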
Google Invests in Sakana AI, Forms Strategic Partnership

The News:
- Google invested in Sakana AI, a Tokyo-based AI startup valued at $2.6 billion following its $135 million Series B round in November 2025.
- The partnership grants Sakana AI access to Google’s Gemini and Gemma models for product development, while Sakana will provide customer feedback to improve Google’s AI ecosystem quality.
- Sakana AI will deploy solutions through Google Cloud infrastructure for regulated sectors including financial institutions and government organizations that require strict security and data sovereignty controls.
- The investment positions Google to expand Gemini adoption in Japan’s enterprise market through Sakana’s local relationships and technical expertise.
- Sakana AI has secured service contracts with Mitsubishi UFJ Financial Group and entered the financial sector, while planning to strengthen cooperation with the Japanese government.
My take: Sakana AI is famous for its AI agent that in early 2025 wrote a paper that passed the peer-review process at a workshop of a top international AI conference. With the power of Gemini 3 Pro and the ability to publish solutions through Google Cloud, I wouldn’t be surprised if we see fully autonomous research agents launched by Sakana during 2026.
DeepSeek Launches OCR 2 with Visual Causal Flow Technology
https://github.com/deepseek-ai/DeepSeek-OCR-2

The News:
- DeepSeek released OCR 2 on January 27, 2026, a 3-billion-parameter vision-language model that processes documents using semantic reasoning instead of fixed grid patterns.
- The model introduces DeepEncoder V2 architecture with “visual causal flow” technology that dynamically reorders image components based on context, mimicking human reading patterns.
- OCR 2 scored 91.09% on OmniDocBench v1.5, a 3.73% increase over the previous version, while using only 256-1,120 visual tokens per document page.
- The model reduced repetition rates in production environments from 6.25% to 4.17% for online user images and from 3.69% to 2.88% for PDF processing.
- DeepSeek OCR 2 uses a mixture-of-experts (MoE) decoder architecture and supports dynamic resolution handling from 0-6 tiles at 768×768 plus one tile at 1024×1024.
- Fine-tuning tests on Persian language tasks showed character error rate improvements of 57% for OCR 1 baseline and 86% for OCR 2, with an overall 88.6% improvement.
My take: This model even outperforms Gemini 3 on OmniDocBench, with very low error rates: compared to Mistral OCR, which has a 3.7% error rate, DeepSeek OCR 2 has only a 2.3% error rate. This is a strong VLM (vision-language model), and if you are working with document processing or OCR you should definitely check it out.
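If you want to try it, the original DeepSeek-OCR release loaded through Hugging Face transformers with trust_remote_code. Below is a minimal sketch assuming OCR 2 keeps the same pattern; the model id, prompt format and the infer() helper are assumptions to verify against the model card before use.

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: DeepSeek-OCR 2 follows the same trust_remote_code loading
# pattern as the original DeepSeek-OCR release. Check the model card for
# the exact model id, prompt format and inference helper.
model_id = "deepseek-ai/DeepSeek-OCR-2"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval().cuda()

# infer() is provided by the repository's remote code in the original
# release; its signature may differ in OCR 2.
result = model.infer(
    tokenizer,
    prompt="<image>\nConvert the document to markdown.",
    image_file="invoice_scan.png",  # hypothetical input file
)
print(result)
```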
Read more:
- How to Use DeepSeek-OCR 2 ?
- Arxiv: DeepSeek-OCR 2: Visual Causal Flow
- deepseek-ai/DeepSeek-OCR-2 · Hugging Face : r/LocalLLaMA
MCP Apps Brings Interactive Dashboards and Forms Directly Into AI Conversations
https://blog.modelcontextprotocol.io/posts/2026-01-26-mcp-apps

The News:
- MCP Apps became the first official extension to the Model Context Protocol on January 26, 2026. The extension lets AI assistants display interactive components like dashboards, forms, visualizations, and multi-step workflows instead of plain text responses.
- A sales analytics tool can return an interactive dashboard where users filter by region, drill into specific accounts, and export reports without leaving the conversation. A deployment tool presents a configuration form with dependent fields that change based on selections. A contract analysis tool displays PDFs inline with highlighted clauses users can click to approve or flag.
- Example applications include 3D visualization, interactive maps, document viewing, real-time system monitoring dashboards, and sheet music notation. The components run in sandboxed environments with security restrictions and auditable communication between the UI and the host application.
- Client support includes Claude (web and desktop), Goose, Visual Studio Code Insiders, and ChatGPT (rolling out this week). Nick Cooper from OpenAI said “We’re proud to support this new open standard and look forward to seeing what developers build with it as we grow the selection of apps available in ChatGPT”.
- MCP server developers can now ship interactive experiences that work across multiple AI clients without writing client-specific code. The extension addresses what David Soria Parra from Anthropic describes as a “context gap between what tools can do and what users can see”.
My take: If you have a minute, go watch the launch video by Anthropic. MCP Apps means that any tool you have access to is now accessible inside your chat client, which can read and create content in any of these apps directly. If you have been reading my newsletters for a while you know that I believe these “chat clients” will over time evolve into your main portal for everything – shopping, work and personal life. MCP Apps is another great example of this, and you will soon be able to do most of your daily work inside Claude or ChatGPT. Microsoft supports this in VS Code but not in Copilot, and I think the way forward for Microsoft here is a tricky one. The more open these standards become, the less incentive there will be for companies to use proprietary office formats.
Microsoft Maia 200: AI Inference Accelerator with 30% Cost Improvement
https://blogs.microsoft.com/blog/2026/01/26/maia-200-the-ai-accelerator-built-for-inference

The News:
- Microsoft launched Maia 200, an AI inference accelerator built on TSMC’s 3nm process with 140 billion transistors. The chip delivers 30% better performance per dollar than Microsoft’s current generation hardware and runs GPT-5.2 models from OpenAI in Microsoft Foundry and Microsoft 365 Copilot.
- The chip provides 10 petaFLOPS in FP4 precision and 5 petaFLOPS in FP8 precision within a 750W TDP envelope. Memory includes 216GB HBM3e at 7 TB/s bandwidth and 272MB on-chip SRAM.
- Maia 200 uses a two-tier scale-up network design built on standard Ethernet with 2.8 TB/s of bidirectional bandwidth per accelerator. Clusters scale to 6,144 accelerators without proprietary fabrics.
- Microsoft deployed Maia 200 in its US Central datacenter near Des Moines, Iowa, with US West 3 near Phoenix, Arizona coming next. Time from first silicon to first datacenter rack deployment was reduced to less than half that of comparable AI infrastructure programs.
- The Maia SDK preview includes PyTorch integration, Triton compiler, optimized kernel library, simulator, and cost calculator. The SDK provides low-level programming access through NPL and full Azure integration using the same tooling as GPU fleets.
My take: This is a good, solid hardware release by Microsoft. Inference chips are used to run AI models, and while NVIDIA chips are best-in-class at training new AI models, they are not very efficient at running them. Maia 200 delivers 10 petaFLOPS FP4 and 5 petaFLOPS FP8 at 750W TDP, which is about half the power consumption of Nvidia’s B300 (1400W) while offering better efficiency metrics. For once Microsoft is also getting good at naming these things – MAIA stands for Microsoft AI Accelerator, which is both logical and a good name. Microsoft is pushing AI hardware, AI services and AI models now, and it will be very interesting to see if they are able to catch up in the model race when they launch MAI-2 (their own LLM) later this year.
NVIDIA Releases Earth-2 Family of Open Weather Models
https://blogs.nvidia.com/blog/nvidia-earth-2-open-models

The News:
- NVIDIA released Earth-2, the first fully open AI weather forecasting stack, including pretrained models, inference libraries, training frameworks, and customization tools that run on local infrastructure.
- Earth-2 Medium Range (Atlas architecture) delivers 15-day global forecasts across 70+ weather variables and outperforms leading open models like GenCast on standard industry benchmarks.
- Earth-2 Nowcasting (StormScope architecture) generates zero to six-hour precipitation forecasts at kilometer resolution and represents the first AI system to surpass physics-based models for short-term storm prediction.
- Earth-2 Global Data Assimilation processes raw observation data from satellites and ground stations without requiring traditional data assimilation infrastructure, expected for release later in 2026.
- Israel Meteorological Service reports 90% reduction in compute time at 2.5-kilometer resolution compared to CPU-based numerical weather prediction.
- The stack integrates existing models including FourCastNet, CorrDiff, ClimateNet, and models from ECMWF, Microsoft, and Google.
- Models are available through Earth2Studio, Hugging Face, and GitHub for deployment on sovereign infrastructure.
My take: It was just a few months ago (November 2025) that Google DeepMind launched WeatherNext 2, and now NVIDIA releases a model that, according to them, outperforms Google DeepMind’s GenCast (the predecessor architecture to WeatherNext 2) on more than 70 weather variables. WeatherNext 2, however, surpasses the original GenCast-based WeatherNext model on 99.9% of variables, so right now it is difficult to know how WeatherNext 2 and Earth-2 compare against each other. That said, Earth-2 looks like a strong release, and the models are freely available for anyone to use.
Alibaba Releases Qwen3-Max-Thinking Reasoning Model
https://twitter.com/Alibaba_Qwen/status/2015805330652111144

The News:
- Qwen3-Max-Thinking is Alibaba’s flagship reasoning model with over 1 trillion parameters, trained on 36 trillion tokens with a 260k token context window.
- The model matches GPT-5.2-Thinking, Claude Opus 4.5, and Gemini 3 Pro across 19 benchmark tests, covering factual knowledge, complex reasoning, instruction following, and agent capabilities.
- Adaptive tool-use lets the model automatically invoke Search, Memory, and Code Interpreter functions during conversations without manual setup.
- Test-time scaling uses multi-round self-reflection with an “experience-cumulative” mechanism that reuses intermediate reasoning across rounds.
- Benchmark improvements include GPQA Diamond (90.3 to 92.8), LiveCodeBench v6 (88.0 to 91.4), and HMMT February (98.0).
- The model is available at chat.qwen.ai and through an OpenAI-compatible API for developers who register with Alibaba Cloud.
“In several real-world scenarios — especially structured reasoning, technical problem-solving and tool-heavy tasks — Qwen didn’t just keep up with ChatGPT. In some cases, it actually worked better.” Amanda Caswell, Tom’s Guide
My take: From what I have seen so far, Alibaba seems to have cracked the code on how to create a large GPT-5-class model that actually works well for everyday use. Two weeks ago Google DeepMind CEO Demis Hassabis said that Chinese AI companies are around 6 months behind western frontier AI models, and I can’t stop wondering whether he would have said the same thing today, now that Qwen3-Max-Thinking is released. If you are curious how it feels to chat with Qwen3-Max-Thinking on a daily basis compared to ChatGPT, I recommend this article from Tom’s Guide.
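Since the API is OpenAI-compatible, trying the model from Python should only require pointing the standard OpenAI client at Alibaba Cloud’s endpoint. The base URL and model id below are assumptions to check against the Alibaba Cloud Model Studio documentation.

```python
from openai import OpenAI

# Assumptions: verify the base_url and model id against the Alibaba Cloud
# Model Studio docs; the API key comes from an Alibaba Cloud account.
client = OpenAI(
    api_key="YOUR_ALIBABA_CLOUD_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3-max-thinking",  # assumed model id
    messages=[
        {"role": "user", "content": "Walk me through a proof that sqrt(2) is irrational."},
    ],
)
print(response.choices[0].message.content)
```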
Read more:
- I replaced ChatGPT with Alibaba’s new reasoning model for a day — here’s what Qwen3-Max-Thinking does better | Tom’s Guide
- DeepMind CEO Says Chinese AI Firms Are 6 Months Behind the West – Bloomberg
Moonshot AI releases Kimi K2.5 With Agent Swarm and Multimodal Capabilities
https://www.kimi.com/blog/kimi-k2-5.html

The News:
- Moonshot AI released Kimi K2.5 on January 26, 2026, an open-source model with 1 trillion parameters (32 billion active) trained on 15 trillion mixed visual and text tokens. The model targets coding, vision tasks, and multi-agent orchestration.
- Agent Swarm orchestrates up to 100 dynamically created sub-agents across 1,500 tool calls per task. Testing shows 4.5x execution time reduction compared to single-agent setups and 80% reduction in end-to-end runtime for complex workflows.
- The model handles video-to-code generation. In demonstrations, K2.5 reconstructs websites from video input and creates front-end interfaces with scroll-triggered effects and interactive layouts from text prompts.
- Kimi Code ships as an open-source terminal tool that integrates with VS Code, Cursor, and Zed. It accepts images and videos as inputs and automatically discovers existing skills and MCPs in the working environment.
- On SWE-Bench Verified, K2.5 scores 36.8% versus Gemini 3 Pro at 35.4%. On SWE-Bench Multilingual, K2.5 reaches 47.4% compared to GPT 5.2 at 45.3% and Gemini 3 Pro at 44.7%. On VideoMMMU, it scores 70.8% versus GPT 5.2 at 68.3% and Claude Opus 4.5 at 67.9%.
My take: Out of all the current state-of-the-art models for code generation, I only feel comfortable using GPT-5.2 High and Extra High, and those models are much larger than 1 trillion parameters. I would never bother using sub-1 TB models like Kimi for coding, simply because the quality of the code is not up to my standards. But for solving quick tasks it’s probably fine. This means that for these open-source models to actually deliver production-quality source code they need to be much larger, which also means you need a much more complicated hardware stack to run them. Buying an 8x stack of 80GB A100 cards to run Kimi K2.5 is something most companies can do, but getting the hardware to run a 3-terabyte model is something completely different. Very few companies in the world can host such models. This is why Alibaba’s Qwen3-Max-Thinking is closed source, and it is also why I think most upcoming large models in 2026 will be closed source too, including those from Meta. It makes little sense to release an open model if almost no one can run it.
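To put rough numbers on the hardware argument, here is a back-of-the-envelope estimate of the weight memory alone (ignoring KV cache and activation overhead, which add to these figures).

```python
# Back-of-the-envelope weight memory for a 1-trillion-parameter MoE model
# like Kimi K2.5: all parameters must be resident even though only 32B
# are active per token. KV cache and activations are ignored here.
params = 1.0e12

for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:,.0f} GB of weights")

# FP16: ~2,000 GB, FP8: ~1,000 GB, INT4: ~500 GB.
# An 8x A100 80GB node offers 640 GB of HBM, so a 1T-parameter model only
# fits after aggressive quantization, while a ~3 TB model needs multi-node
# serving that very few organizations can operate.
```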
Vercel Launches React Native Skills Package for AI Coding Agents
https://skills.sh/vercel-labs/agent-skills/vercel-react-native-skills

The News:
- Vercel released vercel-react-native-skills, a structured knowledge package that provides React Native and Expo best practices to AI coding agents through the Skills.sh ecosystem.
- The package contains 16 rules across 8 priority-ranked categories covering list performance, animations, navigation, UI patterns, state management, rendering, monorepo configuration, and platform-specific optimizations.
- Critical-priority rules include using FlashList for large lists (with memoized items and stabilized callbacks), animating only GPU-friendly properties (transform and opacity), and using native stack navigators over JavaScript-based alternatives.
- High-priority UI patterns specify expo-image for all image rendering, Pressable instead of TouchableOpacity, native context menus and modals, and StyleSheet.create or Nativewind for styling.
- The package installs via “npx skills add vercel-labs/agent-skills” and activates automatically when AI agents detect relevant React Native tasks.
- Each rule includes incorrect and correct code examples with explanations, following the Agent Skills format with SKILL.md activation instructions and AGENTS.md compiled documentation.
My take: I don’t usually write about developer frameworks in this newsletter, but I thought this one was worth mentioning. React Native is an area that has grown a lot in recent years, and the way you write React Native code today is not the way you did it 5-7 years ago. This package is a “skill” that you add to Codex or Claude Code, which basically tells it the current best practices for writing React Native applications. I really like the concept of skills, and this is the first time I have seen a package that really makes a difference for how an AI approaches and solves actual problems. If you are working with React Native you should download this skill right after you finish reading the newsletter.
Mistral Vibe 2.0 Adds Custom Subagents and Clarification Prompts
https://mistral.ai/news/mistral-vibe-2-0

The News:
- Mistral released Vibe 2.0 on January 27, 2026, their terminal-native coding agent powered by Devstral 2.
- Custom subagents perform targeted tasks like PR reviews, test generation, and deployment scripts, invoked on demand.
- Multi-choice clarifications prompt users with options when intent is ambiguous instead of guessing.
- Slash commands load preconfigured workflows for common tasks like deploying, linting, or generating documentation.
- Unified agent modes combine tools, permissions, and behaviors in configurations that switch without changing tools.
- Available on Le Chat Pro ($10/month for students, standard pricing not disclosed) and Team plans with pay-as-you-go credits beyond included usage.
- Devstral 2 moved to paid API access, with free usage remaining on the Experiment plan in Mistral Studio.
- Enterprise add-ons include fine-tuning on internal DSLs, reinforcement learning, and code modernization services.
My take: We have had terminal-based coding agents for less than a year, and there are already features that everyone expects from these kinds of tools: subagents, multi-choice clarifications, slash commands, and so on. It’s great to see Mistral keep improving Vibe, and for companies that for whatever reason cannot use services from US providers like Anthropic, Google or OpenAI, this is probably the best choice for agentic coding.
xAI’s Grok Imagine Takes Top Spot in Video Generation Benchmarks
https://x.ai/news/grok-imagine-api

The News:
- xAI released Grok Imagine Video via public API, claiming state-of-the-art performance across quality, cost, and latency for AI video generation.
- The model ranks number one on Artificial Analysis Video Arena leaderboards for both text-to-video and image-to-video generation, surpassing competitors including Runway Gen-4.5, Google’s Veo models, and Kling 2.5 Turbo.
- Pricing is set at $4.20 per minute of generated video, including native audio synthesis, representing a competitive rate in the video generation market.
- The API offers three specialized endpoints: text-to-video generation, image-to-video animation, and video editing with AI-powered transformations.
- Videos generate up to 15 seconds in length with synchronized audio, built on xAI’s Aurora model architecture using Mixture-of-Experts framework.
My take: Looking at the example videos, I would say Grok Imagine Video is roughly on par with the other video engines out there, but when it comes to cost it is the cheapest. For companies building apps that need AI-generated video through an API, I think this will be a hit.