Tech Insights 2025 Week 45

Last week Cursor, Cognition and Canva announced their own custom foundation models to reduce their dependence on Anthropic, Google and OpenAI. When a SaaS company depends on someone else's AI models for its processing, every feature it adds makes it harder to negotiate pricing with the model providers. Without owning their own models, their products are worthless unless they pay Anthropic, Google or OpenAI whatever they ask. A good example is Anthropic, which last week launched Excel integration in Claude: it can now work directly within Microsoft Excel, just like M365 Copilot. The difference is that if you are a Copilot user you will now get a worse experience in Excel, since both products use the same provider, and Anthropic can afford more GPU usage and better models when you subscribe to Claude directly. This is also why Microsoft is building its own AI models in-house.

This is an interesting race to follow. Cognition said they trained their foundation model SWE-1.5 for Windsurf on a “state-of-the-art cluster of thousands of GB200 NVL72 chips”. Impressive, yes, but next year OpenAI will have literally millions of GPUs to train their next models on. The race is over before it even started. I see very few companies emerging as winners in this AI race. Most of these AI SaaS companies will never be able to catch up hardware-wise and will at some point be stuck on infrastructure they can no longer afford to upgrade. Assuming OpenAI’s hardware buildout proceeds as planned, I fully expect most “code tool” AI SaaS services to have to rethink their entire business model within 1-2 years.

Do you use Udio or Suno to make AI-generated music? Do you pay for them? Last week Udio announced their new “deal” with Universal Music Group, and the same day they also changed their Terms of Service, which now explicitly forbid you from downloading or exporting the music you have created on their platform. Udio also immediately disabled all downloads on the platform. After some intense hours Udio managed to open up a small 48-hour download window, so all users can download their generated songs on November 3 and 4 (today and tomorrow). If you have songs there, go download them now. If you are a Suno user, you should do the same, because they are in the exact same process with the exact same companies. I guess we all could see this coming, and still people seem surprised now that it has happened.

Thank you for being a Tech Insights subscriber!

Listen to Tech Insights on Spotify: Tech Insights 2025 Week 45 on Spotify

THIS WEEK’S NEWS:

  1. Cursor Releases Composer Coding Model and Multi-Agent Interface
  2. Cognition Releases New Coding Agent SWE-1.5 on Windsurf
  3. Canva Launches Design Foundation Model with Editable Layers
  4. GitHub Agent HQ Integrates Multiple Coding Agents
  5. OpenAI Releases Open-Weight Safety Models With Flexible Policy-Based Moderation
  6. OpenAI Launches Aardvark Security Agent Powered by GPT-5
  7. Claude for Excel Beta Launches With Limited Access
  8. Universal Music Group Settles With Udio, Disables All Song Downloads
  9. Microsoft Introduces Researcher with Computer Use in Microsoft 365 Copilot
  10. Microsoft 365 Copilot Adds App Builder and Workflows Agents
  11. Perplexity Launches AI Patent Search Tool
  12. Grokipedia: xAI’s AI-Powered Encyclopedia Launches with 800,000 Articles
  13. MiniMax Releases MiniMax-M2 Open-Source Model
  14. IBM Releases Granite 4.0 Nano Models for Edge Computing

Cursor Releases Composer Coding Model and Multi-Agent Interface

https://cursor.com/blog/2-0

The News:

  • Cursor released Composer, its first proprietary coding model, alongside Cursor 2.0, a redesigned interface centered on multi-agent workflows rather than file-based editing.​
  • Composer completes most coding tasks in under 30 seconds and generates at 250 tokens per second, approximately twice the speed of fast-inference models.​
  • The model was trained using reinforcement learning on real software engineering tasks with production tools including codebase-wide semantic search, file editing, and terminal commands, making it capable of working across large codebases.​
  • Cursor 2.0 supports up to eight agents running in parallel through git worktrees or remote machines, allowing developers to run multiple agents on the same problem and select the best output (see the sketch after this list).
  • The platform includes a native browser tool with Chrome DevTools integration, allowing agents to test web applications, inspect DOM elements, and iterate on changes without leaving the IDE.​
  • Cursor offers a free Hobby plan with limited features, Pro at $20/month, Pro+ at $60/month with 3x usage credits, and Ultra at $200/month with 20x usage credits.
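If you have never used git worktrees, here is a minimal sketch of the idea behind running several agents in parallel on one repository: each agent gets its own checkout on its own branch, so their edits never collide. This only illustrates the mechanism, not Cursor's actual implementation, and the repo path and branch names are hypothetical.

```python
# Minimal sketch of parallel agent isolation via git worktrees.
# Not Cursor's implementation; repo path and branch names are hypothetical.
import subprocess

REPO = "/path/to/repo"

def make_worktree(agent_id: int) -> str:
    """Create an isolated checkout for one agent and return its path."""
    path = f"{REPO}-agent-{agent_id}"
    branch = f"agent/{agent_id}"
    subprocess.run(
        ["git", "-C", REPO, "worktree", "add", "-b", branch, path],
        check=True,
    )
    return path

# Up to eight agents, each working in its own directory on its own branch.
worktrees = [make_worktree(i) for i in range(1, 9)]
print(worktrees)
```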

My take: You know my take on this if you have followed my newsletters for a while: I believe every single AI SaaS company needs to own its LLMs to avoid being outplayed by the LLM owners on both features and pricing. Just like Microsoft and Windsurf, which recently launched their own custom models, Cursor is now launching its own large language model called Composer. The main problem is that the models launched by Microsoft, Windsurf and now Cursor all perform much worse than even the base models from Anthropic and OpenAI (Sonnet 4.5 and GPT-5), and they are nowhere near the performance of high-end models like Opus 4 and GPT5-CODEX-HIGH.

To me there are two clear paths forward here. The first is that next-generation models from Anthropic, Google and OpenAI will, already next year, be so good at programming that you no longer need things like Lovable, Cursor, Windsurf or GitHub Copilot. There is actually a very good chance of this: OpenAI’s GPT5-CODEX-HIGH already performs better than the best programmers in the world. If this happens, most of these third-party coding tools will need to rethink their business proposition entirely. If it does not happen, and it takes Anthropic, Google and OpenAI 2-3 years to reach that level of performance, there is still a small window of opportunity for these companies to train their own models and try to catch up. Even when the top models can do ALL programming, if these custom models can do, say, 80% of that programming at a lower price, then that value proposition will appeal to some developers for some use cases. This is the slot these services are aiming for with their custom models – being a cheaper alternative for specific development tasks.

Cognition Releases New Coding Agent SWE-1.5 on Windsurf

https://cognition.ai/blog/swe-1-5

The News:

  • Cognition released SWE-1.5, a software engineering agent model with hundreds of billions of parameters, focusing on speed and robust coding capabilities.
  • Delivered in partnership with Cerebras inference, the model achieves up to 950 tokens per second, 6x the speed of Haiku 4.5 and 13x that of Sonnet 4.5.
  • SWE-1.5 is trained on new GB200 hardware and leverages thousands of concurrent virtual machines for real-time code execution and web access.
  • The model demonstrates near-state-of-the-art performance on SWE-Bench Pro, reportedly reducing error rates in multi-step tasks and completing coding tasks such as Kubernetes manifest edits in under five seconds.
  • Rigorously evaluated through a mix of synthetic, rubric, and human-in-the-loop grading, the training focused on practical, high-fidelity agentic environments.

My take: Compared to Cursor’s new Composer model, the new SWE-1.5 model by Cognition is way faster at 950 tokens per second. However, in my experience SWE-Bench Pro says very little about the actual source code quality a model produces, or how many retries it takes to get to the final result. Cognition mentioned that they trained SWE-1.5 on a “state-of-the-art cluster of thousands of GB200 NVL72 chips”, and sure, that is an amazing achievement. But compare this to OpenAI, who will have access to MILLIONS of GPUs next year, and you kind of see where this is going. If Windsurf is working fine for you today, then good, this new SWE-1.5 model is probably exactly what you are looking for. But if you are not a Windsurf customer I would not recommend signing up for a yearly subscription.

Canva Launches Design Foundation Model with Editable Layers

https://techcrunch.com/2025/10/30/canva-launches-its-own-design-model-adds-new-ai-features-to-the-platform/

The News:

  • Canva released a foundational model trained on design elements that generates designs with editable layers and objects rather than flat images. The model addresses the limitation where users must prompt their way to final results in visual mediums.​
  • The model works across social media posts, presentations, whiteboards, and websites. Users can start with a prompt and iterate directly on components themselves.​
  • The company also made its AI assistant available throughout the interface, including design and elements tabs. Users can tag the assistant in comments to get text or media suggestions during collaborative projects.​
  • Canva connected its spreadsheet product with its app-building feature, allowing users to create data visualization widgets from spreadsheet data.​
  • The company launched Canva Grow, an all-in-one marketing platform combining asset creation with performance analytics from its acquired MagicBrief tool. Marketers can publish ads directly to platforms like Meta.​
  • Canva announced Affinity will be free forever and redesigned the interface to merge vector, pixel, and layout tools. The integration allows designers to create objects in Affinity and move them into Canva while accessing Canva AI within Affinity.
  • The announcement positions Canva as a “Creative Operating System” that connects design, collaboration, publishing, and performance. The company serves 95% of the Fortune 500 with $3.5 billion in annualized revenue.

My take: Compared to Adobe Firefly, Canva’s design model is different in that it produces layered, editable designs instead of flat images. Canva describes the model as medium-sized and trained specifically to understand design structure, style, and spatial relationships, without going into details about parameter counts or benchmark results. As of June 2025 Canva has over 230 million monthly active users, up from 200 million in October 2024, and around 10% of those are paying users. The launches from Canva last week are a big deal, both the new foundation model and the release of Affinity for free (Canva bought Affinity last year; Affinity is famous for Affinity Designer, Affinity Photo and Affinity Publisher). If you work with graphic design today and are not already using Canva, I’d definitely take the time to check it out in detail.


GitHub Agent HQ Integrates Multiple Coding Agents

https://github.blog/news-insights/company-news/welcome-home-agents/

The News:

  • GitHub announced Agent HQ, a platform that integrates third-party coding agents from Anthropic, OpenAI, Google, Cognition, and xAI into a single interface within GitHub, VS Code, mobile, and CLI for paid Copilot subscribers.​
  • The platform includes mission control, a centralized dashboard where developers assign tasks to multiple agents, track their progress, and manage agent access with identity controls and audit logging.​
  • Plan Mode in VS Code asks clarifying questions before generating code, building step-by-step project plans to identify gaps and missing decisions before implementation.​​
  • AGENTS.md files let developers customize agent behavior with project-specific instructions, development environment tips, testing commands, and code style guidelines stored in the .github/agents directory.​
  • The platform provides one-click conflict resolution, improved file navigation, code review agents using CodeQL, metrics dashboards, and integrations with Slack and Linear.​
  • OpenAI Codex became available this week for Copilot Pro Plus users in VS Code Insiders, with additional agents from Anthropic, Google, Cognition, and xAI rolling out over the next several months.

My take: This is one of those announcements that is very hard to make sense of if you’re not already knee-deep in agentic development. Before Agent HQ existed, vendors like OpenAI had to create and maintain their own interface to “hook into” GitHub Actions. OpenAI has its own repo called codex-action which does exactly this: “Run Codex from a GitHub Actions workflow while keeping tight control over the privileges available to Codex”. The problem with this approach is that some agents are better at certain tasks than others – where GPT5-CODEX is amazing at problem-solving code structures, Claude 4.5 Sonnet is better at writing readable documentation. This meant you had to integrate different hooks into GitHub Actions, each with its own custom configuration file.

Agent HQ standardizes this. You configure Agent HQ once, with a standardized AGENTS.md file (which is quickly emerging as the standard way to control agents), and no longer have to bother with .cursorrules, .windsurfrules, CLAUDE.md and .github/copilot-instructions.md. You can then just enable access for providers like Google Jules, Codex or Claude Code from Agent HQ, and use the agent that is best for each task. This makes for a very simple setup with maximum flexibility. It could very well be a game changer for agentic software development, and it is one of the biggest threats so far to standalone tools like Cursor and Windsurf.
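For reference, here is roughly what a minimal AGENTS.md could look like. This is an illustrative sketch, not an official template – the file is plain markdown, and the sections below are just one reasonable way to organize it.

```
# AGENTS.md (illustrative example)

## Project overview
TypeScript monorepo; services live in packages/*, shared code in libs/*.

## Development environment
- Install dependencies with `pnpm install`
- Run a single service locally with `pnpm --filter <package> dev`

## Testing
- Run `pnpm test` and `pnpm lint` before opening a pull request
- Put unit tests next to the source file as `*.test.ts`

## Code style
- Prettier and ESLint configs at the repo root are the source of truth
- Keep pull requests small and commit messages descriptive
```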


OpenAI Releases Open-Weight Safety Models With Flexible Policy-Based Moderation

https://openai.com/index/introducing-gpt-oss-safeguard

The News:

  • OpenAI released gpt-oss-safeguard in 120 billion and 20 billion parameter versions as open-weight models under Apache 2.0 license. The models classify content based on developer-provided policies at inference time, eliminating the need to retrain when policies change.​
  • The models use chain-of-thought reasoning to classify user messages, completions, and full chats according to custom policies. Developers can review the reasoning process to understand how each classification decision was reached.​
  • Both models are fine-tuned versions of gpt-oss and available on Hugging Face. The 20 billion parameter model fits into GPUs with 16GB VRAM.​
  • On internal multi-policy accuracy tests, gpt-oss-safeguard-120b achieved 46.3% accuracy compared to 43.2% for GPT5-Thinking, despite the smaller model size.​
  • OpenAI developed this approach internally as “Safety Reasoner”, which accounts for up to 16% of total compute in some recent product launches including GPT-5, ChatGPT Agent, and Sora 2.​
  • OpenAI developed the models with Discord, SafetyKit, and Robust Open Online Safety Tools (ROOST). ROOST is launching a community hub for researchers using AI models to improve online safety.

My take: If you run a public web site that allows user comments, you know how difficult it is to handle spam posts. With traditional content moderation classifiers you have to retrain your models every time your policies change. gpt-oss-safeguard changes this: by taking the policy at inference time rather than baking it into training, platforms can update moderation rules and deploy changes in hours instead of weeks. The fact that this model did a better job at multi-policy tasks than GPT5-Thinking is huge, and since it is released as open weights I expect it to be integrated as the default content moderation layer in most online workflows within the next few months.
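To make the “policy at inference time” point concrete, here is a minimal sketch assuming you serve the 20B model behind an OpenAI-compatible endpoint (for example with vLLM). The endpoint, model id and prompt layout are assumptions – check the model card for the exact format the model expects.

```python
# Minimal sketch: pass the moderation policy at inference time instead of
# retraining a classifier. Endpoint, model id and prompt layout are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

POLICY = """You are a content moderator. Classify the user message as ALLOW or BLOCK.
BLOCK: commercial spam, link farming, repeated self-promotion.
ALLOW: everything else, including criticism and negative reviews.
Answer with a single word followed by a one-sentence rationale."""

def moderate(comment: str) -> str:
    resp = client.chat.completions.create(
        model="openai/gpt-oss-safeguard-20b",  # assumed model id
        messages=[
            {"role": "system", "content": POLICY},  # the policy, changeable at any time
            {"role": "user", "content": comment},
        ],
    )
    return resp.choices[0].message.content

print(moderate("Buy 10,000 followers today at example.com!!!"))
```

When your rules change, you edit POLICY and redeploy; nothing about the model itself needs to be retrained.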

OpenAI Launches Aardvark Security Agent Powered by GPT-5

https://openai.com/index/introducing-aardvark

The News:

  • OpenAI released Aardvark, an autonomous security agent powered by GPT-5 that scans code repositories to identify, validate, and propose patches for software vulnerabilities.​
  • The agent analyzes entire repositories to build threat models, monitors new commits, and examines code changes against security objectives without relying on traditional fuzzing or static analysis methods.​
  • Aardvark validates potential vulnerabilities by attempting to trigger them in isolated sandbox environments before flagging issues, reducing false positives that typically burden development teams.​
  • The system integrates with GitHub and OpenAI Codex to generate patches that developers can review and deploy with one click, keeping humans in the final approval loop.​
  • In benchmark testing on known vulnerability repositories, Aardvark identified 92% of known and synthetic vulnerabilities.​
  • The agent discovered 10 previously unknown security flaws in open-source projects during testing, which received official CVE identifiers.​
  • OpenAI plans to offer pro-bono scanning for select non-commercial open-source repositories as part of its commitment to securing the software supply chain.​
  • The tool operates in private beta as an invite-only web application, with no public release date announced.

My take: All code we write will soon be written by AI, and all code we submit will soon also be verified by an AI like Aardvark. And AI will do a better job than any human at both these tasks. If you still cannot connect the dots and see where this is heading, consider how much time your organization spends writing code, finding bugs, fixing bugs and verifying code. All that time will go to zero in just 1-2 years. Plan for that.

Claude for Excel Beta Launches With Limited Access

https://claude.com/claude-for-excel

The News:

  • Anthropic released Claude for Excel, an add-in that integrates Claude directly into Microsoft Excel through a sidebar interface. The tool reads, analyzes, modifies, and creates Excel workbooks with transparency, tracking every cell and formula it references.​
  • The beta research preview limits access to 1,000 users from Max, Team, and Enterprise plan customers through a waitlist.​
  • Claude for Excel provides cell-level citations when answering questions about workbooks, debugs formula errors with root cause identification, and builds new financial models or fills existing templates.​
  • The release includes eight new financial data connectors linking Claude to real-time market data from providers including Moody’s, LSEG, Egnyte, and Aiera.​
  • Users can update workbook assumptions while Claude preserves formula dependencies and navigate multi-tab workbooks.

My take: Ouch, this was not good news for Microsoft. The key selling point for M365 Copilot is the Excel integration, and here comes Anthropic doing exactly the same thing. And Anthropic can offer premium models for the data processing, something Microsoft would have to pay a fortune to access over the API. I wonder if Microsoft knew about this when they announced that Claude 4.5 Sonnet will be included as an option in M365 Copilot because it is so good at document processing. This and the new PowerPoint functionality make Claude a serious contender to Copilot. It just underscores the need for Microsoft to improve their own LLMs, quickly.

Universal Music Group Settles With Udio, Disables All Song Downloads

https://www.universalmusic.com/universal-music-group-and-udio-announce-udios-first-strategic-agreements-for-new-licensed-ai-music-creation-platform

The News:

  • Universal Music Group settled its copyright infringement lawsuit against AI music generator Udio and announced plans to launch a joint licensed AI music platform in 2026.​
  • Udio disabled all song downloads on October 29, 2025 without prior warning, permanently blocking users from exporting audio, video, and stems they created, even songs made before the announcement.​
  • The platform now operates as a walled garden where users can only stream their creations within Udio, with no ability to download or use files externally.​
  • The settlement includes a financial payment and licensing agreements for UMG’s recorded music and publishing catalog, creating new revenue streams for artists and songwriters.​
  • The planned 2026 platform will allow users to remix songs and create music in artist styles, with artists who opt in receiving compensation for both model training and when their songs are remixed.​
  • Following severe user backlash and potential legal threats, Udio announced a temporary 48-hour download window opening November 3, 2025 at 11:00 AM ET, closing November 5, 2025 at 10:59 AM ET.​
  • During this window, users can download existing creations under the previous terms of service that allowed commercial use and full ownership rights.​
  • Udio compensated subscribers with 1,200 bonus credits, increased Pro tier simultaneous song creation limits to 10, and provided 1,000 non-expiring credits to all users.​
  • UMG separately announced a strategic alliance with Stability AI to develop professional music creation tools powered by responsibly trained generative AI.

My take: I guess we all could see this coming, right? You can continue to use Udio, but you are no longer allowed to download or export the songs you generate. You are only allowed to listen to them on the Udio web page; everything else is off limits. Udio initially did not plan to make existing songs downloadable at all, but managed to open up a limited two-day window on November 3-4. I can only guess they had to pay a massive amount of money to UMG for this window, but maybe it was less than the risk of being sued by thousands of angry paying users. “The other” AI music generation tool, Suno, is still in a legal battle with the same major labels – UMG, Sony Music and Warner – and has not yet announced any download limitations in its service. If you are a Suno user I would very much advise you to stop paying your monthly subscription and download all your content while you still can. It is probably just a matter of days or weeks before that too is blocked.


Microsoft Introduces Researcher with Computer Use in Microsoft 365 Copilot

https://techcommunity.microsoft.com/blog/microsoft365copilotblog/introducing-researcher-with-computer-use-in-microsoft-365-copilot/4464766

The News:

  • Microsoft 365 Copilot launched Researcher with Computer Use, an AI agent that operates a virtual computer to navigate web content, access authenticated sites, and generate documents while connecting to enterprise data.​
  • Computer Use provisions an ephemeral Windows 365 virtual machine that runs in an isolated cloud environment with a headless browser, terminal shell, and text browser. The sandbox connects to the web through a network proxy with safety classifiers that validate each request against the user’s original task.​
  • The agent clicks buttons, fills forms, navigates interfaces, and executes code while showing users real-time screenshots of its actions. Users must grant explicit consent before Researcher takes actions or logs into websites.​
  • On the BrowseComp benchmark measuring multi-step web browsing tasks, Researcher with Computer Use improved performance by 44% compared to the previous version. On the GAIA benchmark testing real-world data reasoning, the system achieved a 6% improvement.​
  • Researcher solved a BrowseComp task requiring it to piece together information from financial reports, press releases, and corporate filings across multiple websites to answer “In the late 2010s, a company operating under an unconventional management structure featuring multiple CEOs assisted with brain surgery. How many meetings did the company’s Board of Directors hold in 2022?”.​
  • Access to enterprise data remains disabled by default when Computer Use activates. Users select which data sources to enable through a sources menu. Administrators control which security groups can access the feature and configure domain allow lists and block lists.​
  • The feature rolled out last week through the Frontier program for Microsoft 365 Copilot licensed customers.

My take: Technically, this computer-use browser can access any web site you have access to, since it can authenticate with your credentials. Legally, however, it is another story. Many web sites do not allow automated access, and in this case identifying the traffic as coming from an autonomous agent is complicated: the setup runs a headless browser in a virtual machine, and if you start blocking that on your web site you also break things like page preloading for regular users. If you decide to use this service, then before you enter any credentials at all, verify in detail what you are allowed to access automatically on each service you plan to use. If automated access is prohibited you do NOT want to start scraping the site with your logged-in account – that is a good way to get your account permanently banned from any online service.
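If you want a quick way to see what a site says about automated access before pointing an agent at it, its robots.txt is a reasonable first stop (it does not replace reading the terms of service). A minimal sketch, with a hypothetical user-agent string:

```python
# Minimal sketch: check robots.txt before letting an agent fetch pages.
# This is only a first-pass check; the site's terms of service still apply.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

AGENT = "MyResearchAgent"  # hypothetical identifier for your automation
for path in ("/", "/investor-relations/annual-report-2022"):
    url = f"https://example.com{path}"
    verdict = "allowed" if rp.can_fetch(AGENT, url) else "disallowed"
    print(f"{url}: {verdict} for {AGENT}")
```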

Microsoft 365 Copilot Adds App Builder and Workflows Agents

https://www.microsoft.com/en-us/microsoft-365/blog/2025/10/28/microsoft-365-copilot-now-enables-you-to-build-apps-and-workflows/

The News:

  • Microsoft 365 Copilot now includes new App Builder and Workflows agents for no-code app creation and automation.
  • App Builder creates dashboards, calculators, and trackers using conversational input and Microsoft Lists, all within the Copilot chat interface.
  • Users can preview, adjust, and share apps directly from Copilot, with support for Excel, Word, PowerPoint, and SharePoint data.
  • Workflows agent generates automations—such as reminders, status updates, or emails—across Outlook, Teams, Planner, and other Microsoft 365 services based on plain language instructions.
  • Both features are rolling out to Frontier program users with data governance managed through existing Microsoft 365 admin controls.

My take: If you work deep in the Microsoft ecosystem and have an M365 Copilot license, this is definitely something you should investigate. Both App Builder and Workflows look super interesting for automating simple everyday tasks, and flows can run across Outlook, Teams, SharePoint, Planner, and services like Approvals. I guess my only concern is how bloated Copilot is starting to become. They really cram everything they can in there; what was once a chat client is now a full-blown app builder. I understand the need to move fast, but throwing everything straight into the M365 Copilot app is maybe not the right way to proceed.

Perplexity Launches AI Patent Search Tool

https://www.perplexity.ai/hub/blog/introducing-perplexity-patents

The News:

  • Perplexity Patents is an AI research agent that searches patent documents using natural language queries instead of keyword strings. The tool is free during beta, with Pro and Max subscribers receiving higher usage quotas and model configuration options.​
  • The system breaks down queries into information retrieval tasks executed on a patent knowledge index. It returns collections of relevant patents with inline viewers and links to original documents.​
  • Semantic search expands beyond exact matches. A search for “fitness trackers” returns patents filed under “activity bands”, “step-counting watches”, and “health monitoring wearables” even when those terms are not in the query.​
  • The tool searches beyond patent databases to include academic papers, software repositories, and other sources where innovations first appear. Follow-up questions maintain context without requiring users to restart searches.​
  • Perplexity Patents covers the U.S. Patent and Trademark Office database. The platform provides AI-generated summaries and suggested follow-up topics.

My take: This offering sits somewhere in between Google Patents, where you can search across multiple patent offices with international coverage, and professional platforms like PatBase, Derwent Innovation, and Patsnap. Perplexity Patents is free during the beta, and I guess the main use case is to quickly check whether someone else already holds a patent, or a related patent, for an idea you might have. Still, my experience with Perplexity when it comes to reliable web page analysis is quite bad (it often hallucinates and draws the wrong conclusions), so I am not really sure how good this engine will be for patent work. It would be interesting to hear your experiences if you have tried it.

Grokipedia: xAI’s AI-Powered Encyclopedia Launches with 800,000 Articles

https://www.nytimes.com/2025/10/27/technology/grokipedia-launch-elon-musk.html

The News:

  • xAI has launched Grokipedia, an open-source AI encyclopedia featuring over 800,000 articles created by its Grok language model.
  • Grokipedia states its aim as providing an alternative to Wikipedia, with every article generated, checked, and updated by AI instead of human volunteers.
  • The platform’s first version includes articles on technology, politics, and current events; many entries are adapted from Wikipedia under Creative Commons licensing.
  • Users cannot edit entries directly but can submit corrections through a reporting button, with AI reviewing and incorporating verified updates.
  • Access to Grokipedia requires an X Premium or Premium+ subscription, as it is part of the Grok AI assistant integrated into X (formerly Twitter).

My take: Imagine if you took every single page of Wikipedia, fed it to an LLM, and asked it to create a new variation of each page. You would end up with something like Grokipedia. Most pages are very similar to their Wikipedia counterparts, but with lots of errors, and they are typically much longer while containing the same amount of actual information. If you have time I recommend reading the Grokipedia review by Tim Bray, one of the co-authors of the original XML specification. It shows just how bad AI-generated content can be when used the wrong way.


MiniMax Releases MiniMax-M2 Open-Source Model

https://huggingface.co/MiniMaxAI/MiniMax-M2

The News:

  • MiniMax released MiniMax-M2, an open-source mixture-of-experts model with 230 billion total parameters and 10 billion active parameters per inference. The model targets coding and agentic workflows.​
  • MiniMax-M2 ranks first among open-source models on Artificial Analysis benchmarks with a quality score of 61, trailing GPT-5’s score of 68 by 7 points. The gap between top open-source and proprietary models has narrowed from 18 points last year.​
  • The model scores 69.4 on SWE-bench Verified (GPT-5 scores 74.9), 66.8 on ArtifactsBench (above Claude Sonnet 4.5), 77.2 on τ²-Bench (GPT-5 scores 80.1), and 75.7 on GAIA text-only tasks.​
  • MiniMax-M2 uses sparse expert routing that activates only a subset of parameters per token. This architecture reduces memory pressure and latency during multi-step agent workflows.​
  • The model implements interleaved reasoning with <think>…</think> blocks that must remain in conversation history across turns. Removing these blocks degrades performance on multi-step tasks (see the sketch after this list).
  • MiniMax-M2 is released under MIT license and available on Hugging Face and Ollama.
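The point about keeping the <think> blocks is easy to get wrong if you are used to stripping reasoning traces from chat history. A minimal sketch, assuming the model is served behind an OpenAI-compatible endpoint (for example a local vLLM server); the endpoint and model id are assumptions:

```python
# Minimal sketch: keep the <think>...</think> content in the history between
# turns. Endpoint and model id are assumptions, not official values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
MODEL = "MiniMaxAI/MiniMax-M2"

history = [{"role": "user", "content": "Refactor utils.py to remove the global cache."}]
first = client.chat.completions.create(model=MODEL, messages=history)
assistant_msg = first.choices[0].message.content  # includes the <think>...</think> block

# Append the assistant reply verbatim -- do NOT strip the <think> section,
# since removing it degrades performance on multi-step tasks.
history.append({"role": "assistant", "content": assistant_msg})
history.append({"role": "user", "content": "Now add unit tests for that change."})
follow_up = client.chat.completions.create(model=MODEL, messages=history)
print(follow_up.choices[0].message.content)
```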

My take: We have a new #1 open-source model; unfortunately it is still well behind the leading foundation models on most things. At the pace things are evolving right now, the only reason to give these open-source models a look at all is if you for some reason absolutely cannot use an LLM in the cloud and have lots of money to spend. To run this model at full precision you need at least 4x A100 or H200 GPUs with 320GB or more of total VRAM, or you can download a 6.5-bit quantized version that runs on a Mac M3 Ultra with 512GB RAM, where it delivers around 12 tokens per second with a context window of 6800 tokens. Personally I don’t believe open-source models will catch up to closed-source models within the next 3-5 years. Maybe they reach GPT-5-Pro level performance in 1-2 years, but the hardware requirements will by then be so absurdly large that they will be impossible to run without dedicated large-scale hosting infrastructure.

IBM Releases Granite 4.0 Nano Models for Edge Computing

https://huggingface.co/blog/ibm-granite/granite-4-nano

The News:

  • IBM launched Granite 4.0 Nano, eight small language models ranging from 350 million to 1.5 billion parameters, designed for edge computing and on-device applications where cloud models are impractical.​
  • The collection includes four hybrid-SSM architecture models and four traditional transformer versions, all trained on over 15 trillion tokens using the same methodology as larger Granite 4.0 models.​
  • All models are released under Apache 2.0 license with ISO 42001 certification for responsible AI development, supporting popular runtimes including vLLM, llama.cpp, and MLX.​
  • Benchmark tests show the Nano models outperform similarly sized competitors from Alibaba’s Qwen, LiquidAI’s LFM, and Google’s Gemma across general knowledge, mathematics, coding, and safety tasks.​
  • The models excel in instruction following and tool calling, demonstrating superior accuracy on IFEval and Berkeley Function Calling Leaderboard v3 benchmarks compared to other sub-2B parameter models.

My take: There’s now lots of competition in this small-model space, such as Alibaba’s Qwen series, Google’s Gemma models and LiquidAI’s LFM variants. If you have the need for small edge models, then you should definitely check these out.
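If you want to try one of these on your own machine, a minimal sketch with Hugging Face transformers could look like the following. The model id is an assumption – check the Granite 4.0 Nano collection on Hugging Face for the exact names before running it.

```python
# Minimal sketch: run a sub-1B Granite 4.0 Nano model locally.
# The model id below is an assumption -- verify it against the model card.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ibm-granite/granite-4.0-h-350m",  # assumed id from the Nano collection
    device_map="auto",
)

prompt = "Question: What are small on-device language models useful for?\nAnswer:"
out = generator(prompt, max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"])
```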