Two weeks ago I wrote about my side-project with Claude Code. The project is now done, and it was built 100% with Agentic AI. It’s an app called Notebook Navigator, and it combines the best of note-taking apps like Apple Notes, Bear, Evernote, Google Keep and Notion into one single app. It is built as a plugin that replaces the user interface for the text editor Obsidian, and user feedback has so far been amazingly good.
The entire application is around 13,000 lines of code and has been translated into 7 languages: English, German, Spanish, French, Japanese, Simplified Chinese and Swedish. It has full support for LTR and RTL languages and themes, and the UX is optimized for mobile and tablet. If I had written this application without AI, it would have taken me around 2-3 months to get up to speed with React and virtualizers, and then at least 2-3 months of hard development work to get the app into its current stable shape.
But the app did not take me 6 months to create. It took me 3 weeks of spare time using a combination of Google Gemini 2.5 Pro and Claude Code. I have continuously reviewed the architecture and code quality, and even I find it hard to believe that it’s possible to create such large amounts of high-quality source code in such a short time. AI agents in the hands of someone who is good at what they do give that person superpowers almost beyond comprehension. Now apply this knowledge to all areas within a company and you quickly understand why every single company is going all-in on Generative AI, AI Agents and multi-agent systems! If you are curious and want to see the source code and my CLAUDE.md file, you can check it out on GitHub; it’s open source.
Last week it was Amazon’s turn to announce they are going AI-first. Andy Jassy, CEO of Amazon, wrote in an email to all employees: “AI agents will change how we all work and live. Agents let you tell them what you want (often in natural language), and do things like scour the web (and various data sources) and summarize results, engage in deep research, write code, find anomalies, highlight interesting insights, translate language and code into other variants, and automate a lot of tasks that consume our time. There will be billions of these agents, across every company and in every imaginable field. There will also be agents that routinely do things for you outside of work, from shopping to travel to daily chores and tasks. Many of these agents have yet to be built, but make no mistake, they’re coming, and coming fast”.
Listen to Tech Insights on Spotify: Tech Insights 2025 Week 26 on Spotify
Thank you for being a Tech Insights subscriber!
THIS WEEK’S NEWS:
- Cursor Updates Pro Tier and Launches $200 Ultra Plan
- MIT Study Reveals ChatGPT Use Weakens Student Brain Activity and Memory
- Harvard Releases Nearly One Million Historic Books for AI Training
- Essential AI Releases 24-Trillion-Token Dataset to Simplify Data Curation
- MIT Develops SEAL Framework for Self-Adapting Language Models
- Anthropic Explains Multi-Agent Research System Architecture for Claude
- OpenAI Demonstrates Rapid Re-alignment Method for Misaligned GPT-4o Models
- Claude Code Adds Remote MCP Server Support for Third-Party Tool Integration
- Midjourney Launches First AI Video Generator Amid Disney Copyright Lawsuit
- Google Launches Gemini 2.5 Flash-Lite for High-Volume Tasks
- MiniMax Releases M1 Open-Weight Reasoning Model with Hybrid Attention Architecture
Cursor Updates Pro Tier and Launches $200 Ultra Plan
https://www.cursor.com/en/blog/new-tier

The News:
- Cursor updated its Pro plan to an “unlimited-with-rate-limits model” and removed all restrictions on tool calls, though existing users can choose to keep their current 500-request limit, albeit with new restrictions (see below).
- Cursor also introduced Ultra, a $200 monthly subscription that provides 20 times more usage than the Pro plan, but it is unclear how the rate limits are calculated (time window, context, etc.) and whether they apply differently to different models.
- Anysphere (the company behind Cursor) did not post any information about current rate limits, context windows, or how their offer compares to Claude Code, which also has rate limits at a similar price point.
My take: Anysphere really messed this one up. First of all, if you do not switch your subscription to the new “very-slow-and-rate-limited” Pro license, every single tool call now suddenly counts as one request. This means that if you stay on the old plan you will most probably burn through your 500 monthly requests on the very first day. And if you do switch over to the new plan, access to Claude 4 within Cursor is heavily throttled, so it will be close to unusable in terms of speed. Unless you pay $200 per month. In comparison, Anthropic is very transparent about how sessions and usage limits apply to Claude Code, while Anysphere has not disclosed any information about how sessions are measured or how requests are counted. The fact that each tool use now counts as a unique request was discovered by users, not communicated officially. If you work as a full-time developer, it seems the days of paying just $20 per month for unlimited access to Claude are nearing their end, and you now have to choose between paying $200 per month for Cursor Ultra or $200 for Claude Max.
Read more:
- New ultra mode : r/cursor
- Cursor silently make Pro plan worse just to sell Ultra plan : r/ChatGPTCoding
- New Pro Plan — Need More Transparency on Rate Limits – Discussions – Cursor – Community Forum
- Pro Plan ($20) + Claude 4 Sonnet (No Thinking): Rate Limited After ~10 Requests? AND Old Plan Price Hike? This is Crippling – Discussions – Cursor – Community Forum
- Still hit the rate limit for the pro plan – Discussions – Cursor – Community Forum
- The MASSIVE Problem Cursor Needs to Fix IMMEDIATELY! – YouTube
MIT Study Reveals ChatGPT Use Weakens Student Brain Activity and Memory
https://arxiv.org/pdf/2506.08872v1

The News:
- MIT Media Lab researchers used EEG to track the brain activity of 54 students aged 18-39 while they wrote SAT essays over four months, finding that ChatGPT users showed significantly reduced neural connectivity compared to those writing without assistance.
- Students were divided into three groups: ChatGPT users, Google search users, and brain-only writers who used no external tools during the essay writing process.
- ChatGPT users produced essays that English teachers described as “soulless” and lacking original thought, with participants increasingly relying on the AI to complete entire essays by their third session.
- Brain-only writers exhibited the strongest neural connectivity in alpha, theta, and delta brain wave bands associated with creativity, memory processing, and semantic understanding.
- When ChatGPT users were later asked to write without AI assistance, 83% could not recall even one sentence from essays they had written minutes earlier, indicating severe memory retention problems.
- The study was published as a preprint on June 10, 2025, bypassing traditional peer review because researchers wanted to raise awareness about potential cognitive impacts before widespread AI adoption in education.
My take: Where previous research on AI in education has focused primarily on academic performance, this study is interesting because it also examines the underlying cognitive processes when using AI. AI adoption in schools is skyrocketing, and it won’t be long until every single student writes most of their essays with LLMs. We still don’t know what the long-term effects of this will be, or how it will affect us. For me personally, this research shows just how much of our brain capacity we spend on boring and unnecessary things like writing texts that no one really cares about. My own productivity has skyrocketed thanks to AI, simply because I no longer have to spend time and focus writing text and code using a 150-year-old input device that was optimized for mechanical typewriters. I can focus solely on the what and the why; getting it into the computer is handled by AI.
Harvard Releases Nearly One Million Historic Books for AI Training
https://huggingface.co/datasets/institutional/institutional-books-1.0

The News:
- Harvard University released Institutional Books 1.0, a dataset containing 983,004 public domain books for AI model training, addressing the scarcity of high-quality, legally accessible training data.
- The collection spans from the 15th century to modern times across 254 languages, totaling 242 billion tokens and 386 million pages of OCR-extracted text.
- Books include works by Shakespeare, Dickens, and Dante alongside specialized texts like Czech mathematics textbooks and Welsh dictionaries, covering subjects from literature and philosophy to law and agriculture.
- The dataset provides both original and post-processed OCR text with extensive metadata including language detection, topic classification, and quality scores.
- Tech companies can use these public domain materials without copyright concerns, offering a safer alternative to scraping modern creative works that have triggered lawsuits.
- The initiative received funding from Microsoft and OpenAI through Harvard’s Institutional Data Initiative, aiming to “level the playing field” for smaller AI developers and researchers.
My take: Just to put this in perspective, this new book dataset is around five times larger than the controversial Books3 dataset that was used to train Meta’s Llama models. But where the Books3 dataset faced legal challenges over copyright infringement, Harvard’s collection consists entirely of public domain works, eliminating that copyright risk for AI developers. Greg Leppert, executive director of the Institutional Data Initiative, even compared the potential impact of this dataset to that of Linux in operating systems: a foundational resource that companies can build upon with additional licensed data for competitive advantage.
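If you want to poke around in the corpus yourself, a minimal sketch with the Hugging Face datasets library could look like the one below. The column names (“language”, “text”) are my assumptions about the schema, so check the dataset card before relying on them.

```python
import itertools

from datasets import load_dataset

# Stream the corpus instead of downloading all ~983,000 books up front.
# Column names like "language" are assumptions; see the dataset card on
# Hugging Face for the actual schema.
books = load_dataset(
    "institutional/institutional-books-1.0",
    split="train",
    streaming=True,
)

# Peek at a few records that declare an English language tag.
english = (b for b in books if b.get("language") == "eng")
for book in itertools.islice(english, 3):
    print({key: str(value)[:80] for key, value in book.items()})
```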
Essential AI Releases 24-Trillion-Token Dataset to Simplify Data Curation
https://www.arxiv.org/abs/2506.14111

The News:
- Essential AI released Essential-Web v1.0, a 24-trillion-token dataset containing 23.6 billion web documents, designed to streamline the traditionally complex process of curating training data for language models.
- Each document includes annotations across 12 categories covering topic, format, content complexity, and quality, generated by EAI-Distill-0.5b, a compact 0.5 billion-parameter model that achieves annotator agreement within 3% of the larger Qwen2.5-32B-Instruct model.
- Researchers can now use SQL-style filters to extract domain-specific datasets in hours or days instead of weeks or months, eliminating the need to train custom classifiers.
- The dataset demonstrates competitive performance across specialized domains, showing improvements of 14.3% in web code, 24.5% in STEM, and 8.6% in medical content compared to existing web-curated datasets.
- The complete dataset is freely available on HuggingFace at EssentialAI/essential-web-v1.0.
My take: How many tokens do you really need to train new models? A lot. Last year, when Meta announced its then-new Llama 3 series of language models, it said they were trained on 15 trillion tokens collected from “publicly available sources”. So while 24 trillion sounds like a lot, it’s probably not that much compared to what is used to train the SOTA models we use today. But for anyone wanting to carve out domain-specific datasets or build custom classifiers for a specific purpose, this dataset will make life so much easier!
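As a rough sketch of what that kind of filtering looks like in practice, here is how you might stream the dataset from Hugging Face and keep only documents tagged as medical content. The taxonomy field names below are assumptions on my part; the real column layout is documented in the eai-taxonomy repo and on the dataset card.

```python
import itertools

from datasets import load_dataset

# Stream Essential-Web v1.0 rather than materializing 23.6B documents.
docs = load_dataset(
    "EssentialAI/essential-web-v1.0",
    split="train",
    streaming=True,
)

def is_medical(doc: dict) -> bool:
    # The taxonomy labels live in per-document metadata; the exact field
    # names used here ("eai_taxonomy", "free_decimal_correspondence") are
    # assumptions -- verify against the dataset card before relying on them.
    taxonomy = doc.get("eai_taxonomy")
    if not isinstance(taxonomy, dict):
        return False
    topic = str(taxonomy.get("free_decimal_correspondence", ""))
    return topic.startswith("61")  # Dewey classes 610-619 cover medicine

medical_subset = filter(is_medical, docs)
for doc in itertools.islice(medical_subset, 5):
    print(str(doc.get("text", ""))[:200])
```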
Read more:
- Paper: [2506.14111] Essential-Web v1.0: 24T tokens of organized web data
- Model and data: EssentialAI (Essential AI)
- Repo: GitHub – Essential-AI/eai-taxonomy
MIT Develops SEAL Framework for Self-Adapting Language Models
https://arxiv.org/abs/2506.10943

The News:
- MIT researchers introduced SEAL (Self-Adapting Language Models), a framework that enables large language models to generate their own training data and update their weights autonomously, addressing the static nature of current AI systems that cannot learn from new experiences.
- The system works by having models produce “self-edits” when given new input, which can restructure information, specify optimization parameters, or invoke data augmentation tools that result in persistent weight updates through supervised fine-tuning.
- SEAL trains itself to make better modifications by using a reward system based on how well it performs on actual tasks. The model learns which self-edits work best through trial and error, without needing additional software components or helper networks to guide the process.
- When tested on reasoning puzzles and learning new facts from limited examples, SEAL solved 72.5% of problems correctly on a simplified version of the ARC reasoning benchmark. This significantly outperformed traditional methods: standard in-context learning failed completely (0% success), while self-editing without the reward-based training achieved only 20% success.
- In tests where models had to learn and use new factual information, SEAL performed as well as GPT-4.1 after processing just 50 text passages and completing only two training cycles. This shows SEAL can quickly absorb new knowledge with minimal data and training time.
My take: SEAL is a step toward AI systems that can autonomously improve themselves. Short-term effects include a significant reduction in the human oversight required for model improvements, as models can identify their own knowledge gaps and generate appropriate training materials. Models can also be updated much more quickly: with SEAL they can incorporate new information far faster than was previously possible. The key limitation, however, is that repeated updates with SEAL cause models to forget earlier knowledge, meaning that “performance on earlier tasks gradually declines as the number of edits increases” (as stated in the paper). So while the system seems to work well for some use cases, SEAL is not yet the solution for AI systems that continuously improve themselves over and over again. But it can probably save massive amounts of time in the short term when it comes to keeping new foundation models up to date.
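To make the mechanism a bit more concrete, here is a heavily simplified sketch of the outer loop as I read the paper: the model proposes a self-edit, a copy of the model is fine-tuned on it, and the edit is kept only if downstream task performance improves. Every function below is a placeholder stub, not the authors’ code.

```python
import copy
import random

# Placeholder stubs standing in for the real components described in the
# SEAL paper: an LLM that drafts its own training data, a supervised
# fine-tuning step, and a task-specific evaluation.
def generate_self_edit(model: dict, context: str) -> str:
    return f"Restated notes about: {context}"

def finetune(model: dict, self_edit: str) -> dict:
    updated = copy.deepcopy(model)
    updated["knowledge"].append(self_edit)  # stand-in for a weight update
    return updated

def evaluate(model: dict, task: str) -> float:
    return random.random()  # stand-in for downstream task accuracy

def seal_outer_loop(model: dict, contexts: list[str], task: str, rounds: int = 2) -> dict:
    for _ in range(rounds):
        for context in contexts:
            edit = generate_self_edit(model, context)
            candidate = finetune(model, edit)
            # Reward signal: did the persistent update help on the task?
            # In the paper this signal trains the edit-generation policy via RL.
            if evaluate(candidate, task) > evaluate(model, task):
                model = candidate
    return model

model = seal_outer_loop({"knowledge": []}, ["passage with new facts"], task="QA")
print(len(model["knowledge"]), "self-edits retained")
```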
Anthropic Explains Multi-Agent Research System Architecture for Claude
https://www.anthropic.com/engineering/built-multi-agent-research-system

The News:
- Anthropic released technical details about Claude’s Research feature, which uses multiple AI agents working in parallel to handle complex search tasks that exceed single-agent capabilities.
- The system employs an orchestrator-worker pattern where a lead agent (Claude Opus 4) analyzes queries, develops strategies, and spawns specialized subagents (Claude Sonnet 4) to explore different aspects simultaneously.
- Internal evaluations show the multi-agent system outperformed single-agent Claude Opus 4 by 90.2% on research tasks, particularly excelling at breadth-first queries requiring multiple independent directions.
- Token usage explains 80% of performance variance, with multi-agent systems consuming 15 times more tokens than standard chats, making them economically viable only for high-value tasks.
- The system uses parallel tool calling and dynamic search strategies, cutting research time by up to 90% for complex queries while covering more information sources.
My take: AI agents are the future, and they behave significantly differently from traditional Retrieval-Augmented Generation (RAG). Traditional RAG systems operate with a rigid, single-pass approach where the user’s query is sent directly to a retriever without modification. They typically rely on static vector databases and perform “static retrieval”: they fetch a predetermined set of document chunks most similar to the input query, inject those chunks into the prompt, and generate a response. There are many issues with this approach. The raw input is processed exactly as received, without analyzing user intent or optimizing the query for better retrieval, and real-world queries are often vague, too specific, or missing key terminology, which leads to poor retrieval results. RAG systems also typically rely on a single data source (usually a static vector database), which limits the scope and timeliness of the information. Multi-agent systems are different. Instead of using queries as-is, retrieval agents analyze user intent and reshape queries to make them more effective, by rewriting them in clearer terms, adding relevant keywords, or expanding them with domain-specific vocabulary. And rather than a single retrieval pass, agents use iterative search processes that adapt to new findings and analyze results to formulate higher-quality answers. The results shown by Anthropic speak for themselves. Multi-agent systems are the main reason why I always recommend that companies not go 100% Copilot Studio exclusive with their AI strategy, since that will significantly reduce your options for growing your AI stack in the future. If you are about to roll out AI agents within your company, make sure you also have a plan to support coding frameworks like AutoGen, LangGraph or CrewAI, because multi-agent systems are key to getting the absolute best results with AI agents.
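A stripped-down sketch of the orchestrator-worker pattern could look like the code below, using the Anthropic Python SDK: a lead model breaks the question into subqueries, subagents run in parallel against a stubbed search tool, and the lead model synthesizes the findings. The model IDs and the search function are placeholders, and this is of course nothing close to Anthropic’s production system.

```python
import asyncio

from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

LEAD_MODEL = "claude-opus-4-20250514"      # illustrative model IDs;
WORKER_MODEL = "claude-sonnet-4-20250514"  # substitute whatever is current

async def ask(model: str, prompt: str) -> str:
    response = await client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

async def search(query: str) -> str:
    # Placeholder for a real search tool (web search, internal docs, ...).
    return f"[search results for: {query}]"

async def subagent(subquery: str) -> str:
    # Each worker fetches evidence for its subquery and summarizes it.
    evidence = await search(subquery)
    return await ask(WORKER_MODEL, f"Summarize what this says about '{subquery}':\n{evidence}")

async def research(question: str) -> str:
    # Lead agent plans the work, fans it out, then synthesizes the answer.
    plan = await ask(LEAD_MODEL, f"Break this research question into 3 independent subqueries, one per line:\n{question}")
    subqueries = [line.strip() for line in plan.splitlines() if line.strip()]
    findings = await asyncio.gather(*(subagent(q) for q in subqueries))
    return await ask(LEAD_MODEL, f"Question: {question}\nFindings:\n" + "\n".join(findings) + "\nWrite a final answer.")

print(asyncio.run(research("How do the major note-taking apps handle offline sync?")))
```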
OpenAI Demonstrates Rapid Re-alignment Method for Misaligned GPT-4o Models
https://openai.com/index/emergent-misalignment

The News:
- OpenAI published research showing how to correct “emergent misalignment” in GPT-4o models, where fine-tuning on specific tasks can cause models to produce harmful or insecure outputs even when the training data appears benign.
- The company demonstrated that just 30 supervised fine-tuning steps on 120 benign examples are enough to restore a misaligned model to 0% misalignment.
- The study used GPT-4o models fine-tuned on insecure code completions as a test case, then applied secure code examples to reverse the misalignment.
- OpenAI’s method involves steering the model by modifying internal activations rather than retraining weights, showing that alignment generalizes as strongly as misalignment does.
My take: It’s a known problem that fine-tuning an AI model can lead to unexpected compromises on model safety. A 2023 study by researchers at the University of Utah proposed restoring weight parameters from the original aligned model, but that method required identifying and modifying specific weights through gradient descent, which is costly and inefficient. OpenAI’s approach works with internal activations rather than weights, which is more efficient. The research also shows that the content of your fine-tuning data matters more than you might expect: even if a dataset appears harmless, underlying patterns can shift the model’s behavior in unexpected ways. This means that if you decide to fine-tune an AI model, you must not only audit the data carefully but also check it for implicit harmful patterns or biases that could generalize. This report is highly recommended if you work with any kind of LLM fine-tuning.
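OpenAI’s write-up concerns an internal “misaligned persona” feature in GPT-4o, which we obviously cannot reproduce here, but the general activation-steering technique is easy to illustrate. The sketch below uses a small open model and a random unit vector purely as a stand-in for a learned feature direction; it shows the mechanics, not OpenAI’s actual method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open model as a stand-in; OpenAI's work targets GPT-4o internals,
# which are not publicly accessible.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# In the real setting the steering direction would be a learned feature
# (for example from a sparse autoencoder); here it is a random unit vector,
# purely to show the mechanics.
direction = torch.randn(model.config.n_embd)
direction = direction / direction.norm()

def remove_direction(module, inputs, output):
    # Project the unwanted direction out of this block's residual output.
    hidden = output[0]                       # (batch, seq, hidden)
    coeff = hidden @ direction               # (batch, seq)
    steered = hidden - coeff.unsqueeze(-1) * direction
    return (steered,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(remove_direction)
inputs = tok("The new release", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```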
Claude Code Adds Remote MCP Server Support for Third-Party Tool Integration
https://www.anthropic.com/news/claude-code-remote-mcp

The News:
- Claude Code now connects directly to remote MCP (Model Context Protocol) servers, eliminating the need to set up and manage local servers that connect with external services.
- Developers can integrate third-party services like Sentry for error tracking, Linear for project management, and Context7 for documentation access by simply adding vendor URLs and authenticating through OAuth.
- The integration uses OAuth 2.0 for secure authentication, allowing one-time setup where vendors handle server updates, scaling, and maintenance automatically.
- Linear’s Head of Engineering Tom Moor states: “With structured, real-time context from Linear, Claude Code can pull in issue details and project status – engineers can now stay in flow when moving between planning, writing code, and managing issues. Fewer tabs, less copy-paste. Better software, faster”.
- The new MCP specification introduces OAuth authentication between Claude and MCP servers, structured outputs, server elicitation capabilities, and security enhancements through Resource Indicators.
My take: This is actually two pieces of news in one. First, the MCP protocol was updated to support OAuth and structured output, a huge win for enterprise MCP rollouts. Second, Claude Code added support for the new MCP protocol version the day after it was announced. If you are developing in C# or Java, stop what you are doing and go install Claude Code with the Context7 remote MCP server. If you don’t know what Context7 is, it’s an API that lets your LLM fetch “up-to-date, version-specific documentation and code examples straight from the source — and places them directly into your prompt”. So if you have struggled with Claude not understanding the SDKs you are using, that is now a thing of the past. Just hook up Claude Code to Context7 and it will find everything it needs.
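If you are curious what talking to a remote MCP server actually involves under the hood, the official Python SDK keeps the handshake small. This is a generic client sketch, not how Claude Code does it internally; the server URL is a placeholder and OAuth is omitted for brevity.

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

# Placeholder endpoint; real vendors publish their own URLs and typically
# require OAuth, which is left out of this sketch.
SERVER_URL = "https://mcp.example.com/sse"

async def main() -> None:
    async with sse_client(SERVER_URL) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            tools = await session.list_tools()
            # List the tools the remote server exposes to the client.
            for tool in tools.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(main())
```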
Read more:
- Context7 MCP Server — Up-to-date code documentation for LLMs and AI code editors
- Key Changes – Model Context Protocol 2025-06-18
- New MCP update just dropped – here’s what the OAuth stuff actually does : r/mcp
Midjourney Launches First AI Video Generator Amid Disney Copyright Lawsuit
https://updates.midjourney.com/introducing-our-v1-video-model

The News:
- Midjourney released V1 Video, its first AI video generation model that converts static images into 5-second animated clips, allowing users to animate their existing image libraries or create new content for social media and marketing.
- The system works exclusively through image-to-video conversion, requiring users to first create or upload an image before pressing an “Animate” button to generate four video variations.
- Users can extend videos in 4-second increments up to four times, creating clips up to 21 seconds long, with options for automatic motion or manual text prompts to control specific movements.
- The service costs $10 per month for basic access, with video generation consuming eight times more processing power than image creation, roughly equivalent to “one image worth of cost” per second of video.
- Videos output at 480p resolution across different aspect ratios, with 16:9 videos reaching 832×464 pixels, though downloads are upscaled to 1080p for social media use.
- The launch occurred just five days after Disney and Universal filed a copyright lawsuit against Midjourney, claiming the company created a “bottomless pit of plagiarism” by training on copyrighted material without permission.
My take: It’s getting crowded in the AI video arena, with top entries like Google Veo 3 costing just $20 per month for 4K video with native audio support. Midjourney V1 is cheaper at $10 per month, and user feedback has so far been very positive, especially regarding character consistency. Given the ongoing lawsuit I wouldn’t recommend using it for anything commercial, but if you just want to play around with AI video for hobby projects, this might be what you have been looking for.
Read more:
- Midjourney launches an AI video generator | The Verge
- Correct me if I’m wrong, but does the new Midjourney video gen model have the best consistency for AI video extension? It’s surprisingly… good…? : r/midjourney
Google Launches Gemini 2.5 Flash-Lite for High-Volume Tasks
https://blog.google/products/gemini/gemini-2-5-model-family-expands

The News:
- Google released Gemini 2.5 Flash-Lite in preview, designed as the most cost-efficient and fastest model in the 2.5 series for high-volume, latency-sensitive tasks like translation and classification.
- The model costs $0.10 per million input tokens and $0.40 per million output tokens, with “thinking” disabled by default to optimize speed and cost.
- Flash-Lite supports a 1 million-token context window, multimodal inputs, and integrates with Google Search, code execution, and function calling tools.
- Google also moved Gemini 2.5 Flash and Pro models from preview to general availability, with companies like Snap, SmartBear, Spline, and Rooms already using them in production.
My take: Google now has three different Gemini 2.5 models: Gemini 2.5 Pro, Gemini 2.5 Flash and Gemini 2.5 Flash-Lite. Gemini 2.5 Pro is their top model that can do everything from coding to reasoning to solving math problems and much more. Gemini 2.5 Flash is smaller but still very capable and still has a thinking mode, making it suitable for agentic use cases. Gemini 2.5 Flash-Lite is best for low-latency applications like classification, translation, and summarization. What’s noteworthy is that all three models support a massive 1 million token context window, which is great when dealing with large amounts of source material.
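As a quick illustration of the kind of high-volume, low-latency call Flash-Lite is aimed at, here is a classification request with thinking explicitly disabled, using the google-genai Python SDK. The model ID is the preview name used at the time of writing and may change at general availability.

```python
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    # Preview model ID at the time of writing; check the docs for the current name.
    model="gemini-2.5-flash-lite-preview-06-17",
    contents="Classify the sentiment (positive/negative/neutral): 'The update broke my workflow.'",
    config=types.GenerateContentConfig(
        # Thinking is off by default for Flash-Lite; a zero budget makes that
        # explicit for latency-sensitive batch workloads.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```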
MiniMax Releases M1 Open-Weight Reasoning Model with Hybrid Attention Architecture
https://github.com/MiniMax-AI/MiniMax-M1

The News:
- MiniMax released M1, an open-weight reasoning model with 456 billion parameters that uses hybrid attention to reduce computational costs by 75% compared to DeepSeek-R1 when generating 100,000 tokens.
- The model supports 1 million token context length, eight times larger than DeepSeek-R1, and activates 45.9 billion parameters per token using a Mixture-of-Experts architecture.
- MiniMax trained M1 using a new CISPO reinforcement learning algorithm that clips importance sampling weights instead of token updates, completing the entire RL phase on 512 H800 GPUs in three weeks for $537,400.
- The company offers two versions with 40,000 and 80,000 token thinking budgets, with the 80K version scoring 86% on AIME 2024 mathematics and 56% on SWE-bench Verified software engineering tasks.
- M1 uses a “Lightning Attention” mechanism combined with standard softmax attention, placing one standard transformer block after every seven lightning attention blocks.
My take: This is an interesting model: it performs comparably to DeepSeek R1 in some tests while using 75% less compute! And while it performs well in some tests, it falls flat in others, like SimpleQA for factual accuracy, where it scored 18.5% compared to DeepSeek-R1’s 30.1%. MiniMax, the company, was founded in Shanghai in 2021 and is backed by industry giants such as Alibaba Group, Tencent, Hillhouse Investment, HongShan, IDG Capital, and MiHoYo, with total investment of over $850 million. If you want to run the model locally you need a serious rig with eight H800 or eight H20 GPUs (the H20 is the China-market version of the H800), so if you are looking for something to run quantized on your local Mac mini, this is not the model for you.
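The hybrid layout itself is easy to picture: per the report, one softmax-attention block follows every seven lightning-attention blocks. Here is a toy sketch of that layer schedule; the actual block implementations live in the MiniMax repo.

```python
def attention_kind(layer_idx: int, softmax_every: int = 8) -> str:
    """Return which attention type a given transformer block uses.

    Per the MiniMax-M1 report, every eighth block uses standard softmax
    attention; the seven blocks before it use lightning (linear) attention.
    """
    return "softmax" if (layer_idx + 1) % softmax_every == 0 else "lightning"

# First 16 blocks of the schedule: seven "lightning", one "softmax", repeated.
print([attention_kind(i) for i in range(16)])
```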