Tech Insights 2025 Week 7

    Let’s say you need to identify products in a store using AI. Legacy methods rely on manually labeled training data, take months to adapt to new product packaging or layouts, and often fall below 95% accuracy in cluttered, low-light store conditions. These systems frequently fail when products are turned sideways, partially obscured, or placed in unpredictable merchandising configurations. Despite recent advances in convolutional networks and deep learning, these issues have remained, until now.

    Last week Andrew Ng introduced Agentic Object Detection, a groundbreaking approach that allows AI systems to identify objects in images using simple text prompts like “unripe strawberries” or “Kellogg’s branded cereal”. This represents a significant shift from traditional object detection methods because it eliminates the need for labeled training data. The framework uses chain-of-thought reasoning, just like recent models from Google, OpenAI and DeepSeek, and it offers perhaps the best glimpse yet of a future where AI models can perform virtually any task humans can do, but better.

    Many of you have heard me say that most software code will be written by AI sooner than most people believe is possible, and last week Sam Altman reinforced this in an interview: (1) o3 currently ranks as the #175 competitive programmer in the world, (2) OpenAI has a new internal model ranked around #50 in the world, and (3) OpenAI might hit #1 by the end of the year. If that holds, an AI model will be the top-ranked competitive programmer in the world within 12 months, and the models will only get better from there. Much like Tim O’Reilly wrote in his post on February 4, I don’t believe software developers will be out of a job in a year; instead, their daily tasks will change completely. Work will be more fun, more rewarding and more focused on problem solving.

    Finally, Andrej Karpathy, founding member of OpenAI and former Director of AI at Tesla, just released a comprehensive 3.5-hour video course called ‘Deep Dive into LLMs like ChatGPT’, explaining the inner workings of Large Language Models. The course covers the complete LLM lifecycle, from data collection to deployment, including tokenization, neural network architecture, inference, and advanced topics like reinforcement learning from human feedback (RLHF). If you are interested in how LLMs work on the inside, I really recommend it! The course is designed for a general audience and requires no programming experience.

    Thank you for being a Tech Insights subscriber!

    THIS WEEK’S NEWS:

    1. Andrew Ng Unveils Agentic Object Detection – Requiring Zero Labeling
    2. OpenAI Launches “Deep Research” Agent for Automated Analysis
    3. GitHub Copilot Launches “Agent Mode”
    4. Google Launches Gemini 2.0
    5. OmniHuman-1: Generate Realistic Human Videos from a Single Photo
    6. Pika Labs Launches Pikadditions to Add Anyone or Anything to Videos
    7. Mistral AI Upgrades Le Chat with Web Search, Better Performance and Mobile Apps
    8. Lyft Partners with Anthropic, Reducing Customer Service Time by 87%
    9. OpenAI Makes Chain-of-Thought Visible in o3-mini Models
    10. Google Drops AI Weapons Ban

    Andrew Ng Unveils Agentic Object Detection – Requiring Zero Labeling

    https://landing.ai/agentic-object-detection

    The News:

    • Andrew Ng announced Agentic Object Detection, a groundbreaking technology that can identify objects in images using text prompts without requiring labeled training data.
    • The system can detect specific objects based on natural language descriptions, such as “unripe strawberries” or “Kellogg’s branded cereal,” making it highly versatile for a wide range of applications.
    • The technology is now available for public testing through Landing AI’s website, allowing users to explore its capabilities through an interactive demo platform.
    • The system combines various AI techniques including memory, planning, environmental sensing, and built-in safety guidelines to autonomously carry out object detection tasks.

    My take: This is a significant breakthrough in computer vision because it eliminates the need for extensive labeled datasets, which have traditionally been a major bottleneck in AI development. For businesses, this means faster deployment of AI vision systems with reduced development costs and broader applications across industries. I believe this technology has the potential to revolutionize everything from retail inventory management to quality control in manufacturing, making advanced computer vision more accessible to organizations of all sizes. If you have 2 minutes, check out their YouTube launch video. This is truly next-level image recognition.
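
    For developers, the interesting part is the interface: one image plus a plain-text prompt, no labels and no training run. Here is a minimal sketch of what such a request could look like over HTTP; the endpoint URL, field names and response keys are my assumptions for illustration, not Landing AI’s documented API.

    ```python
    # Hypothetical sketch of a prompt-based object detection request.
    # Endpoint, request fields and response shape are assumptions; check
    # Landing AI's documentation for the real interface before using this.
    import requests

    API_URL = "https://api.landing.ai/v1/agentic-object-detection"  # assumed endpoint
    API_KEY = "YOUR_API_KEY"

    with open("store_shelf.jpg", "rb") as image_file:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"image": image_file},
            data={"prompt": "Kellogg's branded cereal"},  # plain-text query, no labeled data
            timeout=60,
        )

    response.raise_for_status()
    for detection in response.json().get("detections", []):  # assumed response keys
        print(detection["label"], detection["score"], detection["bounding_box"])
    ```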

    OpenAI Launches “Deep Research” Agent for Automated Analysis

    https://openai.com/index/introducing-deep-research

    The News:

    • Last week OpenAI launched Deep Research, a new ChatGPT feature that autonomously conducts complex research tasks, analyzing hundreds of online sources and producing comprehensive reports in 5-30 minutes, completing work that would typically take several hours or days.
    • The tool is powered by OpenAI’s upcoming full o3 model, specifically optimized for web browsing and data analysis. It can process text, images, and PDFs while adapting its research approach in real-time.
    • Deep Research achieved a record-breaking 26.6% score on “Humanity’s Last Exam,” significantly outperforming GPT-4o’s previous score of 3.3%.
    • Initially available to ChatGPT Pro users ($200/month) with 100 queries per month, with planned expansion to Plus, Team, and Enterprise users. Mobile and desktop app integration expected by end of February 2025.
    • Deep Research primarily targets professionals in finance, science, policy, and engineering, but also serves consumers seeking detailed product recommendations for purchases like cars and appliances.

    My take: If you are currently working in a traditional market research or business intelligence firm, you should go buy Deep Research right now. Learn what it can do, learn what it cannot do, and expect it to be exponentially better within the next year. Based on my own tests and discussions with experts in many fields, Deep Research is still not as good as human experts. So right now it does not really matter that the market research report takes 30 minutes instead of 3 days if you cannot use it for your strategic roadmap. But it is getting there. I am willing to bet that Deep Research will surpass any market research company or researcher within 12 months in terms of report quality, and that it will be able to do this work in any area you can think of, in minutes instead of days or weeks. It’s very hard to imagine what the actual effect will be once all companies have access to this technology, but it will shake things up a lot.

    GitHub Copilot Launches “Agent Mode”

    https://github.blog/news-insights/product-news/github-copilot-the-agent-awakens/

    The News:

    • GitHub just announced a major upgrade to Copilot, transforming it from a pair programmer to an autonomous coding assistant with the introduction of Agent Mode.
    • Agent Mode enables Copilot to iterate on its own code, recognize errors, and fix them automatically. It can suggest terminal commands and analyze runtime errors with self-healing capabilities.
    • GitHub also announced that Copilot Edits is now generally available, allowing developers to make changes across multiple files using natural language prompts.
    • In addition to agent mode, a new Vision feature for Copilot enables developers to generate code from images, screenshots, or diagrams, making it easier to implement visual changes.
    • GitHub also teased Project Padawan, an autonomous software engineering agent that will allow developers to assign issues directly to Copilot and have it produce fully-tested pull requests.
    • Finally, Copilot expanded its model offerings by adding Google’s Gemini 2.0 Flash and OpenAI’s o3-mini to the model picker for all Copilot users.

    My take: Feature-wise, “on paper”, GitHub Copilot seems to be catching up well with Cursor. But start a project in both environments, send in the same prompts, use Claude as the back end, and wow, what a difference. Not only does Claude in Cursor generate better code than Claude in Copilot, but the speed difference is night and day. Both code generation and merging are much slower in Copilot, and I find that it still often makes errors when merging. My advice is simple: if you want the best AI coding environment today, it’s still Cursor + Claude by a big margin. Don’t get stuck on benchmarks and features.

    Google Launches Gemini 2.0

    https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025

    The News:

    • Google released Gemini 2.0, its most advanced AI model family to date, featuring multiple variants including Gemini 2.0 Flash, 2.0 Pro Experimental, and 2.0 Flash-Lite.
    • Gemini 2.0 Flash, now generally available, processes requests twice as fast as its predecessor and features native tool integration with Google Search, Maps, and third-party applications.
    • The new Pro Experimental version excels at coding tasks and complex prompts, featuring a massive 2-million token context window capable of processing approximately 1.5 million words at once.
    • Multimodal capabilities allow the model to generate and process text, images, and multilingual audio, including steerable text-to-speech in eight distinct voices.
    • Enhanced “agentic” capabilities enable the AI to anticipate needs, plan multi-step actions, and take initiative under user supervision.

    My take: In all benchmarks Gemini 2.0 is the clear leader, but in my own (limited) testing and from what I could read on multiple forums, the “good old” Claude 3.5 Sonnet still writes better code than Google’s latest Gemini 2.0. Anthropic did something with Claude in terms of coding quality that no one else has so far been able to replicate. It has a consistency in the generated code that I have not experienced in any other engine. So who is this release for? Every company using Google Workspace will probably use Gemini 2.0 as their main model. Most Microsoft companies will probably continue with a mix of Copilot, ChatGPT and Librechat + GPT-4o. And most end users will probably stick to ChatGPT. I do however wish more companies kept their options open, especially when it comes to AI-assisted coding; the combination of Cursor + Claude is still completely unlike any other solution on the market.
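
    If you want to evaluate Gemini 2.0 Flash yourself outside of Workspace, a minimal quick-start with the google-generativeai Python SDK could look roughly like the sketch below (you need an API key from Google AI Studio; the prompt is just an example):

    ```python
    # Minimal sketch: calling Gemini 2.0 Flash through the google-generativeai
    # Python SDK (pip install google-generativeai). The model name matches
    # Google's announcement; treat the rest as a generic quick-start pattern.
    import os
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")

    response = model.generate_content(
        "Summarize the differences between Gemini 2.0 Flash, Flash-Lite and Pro Experimental."
    )
    print(response.text)
    ```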

    OmniHuman-1: Generate Realistic Human Videos from a Single Photo

    https://omnihuman-lab.github.io

    The News:

    • ByteDance (owner of TikTok) released OmniHuman-1, an AI model that creates highly realistic videos of people talking, singing, and moving from just a single photo and motion signals.
    • The model supports any aspect ratio (portrait, half-body, or full-body) and can generate natural movements, gestures, and precise lip-syncing.
    • Trained on over 18,000 hours of human-related data, OmniHuman-1 processes multiple input types including text, audio, and body movements to create natural-looking videos.
    • The technology has been demonstrated through several examples, including videos of Albert Einstein delivering speeches and clips showing challenging body poses and hand gestures.
    • The model is currently in research phase and not yet available to the public.

    My take: If you have 5 minutes to spare today, you should definitely go check out all the examples on https://omnihuman-lab.github.io/. This is truly next-level video generation, powered by the recent improvements in compute power we now have available on a massive scale. Really amazing work done by ByteDance.

    Pika Labs Launches Pikadditions to Add Anyone or Anything to Videos

    https://pikadditions.org

    The News:

    • Pikadditions is a new AI-powered video editing tool that lets users integrate any object, person, or character into existing videos while preserving the original audio and video quality.
    • The feature works across all subscription tiers, with Basic (free) users receiving 15 free generations per month.
    • Users can enhance videos in two ways: adding new elements to their own footage or inserting themselves into iconic clips like movies or historical moments.
    • Pikadditions requires just three steps: uploading a video (minimum 5 seconds), selecting an object or character to add, and providing a text prompt describing the desired integration.
    • Pikadditions is available for multiple Pika models (1.5, 2.0, 2.1, and Turbo) and supports both iOS and Android devices.

    My take: Pika has been on a roll recently. In Tech Insights Week 52 I wrote about Pika 2.0, where you could mix yourself into AI-generated videos, and now with Pikadditions you can mix real-life footage with AI-generated footage with quite good results! If you have a video that you want to spice up with something extra, I would definitely recommend giving Pikadditions a try!

    Mistral AI Upgrades Le Chat with Web Search, Better Performance and Mobile Apps

    https://mistral.ai/en/news/all-new-le-chat

    The News:

    • Mistral AI just released a major update to Le Chat, providing faster processing speeds and lots of new features.
    • Mistral also introduced a new Pro plan for $14.99, a Team plan for $24.99 and an Enterprise plan that is currently in private preview.
    • Enterprise customers gain unique deployment flexibility with options for on-premise installation and custom model implementation.
    • The main new feature is Flash Answers, which processes text at an impressive speed of up to 1,000 words per second, making responses much faster than anything else on the market.
    • Document handling has been enhanced with improved OCR and vision models, giving more accurate processing of PDFs, spreadsheets, and images. Le Chat can now also search the web for answers.
    • Le Chat now includes a built-in Code Interpreter for executing code in a sandboxed environment, enabling real-time data analysis and visualization.
    • Image generation capabilities have been upgraded through integration with Black Forest Labs Flux Ultra.
    • Finally, Mistral is rolling out a personal Memory feature that helps the assistant learn user preferences for personalized recommendations and tracking progress.

    My take: Mistral keeps churning out releases, and it will be interesting to see whether people will pay for the monthly subscription or switch to another platform. Getting users is easy when your product costs nothing, but Mistral is now giving many more features to Pro users, and while Le Chat is cheaper and has almost caught up with ChatGPT on features, it still lacks ChatGPT’s amazing Voice Mode and excellent desktop apps. Right now I think the main selling points of Le Chat are its price, its speed and its flexible deployment options for enterprise users. Together these form an interesting package, especially for agentic use.
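
    Le Chat itself is a consumer app, but if you want to evaluate Mistral’s models programmatically, a minimal streaming sketch with the mistralai Python SDK (v1.x) could look like the code below; the model name is an assumption, so check Mistral’s docs for current identifiers:

    ```python
    # Minimal sketch: streaming a response from Mistral's API with the mistralai
    # Python SDK (pip install mistralai, v1.x). Flash Answers is a Le Chat product
    # feature, not this API; the model name below is an assumption.
    import os
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    stream = client.chat.stream(
        model="mistral-large-latest",  # assumed model identifier
        messages=[{"role": "user", "content": "Give three uses for a sandboxed code interpreter."}],
    )

    for event in stream:
        delta = event.data.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()
    ```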

    Lyft Partners with Anthropic, Reducing Customer Service Time by 87%

    https://www.anthropic.com/news/lyft-announcement

    The News:

    • Lyft has partnered with Anthropic to integrate Claude AI across its platform, starting with customer service for its 40 million riders and 1 million drivers. The AI assistant has already reduced customer service resolution time by 87%.
    • The customer care AI assistant, powered by Claude via Amazon Bedrock, handles thousands of daily inquiries and seamlessly transitions complex cases to human specialists.
    • Lyft’s engineering team also generates up to 25% of code using AI technologies.
    • The partnership includes early access to Anthropic’s latest AI models and technologies, with Lyft participating in research testing to ensure solutions align with driver and rider needs.

    My take: Customer service interaction is definitely the low-hanging fruit when it comes to quick and easy AI integration in companies. An 87% reduction in resolution time is the best I have seen so far in large organizations. Generating 25% of code with AI is around the standard for current tab-complete solutions, but GitHub Copilot just got advanced features like agents and better AI code merging, so I am quite sure this figure will grow rapidly across all companies using Copilot over the next few months.

    OpenAI Makes Chain-of-Thought Visible in o3-mini Models

    https://twitter.com/OpenAI/status/1887616278661112259

    The News:

    • OpenAI just updated its o3-mini models to display the AI’s reasoning process, making the model’s “chain of thought” visible to both free and paid users.
    • The feature, however, includes a post-processing step where the model reviews its raw chain of thought, removes unsafe content, simplifies complex ideas, and translates the reasoning into the user’s native language.
    • While not revealing the complete reasoning steps like competitor DeepSeek-R1, OpenAI has found a middle ground where o3-mini can “think freely” and then organize its thoughts into detailed summaries.

    My take: One of the most fun things I have done with a computer recently is asking DeepSeek difficult questions and then following along as it reasons with itself. If anything, this gives you a glimpse of a future where machines appear to have a mind of their own, thinking and reasoning with themselves before answering. It’s both fascinating and scary at the same time. So while I’m glad OpenAI added this to their o3-mini models, I don’t like the way they filter the output before it’s presented to the user. I would have loved to see the unfiltered internal reasoning, as with DeepSeek, which actually curses when it discovers it has gone in the wrong direction. DeepSeek’s internal monologue is amazing to watch; the output from o3-mini is not.

    Google Drops AI Weapons Ban

    https://blog.google/technology/ai/responsible-ai-2024-report-ongoing-work

    The News:

    • Google removed its previous pledge not to use artificial intelligence for weapons development and surveillance from its AI ethics guidelines, marking a significant departure from its 2018 commitment.
    • The company’s updated policy now emphasizes “responsible” AI development aligned with “international law and human rights” instead of specific prohibitions against harmful applications.
    • Google DeepMind chief Demis Hassabis and SVP James Manyika stated that “democracies should lead in AI development” and companies should work together to create AI that “supports national security”.
    • The timing coincides with recent political changes: the announcement came shortly after Google CEO Sundar Pichai attended President Donald Trump’s inauguration, and Trump subsequently rescinded previous AI safety executive orders.

    My take: This is a fundamental change in how Big Tech companies approach AI ethics and military applications. The removal of explicit restrictions on weapons and surveillance development aligns Google with competitors who already work with military and intelligence agencies. It will be interesting to see where this leads, and whether other AI companies will follow.