Tech Insights 2025 Week 18

    Last week, Swedish startup Lovable launched version 2.0 with a new feature called “multiplayer workspaces” that lets you vibe code apps together. Their motto is “idea to app in seconds, with your personal full stack engineer”. What started as a prototype called “GPT Engineer” is now a product used by over 300 000 monthly active users, and Lovable is still run by a small team of fewer than 20 people.

    Behind the scenes Lovable does not use its own LLMs, but relies on SOTA models like GPT-4o, Claude and Gemini. Lovable is a layer between the user and the model, with unique functions such as shared prompting using Workspaces, clear separation of edits and analysis (using the new Chat mode), and a nice user interface for quickly editing web page properties using Visual Edits. You can even ask Lovable to deploy the web pages it creates using custom domain names. Ultimately, though, it’s still Claude, Gemini or GPT-4o writing the code, which means that if you as a user have no clue how to instruct the LLM properly, the results you get will vary greatly.

    Lovable has a guide called Prompting 1.1 where they encourage users to be very specific in their prompts, for example by explicitly stating which UI libraries the model should use: “Create a responsive navigation bar using the shadcn/ui library with Tailwind CSS for styling” (example from Prompting 1.1). If I were to guess, the typical Lovable user has no clue whatsoever what shadcn/ui is, and that gap greatly affects what people get out of Lovable when prompting.

    If you are building with AI, or plan to get started, I strongly recommend that you spend one hour going through these two resources. The first is a small guide posted by Andrej Karpathy last week on how he designs his prompt chains for AI-based development, based on 6 months of hands-on experience. If you do not understand what he writes, ask ChatGPT to explain it to you, for example how the excellent files-to-prompt tool works (see the sketch below for the basic idea). Once you have read through that one, go watch Andrej Ondrej’s truly amazing video “I spent 400+ hours in Cursor, here’s what I learned” on YouTube. After that well-spent hour you will know everything you need to know about how to use LLM context efficiently, how to define and set up rules files for LLMs that are included in each prompt, how to ask LLMs to always come up with two solutions before you pick the best one, and much more.
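    To make the context part concrete, here is a minimal Python sketch of what a files-to-prompt-style tool does: walk a project directory, keep the relevant source files, and concatenate them into one block of text you can paste into an LLM prompt. This is not the actual tool, just an illustration of the idea; the file extensions, the tag format and the token estimate are my own assumptions.

```python
from pathlib import Path

# Illustrative sketch of a files-to-prompt-style context packer (not the real
# tool): walk a project, keep the file types you care about, and wrap each
# file in a tagged block so the LLM can tell the files apart.

INCLUDE_SUFFIXES = {".py", ".ts", ".tsx", ".md"}            # adjust to your project
SKIP_DIRS = {".git", "node_modules", "dist", "__pycache__"}

def pack_project(root: str) -> str:
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_dir() or path.suffix not in INCLUDE_SUFFIXES:
            continue
        if any(part in SKIP_DIRS for part in path.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        chunks.append(f'<file path="{path}">\n{text}\n</file>')
    return "\n\n".join(chunks)

if __name__ == "__main__":
    context = pack_project("./my-app")   # hypothetical project folder
    prompt = (
        "You are my senior reviewer. Read the project files below, then "
        "propose two alternative solutions before picking the best one.\n\n"
        + context
    )
    # Rough size check: ~4 characters per token is a common rule of thumb.
    print(f"Prompt is roughly {len(prompt) // 4} tokens")
```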

    Creating applications with prompting is super efficient and fun if you know what you are doing. Google now produces over 30% of all its code with AI, and that figure is increasing rapidly. But do not expect the LLMs to make you a skilled developer if you know nothing about programming; that is not how they work. The same goes for Lovable: do not expect it to build a complex full-stack application that is ready for production if you know nothing about development. You need to know both how to develop software efficiently and how to prompt efficiently to get the full leverage from AI-based development.

    Thank you for being a Tech Insights subscriber!

    THIS WEEK’S NEWS:

    1. Lovable 2.0 Brings Multiplayer Workspaces and Enhanced AI to App Development
    2. ByteDance’s UI-TARS-1.5 Sets New Standards for AI GUI Interaction
    3. OpenAI Releases GPT-Image-1 API for Developers
    4. NVIDIA Releases DAM-3B: Region-Based Image and Video Captioning Model
    5. Tencent Releases Hunyuan 3D 2.5 with 10x Parameter Increase and Ultra-HD Modeling
    6. Adobe Expands Firefly AI Platform with New Models and Third-Party Integrations
    7. Google DeepMind Launches Lyria 2 Music Generation Model
    8. Perplexity Launches Voice Assistant for iPhone Users
    9. Grok 3 and Grok 3 Mini Now Available via API

    Lovable 2.0 Brings Multiplayer Workspaces and Enhanced AI to App Development

    https://lovable.dev/blog/lovable-2-0

    The News:

    • Lovable has released version 2.0 of its AI-powered platform that lets users build apps and websites through conversation with AI, adding collaboration features and enhanced intelligence.
    • The update introduces multiplayer workspaces where Pro subscribers can invite up to 2 collaborators per project, while Teams subscribers can have up to 20 users sharing a workspace and credit pool.
    • A new Chat Mode Agent provides improved reasoning capabilities without making direct code edits, allowing users to ask questions, plan projects, and debug by searching files, inspecting logs, and querying databases.
    • Security Scan functionality automatically detects vulnerabilities before publishing, particularly for apps connected with Supabase, addressing a common concern about AI-generated code security.
    • Additional features include a refreshed brand identity with a cleaner UI, Dev Mode for direct code editing, enhanced Visual Edits for style customization, and built-in custom domain integration that has already connected over 10,000 domains.

    What you might have missed: Feedback from users on the 2.0 release has so far been mostly negative. There are multiple Reddit threads like “Lovable I love you, but what the hell did you guys do 😔” with users chiming in saying “Yeah, they’ve really messed things up royally. Nothing works any more”, and “Same here. It creates a lot of errors and charges lot of credits to fix its own problems. I’ve not seen it yet as an improvement”. My own guess is that Lovable switched LLMs in 2.0, and that, much like the Cursor team when it upgraded from Claude 3.5 to 3.7, they need to adjust a lot of things before everything runs smoothly again.

    My take: We still do not have state-of-the-art LLMs that are good enough to “let loose” to build software programs by themselves. This is, however, how Lovable presents its platform: as “your personal full stack engineer”. But no matter how many design patterns they try to stick in front of the LLMs, it’s still a matter of luck whether you end up with something usable, and you have to be even luckier to get something that is actually maintainable as well. To get good results with LLMs you need to be very explicit about what they should do, and approach solutions in small steps. Every week I try 100% “vibe coding” with tools like OpenAI Codex, Claude Code and Cursor, but every time I let them loose in Agent mode they all end up making poor design decisions and ugly hacks to work around issues instead of finding the root cause. A typical prompt trying to fix an issue, such as “I have a note with the text here is some text and it seems to have problems parsing the em tags”, more often than not ends up with something like “if text == ‘here is some text’” hard-coded as the fix. Lovable actually has a great prompting guide where they encourage users to be extremely specific about what they want the LLMs to do, but it kind of goes against their sales pitch of Lovable as your “personal full stack engineer”. To use Lovable properly according to their own prompting guide, you actually need to be an experienced full stack engineer yourself to get good results. Still, for quick prototypes and web pages Lovable seems to work very well, so if you are a web designer and have not yet checked out Lovable, now would probably be a very good time to do so.

    Read more:

    ByteDance’s UI-TARS-1.5 Sets New Standards for AI GUI Interaction

    https://github.com/bytedance/UI-TARS

    The News:

    • ByteDance has released UI-TARS-1.5, an open-source multimodal agent that enables AI to interact with graphical user interfaces across desktop, web, and gaming environments.
    • The model is built on Qwen2.5-VL-7B and processes screenshots at 1120×1120 resolution, achieving less than 5 pixel coordinate error when interacting with UI elements.
    • UI-TARS-1.5 achieves 61.6% accuracy on the ScreenSpot-Pro benchmark, significantly outperforming Claude-3 (27.7%) and GPT-4o (41.2%).
    • The agent’s “think-before-act” approach enables complex reasoning before taking action, reducing errors by 38% in Minecraft navigation tasks compared to direct action prediction.
    • UI-TARS-1.5 achieves perfect scores in 14 Poki.com mini-games 2.4 times faster than human players, including Infinity-Loop and Maze: Path of Light.

    What you might have missed: UI-TARS stands for “User Interface – Text-based Autonomous Recourse System”.

    My take: Now this is very quickly getting interesting. LLMs do not have to be better than most humans at navigating a computer, they just need to be good enough at specific tasks. Then they suddenly start becoming very useful. Instead of having to write APIs to access information, or write crawlers to search the web, you can now just let UI-TARS-1.5 loose and have it do its thing while you sleep. UI-TARS-1.5 is open source, and ByteDance released it for free on the official Hugging Face repository. If you have previously been testing computer-use or browser-use with LLMs with mixed results, you should definitely boot up your old repo again and give it a new try with UI-TARS-1.5.
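    If you want a feel for how you would drive it, here is a rough Python sketch using the standard Hugging Face transformers stack. Everything model-specific is an assumption: the repo id “ByteDance-Seed/UI-TARS-1.5-7B”, the Qwen2.5-VL loading path (the model is built on Qwen2.5-VL-7B), and the prompt are illustrative only; the official repository documents the exact prompt format and action schema the agent expects.

```python
# Rough sketch: asking UI-TARS-1.5 to ground a UI element in a screenshot.
# Assumptions to verify against the official repo: the Hugging Face repo id,
# and that the model loads through the standard Qwen2.5-VL classes since it is
# built on Qwen2.5-VL-7B. The real agent loop also parses the model's action
# output and executes clicks/keystrokes, which is omitted here.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "ByteDance-Seed/UI-TARS-1.5-7B"   # assumed repo id -- double-check

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

screenshot = Image.open("screenshot.png")    # your own desktop/browser capture

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Find the 'Sign in' button and return the coordinates to click."},
    ],
}]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[screenshot], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```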

    OpenAI Releases GPT-Image-1 API for Developers

    https://openai.com/index/image-generation-api

    The News:

    • OpenAI has made its viral image generation technology, GPT-image-1, available to developers through its API, allowing third-party platforms to integrate advanced image creation capabilities into their applications and services.
    • The model gained massive popularity in March 2025, with over 130 million users creating more than 700 million images in just one week after its initial release in ChatGPT.
    • GPT-image-1 is a natively multimodal model that can process both text and images, create art in various styles, follow specific prompts, generate readable text within images, and support batch creation.
    • Pricing is $5 per million tokens for text input, $10 per million tokens for image input, and $40 per million tokens for output, translating to approximately 2-19 cents per image depending on quality.
    • Major companies including Adobe, Airtable, Figma, Gamma, Canva, GoDaddy, and HubSpot have already integrated the model into their platforms for enhanced design capabilities, marketing content creation, and educational materials.

    My take: There are so many things you can do with GPT-image-1! I have already seen users developing Chrome extensions for virtual dressing rooms (upload an image of yourself and see how the product you are looking at would look on you) and numerous other examples, and I think we will see lots of innovation as a result of OpenAI opening up GPT-image-1 to the general public through the API. The fact that Adobe, Figma, Gamma and Canva have already integrated it into their platforms speaks volumes.
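    Getting started is straightforward if you already use the OpenAI Python SDK. Below is a minimal, hedged sketch of an image generation call; the prompt and file name are just examples, and you should check OpenAI’s image API documentation for the currently supported size and quality values (there is also an edits endpoint for the try-it-on style use cases mentioned above).

```python
# Minimal sketch of generating an image with GPT-image-1 through the OpenAI
# Python SDK (pip install openai, OPENAI_API_KEY set in your environment).
# The size/quality values below are assumptions -- check the current API docs.
import base64
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1",
    prompt="A flat-lay product photo of a leather backpack on a white background",
    size="1024x1024",
    quality="medium",    # quality drives token usage, i.e. the 2-19 cents per image
)

# GPT-image-1 returns the image as base64-encoded data
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("backpack.png", "wb") as f:
    f.write(image_bytes)
```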

    Read more:

    NVIDIA Releases DAM-3B: Region-Based Image and Video Captioning Model

    https://huggingface.co/nvidia/DAM-3B

    The News:

    • NVIDIA has released Describe Anything Model 3B (DAM-3B), a multimodal large language model that generates detailed descriptions of user-specified regions within images and videos.
    • The model accepts inputs in various forms including points, boxes, scribbles, or masks to specify regions of interest, then generates contextually rich descriptions that integrate both full-image context and fine-grained local details.
    • The 3 billion parameter model is built on ViT and Llama architectures, with NVIDIA also releasing a video-specific version called DAM-3B-Video.
    • NVIDIA has open-sourced the code, model weights, dataset, and a new evaluation benchmark, though the model is restricted to research and non-commercial use only.

    My take: The goal of this project is “detailed localized captioning”, and the way it works is that you select any object in an image or a video clip (very much like Meta’s Segment Anything), and DAM-3B then gives you a detailed description of just that object. If it is a moving object, it describes how it moves in the video sequence. There are so many options for this technology – take any image, break it down into logical components (segments) and then describe the segments. Then let an LLM decide which segments to work with and send those segments into something like the new GPT-image-1 mentioned above. Amazing!
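    To make that workflow concrete without guessing at NVIDIA’s own API, here is a hedged Python sketch of the general idea (crop a region, then ask a vision model to describe just that region) using the OpenAI SDK rather than DAM-3B itself. The bounding box, file name and prompt are made up for illustration; DAM-3B additionally conditions on the full-image context and supports points, scribbles and masks, which this sketch does not.

```python
# Hedged sketch of region-based captioning in general -- NOT the DAM-3B API.
# Crop a user-specified box from an image and ask a vision-capable model to
# describe only that region. Box coordinates and prompt are illustrative.
import base64
from io import BytesIO
from PIL import Image
from openai import OpenAI

client = OpenAI()

image = Image.open("street_scene.jpg")          # hypothetical input image
region = image.crop((120, 80, 420, 360))        # (left, upper, right, lower) box

buffer = BytesIO()
region.save(buffer, format="PNG")
region_b64 = base64.b64encode(buffer.getvalue()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this object in detail: appearance, texture, and any notable parts."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{region_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```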

    Read more:

    Tencent Releases Hunyuan 3D 2.5 with 10x Parameter Increase and Ultra-HD Modeling

    https://twitter.com/TencentHunyuan/status/1915026828013850791

    The News:

    • Tencent has officially released version 2.5 of its Hunyuan 3D generation model, increasing its number of parameters from 1 billion to 10 billion.
    • The new model achieves ultra-high-definition geometric detail modeling with an effective geometric resolution of 1024, delivering smoother surfaces, sharper edges, and richer details in generated 3D models.
    • Hunyuan 3D v2.5 supports 4K high-definition textures and fine-grained bump mapping, and is the first to achieve multi-view input generation of PBR (Physically Based Rendering) models for more realistic lighting and reflection effects.
    • The update optimizes the skeletal skinning system with automatic bone binding and skinning weight assignment under non-standard poses.
    • Tencent has doubled the free generation quota to 20 times per day and officially launched the Hunyuan 3D generation API on Tencent Cloud for enterprises and developers.

    My take: This is by far the highest-quality 3D generator I have seen so far. If you have a minute, go check their launch video – it’s crazy good. If you work with digital twins or game prototyping you should go try it out right now.

    Read more:

    Adobe Expands Firefly AI Platform with New Models and Third-Party Integrations

    https://blog.adobe.com/en/publish/2025/04/24/adobe-firefly-next-evolution-creative-ai-is-here

    The News:

    • Adobe launched a major update to its Firefly AI platform at MAX London, unifying tools for image, video, audio, and vector generation into a single platform that has already generated over 20 billion assets worldwide.
    • The update introduces two new image generation models: Firefly Image Model 4 for high-quality images with better definition and realism, and Image Model 4 Ultra for professional-grade results with superior detail, particularly for human portraits and complex scenes.
    • Adobe is also integrating third-party AI models into the Firefly app, including Google’s Imagen 3 and Veo 2, OpenAI’s GPT image generation, and Black Forest Labs’ Flux 1.1 Pro.
    • The company made its Firefly Video Model generally available, enabling users to generate video clips based on text prompts or images with control over camera angles, shot framing, and motion design at resolutions up to 1080p.
    • A new collaborative workspace called Firefly Boards is now in public beta, offering an AI-first moodboarding tool for visualizing ideas and exploring creative concepts.
    • Adobe will soon release a Firefly mobile app for iOS and Android devices, allowing users to generate high-quality images and videos on the go with seamless integration to Creative Cloud for cross-device workflow.

    My take: Compared to previous versions, Image Model 4 definitely looks much better. But it’s still very far from realistic. Every single portrait I have seen from Firefly Image Model 4 looks like a mixture between a painting and an actual photograph. Maybe it’s because I have taken photos with full-frame cameras for 20 years, but I have a hard time seeing where these fake-looking photos can be practically used. I’m genuinely curious: if you check their web page, what would you use Firefly Image Model 4 or Ultra for today?

    Read more:

    Google DeepMind Launches Lyria 2 Music Generation Model

    https://deepmind.google/discover/blog/music-ai-sandbox-now-with-new-features-and-broader-access

    The News:

    • Google DeepMind released Lyria 2, its latest music generation model, integrated into an upgraded Music AI Sandbox that helps musicians, producers, and songwriters create professional-grade audio across various genres.
    • The platform introduces three key features: “Create” for generating music from text or lyrics, “Extend” for continuing audio clips, and “Edit” for transforming audio moods or styles through text prompts.
    • Lyria 2 produces high-fidelity 48kHz stereo audio supporting different instruments and play styles, allowing granular creative control over key, BPM, and other musical characteristics.
    • A companion model called Lyria RealTime enables interactive, real-time music creation and manipulation, letting users blend genres and shape audio on the fly.
    • All music generated by Lyria 2 incorporates SynthID watermarking technology, embedding imperceptible digital watermarks to identify AI-generated content and prevent potential copyright disputes.

    My take: If you have ever spent some serious time trying to create music, you know how easy it is to get stuck for hours trying out new ideas for songs, or transforming a series of beats into a full production. There is still very little material published about Lyria 2; the only material I could find where they actually show someone working with it is the small video below. But the music examples on their web page sound very promising, with early users saying “It’s like an infinite sample library. It’s a totally new way for me to make my records”. I have already signed up for the early version, and if you are interested in music production you can apply as a tester here.

    Read more:

    Perplexity Launches Voice Assistant for iPhone Users

    https://twitter.com/perplexity_ai/status/1915064472391336071

    The News:

    • Perplexity has released its AI voice assistant for iOS devices, enabling iPhone users to perform tasks like setting reminders, sending emails, and making reservations through voice commands.
    • The assistant integrates with Apple’s native apps including Apple Mail, Calendar, Maps, Music, and Podcasts to execute commands directly within these applications.
    • Unlike Apple Intelligence, which requires newer iPhone models, Perplexity’s assistant works on older devices including the iPhone 13 mini.
    • Users can add shortcuts to the assistant on their home screen or lock screen for quick access, and all conversations are saved as “Threads” in the app.

    My take: If you have a minute, you can check out their launch video. This assistant is pretty nice – ask it about anything, and it will search the web and give you results. If you want to book a table it will send you to the right web page in your default browser, and if you need to send an email it will prefill one and open it in your default mail app. What I like about this assistant is that it does not try to do everything itself, but instead integrates nicely with the great built-in apps already on your phone. I use Perplexity a lot on my phone (multiple times every day), and I will definitely be using the new voice assistant even more.

    Read more:

    Grok 3 and Grok 3 Mini Now Available via API

    https://docs.x.ai/docs/overview

    The News:

    • xAI has released API access to Grok 3 and Grok 3 Mini, allowing developers to integrate these AI models into their applications without requiring an X Premium+ subscription.
    • Grok 3 features advanced reasoning capabilities, with its flagship model achieving 93.3% accuracy on the 2025 American Invitational Mathematics Examination and 84.6% on graduate-level expert reasoning tasks.
    • The lightweight Grok 3 Mini variant offers faster response times while maintaining strong performance, particularly for logic-based tasks that don’t require deep domain knowledge.
    • Both models are available through platforms like OpenRouter, Chatbase, and CometAPI with different pricing tiers – Grok 3 costs $15/million output tokens while Grok 3 Mini is significantly cheaper at $0.5/million output tokens.
    • “Fast” versions of both models are also available, using the same underlying architecture but running on optimized infrastructure for quicker response times at a premium price.

    My take: Grok seems to perform quite well on most benchmarks. I find myself switching between Claude, Gemini and GPT-4o almost daily, but I am not yet sure how I will fit Grok into all this. Have you used Grok 3 yourself, and what model or models does it replace for you? What is your motivation for choosing it instead of something like Claude, Gemini or GPT-4o?
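    If you want to give it a quick try without rewriting anything, the xAI API is OpenAI-compatible, so existing OpenAI-SDK code usually only needs a different base URL, API key and model name. The sketch below assumes the model id “grok-3-mini”; check https://docs.x.ai/docs/overview for the current model names and pricing.

```python
# Quick sketch for trying Grok 3 from existing OpenAI-SDK code: xAI exposes an
# OpenAI-compatible endpoint, so switching is mostly a matter of changing the
# base URL, API key and model name. The model id below is an assumption.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3-mini",  # assumed id; use the larger Grok 3 model for harder tasks
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Explain the difference between a mutex and a semaphore."},
    ],
)
print(response.choices[0].message.content)
```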