May 27, 2025

Claude 4 surpasses ChatGPT on agentic benchmarks

Anthropic has launched its latest AI models, Claude Opus 4 and Claude Sonnet 4, marking a significant advancement in AI’s ability to handle complex tasks with sustained focus and improved reasoning.

Evelyn Le

Strategic Product Lead

Claude 4: Anthropic’s Leap Forward in AI Capabilities

Figure: Benchmark table comparing Opus 4 and Sonnet 4 to other LLMs

The Claude 4 models significantly outperform ChatGPT (GPT-4.1) on agentic tasks, scoring 72.5–72.7% on SWE-bench compared with GPT-4.1’s 54.6%. They also lead in tool use and decision-making, especially in complex retail workflows (81.4% vs. 68.0%).

Key Features and Capabilities:

Claude Opus 4:
- Designed for complex challenges, Opus 4 can perform thousands of steps over extended periods without losing focus.
- “The world’s best coding model”: Excels in coding, reasoning, and document analysis, outperforming previous models in sustained performance.
- Introduces “extended thinking” with tool use, allowing the model to alternate between reasoning and utilizing tools like web search to enhance responses.
- Demonstrates improved memory capabilities, extracting and saving key facts to maintain continuity over time.

Claude Sonnet 4:
- An upgrade from Sonnet 3.7, offering superior coding and reasoning while responding more precisely to instructions.
- Balances performance and efficiency, making it suitable for a wide range of applications.

Claude 4 offers significant benefits, but during internal testing Claude Opus 4 exhibited concerning behavior in extreme scenarios, such as attempting to manipulate outcomes to avoid shutdown. Anthropic has implemented additional safety measures to mitigate such risks.

A prompt to try out Claude 4’s multi-step reasoning:

“You’re an AI consultant for a mid-sized logistics company planning to expand operations into Southeast Asia. Create a step-by-step strategic plan including market entry options, legal/regulatory considerations by country, competitive analysis, and AI tools that can improve supply chain efficiency in the region. Use external search tools where needed. Present the final output as an executive briefing.”

Flagship AI Newsletter

The AI Newsletter That Makes You Smarter, Not Busier

Join over 30,000 leaders and receive our insights on AI platforms, implementations, and organizational change management.

Your AI Team: Perplexity's Academic Homepage, Google’s AI Agents, and NotebookLM’s Video Overviews

Every week, I report on the top updates to your favorite AI tools. This week:

Perplexity launches Academic Homepage

Perplexity just introduced a new Academic Homepage, signaling its effort toward becoming a trusted tool for scientific research and higher education.

Here are the key updates:

- Academic Homepage: You can now explore scientific papers, peer-reviewed journals, and academic sources via a streamlined, dedicated interface.
- Curated Discovery: The page features hand-picked trending topics across fields like computer science, economics, and finance, making it easy to dive into emerging research areas.
- Suggested Questions: Perplexity helps users kickstart their research with pre-filled queries relevant to the field, ideal for students, educators, or lifelong learners.
- Sidebar Shortcut: Academic mode now lives in the sidebar of the web app for quick access.

This move sets Perplexity apart from general-purpose AI chatbots and brings it closer to academic tools like Google Scholar, with the added benefit of an AI assistant guiding the way.

Smart leaders in 2025 aren’t just “learning AI”, they’re automating half their workload.

What if you could:
→ Cut your writing, researching, or planning time in half?
→ Walk into client meetings with AI-prepared presentations and insights?
→ Get ChatGPT to be your thinking partner and give you 10x smarter answers?
→ Free up hours weekly with 10+ personal AI assistants?

That’s what happens in Lead with AI Executive Bootcamp.

We’ve helped 500+ leaders design AI-powered workflows that save hours and boost impact. Now it’s your turn.

You’ll leave with fully personalized AI assistants, tailored to your role, plus an AI Leader Certification to prove it.

Reserve your seat for the June 6 or July 11 cohort now – limited slots remaining!

👉 Join June 6 cohort

👉 Join July 11 cohort

Want to reach 30,000+ business leaders applying AI in their work, teams, and organizations?
Advertise with us.

Quick Hits from your favorite AI tools:

- Google integrates AI agents across Search and Gemini. Google’s AI Mode can now summarize web pages, complete tasks, and generate research reports. Google also introduced Project Mariner, which can handle up to 10 tasks at once.
- OpenAI upgrades Operator with the o3 model. OpenAI’s autonomous web agent, Operator, now uses the o3 model, enhancing its reasoning capabilities and performance in complex tasks.
- NotebookLM shows a preview of Video Overviews. Google’s NotebookLM now offers Video Overviews, allowing users to generate concise video summaries from their notes and sources.
- Google Meet launches real-time speech translation. Google Meet’s new feature provides near real-time translation of spoken language during meetings, preserving the speaker’s voice and tone, initially supporting English and Spanish.
- Gemini app receives major updates. The Gemini app now includes real-time AI video generation with Veo 3, enhanced Deep Research capabilities, and improved integration with Google services like Gmail and Docs.
- Microsoft’s Notepad can write new content using generative AI. You can now quickly draft text based on a prompt, or build upon existing content.

Read more news at the end.

Tutorial: Build a Custom GPT to Onboard New Employees

In 5 Steps: Build a Custom GPT to Onboard New Employees

Instead of digging through PDFs or waiting for someone in HR to reply, your new hires could simply ask a friendly AI assistant any onboarding question, anytime, anywhere. By turning your company handbook and internal docs into an interactive GPT, you’re not just saving your HR team hours of repetitive explaining; you’re giving every new employee a smoother, more confident start from day one.

New AI Tools to Try: LLM SEO Monitor, GoBuildMyApp, and GistStack

Looking for something fresh to add to your creative or business toolkit? These three AI tools caught my attention this week:

LLM SEO Monitor: AI Tracking for Your Search Rankings

LLM SEO Monitor tracks how AI answers your brand-related queries across platforms like ChatGPT, Perplexity, and Gemini. Think of it as SEO analytics for the age of generative AI.

👉 Monitor your brand in AI results

GoBuildMyApp: From Idea to App

GoBuildMyApp turns plain-English prompts into working mobile or web apps. No code and no drag-and-drop: just describe what you want, and watch it build.

👉 Build your app with GoBuildMyApp

GistStack: Never Run Out of Content Again

GistStack pulls from your favorite sources to instantly create on-brand, scroll-stopping social posts.

👉 Create content with GistStack

AI for Strategy, Responsible Adoption, and Prototyping: From the Community

Every day, Lead with AI PRO members discuss practical ways to benefit from AI in their work and organizations. This week's highlights include:

🗓️ Happening this Thursday, May 29: Join our member-led Canva AI Demo with Max Schumann, followed by Community Office Hours. Whether you're new to Canva AI or looking to level up, this is a great chance to see it in action and swap insights on the latest AI trends with fellow members. Inquire HERE.
Henrik Jarleskog shared that Claude just got a major upgrade, and it might be your most thoughtful AI partner yet. With this latest release, Claude can code smarter, think deeper, and co-create with remarkable flair. You can also choose the model that best fits your task. Check out the latest upgrade HERE.
Not sure which ChatGPT plan is right for you? Daan and the Lead with AI team put together a handy breakdown comparing them all. Check out Daan’s quick recap HERE.
Wyatt Barnett surfaced a great read in the community this week: “I stopped chasing AI tools and started building AI spaces.” The article explores how shifting from experimenting with endless tools to intentionally designing AI-first workflows can unlock deeper value. Read it HERE.

Don't want to miss more insights and conversations like these?

Then it's time to upgrade to PRO:

OpenAI and Jony Ive’s new project, AI Agents in the workforce, along with more crucial AI stories

Every day, Daan, Wendy, and I read all the AI news so that you don't have to.

Here are the must-read stories of the week:

Anthropic CEO claims AI models hallucinate less than humans

Dario Amodei argues that AI hallucinations should be seen in context, pointing out that humans often make more mistakes and emphasizing the need to compare AI to real-world human error, not perfection.

Details leak about Jony Ive’s new ‘screen-free’ OpenAI device

OpenAI and former Apple designer Jony Ive are reportedly working on a new AI hardware device, blending sleek consumer design with conversational AI as an alternative to screen-heavy tech.

Agentic AI Is Already Changing the Workforce

As agentic AI takes on more decision-making and task execution, companies must rethink roles, processes, and trust models for a future where AI acts more like a team member than a tool.
