Automating SDLC with LangChain, LangSmith, Gemini

Year: 2026 · ▶ Watch on YouTube

Stephanie Wong (Global Lead, Developer Programs) · Harrison Chase (Co-Founder and CEO)

Switch language → zh

Segments (6)

  • 00:00:00 · Introduction — Stephanie Wong
    • Introduction of Harrison Chase, CEO and Co-founder of LangChain, to discuss building applications with LLMs.
  • 00:00:20 · The Agent Harness Layer — Harrison Chase
    • An agent harness is the scaffold around an LLM that connects it to tools and the environment, and engineering this layer is often more effective than fine-tuning model weights.
  • 00:03:47 · Combining Open Source with Managed Infrastructure — Harrison Chase
    • Combining open-source frameworks like LangChain with managed runtimes like Google’s Reasoning Engine solves the major challenges of scaling, state management, and reliability when moving agents from prototype to production.
  • 00:05:48 · Improving Harness Code with Traces and Evals — Harrison Chase
    • Using traces and evals (both explicit and inferred from user feedback) is crucial for identifying when to optimize agent code, a process facilitated by tools like LangSmith.
  • 00:09:18 · How Foundational Model Capabilities Impact Harness Engineering — Harrison Chase
    • Advances in foundational models (e.g., long context, multimodality) simplify or change the nature of harness engineering, but the core need for observability and evaluation remains constant.
  • 00:11:29 · The Future: Meta-Harnesses and the ‘AI AI Engineer’ — Harrison Chase
    • The future involves a ‘meta-harness’ or an ‘AI AI Engineer’—an automated loop where agents analyze their own performance traces and use tools like Gemini Code Assist to rewrite and improve their own code.

Products Announced (4)

  • 00:03:54 · Reasoning Engine on Google Cloud (Discussed)
    • Secure, managed environment for deploying LangChain and LangGraph applications · Handles scaling, state management, and reliability for agentic workflows · Part of the Gemini Enterprise agent platform
    • Available on Google Cloud
  • 01:06:40 · LangSmith (Discussed)
    • Observability and tracing for LLM applications · Evaluation (Evals) framework for testing agent performance · Supports online evals and custom evaluators
    • Available from LangChain
  • 01:06:09 · Gemini Code Assist (Discussed)
    • AI-powered code assistance · Can be used to rewrite agent code as part of an automated improvement loop · Integrated across the SDLC
    • Available on Google Cloud
  • 01:08:48 · LangGraph (Discussed)
    • Library for building stateful, multi-actor applications with LLMs · Allows for creating more deterministic workflows and cycles · Used for building complex harnesses
    • Open Source

Competitor Mentions / Comparisons (3)

  • 00:02:37 · vs ChatGPT — Mentioned as a general-purpose baseline that a specialized agent, with its specific context and tools, differentiates itself from.
  • 00:11:46 · vs OpenAI — Mentioned in the context of researching how agent harnesses might need to be adapted for different foundational models (OpenAI vs. Anthropic vs. Google).
  • 00:11:47 · vs Anthropic — Mentioned in the context of researching how agent harnesses might need to be adapted for different foundational models (OpenAI vs. Anthropic vs. Google).

Benchmarks Shown (1)

  • 00:01:51 · Terminal-Bench: 5th place
    • Improved from 30th to 5th place by only tuning the DevAgents harness, with no changes to the underlying model.

Notable Quotes (4)

  • 00:01:39 — Harrison Chase:

    Changing that harness can be just as effective, and often times way easier, than changing the weights of the underlying model.

  • 00:11:29 — Harrison Chase:

    Everything in the SDLC is getting automated, and so is that like, turning of the flywheel.

  • 00:12:21 — Harrison Chase:

    We’re really creating this like, AI AI engineer.

  • 00:13:50 — Harrison Chase:

    You can’t really improve what you don’t know what happened, and that’s where observability comes in.

Visual Signals

On-screen (3)

  • 00:00:05 · Lower third: 'Stephanie Wong, Global Lead, Developer Programs, Google Cloud'
    • Identifies the host and her role.
  • 00:00:48 · Lower third: 'Harrison Chase, Co-Founder and CEO, LangChain'
    • Identifies the guest speaker and his role.
  • 00:20:37 · Google Cloud Next '26 logo
    • End card for the session video.

Stage (1)

  • 00:00:00 · The interview takes place in a studio setting at the Google Cloud Next ‘26 event, with two speakers sitting at a desk with microphones.

Key Topics

AI Agents · LangChain · Agent Harness · Harness Engineering · LLMs · Foundational Models · Observability · Evals (Evaluation) · LangSmith · LangGraph · Google Cloud · Reasoning Engine · Gemini · SDLC for AI · Meta-Harness

Takeaways

  • The ‘agent harness’—the scaffolding of prompts, tools, and memory around an LLM—is a critical layer for building effective AI agents, and engineering it can yield more performance gains than changing model weights.
  • Moving AI agents from prototype to production requires solving for reliability, state management, and scalability, which is where managed infrastructure like Google’s Reasoning Engine adds significant value to open-source frameworks like LangChain.
  • The development lifecycle for agents is an iterative loop: build the agent, observe its behavior with traces (e.g., in LangSmith), evaluate its performance, and then use those insights to improve the agent’s code or harness.
  • User feedback, both explicit (thumbs up/down) and implicit (corrective language), is a key signal for evaluating agent performance and can be automated with ‘online evals’.
  • The evolution of foundational models (e.g., longer context windows, multimodality) directly impacts harness design, often simplifying it but reinforcing the need for robust observability and evaluation.
  • The future of agent development points towards a ‘meta-harness’ or an ‘AI AI Engineer’—a self-improving system where an agent analyzes its own performance and automatically suggests or applies code changes to its own logic.
  • While models are becoming more capable, the core challenges of observability and evaluation are constant and essential for building reliable, production-grade agentic systems.