Uber LangEffect: Agent Side Effect Management

Key Takeaways

LangEffect: Uber’s internal framework for tracking and reversing agent-caused side effects
Problem: agents executing multi-step tasks create side effects that are hard to roll back atomically
Solution: effect log + compensating transactions pattern borrowed from distributed systems
Saga pattern applied to agent workflows: each action has a compensating undo action
Production result: 99.2% successful rollback rate for interrupted agent tasks at Uber

Summary

Uber’s infrastructure team developed LangEffect to solve a specific production problem: agents executing complex, multi-step tasks on production systems (rider matching, driver assignment, payment processing) would sometimes fail mid-task, leaving systems in inconsistent states. Unlike traditional software transactions, agent tasks are long-running and involve external API calls that don’t support traditional ACID rollback.

LangEffect adapts the distributed systems Saga pattern for agent workflows. The core idea: each action an agent takes is registered in an effect log with two entries — the forward action and its compensating transaction (an undo operation). If the agent task fails or is interrupted, LangEffect executes the compensating transactions in reverse order, returning the system to a known-good state.
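
The reverse-order compensation described above can be sketched in a few lines of Python. This is a minimal illustration of the Saga idea, not LangEffect's actual code: the names `Effect`, `EffectLog`, `run_task`, and the three example steps are all hypothetical, and a real system would persist the log rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Effect:
    """One log entry: a forward action paired with its compensating undo."""
    name: str
    forward: Callable[[], None]
    compensate: Callable[[], None]


@dataclass
class EffectLog:
    entries: List[Effect] = field(default_factory=list)

    def execute(self, effect: Effect) -> None:
        # Register first, then run. A forward action that fails partway is
        # therefore still compensated, so compensations must be safe to run
        # even when the forward action did not complete.
        self.entries.append(effect)
        effect.forward()

    def rollback(self) -> List[str]:
        # Undo in reverse order of registration (last in, first out).
        undone: List[str] = []
        while self.entries:
            effect = self.entries.pop()
            effect.compensate()
            undone.append(effect.name)
        return undone


def run_task(effects: List[Effect]) -> Tuple[bool, List[str]]:
    """Run all effects; on any failure, compensate everything logged so far."""
    log = EffectLog()
    try:
        for effect in effects:
            log.execute(effect)
    except Exception:
        return False, log.rollback()
    return True, []


def demo() -> Tuple[bool, List[str], Dict[str, object]]:
    # Hypothetical three-step task whose last step fails.
    state: Dict[str, object] = {"driver": None, "charged": False}

    def fail() -> None:
        raise RuntimeError("notification service unavailable")

    effects = [
        Effect("assign_driver",
               lambda: state.update(driver="d-17"),
               lambda: state.update(driver=None)),
        Effect("charge_rider",
               lambda: state.update(charged=True),
               lambda: state.update(charged=False)),
        Effect("notify_rider", fail, lambda: None),
    ]
    ok, undone = run_task(effects)
    return ok, undone, state
```

Calling `demo()` runs the first two steps, fails on the third, and then compensates all three in reverse order, leaving `state` as it was before the task started.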

Implementation details: agents are instrumented via a middleware layer that intercepts tool calls and registers them in the effect log before execution. Each compensating transaction falls into one of three categories: (1) pre-specified for known operations (cancel a payment → refund), (2) generated by the LLM for novel operations (with human review for high-value compensations), or (3) flagged as “non-compensable”, requiring human intervention.
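
A rough sketch of such a middleware layer follows. Everything here is an assumption made for illustration: the class `EffectMiddleware`, the `propose` callback standing in for the LLM, the `amount` argument, and the `review_threshold` used to decide what counts as "high-value" are hypothetical and not taken from LangEffect.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Optional

Args = Dict[str, Any]
Compensator = Callable[[Args], None]


class Kind(Enum):
    PRE_SPECIFIED = 1
    LLM_GENERATED = 2
    NON_COMPENSABLE = 3


class EffectMiddleware:
    def __init__(self,
                 known: Dict[str, Compensator],
                 propose: Callable[[str, Args], Optional[Compensator]],
                 review_threshold: float = 100.0) -> None:
        self.known = known                        # category 1: known operations
        self.propose = propose                    # category 2: stand-in for the LLM
        self.review_threshold = review_threshold  # assumed "high-value" cutoff
        self.log: List[Dict[str, Any]] = []

    def call(self, tool: str, fn: Callable[..., Any], args: Args) -> Any:
        if tool in self.known:
            kind, comp = Kind.PRE_SPECIFIED, self.known[tool]
        else:
            comp = self.propose(tool, args)
            kind = Kind.LLM_GENERATED if comp is not None else Kind.NON_COMPENSABLE
        # Route to a human when nothing can undo the call, or when a
        # generated compensation touches a high-value amount.
        needs_human = kind is Kind.NON_COMPENSABLE or (
            kind is Kind.LLM_GENERATED
            and args.get("amount", 0) >= self.review_threshold
        )
        # Register the call in the effect log BEFORE executing the tool.
        self.log.append({"tool": tool, "args": args, "kind": kind,
                         "compensate": comp, "needs_human": needs_human})
        return fn(**args)


def demo_middleware() -> EffectMiddleware:
    def fake_llm_propose(tool: str, args: Args) -> Optional[Compensator]:
        # Pretend the model can only invent an undo for "issue_credit".
        return (lambda a: None) if tool == "issue_credit" else None

    mw = EffectMiddleware(known={"charge_payment": lambda a: None},
                          propose=fake_llm_propose)
    mw.call("charge_payment", lambda amount: f"charged {amount}", {"amount": 20})
    mw.call("issue_credit", lambda amount: f"credited {amount}", {"amount": 500})
    mw.call("send_sms", lambda text: "sent", {"text": "hi"})
    return mw
```

In this sketch the three calls land in the three categories in order: the payment has a pre-specified compensation, the credit gets a generated one that is flagged for review because of its amount, and the SMS is logged as non-compensable.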

The 99.2% rollback success rate was measured on production incidents over a six-month period. The remaining 0.8% of cases were non-compensable operations, where external systems had already processed the agent’s actions beyond the point of reversal.

Relevant Concepts

Relevant Entities

Uber LangEffect
