On-Premise Infrastructure

Definition

On-premise infrastructure refers to the hardware, software, and networking components deployed within an enterprise’s own data centers for running AI agents, as opposed to using public cloud services.

Details

On-premise deployment is the gold standard for Chinese enterprises with strict security and compliance requirements. The infrastructure stack consists of:

Compute Layer

  • GPU clusters: NVIDIA A100/H100 or Huawei Ascend NPU
  • CPU servers: For orchestration and non-inference workloads
  • Storage: Distributed file systems (Ceph, GlusterFS) for model weights
  • Networking: 100Gbps InfiniBand for GPU-to-GPU communication

Model Serving Layer

  • Inference engines: vLLM, TensorRT-LLM, TGI (Text Generation Inference)
  • Load balancing: Distribute requests across GPU replicas
  • Caching: KV cache reuse (for example, prefix caching) for queries that repeat or share a prompt prefix
  • Batching: Dynamic batching to maximize GPU utilization
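
The dynamic batching item above can be illustrated with a minimal sketch. This is not how vLLM, TensorRT-LLM, or TGI implement batching internally; the `DynamicBatcher` class and its `max_batch_size` and `max_wait_s` parameters are hypothetical names chosen for illustration. The idea is to hold incoming requests until the batch is full or the oldest request has waited long enough, then release them together.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DynamicBatcher:
    """Toy batcher: flush when the batch is full or the oldest request is too old."""

    max_batch_size: int = 8    # hypothetical limit; depends on model and GPU memory
    max_wait_s: float = 0.05   # hypothetical latency budget for the oldest request
    _pending: list = field(default_factory=list)
    _oldest_ts: Optional[float] = None

    def submit(self, prompt: str, now: Optional[float] = None) -> Optional[list]:
        """Queue a prompt; return a batch if this submission triggers a flush."""
        now = time.monotonic() if now is None else now
        if not self._pending:
            self._oldest_ts = now  # this prompt starts a new batch window
        self._pending.append(prompt)
        return self.poll(now)

    def poll(self, now: Optional[float] = None) -> Optional[list]:
        """Return a ready batch, or None if the batch should keep filling."""
        now = time.monotonic() if now is None else now
        if not self._pending:
            return None
        full = len(self._pending) >= self.max_batch_size
        stale = now - self._oldest_ts >= self.max_wait_s
        if full or stale:
            batch, self._pending = self._pending, []
            self._oldest_ts = None
            return batch
        return None
```

A serving loop would call `submit` for each incoming request and `poll` on a timer, passing each returned batch to the inference engine. A larger `max_wait_s` trades latency for GPU utilization.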

Agent Runtime Layer

  • Orchestration: OpenClaw, LangChain, or proprietary frameworks
  • Skill registry: Internal catalog of approved skills
  • Harness: Permission enforcement, audit logging
  • Sandbox: Firecracker microVMs for isolated execution
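
To make the harness role concrete, here is a minimal sketch of permission enforcement combined with audit logging. It is an illustration under stated assumptions, not the design of OpenClaw, LangChain, or any specific framework: the `Harness` class, the agent-to-skills allowlist shape, and the in-memory `audit_log` are all hypothetical.

```python
import json
import time
from typing import Any, Callable


class Harness:
    """Toy harness: allowlist check plus an append-only audit trail."""

    def __init__(self, permissions: dict[str, set[str]]):
        # Maps agent id -> skills that agent may call (hypothetical policy shape).
        self.permissions = permissions
        # A real deployment would write to durable, tamper-evident storage.
        self.audit_log: list[str] = []

    def _audit(self, agent: str, skill: str, allowed: bool) -> None:
        record = {"ts": time.time(), "agent": agent, "skill": skill, "allowed": allowed}
        self.audit_log.append(json.dumps(record, sort_keys=True))

    def invoke(self, agent: str, skill: str, fn: Callable[..., Any], *args: Any) -> Any:
        """Run fn only if the agent is permitted to use the skill; log either way."""
        allowed = skill in self.permissions.get(agent, set())
        self._audit(agent, skill, allowed)
        if not allowed:
            raise PermissionError(f"{agent} may not call {skill}")
        return fn(*args)
```

Denied calls are logged before the exception is raised, so the audit trail records attempts as well as successes.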

Integration Layer

  • API gateway: Kong, Tyk, or Nginx for routing
  • Message queue: RabbitMQ, Kafka for async workflows
  • Database: PostgreSQL, MongoDB for agent state
  • Monitoring: Prometheus, Grafana, OpenTelemetry

Key Challenges

GPU Shortage

  • NVIDIA export restrictions limit H100 availability in China
  • Domestic alternatives (Huawei Ascend, Cambricon) are less mature
  • Long lead times (6-12 months) for GPU procurement
  • High cost: 30K-40K USD per H100 GPU

Talent Gap

  • Shortage of engineers who can deploy and maintain infrastructure
  • Need expertise in: GPU programming, distributed systems, LLM serving
  • Training takes 6-12 months minimum
  • Competition for talent drives up salaries

Cost

  • Can be 10-100x more expensive than public cloud APIs when usage is low
  • Upfront capital expenditure for hardware
  • Ongoing costs: power, cooling, maintenance
  • Underutilization during off-peak hours

Maintenance

  • Model updates require redeployment
  • Security patches for OS, drivers, frameworks
  • Hardware failures and replacements
  • Capacity planning for growth

Advantages

Security

  • Full control over data, no external access
  • Prevent data exfiltration and IP theft
  • Meet air-gap requirements for defense/government

Compliance

  • Satisfy data localization requirements under MLPS 2.0 (Multi-Level Protection Scheme) and PIPL (Personal Information Protection Law)
  • Audit logs under enterprise control
  • No foreign cloud provider dependencies

Performance

  • Lower latency for internal applications
  • Predictable performance, no noisy neighbors
  • Optimized for specific workloads

Cost (at scale)

  • Cheaper than public cloud for sustained high usage
  • No egress fees or API call charges
  • Amortize hardware cost over 3-5 years
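
Whether on-premise is cheaper depends on sustained volume, and a rough break-even calculation shows why. The sketch below is illustrative only. The 30K USD per GPU and 4-year amortization fall within the ranges given in this note, but the GPU count, monthly operating cost, and cloud price per million tokens are made-up assumptions, and all function names are hypothetical.

```python
def monthly_onprem_cost(hardware_usd: float, amortize_years: float,
                        opex_usd_per_month: float) -> float:
    """Straight-line hardware amortization plus power, cooling, and maintenance."""
    return hardware_usd / (amortize_years * 12) + opex_usd_per_month


def breakeven_tokens_per_month(onprem_monthly_usd: float,
                               cloud_usd_per_million_tokens: float) -> float:
    """Monthly token volume at which cloud API spend equals on-premise cost."""
    return onprem_monthly_usd / cloud_usd_per_million_tokens * 1_000_000


def cost_ratio(onprem_monthly_usd: float, tokens_per_month: float,
               cloud_usd_per_million_tokens: float) -> float:
    """On-premise cost divided by cloud cost at a given usage level."""
    cloud_usd = tokens_per_month / 1_000_000 * cloud_usd_per_million_tokens
    return onprem_monthly_usd / cloud_usd


# Assumed inputs: 8 GPUs at 30K USD each, 4-year amortization,
# 3K USD/month operating cost, cloud API at 2 USD per million tokens.
onprem = monthly_onprem_cost(8 * 30_000, 4, 3_000)
print(onprem)                                   # 8000.0
print(breakeven_tokens_per_month(onprem, 2.0))  # 4000000000.0
print(cost_ratio(onprem, 400_000_000, 2.0))     # 10.0
```

With these assumed inputs, on-premise costs 10 times the cloud API at 400 million tokens per month and becomes cheaper only above 4 billion tokens per month. That is how the same deployment can appear both under Key Challenges and under Advantages.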

Deployment Patterns

Pattern A: Centralized Data Center

  • All infrastructure in single location
  • Easier to manage, lower cost
  • Single point of failure
  • Higher latency for remote offices

Pattern B: Distributed Edge

  • Infrastructure in multiple regional data centers
  • Lower latency, higher availability
  • Complex synchronization
  • Higher cost

Pattern C: Hybrid

  • Core infrastructure on-premise
  • Burst to domestic cloud for peak loads
  • Balance cost and control
  • Requires secure connectivity
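
The burst behavior in Pattern C can be sketched as a simple routing rule. This is a minimal illustration, not a production design: the `HybridRouter` class, the concurrency-based capacity limit, and the `sensitive` flag (which keeps sensitive requests on-premise even when capacity is exhausted) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class HybridRouter:
    """Toy router: prefer on-premise, burst non-sensitive overflow to the cloud."""

    onprem_capacity: int   # hypothetical: max concurrent on-premise requests
    in_flight: int = 0     # requests currently running on-premise

    def route(self, sensitive: bool) -> str:
        """Return "onprem", "cloud", or "queue" for a new request."""
        if self.in_flight < self.onprem_capacity:
            self.in_flight += 1
            return "onprem"
        # On-premise is saturated: sensitive work waits, the rest bursts out.
        return "queue" if sensitive else "cloud"

    def complete_onprem(self) -> None:
        """Call when an on-premise request finishes to free a slot."""
        self.in_flight = max(0, self.in_flight - 1)
```

Queueing sensitive requests rather than bursting them is one way to balance cost against control. The cloud path would still need the secure connectivity listed above.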

Connections

  • Related: [[enterprise-agent-china/concepts/private-deployment-architecture|Private Deployment Architecture]], [[enterprise-agent-china/concepts/china-agent-landscape|China Agent Landscape]]
  • Mentioned in: [[enterprise-agent-china/sources/ai-infrastructure-industry-report|AI Infrastructure Industry Report]]