Harbor x TensorLake: Infrastructure for Agentic Evals

We are thrilled to announce TensorLake as a first-class environment provider in Harbor. This integration unlocks a new tier of scalability for agent evaluation, letting developers run thousands of concurrent benchmarks in secure, ephemeral MicroVMs designed for the next generation of AI workloads. The integration is currently under review in Pull Request #1237.
By combining Harbor's rigorous evaluation framework with TensorLake's high-performance infrastructure, we are defining the standard for reliable, scalable, and secure agent benchmarking.
What is TensorLake?
TensorLake is specialized compute infrastructure for AI agents. It provides stateful sandbox infrastructure with dynamic capabilities that make it easy to deploy agents and build RL environments:
- MicroVM Isolation: Firecracker VMs with sub-200ms startup times.
- Stateful Suspend and Resume: Sandboxes are automatically suspended when they finish, and can be resumed later to re-use the VM for debugging or to start another task.
- Clone: A running sandbox can be cloned across the cluster to replicate an environment after it has been set up.
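Conceptually, the lifecycle above behaves like a small state machine. The sketch below is a toy model of it in plain Python, not the TensorLake SDK; the class and method names are illustrative only, and "state" here is just a dict standing in for full VM state:

```python
import copy

class ToySandbox:
    """Toy model of a suspend/resume/clone sandbox lifecycle.

    Illustrative only -- not the TensorLake SDK. Real sandboxes
    snapshot full VM state; here "state" is just a dict.
    """
    def __init__(self, state=None):
        self.state = state if state is not None else {}
        self.status = "running"

    def run(self, key, value):
        # Tasks can only execute while the sandbox is running.
        assert self.status == "running", "resume the sandbox first"
        self.state[key] = value

    def suspend(self):
        # Suspended automatically when a task finishes; state is preserved.
        self.status = "suspended"

    def resume(self):
        # Resume later to debug or start another task in the same VM.
        self.status = "running"

    def clone(self):
        # A running sandbox can be cloned to replicate its environment;
        # the clone gets an independent copy of the state.
        return ToySandbox(copy.deepcopy(self.state))

sb = ToySandbox()
sb.run("repo", "checked-out")
sb.suspend()
sb.resume()                   # state survives suspend/resume
replica = sb.clone()          # replica starts from the same environment
replica.run("task", "eval-2") # ...and diverges independently
print(sb.state, replica.state)
```

The key property the toy captures is that a clone shares the setup work already done (the copied state) but its subsequent writes never leak back into the original sandbox.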
Key Integration Features
1. Drop-in Scalability
Scale from 1 to 1,000 concurrent agents instantly. Switching to TensorLake in Harbor is as simple as changing a CLI flag.
harbor run --task-name [my-benchmark] --dataset [my-dataset] --env tensorlake
2. MicroVM Security
TensorLake uses MicroVMs to ensure that code executed by agents is completely isolated from your host infrastructure. This is critical when evaluating agents on untrusted code or complex benchmarks where "rm -rf /" might be a valid (but dangerous) agent action.
3. Resource Control & GPU Support
The integration supports fine-grained control over the sandbox resources directly from your Harbor config:
- Compute: Configurable vCPUs and RAM.
- Storage: Ephemeral disk sizing.
- GPUs: Native support for GPU-accelerated workloads, essential for agents performing local inference or data science tasks.
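As a sketch, the resource knobs above might appear in a Harbor config like the fragment below. The field names are illustrative assumptions, not the actual schema; consult the Harbor documentation for the real option names:

```yaml
# Hypothetical Harbor environment config -- field names are illustrative.
environment:
  type: tensorlake
  resources:
    vcpus: 4          # configurable vCPUs
    memory_gb: 16     # RAM
    disk_gb: 50       # ephemeral disk sizing
    gpu: nvidia-a10g  # optional GPU for local inference or data science tasks
```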
4. State Management with Snapshots
Harbor leverages TensorLake's snapshot capabilities. You can start evaluations from pre-warmed states, significantly reducing setup time for complex environments that require heavy dependency installation.
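The payoff of pre-warmed states is easy to see with a toy model: pay the expensive setup once, snapshot the result, then restore cheaply per evaluation. The sketch below is conceptual only (plain Python, not the TensorLake API; a "snapshot" here is just a deep copy of a dict):

```python
import copy
import time

def heavy_setup():
    # Stands in for dependency installation in a complex environment.
    time.sleep(0.2)
    return {"deps_installed": True, "cache": list(range(1000))}

# Cold start: pay the setup cost for every evaluation.
t0 = time.perf_counter()
cold_env = heavy_setup()
cold = time.perf_counter() - t0

# Pre-warmed: snapshot once after setup, then restore cheaply each time.
snapshot = copy.deepcopy(cold_env)
t0 = time.perf_counter()
warm_env = copy.deepcopy(snapshot)
warm = time.perf_counter() - t0

assert warm_env == cold_env  # restored state matches the freshly built one
print(f"cold start: {cold:.3f}s, restore from snapshot: {warm:.3f}s")
```

In the real integration the snapshot captures full VM state rather than a Python object, but the trade-off is the same: setup cost is paid once instead of once per run.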
TensorLake vs. Other Environments
Why choose TensorLake?
- Vs. Daytona: While Daytona is excellent for persistent developer environments (long-running workspaces), TensorLake is optimized for the high-churn, ephemeral nature of agent loops where environments are created and destroyed rapidly.
- Vs. E2B: Both offer excellent MicroVM sandboxing. TensorLake is particularly distinct in its broader ecosystem integration (Indexify) for extraction and workflow orchestration, making it a strong choice if your agents are part of a larger data processing pipeline.
- Vs. Modal: Modal excels at serverless GPU compute and batch ML jobs. TensorLake is optimized for stateful, long-running agent loops, with native suspend/resume, live migration, and cloning that Modal doesn't support. If your agents need to persist state across requests rather than run as isolated jobs, TensorLake is the better fit.
Getting Started
1. Install the SDK:
pip install tensorlake
2. Set your API key:
export TENSORLAKE_API_KEY="tl_..."
3. Run your first task (make sure your model provider API keys are configured):
harbor run --env tensorlake --task-name adaptive-rejection-sampler --dataset terminal-bench@2.0 --agent claude-code --model anthropic/claude-sonnet-4-6
Debugging
Need to see what the agent is doing inside the sandbox? Harbor exposes TensorLake's native debugging tools:
# Drops you directly into the running sandbox shell
harbor env attach <session_id>