How to Add PII Redaction to OpenClaw Before Data Leaves Your Environment


Agentic Security


OpenClaw is one of the most capable open-source agent frameworks available: 300K+ GitHub stars, persistent memory, browser control, file system access, and integrations with every major model provider. That capability is exactly why PII redaction deserves attention. An agent that can read files, browse the web, recall past conversations, and execute code will inevitably touch sensitive data. The question is whether that data leaves your environment with or without protection.

This article walks through a practical middleware pattern for redacting personally identifiable information in OpenClaw workflows. The approach works with the hooks available today and sets up a clean migration path for when first-class mutation hooks mature. The goal is not to argue that OpenClaw is risky; it is to show that the framework already has the right shape for a redaction layer, and teams can add one now.

Why PII Redaction Matters in OpenClaw

OpenClaw agents are not limited to answering questions. They operate on real data:

  • Files: agents read, write, and edit documents in their workspace
  • Messages: conversation history persists across sessions and can include customer details, case numbers, or account information
  • Browser content: agents can navigate pages, fill forms, and extract text from sites that may contain PII
  • Memory: OpenClaw writes session summaries to markdown files that may reference individuals by name, email, or identifier

None of this is a problem as long as the data stays local. The trust question starts when the agent sends a prompt to an external model provider. At that point, the provider's data retention policy, training practices, and subprocessor chain all become relevant. For teams handling customer data, medical records, financial information, or employee files, that handoff is the moment that needs a control layer.

Organizations that already maintain strict controls over how documents enter an AI platform need to apply the same discipline to agentic workflows, where the surface area is larger and the data paths are less predictable.

Where Data Can Leak in the OpenClaw Flow

PII can enter the outbound request at four distinct points in the OpenClaw agent loop. Understanding each one matters because a redaction layer that covers only the user prompt leaves the other three points exposed.

| Leak point | What happens | Example |
|---|---|---|
| Before prompt build | System prompt, skill files, and bootstrap context are assembled. PII can exist in workspace files that get loaded as context. | A MEMORY.md file referencing "John Smith, [email protected]" is included as agent context. |
| Provider-bound model input | The full message array (system prompt + conversation history + tool results) is sent to the LLM API. | A user message asks the agent to "summarize Sarah's contract," and the conversation history includes her SSN from a prior turn. |
| Model output | The model's response may echo, summarize, or restructure PII that was present in the input. | The model returns "Sarah Chen (SSN: 123-45-6789) has the following contract terms..." in its response. |
| Tool outputs and error messages | Results from tool execution (file reads, web fetches, code output) are passed back to the model for the next turn. | A bash tool reads a CSV and returns rows containing customer email addresses and phone numbers. |

A complete redaction strategy covers all four points. Most teams start with provider-bound input (the highest-volume leak) and tool outputs (the least predictable), then add output filtering once the inbound side is solid.

What OpenClaw Supports Today

OpenClaw's plugin system exposes several hooks that are relevant to PII redaction. They fall into two categories: observational hooks that let you log or inspect data, and modifying hooks that let you change it before it moves to the next stage.

Observational hooks

  • llm_input / llm_output: fire before and after model calls. Useful for logging and monitoring, but they do not support mutation. You can see the PII but you cannot remove it.

Modifying hooks

  • before_prompt_build: runs after session load, before prompt assembly. Can inject or modify system context, but operates before conversation messages are attached.
  • before_llm_call: fires before every LLM API call. Can modify messages, systemPrompt, and tools. Can also block the call entirely. This is the primary hook for inbound PII redaction.
  • tool_result_before_model: fires after tool execution, before the result is passed to the model. Can modify the result field (requires plugins.allowResultModification: true).
  • before_response_emit: fires before the final response is delivered. Can modify content. Useful for output-side redaction, with the caveat that already-streamed text_delta chunks cannot be retracted.
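To make the two modifying hooks concrete, here is a minimal plugin skeleton. The hook names come from the list above; the registration object shape and the `ctx` field names are illustrative assumptions, so check them against the OpenClaw plugin API before relying on this.

```javascript
// Hypothetical plugin skeleton. The hook names (before_llm_call,
// tool_result_before_model) come from OpenClaw's hook list; the
// registration shape below is an illustrative assumption.
const piiRedactPlugin = {
  name: "pii-redact",
  hooks: {
    // Primary inbound redaction point: fires before every LLM API call
    // and may mutate the messages array and systemPrompt in place.
    before_llm_call(ctx) {
      for (const msg of ctx.messages) {
        msg.content = scrub(msg.content);
      }
      ctx.systemPrompt = scrub(ctx.systemPrompt);
    },
    // Sanitize tool output before it re-enters the model context.
    // Requires plugins.allowResultModification: true in the agent config.
    tool_result_before_model(ctx) {
      ctx.result = scrub(ctx.result);
    },
  },
};

// Placeholder scrubber (SSNs only); a fuller version appears later in this article.
function scrub(text) {
  return typeof text === "string"
    ? text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]")
    : text;
}
```

Both hooks funnel through one `scrub` function, so the detection logic lives in one place regardless of which leak point fired.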

What is incomplete

The hook system is powerful but has three gaps that matter for redaction:

  1. First-writer-wins merge semantics: if two plugins both try to modify messages in before_llm_call, only the first plugin's changes take effect. A redaction plugin must be registered first in the plugin chain.
  2. Streaming gap: before_response_emit acts on the final response, but text_delta chunks sent during streaming have already reached the client. PII in a streamed partial response cannot be recalled.
  3. No built-in redaction primitives: the hooks give you access to the data, but detecting and replacing PII is entirely your responsibility. There is no redact() function in the plugin API.

These gaps do not make the hooks useless. They mean that plugin-based redaction works as a development-time solution, but production deployments benefit from a purpose-built redaction layer that sits outside the plugin chain.

A Practical Redaction Pattern That Works Now

The following four-step pattern can be implemented as an OpenClaw plugin using before_llm_call and tool_result_before_model. It handles the two highest-risk leak points (provider-bound input and tool outputs) with straightforward regex-based detection.

Pattern: detect, tokenize, map, rehydrate
  1. Detect PII in the outbound payload using regex patterns for structured identifiers (emails, SSNs, phone numbers) and optionally a lightweight named-entity recognizer for names.
  2. Replace each match with a stable token that preserves the category and instance: [EMAIL_1], [EMAIL_2], [SSN_1], [PHONE_3]. Stable tokens let the model reason about relationships ("send the invoice to [EMAIL_1]") without seeing the actual value.
  3. Store the mapping in a session-scoped dictionary that maps tokens back to original values. This map never leaves your environment.
  4. Rehydrate only when safe: when the model's response is delivered to a trusted internal surface (not forwarded to another external service), replace tokens with originals for human readability. If the response will be logged, stored, or forwarded, keep tokens in place.
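Step 4 can be sketched as a reverse lookup over the same session map. This sketch assumes the map stores entries as `"type:originalValue" -> "[TOKEN_N]"`; on any surface that is not explicitly trusted, skip this step and leave the tokens in place.

```javascript
// Rehydration sketch (step 4). Assumes the session map stores entries as
// "type:originalValue" -> "[TOKEN_N]". Call only when delivering to a
// trusted internal surface; everywhere else, keep the tokens.
function rehydrate(text, sessionMap) {
  for (const [key, token] of sessionMap.entries()) {
    const original = key.slice(key.indexOf(":") + 1); // strip the "type:" prefix
    text = text.split(token).join(original); // literal replace-all, no regex escaping needed
  }
  return text;
}
```

For example, with a map entry `"email:[email protected]" -> "[EMAIL_1]"`, the string "Invoice sent to [EMAIL_1]" rehydrates to "Invoice sent to [email protected]".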

Pseudocode for a before_llm_call plugin

```javascript
// Plugin: pii-redact
// Hook: before_llm_call

const piiPatterns = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  phone: /\b(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
};

function redact(text, sessionMap) {
  for (const [type, regex] of Object.entries(piiPatterns)) {
    text = text.replace(regex, (match) => {
      const key = `${type}:${match}`;
      if (!sessionMap.has(key)) {
        const idx = sessionMap.size + 1;
        sessionMap.set(key, `[${type.toUpperCase()}_${idx}]`);
      }
      return sessionMap.get(key);
    });
  }
  return text;
}

// In the hook handler:
for (const msg of messages) {
  msg.content = redact(msg.content, session.piiMap);
}
systemPrompt = redact(systemPrompt, session.piiMap);
```

This pseudocode is intentionally simplified. A production implementation would handle multi-part message content (arrays of text and image blocks), nested tool call arguments, and edge cases like PII split across message boundaries. The core concept, a deterministic scan-and-replace before every outbound call, is the part that matters.
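As one example of the multi-part case, the sketch below walks an array-of-blocks content value. The `{ type: "text", text: ... }` block shape is an assumption borrowed from common provider message formats; verify it against OpenClaw's actual message schema.

```javascript
// Sketch of redaction over multi-part message content. The array-of-blocks
// shape ({ type: "text", text: "..." }) is an assumption; check it against
// the real OpenClaw message schema before relying on it.
function redactContent(content, scrub) {
  if (typeof content === "string") return scrub(content);
  if (Array.isArray(content)) {
    return content.map((block) =>
      block.type === "text" ? { ...block, text: scrub(block.text) } : block
    );
  }
  // Unknown shape: pass through unchanged, but surface it -- a silent
  // pass-through is a redaction gap.
  console.warn("redactContent: unhandled content shape", typeof content);
  return content;
}
```

Non-text blocks (images, tool-call payloads) pass through untouched here, which is exactly the kind of gap a production implementation has to close deliberately rather than by default.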

What to Redact First

Not all PII carries the same risk or detection difficulty. Prioritize by the combination of regulatory exposure and regex reliability:

| PII type | Detection reliability | Why it matters |
|---|---|---|
| Email addresses | Very high (regex) | Unique identifier, commonly appears in files and messages, trivial to match |
| Phone numbers | High (pattern with formatting variants) | Appears in contact lists, CRM exports, support transcripts |
| SSNs / tax IDs | High (fixed format) | Severe regulatory consequences under state breach notification laws and federal guidelines |
| Account numbers | Medium (varies by format) | Bank accounts, insurance policy numbers, internal customer IDs |
| Names (customer-facing) | Lower (requires NER or lookup) | Contextual risk: a name alone may not be PII, but a name paired with a medical record or financial account is |

Start with emails, phone numbers, and SSNs. These three categories cover the highest-risk, highest-confidence detections. Add account numbers and names as your redaction pipeline matures and you have a way to handle false positives (names that are also common words, account numbers that look like dates).
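The reliability column is concrete, not hand-wavy: the phone pattern from the pseudocode above has to absorb several formatting variants of the same number, and that flexibility is also where false positives come from. A quick sanity check:

```javascript
// The phone pattern (same as in the plugin pseudocode, minus the g flag
// so .test() stays stateless) must tolerate separators, parentheses, and
// an optional country code.
const phoneRe = /\b(\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/;

const variants = [
  "555-867-5309",
  "(555) 867-5309",
  "+1 555.867.5309",
  "5558675309",
];
const allMatch = variants.every((v) => phoneRe.test(v));
// Trade-off: a bare 10-digit run also matches, so order IDs or epoch
// timestamps can produce false positives that need downstream handling.
```

Dropping the `g` flag for testing matters: a global regex carries `lastIndex` state across `.test()` calls and will intermittently return false on strings that do match.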

What Redaction Does Not Solve

PII redaction is one layer in a security posture, not the entire posture. Three adjacent risks require separate controls:

  • Prompt injection: a malicious input that causes the agent to ignore instructions, leak system prompts, or take unintended actions. Redaction does not inspect intent; it inspects content. Prompt injection requires input validation, instruction hierarchy enforcement, and output filtering.
  • Unsafe tool calls: an agent that can run bash, write files, or make HTTP requests can be directed to exfiltrate data through channels that bypass the redaction layer entirely. Tool allowlisting and sandboxing (per-tool permissions, network egress controls) are the correct mitigations.
  • Exfiltration after tool execution: even if inbound prompts are clean, a tool result could contain sensitive data that the agent writes to an external service in a subsequent step. Egress monitoring and tool-result inspection (via tool_result_before_model) address this, but the coverage depends on the tool inventory.

A mature agentic security stack addresses redaction, injection, tool governance, and egress control as four distinct layers. Teams building on OpenClaw should treat PII redaction as the first layer, not the only one. For the tool governance layer, see our guide on building a tool call audit log. For the action control layer, see how to add human approval gates before sensitive operations execute.

How Zedly Shield Fits

The plugin-based pattern above works for teams comfortable building and maintaining their own redaction infrastructure. Zedly Shield is the production-grade alternative: a sidecar service that sits between the OpenClaw agent and the model provider, handling redaction without requiring changes to the agent code or plugin chain.

  • Sidecar redaction before provider calls: Shield intercepts outbound API requests at the network level, not the plugin level. This means it catches every request regardless of which hook fired (or failed to fire), including requests from tools that make their own LLM calls. No gaps from merge semantics or registration order.
  • Policy-based scrubbing: define redaction policies per data type, per workflow, or per model provider. A policy might redact SSNs from all outbound calls but allow email addresses when the provider has a signed DPA. Policies are versioned and auditable.
  • Evidence packets: every redaction event produces a timestamped evidence packet that records what was redacted, which policy applied, and what token replaced the original value. The packet does not contain the original PII. Compliance teams can verify that redaction occurred without accessing the sensitive data, and the packets integrate with existing audit and SIEM workflows.

Shield is designed for teams that need redaction to be a guarantee rather than a best-effort plugin. If your agentic workflow handles data subject to HIPAA, PCI-DSS, GDPR, or state privacy laws, the redaction layer needs to be outside the application's control flow, where it cannot be bypassed by a misconfigured plugin or a new tool that was not wired into the hook chain.

Implementation Checklist

Whether you build a custom plugin or deploy Zedly Shield, these steps cover the baseline for PII redaction in an OpenClaw environment:

  1. Inventory your data sources. List every file type, message channel, and tool that your OpenClaw agents interact with. Identify which ones are likely to contain PII.
  2. Map your model providers. Document which external APIs your agents call, what data retention each provider commits to, and whether a DPA or BAA is in place.
  3. Implement regex-based detection for emails, phone numbers, and SSNs as a minimum. Test against a sample of real agent transcripts (redacted for testing) to measure false-positive and false-negative rates.
  4. Register your redaction plugin first in the OpenClaw plugin chain. Due to first-writer-wins semantics, the redaction plugin must modify messages before any other plugin touches them.
  5. Enable tool_result_before_model with plugins.allowResultModification: true to intercept tool outputs before they reach the model.
  6. Build a session-scoped token map that persists for the duration of a conversation but is never written to disk or sent to an external service. Destroy the map when the session ends.
  7. Decide your rehydration policy. Define which surfaces (internal UI, API responses, exported reports) receive original values and which receive tokens. Default to tokens everywhere and allowlist specific surfaces.
  8. Log redaction events (type, count, timestamp, policy) without logging the original values. These logs are your evidence trail for compliance audits.
  9. Test with adversarial inputs: PII split across messages, PII in tool call arguments, PII in error stack traces, and PII in base64-encoded content. Each of these bypasses naive regex if not handled.
  10. Review quarterly. New tools, new model providers, and new data sources change the leak surface. The redaction policy should be reviewed at least every quarter.

Run the OpenClaw Risk Check

See where your workflow can leak PII and what to redact before model calls. Our team will map your OpenClaw agent configuration, identify unprotected data paths, and recommend a redaction strategy tailored to your compliance requirements.

Explore Zedly Shield

Frequently Asked Questions

Does OpenClaw send my data to external model providers?

Yes, unless you run a fully local model through Ollama or a similar provider. When OpenClaw is configured to use OpenAI, Anthropic, Google, or any hosted LLM, the full prompt (including file contents, messages, memory, and tool results) is sent to that provider's API. The provider's data handling policy then governs what happens next. This is why redaction before the model call matters: once data leaves your environment, your control over it depends entirely on the provider's terms.

Can I redact PII using OpenClaw's built-in hooks?

Partially. OpenClaw's before_llm_call hook (introduced in PR #39206) can modify messages and the system prompt before they reach the model provider. The tool_result_before_model hook can sanitize tool outputs. However, streaming responses that have already been emitted cannot be retracted, and merge semantics are first-writer-wins, meaning only one plugin can modify a given field. For production use, a dedicated sidecar that intercepts all outbound traffic is more reliable than relying on individual hooks.

What PII types should I redact first in an agentic workflow?

Start with the highest-risk, easiest-to-detect categories: email addresses (regex is highly reliable), phone numbers (formatted patterns), and government identifiers like SSNs and tax IDs (fixed format, severe consequences if leaked). Account numbers and customer names come next. Names are harder to detect with regex alone and often require named-entity recognition, so they are typically a second-phase addition.

Does PII redaction protect against prompt injection?

No. PII redaction and prompt injection are separate concerns. Redaction removes sensitive data from outbound requests; it does not prevent a malicious prompt from causing the agent to take unintended actions. A complete security posture for agentic AI requires both: redaction to protect data, and input validation, tool allowlisting, and output filtering to protect against adversarial prompts.

What is a redaction evidence packet?

A redaction evidence packet is an audit artifact that records what was redacted, when, by which policy, and what token replaced the original value. It does not contain the original PII. The packet lets compliance teams verify that redaction occurred without exposing the sensitive data, and it supports incident investigation if a breach is suspected. Zedly Shield generates these packets automatically for every redaction event.

Ready to get started?

Runtime safety for agentic AI. PII redaction, policy-based blocking, and tamper-evident audit logs for OpenClaw.