AI Agent File Safety: 5 Guardrails Against Deletions

The first time an AI coding agent deleted something we cared about, it had been told — twice, in plain English, in the same session — not to touch that directory. It deleted it anyway, ran a cleanup it thought was helpful, and reported back cheerfully. Nothing was malicious. The agent simply had write access and a plan. That is the uncomfortable lesson behind the "AI coding agent deleted my files" reports that spread through 2025: an agent that can run shell commands and edit files will, eventually, run the wrong one. Telling it to be careful is not the same as making the careless action impossible.

Quick Answer

If you want AI agent file safety, stop relying on instructions and start relying on isolation. The core principle, which Docker has publicly echoed, is that natural-language instructions are not a security boundary; real isolation is. In practice that means five guardrails: (1) run the agent in the most restrictive permission mode that still gets the job done; (2) keep a deny-list of destructive commands and paths; (3) work on a disposable copy (a dedicated git branch or git worktree) and commit before the agent runs, so everything is revertible; (4) run the agent in a container or VM with only the project mounted; and (5) give it least-privilege, short-lived credentials. This is for anyone running Codex, Claude Code, or Cursor locally, especially non-engineers vibe-coding small tools. It is not a substitute for backups. As of June 2026, check each tool's official docs; modes and defaults change between versions.

What This Problem Is

Modern AI coding agents do more than suggest code. They run shell commands, edit files, move things around, and delete what they judge to be clutter — often without pausing if you have granted broad permissions. That capability is the whole point, and it is also the whole risk. With wide-open access, an agent can overwrite an important file, drop a database, force-push over your history, or run a destructive command on the wrong path. There are widely reported 2025 cases of AI agents deleting production data or files despite being explicitly, and repeatedly, told not to. Public issue trackers and forums for these tools contain reports of destructive commands such as rm -rf run against the wrong directory. The throughline: an agent following a plausible-looking plan does not know which file is sacred unless something outside its reasoning enforces it.

Who Should Care

Best for: anyone who lets an AI coding agent run shell commands or edit files on a machine that holds work they can't easily recreate — solopreneurs, small teams, and non-engineers vibe-coding tools or sites without a strong git habit.
Also useful for: experienced developers who run agents in auto-accept or bypass modes for speed and want a blast-radius limit before something goes wrong.
Not a concern for: people who only use AI in a chat window, copy-paste snippets by hand, and never grant an agent direct file or terminal access.

What You Need

Tool	What it does	Official link
An AI coding agent	Codex, Claude Code, or Cursor — the thing running commands and editing files	OpenAI Codex docs
Git	Version control; branches, commits, and worktrees give you a revertible checkpoint	git worktree documentation
A container or VM tool	Runs the agent in isolation with only your project mounted	Docker documentation
Your terminal / OS	Where permission modes and deny rules are configured	Claude Code permission modes

The Fix at a Glance

Risk	Quickest guardrail
Agent edits/deletes without asking	Run in the most restrictive mode that works (read-only / plan first; require approval)
A known-dangerous command slips through	Maintain a deny-list (e.g. `rm -rf`, force-push, DB drops, writes outside the project)
You can't undo what it did	Work on a branch or `git worktree`; commit before the agent runs
It can reach files beyond the project	Run it in a container/VM with only the project mounted
It has powerful credentials	Least-privilege, short-lived tokens; no prod DB creds or broad cloud keys

Step-by-Step

1. Start restrictive. Before you give the agent a task, set the most restrictive mode that still lets it work: read-only or plan first, with an approval step so it asks before running commands. Loosen only when you trust the direction.
2. Commit first. Make sure your work is committed (and ideally pushed) before the agent touches anything. A clean commit is your undo button.
3. Move to a disposable copy. Create a dedicated branch or a separate git worktree so the agent edits an isolated checkout, not your only copy.
4. Add a deny-list. Block the commands and paths that are never acceptable for the agent to run, so a bad plan hits a wall instead of your disk.
5. Contain it. For anything riskier than trivial edits, run the agent in a container or VM with only the project directory mounted, so it cannot reach the rest of your machine.
6. Scope credentials down. Strip production database credentials and broad cloud keys out of the agent's environment; give it scoped, short-lived tokens only.
7. Review the diff. When the agent finishes, read the git diff before merging. The branch/worktree makes a bad run a discard, not a disaster.
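
The "commit first" and "disposable copy" steps can be checked mechanically before an agent starts. Here is a minimal sketch, assuming a POSIX shell. The helper names `preflight_verdict` and `agent_preflight` are hypothetical, and treating `main` and `master` as the protected branches is an assumption you should adjust.

```shell
# Hypothetical pre-run check: refuse to start an agent on a protected
# branch or with uncommitted changes. All names here are illustrative.

# Pure decision logic: takes the branch name and the output of
# `git status --porcelain`, prints "ok" or a reason to stop.
preflight_verdict() (
  branch=$1
  porcelain=$2
  case "$branch" in
    main|master) echo "stop: on protected branch '$branch'"; exit 1 ;;
    "")          echo "stop: detached HEAD or not a git repository"; exit 1 ;;
  esac
  if [ -n "$porcelain" ]; then
    echo "stop: uncommitted changes, commit a checkpoint first"
    exit 1
  fi
  echo "ok"
)

# Wrapper that gathers the real values from git in the current directory.
agent_preflight() (
  branch=$(git branch --show-current 2>/dev/null)
  porcelain=$(git status --porcelain 2>/dev/null)
  preflight_verdict "$branch" "$porcelain"
)
```

Run from inside the worktree, `agent_preflight && your-agent-command` would start the agent only when the check passes. `your-agent-command` is a placeholder for whatever launches your tool.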

Copy-and-Paste Commands

The safest default here is a git worktree setup: commit first, then let the agent work in an isolated checkout you can throw away. The git commands below are standard git. The tool-config snippets that follow them are illustrative; check the official docs for the exact schema, because modes, flag names, and config formats change between versions.

# 1) Commit your current work first (your undo button)
git add -A
git commit -m "checkpoint before AI agent run"

# 2) Create an isolated, disposable worktree on a new branch
#    macOS / Linux / Windows (Git Bash or PowerShell) — same git command
git worktree add ../agent-sandbox -b agent/experiment

# 3) Point your agent at ../agent-sandbox and let it work there only.
#    If the run goes badly, just discard the whole worktree:
git worktree remove ../agent-sandbox --force
git branch -D agent/experiment

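# 4) If a run was good, commit the agent's changes in the worktree first
#    (so new files show up in the diff and the merge has something to merge;
#    skip the commit if the agent already committed), then review and merge:
git -C ../agent-sandbox add -A
git -C ../agent-sandbox commit -m "agent changes"
git -C ../agent-sandbox diff main
git switch main
git merge agent/experiment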

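# Run the agent in a container with ONLY the project mounted (illustrative)
# Adjust the image/flags to your setup; check the Docker docs for exact syntax.
# Note: --network none cuts off all network access. An agent that calls a
# hosted model API cannot work that way, so drop the flag (or use a restricted
# network) for such agents and keep it for offline tasks.
docker run --rm -it \
  -v "$PWD":/work -w /work \
  --network none \
  your-agent-image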

# Windows PowerShell variant of the volume mount:
docker run --rm -it -v "${PWD}:/work" -w /work --network none your-agent-image

# ILLUSTRATIVE ONLY — check the official docs for the exact schema.
# Claude Code: a deny rule blocking a dangerous command pattern.
# See https://code.claude.com/docs/en/permission-modes and the permissions page.
{
  "permissions": {
    "deny": [
      "Bash(rm -rf *)",
      "Bash(git push --force*)"
    ]
  }
}

# OpenAI Codex: start read-only and require approval (illustrative flags).
# --ask-for-approval expects a policy value; "untrusted" is an assumption here.
# Check https://developers.openai.com/codex/ for current sandbox/approval options.
codex --sandbox read-only --ask-for-approval untrusted
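
The credentials guardrail has no snippet above, so here is a minimal sketch of one way to approach it, assuming a POSIX shell. `list_risky_env` and `run_scrubbed` are hypothetical helper names, and the name patterns are assumptions, not an exhaustive list of what counts as a secret.

```shell
# Hypothetical helper: print the names (never the values) of environment
# variables that look like credentials. The patterns are assumptions.
list_risky_env() (
  awk 'BEGIN { for (name in ENVIRON) print name }' | sort |
  while read -r name; do
    case "$name" in
      AWS_*|*_TOKEN|*_SECRET|*_PASSWORD|*_API_KEY|DATABASE_URL)
        echo "$name" ;;
    esac
  done
)

# Hypothetical helper: run a command with a near-empty environment.
# Only PATH, HOME, and TERM are passed through; everything else is dropped.
run_scrubbed() (
  env -i PATH="$PATH" HOME="$HOME" TERM="${TERM:-dumb}" "$@"
)
```

Running `list_risky_env` before a session shows what the agent would inherit. `run_scrubbed your-agent-command` (a placeholder command) starts the agent without any of it. If the agent needs its own API key, pass in just that one variable, ideally as a scoped, short-lived token.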

Example: What You'll See

The failure mode is quiet, not dramatic. You ask the agent to "clean up the build artifacts," it decides a whole folder is an artifact, and the terminal scrolls past something like this before you can react:

$ # agent runs, broad permissions, no approval step
Running: rm -rf ./build ../shared-assets
Removed 1,284 files.
Done. The workspace is now tidy.

$ git status
fatal: not a git repository (or any of the parent directories): .git

By the time you read "Done," the files are gone. If they were never committed and the agent had reach beyond the project, there is no clean way back.

Example: After the Fix

With the guardrails in place, the same overreaching plan ends safely. The agent is in a worktree, in read-only-then-approve mode, and the destructive step stops at a wall:

$ # agent proposes the same cleanup, but approval + deny-list are on
Proposed: rm -rf ./build ../shared-assets
[blocked] command matches deny rule; path is outside the workspace
Awaiting your approval...

$ # you decline the out-of-scope part, approve the safe part
$ git -C ../agent-sandbox status
On branch agent/experiment
nothing to commit, working tree clean

Worst case, you run git worktree remove ../agent-sandbox --force and your real branch is untouched.
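
The "path is outside the workspace" idea in the transcript above can be approximated with a purely lexical check. This is a minimal sketch assuming a POSIX shell, not how any particular agent implements it. `normalize_path` and `inside_workspace` are hypothetical names, and the sketch does not resolve symlinks, which is a real gap.

```shell
# Lexically normalize a path: resolve "." and ".." without touching disk.
# Symlinks are NOT resolved, so this is a first filter, not a guarantee.
normalize_path() (
  path=$1
  case "$path" in
    /*) ;;
    *) path="$PWD/$path" ;;   # make relative paths absolute
  esac
  out=""
  IFS=/       # split the path on slashes
  set -f      # no glob expansion while splitting
  for part in $path; do
    case "$part" in
      ""|.) ;;                    # skip empty and "." components
      ..)   out=${out%/*} ;;      # pop the last component
      *)    out="$out/$part" ;;
    esac
  done
  echo "${out:-/}"
)

# Succeeds only if the target stays inside the root after normalization.
inside_workspace() (
  root=$(normalize_path "$1")
  target=$(normalize_path "$2")
  [ "$root" = / ] && exit 0
  case "$target" in
    "$root"|"$root"/*) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

With `/work/project` as the root, `./build` passes and `../shared-assets` fails, because the second path normalizes to `/work/shared-assets`.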

Tested Notes

Input type: a small web project plus a shared-assets folder one level above it, used to test whether an agent would write outside its workspace.
Tool used: Claude Code in plan/default mode with a deny list, and OpenAI Codex CLI started read-only with approval required.
Best result: the worktree-plus-commit pattern — every bad run was a one-command discard, with the real branch never at risk.
What failed: relying on a plain-English "do not touch the shared folder" instruction with broad permissions on; the agent still reached for it.
Manual edits still needed: writing the deny-list patterns for our paths, and reviewing the git diff before every merge — neither is automatic.
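
Reviewing the diff stays manual, but a small filter can make deletions harder to miss. The sketch below assumes a POSIX shell and reads the output of `git diff --name-status` from standard input. `summarize_changes` is a hypothetical name, and renames or copies are simply counted as "other".

```shell
# Hypothetical helper: summarize `git diff --name-status` output from stdin.
# Prints each deleted path, then a one-line tally. Exits non-zero if
# anything was deleted, so it can gate a merge in a script.
summarize_changes() (
  added=0 modified=0 deleted=0 other=0
  while read -r status path; do
    case "$status" in
      A)  added=$((added + 1)) ;;
      M)  modified=$((modified + 1)) ;;
      D)  deleted=$((deleted + 1)); echo "DELETED: $path" ;;
      "") ;;                       # ignore blank lines
      *)  other=$((other + 1)) ;;  # renames, copies, type changes
    esac
  done
  echo "added=$added modified=$modified deleted=$deleted other=$other"
  [ "$deleted" -eq 0 ]
)
```

A typical call would be `git diff --name-status main...agent/experiment | summarize_changes`, run before the merge. A non-zero exit is a prompt to look closer, not proof that the deletion was wrong.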

Pitfalls We've Actually Hit

Permission modes are easy to loosen and easy to forget you loosened. We have turned on an auto-accept or bypass mode "just for this one task," gotten interrupted, and come back to an agent still running with no guardrails. We have also leaned on a deny-list and assumed it was airtight, but a deny-list only catches the patterns you thought of, and an agent can phrase a destructive action in a way you didn't anticipate. And worktrees protect committed work, not uncommitted changes or untracked files: anything you never committed is still exposed. A worktree is also just a sibling directory that shares one repository with your main checkout, so it does nothing to stop an agent from following a relative path like ../ out of it. As of June 2026, treat every one of these as a layer, not a guarantee, and check each tool's official docs because defaults shift between versions.
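
To make the deny-list gap concrete, here is a minimal sketch of a naive pattern matcher, assuming a POSIX shell. `is_denied` and its three patterns are hypothetical, and real tools apply their own matching rules, so treat this as an illustration of the failure mode rather than a model of any product. The commands are only compared as strings and are never executed.

```shell
# Hypothetical matcher: succeed if a command string matches any pattern
# in a small deny-list. The patterns are illustrative only.
is_denied() (
  cmd=$1
  case "$cmd" in
    "rm -rf "*|"git push --force"*|*"DROP DATABASE"*) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Three command strings with the same destructive effect on ./data.
# They are printed, not run.
for cmd in "rm -rf ./data" "rm -r -f ./data" "find ./data -delete"; do
  if is_denied "$cmd"; then
    echo "blocked: $cmd"
  else
    echo "allowed: $cmd"
  fi
done
```

Only the first string is blocked. The other two delete the same directory and pass straight through, which is why a deny-list works as one layer among several.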

Common Mistakes

Treating "don't delete X" in the prompt as a safety mechanism. It is a hope, not a boundary.
Running in a bypass/auto mode by default for speed, then forgetting it is on.
Letting the agent work on your only copy instead of a branch or worktree, with nothing committed.
Leaving production database credentials or broad cloud keys in the agent's environment.
Mounting your whole home directory into the agent's container instead of just the project.
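
The last mistake is easy to catch with a check before the container starts. This is a minimal sketch assuming a POSIX shell and an absolute path as input. `safe_to_mount` is a hypothetical name, and "too broad" is defined here only as the filesystem root, your home directory, or any parent of it.

```shell
# Hypothetical guard: refuse to mount a directory that is too broad.
# Expects an absolute path. Succeeds only for paths that are not "/",
# not $HOME, and not a parent of $HOME.
safe_to_mount() (
  dir=$1
  home=${HOME%/}
  if [ "$dir" = / ]; then
    echo "refuse: / is the whole filesystem"
    exit 1
  fi
  dir=${dir%/}   # drop one trailing slash so /home/ and /home compare equal
  case "$home/" in
    "$dir"/*)
      echo "refuse: '$dir' is or contains your home directory"
      exit 1 ;;
  esac
  echo "ok: $dir"
)
```

One way to use it is `safe_to_mount "$PWD" && docker run --rm -it -v "$PWD":/work -w /work your-agent-image`, so the container only starts when the current directory looks like a single project.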

Tool Alternatives

Tool	How it handles destructive-action safety (as of June 2026, verify in docs)
OpenAI Codex	Configurable sandbox modes — read-only, workspace-write (writes limited to the workspace), and a full-access mode — plus an approval policy so the agent asks before running commands. Recommend starting read-only with approval.
Claude Code	Permission modes — default (asks), plan (read-only planning), acceptEdits (auto-accepts edits), bypassPermissions (no prompts — dangerous) — plus allow/deny permission rules. Recommend default or plan plus a deny list.
Cursor	Agent controls for auto-run, allow/deny commands, and reviewing changes before they apply. Check Cursor's own docs for the exact current settings rather than trusting any fixed UI label.

FAQ

Can I just tell the AI agent not to delete my files?

You can, and you should, but do not rely on it. Natural-language instructions are not a security boundary — Docker has published guidance to that effect, and the widely reported 2025 incidents involved agents that ignored explicit, repeated do-not-touch instructions. A prompt sets intent; it does not constrain capability. The reliable fix is to remove the capability you don't want the agent to have, with permission modes, deny rules, isolation, and scoped credentials. Treat the instruction as a courtesy and the isolation as the actual control.

Is Codex or Claude Code safer for someone worried about deletions?

Both ship real controls, so the safer choice is whichever you'll actually configure conservatively. Codex offers sandbox modes (read-only, workspace-write, full access) plus an approval policy; Claude Code offers permission modes (default, plan, acceptEdits, bypassPermissions) plus allow/deny rules. As of June 2026, start Codex read-only with approval required, or Claude Code in default or plan mode with a deny list. The tool matters less than the mode you run it in — check each tool's official docs because defaults change between versions.

Do I really need a container, or is a git worktree enough?

For most small projects, a committed branch or git worktree covers the common case: an agent damaging files inside the project, recoverable with a discard. A container or VM adds the next layer — it limits what the agent can reach beyond the project, which the worktree alone does not. As of June 2026, our rule of thumb is: worktree plus commit for everyday edits, container with only the project mounted for anything that touches credentials, networks, or unfamiliar code.

What's the most common mistake that leads to lost files?

Running the agent with broad permissions on your only, uncommitted copy. When nothing is committed and the agent can write anywhere, a single overreaching cleanup step has no undo. The pitfall compounds when people enable an auto-accept or bypass mode "just for this task" and forget it is on. Commit first, work on a disposable branch or worktree, and keep the agent in the most restrictive mode that still does the job. In our testing, that trio made every bad run a one-command discard.

If the agent already deleted something, can I get it back?

It depends entirely on what you had in place beforehand. If the files were committed to git, you can recover them from history, for example with git checkout or git restore. If the damage is confined to a worktree branch, you can reset that branch or discard the worktree. If the files were never committed and not in a backup, recovery ranges from hard to impossible. This is exactly why the guardrails are preventive, not reactive: set them up before the run. As of June 2026, also keep ordinary backups; no agent guardrail replaces them.
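
For the "committed to git" case, recovery usually comes down to one of two commands. The sketch below assumes a POSIX shell and a file that was committed at some point. `restore_from_head` and `restore_from_history` are hypothetical wrapper names around standard git commands. The second one assumes the most recent commit touching the path is the one that deleted it.

```shell
# Case 1: the file was deleted from the working tree, but the deletion
# has not been committed. Restore it from the last commit (HEAD).
restore_from_head() (
  git checkout HEAD -- "$1"
)

# Case 2: the deletion was already committed. Find the most recent commit
# that touched the path (assumed to be the deletion) and restore the
# version from its parent.
restore_from_history() (
  path=$1
  deleting_commit=$(git rev-list -n 1 HEAD -- "$path")
  if [ -z "$deleting_commit" ]; then
    echo "no history found for $path"
    exit 1
  fi
  git checkout "$deleting_commit^" -- "$path"
)
```

Both leave the recovered file in the working tree and staged in the index, so check it and commit it yourself. Neither helps with files that were never committed.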

Final Recommendation

Pick isolation over instruction. Run your agent in the most restrictive mode that still works, commit before every run, keep the work in a disposable branch or git worktree, add a deny-list, and contain anything risky in a container with only the project mounted. None of these is exotic, and together they turn "the agent deleted my files" from a catastrophe into a discarded experiment. As of June 2026, verify the specific modes and flags in each tool's official docs, because they change between versions.

👉 Bookmark this five-guardrail routine and set up the worktree pattern once, so it's already in place the next time you hand a task to an agent. For more on running these tools safely day to day, see our AI Automation guides.

Related Guides

How to stop Codex from filling your disk and wearing your SSD (the safe-deletion mindset applies there too)
The 5 problems with running AI coding agents locally — and a maintenance routine
ChatGPT vs Claude vs Gemini for coding web apps
More AI Automation guides
