The first time an AI coding agent deleted something we cared about, it had been told — twice, in plain English, in the same session — not to touch that directory. It deleted it anyway, ran a cleanup it thought was helpful, and reported back cheerfully. Nothing was malicious. The agent simply had write access and a plan. That is the uncomfortable lesson behind the "AI coding agent deleted my files" reports that spread through 2025: an agent that can run shell commands and edit files will, eventually, run the wrong one. Telling it to be careful is not the same as making the careless action impossible.
Quick Answer
If you want AI agent file safety, stop relying on instructions and start relying on isolation. The core principle, which Docker has publicly echoed, is that natural-language instructions are not a security boundary — real isolation is. In practice that means five guardrails: run the agent in the most restrictive permission mode that still gets the job done, keep a deny-list of destructive commands and paths, work on a disposable copy (a dedicated git branch or git worktree) and commit before the agent runs so everything is revertible, run the agent in a container or VM with only the project mounted, and give it least-privilege, short-lived credentials. This is for anyone running Codex, Claude Code, or Cursor locally — especially non-engineers vibe-coding small tools. It is not a substitute for backups. As of June 2026, check each tool's official docs; modes and defaults change between versions.
What This Problem Is
Modern AI coding agents do more than suggest code. They run shell commands, edit files, move things around, and delete what they judge to be clutter — often without pausing if you have granted broad permissions. That capability is the whole point, and it is also the whole risk. With wide-open access, an agent can overwrite an important file, drop a database, force-push over your history, or run a destructive command on the wrong path. There are widely reported 2025 cases of AI agents deleting production data or files despite being explicitly, and repeatedly, told not to. Public issue trackers and forums for these tools contain reports of destructive commands such as rm -rf run against the wrong directory. The throughline: an agent following a plausible-looking plan does not know which file is sacred unless something outside its reasoning enforces it.
Who Should Care
- Best for: anyone who lets an AI coding agent run shell commands or edit files on a machine that holds work they can't easily recreate — solopreneurs, small teams, and non-engineers vibe-coding tools or sites without a strong git habit.
- Also useful for: experienced developers who run agents in auto-accept or bypass modes for speed and want a blast-radius limit before something goes wrong.
- Not a concern for: people who only use AI in a chat window, copy-paste snippets by hand, and never grant an agent direct file or terminal access.
What You Need
| Tool | What it does | Official link |
|---|---|---|
| An AI coding agent | Codex, Claude Code, or Cursor — the thing running commands and editing files | OpenAI Codex docs |
| Git | Version control; branches, commits, and worktrees give you a revertible checkpoint | git worktree documentation |
| A container or VM tool | Runs the agent in isolation with only your project mounted | Docker documentation |
| Your terminal / OS | Where permission modes and deny rules are configured | Claude Code permission modes |
The Fix at a Glance
| Risk | Quickest guardrail |
|---|---|
| Agent edits/deletes without asking | Run in the most restrictive mode that works (read-only / plan first; require approval) |
| A known-dangerous command slips through | Maintain a deny-list (e.g. rm -rf, force-push, DB drops, writes outside the project) |
| You can't undo what it did | Work on a branch or git worktree; commit before the agent runs |
| It can reach files beyond the project | Run it in a container/VM with only the project mounted |
| It has powerful credentials | Least-privilege, short-lived tokens; no prod DB creds or broad cloud keys |
Step-by-Step
- Start restrictive. Before you give the agent a task, set the most restrictive mode that still lets it work — read-only or plan first, with an approval step so it asks before running commands. Loosen only when you trust the direction.
- Commit first. Make sure your work is committed (and ideally pushed) before the agent touches anything. A clean commit is your undo button.
- Move to a disposable copy. Create a dedicated branch or a separate git worktree so the agent edits an isolated checkout, not your only copy.
- Add a deny-list. Block the commands and paths that are never acceptable for the agent to run, so a bad plan hits a wall instead of your disk.
- Contain it. For anything riskier than trivial edits, run the agent in a container or VM with only the project directory mounted, so it cannot reach the rest of your machine.
- Scope credentials down. Strip production database credentials and broad cloud keys out of the agent's environment; give it scoped, short-lived tokens only.
- Review the diff. When the agent finishes, read the git diff before merging. The branch/worktree makes a bad run a discard, not a disaster.
Copy-and-Paste Commands
The safest default is a git worktree setup: commit first, then let the agent work in an isolated checkout you can throw away. The git commands below are standard git. The Docker and tool-config snippets that follow are illustrative; check the official docs for the exact schema, because modes, flag names, and config formats change between versions.
# 1) Commit your current work first (your undo button)
git add -A
git commit -m "checkpoint before AI agent run"
# 2) Create an isolated, disposable worktree on a new branch
# macOS / Linux / Windows (Git Bash or PowerShell) — same git command
git worktree add ../agent-sandbox -b agent/experiment
# 3) Point your agent at ../agent-sandbox and let it work there only.
# If the run goes badly, just discard the whole worktree:
git worktree remove ../agent-sandbox --force
git branch -D agent/experiment
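# 4) If a run was good, review it, commit inside the worktree, then merge.
# (Only committed changes are merged; skip the commit if the agent already made one.)
git -C ../agent-sandbox diff main
git -C ../agent-sandbox add -A
git -C ../agent-sandbox commit -m "agent changes"
git switch main
git merge agent/experiment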
# Run the agent in a container with ONLY the project mounted (illustrative)
# Adjust the image/flags to your setup — check the Docker docs for exact syntax.
docker run --rm -it \
-v "$PWD":/work -w /work \
--network none \
your-agent-image
# Windows PowerShell variant of the volume mount:
docker run --rm -it -v "${PWD}:/work" -w /work --network none your-agent-image
# ILLUSTRATIVE ONLY — check the official docs for the exact schema.
# Claude Code: a deny rule blocking a dangerous command pattern.
# See https://code.claude.com/docs/en/permission-modes and the permissions page.
{
"permissions": {
"deny": [
"Bash(rm -rf *)",
"Bash(git push --force*)"
]
}
}
# OpenAI Codex: start read-only and require approval (illustrative flags).
# Check https://developers.openai.com/codex/ for current sandbox/approval options.
# The approval flag expects a policy value; on-request is shown as an example.
codex --sandbox read-only --ask-for-approval on-request
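The commands above leave out the credentials guardrail. As a minimal sketch, assuming a POSIX shell with awk available, the helper below lists environment variables whose names look like secrets, so you can see what an agent launched from this shell would inherit. The function name list_secret_vars and the name patterns are illustrative assumptions, not a complete list and not part of any tool.

```shell
# Print the names of environment variables that look like credentials.
# The patterns are a hypothetical starting point; extend them for your stack.
list_secret_vars() {
  # awk's ENVIRON array holds the environment a child process would inherit.
  awk 'BEGIN { for (name in ENVIRON) print name }' |
    while IFS= read -r name; do
      case $name in
        *TOKEN*|*SECRET*|*PASSWORD*|*API_KEY*|AWS_*|DATABASE_URL)
          printf '%s\n' "$name" ;;
      esac
    done | sort
}

# Usage (not run here): review the list, then start the agent with a minimal
# environment instead of your full one. "your-agent-command" is a placeholder.
#   list_secret_vars
#   env -i PATH="$PATH" HOME="$HOME" TERM="$TERM" your-agent-command
```

Starting from an empty environment with env -i and adding back only what the agent needs is usually easier to reason about than trying to unset every secret one by one.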
Example: What You'll See
The failure mode is quiet, not dramatic. You ask the agent to "clean up the build artifacts," it decides a whole folder is an artifact, and the terminal scrolls past something like this before you can react:
$ # agent runs, broad permissions, no approval step
Running: rm -rf ./build ../shared-assets
Removed 1,284 files.
Done. The workspace is now tidy.
$ git status
fatal: not a git repository (or any of the parent directories): .git
By the time you read "Done," the files are gone. If they were never committed and the agent had reach beyond the project, there is no clean way back.
Example: After the Fix
With the guardrails in place, the same overreaching plan ends safely. The agent is in a worktree, in read-only-then-approve mode, and the destructive step stops at a wall:
$ # agent proposes the same cleanup, but approval + deny-list are on
Proposed: rm -rf ./build ../shared-assets
[blocked] command matches deny rule; path is outside the workspace
Awaiting your approval...
$ # you decline the out-of-scope part, approve the safe part
$ git -C ../agent-sandbox status
On branch agent/experiment
nothing to commit, working tree clean
Worst case, you run git worktree remove ../agent-sandbox --force and your real branch is untouched.
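The "path is outside the workspace" verdict in the transcript above can be approximated in a few lines of shell. This is a sketch under stated assumptions, not how any particular agent implements it: inside_workspace is a hypothetical helper, and it only resolves paths whose parent directory already exists.

```shell
# Succeed (exit 0) only if path $2, interpreted relative to workspace $1,
# resolves to the workspace itself or something beneath it.
# Exit 1 means "outside"; exit 2 means the path could not be resolved.
inside_workspace() {
  local workspace resolved
  workspace=$(unset CDPATH; cd -- "$1" 2>/dev/null && pwd -P) || return 2
  resolved=$(
    unset CDPATH
    cd -- "$workspace" || exit 2
    if [ -d "$2" ]; then
      # Existing directory: resolve it fully, following ".." and symlinks.
      cd -- "$2" 2>/dev/null && pwd -P
    else
      # File or not-yet-existing path: resolve its parent directory instead.
      cd -- "$(dirname -- "$2")" 2>/dev/null &&
        printf '%s/%s\n' "$(pwd -P)" "$(basename -- "$2")"
    fi
  ) || return 2
  case "$resolved/" in
    "$workspace"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the layout from the example, inside_workspace . ./build succeeds while inside_workspace . ../shared-assets fails, because the second path resolves to a sibling of the project rather than a child. A check like this is only a convenience; the container mount is what actually enforces the boundary.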
Tested Notes
- Input type: a small web project plus a shared-assets folder one level above it, used to test whether an agent would write outside its workspace.
- Tool used: Claude Code in plan/default mode with a deny list, and OpenAI Codex CLI started read-only with approval required.
- Best result: the worktree-plus-commit pattern — every bad run was a one-command discard, with the real branch never at risk.
- What failed: relying on a plain-English "do not touch the shared folder" instruction with broad permissions on; the agent still reached for it.
- Manual edits still needed: writing the deny-list patterns for our paths, and reviewing the git diff before every merge; neither is automatic.
Pitfalls We've Actually Hit
Permission modes are easy to loosen and easy to forget you loosened. We have turned on an auto-accept or bypass mode "just for this one task," gotten interrupted, and come back to an agent still running with no guardrails. We have also leaned on a deny-list and assumed it was airtight, but a deny-list only catches the patterns you thought of, and an agent can phrase a destructive action in a way you didn't anticipate. And a worktree only gives you an undo for committed work: untracked or uncommitted files are still exposed, and without a container the agent can still reach your original checkout. As of June 2026, treat every one of these as a layer, not a guarantee, and check each tool's official docs because defaults shift between versions.
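The deny-list gap described above is easy to see with a toy matcher. The sketch below is not any tool's real rule engine; is_denied, check_commands, and the patterns are hypothetical, and nothing here executes the commands it inspects.

```shell
# Return 0 (denied) if the command string matches a known-dangerous pattern.
# Hypothetical patterns for illustration; real tools use their own rule syntax.
is_denied() {
  case $1 in
    'rm -rf '*|'git push --force'*|'git push -f'*|*'DROP DATABASE'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a verdict for each candidate command without running any of them.
check_commands() {
  local cmd
  for cmd in "$@"; do
    if is_denied "$cmd"; then
      printf 'DENY   %s\n' "$cmd"
    else
      printf 'ALLOW  %s\n' "$cmd"
    fi
  done
}
```

Calling check_commands 'rm -rf ./build' 'rm -fr ./build' 'find . -delete' reports DENY for the first command and ALLOW for the other two, even though all three delete files. That is the whole pitfall: pattern matching narrows the risk, and isolation is what covers the phrasings you did not list.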
Common Mistakes
- Treating "don't delete X" in the prompt as a safety mechanism. It is a hope, not a boundary.
- Running in a bypass/auto mode by default for speed, then forgetting it is on.
- Letting the agent work on your only copy instead of a branch or worktree, with nothing committed.
- Leaving production database credentials or broad cloud keys in the agent's environment.
- Mounting your whole home directory into the agent's container instead of just the project.
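The last mistake in the list is simple to guard against mechanically. As a sketch, assuming Docker and the placeholder image name your-agent-image used earlier, the hypothetical helper below prints a docker command for review and refuses outright when the directory is / or your home directory.

```shell
# Print a docker command that mounts only the given project directory.
# Refuses "/" and $HOME. "your-agent-image" is a placeholder image name.
agent_container_cmd() {
  local project home_dir image
  image=${2:-your-agent-image}
  project=$(unset CDPATH; cd -- "$1" 2>/dev/null && pwd -P) || {
    echo "not a directory: $1" >&2
    return 2
  }
  home_dir=$(unset CDPATH; cd -- "${HOME:-/}" 2>/dev/null && pwd -P) || home_dir=
  if [ "$project" = "/" ] || [ "$project" = "$home_dir" ]; then
    echo "refusing to mount $project: too broad for an agent" >&2
    return 1
  fi
  # Printed for review rather than executed; paths containing quotes need care.
  printf 'docker run --rm -it -v "%s":/work -w /work --network none %s\n' \
    "$project" "$image"
}
```

Printing the command instead of running it keeps a human in the loop for the one step that decides what the agent can reach.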
Tool Alternatives
| Tool | How it handles destructive-action safety (as of June 2026, verify in docs) |
|---|---|
| OpenAI Codex | Configurable sandbox modes — read-only, workspace-write (writes limited to the workspace), and a full-access mode — plus an approval policy so the agent asks before running commands. Recommend starting read-only with approval. |
| Claude Code | Permission modes — default (asks), plan (read-only planning), acceptEdits (auto-accepts edits), bypassPermissions (no prompts — dangerous) — plus allow/deny permission rules. Recommend default or plan plus a deny list. |
| Cursor | Agent controls for auto-run, allow/deny commands, and reviewing changes before they apply. Check Cursor's own docs for the exact current settings rather than trusting any fixed UI label. |
FAQ
Can I just tell the AI agent not to delete my files?
You can, and you should, but do not rely on it. Natural-language instructions are not a security boundary — Docker has published guidance to that effect, and the widely reported 2025 incidents involved agents that ignored explicit, repeated do-not-touch instructions. A prompt sets intent; it does not constrain capability. The reliable fix is to remove the capability you don't want the agent to have, with permission modes, deny rules, isolation, and scoped credentials. Treat the instruction as a courtesy and the isolation as the actual control.
Is Codex or Claude Code safer for someone worried about deletions?
Both ship real controls, so the safer choice is whichever you'll actually configure conservatively. Codex offers sandbox modes (read-only, workspace-write, full access) plus an approval policy; Claude Code offers permission modes (default, plan, acceptEdits, bypassPermissions) plus allow/deny rules. As of June 2026, start Codex read-only with approval required, or Claude Code in default or plan mode with a deny list. The tool matters less than the mode you run it in — check each tool's official docs because defaults change between versions.
Do I really need a container, or is a git worktree enough?
For most small projects, a committed branch or git worktree covers the common case: an agent damaging files inside the project, recoverable with a discard. A container or VM adds the next layer — it limits what the agent can reach beyond the project, which the worktree alone does not. As of June 2026, our rule of thumb is: worktree plus commit for everyday edits, container with only the project mounted for anything that touches credentials, networks, or unfamiliar code.
What's the most common mistake that leads to lost files?
Running the agent with broad permissions on your only, uncommitted copy. When nothing is committed and the agent can write anywhere, a single overreaching cleanup step has no undo. The pitfall compounds when people enable an auto-accept or bypass mode "just for this task" and forget it is on. Commit first, work on a disposable branch or worktree, and keep the agent in the most restrictive mode that still does the job. That trio prevents the great majority of these incidents.
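One way to make "commit first" hard to forget is a small preflight check in front of whatever command starts your agent. The sketch below assumes git is installed; tree_is_clean and run_agent_if_clean are hypothetical names, and the agent command is whatever you pass in.

```shell
# Succeed only if the current directory is in a git repo that has at least one
# commit and no staged, unstaged, or untracked changes.
tree_is_clean() {
  git rev-parse --verify --quiet HEAD >/dev/null 2>&1 || return 1
  [ -z "$(git status --porcelain 2>/dev/null)" ]
}

# Hypothetical wrapper: refuse to start the agent on uncommitted work.
# Everything after the function name is the agent command you would run.
run_agent_if_clean() {
  if ! tree_is_clean; then
    echo "uncommitted changes (or no commits yet): commit before running the agent" >&2
    return 1
  fi
  "$@"
}
```

Note that git status does not list ignored files, so anything matched by .gitignore (often .env files and local databases) passes this check while still having no undo.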
If the agent already deleted something, can I get it back?
It depends entirely on what you had in place beforehand. If the files were committed to git, you can recover them from history; if they lived in a worktree branch you can reset or restore. If they were never committed and not in a backup, recovery ranges from hard to impossible. This is exactly why the guardrails are preventive, not reactive — set them up before the run. As of June 2026, also keep ordinary backups; no agent guardrail replaces them.
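For the recoverable case, where tracked files were deleted from disk but the deletion was never staged or committed, a minimal recovery sketch looks like this. It assumes a git version that includes git restore, and restore_deleted_files is a hypothetical helper name.

```shell
# Restore every tracked file that is missing from the working tree, using the
# copy git still holds in the index. Run it from inside the affected checkout.
# Assumes plain file names (no newlines or characters git would quote).
restore_deleted_files() {
  git ls-files --deleted |
    while IFS= read -r path; do
      git restore -- "$path" && printf 'restored %s\n' "$path"
    done
}
```

If the deletion was already staged or committed, the index no longer holds the file; in that case git restore --source=<commit> -- <path>, pointed at a commit that still contains it, is the usual next step.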
Final Recommendation
Pick isolation over instruction. Run your agent in the most restrictive mode that still works, commit before every run, keep the work in a disposable branch or git worktree, add a deny-list, and contain anything risky in a container with only the project mounted. None of these is exotic, and together they turn "the agent deleted my files" from a catastrophe into a discarded experiment. As of June 2026, verify the specific modes and flags in each tool's official docs, because they change between versions.
👉 Bookmark this five-guardrail routine and set up the worktree pattern once, so it's already in place the next time you hand a task to an agent. For more on running these tools safely day to day, see our AI Automation guides.

Lingye
