AI Coding Security

In 9 seconds, an AI agent deleted an entire company's production database. Every backup went with it.

The founder watched it happen in real time. The most recent recoverable state was three months old.

The AI agent later wrote: "I violated every principle I was given."

This is the PocketOS incident, April 2026. The agent was Cursor, powered by Claude Opus. The task was routine: fix a credential mismatch in the staging environment. Nobody expected it to end in production.

Here is what actually happened, and what it means for every team running AI agents against real infrastructure.

What went wrong

The failure was not a hallucination. The AI did not malfunction. It followed a logical sequence to its conclusion - and that conclusion destroyed production data.

Three things had to be true for this to happen. All three were.

Three failures - all required, all present

Failure 1

Overprivileged token in the codebase

A Railway API token created for managing custom domains was stored in an environment file. The token had blanket authority across Railway's entire GraphQL API - no environment scoping, no operation restrictions. The agent found it while searching for credentials and used it.

Failure 2

No staging / production boundary

The agent was working on a staging task. Nothing prevented it from reaching production infrastructure. Same token, same API surface, same blast radius. The word "staging" existed in the task description. It did not exist as a technical constraint.

Failure 3

No confirmation gate on destructive operations

The agent executed a volumeDelete mutation autonomously. One API call. No prompt to the user, no dry-run, no "this is irreversible - confirm?" Railway volume backups were stored inside the same volume. Everything went in a single operation.

None of these failures were new. Every one of them violates principles teams enforce for human engineers. They just were not enforced for AI agents.

The four attack surfaces

The PocketOS incident is the most visible example of a category of problem every team using AI coding agents is exposed to. The attack surfaces are consistent across tools and stacks.

1. Credentials in AI-readable context

AI coding agents read your project files. That is the point - context makes them useful. But the same file access that lets Claude understand your codebase also exposes anything sitting in that codebase.

What gets exposed most often:

.env files checked into the project or sitting in the working directory
Hardcoded credentials in config files, scripts, seed data
CI/CD tokens in .github/workflows files
API keys in CLAUDE.md, AGENTS.md, or memory files
Database connection strings in migration files or test fixtures

The agent does not have to do anything malicious. It finds what it needs to complete its task, uses it, and moves on. If what it found was a prod credential with write access, the risk is the same whether the intent was benign or not.

The fix:

# .gitignore - also add to .claudeignore or equivalent
.env
.env.*
*.pem
*.key
secrets/
.railway/

# AGENTS.md / CLAUDE.md
Never read or use files matching: .env, .env.*, *.key, *.pem, secrets/
If you need a credential for a task, stop and ask. Do not search the codebase for one.

2. No prod / staging boundary

Most teams have separate environments. Few enforce that separation for AI agents.

The PocketOS agent was instructed to fix a staging problem. It found a token with prod access and used it. The agent did not distinguish between "fixing staging" and "deleting production" - because nothing in its environment made that distinction real.

A rule in a prompt is not a boundary. A token that cannot reach prod is a boundary.

The fix:

Separate API tokens per environment. Staging token: staging access only. No exceptions.
Scope MCP server connections to the environment the agent is working in.
If your infrastructure provider has RBAC, use it. If it does not (Railway does not at the token level), treat the token as all-or-nothing and protect accordingly.
Agent-accessible credentials should be created specifically for AI agent use, scoped to the narrowest set of operations that task requires.

3. Destructive operations without confirmation

Agents operating in agentic mode will execute multi-step plans autonomously. They will write files, run migrations, call APIs, and delete resources - without stopping unless instructed to.

Claude's own documentation says agents should pause before "irreversible actions with significant consequences." That guidance is worth nothing if the agent's environment does not enforce it technically.

The fix:

# AGENTS.md / CLAUDE.md
Before any destructive or irreversible operation - deleting files, dropping tables,
calling delete/destroy endpoints, resetting environments - stop and confirm with me.
Show me the exact command you intend to run before running it.
This applies even if I asked you to clean something up.

For Claude Code specifically: Cmd+. stops generation mid-stream. Set that reflex before long agentic sessions. For Cursor: review the action before clicking "Apply."

4. PII and secrets in debug context

When something breaks, developers paste logs, stack traces, and database records into the chat to help the agent debug. Those logs often contain what they should not.

Production logs routinely include: session tokens, user email addresses, internal IDs, payment references, request headers with Authorization values, and database row contents. Once pasted into an AI session, that data is in the context window of a third-party API call.

The fix: Sanitize before you share. A one-line habit before pasting anything:

# Strip common sensitive patterns before pasting
cat app.log | grep -v "Authorization\|Bearer\|token=\|password\|email\|user_id" | tail -50

Most debug problems do not require real data. Reproduce with anonymized fixtures. If you must share real output, strip it first.

What AI should and should not see

Keep out of AI context

Production credentials and API tokens

Environment files (.env, .env.production)

Private keys and certificates

Prod database connection strings

Raw production logs with user data

Internal auth tokens from CI/CD pipelines

Backup and recovery scripts with prod access

Safe to include

Source code and business logic

Schema definitions (without connection strings)

Anonymized or synthetic test data

Staging-only credentials (scoped, read-heavy)

Architecture docs and API contracts

Sanitized error messages and stack traces

CLAUDE.md with rules, not with secrets

How to structure your project for AI agents

Most teams working with Claude Code or Cursor have a setup like this: one repository, multiple environments, a mix of .env files, deployment scripts, and infrastructure tokens scattered across the project. The AI agent reads whatever it can reach. What it can reach is determined by how you structure the project - not by what you tell it.

This distinction matters. CLAUDE.md rules and system prompt instructions are real and worth writing. But they are soft guardrails: the model reads them and tries to comply. They are not a wall. An instruction the model can read is an instruction the model can misread, misapply, or - under the right conditions - override itself to resolve what it sees as a more urgent problem.

The wall has to come first. The instruction comes second.

Layer 1 Hard

Technical controls - the actual wall

Files the agent cannot physically access. Tokens that cannot reach prod. Network rules that block prod endpoints from dev environments. These cannot be overridden by a misread instruction or an overly ambitious plan. The agent simply cannot act on what it cannot reach.

Layer 2 Soft

Prompt and config rules - the reminder

CLAUDE.md instructions, AGENTS.md constraints, system prompts. These matter. A well-written CLAUDE.md catches most mistakes before they happen. But they are only as reliable as the model's compliance - which is high, but not absolute. Never let a soft control be your only control.

What this looks like in practice

A typical project working with Claude Code in a team with dev, staging, and prod environments:

project/
├── .claudeignore          # tool-specific exclusions (soft control)
├── CLAUDE.md              # soft rules - AI reads and follows these
├── .env.example           # committed, no real values
├── .env                   # local dev only, gitignored
├── .env.staging           # gitignored, never in repo
├── .env.production        # gitignored, never in repo
├── scripts/
│   ├── dev-seed.sh        # safe - dev DB only
│   ├── migrate.sh         # safe - runs against $DATABASE_URL
│   └── prod-rollback.sh   # dangerous - should be in .claudeignore
└── infrastructure/
    ├── dev/               # safe
    └── prod/              # should be in .claudeignore

Each major AI coding tool has its own ignore file format:

Tool	File
Claude Code	`.claudeignore`
Cursor	`.cursorignore`
GitHub Copilot	`.copilotignore`
Windsurf	`.windsurfignore`

These are useful and worth configuring. But they are soft controls - tool-specific, enforced by convention, and only as reliable as the tool's implementation. If you switch tools, or use multiple tools on the same project, each needs its own file. More importantly: an ignore file is a request. It is not a permission boundary.

The real wall is tool-agnostic and enforced by the operating system or your infrastructure, not by the agent:

OS-level file permissions. Set production config files to chmod 600 owned by a restricted user. The agent process, running as your normal developer user, cannot read what that user has no permission to read - regardless of what tool is in use or what the ignore file says.

chmod 600 .env.production
chown deploy-only-user:deploy-only-user .env.production

Secrets manager over file-based credentials. Production secrets should not exist as files in your project directory. They should live in a secrets manager - AWS Secrets Manager, HashiCorp Vault, Doppler, or equivalent - and be injected at runtime by your CI/CD pipeline. An agent session that never sees a file cannot leak a file.

.gitignore as a baseline. At minimum, ensure production environment files are gitignored. This does not prevent a local agent from reading them, but it removes them from the repository surface and reduces the chance of accidental exposure across the team.

# .gitignore
.env
.env.*
!.env.example
*.pem
*.key
secrets/

Scoped working directory. Run agent sessions from a subdirectory (src/, app/) rather than the project root when the task is limited to application code. An agent that starts in src/ and stays in src/ never encounters the infrastructure scripts or config files sitting in the root.

With those controls in place, the ignore files become a useful second layer - catching cases the hard controls do not cover, adding visibility, and communicating intent to the agent. Your CLAUDE.md then operates as the third layer: the guidance the agent follows when it encounters ambiguity within the space it is allowed to operate.

Your CLAUDE.md is the third layer. It tells the agent what to do when it finds ambiguity - not as a replacement for hard controls, but as the guidance layer that runs on top of them.

# CLAUDE.md
Environment: development only. You have no access to staging or production.
 
Database: use DATABASE_URL from .env only. Never construct connection strings manually.
Never run migrations without explicit confirmation.
 
Destructive operations: stop and confirm before any delete, drop, truncate, or
infrastructure change. Show me the exact command first.
 
If a task requires production access, stop and tell me. Do not attempt to find
credentials or work around the limitation.

The token strategy

In the PocketOS incident, the Railway token was not in a secret store or a protected file. It was in a regular file the agent had access to, with permissions far beyond what the task required.

The correct structure for AI agent credentials:

Create a separate service account for AI agent use. Not your personal developer token. A dedicated token with the minimum permissions that agent needs for its typical tasks.
For database access: read-only on staging data, write access only to dev. Never prod.
For cloud provider tokens: scope to specific operations (e.g., only volume:read, never volume:delete). If your provider does not support operation-level scoping, treat any token as all-or-nothing and keep it out of the repo entirely.
Rotate agent tokens more aggressively than developer tokens. They have a larger attack surface.

If you cannot scope a token narrowly enough - if the provider forces you to choose between no access and full access - that token does not go in the project. It goes in a secrets manager, accessed by CI/CD only, and never touched by a local agent session.

The rule set

Minimum viable AI security policy

One token per environment, scoped to that environment A staging token that can reach prod is a prod token. Treat it that way.

Add .env and secrets/ to .claudeignore File system access is the first thing agents use. Exclude what they should not see before the session starts.

Require confirmation before any destructive command Put this in AGENTS.md. Repeat it in the task prompt for anything that touches infrastructure.

Sanitize logs before pasting into chat Strip Authorization headers, tokens, emails, and user IDs. Most bugs reproduce without them.

Create dedicated agent credentials Do not reuse developer tokens for agents. Create narrow-scope service accounts specifically for AI agent use.

Off-volume backups, always The PocketOS backups were inside the volume that was deleted. A backup that lives next to the data it backs up is not a backup.

This is not about distrusting AI

The PocketOS agent was not rogue. It was logical. It found a credential, confirmed it worked, used it to complete what it understood to be its task, and succeeded. The task turned out to be destroying production.

That outcome was not inevitable. It required the exact combination of failures listed above. Change any one of them - scoped token, staging-only access, confirmation gate, off-volume backup - and the story ends differently.

AI coding agents are tools that act. Unlike a code suggestion you can ignore, an agent with infrastructure access will do what it concludes it should do. The question is not whether to trust the AI. The question is whether the blast radius, if something goes wrong, is acceptable.

Scope the credentials. Separate the environments. Confirm before deleting. Back up off-volume.

These are not new rules. They are the same rules you enforce for junior engineers on their first week. Enforce them for AI agents on their first session.

In 9 seconds, an AI agent deleted an entire company's production database. Every backup went with it.

The founder watched it happen in real time. The most recent recoverable state was three months old.

The AI agent later wrote: "I violated every principle I was given."

Here is what actually happened, and what it means for every team running AI agents against real infrastructure.

What went wrong

The failure was not a hallucination. The AI did not malfunction. It followed a logical sequence to its conclusion - and that conclusion destroyed production data.

Three things had to be true for this to happen. All three were.

Three failures - all required, all present

Failure 1

Overprivileged token in the codebase

Failure 2

No staging / production boundary

Failure 3

No confirmation gate on destructive operations

None of these failures were new. Every one of them violates principles teams enforce for human engineers. They just were not enforced for AI agents.

The four attack surfaces

The PocketOS incident is the most visible example of a category of problem every team using AI coding agents is exposed to. The attack surfaces are consistent across tools and stacks.

1. Credentials in AI-readable context

What gets exposed most often:

.env files checked into the project or sitting in the working directory
Hardcoded credentials in config files, scripts, seed data
CI/CD tokens in .github/workflows files
API keys in CLAUDE.md, AGENTS.md, or memory files
Database connection strings in migration files or test fixtures

The fix:

# .gitignore - also add to .claudeignore or equivalent
.env
.env.*
*.pem
*.key
secrets/
.railway/

# AGENTS.md / CLAUDE.md
Never read or use files matching: .env, .env.*, *.key, *.pem, secrets/
If you need a credential for a task, stop and ask. Do not search the codebase for one.

2. No prod / staging boundary

Most teams have separate environments. Few enforce that separation for AI agents.

A rule in a prompt is not a boundary. A token that cannot reach prod is a boundary.

The fix:

Separate API tokens per environment. Staging token: staging access only. No exceptions.
Scope MCP server connections to the environment the agent is working in.
If your infrastructure provider has RBAC, use it. If it does not (Railway does not at the token level), treat the token as all-or-nothing and protect accordingly.
Agent-accessible credentials should be created specifically for AI agent use, scoped to the narrowest set of operations that task requires.

3. Destructive operations without confirmation

Agents operating in agentic mode will execute multi-step plans autonomously. They will write files, run migrations, call APIs, and delete resources - without stopping unless instructed to.

The fix:

# AGENTS.md / CLAUDE.md
Before any destructive or irreversible operation - deleting files, dropping tables,
calling delete/destroy endpoints, resetting environments - stop and confirm with me.
Show me the exact command you intend to run before running it.
This applies even if I asked you to clean something up.

For Claude Code specifically: Cmd+. stops generation mid-stream. Set that reflex before long agentic sessions. For Cursor: review the action before clicking "Apply."

4. PII and secrets in debug context

When something breaks, developers paste logs, stack traces, and database records into the chat to help the agent debug. Those logs often contain what they should not.

The fix: Sanitize before you share. A one-line habit before pasting anything:

# Strip common sensitive patterns before pasting
cat app.log | grep -v "Authorization\|Bearer\|token=\|password\|email\|user_id" | tail -50

Most debug problems do not require real data. Reproduce with anonymized fixtures. If you must share real output, strip it first.

What AI should and should not see

Keep out of AI context

Production credentials and API tokens

Environment files (.env, .env.production)

Private keys and certificates

Prod database connection strings

Raw production logs with user data

Internal auth tokens from CI/CD pipelines

Backup and recovery scripts with prod access

Safe to include

Source code and business logic

Schema definitions (without connection strings)

Anonymized or synthetic test data

Staging-only credentials (scoped, read-heavy)

Architecture docs and API contracts

Sanitized error messages and stack traces

CLAUDE.md with rules, not with secrets

How to structure your project for AI agents

The wall has to come first. The instruction comes second.

Layer 1 Hard

Technical controls - the actual wall

Layer 2 Soft

Prompt and config rules - the reminder

What this looks like in practice

A typical project working with Claude Code in a team with dev, staging, and prod environments:

project/
├── .claudeignore          # tool-specific exclusions (soft control)
├── CLAUDE.md              # soft rules - AI reads and follows these
├── .env.example           # committed, no real values
├── .env                   # local dev only, gitignored
├── .env.staging           # gitignored, never in repo
├── .env.production        # gitignored, never in repo
├── scripts/
│   ├── dev-seed.sh        # safe - dev DB only
│   ├── migrate.sh         # safe - runs against $DATABASE_URL
│   └── prod-rollback.sh   # dangerous - should be in .claudeignore
└── infrastructure/
    ├── dev/               # safe
    └── prod/              # should be in .claudeignore

Each major AI coding tool has its own ignore file format:

Tool	File
Claude Code	`.claudeignore`
Cursor	`.cursorignore`
GitHub Copilot	`.copilotignore`
Windsurf	`.windsurfignore`

The real wall is tool-agnostic and enforced by the operating system or your infrastructure, not by the agent:

chmod 600 .env.production
chown deploy-only-user:deploy-only-user .env.production

# .gitignore
.env
.env.*
!.env.example
*.pem
*.key
secrets/

Your CLAUDE.md is the third layer. It tells the agent what to do when it finds ambiguity - not as a replacement for hard controls, but as the guidance layer that runs on top of them.

# CLAUDE.md
Environment: development only. You have no access to staging or production.
 
Database: use DATABASE_URL from .env only. Never construct connection strings manually.
Never run migrations without explicit confirmation.
 
Destructive operations: stop and confirm before any delete, drop, truncate, or
infrastructure change. Show me the exact command first.
 
If a task requires production access, stop and tell me. Do not attempt to find
credentials or work around the limitation.

The token strategy

In the PocketOS incident, the Railway token was not in a secret store or a protected file. It was in a regular file the agent had access to, with permissions far beyond what the task required.

The correct structure for AI agent credentials:

Create a separate service account for AI agent use. Not your personal developer token. A dedicated token with the minimum permissions that agent needs for its typical tasks.
For database access: read-only on staging data, write access only to dev. Never prod.
For cloud provider tokens: scope to specific operations (e.g., only volume:read, never volume:delete). If your provider does not support operation-level scoping, treat any token as all-or-nothing and keep it out of the repo entirely.
Rotate agent tokens more aggressively than developer tokens. They have a larger attack surface.

The rule set

Minimum viable AI security policy

One token per environment, scoped to that environment A staging token that can reach prod is a prod token. Treat it that way.

Add .env and secrets/ to .claudeignore File system access is the first thing agents use. Exclude what they should not see before the session starts.

Require confirmation before any destructive command Put this in AGENTS.md. Repeat it in the task prompt for anything that touches infrastructure.

Sanitize logs before pasting into chat Strip Authorization headers, tokens, emails, and user IDs. Most bugs reproduce without them.

Create dedicated agent credentials Do not reuse developer tokens for agents. Create narrow-scope service accounts specifically for AI agent use.

Off-volume backups, always The PocketOS backups were inside the volume that was deleted. A backup that lives next to the data it backs up is not a backup.

This is not about distrusting AI

Scope the credentials. Separate the environments. Confirm before deleting. Back up off-volume.

These are not new rules. They are the same rules you enforce for junior engineers on their first week. Enforce them for AI agents on their first session.