Building Agent Bridge for Photoshop: A Paternity Leave Side Quest
I’m currently on paternity leave, which is one of the great benefits I get as an Adobe employee. The plan was simple: disconnect, be present, and enjoy every moment with our new baby. And I’ve done that. Mostly.
The problem is that design, coding, and AI are moving at a blistering pace right now, and I find it hard to fully disconnect. New models come out every week. The creative tools landscape is changing fast. Even with the best intentions, it’s hard to resist.
So here I am, writing about the 48-hour side project I built between tummy time sessions and infant music classes.
The OpenClaw Gateway Drug
It started with OpenClaw.
I’ve followed Peter Steinberger since his days building PSPDFKit in the iOS community. When he started sharing his vision for a personal AI agent you could talk to from anywhere — WhatsApp, Telegram, iMessage, your terminal — I was hooked early.
One of the great underappreciated benefits of OpenClaw is the ability to talk to AI agents on your phone with one hand while propping up a napping or nursing baby. In that sense, it came at the absolute perfect time for me. New parenthood is a lot of sitting still and being present with one arm pinned. OpenClaw turned that constraint into an opportunity. I could explore ideas, kick off coding tasks, and iterate on designs — all one-handed on my phone, while my son dozed on my chest.
Peter’s philosophy on agentic engineering resonated deeply with me. In his post “Just Talk To It”, he argues that the best agent tooling is built on simple, transparent primitives. CLIs over MCPs. Help menus over elaborate prompt scaffolding. As he puts it: “Almost all MCPs really should be CLIs.” The reasoning is compelling — models already understand CLI conventions natively. A well-structured help menu teaches the model everything it needs to know instantly, whereas MCPs impose persistent token costs just to maintain context. The skills pattern that Peter championed and proved out in OpenClaw — where a SKILL.md file gives agents the context they need to wield a CLI effectively — became a cornerstone of how I thought about building Agent Bridge.
What Is Agent Bridge for Photoshop?
Agent Bridge for Photoshop is an open-source automation framework that enables AI agents, such as Claude Code, Codex, and other LLM-powered tools, to control Adobe Photoshop through a standardized command interface.
To put it simply, it’s glue. It combines Adobe’s UXP (Unified Extensibility Platform) features in Photoshop with the CLI and the Skills pattern Peter promoted and demonstrated in OpenClaw. This means you can tell your coding agent to “create a PSD for an ecommerce product page” and watch Photoshop come alive, creating documents, adding layers, setting text, applying effects, and exporting renders, all controlled by natural language through a clear operation pipeline.
Before this project, I had never worked with Adobe’s UXP platform. Even though I’m an Adobe employee, I don’t work on Photoshop. I’m an engineering lead for a creative app for kids called Project Aqua. I wanted to see what I could put together with some quick coding using only publicly available tools. No internal APIs, no special access. Just the same UXP docs, Creative Cloud marketplace, and Photoshop that any developer or designer can use.
Architecture: Three Layers of Glue
The architecture is straightforward. Three components work together to bridge the gap between an AI agent’s text-based world and Photoshop’s visual canvas:
1. The psagent CLI
The command-line interface is the front door. Built with TypeScript and Commander.js, it exposes a structured command tree that any agent can discover and use:
- psagent session start: Initialize a working session
- psagent doc open <path>: Open a PSD file
- psagent doc manifest: Get the document structure as JSON
- psagent layer list: Query layers with optional regex filtering
- psagent op apply <envelope.json>: Execute a batch of operations
- psagent render: Export to PNG or JPG
- psagent checkpoint create/restore: Save and roll back document state
- psagent capabilities: Discover what the adapter supports
Following Peter’s philosophy, the CLI is designed so an agent seeing it for the first time can run psagent --help and quickly understand how to use it. No complex prompt engineering needed.
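To make that concrete, here is a minimal sketch of how a script (or an agent's tool wrapper) might drive the CLI from TypeScript. It uses only the subcommands listed above and assumes psagent is on the PATH and prints its result to stdout. The helper names (applyArgs, runPsagent, readManifest) are hypothetical and not part of Agent Bridge.

```typescript
import { execFileSync } from "node:child_process";

// Build the argument vector for `psagent op apply <envelope.json>`.
// Kept separate from process spawning so it can be checked without Photoshop.
function applyArgs(envelopePath: string): string[] {
  return ["op", "apply", envelopePath];
}

// Run a psagent subcommand and return its raw stdout.
// Assumption: `psagent` is on PATH and writes its result to stdout.
function runPsagent(args: string[]): string {
  return execFileSync("psagent", args, { encoding: "utf8" });
}

// Fetch the document manifest and parse it as JSON.
// The manifest's shape is not documented in this post, so it stays `unknown`.
// The `run` parameter lets callers substitute a fake runner in tests.
function readManifest(run: (args: string[]) => string = runPsagent): unknown {
  return JSON.parse(run(["doc", "manifest"]));
}
```

Passing the runner as a parameter keeps the JSON handling testable on a machine without Photoshop or the daemon running.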
2. The Bridge Daemon
The daemon runs locally on port 43120 and acts as a relay between the CLI and Photoshop. It exposes a simple HTTP-based RPC interface:
- The CLI sends commands to /rpc
- The UXP plugin registers via /bridge/register and long-polls /bridge/poll for queued operations
- Results flow back through /bridge/result
The daemon checks client health with a 20-second timeout and keeps the last 500 events in memory for debugging.
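The bookkeeping behind a relay like this can be sketched in a few lines. This is an illustration under stated assumptions, not the daemon's actual code: the class and method names are hypothetical, and treating each poll as a heartbeat is a guess at how health is tracked. Only the two numbers (20 seconds, 500 events) come from the description above.

```typescript
const HEALTH_TIMEOUT_MS = 20_000; // from the text: 20-second health timeout
const MAX_EVENTS = 500;           // from the text: last 500 events kept in memory

interface BridgeEvent { at: number; kind: string; detail?: string }

class BridgeState {
  private queue: string[] = [];                  // serialized operations waiting for the plugin
  private events: BridgeEvent[] = [];            // bounded debug log
  private lastSeen = new Map<string, number>();  // clientId -> time of last poll

  // The CLI side: queue work for the plugin and log it.
  enqueue(op: string, now: number): void {
    this.queue.push(op);
    this.record({ at: now, kind: "enqueued", detail: op });
  }

  // The plugin side: hand over the next operation, if any.
  // Assumption: a poll also counts as a heartbeat for health checks.
  poll(clientId: string, now: number): string | undefined {
    this.lastSeen.set(clientId, now);
    return this.queue.shift();
  }

  // A client is healthy if it polled within the timeout window.
  isHealthy(clientId: string, now: number): boolean {
    const seen = this.lastSeen.get(clientId);
    return seen !== undefined && now - seen <= HEALTH_TIMEOUT_MS;
  }

  // Append an event, discarding the oldest once the cap is exceeded.
  record(event: BridgeEvent): void {
    this.events.push(event);
    if (this.events.length > MAX_EVENTS) this.events.shift();
  }

  recentEvents(): readonly BridgeEvent[] {
    return this.events;
  }
}
```

A real long-poll handler would hold the HTTP request open until work arrives or a timeout fires. The sketch returns immediately to keep the queue and health logic easy to follow.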
3. The UXP Plugin
This is where the magic happens — the Photoshop panel that receives operation envelopes and translates them into actual Photoshop API calls. It supports over 100 operations across categories:
Documents: Create, open, duplicate, save, resize, crop, rotate, flatten, merge, and change color modes.
Layers: Create pixel layers, groups, text layers, shape layers, and adjustment layers. Move, rename, duplicate, delete, rearrange, merge, rasterize. Set opacity, blend modes, and visibility.
Text: Create text layers, set content, apply styles (font, size, color, alignment, tracking, leading), warp text, and set text on a path.
Smart Objects: Convert layers, replace content, relink sources.
Transforms: Translate, scale, rotate, flip, skew, align, distribute.
Masks & Selection: Create and apply layer masks, clipping masks. Select subjects, color ranges. Expand, contract, feather selections.
Filters & Effects: Gaussian blur, noise, sharpen, motion blur, content-aware fill, and content-aware scale. Apply layer effects (drop shadows, strokes, glows).
Export: PNG, JPG, per-layer export, artboard export.
The Operation Envelope
The core abstraction is the operation envelope — a JSON payload that describes a transaction:
{
  "transactionId": "hero-banner-update",
  "doc": { "ref": "active" },
  "ops": [
    { "op": "setText", "layerName": "headline", "text": "Summer Sale" },
    { "op": "replaceSmartObject", "layerName": "hero-image", "file": "./assets/beach.jpg" },
    { "op": "export", "format": "png", "output": "./renders/hero.png" }
  ],
  "safety": {
    "checkpoint": true,
    "rollbackOnError": true
  }
}
Operations inside an envelope run one after another, with a system that lets later steps refer to layers created earlier. Each operation can set its own error policy — either abort or continue — and the envelope supports a dry-run mode to check things without making changes.
This transactional model is essential. Agents can describe complex multi-step workflows in one clear, auditable payload. If something goes wrong, the checkpoint and rollback features act as a safety net.
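The sequential, policy-driven execution described above can be sketched as follows. This is a simplified model rather than the plugin's real code. The onError field name, the Host interface, the default policy of "abort", and the choice to roll back only when the transaction aborts are all assumptions made for illustration. References between operations and schema validation during dry runs are left out.

```typescript
type ErrorPolicy = "abort" | "continue";

interface Operation {
  op: string;
  onError?: ErrorPolicy; // hypothetical field name for the per-operation policy
  [param: string]: unknown;
}

interface Envelope {
  transactionId: string;
  doc: { ref: string };
  ops: Operation[];
  safety?: { checkpoint?: boolean; rollbackOnError?: boolean };
}

// Hypothetical stand-in for whatever actually talks to Photoshop.
interface Host {
  checkpoint(): string;        // returns a checkpoint id
  restore(id: string): void;   // restores the document to that checkpoint
  apply(op: Operation): void;  // throws on failure
}

interface RunResult { applied: string[]; failed: string[]; rolledBack: boolean }

function runEnvelope(env: Envelope, host: Host, dryRun = false): RunResult {
  const result: RunResult = { applied: [], failed: [], rolledBack: false };
  // In this sketch a dry run simply never touches the host.
  if (dryRun) return result;

  const checkpointId = env.safety?.checkpoint ? host.checkpoint() : undefined;
  for (const op of env.ops) {
    try {
      host.apply(op);
      result.applied.push(op.op);
    } catch {
      result.failed.push(op.op);
      // "continue" skips past the failure; anything else aborts the transaction.
      if ((op.onError ?? "abort") === "continue") continue;
      if (env.safety?.rollbackOnError && checkpointId !== undefined) {
        host.restore(checkpointId);
        result.rolledBack = true;
      }
      break;
    }
  }
  return result;
}
```

Returning a structured result instead of throwing gives the calling agent one payload that says which steps ran, which failed, and whether the document was restored.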
MCP Server: Speaking the Agent’s Language
While the CLI is the primary interface following Peter’s philosophy, Agent Bridge also ships an MCP (Model Context Protocol) server for agents that work natively with MCP. Running psagent mcp-serve starts a stdio-based server that exposes eight tools:
- photoshop_capabilities: Discover what’s available
- photoshop_open_document: Open a file
- photoshop_get_manifest: Get document structure
- photoshop_query_layers: Search layers
- photoshop_apply_ops: Execute operations
- photoshop_render: Export
- photoshop_checkpoint_restore: Undo to checkpoint
- photoshop_events_tail: Debug recent events
Both interfaces — CLI and MCP — use the same underlying adapter, so behavior is identical regardless of how the agent connects.
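For readers who have not looked at MCP on the wire: a tool invocation is a JSON-RPC 2.0 request with the method tools/call, and the stdio transport sends one JSON message per line. The sketch below builds such a request for photoshop_capabilities. The helper names are hypothetical, and the empty arguments object is an assumption, since the tools' input schemas are not listed in this post. A real client would also perform the MCP initialize handshake before calling any tool.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a JSON-RPC request that asks the server to run one tool.
function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Frame a message for the stdio transport: one JSON document per line.
// JSON.stringify without indentation never emits a raw newline.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// Assumption: photoshop_capabilities takes no arguments.
const capabilitiesRequest = frame(toolCall(1, "photoshop_capabilities"));
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging what crosses the pipe.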
Safety First
Working with creative files requires care. One wrong operation could ruin hours of design work. Agent Bridge includes several safety layers:
- Dry-run mode: Validate an entire operation envelope against the schema without touching the document. Catch errors before they happen.
- Checkpoints: Snapshot the document state before risky operations. The system uses Photoshop’s history snapshots as the primary mechanism, with the history state pointer as a fallback.
- Rollback on error: Automatically restore from a checkpoint if any operation fails.
- Per-operation error policies: Each operation in an envelope can independently specify whether to abort the whole transaction or continue past failures.
- Modal safety wrapper: The UXP plugin retries operations up to five times when Photoshop is busy, handling conflicts that arise from concurrent workflows.
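The retry behavior in the last bullet can be approximated with a small wrapper. This is a sketch, not the plugin's implementation: the function name, the isBusy predicate, and the delay values are hypothetical, and it reads "up to five times" as five attempts in total.

```typescript
// Retry `task` while it fails with a "host is busy" style error.
async function withRetry<T>(
  task: () => Promise<T>,
  isBusy: (err: unknown) => boolean, // decides which failures are worth retrying
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Anything other than a busy signal is a real failure: surface it immediately.
      if (!isBusy(err)) throw err;
      lastError = err;
      // Linear backoff between attempts; the delay values are illustrative.
      if (attempt < maxAttempts) await sleep(100 * attempt);
    }
  }
  // Still busy after the final attempt: report the last busy error.
  throw lastError;
}
```

Retrying only on a busy signal matters here. Repeating an operation that failed for any other reason could apply a destructive edit twice.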
The Documentation Story: Mintlify and SKILL.md
Building an agent-facing tool taught me something I already believed but now feel deeply: documentation isn’t just important for humans — it’s critical for agents.
I chose Mintlify for the Agent Bridge documentation site. Mintlify takes an AI-forward approach to documentation that aligns perfectly with agent tooling. Most notably, Mintlify automatically generates a skill.md file from your documentation — a condensed, agent-optimized summary of your product’s capabilities, patterns, and constraints. This file lives at a well-known path (/.well-known/skills/default/skill.md) and stays up to date with every documentation deploy.
The practical upshot is that anyone can install the Agent Bridge skill by pointing directly at the docs domain:
npx skills add https://agent-bridge-for-photoshop.jaredverdi.com
That’s it. The agent receives a comprehensive skill file that teaches it how to use Agent Bridge effectively — what commands to run, which patterns to follow, and which pitfalls to avoid. The link between human-readable docs and agent-readable context happens automatically.
This kind of setup is what makes agent tooling actually work in practice. The best CLI in the world is useless if the agent doesn’t know it exists or how to wield it properly.
Built with Codex, Sharpened by Claude
I want to be open about how this was built. The whole project was vibe-coded with Codex 5.3 Extra High. Every part — the TypeScript CLI, the bridge daemon, the UXP plugin, the MCP server, the tests, even the documentation setup — was created through conversations with Codex. The 20 or so hours of real work were spread over about a week of paternity leave, squeezed between feeds, naps, and the beautiful chaos of a newborn, along with some long-running, unattended Codex overnight sessions.
But building the tool was only half the story. The other half was dogfooding it — and that’s where Claude Code came in.
Once the initial implementation was functional, I switched to Claude Code as a consumer of Agent Bridge to see how a different agent would actually use it. This turned out to be one of the most valuable parts of the whole process. Claude Code immediately started doing things I hadn’t anticipated. It would try operation names that seemed logical but didn’t exist in the catalog. It would structure envelopes in ways that were almost right but didn’t match the schema. It would attempt workflows that made perfect sense from an agent’s perspective, but that I hadn’t thought to support.
The interesting part was watching Claude Code’s innate problem-solving behavior. When an operation failed or produced unexpected results, it wouldn’t just give up — it would create a test PSD, try the operation, export a layer to an image, pull that image back in to visually inspect what happened, and then try again with the new information. This observe-attempt-verify loop was completely unprompted. The agent was developing its own understanding of Photoshop’s behavior through experimentation.
This created a tight feedback cycle. I’d watch Claude Code struggle with something, switch back to Codex to fix the underlying issue, and then return to Claude Code to verify the improvement. The fixes fell into a few categories:
Promoting innate behaviors to first-class support. When Claude Code kept trying an operation name that didn’t exist — but should have — I’d add it. If the agent’s instinct was to call something createTextLayer instead of a longer name, that’s a sign worth paying attention to. The agent’s natural-language sense of what an operation should be called is a better guide to API design than any spec document.
Adding examples and counterexamples to SKILL.md. When Claude Code kept trying a pattern that didn’t work — like setting text color as a property on setText instead of using a separate setTextStyle operation — I added a clear example of the right way to the skill file, plus a counterexample showing the common mistake and why it fails. This kind of documentation acts as guardrails, shaped by real agent behavior instead of guesswork.
Reducing token churn through better defaults. Every failed attempt costs tokens. Every retry uses up the context window. When I restructured operations to accept the format Claude Code naturally used — or to give clearer error messages that helped the agent fix issues on the first try — the overall token efficiency improved significantly. Catering to the agent’s natural behaviors isn’t just good UX; it also saves money when you pay per token.
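One cheap way to produce an error message that helps on the first retry is to suggest the closest known operation name. The sketch below uses Levenshtein edit distance for that. The catalog is just the handful of operation names mentioned in this post, and the function names, message wording, and distance threshold are made up for illustration. It is not how Agent Bridge reports errors.

```typescript
// Operation names that appear in this post; the real catalog is much larger.
const KNOWN_OPS = ["setText", "setTextStyle", "createTextLayer", "replaceSmartObject", "export"];

// Levenshtein distance via dynamic programming, keeping only two rows.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete from a
        curr[j - 1] + 1,    // insert into a
        prev[j - 1] + cost, // substitute (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Build an error message that names the nearest known operation, if one is close.
function unknownOpMessage(name: string, catalog: string[] = KNOWN_OPS): string {
  let best = "";
  let bestDist = Infinity;
  for (const candidate of catalog) {
    const d = editDistance(name.toLowerCase(), candidate.toLowerCase());
    if (d < bestDist) {
      best = candidate;
      bestDist = d;
    }
  }
  // The threshold of 3 edits is arbitrary; it keeps unrelated names from matching.
  const hint = bestDist <= 3 ? ` Did you mean "${best}"?` : "";
  return `Unknown operation "${name}".${hint}`;
}
```

With this catalog, a call with "setTextStyles" gets pointed at setTextStyle, which costs far fewer tokens than a bare failure followed by a search through the docs.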
The lesson was clear: you don’t really know how your agent tool works until you watch another agent try to use it.
Getting Started
If you want to try it:
- Install the skill: npx skills add https://agent-bridge-for-photoshop.jaredverdi.com
- Install the CLI: npm install -g @jverdi/agent-bridge-for-photoshop
- Start the daemon: psagent bridge daemon
- Install the UXP plugin from the Creative Cloud marketplace
- Open Photoshop, open the Agent Bridge panel, and connect
Then tell your agent to create something. Watch it happen.
The whole project is free and open source on GitHub under the MIT license. Contributions and ideas are welcome.
What’s Next
Honestly, more baby time. This was a sprint born from curiosity and the unique conditions of paternity leave — the mix of unstructured time and a strong urge to keep an eye on a fast-changing field.
The project is functional and useful today, but there’s a long tail of improvements: richer text styling primitives, better artboard workflows, tighter MCP integration, and the inevitable stream of Photoshop API updates to keep pace with. I’ll chip away at it.
But for now, the laptop closes and the baby gets held. That’s the whole point of leave, after all.
Agent Bridge for Photoshop is open source and available at github.com/jverdi/agent-bridge-for-photoshop. Documentation is at agent-bridge-for-photoshop.jaredverdi.com. It was built with far too little sleep and an unreasonable amount of enthusiasm.