Collaborative Diagramming with AI: Adding Visual Feedback to draw.io

code-explorations
How adding a screenshot feedback loop to draw.io’s official Claude Code skill enables true collaborative iteration between human and AI
Published March 6, 2026

Diagramming with an AI assistant should be straightforward: describe what you want, get a diagram, iterate. The official draw.io MCP repository offers four approaches to this problem, but they all share a blind spot: Claude can generate diagrams it can never see. Adding a screenshot feedback loop turns one of those approaches into a genuinely collaborative workflow.

The official approaches

The jgraph/drawio-mcp repo provides four integration paths:

  1. MCP App Server — Renders diagrams inline in chat as interactive iframes. Designed for Claude.ai and VS Code, not the CLI.
  2. MCP Tool Server — Compresses diagram XML into a draw.io URL and opens it in the browser editor. Supports XML, CSV, and Mermaid input. One-way: Claude sends a diagram out but has no mechanism to read back edits or see the rendered result.
  3. Skill + CLI — A Claude Code skill that generates .drawio XML files directly and opens them in the draw.io desktop app, with optional export to PNG/SVG/PDF via the desktop app. No MCP setup required.
  4. Project Instructions — Uses Python code execution in Claude.ai Projects to generate shareable draw.io URLs. Not applicable to CLI workflows.

Approach #3 is the closest to what we wanted — it works in Claude Code, produces real .drawio files, and opens them in the desktop app where both human and AI can work with them. We used the MCP Tool Server (#2) initially but found the same core limitation: Claude generates XML and opens it in the editor, but the workflow is effectively one-way. There is no mechanism for Claude to observe the rendered result or pick up changes the human makes in the GUI.

The missing piece: visual feedback

All four approaches share a fundamental gap. Claude generates diagram XML — explicit x/y coordinates, dimensions, edge waypoints — but is blind to what that XML looks like when rendered. The workflow becomes:

  1. Claude generates a diagram
  2. Human opens the file, looks at it
  3. Human describes what’s wrong in natural language
  4. Claude regenerates, blind to the actual visual result
  5. Repeat

This works for trivial diagrams but breaks down for anything requiring spatial reasoning — node positioning, alignment, edge routing, visual balance. Claude places shapes at coordinates whose rendered result it never sees.

We also tried using the draw.io web UI as the shared surface, but it fought the workflow. Every external file modification triggered disruptive modal dialogs about unsaved changes, making the frequent-write iteration loop untenable.

Adding the feedback loop

The official Skill + CLI approach (#3) gets the foundation right: generate .drawio XML directly, open it in the desktop app, no MCP overhead. What it lacks is a way for Claude to close the loop — to see the rendered result and pick up human edits.

Our skill extends that approach with two additions:

Screenshot capture. A small Swift script enumerates draw.io windows via CoreGraphics, and macOS screencapture -l <windowID> grabs the window without bringing it to the foreground. Claude reads the screenshot to assess the visual result.

# Find the draw.io window ID (outputs: windowID|windowTitle)
swift find_drawio_window.swift
# 97345|SearchMinutes Sequence Diagram.drawio

# Capture without foregrounding
screencapture -l 97345 -x /tmp/drawio_screenshot.png
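The enumeration half of that helper can be sketched with CoreGraphics' window-list API. This is a minimal sketch, not the actual find_drawio_window.swift: the owner-name match "draw.io" is an assumption, and reading window titles may require the Screen Recording permission on recent macOS.

```swift
import CoreGraphics
import Foundation

// List on-screen windows and print "windowID|windowTitle" for any window
// owned by the draw.io desktop app (owner name "draw.io" is an assumption).
let options: CGWindowListOption = [.optionOnScreenOnly, .excludeDesktopElements]
guard let windows = CGWindowListCopyWindowInfo(options, kCGNullWindowID)
        as? [[String: Any]] else { exit(1) }

for window in windows {
    guard let owner = window[kCGWindowOwnerName as String] as? String,
          owner == "draw.io",
          let id = window[kCGWindowNumber as String] as? Int else { continue }
    // Title may be empty without the Screen Recording permission.
    let title = window[kCGWindowName as String] as? String ?? ""
    print("\(id)|\(title)")
}
```

The printed `windowID|windowTitle` pairs feed directly into `screencapture -l`, as shown above.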

Round-trip XML reading. Before each edit, Claude re-reads the .drawio file from disk. If the human made adjustments in the GUI and saved, Claude picks up those changes rather than overwriting them. draw.io silently reformats XML on save (reordering attributes, changing whitespace), so the re-read is essential — Claude cannot assume its last-written version is current.
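The re-read discipline can be approximated with a plain file comparison — a sketch with illustrative filenames; the real skill simply re-reads the XML before every edit rather than trusting its last write:

```shell
# Keep a copy of what was last written, and compare before the next edit.
# (Filenames are illustrative.)
printf '<mxfile>...</mxfile>\n' > /tmp/diagram.drawio
cp /tmp/diagram.drawio /tmp/diagram.last-write

# ...the human may save changes in the GUI here...

if cmp -s /tmp/diagram.drawio /tmp/diagram.last-write; then
  echo "unchanged since last write"
else
  echo "changed on disk - re-read before editing"
fi
```

Because draw.io reformats the XML on save, even a no-op human save will register as a change here — which is exactly the conservative behavior you want.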

Together, the iteration loop becomes:

  1. Claude creates or edits the .drawio XML file directly
  2. The human opens or reloads the file in draw.io desktop
  3. Claude captures a screenshot to assess the visual result
  4. Claude re-reads the XML, makes targeted edits, and the loop repeats

This gives both human and AI shared visibility into the diagram’s actual state. Claude can see whether nodes are properly aligned, whether edge routing makes sense, whether labels are readable.

What this workflow actually looks like

A typical iteration:

  1. “Draw a system architecture diagram with three services and a message queue”
  2. Claude writes the .drawio XML — explicit coordinates for every shape, edge, label
  3. Human opens the file. Claude screenshots and sees the result.
  4. “The queue node overlaps the auth service. Move it down and add a label to the edge.”
  5. Claude reads the current XML (the human may have tweaked something), edits the coordinates, writes the file
  6. Human reloads. Claude screenshots. Checks alignment, adjusts.

The .drawio format works well for this because everything is explicit — unlike Mermaid or PlantUML, where an auto-layout engine decides positioning, draw.io XML specifies exact x/y coordinates, dimensions, and edge waypoints. Claude has full spatial control.
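A minimal .drawio file makes that explicitness concrete (hypothetical ids, names, and styles): every vertex carries an mxGeometry with absolute coordinates and dimensions, and edges reference vertices by id.

```xml
<mxfile>
  <diagram name="Page-1">
    <mxGraphModel>
      <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="auth" value="Auth Service" style="rounded=1;whiteSpace=wrap;html=1"
                vertex="1" parent="1">
          <mxGeometry x="40" y="40" width="160" height="60" as="geometry"/>
        </mxCell>
        <mxCell id="queue" value="Message Queue" style="shape=cylinder3;whiteSpace=wrap;html=1"
                vertex="1" parent="1">
          <mxGeometry x="280" y="160" width="120" height="80" as="geometry"/>
        </mxCell>
        <mxCell id="e1" value="publish" style="edgeStyle=orthogonalEdgeStyle"
                edge="1" parent="1" source="auth" target="queue">
          <mxGeometry relative="1" as="geometry"/>
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>
```

Moving a node is a matter of editing one x/y pair — which is why targeted coordinate edits, guided by a screenshot, work so well here.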

The trade-off

The main inconvenience is the explicit sync step. After Claude edits the file, the human needs to reload in draw.io. The desktop app sometimes shows a yellow “file modified” banner that triggers a reload on click; other times the file needs to be closed and reopened. There’s no File > Revert in draw.io desktop, which is a minor irritant.

Going the other direction — human edits to Claude — requires the human to save in the GUI before Claude reads the file.

These are real friction points, but they’re predictable and manageable. The alternative is friction that’s either invisible (Claude generating blind) or actively hostile (web UI fighting external edits).

Takeaways

The official draw.io integrations solve the generation problem well. What they don’t address is the iteration problem — refining a diagram together, which requires shared visual state. The Skill + CLI approach provides the right foundation (direct XML, desktop app, no MCP overhead), and adding screenshot capture closes the feedback loop.

Screenshot-based feedback is an underappreciated pattern for AI tool integration. Any desktop application becomes a collaborative surface if the AI can capture and read its window. The approach generalizes beyond draw.io to any GUI tool where the AI manipulates underlying files while observing rendered output.

The explicit sync step, while mildly inconvenient, enforces a useful discipline: both parties always work from the same saved state. There’s no drift between what Claude thinks the diagram looks like and what the human sees. That alignment turns out to matter more than seamless automatic sync would.