Prompt Engineering

PromptAssay: Prompt Engineering Workbench

A purpose-built platform for developing production-grade prompts. AI-powered critique across 6 quality dimensions, git-like version control, A/B testing, and prompt chain orchestration. The methodology layer that powers every other system I build.

Jonathan Lasley
Fractional AI Director

6-Dimension Analysis · Version Control · A/B Testing · Prompt Chains

At a Glance

  • AI Critique Framework (6 dimensions): Clarity, Completeness, Structure, Technique, Robustness, Efficiency
  • Version Control (Git-like): Branching, diffing, annotations, full prompt history
  • AI-Powered (5 operations): Critique, Improve, Rewrite, Brainstorm, Compare


The Challenge

Same Tool, Wildly Different Results

Most people use AI the way they used Google in 2005: type something in, hope for the best, try again if it doesn’t work. The same question phrased differently produces wildly different output quality, and most users don’t realize how much capability goes unused.

There’s no institutional knowledge either. When someone discovers a prompt that works well, there’s no system to capture, test, and share it. That insight lives in one person’s chat history until they forget it. Context switching between AI models (Claude, GPT, Gemini) compounds the problem: what works in one model often doesn’t transfer directly to another. Teams end up with inconsistent results depending on which tool they happen to use.

Most teams stay stuck between “AI is impressive in a demo” and “AI reliably produces business-quality output.” The cost: thousands of wasted hours per year across the organization.


The Approach

A Workbench, Not a Library

Decision-makers: the business results are in the next section. This section covers the platform capabilities your team will want to evaluate.

The difference between a team that gets 20% from their AI tools and one that gets 80% is systematic prompt development. PromptAssay is the workbench I built to close that gap: a place to develop, test, version, and refine the prompts that power every AI system. The patterns below are what I teach in workshops and build into client systems.

Technically, it’s a custom web application: Next.js 14, Supabase, Anthropic SDK, 15 database tables, streaming SSE architecture, 20 seeded starter templates.

6-Dimension AI Critique Framework

Every prompt gets evaluated across 6 measurable dimensions:

  • Clarity & Flow: Is the objective stated upfront? Does information flow logically from general to specific?
  • Completeness: Are all essential elements present: purpose, context, domain knowledge, role definition, process instructions?
  • Structure & Coherence: Are instructions grouped logically? Any redundancy or contradiction?
  • Technique Usage: Does the prompt use effective techniques: XML structuring, chain-of-thought reasoning, few-shot examples, output format specification?
  • Robustness & Coverage: Are edge cases handled? Does it work across varying inputs?
  • Efficiency: Is every token earning its place, or is there filler that could be cut?

Each dimension gets a score. The aggregate tells you whether a prompt is production-ready or needs work, and the per-dimension breakdown tells you exactly where.
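To make the scoring model concrete, here is a minimal sketch of how per-dimension scores might roll up into an aggregate. The 1–10 scale, the unweighted average, and the field names are illustrative assumptions, not PromptAssay's actual schema:

```typescript
// Hypothetical sketch of 6-dimension critique aggregation.
// Scale (1-10), equal weighting, and names are assumptions.

type Dimension =
  | "clarity"
  | "completeness"
  | "structure"
  | "technique"
  | "robustness"
  | "efficiency";

type CritiqueScores = Record<Dimension, number>; // each 1-10

function aggregate(scores: CritiqueScores): { overall: number; weakest: Dimension } {
  const entries = Object.entries(scores) as [Dimension, number][];
  const overall = entries.reduce((sum, [, v]) => sum + v, 0) / entries.length;
  // The per-dimension breakdown tells you exactly where to focus next.
  const weakest = entries.reduce((min, cur) => (cur[1] < min[1] ? cur : min))[0];
  return { overall: Math.round(overall * 10) / 10, weakest };
}

const report = aggregate({
  clarity: 8, completeness: 7, structure: 9,
  technique: 6, robustness: 4, efficiency: 8,
});
// report.overall === 7, report.weakest === "robustness"
```

The aggregate answers "is this production-ready?"; the weakest dimension answers "what do I fix first?".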

5 AI-Powered Operations

  • Critique: Full 6-dimension analysis with radar chart visualization, strengths, and specific improvement recommendations.
  • Improve: 3–10 surgical suggestions with exact before/after text, categorized by issue type (buried objective, missing element, coverage gap, technique opportunity). One-click apply with fuzzy matching.
  • Rewrite: Complete prompt regeneration optimizing for structural principles. Shows specific changes made and techniques applied.
  • Brainstorm: Conversational co-pilot for iterating on prompt ideas. Full conversation history with Claude, including the ability to suggest one-click edits.
  • Compare: Side-by-side evaluation of any two prompt versions across 6 axes, with maturity analysis showing how specific components evolved.

Git-Like Version Control

Every edit creates a new version. Full history timeline. Side-by-side diff viewer with highlighted changes. Restore to any prior version (creates a new version, never rewrites history). Branch a new prompt from any specific version. Change source tracking records whether each version came from manual editing, AI critique, AI improve, AI rewrite, or branching. Version annotations let you document why a change was made.
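A minimal sketch of that append-only model: every commit, including a restore, creates a new version carrying its change source and an optional annotation. Field names and the `ChangeSource` values are illustrative assumptions:

```typescript
// Append-only version history: restore copies old content into a NEW
// version instead of rewriting history. Schema names are assumptions.

type ChangeSource = "manual" | "ai-critique" | "ai-improve" | "ai-rewrite" | "branch" | "restore";

interface PromptVersion {
  id: number;
  content: string;
  source: ChangeSource;
  parentId: number | null; // supports branching from any prior version
  annotation?: string;     // why the change was made
}

class PromptHistory {
  private versions: PromptVersion[] = [];

  commit(content: string, source: ChangeSource, parentId: number | null, annotation?: string): PromptVersion {
    const v: PromptVersion = { id: this.versions.length + 1, content, source, parentId, annotation };
    this.versions.push(v); // append-only: never mutate existing versions
    return v;
  }

  // Restoring an old version creates a new one pointing back at it.
  restore(id: number): PromptVersion {
    const old = this.versions.find((v) => v.id === id);
    if (!old) throw new Error(`no version ${id}`);
    return this.commit(old.content, "restore", old.id, `restore of v${id}`);
  }
}
```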

A/B Testing & Evaluation Suites

A/B Testing: Run the same input through multiple prompt versions simultaneously. Side-by-side output comparison with latency, cost, and token counts.

Evaluation Suites: Named test suites with repeatable test cases. Each test case has a name, input, and pass criteria (keywords that must appear in the output). Run all tests at once, see pass/fail results across the suite.
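The suite model above is simple enough to sketch directly: each case carries keywords that must appear in the output, and the runner takes the execution function as a parameter (a model call in production, a stub in tests). The shapes are assumptions:

```typescript
// Sketch of an evaluation suite: keyword-based pass criteria per test case.
// `execute` would call the model in production; it is injected so the
// runner itself stays deterministic and testable.

interface TestCase { name: string; input: string; mustContain: string[] }

function runSuite(
  cases: TestCase[],
  execute: (input: string) => string,
): { name: string; passed: boolean }[] {
  return cases.map((c) => {
    const output = execute(c.input).toLowerCase();
    // Pass only if every required keyword appears (case-insensitive).
    const passed = c.mustContain.every((kw) => output.includes(kw.toLowerCase()));
    return { name: c.name, passed };
  });
}
```

Keyword checks are deliberately crude but cheap; they catch regressions ("the refund policy prompt stopped mentioning 30 days") without needing a second model to grade outputs.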

Prompt Chains

Multi-step workflows where each step’s output feeds the next. Each step links to an existing prompt in the library. Input modes: use previous step’s output, fixed input, or variable from the chain’s initial input. Examples: Research → Summarize → Format. Extract → Validate → Transform.
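The three input modes can be sketched as a discriminated union, with a small executor that threads each step's output into the next. Names and shapes are assumptions, and the `execute` function stands in for the model call:

```typescript
// Sketch of chain execution with the three input modes described above:
// previous step's output, a fixed string, or a variable from the chain's
// initial input. Shapes are illustrative, not the actual schema.

type StepInput =
  | { mode: "previous" }
  | { mode: "fixed"; value: string }
  | { mode: "variable"; name: string };

interface ChainStep { promptId: string; input: StepInput }

function runChain(
  steps: ChainStep[],
  initial: Record<string, string>,
  execute: (promptId: string, input: string) => string, // model call in production
): string {
  let previous = "";
  for (const step of steps) {
    const input =
      step.input.mode === "previous" ? previous :
      step.input.mode === "fixed" ? step.input.value :
      initial[step.input.name] ?? "";
    previous = execute(step.promptId, input);
  }
  return previous; // the final step's output is the chain's output
}
```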

Fragments & Multi-Format Import/Export

Reusable, parameterized prompt building blocks (fragments) that assemble into sophisticated architectures. Update a fragment once and every prompt that uses it improves. Multi-format import/export supports Anthropic, OpenAI, markdown, and PromptAssay bundle formats with full version history.
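Fragment assembly might look like the sketch below: parameterized blocks substitute their placeholders and concatenate into a full prompt, so editing a fragment in the library changes every prompt built from it. The `{{placeholder}}` syntax is an assumption:

```typescript
// Sketch of fragment assembly: reusable blocks with {{placeholders}}
// compose into a full prompt. Placeholder syntax is an assumption.

const fragments: Record<string, string> = {
  role: "You are a {{role}} writing for {{audience}}.",
  format: "Respond in {{format}} with no preamble.",
};

function assemble(fragmentIds: string[], params: Record<string, string>): string {
  return fragmentIds
    .map((id) => {
      const body = fragments[id];
      if (body === undefined) throw new Error(`unknown fragment: ${id}`);
      // Leave unfilled placeholders visible so missing params are obvious.
      return body.replace(/\{\{(\w+)\}\}/g, (_, key) => params[key] ?? `{{${key}}}`);
    })
    .join("\n\n");
}
```

Because assembly happens at build time rather than copy-paste time, a fix to the `role` fragment propagates to every prompt that includes it.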

These capabilities aren’t independent features. They form a development loop: write a prompt, critique it across 6 dimensions, apply targeted improvements, version the result, test it against real inputs, then compose it into chains for production workflows. Each cycle produces a measurably better prompt with a full audit trail. Here’s how the pieces connect:

PromptAssay system architecture showing 6-dimension critique framework, version control, A/B testing, prompt chains, and fragment management

The Results

From Ad-Hoc to Engineered

The 6-dimension critique framework turns subjective quality judgments into measurable scores. Instead of “that looks good,” you know that a prompt scores 8/10 on Clarity but 4/10 on Robustness, and you know exactly which edge cases aren’t covered.

Reusability eliminates the tinkering cycle

Instead of spending 10 minutes crafting a prompt for every task, teams pull a tested template and customize the inputs. That adds up when your team runs 20–30 AI-assisted tasks per day.

Cross-model consistency

Prompts are optimized per model (Claude, GPT, Gemini). Model migrations don’t reset prompt quality to zero. Switching from one model to another takes minutes, not weeks of re-testing.

Foundation layer for every other system

This system underpins every other implementation I’ve built. The AI Win Strategy System’s 5-phase prompt architecture, the AI Content Operations System’s brand voice profiles, the AI subsystems in UpSkalr: all were developed and refined using PromptAssay’s critique, testing, and versioning tools.

Comparison of ad-hoc prompting vs PromptAssay-engineered prompts across quality dimensions, showing measurable improvement in Clarity, Completeness, Structure, Technique, Robustness, and Efficiency

For Your Business

Your Team Is Using 20% of Their AI Tools

Most companies buy AI tools and capture maybe 20% of the capability, because nobody taught the team how to write effective prompts. The tool works. The prompts don’t.

A prompt library is an organizational asset, not a personal skill. It makes AI performance consistent across the entire team, not dependent on whoever happens to be best at talking to ChatGPT. When every person on a 50-person team reclaims even 30 minutes a day through better prompts, the productivity gain compounds fast.

Most teams can see measurable improvement within the first week of adopting a structured prompt approach. The full workbench takes time to build, but the foundational patterns deliver immediate value. I teach these systems in hands-on AI workshops and build them as part of AI consulting engagements.


Key Takeaways

What Makes This Work

A workbench, not a library

The difference between storing prompts and engineering them. Testing, versioning, scoring, and deployment turn a collection of saved text into an organizational capability.

6-dimension analysis turns quality into a number

Subjective “that looks good” becomes measurable scores across Clarity, Completeness, Structure, Technique, Robustness, and Efficiency. You can track improvement over time.

Prompt chains are the production pattern

Single prompts handle single tasks. Real business workflows require multi-step orchestration where each phase builds on the previous one’s output.

Fragments make prompts composable

Reusable, parameterized building blocks that assemble into sophisticated architectures. Update a fragment once, every prompt that uses it improves.


Is Your Team Using Only 20% of Their AI Tools?

I build prompt engineering systems and run hands-on workshops that teach your team systematic prompt development with tested frameworks. They leave with a working prompt library built around their actual workflows. The ROI shows up in the first week.

Take the Free Assessment