Prompt evolution example

This page shows the journey of refining a release notes categorization prompt from simple to production-ready.

Journey: Simple to refined

Version 1: Initial (simple)

First attempt after documenting manual process

Approach: Basic categories with minimal guidance

Categorize GitHub commits as:
- New Features: New capabilities
- Enhancements: Improvements
- Bug Fixes: Corrections
- Documentation: Docs updates

Commits:
{COMMITS}

Results:

Basic categorization working
35% of commits miscategorized
Internal changes included
Feature versus Enhancement boundary unclear
No exclusion logic

Accuracy: 65% - Not usable

Version 2: Add definitions

After reviewing first test results

Changes: Added clearer category definitions

Categorize GitHub commits as:

Categories:
- New Features: Wholly new capabilities that didn't exist before
- Enhancements: Improvements to existing features  
- Bug Fixes: Corrections to existing functionality
- Documentation: Content updates, typo fixes, documentation improvements

Exclusions:
- Internal tooling changes
- Work-in-progress commits
- Merge commits

Commits:
{COMMITS}

Results:

Better feature or enhancement distinction
Some internal changes filtered
Many edge case confusions
"Wholly new" versus "existing" ambiguous
Exclusions too vague

Accuracy: 72% - Improved but not production-ready

Version 3: Add examples

After seeing specific patterns of confusion

Changes: Added concrete examples of each category

Categorize GitHub commits for release notes.

Categories with examples:

**New Features** - Wholly new capabilities
Examples: "Add user authentication", "Create dashboard view"
Not examples: "Improve search" (enhancement)

**Enhancements** - Improvements to existing
Examples: "Improve search performance", "Update API format"
Not examples: "Add search feature" (new feature)

**Bug Fixes** - Corrections only
Examples: "Fix memory leak", "Resolve timeout"
Not examples: "Improve performance" (enhancement)

**Documentation** - Content updates
Examples: "Update API docs", "Fix typo in README"

Exclusions (do not include):
- Commits containing: "WIP", "temp", "test only"
- Commits starting with: "Merge pull request", "chore:"
- Internal tooling changes

Commits:
{COMMITS}

Format:
## New Features
- [commit message]

## Enhancements
- [commit message]

## Bug Fixes
- [commit message]

## Documentation
- [commit message]

Results:

Feature versus Enhancement much clearer
Better exclusion of WIP commits
More consistent categorization
Some internal changes leaking
Test-related commits sometimes included

Accuracy: 83% - Getting close

Version 4: Refined production

After several test runs on different repositories

Changes: Added keyword indicators, comprehensive exclusions, and format specifications

You are helping categorize GitHub commits for release notes.

Review the following commits and categorize each one according to these standards:

## Categories

### New Features
**Definition:** Wholly new capabilities that didn't exist before

**Examples:**
- "Add user authentication system" (Correct)
- "Create new dashboard view" (Correct)
- "Introduce webhook support" (Correct)

**NOT Examples:**
- "Improve existing search" (This is an enhancement)
- "Update authentication flow" (Enhancement to existing)

**Keywords:** "add", "new", "create", "introduce", "implement"
(Only if describing wholly new functionality)

### Enhancements
**Definition:** Improvements to existing features or functionality

**Examples:**
- "Improve search performance by 50%" (Correct)
- "Update API response format" (Correct)
- "Optimize database queries" (Correct)

**NOT Examples:**
- "Add search feature" (New feature)
- "Fix search bug" (Bug fix)

**Keywords:** "improve", "update", "enhance", "optimize", "increase", "better", "refactor"

### Bug Fixes
**Definition:** Corrections to existing functionality that wasn't working as intended

**Examples:**
- "Fix memory leak in parser" (Correct)
- "Resolve login timeout issue" (Correct)
- "Correct validation logic error" (Correct)

**NOT Examples:**
- "Improve validation performance" (Enhancement)
- "Add validation to new field" (Part of new feature)

**Keywords:** "fix", "bug", "resolve", "issue", "correct", "repair", "hotfix", "patch"

### Documentation
**Definition:** Updates to documentation, guides, or API docs

**Examples:**
- "Update API documentation" (Correct)
- "Fix typo in README" (Correct)
- "Add usage examples to guide" (Correct)

**Keywords:** "docs", "documentation", "readme", "guide", "typo" (in docs)

## Exclusions

**Do NOT include commits that:**

**By keywords:**
- Contain: "WIP", "wip", "work in progress", "temp", "temporary"
- Contain: "test only", "testing", "test coverage" (unless adding new test features for users)
- Contain: "internal", "internal only", "for team", "do not publish"

**By commit prefix:**
- Start with: "Merge pull request", "Merge branch"
- Start with: "chore:", "ci:", "build:", "test:", "style:"
- Start with: "Revert", "Bump version", "Update dependencies"

**By file paths:**
- Only change files in: /tests/, /test/, /__tests__/
- Only change files in: /internal/, /scripts/, /.github/, /.ci/
- Only change: package-lock.json, Gemfile.lock, yarn.lock, poetry.lock

**By author:**
- Commits by bots: dependabot[bot], renovate[bot], github-actions[bot]

**By change type:**
- Only formatting changes (prettier, linting, whitespace)
- Only comment updates (unless in documentation files)

## Decision Rules

When keywords conflict (e.g., "add improvement to existing feature"):

1. **Is this a completely new capability users couldn't do before?** → New Feature
2. **Does this improve/enhance something that already exists?** → Enhancement  
3. **Does this correct unintended behavior or bugs?** → Bug Fix
4. **Is this only documentation?** → Documentation

When unsure, default to Enhancement rather than New Feature.

## Output Format

Format your response exactly as:

## New Features
- [commit message] - [Brief explanation if commit message is unclear]

## Enhancements
- [commit message] - [Brief explanation if commit message is unclear]

## Bug Fixes
- [commit message] - [Brief explanation if commit message is unclear]

## Documentation
- [commit message] - [Brief explanation if commit message is unclear]

**Important:**
- Omit any commits that match exclusion criteria
- If a section has no commits, omit that section entirely
- Preserve the commit message but clarify if needed
- Add brief context only if the commit message is vague

## Commits to Categorize

{COMMITS}

Results:

Consistent categorization across different repositories
Effective filtering of internal changes
Clear decision rules for edge cases
Appropriate format and context
Minimal manual cleanup needed

Accuracy: 91% - Production ready

What changed and why

Add examples (v1 to v3)

Problem: "Wholly new" versus "improvements" was ambiguous

Solution: Concrete examples with both positive and negative cases

Impact: +18 percentage points in accuracy

Add keywords (v3 to v4)

Problem: AI struggling with commits using unexpected wording

Solution: Explicit keyword indicators for each category

Impact: +8 percentage points in accuracy

Refine exclusions (v2 to v4)

Problem: Internal changes appearing in release notes

Solution: Comprehensive exclusion rules by keyword, prefix, path, and author

Impact: Reduced false positives by 85%

Add decision rules (v4)

Problem: Edge cases with conflicting signals

Solution: Priority-based decision tree

Impact: More consistent categorization of ambiguous commits

Metrics summary

Version	Accuracy	False Positives	False Negatives	Review Time
v1	65%	25%	10%	45 min
v2	72%	20%	8%	35 min
v3	83%	12%	5%	20 min
v4	91%	4%	5%	12 min

Time savings:

Manual process: 90 minutes
v4 automation + review: 12.5 minutes
Saved: 77.5 minutes (86% reduction)

Lessons learned

What worked

Concrete examples were more effective than abstract definitions
Negative examples ("NOT this") clarified boundaries
Keyword lists helped with pattern matching
Comprehensive exclusions better than general guidelines
Decision trees resolved conflicts

What did not work

Overly complex prompts (more than 2000 words) - diminishing returns
Too many categories (7+ categories became confusing)
Vague language like "generally" or "usually"
Assuming context the AI could not have
Perfection seeking (90% is often better ROI than 98%)

Iteration strategy

Test with same dataset to compare versions fairly
Track metrics systematically (do not rely on gut feel)
Focus on patterns not individual failures
Stop iterating when improvements are less than 3 percentage points
Version control prompts to enable rollback

Adapt this for your workflow

Start with version 3

Start with v3 structure and customize:

Update category definitions to match your standards
Add examples from your actual commits
Customize exclusions for your repository patterns
Test and measure
Iterate based on your specific issues

Repository-specific variations

Different repositories may need different prompts:

Frontend repository: - More emphasis on UI or UX improvements - Exclude styling-only changes or component refactors

Backend API repository: - Separate category for breaking changes - Emphasize endpoint additions or performance - Exclude database migration details

Documentation repository: - Simpler categorization (Content versus Structure versus Fixes) - All commits are user-facing - Focus on impact to readers

Audience-specific variations

Internal release notes: - Include infrastructure improvements - Use technical language - Keep some excluded items

External release notes: - Stricter exclusions - User-benefit focused - Customer-friendly language

Try it yourself

Use this prompt as your starting point:

Copy Version 4 to prompts/categorization_prompt.txt
Customize examples for your domain
Update exclusions for your repository
Test and measure accuracy
Iterate based on results

Go to tutorial