LLM Persona Quality Assessment Framework (LPQAF)
Framework Overview
The LLM Persona Quality Assessment Framework is a comprehensive 100-point evaluation system designed to objectively assess and compare LLM personas for publication quality. The framework evaluates personas across six critical dimensions that directly impact user experience and prompt effectiveness.
Evaluation Dimensions & Scoring
1. Functional Completeness (25 points)
Assesses how well the persona addresses all necessary functional requirements
Scoring Criteria:
- Core Identity Definition (8 points)
  - 8: Crystal clear role, expertise level, and primary function
  - 6: Clear role with minor ambiguities
  - 4: Somewhat clear role with notable gaps
  - 2: Vague or incomplete role definition
  - 0: No clear role definition
- Skill Coverage (8 points)
  - 8: Comprehensive skill set matching domain requirements
  - 6: Good skill coverage with 1-2 minor gaps
  - 4: Adequate skills with some notable omissions
  - 2: Limited skills, significant gaps
  - 0: Insufficient or irrelevant skills
- Operational Instructions (9 points)
  - 9: Detailed, actionable instructions for all scenarios
  - 7: Good instructions with minor gaps
  - 5: Basic instructions, some unclear areas
  - 3: Minimal instructions, many gaps
  - 1: Vague or confusing instructions
  - 0: No operational guidance
2. Technical Quality (20 points)
Evaluates the technical sophistication and prompt engineering quality
Scoring Criteria:
- Prompt Structure (7 points)
  - 7: Optimal structure using advanced techniques (chain-of-thought, templates, etc.)
  - 5: Good structure with some advanced elements
  - 3: Basic structure, functionally sound
  - 1: Poor structure, some technical issues
  - 0: Fundamentally flawed structure
- Constraint Management (6 points)
  - 6: Excellent constraint definition and boundary setting
  - 4: Good constraints with minor issues
  - 2: Basic constraints, some ambiguity
  - 0: Poor or missing constraints
- Output Control (7 points)
  - 7: Precise output formatting and quality controls
  - 5: Good output guidance with minor gaps
  - 3: Basic output instructions
  - 1: Minimal output control
  - 0: No output guidance
3. Domain Expertise (20 points)
Measures the depth and accuracy of domain-specific knowledge
Scoring Criteria:
- Knowledge Depth (10 points)
  - 10: Expert-level knowledge with nuanced understanding
  - 8: Strong knowledge with good depth
  - 6: Solid foundational knowledge
  - 4: Basic knowledge with some gaps
  - 2: Superficial knowledge
  - 0: Inaccurate or missing domain knowledge
- Industry Accuracy (5 points)
  - 5: Perfectly accurate terminology and concepts
  - 4: Minor inaccuracies, mostly correct
  - 3: Some notable errors but generally sound
  - 2: Multiple errors affecting credibility
  - 1: Significant inaccuracies
  - 0: Fundamentally incorrect information
- Contemporary Relevance (5 points)
  - 5: Current best practices and recent developments
  - 4: Mostly current with minor outdated elements
  - 3: Mix of current and outdated information
  - 2: Somewhat outdated approach
  - 1: Significantly outdated
  - 0: Completely obsolete information
4. Usability & Clarity (15 points)
Assesses how easy the persona is to understand and implement
Scoring Criteria:
- Instruction Clarity (6 points)
  - 6: Crystal clear, unambiguous instructions
  - 4: Generally clear with minor ambiguities
  - 2: Some confusion or unclear sections
  - 0: Confusing or contradictory instructions
- User Guidance (5 points)
  - 5: Excellent guidance for implementation and usage
  - 3: Adequate guidance with some gaps
  - 1: Minimal user guidance
  - 0: No implementation guidance
- Accessibility (4 points)
  - 4: Accessible to intended user skill level
  - 3: Mostly accessible with minor barriers
  - 2: Some accessibility challenges
  - 1: Difficult for target users
  - 0: Inaccessible to intended audience
5. Practical Effectiveness (15 points)
Evaluates real-world performance and utility
Scoring Criteria:
- Use Case Coverage (6 points)
  - 6: Comprehensive coverage of relevant scenarios
  - 4: Good coverage with minor gaps
  - 2: Limited scenario coverage
  - 0: Poor scenario coverage
- Scalability (5 points)
  - 5: Works well across different contexts and complexity levels
  - 3: Good scalability with some limitations
  - 1: Limited scalability
  - 0: Poor scalability
- Robustness (4 points)
  - 4: Handles edge cases and variations well
  - 3: Generally robust with minor vulnerabilities
  - 2: Some robustness issues
  - 1: Multiple failure modes
  - 0: Fragile, breaks easily
6. Innovation & Differentiation (5 points)
Measures unique value and innovative approaches
Scoring Criteria:
- Unique Value Proposition (3 points)
  - 3: Significant unique advantages over alternatives
  - 2: Some differentiation
  - 1: Minor unique elements
  - 0: No meaningful differentiation
- Technical Innovation (2 points)
  - 2: Uses cutting-edge or novel prompting techniques
  - 1: Some innovative elements
  - 0: Standard approach, no innovation
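Because each dimension's sub-criterion maxima must roll up to the dimension weight, and the six weights to the 100-point total, the rubric is easy to encode and sanity-check mechanically. A minimal Python sketch (the structure and variable names are illustrative, not part of the framework):

```python
# LPQAF rubric encoded as {dimension: {sub-criterion: max points}}.
RUBRIC = {
    "Functional Completeness": {
        "Core Identity Definition": 8,
        "Skill Coverage": 8,
        "Operational Instructions": 9,
    },
    "Technical Quality": {
        "Prompt Structure": 7,
        "Constraint Management": 6,
        "Output Control": 7,
    },
    "Domain Expertise": {
        "Knowledge Depth": 10,
        "Industry Accuracy": 5,
        "Contemporary Relevance": 5,
    },
    "Usability & Clarity": {
        "Instruction Clarity": 6,
        "User Guidance": 5,
        "Accessibility": 4,
    },
    "Practical Effectiveness": {
        "Use Case Coverage": 6,
        "Scalability": 5,
        "Robustness": 4,
    },
    "Innovation & Differentiation": {
        "Unique Value Proposition": 3,
        "Technical Innovation": 2,
    },
}

# Sanity checks: sub-criteria sum to each dimension's weight (25, 20, 20,
# 15, 15, 5) and the weights sum to the framework's 100-point total.
dimension_totals = {d: sum(c.values()) for d, c in RUBRIC.items()}
assert list(dimension_totals.values()) == [25, 20, 20, 15, 15, 5]
assert sum(dimension_totals.values()) == 100
```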
Scoring Interpretation
Score Ranges (sketched in code after the list):
- 90-100: Exceptional quality, ready for premium publication
- 80-89: High quality, suitable for publication with minor refinements
- 70-79: Good quality, needs moderate improvements
- 60-69: Fair quality, significant improvements needed
- 50-59: Poor quality, major revision required
- Below 50: Unsuitable for publication
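The bands reduce to a simple threshold lookup. A minimal sketch, assuming integer totals on the 0-100 scale (function name is illustrative):

```python
def interpret_score(total: int) -> str:
    """Map a 0-100 LPQAF total to its interpretation band."""
    bands = [
        (90, "Exceptional quality, ready for premium publication"),
        (80, "High quality, suitable for publication with minor refinements"),
        (70, "Good quality, needs moderate improvements"),
        (60, "Fair quality, significant improvements needed"),
        (50, "Poor quality, major revision required"),
    ]
    for floor, label in bands:
        if total >= floor:
            return label
    return "Unsuitable for publication"
```

For example, interpret_score(87) returns the 80-89 band.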
Publication Decision Matrix (sketched in code after the list):
- Difference ≥ 15 points: Clear winner, publish higher scoring persona
- Difference 10-14 points: Strong preference for higher scoring persona
- Difference 5-9 points: Moderate preference, consider hybrid approach
- Difference < 5 points: Nearly equivalent, use secondary criteria (audience preference, brand alignment, etc.)
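The matrix is a straightforward cascade on the absolute difference between two totals. A minimal sketch (function name and return strings are illustrative):

```python
def publication_decision(score_a: int, score_b: int) -> str:
    """Apply the publication decision matrix to two persona totals."""
    diff = abs(score_a - score_b)
    winner = "A" if score_a >= score_b else "B"
    if diff >= 15:
        return f"Clear winner: publish persona {winner}"
    if diff >= 10:
        return f"Strong preference for persona {winner}"
    if diff >= 5:
        return f"Moderate preference for persona {winner}; consider a hybrid approach"
    return "Nearly equivalent: use secondary criteria (audience preference, brand alignment, etc.)"
```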
Implementation Guidelines
Evaluation Process (steps 2-4 are sketched in code after the list):
- Independent Scoring: 2-3 evaluators score each persona independently
- Score Averaging: Calculate mean scores for each dimension
- Discrepancy Resolution: Discuss and resolve scores differing by >2 points
- Final Calculation: Sum all dimensional scores for total score
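Steps 2-4 amount to averaging per-dimension scores, flagging large spreads for discussion, and totaling the averages. A minimal sketch, assuming each evaluator submits one integer per dimension (the input shape and names are assumptions):

```python
from statistics import mean

# Assumed input shape: {"evaluator_1": {"Functional Completeness": 23, ...}, ...}
def aggregate_scores(scores_by_evaluator: dict, discrepancy_threshold: int = 2):
    """Average each dimension across evaluators (step 2), flag spreads
    larger than the threshold (step 3), and total the averages (step 4)."""
    dimensions = next(iter(scores_by_evaluator.values())).keys()
    averaged, flagged = {}, []
    for dim in dimensions:
        values = [scores[dim] for scores in scores_by_evaluator.values()]
        averaged[dim] = mean(values)
        if max(values) - min(values) > discrepancy_threshold:
            flagged.append(dim)  # resolve by discussion before finalizing
    return averaged, sum(averaged.values()), flagged
```

Any dimension returned in the flagged list would go back to the panel before the total is treated as final.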
Quality Assurance:
- Use standardized evaluation rubrics for consistency
- Include domain expert in evaluation panel
- Test personas with sample prompts when possible
- Document rationale for all scores ≥7 or ≤3
Continuous Calibration:
- Review scoring accuracy against user feedback post-publication
- Adjust rubrics based on performance data
- Update framework annually based on evolving best practices
Example Application
Sample Evaluation Summary:
Persona A - Financial Guru: 87/100
- Functional Completeness: 23/25
- Technical Quality: 17/20
- Domain Expertise: 19/20
- Usability & Clarity: 13/15
- Practical Effectiveness: 12/15
- Innovation: 3/5
Persona B - Financial Guru: 74/100
- Functional Completeness: 20/25
- Technical Quality: 14/20
- Domain Expertise: 16/20
- Usability & Clarity: 11/15
- Practical Effectiveness: 10/15
- Innovation: 3/5
Decision: Publish Persona A (the 13-point advantage falls in the 10-14 band, indicating strong preference for the higher-scoring persona).
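As a cross-check, the totals and the decision reproduce mechanically with the sketches above (dimension scores copied from the summary; assumes interpret_score and publication_decision as defined earlier):

```python
persona_a = {"Functional Completeness": 23, "Technical Quality": 17,
             "Domain Expertise": 19, "Usability & Clarity": 13,
             "Practical Effectiveness": 12, "Innovation & Differentiation": 3}
persona_b = {"Functional Completeness": 20, "Technical Quality": 14,
             "Domain Expertise": 16, "Usability & Clarity": 11,
             "Practical Effectiveness": 10, "Innovation & Differentiation": 3}

total_a, total_b = sum(persona_a.values()), sum(persona_b.values())  # 87, 74
print(interpret_score(total_a))                # "High quality, suitable for publication with minor refinements"
print(publication_decision(total_a, total_b)) # "Strong preference for persona A"
```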