Add docs character count audit report (DOC-121) #10081

Draft
rosieyohannan wants to merge 5 commits into main from cursor/DOC-121-docs-character-audit-2b1e

rosieyohannan commented Mar 18, 2026

Description

Adds an automated character count audit of all .adoc files in the docs directory. The audit identifies pages that exceed 50,000 characters and pages that are missing the page-description attribute.

Generated deliverables:

  • docs_character_audit.csv - Complete spreadsheet report (666 files) with character counts and page-description status
  • DOCS_AUDIT_SUMMARY.md - Executive summary highlighting key findings
  • docs_audit_action_plan.md - Comprehensive action plan with splitting strategies and phased implementation
  • CSV_USAGE_GUIDE.md - Guide for filtering and analyzing the CSV in spreadsheet applications
  • analyze_docs.py - Python script (rerunnable)

Reasons

This addresses DOC-121, which requires:

  1. Auditing all documentation pages for character count
  2. Flagging pages over 50,000 characters
  3. Creating a plan to split up mega pages
  4. Flagging pages missing page-description attributes
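For reviewers who want a feel for what requirements 1, 2 and 4 involve, here is a minimal sketch of such an audit. It is not the contents of analyze_docs.py. The function names, the dictionary keys other than `missing_page_description`, and the regular expression are assumptions made for illustration. It counts Unicode characters (not bytes) and treats an empty `:page-description:` entry as missing.

```python
import re
from pathlib import Path

THRESHOLD = 50_000  # character limit named in DOC-121

# Assumption: the description is set as an AsciiDoc attribute entry at the
# start of a line, e.g. ":page-description: Some text".
PAGE_DESCRIPTION_RE = re.compile(r"^:page-description:[ \t]*\S", re.MULTILINE)


def audit_text(text, threshold=THRESHOLD):
    """Return the character count and both flags for one page's text."""
    return {
        "char_count": len(text),
        "over_threshold": len(text) > threshold,
        "missing_page_description": PAGE_DESCRIPTION_RE.search(text) is None,
    }


def audit_directory(root):
    """Audit every .adoc file under root and return rows, largest first."""
    rows = []
    for path in sorted(Path(root).rglob("*.adoc")):
        text = path.read_text(encoding="utf-8")
        rows.append({"path": str(path), **audit_text(text)})
    rows.sort(key=lambda row: row["char_count"], reverse=True)
    return rows
```

Calling `audit_directory("docs")` would return one row per file. Rows where both `over_threshold` and `missing_page_description` are true correspond to the "both issues" category used in this PR.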

Key Findings:

  • 9 files exceed the 50,000 character threshold
  • 118 files are missing page-description attributes (important for SEO)
  • 4 files have both issues (highest priority)
  • configuration-reference.adoc is the largest at 106,595 characters (critical priority for splitting)
  • The installation-reference.adoc files across server admin versions (4.3-4.8) all exceed the threshold
  • 69 of 118 missing descriptions are in docs/guides directory (batch processing opportunity)
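A per-directory breakdown like the docs/guides figure above could be reproduced from the CSV with a few lines of Python. This is a sketch under assumptions. The `missing_page_description` column is named in this PR, but the `path` column name, the accepted truthy values, and the sample rows below are hypothetical and do not come from the real report.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath


def missing_by_directory(csv_file, depth=2):
    """Count rows flagged as missing a description, grouped by leading path parts."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        flag = row["missing_page_description"].strip().lower()
        if flag in {"true", "yes", "1"}:  # assumed encodings of the flag
            parts = PurePosixPath(row["path"]).parts[:depth]
            counts["/".join(parts)] += 1
    return counts


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "path,char_count,missing_page_description\n"
    "docs/guides/a/index.adoc,1200,True\n"
    "docs/guides/b/index.adoc,800,True\n"
    "docs/reference/c.adoc,900,True\n"
    "docs/reference/d.adoc,700,False\n"
)
print(missing_by_directory(sample).most_common())
# prints [('docs/guides', 2), ('docs/reference', 1)]
```

Against the real file, the same function could be given an open handle, for example `open("docs_character_audit.csv", newline="", encoding="utf-8")`, once the column names are adjusted to match the report.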

High Priority Files (Both Issues):

  1. values.adoc (v4.9) - 55,728 chars, no description
  2. installation-reference.adoc (v4.5) - 52,735 chars, no description
  3. installation-reference.adoc (v4.4) - 50,125 chars, no description
  4. installation-reference.adoc (v4.3) - 50,101 chars, no description

Action Plan Includes:

  • Detailed splitting strategies for each large file
  • Phased implementation approach (Quick Wins → Critical Splitting → Version Consistency → Remaining)
  • Tooling recommendations (pre-commit hooks, CI checks)
  • Documentation standards proposals
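As one possible shape for the pre-commit or CI check mentioned above, the sketch below takes a list of file paths and returns a non-zero status if any file is over the limit. It is an illustration, not part of this PR. The function names and the message format are assumptions.

```python
import sys
from pathlib import Path

THRESHOLD = 50_000  # character limit named in DOC-121


def oversized_files(paths, threshold=THRESHOLD):
    """Return (path, char_count) pairs for files longer than the threshold."""
    offenders = []
    for path in paths:
        count = len(Path(path).read_text(encoding="utf-8"))
        if count > threshold:
            offenders.append((str(path), count))
    return offenders


def main(paths, threshold=THRESHOLD):
    """Report offenders on stderr; return 1 if there are any, else 0."""
    offenders = oversized_files(paths, threshold)
    for path, count in offenders:
        print(f"{path}: {count} characters (limit {threshold})", file=sys.stderr)
    return 1 if offenders else 0
```

A wrapper script could end with `sys.exit(main(sys.argv[1:]))`. pre-commit passes the staged file names as arguments by default, so a hook restricted to .adoc files would only check the pages touched by a commit. A CI job could pass the full file list instead.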

The CSV report can be opened in Excel, Google Sheets, or any other spreadsheet application. See CSV_USAGE_GUIDE.md for filtering and analysis instructions.

Content checks

N/A - This PR adds audit tooling and reports, not documentation content changes.

Linear Issue: DOC-121

- Analyzed 666 .adoc files in docs directory
- Generated CSV report (docs_character_audit.csv) for spreadsheet viewing
- Created summary report (DOCS_AUDIT_SUMMARY.md) highlighting key findings
- Found 9 files exceeding 50,000 character threshold
- configuration-reference.adoc is largest at 106,595 characters

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>
linear bot commented Mar 18, 2026

cursoragent and others added 4 commits March 18, 2026 14:15
- Enhanced analysis to detect missing page-description attributes
- Found 118 files missing this important SEO metadata
- 4 files have both issues (large size + missing description)
- Updated CSV report with missing_page_description column
- Updated summary with detailed findings and recommendations

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>
- Created detailed splitting strategies for each large file
- Phased implementation recommendations
- Tooling and validation suggestions
- Clear prioritization framework

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>
- Instructions for opening and filtering the CSV
- Common analysis tasks and queries
- Tips for Excel and Google Sheets users
- Guidance on rerunning the analysis

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>
- 69 of 118 missing descriptions are in docs/guides directory
- Added distribution breakdown to action plan
- Identified batch processing opportunity

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>
2 participants