Add docs character count audit report (DOC-121) #10081
Draft
rosieyohannan wants to merge 5 commits into main from
- Analyzed 666 .adoc files in docs directory
- Generated CSV report (docs_character_audit.csv) for spreadsheet viewing
- Created summary report (DOCS_AUDIT_SUMMARY.md) highlighting key findings
- Found 9 files exceeding 50,000 character threshold
- configuration-reference.adoc is largest at 106,595 characters

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>

- Enhanced analysis to detect missing page-description attributes
- Found 118 files missing this important SEO metadata
- 4 files have both issues (large size + missing description)
- Updated CSV report with missing_page_description column
- Updated summary with detailed findings and recommendations

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>

- Created detailed splitting strategies for each large file
- Phased implementation recommendations
- Tooling and validation suggestions
- Clear prioritization framework

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>

- Instructions for opening and filtering the CSV
- Common analysis tasks and queries
- Tips for Excel and Google Sheets users
- Guidance on rerunning the analysis

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>

- 69 of 118 missing descriptions are in docs/guides directory
- Added distribution breakdown to action plan
- Identified batch processing opportunity

Co-authored-by: Rosie Yohannan <rosieyohannan@users.noreply.github.com>
Description
Added automated character count audit of all .adoc files in the docs directory to identify pages that exceed 50,000 characters and detect missing page-description attributes.
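For readers who want a feel for what such an audit involves, here is a minimal sketch. It is not the contents of analyze_docs.py, which this page does not show. The function names, the row keys `file`, `char_count`, and `exceeds_threshold`, and the assumption that the attribute appears as a `:page-description:` header line are all illustrative. Only the 50,000 character threshold and the `missing_page_description` column name come from the PR text.

```python
import re
from pathlib import Path

CHAR_THRESHOLD = 50_000  # threshold stated in the PR description

# Assumption: the attribute is written as an AsciiDoc header line,
# ":page-description: some text". An empty value counts as missing.
PAGE_DESCRIPTION_RE = re.compile(r"^:page-description:[ \t]*\S", re.MULTILINE)


def audit_text(text: str) -> dict:
    """Return the character count and the two audit flags for one document."""
    return {
        "char_count": len(text),  # Unicode characters, not bytes
        "exceeds_threshold": len(text) > CHAR_THRESHOLD,
        "missing_page_description": PAGE_DESCRIPTION_RE.search(text) is None,
    }


def audit_directory(root: str) -> list[dict]:
    """Audit every .adoc file under root and return rows sorted largest first."""
    rows = []
    for path in sorted(Path(root).rglob("*.adoc")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append({"file": path.as_posix(), **audit_text(text)})
    rows.sort(key=lambda row: row["char_count"], reverse=True)
    return rows
```

Calling `audit_directory("docs")` would return one row per file. Whether the real script counts characters or bytes, and how it treats an empty attribute, is not stated in the PR, so treat those choices as assumptions.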
Generated deliverables:
- docs_character_audit.csv - Complete spreadsheet report (666 files) with character counts and page-description status
- DOCS_AUDIT_SUMMARY.md - Executive summary highlighting key findings
- docs_audit_action_plan.md - Comprehensive action plan with splitting strategies and phased implementation
- CSV_USAGE_GUIDE.md - Guide for filtering and analyzing the CSV in spreadsheet applications
- analyze_docs.py - Python script (rerunnable)
Reasons
This addresses DOC-121, which requires:
Key Findings:
- configuration-reference.adoc is the largest at 106,595 characters (critical priority for splitting)
- installation-reference.adoc files across server admin versions (4.3-4.8) all exceed the threshold
- 69 of 118 missing descriptions are in the docs/guides directory (batch processing opportunity)
High Priority Files (Both Issues):
Action Plan Includes:
The CSV report can be opened in Excel, Google Sheets, or any spreadsheet application. Use the CSV_USAGE_GUIDE.md for filtering and analysis instructions.
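The same filtering can be done without a spreadsheet. The sketch below assumes the CSV has `file` and `char_count` columns and that `missing_page_description` holds a value such as `True` or `yes`. Only the `missing_page_description` column name is confirmed by the commit messages. The other column names, the boolean encoding, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

TRUTHY = {"true", "yes", "1"}  # assumed encodings of the boolean column


def load_rows(handle) -> list[dict]:
    """Read audit rows from an open CSV file handle."""
    return list(csv.DictReader(handle))


def is_missing(row: dict) -> bool:
    """True when the row is flagged as lacking a page-description."""
    return row["missing_page_description"].strip().lower() in TRUTHY


def over_threshold(rows: list[dict], threshold: int = 50_000) -> list[dict]:
    """Rows whose character count exceeds the threshold."""
    return [row for row in rows if int(row["char_count"]) > threshold]


def missing_by_directory(rows: list[dict], depth: int = 2) -> Counter:
    """Count flagged files, grouped by their first `depth` directory components."""
    counts = Counter()
    for row in rows:
        if is_missing(row):
            directories = PurePosixPath(row["file"]).parts[:-1][:depth]
            counts["/".join(directories)] += 1
    return counts


# Hypothetical sample data standing in for docs_character_audit.csv.
sample = io.StringIO(
    "file,char_count,missing_page_description\n"
    "docs/guides/a.adoc,1200,True\n"
    "docs/guides/b.adoc,64000,True\n"
    "docs/reference/c.adoc,70000,False\n"
)
rows = load_rows(sample)
large = over_threshold(rows)
both_issues = [row for row in large if is_missing(row)]
by_directory = missing_by_directory(rows)
```

To run this against the real report, replace the `io.StringIO` sample with `open("docs_character_audit.csv", newline="", encoding="utf-8")` and adjust the column names to match the actual header row.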
Content checks
N/A - This PR adds audit tooling and reports, not documentation content changes.
Linear Issue: DOC-121