
ROB: Prevent excessive layout mode text output from Type3 fonts #3082

Merged
stefan6419846 merged 5 commits into py-pdf:main from shartzog:font-interpretability on Jan 27, 2025

@shartzog
Contributor

Partially addresses #3081 by checking Type3 font dictionaries for a '/ToUnicode' map. If no such map is present, the font is checked for standard Adobe glyph names. If it does not use them, the font is marked as 'uninterpretable', and no text content is collected from any text operations that use the font.
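The check described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pypdf's actual implementation: font dictionaries are modeled as ordinary dicts keyed by PDF names (e.g. "/ToUnicode", "/Encoding", "/Differences"), `ADOBE_GLYPH_NAMES` is a hypothetical handful of entries standing in for the full Adobe Glyph List, and `is_interpretable` and `collect_text` are hypothetical helper names. The PR description does not say whether all or only some glyph names must be standard; this sketch assumes all of them must be.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, tiny stand-in for the Adobe Glyph List (the real list is far larger).
ADOBE_GLYPH_NAMES = {
    "space", "period", "comma", "hyphen",
    "zero", "one", "two", "three",
    "A", "B", "C", "H", "a", "b", "c", "e", "l", "o",
}

FontDict = Dict[str, Any]


def glyph_names_from_differences(font: FontDict) -> List[str]:
    """Collect glyph names from a font's /Encoding /Differences array."""
    encoding = font.get("/Encoding")
    if not isinstance(encoding, dict):
        return []  # named encodings like "/WinAnsiEncoding" carry no Differences
    names: List[str] = []
    for entry in encoding.get("/Differences", []):
        # Integers set the next character code; strings are glyph names such as "/space".
        if isinstance(entry, str):
            names.append(entry[1:] if entry.startswith("/") else entry)
    return names


def is_interpretable(font: FontDict) -> bool:
    """Hypothetical check following the PR description for Type3 fonts."""
    if font.get("/Subtype") != "/Type3":
        return True  # only Type3 fonts are screened in this sketch
    if "/ToUnicode" in font:
        return True  # an explicit Unicode map is available
    names = glyph_names_from_differences(font)
    # Assumption: every glyph name must be a standard name.
    return bool(names) and all(name in ADOBE_GLYPH_NAMES for name in names)


def collect_text(ops: List[Tuple[str, str]], fonts: Dict[str, FontDict]) -> str:
    """Join text from (font resource name, decoded text) pairs, skipping uninterpretable fonts."""
    return "".join(text for font_name, text in ops if is_interpretable(fonts[font_name]))


# Usage: a Type3 font with custom glyph names and no /ToUnicode map is skipped.
fonts = {
    "/F1": {"/Subtype": "/Type1"},
    "/T3": {"/Subtype": "/Type3", "/Encoding": {"/Differences": [65, "/g001", "/g002"]}},
}
print(collect_text([("/F1", "Hello"), ("/T3", "\x00\x01"), ("/F1", " world")], fonts))
# prints "Hello world"
```

Treating such a font as uninterpretable trades completeness for robustness: text drawn with it is dropped instead of being emitted as unreadable characters.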

@codecov

codecov Bot commented Jan 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.50%. Comparing base (049f71e) to head (eb2f5a4).
Report is 115 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3082      +/-   ##
==========================================
+ Coverage   96.48%   96.50%   +0.01%     
==========================================
  Files          52       52              
  Lines        8795     8807      +12     
  Branches     1608     1612       +4     
==========================================
+ Hits         8486     8499      +13     
+ Misses        184      183       -1     
  Partials      125      125              


@stefan6419846
Collaborator

Thanks for the report and for looking into it.

I assume that you own the necessary rights for us to be able to distribute the test file as part of the source code?

@shartzog
Contributor Author

shartzog commented Jan 27, 2025

Good point. I'm not certain about the copyright details for the engine that created that document. The associated test currently accesses it via its link in #3081 anyway, so there's no real reason to include it in the resources folder. Is the link in the issue an acceptable long-term access option, or should I make other arrangements (e.g. in samples)?

@stefan6419846
Collaborator

Accessing using a link is perfectly fine for now. The alternative for specific files which fulfill CC-BY-SA-4.0 would be the https://github.com/py-pdf/sample-files repository, but we are currently not enforcing anything like this.

stefan6419846 merged commit 633d188 into py-pdf:main on Jan 27, 2025
stefan6419846 added a commit that referenced this pull request on Feb 9, 2025, with the following release notes:
## What's new

### New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846

### Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846

### Robustness (ROB)
- Prevent excessive layout mode text output from Type3 fonts (#3082) by @shartzog

### Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma

### Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846

### Testing (TST)
- Fix warning assertions to use `pytest.warns()` (#3083) by @mgorny

[Full Changelog](5.2.0...5.3.0)