this post was submitted on 06 Nov 2024

No Stupid Questions


There is no such thing as a Stupid Question!

Don't be embarrassed about your curiosity. Everyone has questions they may feel uncomfortable asking certain people, so this is a place where you can ask without being judged. Everyone here is willing to help.


Reminder that the rules for lemmy.ca still apply!


Thanks for reading all of this, and even if you didn't read all of it and your eye started somewhere else, have a watermelon slice 🍉.



Want to ensure financial documents can't be parsed by automated systems

[email protected] · 1 week ago (last edited 1 week ago)

Lots of software can manipulate PDFs. Open the PDF in LibreOffice Draw, change the pages, then print to PDF or export as PDF. A system that skims content is deliberately going to bypass any signature-based restriction.
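For context on what a "signed" PDF actually is: a digital signature is extra data stored alongside the page content, not encryption, so it doesn't stop anything from reading the text. Below is a rough sketch, assuming Python with only the standard library, that flags files containing the /ByteRange array a PDF signature dictionary carries. This is a heuristic of my own, not how OCRmyPDF or Paperless-ngx detect signatures, and it can miss signatures stored inside compressed object streams. The function name looks_digitally_signed is hypothetical.

```python
import re
import sys

# A PDF signature dictionary records which byte spans of the file the
# signature covers in a /ByteRange array of integers. Searching the raw bytes
# for it is only a heuristic: signatures inside compressed object streams
# will be missed, and the token could in principle appear elsewhere.
_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[[\d\s]+\]")


def looks_digitally_signed(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes contain a /ByteRange array."""
    return _BYTE_RANGE.search(pdf_bytes) is not None


if __name__ == "__main__":
    # Usage: python check_sig.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, looks_digitally_signed(fh.read()))
```

A positive result only means a signature field is present; the page text remains just as readable as in an unsigned file.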

Edit: Here's how to bypass the restriction in Paperless-ngx's OCR.

The setting PAPERLESS_OCR_USER_ARGS='{"invalidate_digital_signatures": true}' in Paperless-ngx, which passes extra options through to OCRmyPDF, allows OCR processing of digitally signed PDF documents by intentionally invalidating their signatures. In its standard configuration, OCRmyPDF refuses to process documents with digital signatures so as not to compromise their integrity. Setting this option to true allows OCR on such documents.
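Copying this value from a web page can silently turn its straight quotes into curly ones, which is no longer valid JSON. A small sketch, assuming Python and that you set the variable from a POSIX shell (for example with export), that builds the assignment safely. The helper name paperless_ocr_env_line is hypothetical; json.dumps guarantees straight double quotes and a lowercase true, and shlex.quote wraps the result so the shell passes it through unchanged.

```python
import json
import shlex


def paperless_ocr_env_line(options: dict) -> str:
    """Build a shell-safe PAPERLESS_OCR_USER_ARGS=... assignment.

    The options dict is serialized as JSON (straight quotes, lowercase
    booleans) and then single-quoted for a POSIX shell.
    """
    value = json.dumps(options)
    return f"PAPERLESS_OCR_USER_ARGS={shlex.quote(value)}"


if __name__ == "__main__":
    # Only the option discussed above; add others to the dict as needed.
    print(paperless_ocr_env_line({"invalidate_digital_signatures": True}))
```

Running it prints PAPERLESS_OCR_USER_ARGS='{"invalidate_digital_signatures": true}'. Env files read by Docker Compose may treat quotes differently from a shell, so check how your setup parses them.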