this post was submitted on 22 Feb 2024

162 points (98.8% liked)

what's your experience with paperless? (github.com)

submitted 7 months ago by [email protected] to c/[email protected]

[–] [email protected] 6 points 7 months ago

I haven't really configured a tagging system that makes any sense, so I mostly use it to search through documents by their text. I'd like to figure out how to hook up a vector database to it to do really fuzzy searching.
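A minimal sketch of the fuzzy-search idea, assuming the OCR text of each document has already been exported into a plain dictionary. It is not a real vector database and does not use Paperless's API: character-trigram counts stand in for embeddings, and `trigram_vector`, `cosine`, and `search` are hypothetical helper names.

```python
from collections import Counter
from math import sqrt


def trigram_vector(text: str) -> Counter:
    """Map text to a sparse vector of character-trigram counts."""
    cleaned = " ".join(text.lower().split())
    padded = f"  {cleaned} "  # padding so word starts and ends form trigrams too
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, documents: dict, top_k: int = 3) -> list:
    """Return up to top_k (doc_id, score) pairs, best match first."""
    query_vec = trigram_vector(query)
    scored = [
        (cosine(query_vec, trigram_vector(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Hypothetical OCR text keyed by document id.
docs = {
    1: "Electricity invoice March 2023",
    2: "Flight itinerary London to Toronto",
    3: "Rental agreement for apartment",
}
results = search("invoce electricty", docs)  # typos on purpose
```

Trigram overlap tolerates typos and OCR slips, but it does not capture meaning the way learned embeddings would, so treat it as a stand-in for the real thing.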

[–] [email protected] 4 points 7 months ago (2 children)

I currently have a love-hate relationship with it, but that’s mostly because of issues outside of paperless. I had been uploading to my server automatically with Nextcloud, and processing the files with paperless as they came in. Next thing I know, all the files are gone and none of the documents are available in paperless any longer, just the OCR translations that… leave something to be desired sometimes.

I’ve scrapped the whole thing in the short term, and will likely try again in the long term. Just need to find the time.

[–] [email protected] 4 points 6 months ago (1 children)

Sounds like maybe you ran it as a container, didn't mount the document archive externally, and then updated the container. That would likely have blown away the actual ingested documents but left the metadata (including the OCR data) where it was, assuming the database was either in its own container or mounted externally.

[–] [email protected] 1 points 6 months ago

Nah, the container had an external volume for storage. That’s one of the first things I checked.

[–] [email protected] 1 points 7 months ago (1 children)

Where did the docs go?

[–] [email protected] 1 points 6 months ago (1 children)

I believe the server purged them for space? Not really sure tbh.

[–] [email protected] 1 points 6 months ago (1 children)

I ran into the same problem the moment Watchtower pulled the latest update, including postgres:16. After setting the Docker Compose file back to postgres:15.4 and updating the stack, all the data reappeared.
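A minimal sketch of a pre-flight check for this situation. It assumes the cause was a major-version jump: a PostgreSQL data directory records its major version in a `PG_VERSION` file, and a newer major release will not start on an older data directory without a migration. The function names and the data directory path are hypothetical.

```python
from pathlib import Path
from typing import Optional


def image_major(image: str) -> Optional[int]:
    """Extract the major version from an image reference such as 'postgres:15.4'."""
    _, sep, tag = image.rpartition(":")
    if not sep:
        return None  # no tag at all, which Docker treats as 'latest'
    major = tag.split(".")[0].split("-")[0]  # handles '15.4' and '16-alpine'
    return int(major) if major.isdigit() else None


def data_dir_major(data_dir: Path) -> Optional[int]:
    """Read the major version recorded in the data directory, if any."""
    version_file = data_dir / "PG_VERSION"
    if not version_file.is_file():
        return None  # empty or missing directory: nothing initialised yet
    major = version_file.read_text().strip().split(".")[0]
    return int(major) if major.isdigit() else None


def safe_to_start(image: str, data_dir: Path) -> bool:
    """True if the image's major version matches the existing data directory."""
    existing = data_dir_major(data_dir)
    if existing is None:
        return True  # fresh directory, any version can initialise it
    wanted = image_major(image)
    # Floating tags such as 'latest' are rejected because they can jump majors.
    return wanted is not None and wanted == existing
```

Calling `safe_to_start("postgres:16", Path("/path/to/pgdata"))` before updating the stack would flag the mismatch early. Pinning the tag in the compose file, as described above, avoids the surprise in the first place.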

[–] [email protected] 1 points 6 months ago

This could have been it. At this point the data is fully lost though, as I killed the container completely.

[–] [email protected] 7 points 7 months ago

Paperless was my Docker training program. I made so many mistakes and ended up losing my database three times. My fourth try runs smoothly, and I back up everything regularly. Currently at 1,300 documents.

After indexing everything, I learned to love the archive feature. Docs I scanned but don't want to throw away in real life get a number in Paperless and the same number in the paper folder.

[–] [email protected] 3 points 7 months ago

Works great. I set it up a month ago and imported over 600 documents, both digital and scanned. It makes backups a lot easier too, as everything is in one place now.

[–] [email protected] 14 points 7 months ago (3 children)

Does it do OCR? And can you create tags / naming convention / folders based on rules and text within the scanned document? I want to digitize all my paperwork, but there is so much I don't have time to do the organizing part of it manually.

[–] [email protected] 5 points 7 months ago

Yes

[–] [email protected] 6 points 7 months ago

https://docs.paperless-ngx.com/#features

[–] [email protected] 12 points 7 months ago

I believe it's yes for all

[–] [email protected] 130 points 7 months ago (2 children)

Ha! I wrote it! Well the original anyway. It's been forked a few times since I stepped away.

So yeah, I think it's pretty cool 😆

[–] [email protected] 7 points 6 months ago (1 children)

Just want to say thank you! Paperless is one of the first things I recommend to anyone considering self hosting their infra. Amazing piece of work!

[–] [email protected] 3 points 6 months ago

Thanks! The crazy thing is that it's really not that complicated. I'd say the hardest work was in writing the docs :-). It's awesome to hear that people still use it and love it though.

[–] [email protected] 21 points 7 months ago (1 children)

Legend!

Do you use NGX yourself?

[–] [email protected] 105 points 7 months ago (4 children)

Actually, I stepped away from the project 'cause I stopped using it altogether. I started the project to satisfy the British government with their ridiculous requirements for proof of my relationship with my wife so I could live here. Once I was settled though and didn't need to be able to bring up flight itineraries from 5 years ago, it stopped being something I needed.

Well that, and lemme tell you, maintaining a popular Free software project is HARD. Everyone has an idea of where stuff should go, but most of the contributions come in piecemeal, so you're left mostly acting as the one trying to wrangle different styles and architectures into something cohesive... while you're also holding down a day job. It was stressful to say the least, and with a kid on the way, something had to give.

But every once in a while I consider installing paperless-ngx just to see how it's come along, and how much has changed. I'm absolutely delighted that it's been running and growing in my absence, and from the screenshots alone, I see that a lot of the ideas people had when I was helming made it in in the end.

[–] [email protected] 2 points 6 months ago (1 children)

Thank you very much for the code and time you generously contributed while working on it. The effort you put in will live on for many years to come.

[–] [email protected] 1 points 6 months ago

Aww! Thank you! It was fun ❤️

[–] [email protected] 10 points 6 months ago

Hey man, that is what I used it for, but with the Belgian government! Great piece of software though!

[–] [email protected] 5 points 7 months ago

It is honestly an awesome tool.

[–] [email protected] 23 points 7 months ago

Oh wow! Quite a journey!

I'd consider Paperless a hall-of-famer for self-hosted software and something most people who get into self-hosting discover at some point, even if they don't use it.

So thanks for building it, even if you've moved on. You gave the forkers something great to build from.

[–] [email protected] 8 points 7 months ago

I've used it for a few months now and find it quite useful for storing and organizing my physical documents.

[–] [email protected] 5 points 7 months ago (2 children)

What scanners do people recommend?

[–] [email protected] 2 points 6 months ago

If you have an Android phone, I can't recommend Genius Scan enough. Fast, accurate, lots of features. I use it with Syncthing by exporting the files to a folder that's configured to sync with the Paperless input folder.

[–] [email protected] 20 points 7 months ago* (last edited 7 months ago) (2 children)

Brother ADS-1700W is amazing!

no PC or USB required: place it anywhere
WiFi
scans a page double-sided to PDF in two seconds!
sends file to network share, ready to be consumed by Paperless
fully automatic, no button presses needed!
tiny footprint
document feeder
use with separator pages to bulk-scan many documents in one go

😍

[–] [email protected] 3 points 7 months ago (2 children)

275€

Photos with my phone it is! Most documents I receive nowadays are digital anyway.

[–] [email protected] 1 points 6 months ago

I'm in the same boat, but haven't made the jump yet. I've been following paperless for a while now, but every time I look at scanners I'm blown away by their prices...

[–] [email protected] 2 points 6 months ago

You are right, it's not cheap. I delayed buying one for literally years but once I did, it was a game changer.

[–] [email protected] 4 points 7 months ago

I have this scanner and agree it works great, although there may be cheaper options out there that also work well.

FTP worked well out of the box. For SFTP on Arch Linux, I had to follow the troubleshooting instructions here: https://wiki.archlinux.org/title/Very_Secure_FTP_Daemon#LIST_command_resets_connection.

[–] [email protected] 27 points 7 months ago (1 children)

Slow and unreliable with sqlite, but rock solid and amazing with postgres.

Today, every document I receive goes into my duplex ADF scanner to scan to a network share which is monitored by Paperless. Documents there are ingested and pre-tagged, waiting for me to review them in the inbox. Unlike other posters here, I find the tagging process extremely fast and easy. Granted, I didn't have to bring in thousands of documents to begin with but started from a clean slate.

What's more, development is incredibly fast-moving and really useful features are added all the time.

[–] [email protected] 5 points 7 months ago (1 children)

"Slow and unreliable with sqlite, but rock solid and amazing with postgres."

I haven't noticed any major performance issues with sqlite. What tasks improved for you when you moved to postgres?

[–] [email protected] 2 points 7 months ago

Page loading times, general stability. Everything, really.

I set it up with sqlite initially to test if it was for me, and was surprised how flaky it felt given how highly people spoke about it. I'm really glad I tried with postgres instead of just tearing it down. But my experience is highly anecdotal, of course.

[–] [email protected] 4 points 7 months ago* (last edited 7 months ago)

My family and I really like it. I invested in a small, physical scanner capable of network file sharing that we have plugged in and always ready to scan. When we get documents or receipts, we scan them and they're immediately added to the database. I also have it checking an email address (mine is custom, but you could really have it check any address), and any time a PDF or similar is sent, it gets consumed and that email then gets sorted.

There are a few downsides, however. As mentioned in other posts, turning your physical stack of documents into a digital stack of documents is just trading one pile for another. At least with a digital pile, you can sort a little quicker, but you still have to sort the consumed documents and check them to make sure the engine, which is supposed to be learning, has elected to sort the documents correctly.

The compose stack is pretty easy to use, but it does benefit from a little knowledge of Docker/containers, especially when the main container decides it's not healthy. I wouldn't recommend it to a first-time Docker user, is all.

Additionally, and as also previously mentioned, if you're keeping important documents in it, encrypted storage with encrypted backups is important.
