Hello,
When I worked on the predecessor of Paperwork (around 2008~2009), I had to choose a file format to store the scanned documents. At the time, I chose JPEG. My reasoning was that 300 dpi JPEG would be good enough for OCR and would save disk space. (Also, don’t ask me why I used the file extension .jpg instead of .jpeg; that was just a dumb mistake.) When I started working on Paperwork (~2011), I didn’t challenge this decision.
About 1 year ago, FeRD made the case that JPEG is a bad choice for OCR.
Therefore, for Paperwork 2.2, I’ve decided to switch the default image format from JPEG to PNG.
Things to know regarding this change:
- Your existing documents won’t be converted. They will remain in JPEG. Only new documents/pages will be stored as PNG. The change is backward compatible: Paperwork 2.2 will still read your existing JPEG documents just fine.
- There will be an option to switch back to JPEG if you want. This option will only be available from the command line, not from the graphical interface.
- To give you some idea of the impact on disk usage, I’ve converted the JPEGs in my 2700 documents into PNGs: it went from 3.2 GB to 19 GB (roughly 6 times larger).
- This change is not forward compatible: documents generated by Paperwork 2.2 won’t be readable by Paperwork 2.1, as Paperwork 2.1 only looks for JPEG files.
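If you want a rough idea of what this could mean for your own archive, the figures above can be turned into a quick back-of-the-envelope estimate. The Python sketch below is only an illustration: it uses nothing but the three numbers quoted above, the function names are made up for this example, and it assumes your scans compress about as well as mine, which may not be the case.

```python
# Figures quoted in this post; everything derived from them is only an estimate.
DOCUMENTS = 2700
JPEG_TOTAL_GB = 3.2
PNG_TOTAL_GB = 19.0


def growth_factor(before_gb, after_gb):
    """How many times larger the archive became after conversion."""
    return after_gb / before_gb


def average_mb_per_document(total_gb, documents):
    """Average size of one document in MB (using 1 GB = 1000 MB)."""
    return total_gb * 1000 / documents


def estimate_png_size_gb(jpeg_size_gb, factor):
    """Projected PNG size of an archive, assuming the same growth factor applies."""
    return jpeg_size_gb * factor


factor = growth_factor(JPEG_TOTAL_GB, PNG_TOTAL_GB)
print(f"growth factor: about {factor:.1f}x")
print(f"average JPEG document: about {average_mb_per_document(JPEG_TOTAL_GB, DOCUMENTS):.2f} MB")
print(f"average PNG document: about {average_mb_per_document(PNG_TOTAL_GB, DOCUMENTS):.2f} MB")

# Hypothetical example: an archive that currently takes 1 GB as JPEG.
print(f"a 1 GB JPEG archive would take about {estimate_png_size_gb(1.0, factor):.1f} GB as PNG")
```

Keep in mind that this only matters for new documents or if you convert things yourself, since Paperwork leaves existing JPEG documents untouched.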
If you want, you can already give it a try. The change is available on the ‘develop’ branch:
- on Linux:
  - Flatpak:
    flatpak --user install https://builder.openpaper.work/paperwork_develop.flatpakref
  - AppImage:
    wget https://download.openpaper.work/linux/amd64/paperwork-gtk-develop-latest.appimage
- on Windows: https://download.openpaper.work/windows/installer/paperwork_develop_installer.exe
Best regards,