Paperwork 2.2: Important change in the image file format!

Hello,

When I worked on the predecessor of Paperwork (around 2008~2009), I had to choose a file format to store the scanned documents. At the time, I chose JPEG. My reasoning was that 300 dpi JPEG would be good enough for OCR and would save disk space. (Also, don’t ask me why I used the file extension .jpg instead of .jpeg … that was just a dumb mistake.) When I started working on Paperwork (~2011), I didn’t challenge this decision.

About a year ago, FeRD made the case that JPEG is a bad choice for OCR.

Therefore, for Paperwork 2.2, I’ve decided to switch the default image format from JPEG to PNG.

Things to know regarding this change:

  • Your existing documents won’t be converted. They will remain in JPEG. Only new documents/pages will be in PNG. The change is backward compatible, so Paperwork will still be able to read them just fine.
  • There will be an option to switch back to JPEG if you want. You will only be able to change it from the command line, not from the graphical interface.
  • To give you some idea of the impact, I’ve converted the JPEGs in my 2700 documents into PNGs: the total size went from 3.2 GB to 19 GB.
  • This change is not forward compatible: documents generated by Paperwork 2.2 won’t be readable by Paperwork 2.1, as Paperwork 2.1 only looks for JPEG files.
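
To put the disk usage figures above in perspective, here is a rough back-of-the-envelope sketch. It only uses the three numbers quoted in the list (2700 documents, 3.2 GB, 19 GB). The function names are made up for illustration, and it assumes decimal units (1 GB = 1000 MB). Your own ratio may differ depending on what you scan.

```python
# Figures quoted in the announcement above.
DOCUMENTS = 2700
JPEG_TOTAL_GB = 3.2
PNG_TOTAL_GB = 19.0


def growth_factor(before_gb, after_gb):
    """Return how many times larger the collection became."""
    return after_gb / before_gb


def average_mb_per_document(total_gb, documents):
    """Average size of one document, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / documents


if __name__ == "__main__":
    factor = growth_factor(JPEG_TOTAL_GB, PNG_TOTAL_GB)
    jpeg_avg = average_mb_per_document(JPEG_TOTAL_GB, DOCUMENTS)
    png_avg = average_mb_per_document(PNG_TOTAL_GB, DOCUMENTS)
    print(f"growth factor: {factor:.1f}x")
    print(f"average per document (JPEG): {jpeg_avg:.1f} MB")
    print(f"average per document (PNG): {png_avg:.1f} MB")
```

So on this particular collection, PNG takes roughly six times the space of JPEG. Note that these averages are per document, not per page, since a document can contain several pages.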

If you want, you can already give it a try. The change is available on the ‘develop’ branch.

Best regards,

Hello Jerome,

Will importing JPEG files (e.g. from a mobile phone gallery) still work?

Kind regards,

Andreas

Of course! Imported images will simply be converted to PNG (just as they are currently converted to JPEG).