Paperwork doesn't work with ScanSnap ix1500

cherti · May 19, 2021, 8:25pm

I recently acquired a ScanSnap ix1500 and I’m observing some strange behavior in combination with Paperwork:
Paperwork does recognize the scanner and pulls in pages. However, it scans an area wider than the page and keeps scanning for an eternity, producing a meters-long image that Paperwork is then unable to display. I can briefly see the scanned page pop up in the preview, but afterwards it’s only grey and nothing is there (not even in the JPEGs saved on disk).

However, both simple-scan and scanimage can scan from said scanner without issues. scanimage does it page after page; simple-scan behaves similarly to Paperwork in that it scans one page, then waits quite some time before scanning the next one. The result, however, is a proper multi-page A4 document.

Hence: does this sound like it would be a bug-report or does it sound like something I could tweak by adjusting some settings somewhere?

jflesch · May 19, 2021, 8:50pm

Hello,

Actually, with some Fujitsu scanners (not all of them), if you look closely at the scans from simple-scan, the output is slightly truncated. Paperwork/Libinsane set the scan area slightly differently to get the biggest one possible, so the scan is not truncated at all in Paperwork. However, with most Fujitsu scanners, this creates those annoyingly long images.

Now, regarding the fact that Paperwork is unable to display such long images: this is true as of Paperwork 2.0.2 (the version you get from Flathub or from your Linux distribution). However, it is already fixed in the master branch. You can get a Flatpak build here: doc/install.flatpak.markdown · develop · World / OpenPaperwork / paperwork · GitLab. The fix will be in Paperwork 2.0.3 too.

Once you have a version of Paperwork that can display those scans, the solution to your problem becomes fairly straightforward: in the settings of Paperwork, you have something called “scanner calibration”. It allows you to predefine how all the images you get from your scanner will be cropped.
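To illustrate the idea only (this is not Paperwork’s actual code; the function name, the calibration box and the scan size are all hypothetical), a calibration boils down to one fixed crop box that is applied to every incoming scan, clamped so it never reaches outside the image:

```python
def clamp_crop_box(box, image_size):
    """Clamp a (left, top, right, bottom) box to an image of (width, height)."""
    left, top, right, bottom = box
    width, height = image_size
    left = max(0, min(left, width))
    right = max(left, min(right, width))
    top = max(0, min(top, height))
    bottom = max(top, min(bottom, height))
    return (left, top, right, bottom)


# Hypothetical calibration: an A4 page at 300 dpi is about 2480 x 3508 pixels.
CALIBRATION = (0, 0, 2480, 3508)

# Hypothetical over-long scan, as described in this thread (width, height in pixels).
long_scan_size = (2550, 20000)

# The long tail below the page is simply cut off by the calibration box.
print(clamp_crop_box(CALIBRATION, long_scan_size))
```

The same box would also be shrunk to fit if a scan turned out smaller than the calibrated area.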

cherti · May 19, 2021, 9:16pm

ah, thanks, that’s very good to hear. I can also build from master right away?

I had tried scanner calibration, but that was just doing nothing and I killed it when it was at 10G memory usage… On that note: is that expected, given the aforementioned issue, or a separate bug?

jflesch · May 19, 2021, 9:29pm

I can also build from master right away?

Sure, if you want to: doc/install.devel.markdown · develop · World / OpenPaperwork / paperwork · GitLab

when it was at 10G memory usage

That is actually weird. The images from Fujitsu scanners are long, but they shouldn’t be that long …
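As a back-of-the-envelope check, here is a small sketch that estimates the uncompressed size of a scan. The numbers are assumptions for illustration only: 300 dpi, 24-bit colour, A4 width, and a purely hypothetical 3 m long image. The actual resolution and length of the scans in this thread are not known.

```python
MM_PER_INCH = 25.4


def raw_image_bytes(width_mm, length_mm, dpi=300, bytes_per_pixel=3):
    """Rough size in bytes of one uncompressed image held in memory."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    length_px = round(length_mm / MM_PER_INCH * dpi)
    return width_px * length_px * bytes_per_pixel


a4 = raw_image_bytes(210, 297)             # a normal A4 page
three_meters = raw_image_bytes(210, 3000)  # hypothetical 3 m long scan

print(f"A4 page:  {a4 / 1e6:.0f} MB")
print(f"3 m scan: {three_meters / 1e6:.0f} MB")
```

Under these assumptions, even a 3 m image stays in the range of a few hundred megabytes per copy, far below 10 GB.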

cherti · May 20, 2021, 11:05am

That did indeed fix the issue, thanks!

And the calibration also worked without the memory issue, so possibly that was caused by the broken image handling, or you implemented a drive-by fix while fixing that.

cherti · May 20, 2021, 11:29am

On that note: do you have an idea why both Paperwork and simple-scan have to get the full image while scanimage apparently does not? When I used the feeder tray with multiple pages to confirm the initial post, I noticed that both simple-scan and Paperwork take quite some time per page. Given that one can clearly see in Paperwork why that is the case (it is scanning a loooooong image), I suspect that simple-scan has the same issue, albeit without it being visible to the user.
scanimage, however, simply pulls one page after the other, with virtually no waiting time in between, and the scan looks complete. The quality is shitty, but that’s probably because I’m not sufficiently skilled with the scanimage CLI and picking the right options (and, to be honest, I didn’t even try, since this was merely a cross-check for Paperwork).

jflesch · May 20, 2021, 11:55am

why both paperwork and simple-scan have to get the full image while scanimage apparently does not?

2 reasons come to mind:

Cancelability:
Paperwork: Since Paperwork 2.0, you can reset a page: it brings it back to exactly what was scanned in the first place. The idea is that you will use the calibration to scan the most common format in your country (usually A4 or Letter). But sometimes you may want to scan something bigger (French tax papers are slightly bigger than A4, for instance). In those cases, you can scan the page, get it cropped by the calibration, reset it, and crop it again yourself. But this implies that Paperwork must always scan the biggest area possible.
Simple-scan: In simple-scan, you may want to scan something, crop it, and then scan something else but crop it at a bigger size (in other words, cancel/modify your previous cropping) → from a GUI standpoint, it makes everything simpler and more obvious if you always scan the biggest area possible.
Reliability:
Short version: because scanners are hell, scanners are pain, scanners are suffering.
Longer answer: Scanners and scanner drivers are full of annoying quirks and bugs. The very long images you get from Fujitsu scanners are just one out of many, on both Linux and Windows (Libinsane has a big bunch of various workarounds for both platforms). So always scanning the biggest area seemed like the most straightforward way to ensure that the user always gets a consistent result.
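The “reset” workflow described under Cancelability can be sketched roughly as follows. This is only an illustration of the principle, not how Paperwork is actually implemented; the class, its methods and the pixel sizes are hypothetical. The point is that the full scan is always kept, and a crop is merely a view on top of it:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class ScannedPage:
    """Hypothetical page model: the original scan is never thrown away."""
    original_size: Tuple[int, int]  # (width, height) of the full scan
    crop: Optional[Box] = None      # current crop, None means "uncropped"

    def apply_crop(self, box: Box) -> None:
        self.crop = box

    def reset(self) -> None:
        # Back to exactly what was scanned in the first place.
        self.crop = None

    def visible_box(self) -> Box:
        if self.crop is not None:
            return self.crop
        width, height = self.original_size
        return (0, 0, width, height)


# A page slightly bigger than A4, scanned with the biggest area possible.
page = ScannedPage(original_size=(2550, 4200))
page.apply_crop((0, 0, 2480, 3508))  # automatic crop from the calibration (A4)
page.reset()                         # too small for this document: undo it
page.apply_crop((0, 0, 2550, 3700))  # crop again by hand, a bit bigger
print(page.visible_box())
```

If only the calibrated area had been scanned, the second, bigger crop would not be possible, which is why the biggest area is always scanned.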

jflesch · May 20, 2021, 12:00pm

Quite possible. As you have seen, the Fujitsu Sane backend is a bit buggy at times.

Also, if I’m not mistaken, those Sane backends (drivers) are developed by volunteers through reverse engineering. It wouldn’t be surprising if they took a shortcut by always asking the scanner to scan the biggest area possible and then cropping the resulting image in the backend.

Yep, that one is actually a bit more surprising. But anyway, I’m just going to go back to my previous “scanners are hell, scanners are pain, scanners are suffering”.

cherti · May 22, 2021, 10:14am

That sounds reasonable, thanks for the insight!

Might try the official Fujitsu scanner driver at some point, although everything works now, so I might simply be content with that.