
I installed pdftoppm on my Amazon Linux instance with:

yum install poppler-utils

However, for the same file that is rendering correctly on my Mac laptop:
https://utfs.io/f/4fb084bf-0327-4064-94b9-034c0b160c10-3fawz8.pdf

The rasterized PNG version is rendered strangely.

The command I used to replicate is:
pdftoppm -png sample.pdf sample

The same command works fine on my macOS computer. The pdftoppm version on the Linux instance is 22.08.0 and the local macOS version is 24.01.0. Both come from the latest poppler-utils available for each platform. Is this a version issue? Any help would be greatly appreciated!

2 Answers


  1. It’s possible that the issue you’re encountering is related to differences between versions of pdftoppm, as you’ve identified. Newer versions of software may include bug fixes or improvements that affect how PDF files are rendered.

    One thing you could try is to update the version of pdftoppm on your Amazon Linux instance to match the version on your macOS computer. You might need to look for a repository or package source that provides a more up-to-date version of poppler-utils for your Amazon Linux distribution.

    Alternatively, you could try using a different tool or approach to convert the PDF file to PNG on your Amazon Linux instance. For example, you could try using Ghostscript or ImageMagick, which are also capable of converting PDF files to raster images.

    If updating the version or trying a different tool doesn’t resolve the issue, you may need to investigate further to determine the specific cause of the rendering problem. This could involve examining the PDF file itself to identify any unusual features or properties that might be causing the issue, or comparing the output of different rendering tools to see if there are any differences in how they interpret the file.

  2. The file you supplied states that it was produced by PDFium, and it uses just two named fonts:

    • TimesNewRomanPSMT
    • TimesNewRomanPS-BoldMT

    If the fonts were embedded, they would make such a small file larger than its 12 KB, even when highly compressed (it is 20 KB when decompressed to plain text)! Without fonts, the plain text itself would be about 3-4 KB at most, including white space.
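    If you want to confirm on your own machine which fonts the PDF names and whether they are embedded, poppler-utils also ships pdffonts. The sketch below is only a rough helper: `list_unembedded_fonts` is a name I made up, it assumes the usual pdffonts column layout (name, type, encoding, emb, sub, uni, object ID), and `sample.pdf` is the file name from the question.

    ```shell
    # Print the names of fonts that pdffonts reports as NOT embedded.
    # Reads pdffonts output on stdin and skips the two header lines.
    # The "emb" flag is the 5th field from the end of each row:
    #   ... emb sub uni <object number> <generation>
    list_unembedded_fonts() {
      awk 'NR > 2 && NF >= 8 && $(NF-4) == "no" { print $1 }'
    }

    # Only run against the real tool when it and the file are present.
    if command -v pdffonts >/dev/null 2>&1 && [ -f sample.pdf ]; then
      pdffonts sample.pdf | list_unembedded_fonts
    fi
    ```

    Any name this prints is a font the renderer has to find or substitute on the machine doing the conversion.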

    When a PDF is converted into images, the renderer tries to honor all of that positional baggage (the roughly 400% extra positional and styling controls), and in some cases the results could be even worse!

    So, to digress: Poppler has a lot of conflicting data to contend with regarding sizes, position, and style, so it can only "approximate" the source intent.

    The key to that conversion is the font, and without the real font Poppler uses substitution. Here it has sadly chosen a totally unsuitable sans-serif to emulate the named Times New Roman faces, when it should have used a simpler Times and Times Bold.
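    On Linux, Poppler relies on fontconfig to find substitutes, so you can get a rough idea of what it will pick by asking fontconfig directly with fc-match. This is only an approximation of Poppler's own lookup, not a reproduction of it. `report_substitutes` and the `MATCH_CMD` override are hypothetical names I added so the loop can be tried without fontconfig installed.

    ```shell
    # For each font name, print which installed font fontconfig would choose.
    # MATCH_CMD (hypothetical hook) defaults to the real fc-match tool.
    report_substitutes() {
      for name in "$@"; do
        printf '%s -> %s\n' "$name" "$("${MATCH_CMD:-fc-match}" "$name")"
      done
    }

    # Query the two font names used by the PDF in the question.
    if command -v fc-match >/dev/null 2>&1; then
      report_substitutes TimesNewRomanPSMT TimesNewRomanPS-BoldMT
    fi
    ```

    If the right-hand side shows a sans-serif family on the server but a Times-like serif on the Mac, that would be consistent with the difference in the rendered PNGs.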

    In fairness, it was misled by non-standard naming and all that unnecessary baggage.

    Given a suitable local substitute system font, it will do better and simply convert the 4 KB of text into 500-600 KB of pixels.


    ANSWER

    Either the correctly named font or a good substitution list (such as Ghostscript provides) needs to be available on the system where the conversion is done.

    Poppler should have some ability to pick from the available fonts, but the conversion server also needs to provide them.

    If PDFium had embedded suitably compressed fonts, as Ghostscript does, the file size would be much closer to the final image size. Poppler would then have no problems at all, since the correct fonts would be inside the PDF. So the solution is either to use Ghostscript to add the fonts to the PDF or, more simply, to use Ghostscript to convert the PDF into one image per page with a short one-line command.
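    As a rough sketch of that one-line conversion, something like the following should work with a stock Ghostscript. The `render_pages` wrapper and the `GS_CMD` override are hypothetical names of mine, the 150 DPI default is my choice rather than anything from the question, and `sample.pdf` / `sample` are the names used there.

    ```shell
    # Render every page of a PDF to its own 24-bit PNG with Ghostscript.
    #   $1 = input PDF, $2 = output prefix, $3 = resolution in DPI (default 150)
    # GS_CMD (hypothetical hook) defaults to the real gs binary.
    render_pages() {
      "${GS_CMD:-gs}" -dBATCH -dNOPAUSE -dSAFER -sDEVICE=png16m \
        "-r${3:-150}" "-sOutputFile=${2}-%03d.png" "$1"
    }

    # Only run the real conversion when gs and the input file are present.
    if command -v gs >/dev/null 2>&1 && [ -f sample.pdf ]; then
      render_pages sample.pdf sample 150
    fi
    ```

    Ghostscript expands `%03d` to the page number, so a two-page file would give sample-001.png and sample-002.png. It still substitutes for missing fonts, just with its own substitution list rather than the one Poppler ended up with.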
