Javascript - How to use pdf.js locally

Teepeemm
January 2, 2025
185 views
0 votes
2 Answers

I have a locally running web app in which I would like to extract data from a PDF. pdf.js would work well for this, but it is distributed as ES modules, which causes cross-origin request errors under the file: protocol. Running a local web server gets around the CORS issue, but that is too complicated a task for my users. Is there a way to adjust the pdf.js code so that it runs locally?

The question "locally-installed PDF.JS does not render" asked the same thing over five years ago, but its answer is two major versions out of date: loading a module as a regular script doesn't work because module syntax differs (for example, export is a keyword that is only allowed inside a module).
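To see the difference in isolation, here is a small sketch that checks whether a piece of source text parses under non-module rules. It uses new Function as a stand-in for a classic script tag; that substitution is an assumption for illustration, since a function body is not exactly the same grammar as a full script, but both reject export statements and import.meta. The helper name parsesAsClassicScript is hypothetical.

```javascript
// Hypothetical helper: returns true if `source` parses with non-module
// (classic) rules. `new Function` only parses the body; it never runs it.
function parsesAsClassicScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false; // module-only syntax, etc.
    throw err;
  }
}

console.log(parsesAsClassicScript("var version = '4.9.155';"));          // true
console.log(parsesAsClassicScript("export const version = '4.9.155';")); // false
console.log(parsesAsClassicScript("const url = import.meta.url;"));      // false
```

The last case is the same failure the commented-out classic script tag in the sample below reports: "Cannot use 'import.meta' outside a module".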

The "Building PDF.js" instructions say that npx gulp generic-legacy will generate build/generic/build/pdf[.worker].js (and possibly npx gulp generic as well; what "this" refers to isn't clear). That statement appears to be incorrect, as running those commands creates .mjs files, not .js files.

I think this answer means that what I want can be accomplished using Webpack or Rollup, but the answer doesn’t specify how, and I haven’t been able to figure that out.

(My users will select the file with an HTML <input type="file"> element, so the fact that the PDF itself is local isn't a problem.)

A sample of working code is:

<!doctype html>
<html lang="en-US">
    <head>
        <meta charset="utf-8">
        <title>pdf.js test</title>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/4.9.155/pdf.min.mjs" type="module"></script>
        <!-- works with all schemes, but pdf.js is remote -->
        <!--<script src="pdf.js" type="module"></script>-->
        <!-- local pdf.js doesn't work with file:// scheme -->
        <!--<script src="pdf.js"></script>-->
        <!-- Uncaught SyntaxError: Cannot use 'import.meta' outside a module -->
    </head>
    <body>
        <input id="fileUploader" type="file">
        <p>Number of pages: <span id="pageCount"></span></p>
    </body>
    <script>
        //const pdfjsLib = require('pdf.js');
        document.getElementById('fileUploader').addEventListener('change', ({target}) => {
            //pdfjsLib.GlobalWorkerOptions.workerSrc = 'pdf.worker.js';
            pdfjsLib.GlobalWorkerOptions.workerSrc =
                `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.mjs`;
            const reader = new FileReader();
            reader.addEventListener('load', async () => {
                const buffer = reader.result;
                const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
                const numPages = pdf.numPages;
                document.getElementById('pageCount').textContent = numPages;
            });
            reader.readAsArrayBuffer(target.files[0]);
        });
    </script>
</html>
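As an aside, the FileReader callback in this sample can be replaced with Blob.prototype.arrayBuffer(), which the File from a file input inherits. The sketch below takes the library object as a parameter so it works with either the module build or a global pdfjsLib; the helper name countPages is hypothetical, and only getDocument({ data }).promise and numPages are taken from the sample above.

```javascript
// Hypothetical helper: count the pages of a user-selected PDF File/Blob.
// `pdfjsLib` is passed in rather than read from a global.
async function countPages(pdfjsLib, file) {
  // File inherits arrayBuffer() from Blob; reading it makes no network request.
  const data = new Uint8Array(await file.arrayBuffer());
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  return pdf.numPages;
}

// Usage in the page above (browser only):
// document.getElementById('fileUploader').addEventListener('change', async ({ target }) => {
//   const file = target.files[0];
//   if (!file) return; // the selection was cancelled
//   document.getElementById('pageCount').textContent = await countPages(pdfjsLib, file);
// });
```

Passing the library in also makes the helper easy to exercise with a stub object outside the browser.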

Answers

Chosen as BEST ANSWER

I was able to get this working with Rollup after all (with some help from Google's Gemini). Starting from the two files pdfjs-dist/build/pdf.mjs and pdfjs-dist/build/pdf.worker.mjs, I created the file rollup.config.mjs:

export default [ {
    input: 'pdf.mjs',
    output: {
        file: 'pdf.local.js',
        format: 'iife', // for browsers
        name: 'pdfjsLib'
    }
}, {
    input: 'pdf.worker.mjs',
    output: {
        file: 'pdf.worker.local.js',
        format: 'iife', // for browsers
        name: 'PDFWorker'
    }
} ];

and ran the command npx rollup -c. The following HTML

<!doctype html>
<html lang="en-US">
    <head>
        <meta charset="utf-8">
        <title>pdf.js test</title>
        <script src="pdf.local.js"></script>
        <script src="pdf.worker.local.js"></script>
    </head>
    <body>
        <input id="fileUploader" type="file">
        <p>Number of pages: <span id="pageCount"></span></p>
    </body>
    <script>
        document.getElementById('fileUploader').addEventListener('change', ({target}) => {
            const reader = new FileReader();
            reader.addEventListener('load', async () => {
                const buffer = reader.result;
                const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
                const numPages = pdf.numPages;
                document.getElementById('pageCount').textContent = numPages;
            });
            reader.readAsArrayBuffer(target.files[0]);
        });
    </script>
</html>

was able to process a PDF without internet access. (It still logs the warning about setting up a fake worker, but I think that's unavoidable without more drastic modification of pdf.js.)


- Allan
- January 1, 2025 at 11:57 pm
- 0 votes
If I understand the question correctly, you have pdf.js set up as a separate script file.

I suggest using Vite along with vite-plugin-singlefile to bundle all the necessary JavaScript into a single HTML file.
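A minimal sketch of that setup, assuming a standard Vite project whose entry script imports pdfjs-dist (for example import * as pdfjsLib from 'pdfjs-dist';). viteSingleFile is the plugin's exported function; everything else is ordinary Vite configuration. This answer does not say how the pdf.js worker is handled, so treat that part as untested.

```javascript
// vite.config.js: inline all JS and CSS into the built index.html.
import { defineConfig } from 'vite';
import { viteSingleFile } from 'vite-plugin-singlefile';

export default defineConfig({
  plugins: [viteSingleFile()],
});
```

Running npx vite build then writes the bundled page to dist/index.html by default.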
