
I have a locally running web app in which I would like to extract data from a PDF. pdf.js would work well for this, but it is distributed as ES modules, which causes cross-origin request errors under the file: protocol. Running a local web server gets around the CORS issue, but is too complicated a task for my users. Is there a way to adjust the pdf.js code so that it runs locally?

The question "locally-installed PDF.JS does not render" asked the same thing over five years ago, but its answer is two major versions out of date: loading a module as a regular js file doesn’t work because of the slightly differing syntax (export becomes a keyword).

The "Building PDF.js" instructions say that npx gulp generic-legacy will generate build/generic/build/pdf[.worker].js (possibly npx gulp generic as well; it isn’t clear which command "this" refers to). This statement appears to be incorrect, as running those commands creates .mjs files, not .js files.

I think this answer means that what I want can be accomplished using Webpack or Rollup, but the answer doesn’t specify how, and I haven’t been able to figure that out.

(I will be having my users <input> the file using the html widget, so the fact that the pdf is local isn’t a problem.)

A sample of working code is:

<!doctype html>
<html lang="en-US">
    <head>
        <meta charset="utf-8">
        <title>pdf.js test</title>
        <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/4.9.155/pdf.min.mjs" type="module"></script>
        <!-- works with all schemes, but pdf.js is remote -->
        <!--<script src="pdf.js" type="module"></script>-->
        <!-- local pdf.js doesn't work with file:// scheme -->
        <!--<script src="pdf.js"></script>-->
        <!-- Uncaught SyntaxError: Cannot use 'import.meta' outside a module -->
    </head>
    <body>
        <input id="fileUploader" type="file">
        <p>Number of pages: <span id="pageCount"></span></p>
        <script>
            //const pdfjsLib = require('pdf.js');
            document.getElementById('fileUploader').addEventListener('change', ({target}) => {
                //pdfjsLib.GlobalWorkerOptions.workerSrc = 'pdf.worker.js';
                pdfjsLib.GlobalWorkerOptions.workerSrc =
                    `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.mjs`;
                const reader = new FileReader();
                reader.addEventListener('load', async () => {
                    const buffer = reader.result;
                    const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
                    const numPages = pdf.numPages;
                    document.getElementById('pageCount').textContent = numPages;
                });
                reader.readAsArrayBuffer(target.files[0]);
            });
        </script>
    </body>
</html>

2 Answers


  1. Chosen as BEST ANSWER

    I was able to get this working with Rollup after all (with some help from Google's Gemini). Starting with the two files pdfjs-dist/build/pdf[.worker].mjs, I created the file rollup.config.mjs:

    export default [ {
        input: 'pdf.mjs',
        output: {
            file: 'pdf.local.js',
            format: 'iife', // for browsers
            name: 'pdfjsLib'
        }
    }, {
        input: 'pdf.worker.mjs',
        output: {
            file: 'pdf.worker.local.js',
            format: 'iife', // for browsers
            name: 'PDFWorker'
        }
    } ];
    

    and ran the command npx rollup -c. Then, the html:

    <!doctype html>
    <html lang="en-US">
        <head>
            <meta charset="utf-8">
            <title>pdf.js test</title>
            <script src="pdf.local.js"></script>
            <script src="pdf.worker.local.js"></script>
        </head>
        <body>
            <input id="fileUploader" type="file">
            <p>Number of pages: <span id="pageCount"></span></p>
            <script>
                document.getElementById('fileUploader').addEventListener('change', ({target}) => {
                    const reader = new FileReader();
                    reader.addEventListener('load', async () => {
                        const buffer = reader.result;
                        const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
                        const numPages = pdf.numPages;
                        document.getElementById('pageCount').textContent = numPages;
                    });
                    reader.readAsArrayBuffer(target.files[0]);
                });
            </script>
        </body>
    </html>
    

    was able to process a pdf without internet access. (This still gets the warning about setting up a fake worker, but I think that's unavoidable without more drastic modification of pdf.js.)
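
    Since the goal in the question is to extract data rather than only count pages, here is a minimal sketch of pulling the text out of each page with the same bundled build. It assumes the pdf object resolved by pdfjsLib.getDocument(...).promise in the page above. joinTextItems and extractText are hypothetical helper names (not part of pdf.js), and the item fields used (str, hasEOL) are the ones getTextContent() returns in the 4.x builds as far as I know.

    ```javascript
    // Join the text runs of one page into a single string.
    // `items` is assumed to have the shape of `content.items` from
    // page.getTextContent(): objects with `str` and an optional `hasEOL` flag.
    function joinTextItems(items) {
        let text = '';
        for (const item of items) {
            // Entries without a string payload (e.g. marked-content markers) are skipped.
            if (typeof item.str !== 'string') continue;
            text += item.str + (item.hasEOL ? '\n' : '');
        }
        return text;
    }

    // Collect the text of every page, one array entry per page.
    // `pdf` is assumed to be the document object from getDocument(...).promise.
    async function extractText(pdf) {
        const pages = [];
        for (let n = 1; n <= pdf.numPages; n++) { // page numbers are 1-based
            const page = await pdf.getPage(n);
            const content = await page.getTextContent();
            pages.push(joinTextItems(content.items));
        }
        return pages;
    }
    ```

    Inside the load handler this could be called as const pages = await extractText(pdf); after the page count is written out. How well the result matches the visual layout depends on the PDF, so treat it as a starting point.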


  2. If I understand the question correctly, you have pdf.js set up as a separate script file.

    I suggest that you use Vite along with vite-plugin-singlefile to bundle all the necessary JS into a single HTML file.
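
    A minimal sketch of what that configuration might look like, assuming vite and vite-plugin-singlefile are installed as dev dependencies and the file is saved as vite.config.mjs next to index.html. I have not checked how the pdf.js worker behaves once everything is inlined, so the worker setup may still need separate handling.

    ```javascript
    // vite.config.mjs (assumed file name and project layout)
    import { defineConfig } from 'vite';
    import { viteSingleFile } from 'vite-plugin-singlefile';

    export default defineConfig({
        // Inlines the bundled JS and CSS into the emitted index.html.
        plugins: [viteSingleFile()]
    });
    ```

    Running npx vite build should then write the single HTML file to the dist directory, which is Vite's default output location.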
