I have a locally running web app in which I would like to extract data from a PDF. pdf.js would work well for this, but it uses modules, which causes cross-origin request errors under the file: protocol. Running a local web server gets around the CORS issue, but is too complicated a task for my users. Is there a way to adjust the pdf.js code so that it runs locally?
The question "locally-installed PDF.JS does not render" asked the same thing over five years ago, but its answer is two major versions out of date: loading a module as a regular js file doesn't work because of the slightly differing syntax (export becomes a keyword).
The "Building PDF.js" instructions say that npx gulp generic-legacy will generate build/generic/build/pdf[.worker].js (possibly also npx gulp generic; "this" isn't clear). This statement appears to be incorrect, as running those commands creates .mjs files, not .js files.
I think this answer means that what I want can be accomplished using Webpack or Rollup, but the answer doesn’t specify how, and I haven’t been able to figure that out.
(I will be having my users <input> the file using the html widget, so the fact that the pdf is local isn't a problem.)
A sample of working code is:
<!doctype html>
<html lang="en-US">
<head>
  <meta charset="utf-8">
  <title>pdf.js test</title>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/4.9.155/pdf.min.mjs" type="module"></script>
  <!-- works with all schemes, but pdf.js is remote -->
  <!--<script src="pdf.js" type="module"></script>-->
  <!-- local pdf.js doesn't work with file:// scheme -->
  <!--<script src="pdf.js"></script>-->
  <!-- Uncaught SyntaxError: Cannot use 'import.meta' outside a module -->
</head>
<body>
  <input id="fileUploader" type="file">
  <p>Number of pages: <span id="pageCount"></span></p>
</body>
<script>
  //const pdfjsLib = require('pdf.js');
  document.getElementById('fileUploader').addEventListener('change', ({target}) => {
    //pdfjsLib.GlobalWorkerOptions.workerSrc = 'pdf.worker.js';
    pdfjsLib.GlobalWorkerOptions.workerSrc =
      `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjsLib.version}/pdf.worker.min.mjs`;
    const reader = new FileReader();
    reader.addEventListener('load', async () => {
      const buffer = reader.result;
      const pdf = await pdfjsLib.getDocument({ data: new Uint8Array(buffer) }).promise;
      const numPages = pdf.numPages;
      document.getElementById('pageCount').textContent = numPages;
    });
    reader.readAsArrayBuffer(target.files[0]);
  });
</script>
</html>
2 Answers
I was able to get this working with Rollup after all (with some help from Google's Gemini). Starting with the two files pdfjs-dist/build/pdf[.worker].mjs, I created the file rollup.config.mjs and ran the command npx rollup -c. After that, the html was able to process a pdf without internet access. (This still gets the warning about setting up a fake worker, but I think that's unavoidable without more drastic modification of pdf.js.)

If I understand the question correctly, you have pdf.js set up as a separate script file.
I suggest that you use vite along with vite-plugin-singlefile to bundle all necessary js into a single html file.
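
A minimal sketch of that Vite setup, assuming vite and vite-plugin-singlefile are installed as dev dependencies and the page's script imports pdf.js from the pdfjs-dist npm package. The file name vite.config.js and the reliance on default build settings are assumptions; the answer does not specify them.

```javascript
// vite.config.js (assumed file name)
import { defineConfig } from 'vite';
import { viteSingleFile } from 'vite-plugin-singlefile';

export default defineConfig({
  // Inline the bundled JS and CSS into the generated HTML file,
  // so nothing has to be fetched separately under file://
  plugins: [viteSingleFile()],
});
```

With this in place, npx vite build should write a single HTML file to dist/. How the pdf.js worker ends up being loaded in that file depends on how workerSrc is set in the page script, which this sketch does not cover.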
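
For the Rollup route in the first answer, the config itself is not shown. One plausible shape is sketched below as an untested assumption, not the answerer's actual file: the node_modules input paths, the dist/ output names, and the global names pdfjsLib and pdfjsWorker are all hypothetical choices.

```javascript
// rollup.config.mjs (hypothetical contents)
export default [
  {
    // Main library: wrap the ES module in an IIFE so it can be loaded
    // with a plain <script src="dist/pdf.js"> tag
    input: 'node_modules/pdfjs-dist/build/pdf.mjs',
    output: { file: 'dist/pdf.js', format: 'iife', name: 'pdfjsLib' },
  },
  {
    // Worker code, bundled the same way under its own global name
    input: 'node_modules/pdfjs-dist/build/pdf.worker.mjs',
    output: { file: 'dist/pdf.worker.js', format: 'iife', name: 'pdfjsWorker' },
  },
];
```

The idea is that an IIFE bundle contains no export statements and no import.meta syntax, so the browser can load it as a regular script from file://. Whether a given pdf.js release bundles cleanly this way would need to be checked against that release.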