We are trying to read a PDF and populate values in it dynamically. Based on an incoming request, we run some rules to decide which PDF template to use and then populate its values dynamically. We are using Apache PDFBox version 2.0.11, and for some reason we are facing issues with one particular PDF template: we are not able to read some of its fields, and the generated PDF is incomplete. We wonder whether this has something to do with the original PDF itself. Here is the code snippet we use to read the fields and populate them:

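import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

PDDocument pdfTemplate = PDDocument.load(inputStream);
PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
acroForm.setXFA(null);
// getFields() already returns a List<PDField>, so no cast to COSArrayList is needed
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
    field.setReadOnly(true);
    logger.debug("Field name " + field.getFullyQualifiedName());
    // use logic to populate the value by calling field.setValue(...)
}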

When we print each field name, we observe that more than 30 percent of the fields are missing. Can anyone help us fix this? The PDF has 15 pages with different questions. If the issue is with the original PDF itself, what might prevent some of its fields from being read?

2 Answers


  1. Chosen as BEST ANSWER

    The issue was resolved by recreating the whole PDF.


  2. You probably have hierarchical fields on that form. Try something like the following code instead:

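    import java.util.Iterator;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;
    import org.apache.pdfbox.pdmodel.interactive.form.PDFieldTree;
    import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

    PDDocument pdfTemplate = PDDocument.load(inputStream);
    PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();
    // getFieldTree() visits every field, including the children of hierarchical fields
    PDFieldTree fieldTree = acroForm.getFieldTree();
    Iterator<PDField> fieldTreeIterator = fieldTree.iterator();
    while (fieldTreeIterator.hasNext()) {
        PDField field = fieldTreeIterator.next();
        // only terminal fields take a value and have widgets on the page
        if (field instanceof PDTerminalField) {
            String fullyQualifiedName = field.getFullyQualifiedName();
            logger.debug("Field name " + fullyQualifiedName);
        }
    }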

    PDAcroForm.getFields() returns only the root fields, not their children. PDAcroForm.getFieldTree() returns all fields, but you then need to check whether each one is terminal before setting a value. Non-terminal fields can’t have a value and don’t have widgets (representations on the page) associated with them. You’ll know this is the problem if the fully qualified names contain periods; the periods represent the hierarchy.
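    To illustrate why a loop over the root fields misses so many entries, here is a minimal, library-free Java sketch of a field hierarchy. `FieldTreeDemo` and `FormField` are hypothetical illustration classes, not PDFBox types, and the model is simplified on purpose: widgets are not modeled, and a field with no child fields is treated as terminal. Like `getFieldTree()`, it visits every field below the roots instead of stopping at them, and it builds fully qualified names by joining partial names with periods.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class FieldTreeDemo {
        // Hypothetical, simplified stand-in for an AcroForm field (NOT the PDFBox class).
        static final class FormField {
            final String partialName;
            final FormField parent;
            final List<FormField> kids = new ArrayList<>();

            FormField(String partialName, FormField parent) {
                this.partialName = partialName;
                this.parent = parent;
                if (parent != null) {
                    parent.kids.add(this); // register as a child of the parent field
                }
            }

            // In this simplified model, a field without child fields is terminal.
            boolean isTerminal() {
                return kids.isEmpty();
            }

            // Partial names from the root down to this field, joined by periods.
            String fullyQualifiedName() {
                return parent == null
                        ? partialName
                        : parent.fullyQualifiedName() + "." + partialName;
            }
        }

        // Walks every field below the given roots and keeps only the terminal ones.
        static List<String> terminalFieldNames(List<FormField> roots) {
            List<String> names = new ArrayList<>();
            for (FormField root : roots) {
                collect(root, names);
            }
            return names;
        }

        private static void collect(FormField field, List<String> names) {
            if (field.isTerminal()) {
                names.add(field.fullyQualifiedName());
                return;
            }
            for (FormField kid : field.kids) {
                collect(kid, names);
            }
        }

        public static void main(String[] args) {
            FormField applicant = new FormField("applicant", null);
            new FormField("firstName", applicant);
            new FormField("lastName", applicant);
            FormField signature = new FormField("signature", null);

            List<FormField> roots = List.of(applicant, signature);
            System.out.println("Root fields: " + roots.size());
            System.out.println("Terminal fields: " + terminalFieldNames(roots));
        }
    }
    ```

    Running `main` reports 2 root fields but three terminal fields: `applicant.firstName`, `applicant.lastName` and `signature`. The two names with periods are exactly the ones a loop over the roots alone never reaches; in PDFBox itself, the `getFieldTree()` loop with the `PDTerminalField` check above plays that role.
    
    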
