Replacing all text in PowerPoint using Apache POI

Lez
September 24, 2018

I looked at the Apache POI documentation and wrote a function that redacts all the text in a PowerPoint presentation. The function replaces text in slides correctly, but not text inside grouped text boxes. Is there a separate object that handles grouped items?

private static void redactText(XMLSlideShow ppt) {
    for (XSLFSlide slide : ppt.getSlides()) {
        System.out.println("REDACT Slide: " + slide.getTitle());

        XSLFTextShape[] shapes = slide.getPlaceholders();

        for (XSLFTextShape textShape : shapes) {

            List<XSLFTextParagraph> textparagraphs = textShape.getTextParagraphs();

            for (XSLFTextParagraph para : textparagraphs) {

                List<XSLFTextRun> textruns = para.getTextRuns();

                for (XSLFTextRun incomingTextRun : textruns) {

                    String text = incomingTextRun.getRawText();

                    System.out.println(text);

                    if (text.toLowerCase().contains("test")) {

                        String newText = text.replaceAll("(?i)" + "test", "XXXXXXXX");

                        incomingTextRun.setText(newText);

                    }
                }
            }

        }
    }
}

Answers

If the goal is simply to reach all text content regardless of which object contains it, you can do exactly that. Text content is held in org.apache.xmlbeans.XmlString elements; in the PowerPoint XML these are the a:t elements, where the namespace prefix a stands for "http://schemas.openxmlformats.org/drawingml/2006/main".

The following code selects every a:t element in every slide and replaces the string "test", matched case-insensitively, with "XXXXXXXX".

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xslf.usermodel.*;
import org.openxmlformats.schemas.presentationml.x2006.main.CTSlide;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlString;

public class ReadPPTXAllText {

 public static void main(String[] args) throws Exception {

  XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("MicrosoftPowerPoint.pptx"));
  for (XSLFSlide slide : slideShow.getSlides()) {
   CTSlide ctSlide = slide.getXmlObject();
   XmlObject[] allText = ctSlide.selectPath(
    "declare namespace a='http://schemas.openxmlformats.org/drawingml/2006/main' " +
    ".//a:t"
   );
   for (int i = 0; i < allText.length; i++) {
    if (allText[i] instanceof XmlString) {
     XmlString xmlString = (XmlString)allText[i];
     String text = xmlString.getStringValue();
     System.out.println(text);
     if (text.toLowerCase().contains("test")) {
      String newText = text.replaceAll("(?i)" + "test", "XXXXXXXX");
      xmlString.setStringValue(newText);
     }
    }
   }
  }

  FileOutputStream out = new FileOutputStream("MicrosoftPowerPointChanged.pptx");
  slideShow.write(out);
  slideShow.close();
  out.close();
 }
}

- developer
- May 13, 2019 at 11:35 am
If you don't like replacing text directly in the XML, you can iterate over all slides and their shapes. If a shape is an XSLFTextShape, get its paragraphs and handle them as you already do.
If you encounter an XSLFGroupShape, iterate over its getShapes() as well. Since a group can contain different types of shapes, including nested groups, recursion is a natural fit. You may also want to handle shapes of type XSLFTable.
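
The recursive approach described above could look roughly like the sketch below. The class name RecursiveRedactor and its helper methods are made up for illustration; the POI calls (getShapes(), getRows(), getCells(), getTextParagraphs(), getTextRuns(), getRawText(), setText()) are the ones from the XSLF user model. It only rewrites a run when the pattern actually matches, and it still works run by run, so text split across runs is not caught.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xslf.usermodel.XSLFGroupShape;
import org.apache.poi.xslf.usermodel.XSLFShape;
import org.apache.poi.xslf.usermodel.XSLFSlide;
import org.apache.poi.xslf.usermodel.XSLFTable;
import org.apache.poi.xslf.usermodel.XSLFTableCell;
import org.apache.poi.xslf.usermodel.XSLFTableRow;
import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
import org.apache.poi.xslf.usermodel.XSLFTextRun;
import org.apache.poi.xslf.usermodel.XSLFTextShape;

// Hypothetical helper class: walks every shape on every slide, descending into groups and tables.
public class RecursiveRedactor {

    private static final Pattern TARGET = Pattern.compile("test", Pattern.CASE_INSENSITIVE);

    public static void redactText(XMLSlideShow ppt) {
        for (XSLFSlide slide : ppt.getSlides()) {
            // getShapes() returns all top-level shapes, not only placeholders
            redactShapes(slide.getShapes());
        }
    }

    static void redactShapes(List<XSLFShape> shapes) {
        for (XSLFShape shape : shapes) {
            if (shape instanceof XSLFGroupShape) {
                // A group may contain text boxes, tables, or further groups: recurse
                redactShapes(((XSLFGroupShape) shape).getShapes());
            } else if (shape instanceof XSLFTable) {
                // Table cells are text shapes themselves
                for (XSLFTableRow row : ((XSLFTable) shape).getRows()) {
                    for (XSLFTableCell cell : row.getCells()) {
                        redactTextShape(cell);
                    }
                }
            } else if (shape instanceof XSLFTextShape) {
                redactTextShape((XSLFTextShape) shape);
            }
            // Other shape types (pictures, connectors, ...) carry no text runs and are skipped
        }
    }

    static void redactTextShape(XSLFTextShape textShape) {
        for (XSLFTextParagraph para : textShape.getTextParagraphs()) {
            for (XSLFTextRun run : para.getTextRuns()) {
                String text = run.getRawText();
                if (text == null) {
                    continue;
                }
                Matcher m = TARGET.matcher(text);
                if (m.find()) {
                    // Only touch runs that actually contain the target
                    run.setText(m.replaceAll("XXXXXXXX"));
                }
            }
        }
    }
}
```

Calling RecursiveRedactor.redactText(ppt) in place of the original redactText(ppt) should then also reach text boxes inside groups and table cells.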

But the real trouble starts when you realize that the text you want to replace is split across several runs 😉
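
One possible way to deal with the split-run problem, sketched here on plain strings so it runs without POI: join the run texts of a paragraph, search the joined text, and then hand the characters back to the runs they came from, putting each replacement into the run where the match started. The class CrossRunReplace and its method names are hypothetical, and mapping this onto POI (reading each XSLFTextRun with getRawText() and writing the result back with setText()) is an assumption, not something taken from the answers above. The sketch also assumes the pattern never matches the empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces matches that may span several runs of one paragraph.
public class CrossRunReplace {

    public static List<String> replaceAcrossRuns(List<String> runs, Pattern target, String replacement) {
        // Join all run texts and remember the offset at which each run starts
        StringBuilder joined = new StringBuilder();
        int[] starts = new int[runs.size()];
        List<StringBuilder> out = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            starts[i] = joined.length();
            joined.append(runs.get(i));
            out.add(new StringBuilder());
        }

        Matcher m = target.matcher(joined);
        int pos = 0; // next offset of the joined text that has not been handled yet
        while (m.find()) {
            copy(joined, pos, m.start(), starts, out);
            // The literal replacement goes into the run in which the match begins,
            // so it inherits that run's formatting
            out.get(runIndexOf(starts, m.start())).append(replacement);
            pos = m.end(); // matched characters are dropped from all runs they covered
        }
        copy(joined, pos, joined.length(), starts, out);

        List<String> result = new ArrayList<>();
        for (StringBuilder sb : out) {
            result.add(sb.toString());
        }
        return result;
    }

    // Copies unmatched characters back into the run each one originally belonged to
    private static void copy(CharSequence joined, int from, int to, int[] starts, List<StringBuilder> out) {
        for (int offset = from; offset < to; offset++) {
            out.get(runIndexOf(starts, offset)).append(joined.charAt(offset));
        }
    }

    // Index of the last run whose start offset is not beyond the given offset
    private static int runIndexOf(int[] starts, int offset) {
        int idx = 0;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= offset) {
                idx = i;
            }
        }
        return idx;
    }
}
```

For example, the runs "This is a te", "st" and " run" would come back as "This is a XXXXXXXX", "" and " run": the run count stays the same, so each result can be written back to the run at the same index, and runs that end up empty can be left in place.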
