I looked at the Apache POI documentation and wrote a function that redacts all the text in a PowerPoint file. The function works well for replacing text in slides, but not for text inside grouped text boxes. Is there a separate object that handles the grouped items?
private static void redactText(XMLSlideShow ppt) {
    for (XSLFSlide slide : ppt.getSlides()) {
        System.out.println("REDACT Slide: " + slide.getTitle());
        XSLFTextShape[] shapes = slide.getPlaceholders();
        for (XSLFTextShape textShape : shapes) {
            List<XSLFTextParagraph> textparagraphs = textShape.getTextParagraphs();
            for (XSLFTextParagraph para : textparagraphs) {
                List<XSLFTextRun> textruns = para.getTextRuns();
                for (XSLFTextRun incomingTextRun : textruns) {
                    String text = incomingTextRun.getRawText();
                    System.out.println(text);
                    if (text.toLowerCase().contains("test")) {
                        String newText = text.replaceAll("(?i)" + "test", "XXXXXXXX");
                        incomingTextRun.setText(newText);
                    }
                }
            }
        }
    }
}
2 Answers
If the need is simply to get all text content regardless of which object contains it, one can do exactly that. Text content is stored in org.apache.xmlbeans.XmlString elements. In PowerPoint XML these are the a:t tags, in the namespace a="http://schemas.openxmlformats.org/drawingml/2006/main". So selecting all a:t elements gets all text in all objects in all slides, and the case-insensitive string "test" can then be replaced with "XXXXXXXX".
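To illustrate the direct-XML idea, here is a minimal sketch. It is not the answer's original code: instead of XmlBeans it uses the JDK's built-in DOM parser on a raw slide XML string, so it runs without POI on the classpath. The class name SlideXmlRedactor, the method redact, and the heavily trimmed sample fragment are all made up for this example; in a real presentation the XML would come from the slide object rather than from a string literal.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache POI.
class SlideXmlRedactor {
    // DrawingML main namespace, as quoted in the answer.
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Replaces "test" (case-insensitive) in every a:t element of the given XML. */
    static String redact(String slideXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(slideXml)));

            // Matches a:t at any depth, so text nested in groups is included too.
            NodeList textNodes = doc.getElementsByTagNameNS(A_NS, "t");
            for (int i = 0; i < textNodes.getLength(); i++) {
                Node t = textNodes.item(i);
                t.setTextContent(t.getTextContent().replaceAll("(?i)test", "XXXXXXXX"));
            }

            // Serialize the modified DOM back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process slide XML", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed, illustrative fragment: one text box nested inside a group shape.
        String xml = "<p:sld xmlns:p=\"http://schemas.openxmlformats.org/presentationml/2006/main\""
                + " xmlns:a=\"" + A_NS + "\">"
                + "<p:grpSp><p:sp><p:txBody><a:p><a:r><a:t>A Test here</a:t></a:r></a:p>"
                + "</p:txBody></p:sp></p:grpSp></p:sld>";
        System.out.println(redact(xml));
    }
}
```

Because the lookup is by element name and namespace rather than by shape type, the nesting level of the text box does not matter, which is why this route also reaches grouped items.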
If one doesn’t like the approach of replacing via XML directly, it is possible to iterate over all slides and their shapes. If a shape is an XSLFTextShape, get the paragraphs and handle them as you did. If you encounter an XSLFGroupShape, iterate over its getShapes() as well. Since a group can contain different types of shapes, you might use recursion for that. You might also handle the shape type XSLFTable.
But the real trouble starts when you realize that something you want to replace is divided into several runs 😉
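The recursive traversal described in this answer could look roughly like the sketch below. To keep it runnable without POI, it uses small stand-in classes (TextBox, Group, Table) that only mirror the roles of XSLFTextShape, XSLFGroupShape and XSLFTable; these stand-ins and the class name GroupedShapeRedactor are assumptions made for illustration, not POI types. Replacement is done per run, so it deliberately shows the limitation mentioned above: a match split across two runs is not found.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the shape tree; the nested types are NOT POI classes.
class GroupedShapeRedactor {
    interface Shape {}

    /** Plays the role of XSLFTextShape: holds the text of its runs. */
    static class TextBox implements Shape {
        final List<String> runs;
        TextBox(String... runs) { this.runs = new ArrayList<>(Arrays.asList(runs)); }
    }

    /** Plays the role of XSLFGroupShape: contains further shapes of any type. */
    static class Group implements Shape {
        final List<Shape> shapes;
        Group(Shape... shapes) { this.shapes = Arrays.asList(shapes); }
    }

    /** Plays the role of XSLFTable: rows of cells, each cell carrying text. */
    static class Table implements Shape {
        final List<List<TextBox>> rows;
        Table(List<List<TextBox>> rows) { this.rows = rows; }
    }

    /** Walks the shapes and descends into groups recursively. */
    static void redact(List<? extends Shape> shapes) {
        for (Shape shape : shapes) {
            if (shape instanceof TextBox) {
                redactRuns((TextBox) shape);
            } else if (shape instanceof Group) {
                redact(((Group) shape).shapes); // groups may nest to any depth
            } else if (shape instanceof Table) {
                for (List<TextBox> row : ((Table) shape).rows) {
                    for (TextBox cell : row) {
                        redactRuns(cell);
                    }
                }
            }
        }
    }

    /** Replaces "test" (case-insensitive) run by run, as in the question's code. */
    static void redactRuns(TextBox box) {
        for (int i = 0; i < box.runs.size(); i++) {
            box.runs.set(i, box.runs.get(i).replaceAll("(?i)test", "XXXXXXXX"));
        }
    }

    public static void main(String[] args) {
        TextBox nested = new TextBox("a Test in a group");
        TextBox split = new TextBox("te", "st"); // match divided into two runs
        redact(Arrays.asList(new Group(new Group(nested), split)));
        System.out.println(nested.runs);
        System.out.println(split.runs);
    }
}
```

With POI itself, the same structure would start from the slide's shape list and recurse via the group's getShapes(), with the instanceof checks done against the real XSLF classes.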