'java.lang.OutOfMemoryError' when using a Base64 encoded, embedded JPEG image #51

skjardenCode · 2016-11-19T16:36:51Z

Hello,

I recently experienced an OutOfMemory error while using the OpenHtmlToPDF framework. Our requirements are rather normal, that is, generating a PDF file out of a simple HTML file which contains only basic CSS 2.0 and XHTML - mainly tables, text and up to three images.

We ran a stress test because the framework should be integrated in our server component, which needs to convert HTML to PDF for our clients. I used a fairly simple for-loop to iterate over HTML files and for each HTML-content, we used OpenHtmlToPDF to generate a PDF file. After about 6000 iterations, the test stopped and a "java.lang.OutOfMemoryError" was shown.

I then took apart all the components, stripped away code step by step to reproduce the OutOfMemoryError with minimal test-code and the result was this simple test case:

@Test
public void test_stressPdfRendererBuilder() throws Exception
{
    int count = 10000;

    String html = FileUtils.readFileToString( new File( "html-with-embedded-jpg.html" ), Charsets.UTF_8 );

    for ( int i = 0; i < count; i++ )
    {
        System.err.println( "i: " + i );

        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

        PdfRendererBuilder builder = new PdfRendererBuilder();
        builder.withHtmlContent( html, null );
        builder.toStream( byteArrayOutputStream );

        builder.run();
    }
}

The file html-with-embedded-jpg.html is a simple HTML file with an img tag containing an embedded JPEG image (Base64 encoded). You can display that HTML file, including the image, in any browser.

Running the above test, one can see in the Windows Task Manager, how the occupied memory grows rapidly (interestingly, the Java heap space is doing "ok"). In iteration 5000, it was at ~ 1,6 GB.

After about iteration 6000, the "java.lang.OutOfMemoryError" occurs, with the following stack trace:

java.lang.OutOfMemoryError: Initializing Reader
    at com.sun.imageio.plugins.jpeg.JPEGImageReader.initJPEGImageReader(Native Method)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader.<init>(Unknown Source)
    at com.sun.imageio.plugins.jpeg.JPEGImageReaderSpi.createReaderInstance(Unknown Source)
    at javax.imageio.spi.ImageReaderSpi.createReaderInstance(Unknown Source)
    at javax.imageio.ImageIO$ImageReaderIterator.next(Unknown Source)
    at javax.imageio.ImageIO$ImageReaderIterator.next(Unknown Source)
    at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.readJPEG(JPEGFactory.java:103)
    at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:78)
    at com.openhtmltopdf.pdfboxout.PdfBoxOutputDevice.realizeImage(PdfBoxOutputDevice.java:688)
    at com.openhtmltopdf.pdfboxout.PdfBoxUserAgent.getImageResource(PdfBoxUserAgent.java:81)
    at com.openhtmltopdf.pdfboxout.PdfBoxReplacedElementFactory.createReplacedElement(PdfBoxReplacedElementFactory.java:58)
    at com.openhtmltopdf.render.BlockBox.calcMinMaxWidth(BlockBox.java:1524)
    at com.openhtmltopdf.render.BlockBox.calcMinMaxWidthInlineChildren(BlockBox.java:1684)
    at com.openhtmltopdf.render.BlockBox.calcMinMaxWidth(BlockBox.java:1567)
    at com.openhtmltopdf.newtable.TableBox$AutoTableLayout.recalcColumn(TableBox.java:1240)
    at com.openhtmltopdf.newtable.TableBox$AutoTableLayout.fullRecalc(TableBox.java:1214)
    at com.openhtmltopdf.newtable.TableBox$AutoTableLayout.calcMinMaxWidth(TableBox.java:1509)
    at com.openhtmltopdf.newtable.TableBox.calcMinMaxWidth(TableBox.java:158)
    at com.openhtmltopdf.newtable.TableBox.layout(TableBox.java:221)
    at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
    at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
    at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
    at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:990)
    at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:870)
    at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:799)
    at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
    at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
    at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
    at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:990)
    at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:870)
    at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:799)
    at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild0(BlockBoxing.java:321)
    at com.openhtmltopdf.layout.BlockBoxing.layoutBlockChild(BlockBoxing.java:299)
    at com.openhtmltopdf.layout.BlockBoxing.layoutContent(BlockBoxing.java:90)
    at com.openhtmltopdf.render.BlockBox.layoutChildren(BlockBox.java:990)
    at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:870)
    at com.openhtmltopdf.render.BlockBox.layout(BlockBox.java:799)
    at com.openhtmltopdf.pdfboxout.PdfBoxRenderer.layout(PdfBoxRenderer.java:431)
    at com.openhtmltopdf.pdfboxout.PdfRendererBuilder.run(PdfRendererBuilder.java:54)
    ...
    ...

I dug into the code and ended up in PdfBoxOutputDevice.realizeImage(PdfBoxImage) where I found the following lines:

if (img.isJpeg()) {
    xobject = JPEGFactory.createFromStream(_writer,
            new ByteArrayInputStream(img.getBytes()));
} else {
    BufferedImage buffered = ImageIO.read(new ByteArrayInputStream(
            img.getBytes()));

    xobject = LosslessFactory.createFromImage(_writer, buffered);
}

So there is a condition: JPEGFactory.createFromStream is used if the image is a JPEG, otherwise ImageIO.read is used.

So I changed my test HTML file to embed a PNG instead of a JPEG image, and the OutOfMemory error was gone. The Java heap is doing fine, and the Windows Task Manager shows only ~ 80 MB memory usage for the Java process no matter how many iterations I run.

Doing a simple search I came across this:

http://stackoverflow.com/questions/11052091/pdfbox-out-of-memory-when-adding-image

Maybe there is a problem in PDFBox or in the way the PDFBox API is used to integrate a JPEG image into a PDDocument, I'm not sure.

So, the workaround for me is to not use the JPEG image format when embedding an image into the HTML code, but instead using PNG.

I wanted to post this issue here first. I'm sure you know how to debug the code better than me, but I hope I could help a bit with the above information.

Hope to hear from you and that there is an easy fix for it. Or maybe this turns out to be a bug in PDFBox in the end.

Thanks a lot!

danfickle · 2016-11-20T03:21:24Z

Hi @skjardenCode
Thanks for the detailed work up. It does seem it may be a pdfbox bug, but I'll try to reproduce it with raw pdfbox code to make sure.

The pdfbox code in question is at the link below. The only thing I can see is that all the readers returned by the iterator may not have dispose called on them. It is also surprising that they always decompress the entire image even though I believe they just need metadata.

https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/JPEGFactory.java
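
To illustrate the two points above (disposing the reader, and needing only metadata), here is a minimal sketch using only the standard javax.imageio API. It is not the PDFBox or OpenHtmlToPDF code; the class name JpegDimensions and the helper sampleJpeg are hypothetical and exist only for this example. It reads width and height from the image header without decoding the pixels, and disposes the reader in a finally block.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical example class, not part of PDFBox or OpenHtmlToPDF.
public class JpegDimensions {

    /** Returns {width, height} taken from the image header; no pixel data is decoded. */
    public static int[] readDimensions(byte[] imageBytes) throws IOException {
        ImageInputStream iis = ImageIO.createImageInputStream(new ByteArrayInputStream(imageBytes));
        if (iis == null) {
            throw new IOException("No ImageInputStream available");
        }
        ImageReader reader = null;
        try {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No ImageReader found for this data");
            }
            reader = readers.next();
            // seekForwardOnly = true, ignoreMetadata = true: we only want the dimensions.
            reader.setInput(iis, true, true);
            return new int[] { reader.getWidth(0), reader.getHeight(0) };
        } finally {
            // The JPEG reader holds native resources; release them explicitly.
            if (reader != null) {
                reader.dispose();
            }
            iis.close();
        }
    }

    /** Hypothetical helper: builds a small in-memory JPEG so the example is self-contained. */
    public static byte[] sampleJpeg(int width, int height) throws IOException {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "jpg", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int[] dims = readDimensions(sampleJpeg(4, 3));
        System.out.println(dims[0] + "x" + dims[1]);
    }
}
```

The sketch deliberately avoids try-with-resources so that it also compiles on older Java versions.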

I'll comment again here when I have debugged further.

danfickle · 2016-11-20T04:49:46Z

Unfortunately, I can't replicate this on Mac (even with -Xmx30m). Possibly a Windows-specific issue? If you get a moment, could you please run the following code? If it crashes too, it will tell us definitively that the bug is in PDFBox or the JRE.

import java.io.ByteArrayInputStream;
import java.util.Base64;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;


public class TestUsage {
    public static void main(String...args) throws Exception {
        String jpeg = 
                "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIs" +
                "IxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy" + 
                "MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAABAAEDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAA" + 
                "AAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAk" + 
                "M2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKT" + 
                "lJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QA" +
                "HwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdh" + 
                "cRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hp" + 
                "anN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk" + 
                "5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigD//2Q==";

        byte[] jpegBytes = Base64.getDecoder().decode(jpeg);

        for (int i = 0; i < 10000000; i++) {
            PDDocument doc = new PDDocument();

            try {
                JPEGFactory.createFromStream(doc, new ByteArrayInputStream(jpegBytes));
            } finally {
                doc.close();
            }
        }
    }
}

Thanks,
Daniel.

skjardenCode · 2016-11-20T10:01:32Z

Hey Daniel,

thanks a lot for your reply. I'm currently on a trip, I'll test your code as soon as I get back home in a few hours.

The only thing I can see is that all the readers returned by the iterator may not have dispose called on them.

Are you talking about the method private static BufferedImage readJPEG(InputStream stream) throws IOException? At the end of it, there is a

// ....
finally
{
    if (iis != null)
    {
        iis.close();
    }
    reader.dispose();
}

closing the used reader. Or do you see another location where a reader does not get closed properly?

~ Timo

skjardenCode · 2016-11-21T22:08:12Z

Hey Daniel,

I'm sorry for the late answer, had a lot of work to be finished first.

I did the following 3 things, with the given results:

1. Running your example code
   --> No OutOfMemory error; the Windows Task Manager shows normal and stable memory usage even with thousands of iterations.
2. Next, I changed your JPEG data to the one I used inside my test HTML file, which is bigger than the one in your example
   --> Again no OutOfMemory error; the Windows Task Manager shows normal and stable memory usage even with thousands of iterations.
3. Third, I wrapped the JPEG data in very simple HTML and extended the test to use PdfRendererBuilder / builder.run(); again instead of just JPEGFactory.createFromStream(...).
   --> There it is again: an OutOfMemory error after a few thousand iterations, and the Windows Task Manager shows memory usage for javaw.exe of over 1,5 GB after just a few seconds:

Test-code:

import java.io.ByteArrayOutputStream;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class OpenHtmlToPdfOutOfMemoryTest2
{
    public static void main( String... args ) throws Exception
    {
        String jpeg =
                    "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIs" +
                    "IxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy" + 
                    "MjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAABAAEDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAA" + 
                    "AAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAk" + 
                    "M2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKT" + 
                    "lJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QA" +
                    "HwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdh" + 
                    "cRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hp" + 
                    "anN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk" + 
                    "5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3+iiigD//2Q==";
        
        String html = 
                "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"de\" lang=\"de\">" +
                    "<body>" +
                        "<img src=\"data:image/jpeg;base64," + jpeg + "\" />" +
                    "</body>" +
                "</html>";

        for ( int i = 0; i < 10000000; i++ )
        {
            System.out.println( i );
            
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.withHtmlContent( html, null );
            builder.toStream( new ByteArrayOutputStream() );
            
            builder.run();
        }
    }
}

(The code above still contains your JPEG data, because mine is too large to post here.)

It seems that this is a system/OS-dependent problem, maybe with ImageIO native code of some sort. Unfortunately, I'm no expert on how the JVM manages memory. The heap during the test is OK, but the used memory shown in the Windows Task Manager is just "exploding". I know that those memory values are "virtual memory usage" and not directly related to JVM heap usage. But the memory usage accumulation and the OutOfMemory error in the end clearly indicate a problem.
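
One way to back up the "heap is OK, process size explodes" observation is to log the JVM's own memory figures from inside the loop and compare them with what the operating system reports. The following is only a sketch using the standard java.lang.management API; the class name HeapLogger and its methods are hypothetical. Memory allocated directly by native code (for example inside a native image decoder) appears in neither of the two figures it prints, which is why a stable heap can coexist with a growing process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper for logging JVM-managed memory during a stress test.
public class HeapLogger {

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Bytes currently used on the Java heap. */
    public static long heapUsedBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Bytes used by JVM-managed non-heap pools (not memory malloc'ed by native libraries). */
    public static long nonHeapUsedBytes() {
        return MEMORY.getNonHeapMemoryUsage().getUsed();
    }

    /** One log line for the given loop iteration. */
    public static String describe(int iteration) {
        return String.format("i=%d heap=%d KB nonHeap=%d KB",
                iteration, heapUsedBytes() / 1024, nonHeapUsedBytes() / 1024);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // In a real stress test, the PDF conversion would run here.
            System.out.println(describe(i));
        }
    }
}
```

If these numbers stay flat while the Task Manager figure keeps climbing, the growth is most likely outside the JVM-managed pools, which points towards native resources that are not being released.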

As a side note, maybe important: I'm still using Java 6 ("1.6.0_45", SUN JDK) due to project restrictions at the moment.

Please let me know if I can be of any help, running some more tests etc.

danfickle · 2016-11-22T11:09:50Z

Embarrassingly, it turns out I wasn't calling dispose! I've added it and done a release 0.0.1-RC8 so you could try your stress test again.

MartyMcMartface · 2016-11-22T12:48:28Z

Sorry to butt into this thread but shouldn't reader.dispose() be inside a finally clause?

skjardenCode · 2016-11-22T20:05:41Z

Sorry to butt into this thread but shouldn't reader.dispose() be inside a finally clause?

Yes, I think this should be the case. Just declare reader outside of the try..catch-block and use the existing finally block where you already close the stream.
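
A minimal sketch of that suggestion, written against the plain javax.imageio API rather than the actual PdfBoxImage source: the reader is declared before the try block so that the finally block can dispose it even when decoding throws. The class name DisposeInFinally and the helper sampleJpeg are hypothetical and only serve to make the example self-contained.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical example class, not the OpenHtmlToPDF implementation.
public class DisposeInFinally {

    /** Decodes the first image in the given bytes and always releases the reader. */
    public static BufferedImage decode(byte[] imageBytes) throws IOException {
        ImageInputStream iis = null;
        ImageReader reader = null; // declared outside try so finally can see it
        try {
            iis = ImageIO.createImageInputStream(new ByteArrayInputStream(imageBytes));
            if (iis == null) {
                throw new IOException("No ImageInputStream available");
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No ImageReader found for this data");
            }
            reader = readers.next();
            reader.setInput(iis);
            return reader.read(0);
        } finally {
            // Runs on success and on failure, so native resources never leak.
            if (reader != null) {
                reader.dispose();
            }
            if (iis != null) {
                iis.close();
            }
        }
    }

    /** Hypothetical helper: builds a small in-memory JPEG for the demo. */
    public static byte[] sampleJpeg(int width, int height) throws IOException {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "jpg", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] jpeg = sampleJpeg(4, 3);
        // Repeated decoding, similar in spirit to the stress test in this thread.
        for (int i = 0; i < 1000; i++) {
            decode(jpeg);
        }
        System.out.println("done");
    }
}
```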

I've added it and done a release 0.0.1-RC8 so you could try your stress test again.

Thanks a lot, I'll try it tomorrow and report back.

skjardenCode · 2016-11-25T17:58:23Z

Hey Daniel,

I've added it and done a release 0.0.1-RC8 so you could try your stress test again.

I tested it again and it works - no more memory leak, I can do thousands of iterations, the heap and Windows Task Manager both stay below ~ 60 MB memory usage.

I've also done a re-check by commenting out the line reader.dispose(); in PdfBoxImage and the OutOfMemory error occurs again.

As @MartyMcMartface mentioned, you should do the disposal inside of a finally-block and everything should be fine.

Thanks for your support!

danfickle mentioned this issue Nov 20, 2016

Risks if generating PDF from user supplied HTML? #50

Open

danfickle added a commit that referenced this issue Nov 22, 2016

For #51 - Make sure we are calling dispose on the ImageReader once finished with. [ci skip]

9d543fd

danfickle closed this as completed in c82edd1 Dec 3, 2016
