PDFBox Extracting Image
In this section, we will learn how to extract an image from an existing PDF document. The PDFBox library provides the PDFRenderer class, which renders a page of a PDF document into an AWT BufferedImage.
Follow the steps below to extract an image from an existing PDF document:
Load Existing PDF Document
We can load an existing PDF document by using the static load() method of the PDDocument class. This method accepts a File object as a parameter. Because the method is static, we invoke it on the class name PDDocument rather than on an instance.
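As a minimal sketch of this step, assuming PDFBox 2.x is on the classpath (in PDFBox 3.x, loading moved to Loader.loadPDF()). The class LoadPdfSketch and its open() helper are illustrative names, not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

class LoadPdfSketch {
    // Opens the PDF at the given path. The caller is responsible for
    // closing the returned document when finished with it.
    static PDDocument open(String path) throws IOException {
        File file = new File(path);
        // load() is static, so it is called on the class name PDDocument.
        return PDDocument.load(file);
    }
}
```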
Instantiate the PDFRenderer Class
The PDFRenderer class renders a PDF document into an AWT BufferedImage. Its constructor takes the loaded document object as a parameter. This can be shown in the following code.
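A short sketch of the instantiation, assuming a PDDocument has already been loaded. The wrapper class RendererSketch and the method rendererFor() are hypothetical names used only for illustration.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

class RendererSketch {
    // Creates a renderer bound to an already loaded document.
    // The renderer reads pages from this document when asked to render.
    static PDFRenderer rendererFor(PDDocument document) {
        return new PDFRenderer(document);
    }
}
```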
Render Image
The renderImage() method of the PDFRenderer class renders a particular page as an image. We need to pass the zero-based index of the page that is to be rendered.
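The rendering call might look like the sketch below, which renders the first page (index 0). RenderPageSketch and renderFirstPage() are illustrative names, and the document is assumed to contain at least one page.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

class RenderPageSketch {
    // Renders the first page of the document into a BufferedImage.
    static BufferedImage renderFirstPage(PDDocument document) throws IOException {
        PDFRenderer renderer = new PDFRenderer(document);
        // Page indexes start at 0, so 0 is the first page.
        return renderer.renderImage(0);
    }
}
```

If a higher resolution is needed, PDFRenderer also offers renderImageWithDPI(pageIndex, dpi), which takes the target resolution as a second argument.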
Writing the Image to a File
We can write the rendered image to a file using the write() method of the ImageIO class. This method takes three parameters:
- The rendered image object.
- String representing the type of the image (jpg or png).
- File object to which we need to save the extracted image.
This can be shown in the following code:
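A sketch of the write step using only the JDK's javax.imageio.ImageIO. The helper class WriteImageSketch and its save() method are illustrative. ImageIO.write() returns false when no writer is available for the requested format, so the sketch turns that case into an exception instead of failing silently.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class WriteImageSketch {
    // Writes the image to the target file in the given format ("jpg" or "png").
    static void save(BufferedImage image, String format, File target) throws IOException {
        boolean written = ImageIO.write(image, format, target);
        if (!written) {
            throw new IOException("No ImageIO writer found for format: " + format);
        }
    }
}
```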
Close Document
After completing the task, we need to close the PDDocument object by using the close() method.
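One way to make sure close() is always called, even when an exception occurs, is a try-with-resources block, which works because PDDocument implements Closeable. This is a sketch assuming PDFBox 2.x; CloseSketch and pageCount() are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

class CloseSketch {
    // Loads the document, reads its page count, and closes it again.
    static int pageCount(File file) throws IOException {
        try (PDDocument document = PDDocument.load(file)) {
            return document.getNumberOfPages();
        } // document.close() is invoked automatically when the block exits
    }
}
```

Calling document.close() explicitly at the end of the program has the same effect when no exception is thrown.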
Example
This is the PDF document whose page we are going to extract as an image by using the PDFBox library in a Java program.
Java Program
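The steps above can be combined into one program along the following lines. This is a sketch under stated assumptions, not the original listing: it assumes PDFBox 2.x, and the class name PdfPageToImage, the method convertFirstPage(), and the default file names sample.pdf and page-0.jpg are placeholders to be replaced with real paths.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

class PdfPageToImage {
    public static void main(String[] args) throws IOException {
        // Placeholder paths; pass real ones as command-line arguments.
        File input = new File(args.length > 0 ? args[0] : "sample.pdf");
        File output = new File(args.length > 1 ? args[1] : "page-0.jpg");
        convertFirstPage(input, output);
        System.out.println("Image created: " + output.getPath());
    }

    // Renders the first page of the PDF and saves it as a JPEG file.
    static void convertFirstPage(File pdf, File imageFile) throws IOException {
        // Step 1: load the document; try-with-resources closes it at the end.
        try (PDDocument document = PDDocument.load(pdf)) {
            // Step 2: instantiate the renderer for this document.
            PDFRenderer renderer = new PDFRenderer(document);
            // Step 3: render the page at index 0 (the first page).
            BufferedImage image = renderer.renderImage(0);
            // Step 4: write the image; write() returns false if no writer exists.
            if (!ImageIO.write(image, "jpg", imageFile)) {
                throw new IOException("Could not write image as jpg");
            }
        }
    }
}
```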
Output:
After successful execution, the above program shows the following output.
Now, for verification, open the saved image file.