
PDFBox Reading Text

by Online Tutorials Library


One of the main features of the PDFBox library is its ability to extract text from an existing PDF document. In this section, we will learn how to read text from an existing PDF document with PDFBox in a Java program. A PDF document may contain text, images, and other content; here we are interested only in its text. We can extract that text by using the getText() method of the PDFTextStripper class.

Follow the steps below to read text from an existing PDF document:

Load PDF Document

We can load an existing PDF document by using the static load() method of the PDDocument class. This method accepts a File object as a parameter. Because the method is static, we invoke it on the class name, PDDocument, rather than on an instance.
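As a minimal sketch of this step, the class below loads a file and prints its page count as a quick check that loading worked. It assumes PDFBox 2.x on the classpath, where PDDocument.load(File) is available (in PDFBox 3.x loading moved to Loader.loadPDF). The class name LoadDocumentSketch, the helper countPages, and the file name sample.pdf are hypothetical and used only for illustration.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

public class LoadDocumentSketch {

    // Loads the PDF and returns its page count as a quick check that loading worked.
    static int countPages(File pdfFile) throws IOException {
        // load(File) is static, so it is called on the class name (PDFBox 2.x API).
        PDDocument document = PDDocument.load(pdfFile);
        try {
            return document.getNumberOfPages();
        } finally {
            // Always release the document, even if reading the page count fails.
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass a real path as the first argument.
        File pdfFile = new File(args.length > 0 ? args[0] : "sample.pdf");
        System.out.println("Pages: " + countPages(pdfFile));
    }
}
```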

Instantiate PDFTextStripper class

The PDFTextStripper class is used to retrieve text from a PDF document. We can instantiate it with its no-argument constructor.
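A sketch of the instantiation, again assuming PDFBox 2.x (where the class lives in org.apache.pdfbox.text). The settings shown after the constructor call are optional extras that the steps above do not require; the class name StripperSketch and the helper newStripper are hypothetical.

```java
import java.io.IOException;
import org.apache.pdfbox.text.PDFTextStripper;

public class StripperSketch {

    // Creates a stripper and applies a few optional settings.
    static PDFTextStripper newStripper() throws IOException {
        // The no-argument constructor is all the basic steps need.
        PDFTextStripper stripper = new PDFTextStripper();
        // Optional: order text by its position on the page rather than by
        // the order in which it appears in the content stream.
        stripper.setSortByPosition(true);
        // Optional: restrict extraction to the first page (pages are 1-based).
        stripper.setStartPage(1);
        stripper.setEndPage(1);
        return stripper;
    }

    public static void main(String[] args) throws IOException {
        PDFTextStripper stripper = newStripper();
        System.out.println("Sort by position: " + stripper.getSortByPosition());
    }
}
```

Leaving out the optional calls gives a stripper that reads every page in content-stream order.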

Retrieve Text

The getText() method reads the text content of the PDF document. We pass the PDDocument object to it as a parameter, and it returns the extracted text as a String.
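The call can be sketched as follows, assuming PDFBox 2.x and a document that has already been loaded. The class name GetTextSketch, the helper textOf, and the file name sample.pdf are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class GetTextSketch {

    // Returns the text of an already loaded document as a single String.
    static String textOf(PDDocument document) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        // getText takes the document object and returns its text content.
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass a real path as the first argument.
        File pdfFile = new File(args.length > 0 ? args[0] : "sample.pdf");
        PDDocument document = PDDocument.load(pdfFile);
        try {
            System.out.println(textOf(document));
        } finally {
            document.close();
        }
    }
}
```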

Close Document

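After completing the task, we need to close the PDDocument object by using the close() method, which releases the resources held by the document. Because PDDocument implements Closeable, a try-with-resources statement can also close it automatically.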

Example:

Suppose we have a PDF document whose text content we want to extract by using the PDFBox library in a Java program.


Java Program:
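The program below is a sketch that puts the four steps together; it is not the original listing. It assumes PDFBox 2.x on the classpath and takes the path of the PDF as a command-line argument. The class name ReadPdfText and the helper extractText are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadPdfText {

    // Runs the four steps: load, create the stripper, get the text, close.
    static String extractText(File pdfFile) throws IOException {
        // try-with-resources calls document.close() when the block exits.
        try (PDDocument document = PDDocument.load(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ReadPdfText <file.pdf>");
            return;
        }
        // Print the extracted text to the console.
        System.out.println(extractText(new File(args[0])));
    }
}
```

Keeping the extraction in its own method means the document is closed before the text is printed, and the same method can be reused from other code.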

Output:

After successful execution, the program retrieves the text from the PDF document and prints it to the console.

