
Tika Parsing Document to Plain Text

by Online Tutorials Library


Tika can return extracted content in several formats, such as plain text, HTML, or XHTML. Parsers deliver the document as a stream of XHTML SAX events to a ContentHandler (the org.xml.sax.ContentHandler interface), and the handler you pass in decides what you get back. If you only want the text of the document's body as plain text, use BodyContentHandler.
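The body-only idea can be illustrated without Tika at all. The sketch below uses only the JDK's SAX parser and a hypothetical handler, BodyTextCollector, that records character data only while the parser is inside <body>. It is an illustration of the concept behind BodyContentHandler, not Tika's actual implementation, and because it uses a plain XML parser it assumes well-formed XHTML input such as the index.html file in this tutorial.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BodyTextDemo {

    // Hypothetical handler: keeps character data only inside <body>.
    // Mimics the idea behind Tika's BodyContentHandler; it is not Tika's code.
    static class BodyTextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int bodyDepth = 0; // greater than 0 while inside <body>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Start counting at <body> and track any elements nested inside it.
            if (bodyDepth > 0 || "body".equalsIgnoreCase(qName)) {
                bodyDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (bodyDepth > 0) {
                bodyDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text in <head> (e.g. <title>) arrives while bodyDepth == 0 and is dropped.
            if (bodyDepth > 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Parses well-formed XHTML and returns only the text found inside <body>.
    static String extractBody(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BodyTextCollector handler = new BodyTextCollector();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><title>Index Page</title></head>"
                + "<body><h2>Hello, Welcome to Tutor Aspire. </h2></body></html>";
        System.out.println(extractBody(html).trim());
    }
}
```

The title text reaches the handler like any other characters but is discarded because it arrives outside <body>; Tika's BodyContentHandler produces the same effect on the XHTML events its parsers emit.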

Let's look at an example that extracts plain text from an HTML file.

Tika Parsing to Plain Text Example
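A minimal sketch of such a program is shown here. It assumes Tika 2.x with the tika-core and tika-parsers-standard-package artifacts on the classpath (tika-parsers for Tika 1.x), and that index.html sits in the working directory; the class name TikaPlainTextExample and the helper extractBodyText are illustrative, not part of Tika. AutoDetectParser, BodyContentHandler, Metadata and ParseContext are Tika's own classes.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaPlainTextExample {

    // Illustrative helper: detects the document type, parses it,
    // and returns only the body content as plain text.
    static String extractBodyText(InputStream stream) throws Exception {
        Parser parser = new AutoDetectParser();                  // chooses a parser from the detected media type
        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        Metadata metadata = new Metadata();                      // collects document metadata such as the page title
        parser.parse(stream, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: index.html is in the working directory unless a path is given as args[0].
        Path path = Paths.get(args.length > 0 ? args[0] : "index.html");
        try (InputStream stream = Files.newInputStream(path)) {
            System.out.println(extractBodyText(stream).trim());
        }
    }
}
```

The no-argument BodyContentHandler constructor caps output at 100,000 characters and parsing fails with an exception once that limit is reached, so passing -1 is a common choice for larger documents.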

Following is our HTML file.

// index.html

<html>
  <head>
    <title>Index Page</title>
  </head>
  <body>
    <h2>Hello, Welcome to Tutor Aspire.</h2>
  </body>
</html>

After extraction, the output is the plain text of the body only; the title inside <head> does not appear.

Hello, Welcome to Tutor Aspire.  
