Tika Facade
In Tika, documents can be parsed either with the Tika facade or with the Auto-Detect Parser. Both parse a document without requiring you to choose a specific parser for its format.
Apache Tika provides a facade class, org.apache.tika.Tika, that gives simple access to Tika's functionality. It offers methods for parsing, media type detection, and translation.
The class belongs to the org.apache.tika package. Its constructors and methods are listed below.
Tika Constructors
The Tika facade class provides the following constructors.
Constructor | Description |
---|---|
Tika() | Creates a Tika facade with the default configuration. |
Tika(Detector detector) | Creates a Tika facade with the given detector. |
Tika(Detector detector, Parser parser) | Creates a Tika facade with the given detector and parser. |
Tika(Detector detector, Parser parser, Translator translator) | Creates a Tika facade with the given detector, parser, and translator. |
Tika(TikaConfig config) | Creates a Tika facade with the given configuration. |
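The sketch below builds one facade with each of the main constructors (the translator variant is left out). It assumes tika-core is on the classpath; the class name TikaConstructorsExample and the helper buildAll() are hypothetical, chosen only for illustration. Name-based detection is used as a quick check because it needs nothing beyond tika-core.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.tika.Tika;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.detect.Detector;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Hypothetical example class: one facade per constructor style.
public class TikaConstructorsExample {

    static List<Tika> buildAll() {
        // 1. Default configuration
        Tika defaultTika = new Tika();

        // 2. Explicit detector; the facade creates a default parser for it
        Detector detector = new DefaultDetector();
        Tika withDetector = new Tika(detector);

        // 3. Explicit detector and explicit parser
        Parser parser = new AutoDetectParser();
        Tika withDetectorAndParser = new Tika(detector, parser);

        // 4. Detector and parser taken from a TikaConfig (here the default one)
        Tika fromConfig = new Tika(TikaConfig.getDefaultConfig());

        return Arrays.asList(defaultTika, withDetector, withDetectorAndParser, fromConfig);
    }

    public static void main(String[] args) {
        for (Tika tika : buildAll()) {
            // Detection from the file name only; no content is read
            System.out.println(tika.detect("notes.txt"));
        }
    }
}
```

In practice the no-argument constructor is usually enough; the other constructors matter when you want to plug in a custom detector, parser, or configuration file.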
Tika Methods
The Tika facade class provides the following methods.
Method | Description |
---|---|
public String detect(byte[] prefix) | Detects the media type of a document from its first bytes (the given prefix). |
public String detect(Path path) throws IOException | Detects the media type of the file at the given path. |
public String detect(File file) throws IOException | Detects the media type of the given file. |
public String detect(URL url) throws IOException | Detects the media type of the resource at the given URL. |
public String detect(String name) | Detects the media type of a document from its file name alone, without reading any content. |
public String translate(String text, String sourceLanguage, String targetLanguage) | Translates the given text from the source language to the target language. |
public String translate(String text, String targetLanguage) | Translates the given text to the target language, auto-detecting the source language. |
public Reader parse(InputStream stream, Metadata metadata) throws IOException | Parses the given document and returns a Reader over the extracted text; document metadata is written into the given Metadata object. |
public Reader parse(InputStream stream) throws IOException | Parses the given document and returns a Reader over the extracted text. |
public Reader parse(Path path, Metadata metadata) throws IOException | Parses the file at the given path and returns a Reader over the extracted text; document metadata is written into the given Metadata object. |
public String parseToString(InputStream stream, Metadata metadata) throws IOException, TikaException | Parses the given document and returns the extracted text as a String, truncated to the maximum string length. |
public int getMaxStringLength() | Returns the maximum length of the strings returned by the parseToString methods. |
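As a sketch of the two detect variants that need no file on disk, the example below detects media types from a file name and from the leading bytes of a PDF document. It assumes only tika-core on the classpath; the class name TikaDetectExample and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import org.apache.tika.Tika;

// Hypothetical example class showing name-based and prefix-based detection.
public class TikaDetectExample {

    private static final Tika TIKA = new Tika();

    // Uses only the file name (glob patterns such as *.txt); content is never read
    static String detectByName(String fileName) {
        return TIKA.detect(fileName);
    }

    // Uses the first bytes of a document (magic numbers such as "%PDF-")
    static String detectByPrefix(byte[] prefix) {
        return TIKA.detect(prefix);
    }

    public static void main(String[] args) {
        System.out.println(detectByName("hello.txt"));   // text/plain
        System.out.println(detectByName("report.pdf"));  // application/pdf

        byte[] pdfHeader = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectByPrefix(pdfHeader));   // application/pdf
    }
}
```

Name-based detection is cheap but trusts the extension; prefix-based detection inspects the content, so it still works when a file has a misleading or missing name.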
Tika Example
In this example, we extract the content of a text file (hello.txt) using the Tika facade.
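A minimal sketch of such a program is shown below, using parseToString(InputStream, Metadata) from the methods table. It assumes that a Tika parser module (for example tika-parsers in Tika 1.x, or tika-parsers-standard-package in Tika 2.x) is on the classpath in addition to tika-core; with tika-core alone no text parser is available and the result is empty. The class name TikaFacadeExample, the helper extractText(), and the default path hello.txt are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// Hypothetical example class: extracts plain text from a file with the facade.
public class TikaFacadeExample {

    static String extractText(Path path) throws IOException, TikaException {
        Tika tika = new Tika();
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(path)) {
            // Detects the type, picks a parser, and returns at most
            // getMaxStringLength() characters of extracted text
            return tika.parseToString(stream, metadata);
        }
    }

    public static void main(String[] args) throws IOException, TikaException {
        // Assumed location of the input file; pass another path as an argument
        Path path = Paths.get(args.length > 0 ? args[0] : "hello.txt");
        System.out.println(extractText(path));
    }
}
```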
Output:
Following is the content of hello.txt file.
Hello, Welcome to Tutor Aspire