Tika MS Office File Extraction
To extract Microsoft Office files in the Office Open XML formats, such as an .xlsx spreadsheet, Tika provides the OOXMLParser class. This class extracts both content and metadata from these files. It is located in the org.apache.tika.parser.microsoft.ooxml package and provides the constructor and methods listed in the tables below.
Tika OOXMLParser Constructor
Constructor | Description |
---|---|
public OOXMLParser() | It is used to instantiate the class. |
Tika OOXMLParser Methods
Method | Description |
---|---|
public Set<MediaType> getSupportedTypes(ParseContext context) | It returns the set of media types supported by this parser. |
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException | It parses a document stream into a sequence of XHTML SAX events. |
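As a quick illustration of getSupportedTypes, the following minimal sketch checks whether the parser declares support for a given MIME type. The class name SupportedTypesDemo and the helper supports are made up for this example, and it assumes the Tika parsers library is on the classpath.

```java
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;

import java.util.Set;

// Hypothetical demo class, not part of Tika.
public class SupportedTypesDemo {

    // Returns true if OOXMLParser lists the given MIME type as supported.
    static boolean supports(String mimeType) {
        Set<MediaType> types = new OOXMLParser().getSupportedTypes(new ParseContext());
        return types.contains(MediaType.parse(mimeType));
    }

    public static void main(String[] args) {
        String xlsx = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
        System.out.println(xlsx + " supported: " + supports(xlsx));
        System.out.println("text/plain supported: " + supports("text/plain"));
    }
}
```

A check like this can be useful before calling parse, so that unsupported files are routed to another parser instead of failing at parse time.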
Tika OOXMLParser Example
Our sample file is a spreadsheet with a single sheet, Sheet1, that holds employee punch-in and punch-out records.
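A minimal sketch of how such a file could be parsed with OOXMLParser is given here. The class name OOXMLParserExample, the helper formatMetadata, and the default file name employee.xlsx are assumptions made for illustration. It prints the extracted text first and then each metadata field as a name: value pair.

```java
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class, not part of Tika.
public class OOXMLParserExample {

    // Renders every metadata field as one "name: value" line.
    static String formatMetadata(Metadata metadata) {
        StringBuilder sb = new StringBuilder();
        for (String name : metadata.names()) {
            sb.append(name).append(": ").append(metadata.get(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException, SAXException, TikaException {
        // Assumed file name; pass a real path as the first argument instead.
        Path file = Paths.get(args.length > 0 ? args[0] : "employee.xlsx");

        BodyContentHandler handler = new BodyContentHandler(); // collects the body text
        Metadata metadata = new Metadata();                    // filled in by the parser

        try (InputStream stream = Files.newInputStream(file)) {
            new OOXMLParser().parse(stream, handler, metadata, new ParseContext());
        }

        System.out.println("Document Content:" + handler.toString());
        System.out.println("Document Metadata:");
        System.out.print(formatMetadata(metadata));
    }
}
```

The no-argument BodyContentHandler stops collecting text after a default character limit, so for large workbooks it may be preferable to construct it with new BodyContentHandler(-1), which disables the limit.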
Output:
Document Content:Sheet1
Employee Manual Punch
In Time	Out Time	Device	Total Minute	Total Time	Working Minutes
01-Nov-17 8:27:00 AM	01-Nov-17 6:30:00 PM	1	603	540	-63
02-Nov-17 8:09:00 AM	02-Nov-17 6:30:00 PM	1	621	540	-81
03-Nov-17 8:25:00 AM	03-Nov-17 6:30:00 PM	1	605	540	-65
Document Metadata:
date: 2018-05-06T11:20:06Z
cp:revision: 1
custom:DocSecurity: 0
dc:creator: Reception
dcterms:created: 2017-12-03T08:38:57Z
language: en-IN
Last-Modified: 2018-05-06T11:20:06Z
dcterms:modified: 2018-05-06T11:20:06Z
Last-Save-Date: 2018-05-06T11:20:06Z
Template:
protected: false
meta:save-date: 2018-05-06T11:20:06Z
Application-Name: LibreOffice/5.1.6.2$Linux_X86_64 LibreOffice_project/10m0$Build-2
modified: 2018-05-06T11:20:06Z
custom:LinksUpToDate: false
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
creator: Reception
dc:language: en-IN
meta:author: Reception
meta:creation-date: 2017-12-03T08:38:57Z
extended-properties:Application: LibreOffice/5.1.6.2$Linux_X86_64 LibreOffice_project/10m0$Build-2
custom:ShareDoc: false
custom:ScaleCrop: false
Creation-Date: 2017-12-03T08:38:57Z
custom:HyperlinksChanged: false
Revision-Number: 1
extended-properties:Template:
custom:AppVersion: 12.0000