Tika MS Office File Extraction

by Online Tutorials Library

To extract Microsoft Office files in the Office Open XML formats, such as .xlsx spreadsheets, Tika provides the OOXMLParser class. This class extracts both content and metadata from these files. It is located in the org.apache.tika.parser.microsoft.ooxml package, and its constructor and methods are listed below.

Tika OOXMLParser Constructor

Constructor: public OOXMLParser()
Description: It is used to instantiate the class.

Method: public Set<MediaType> getSupportedTypes(ParseContext context)
Description: It returns the set of media types supported by this parser.

Method: public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description: It parses a document stream into a sequence of XHTML SAX events.
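To make "a sequence of XHTML SAX events" more concrete, here is a minimal sketch of a ContentHandler that records the events it receives. It deliberately does not use Tika: the JDK's own SAX parser stands in for OOXMLParser, and the XHTML string is a made-up stand-in for what a parser might emit for a spreadsheet row. The names SaxEventsSketch, RecordingHandler and handle are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsSketch {

    /** Records element names and character data as SAX events arrive. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName); // one event per opening tag
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // text may arrive in several chunks
        }
    }

    /** Feeds an XHTML string to the handler; the JDK parser is the stand-in event source. */
    static RecordingHandler handle(String xhtml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample markup, not actual Tika output.
        String xhtml = "<html><body><table><tr>"
                + "<td>In Time</td><td>Out Time</td>"
                + "</tr></table></body></html>";
        RecordingHandler h = handle(xhtml);
        System.out.println("Elements: " + h.elements);
        System.out.println("Text: " + h.text);
    }
}
```

A handler that only concatenates character data, as this one does, loses the boundaries between table cells unless it inserts separators itself when elements start or end.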

Tika OOXMLParser Example
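A minimal sketch of how OOXMLParser might be used to print a document's content and metadata. It assumes the Tika core and parser modules are on the classpath; the class name TikaOoxmlExample, the helper extract and the default file name example.xlsx are hypothetical and only serve the illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TikaOoxmlExample {

    /** Parses the stream with OOXMLParser, fills metadata, and returns the body text. */
    static String extract(InputStream stream, Metadata metadata)
            throws IOException, SAXException, TikaException {
        // BodyContentHandler collects the text of the XHTML body;
        // its default constructor caps the collected text at a write limit.
        BodyContentHandler handler = new BodyContentHandler();
        new OOXMLParser().parse(stream, handler, metadata, new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input path; pass a real .xlsx file as the first argument.
        String path = args.length > 0 ? args[0] : "example.xlsx";
        Metadata metadata = new Metadata();

        try (InputStream stream = new FileInputStream(path)) {
            System.out.println("Document Content:" + extract(stream, metadata));
        }

        System.out.println("Document Metadata:");
        for (String name : metadata.names()) {
            System.out.println(name + ":   " + metadata.get(name));
        }
    }
}
```

The set of metadata names printed depends on the file and on the Tika version in use, so the keys can differ from one run environment to another.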

Our sample file is a spreadsheet. Extracting it with OOXMLParser prints the following content and metadata.


Document Content:Sheet1
    Employee Manual Punch
    In Time    Out Time    Device    Total Minute    Total Time    Working Minutes
    01-Nov-17 8:27:00 AM    01-Nov-17 6:30:00 PM    1    603    540    -63
    02-Nov-17 8:09:00 AM    02-Nov-17 6:30:00 PM    1    621    540    -81
    03-Nov-17 8:25:00 AM    03-Nov-17 6:30:00 PM    1    605    540    -65

Document Metadata:
    date:   2018-05-06T11:20:06Z
    cp:revision:   1
    custom:DocSecurity:   0
    dc:creator:   Reception
    dcterms:created:   2017-12-03T08:38:57Z
    language:   en-IN
    Last-Modified:   2018-05-06T11:20:06Z
    dcterms:modified:   2018-05-06T11:20:06Z
    Last-Save-Date:   2018-05-06T11:20:06Z
    Template:
    protected:   false
    meta:save-date:   2018-05-06T11:20:06Z
    Application-Name:   LibreOffice/$Linux_X86_64 LibreOffice_project/10m0$Build-2
    modified:   2018-05-06T11:20:06Z
    custom:LinksUpToDate:   false
    Content-Type:   application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    creator:   Reception
    dc:language:   en-IN
    meta:author:   Reception
    meta:creation-date:   2017-12-03T08:38:57Z
    extended-properties:Application:   LibreOffice/$Linux_X86_64 LibreOffice_project/10m0$Build-2
    custom:ShareDoc:   false
    custom:ScaleCrop:   false
    Creation-Date:   2017-12-03T08:38:57Z
    custom:HyperlinksChanged:   false
    Revision-Number:   1
    extended-properties:Template:
    custom:AppVersion:   12.0000