External Content Extractor

Warning

This feature is currently in beta; using it in a production environment is not recommended.

The external content extractor detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making it useful for search engine indexing, content analysis, translation, and much more.

Why use Tika Server as Content Extractor?

Zextras uses a Tika library that runs in the same Java Virtual Machine (JVM) as the mailbox. With Tika Server, you can instead run multiple Tika servers that index content separately from the mailbox. If a Tika server crashes, the mailbox JVM remains unaffected.

Switching to the Tika Server

You can run Tika Server as a Docker container on the same server as the mailbox, or on separate servers accessible by Zimbra.
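
As a minimal sketch of the container option, the script below starts a single Tika server with Docker. It assumes the publicly available apache/tika image and its default port 9998, neither of which is specified in this document; the container name tika-server is a placeholder. By default the script only prints the command it would run; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Assumptions (not from this document): the apache/tika image from Docker Hub,
# its default listening port 9998, and the container name "tika-server".
TIKA_IMAGE="${TIKA_IMAGE:-apache/tika:latest}"
HOST_PORT="${HOST_PORT:-9998}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start one detached Tika container, publishing its port on the host.
start_tika() {
    run docker run -d --name tika-server -p "${HOST_PORT}:9998" "$TIKA_IMAGE"
}

start_tika
```

With these defaults, the resulting endpoint would be http://HOSTNAME:9998/tika, where HOSTNAME is the machine running the container.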

Tika Server Management

Add a Tika Server

You can add a Tika server by running the following command on the Command Line Interface (CLI).

zxsuite powerstore Indexing content-extraction-tool add
zxsuite powerstore Indexing content-extraction-tool add *endpoint* [param VALUE[,VALUE]]

Parameter List

NAME          TYPE     EXPECTED VALUES  DEFAULT
endpoint (M)  String
server (O)    String
global (O)    Boolean  true|false

(M) == mandatory parameter, (O) == optional parameter

Add a Tika endpoint for this mailbox store:

Usage Example

zxsuite powerstore Indexing content-extraction-tool add http://tika-server.example.com:9998/tika

Add a Tika endpoint for mailbox store store1.example.com:

Usage Example

zxsuite powerstore Indexing content-extraction-tool add http://tika-server.example.com:9998/tika server store1.example.com

Add a Tika endpoint for all mailbox stores (this applies only to mailbox stores that don’t have an endpoint specified):

Usage Example

zxsuite powerstore Indexing content-extraction-tool add http://tika-server.example.com:9998/tika global true
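
The per-store form can be repeated for several mailbox stores. The sketch below loops over a list of store names and issues one add command per store; store2.example.com and the DRY_RUN switch are illustrative assumptions, not part of the zxsuite CLI. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical values: replace with your own endpoint and mailbox stores.
ENDPOINT="http://tika-server.example.com:9998/tika"
STORES="store1.example.com store2.example.com"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Register the same endpoint on every store in STORES.
register_all() {
    for store in $STORES; do
        run zxsuite powerstore Indexing content-extraction-tool add "$ENDPOINT" server "$store"
    done
}

register_all
```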

List Tika Servers

You can list all Tika servers by running the following command on the Command Line Interface (CLI).

zxsuite powerstore Indexing content-extraction-tool list
zxsuite powerstore Indexing content-extraction-tool list [param VALUE[,VALUE]]

Parameter List

NAME        TYPE     EXPECTED VALUES  DEFAULT
server (O)  String
global (O)  Boolean  true|false

(M) == mandatory parameter, (O) == optional parameter

List the Tika endpoints for this mailbox store:

Usage Example

zxsuite powerstore Indexing content-extraction-tool list

List the Tika endpoints for mailbox store store1.example.com:

Usage Example

zxsuite powerstore Indexing content-extraction-tool list server store1.example.com

List the Tika endpoints used by all mailbox stores that don’t have an endpoint specified:

Usage Example

zxsuite powerstore Indexing content-extraction-tool list global true

A sample output lists all the running Tika servers with their addresses and the ports on which they are listening, for example:

content-extraction-endpoints
           http://test.example.com:9998/tika
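
For scripting, the endpoint URLs can be pulled out of that output. The sketch below assumes the output keeps the layout of the sample (a header line followed by one URL per line); the function name extract_endpoints is illustrative.

```shell
#!/bin/sh
# Print every whitespace-separated token that looks like an http(s) URL.
extract_endpoints() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^https?:\/\//) print $i }'
}

# Sample input copied from the output above. In practice, pipe the command in:
#   zxsuite powerstore Indexing content-extraction-tool list | extract_endpoints
extract_endpoints <<'EOF'
content-extraction-endpoints
           http://test.example.com:9998/tika
EOF
```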

Remove a Tika Server

You can remove a previously added Tika server by running the following command on the Command Line Interface (CLI).

zxsuite powerstore Indexing content-extraction-tool remove
zxsuite powerstore Indexing content-extraction-tool remove *endpoint* [param VALUE[,VALUE]]

Parameter List

NAME          TYPE     EXPECTED VALUES  DEFAULT
endpoint (M)  String
server (O)    String
global (O)    Boolean  true|false

(M) == mandatory parameter, (O) == optional parameter

Remove a Tika endpoint for this mailbox store:

Usage Example

zxsuite powerstore Indexing content-extraction-tool remove http://tika-server.example.com:9998/tika

Remove a Tika endpoint for mailbox store store1.example.com:

Usage Example

zxsuite powerstore Indexing content-extraction-tool remove http://tika-server.example.com:9998/tika server store1.example.com

Remove a Tika endpoint used by all mailbox stores that don’t have an endpoint specified:

Usage Example

zxsuite powerstore Indexing content-extraction-tool remove http://tika-server.example.com:9998/tika global true

Is the Tika Server Running?

You can use the following methods to check if the Tika Server is running.

From the Graphical User Interface (GUI)
  1. Send an email with a new attachment

  2. Search for the attachment

From the CLI
  1. Navigate to /opt/zimbra/log

  2. View the contents of mailbox.log

    • You can use tail -f to follow new messages in the file in real time.

Sample Output:

2021-07-07 15:24:25,444 INFO [qtp413601558-41832:https://mail.example.com/service/soap/SearchRequest] [name=user@mail.example.com;mid=136;oip=192.168.0.10;port=33008;ua=ZimbraWebClient - FF89 (Linux)/8.8.15_GA_4007;soapId=3084e510;] mailbox - Using http://test.example.com:9997/tika for content extraction
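
To avoid scanning the log by eye, the relevant lines can be filtered. The sketch below assumes the message format shown in the sample output ("Using <url> for content extraction") and the log path from the steps above; the function name endpoints_in_use is illustrative.

```shell
#!/bin/sh
# Path taken from the steps above; override LOG to point elsewhere.
LOG="${LOG:-/opt/zimbra/log/mailbox.log}"

# Read log lines on stdin and print each distinct endpoint named in a
# "Using <url> for content extraction" message.
endpoints_in_use() {
    sed -n 's/.* Using \(http[^ ]*\) for content extraction.*/\1/p' | sort -u
}

if [ -r "$LOG" ]; then
    endpoints_in_use < "$LOG"
else
    echo "cannot read $LOG" >&2
fi
```

If the function prints the endpoint you configured, the mailbox is sending content to that Tika server; no output means no extraction message has been logged yet.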