External Content Extractor
Warning
This feature is currently in beta; using it in a production environment is not recommended.
The external content extractor detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making it useful for search engine indexing, content analysis, translation, and much more.
Why use Tika Server as Content Extractor?
Zextras uses a Tika library that shares the same Java Virtual Machine (JVM) as the mailbox. With Tika Server, you can run multiple Tika servers that index content separately from the mailbox. If a Tika server crashes, the mailbox JVM remains unaffected.
Switching to the Tika Server
You can run Tika Server as a Docker container, either on the same server as the mailbox or on separate servers accessible by Zimbra.
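As a rough illustration of the container option, the sketch below defines two helpers. It assumes Docker is installed and that the public `apache/tika` image from Docker Hub, which listens on port 9998 inside the container, is suitable for your environment. The function names `start_tika` and `tika_endpoint` are hypothetical and are not part of Zextras or Tika.

```shell
# Hypothetical helper: start a Tika Server container in the background.
# Assumption: the public apache/tika image, which serves on container port 9998.
# $1 = host port to publish (defaults to 9998).
start_tika() {
  docker run -d --name "tika-${1:-9998}" -p "${1:-9998}:9998" apache/tika
}

# Hypothetical helper: build the endpoint URL in the form used on this page.
# $1 = host name, $2 = port (defaults to 9998).
tika_endpoint() {
  printf 'http://%s:%s/tika\n' "$1" "${2:-9998}"
}
```

For example, after `start_tika 9998` succeeds on the host tika-server.example.com, `tika_endpoint tika-server.example.com` prints the URL that would be registered as the endpoint.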
Tika Server Management
You can add a Tika server by running the following command on the Command Line Interface (CLI).
zxsuite powerstore Indexing content-extraction-tool add
zxsuite powerstore Indexing content-extraction-tool add *endpoint* [param VALUE[,VALUE]]
Parameter List
| NAME | TYPE | EXPECTED VALUES | DEFAULT |
| endpoint (M) | String | | |
| server (O) | String | | |
| global (O) | Boolean | true, false | |
(M) == mandatory parameter, (O) == optional parameter
Add a Tika endpoint for this mailbox store:
Usage Example
zxsuite powerstore Indexing content-extraction-tool add http://tika-server.example.com:9998/tika
Add a Tika endpoint for mailbox store store1.example.com:
Usage Example
zxsuite powerstore Indexing content-extraction-tool add http://tika-server.example.com:9998/tika server store1.example.com
Add a Tika endpoint for all mailbox stores (applies only to mailbox stores that don’t have any endpoint specified):
Usage Example
zxsuite powerstore Indexing content-extraction-tool add http://tika-server.example.com:9998/tika global true
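When the same endpoint has to be registered on several specific mailbox stores, the per-store form of the command can be generated in a loop. The sketch below only prints the commands so they can be reviewed before being run. The function name `print_add_commands` and the store name store2.example.com are hypothetical, used only for illustration.

```shell
# Hypothetical helper: print one "add" command per mailbox store.
# $1 = Tika endpoint URL, remaining arguments = mailbox store host names.
# Nothing is executed; the commands are only written to standard output.
print_add_commands() {
  endpoint="$1"
  shift
  for store in "$@"; do
    echo "zxsuite powerstore Indexing content-extraction-tool add ${endpoint} server ${store}"
  done
}

print_add_commands http://tika-server.example.com:9998/tika \
  store1.example.com store2.example.com
```

Printing rather than executing keeps the sketch safe to try on any machine; once the output looks right, the printed lines can be run as the zimbra user.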
You can list all Tika servers by running the following command on the Command Line Interface (CLI).
zxsuite powerstore Indexing content-extraction-tool list
zxsuite powerstore Indexing content-extraction-tool list [param VALUE[,VALUE]]
Parameter List
| NAME | TYPE | EXPECTED VALUES | DEFAULT |
| server (O) | String | | |
| global (O) | Boolean | true, false | |
(M) == mandatory parameter, (O) == optional parameter
List Tika endpoints for this mailbox store:
Usage Example
zxsuite powerstore Indexing content-extraction-tool list
List Tika endpoints for mailbox store store1.example.com:
Usage Example
zxsuite powerstore Indexing content-extraction-tool list server store1.example.com
List Tika endpoints for all mailbox stores that don’t have any endpoint specified:
Usage Example
zxsuite powerstore Indexing content-extraction-tool list global true
A sample output lists all the running Tika servers with their addresses and the ports on which they are listening, for example:
content-extraction-endpoints
http://test.example.com:9998/tika
You can remove a previously added Tika server by running the following command on the Command Line Interface (CLI).
zxsuite powerstore Indexing content-extraction-tool remove
zxsuite powerstore Indexing content-extraction-tool remove *endpoint* [param VALUE[,VALUE]]
Parameter List
| NAME | TYPE | EXPECTED VALUES | DEFAULT |
| endpoint (M) | String | | |
| server (O) | String | | |
| global (O) | Boolean | true, false | |
(M) == mandatory parameter, (O) == optional parameter
Remove a Tika endpoint for this mailbox store:
Usage Example
zxsuite powerstore Indexing content-extraction-tool remove http://tika-server.example.com:9998/tika
Remove a Tika endpoint for mailbox store store1.example.com:
Usage Example
zxsuite powerstore Indexing content-extraction-tool remove http://tika-server.example.com:9998/tika server store1.example.com
Remove the Tika endpoint used by all mailbox stores that don’t have any endpoint specified:
Usage Example
zxsuite powerstore Indexing content-extraction-tool remove http://tika-server.example.com:9998/tika global true
Is the Tika Server Running?
You can use the following steps to check if the Tika Server is running.
1. Send an email with a new attachment.
2. Search for the attachment.
3. Navigate to /opt/zimbra/log.
4. View the contents of mailbox.log.
You can use tail -f to follow new messages in the file in real time.
Sample Output:
2021-07-07 15:24:25,444 INFO [qtp413601558-41832:https://mail.example.com/service/soap/SearchRequest] [name=user@mail.example.com;mid=136;oip=192.168.0.10;port=33008;ua=ZimbraWebClient - FF89 (Linux)/8.8.15_GA_4007;soapId=3084e510;] mailbox - Using http://test.example.com:9997/tika for content extraction
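Rather than reading the log by eye, the endpoint can be pulled out of matching lines. The sketch below assumes log entries follow the format of the sample above, ending in "Using <endpoint> for content extraction". The function name `extraction_endpoints` is hypothetical.

```shell
# Hypothetical helper: print each distinct Tika endpoint named in a log file.
# $1 = path to a log file in the same format as the sample entry above.
# Assumption: matching lines contain "Using <endpoint> for content extraction".
extraction_endpoints() {
  grep -o 'Using [^ ]* for content extraction' "$1" | awk '{print $2}' | sort -u
}
```

For example, `extraction_endpoints /opt/zimbra/log/mailbox.log` prints one line per endpoint seen in the log. Empty output means no matching entry has been logged yet, which may simply indicate that no attachment has been searched for since the endpoint was added.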