External Content Extractor
This feature is currently in beta; use in production environments is not recommended.
The external content extractor detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making it useful for search engine indexing, content analysis, translation, and much more.
Why use Tika Server as Content Extractor?
Zextras uses a Tika library that shares the same Java Virtual Machine (JVM) as the mailbox. With Tika Server, you can have multiple Tika servers indexing content separately from the mailbox. If a Tika server crashes, the mailbox JVM remains unaffected.
Switching to the Tika Server
You can run Tika Server as a Docker container on the same server as the mailbox, or on separate servers accessible by Zimbra.
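As a minimal sketch of the Docker option, the helper below starts a Tika Server container and prints the endpoint URL in the form used by the commands in this page. It assumes Docker is installed, that the public `apache/tika` image is acceptable in your environment, and that the container listens on port 9998; the function names and the container name `tika` are illustrative, not part of Zextras.

```shell
#!/bin/sh
# Sketch: start Apache Tika Server in Docker and print the endpoint to register.
# Assumptions: Docker is installed, the apache/tika image is used,
# and the server inside the container listens on port 9998.

TIKA_IMAGE="${TIKA_IMAGE:-apache/tika}"   # assumed image name
TIKA_PORT="${TIKA_PORT:-9998}"            # host port to publish

# Build the endpoint URL in the form used by the zxsuite examples.
tika_endpoint() {
    # $1 = host name reachable from the mailbox server, $2 = port
    printf 'http://%s:%s/tika\n' "$1" "$2"
}

# Start the container detached, publishing the Tika port on the host.
start_tika() {
    docker run -d --name tika -p "${TIKA_PORT}:9998" "$TIKA_IMAGE"
}

# Usage (uncomment to run):
# start_tika && tika_endpoint "$(hostname -f)" "$TIKA_PORT"
```

The printed URL is the value to pass as the endpoint when adding the server.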
Add a Tika Server
You can add a Tika server by running the following command on the Command Line Interface (CLI).
- Format
zxsuite powerstore Indexing content-extraction-tool add {endpoint} [attr1 value1 [attr2 value2...]]
PARAMETER LIST
NAME          TYPE      EXPECTED VALUES
endpoint(M)   String
server(O)     String
global(O)     Boolean   true|false
(M) == mandatory parameter, (O) == optional parameter
- Example
zxsuite powerstore Indexing content-extraction-tool add http://test.example.com:9997/tika
- Explanation
Zextras adds an endpoint with address http://test.example.com listening on port 9997.
- Add a Tika endpoint for this mailbox store
Run the following command as the zimbra user, from the same server as the mailbox:
zxsuite powerstore Indexing content-extraction-tool add http://test.example.com:9998/tika
- Add a Tika endpoint for mailbox store store1.example.com
Run the following command as the zimbra user, from the same server as the mailbox:
zxsuite powerstore Indexing content-extraction-tool add http://test.example.com/tika server store1.example.com
- Add a Tika endpoint for all mailbox stores (applies only to mailbox stores that have no endpoint of their own specified)
zxsuite powerstore Indexing content-extraction-tool add http://test.example.com:9998/tika global true
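When several mailbox stores should use the same Tika server, the per-store form of the command can be generated in a loop. The sketch below only prints the commands so they can be reviewed before running them as the zimbra user. The function name and any store names other than store1.example.com are hypothetical; the `server` attribute is used exactly as in the per-store example above.

```shell
#!/bin/sh
# Sketch: print one "add" command per mailbox store (nothing is executed).
# Assumption: the "server" attribute takes the mailbox store host name,
# as in the store1.example.com example.

print_add_commands() {
    endpoint="$1"   # Tika endpoint URL, e.g. http://test.example.com:9998/tika
    shift           # remaining arguments are mailbox store host names
    for store in "$@"; do
        printf 'zxsuite powerstore Indexing content-extraction-tool add %s server %s\n' \
            "$endpoint" "$store"
    done
}

# Usage:
# print_add_commands http://test.example.com:9998/tika store1.example.com store2.example.com
```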
List Tika Servers
You can list all Tika servers by running the following command on the Command Line Interface (CLI).
- Command
zxsuite powerstore Indexing content-extraction-tool list
- Sample Output
content-extraction-endpoints
http://test.example.com:9998/tika
- Explanation
Zextras lists all the running Tika servers with their addresses and the ports on which they are listening.
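The list output can also be used to check that each endpoint answers over HTTP. The following is a sketch under two assumptions: the list output contains one endpoint URL per line, as in the sample output, and `curl` is available on the mailbox server. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: probe each endpoint reported by the "list" command.
# Assumption: endpoint URLs appear one per line in the list output.

# Keep only the URLs, dropping the header line and surrounding whitespace.
extract_endpoints() {
    grep -Eo 'https?://[^[:space:]]+'
}

# Print the HTTP status returned by each endpoint (000 means no connection).
probe_endpoints() {
    while IFS= read -r url; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
        printf '%s %s\n' "$code" "$url"
    done
}

# Usage (as the zimbra user):
# zxsuite powerstore Indexing content-extraction-tool list | extract_endpoints | probe_endpoints
```

A status of 200 suggests the server is reachable; it does not by itself confirm that the mailbox is using that endpoint.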
Remove a Tika Server
You can remove a previously added Tika server by running the following command on the Command Line Interface (CLI).
- Format
zxsuite powerstore Indexing content-extraction-tool remove {endpoint} [attr1 value1 [attr2 value2...]]
PARAMETER LIST
NAME          TYPE      EXPECTED VALUES
endpoint(M)   String
server(O)     String
global(O)     Boolean   true|false
(M) == mandatory parameter, (O) == optional parameter
- Example
zxsuite powerstore Indexing content-extraction-tool remove http://test.example.com:9997/tika
- Explanation
Zextras removes the server with address http://test.example.com listening on port 9997.
Is the Tika Server Running?
You can use the following methods to check if the Tika Server is running.
- Graphical User Interface (GUI)
  - Send an email with a new attachment.
  - Search for the attachment.
- Command Line Interface (CLI)
  - Navigate to /opt/zimbra/log.
  - View the contents of mailbox.log. You can use tail -f.
- Sample Output
2021-07-07 15:24:25,444 INFO [qtp413601558-41832:https://mail.example.com/service/soap/SearchRequest] [name=user@mail.example.com;mid=136;oip=192.168.0.10;port=33008;ua=ZimbraWebClient - FF89 (Linux)/8.8.15_GA_4007;soapId=3084e510;] mailbox - Using http://test.example.com:9997/tika for content extraction
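Instead of reading the log by eye, the endpoints it mentions can be extracted with a small filter. This is a sketch that assumes the log lines have the same wording as the sample output ("Using <endpoint> for content extraction"); the function name is illustrative.

```shell
#!/bin/sh
# Sketch: list the Tika endpoints that mailbox.log reports using.
# Assumption: matching lines end with "Using <endpoint> for content extraction".

used_endpoints() {
    # Print only the endpoint from matching lines, then de-duplicate.
    sed -n 's/.* Using \(.*\) for content extraction.*/\1/p' | sort -u
}

# Usage (as the zimbra user):
# used_endpoints < /opt/zimbra/log/mailbox.log
```

If the filter prints nothing, no matching line has been logged yet; sending and searching for a new attachment, as described above, should produce one.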