Imhotep Architecture
Learn more about the Imhotep architecture and the components necessary to run Imhotep.
Imhotep Backend Component
The ImhotepDaemon (a.k.a. Imhotep Server) is the back-end Java service that serves query requests. To maintain high performance as data volume and query load grow, add more ImhotepDaemon instances.
It depends on:
- A Zookeeper cluster to coordinate with other components
- A storage layer (HDFS or S3) from which it pulls down data shards to serve
Imhotep Frontend Components
IQL Webapp
The IQL webapp is a web-based user interface for issuing IQL queries.
Learn how to use this tool.
This Java webapp typically runs in the Tomcat 7 servlet container behind the Apache web server.
It depends on:
- A Zookeeper cluster to find ImhotepDaemon instances
- ImhotepDaemon instances to service queries
IUpload Webapp
The IUpload Webapp (a.k.a. TSV uploader) is a web-based user interface for uploading data in TSV or CSV format into the Imhotep system.
Learn how to use this tool.
This Java webapp typically runs in the Tomcat 7 servlet container behind the Apache web server.
It depends on:
- A storage layer (HDFS or S3) in which to place uploaded files
Optionally, you can place TSV/CSV data directly in the storage layer. To upload files directly to your S3 build bucket, place them in the iupload/tsvtoindex/datasetName/ directory. As the files are processed, they are moved to iupload/indexedtsv/datasetName/. You can also view these files in the TSV uploader.
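As an illustration of the directory layout described above, the sketch below builds the object key for a direct upload and the key the same file would have after processing. Only the two prefixes `iupload/tsvtoindex/` and `iupload/indexedtsv/` come from this page; the class and method names (`IUploadPaths`, `toIndexKey`, `indexedKey`) and the sample dataset and file names are hypothetical. It also assumes the file keeps its name when it is moved. The code only composes key strings and does not contact S3.

```java
// Hypothetical helper that composes S3 object keys for a direct upload.
// Only the two directory prefixes come from the Imhotep documentation.
final class IUploadPaths {
    // Unprocessed TSV/CSV files are placed under this prefix.
    static final String TO_INDEX_PREFIX = "iupload/tsvtoindex/";
    // Files are moved under this prefix as they are processed.
    static final String INDEXED_PREFIX = "iupload/indexedtsv/";

    /** Key for a raw upload, e.g. iupload/tsvtoindex/exampleDataset/example.tsv. */
    static String toIndexKey(String datasetName, String fileName) {
        if (datasetName.isEmpty() || datasetName.contains("/")) {
            throw new IllegalArgumentException("Invalid dataset name: " + datasetName);
        }
        return TO_INDEX_PREFIX + datasetName + "/" + fileName;
    }

    /** Key after processing, assuming (not documented) that the file name is unchanged by the move. */
    static String indexedKey(String toIndexKey) {
        if (!toIndexKey.startsWith(TO_INDEX_PREFIX)) {
            throw new IllegalArgumentException("Not under " + TO_INDEX_PREFIX + ": " + toIndexKey);
        }
        return INDEXED_PREFIX + toIndexKey.substring(TO_INDEX_PREFIX.length());
    }

    public static void main(String[] args) {
        String uploadKey = toIndexKey("exampleDataset", "example.tsv");
        System.out.println(uploadKey);             // iupload/tsvtoindex/exampleDataset/example.tsv
        System.out.println(indexedKey(uploadKey)); // iupload/indexedtsv/exampleDataset/example.tsv
    }
}
```

With a real bucket, you would pass the key from `toIndexKey` to whatever S3 client you already use; watching for the corresponding `indexedKey` is one way to tell that a file has been picked up.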
Learn more about uploading data.
Shard Builder (TSV converter)
The shard builder typically runs as a scheduled cron job and handles converting TSV or CSV files uploaded to the storage layer into data shards for consumption by the ImhotepDaemon instances.
This builder is implemented in Java.
It depends on:
- A storage layer (HDFS or S3) to retrieve uploaded data and store converted data
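Taken together, the "It depends on" lists above form a small dependency graph. The following sketch records those dependencies in Java and uses a depth-first topological sort to derive one order in which the components could be brought up, so that each component comes after everything it depends on. The class name and component labels are illustrative and are not part of Imhotep. The sketch does not claim that Imhotep enforces a strict startup order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model of the component dependencies listed on this page.
final class ImhotepStartupOrder {
    static final String ZOOKEEPER = "Zookeeper";
    static final String STORAGE = "Storage (HDFS or S3)";
    static final String DAEMON = "ImhotepDaemon";
    static final String IQL = "IQL Webapp";
    static final String IUPLOAD = "IUpload Webapp";
    static final String SHARD_BUILDER = "Shard Builder";

    private static List<String> on(String... names) {
        return Arrays.asList(names);
    }

    /** Maps each component to its dependencies; deliberately inserted out of dependency order. */
    static Map<String, List<String>> dependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put(IQL, on(ZOOKEEPER, DAEMON));
        deps.put(IUPLOAD, on(STORAGE));
        deps.put(SHARD_BUILDER, on(STORAGE));
        deps.put(DAEMON, on(ZOOKEEPER, STORAGE));
        deps.put(ZOOKEEPER, on());
        deps.put(STORAGE, on());
        return deps;
    }

    /** Depth-first topological sort: each component is listed after all of its dependencies. */
    static List<String> startupOrder(Map<String, List<String>> deps) {
        Set<String> done = new LinkedHashSet<>();  // finished components, in completion order
        Set<String> inProgress = new HashSet<>();  // components on the current DFS path
        for (String component : deps.keySet()) {
            visit(component, deps, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String component, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(component)) {
            return;
        }
        if (!inProgress.add(component)) {
            throw new IllegalStateException("Dependency cycle involving " + component);
        }
        List<String> required = deps.get(component);
        if (required != null) {
            for (String dependency : required) {
                visit(dependency, deps, done, inProgress);
            }
        }
        inProgress.remove(component);
        done.add(component);
    }

    public static void main(String[] args) {
        // Prints: [Zookeeper, Storage (HDFS or S3), ImhotepDaemon, IQL Webapp, IUpload Webapp, Shard Builder]
        System.out.println(startupOrder(dependencies()));
    }
}
```

The shared infrastructure (Zookeeper and the storage layer) comes first, ImhotepDaemon follows, and the IQL webapp comes after the daemons it queries. The IUpload webapp and the shard builder need only the storage layer.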
Dependencies
Java
The Imhotep components have been tested with Java 7 from Oracle.
Storage Options
The storage layer for Imhotep can be HDFS (Hadoop Distributed File System) or S3 (Amazon Simple Storage Service).
If you plan to run Imhotep in AWS, use S3. Otherwise, choose HDFS, as we do for the Docker evaluation version of the stack.
Imhotep has been tested with the CDH 5 distribution of Hadoop.
Zookeeper
The Zookeeper cluster is used for coordination among the ImhotepDaemon instances and the IQL webapp frontend.
Imhotep has been tested with Zookeeper 3.4.5 from the CDH 5 distribution. Download here.