Service Configuration
Indexify is configured by a YAML configuration file. The easiest way to start is by generating it with the CLI or by downloading a sample configuration file, and then tweaking it to fit your needs.
Generate with CLI
Unable to find ./indexify?
Don't forget to download the Indexify binary before running the command below. You can do so by running curl https://getindexify.ai | sh. This downloads the relevant binary to the relative path ./indexify.
Configuration Reference
Network Configuration
listen_if: 0.0.0.0
api_port: 8900
coordinator_port: 8950
coordinator_http_port: 8960
raft_port: 8970
coordinator_addr: 0.0.0.0:8950
- listen_if: The interface on which the servers listen. Typically you would want to listen on all interfaces.
- api_port: The port on which the application-facing API server is exposed. This is the HTTP port through which applications upload data, create extraction policies, and retrieve extracted data from indexes.
- coordinator_port: The port on which the coordinator is exposed. This is a separate setting because in dev mode both the API server and the coordinator server run in the same process.
- coordinator_http_port: The port for accessing coordinator metrics.
- raft_port: The port on which internal messages between coordinator nodes are transmitted. This is only needed if Indexify is started either as a coordinator or in dev mode.
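As a rough illustration of how these settings relate, the sketch below holds the sample values from above in a plain Python dictionary, derives the address each service would be served on, and flags any port assigned twice. The helper names bind_addresses and duplicate_ports are hypothetical and not part of Indexify, and the sketch assumes every service binds to listen_if plus its own port.

```python
from collections import Counter

# Values copied from the sample network configuration above.
NETWORK_CONFIG = {
    "listen_if": "0.0.0.0",
    "api_port": 8900,
    "coordinator_port": 8950,
    "coordinator_http_port": 8960,
    "raft_port": 8970,
    "coordinator_addr": "0.0.0.0:8950",
}

PORT_KEYS = ("api_port", "coordinator_port", "coordinator_http_port", "raft_port")


def bind_addresses(cfg):
    """Map each port setting to the host:port it would be served on (assumed)."""
    return {key: f"{cfg['listen_if']}:{cfg[key]}" for key in PORT_KEYS}


def duplicate_ports(cfg):
    """Return the sorted port numbers that are assigned to more than one setting."""
    counts = Counter(cfg[key] for key in PORT_KEYS)
    return sorted(port for port, n in counts.items() if n > 1)
```

With the sample values, the address derived for coordinator_port equals coordinator_addr, and duplicate_ports(NETWORK_CONFIG) returns an empty list.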
Don't forget to configure a volume
Indexify stores all of the Extraction Graphs you've configured and the data it has processed locally. The storage location is configured in indexify.yaml.
Don't forget to configure a persistent volume at this location if you'd like to make sure you don't lose your data when your server restarts.
Blob Storage Configuration
Blob storage holds the raw bytes of unstructured data. For instance, if you're splitting your text data into chunks, these text chunks will be stored at the location you specify below.
We currently support two forms of blob storage: Disk and S3 Storage.
Disk
A common use-case for disk storage is if you're using a shared volume to replicate/share data between different processes.
S3 Storage
For S3 Storage, you'll also need to set the two environment variables listed below. Once they are configured, our S3 integration will take care of the rest.
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
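A quick pre-flight check can confirm that both variables are present before the server starts. The following is a minimal sketch in Python; the function name missing_aws_vars is hypothetical, and the check is a convenience rather than something Indexify itself provides.

```python
import os

# The two variables named above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Calling missing_aws_vars() with no arguments inspects the current process environment; an empty list means both credentials are set.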
Vector Index Storage
- index_store (default: LanceDb): Name of the vector backend. Possible values: LanceDb, Qdrant, PgVector.
Qdrant Config
- addr: Address of the Qdrant HTTP endpoint
Pg Vector Config
- addr: Address of Postgres
index_config:
  index_store: PgVector
  pg_vector_config:
    addr: postgres://postgres:postgres@localhost/indexify
    m: 16
    efconstruction: 64
LanceDb Config
- path: Path of the database
Caching
API Server TLS
To set up mTLS for the Indexify server, you first need to create a root certificate, a server certificate and key pair, and a client certificate and key pair. The commands below generate the certificates and keys and store them in a folder called .dev-tls.
local-dev-tls-insecure: ## Generate local development TLS certificates (insecure)
	@mkdir -p .dev-tls && \
	openssl req -x509 -newkey rsa:4096 -keyout .dev-tls/ca.key -out .dev-tls/ca.crt -days 365 -nodes -subj "/C=US/ST=TestState/L=TestLocale/O=IndexifyOSS/CN=localhost" && \
	openssl req -new -newkey rsa:4096 -keyout .dev-tls/server.key -out .dev-tls/server.csr -nodes -config ./client_cert_config && \
	openssl x509 -req -in .dev-tls/server.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/server.crt -days 365 -extensions v3_ca -extfile ./client_cert_config && \
	openssl req -new -nodes -out .dev-tls/client.csr -newkey rsa:2048 -keyout .dev-tls/client.key -config ./client_cert_config && \
	openssl x509 -req -in .dev-tls/client.csr -CA .dev-tls/ca.crt -CAkey .dev-tls/ca.key -CAcreateserial -out .dev-tls/client.crt -days 365 -extfile ./client_cert_config -extensions v3_ca
Once you have the certificates and keys generated, add the config below to your server config and provide the paths to where you have stored the root certificate and the server certificate and key pair.
tls:
  api: true
  ca_file: .dev-tls/ca.crt # Path to the CA certificate
  cert_file: .dev-tls/server.crt # Path to the server certificate
  key_file: .dev-tls/server.key # Path to the server private key
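On the client side, the root certificate and the client certificate and key pair generated above are what a caller needs to present. The sketch below shows one way to do this with Python's standard library only; it is not an official Indexify client. The helper names build_mtls_context and fetch are hypothetical, and the default paths assume the .dev-tls folder from the commands above.

```python
import ssl
import urllib.request


def build_mtls_context(ca_file=".dev-tls/ca.crt",
                       cert_file=".dev-tls/client.crt",
                       key_file=".dev-tls/client.key"):
    """Create a client TLS context that trusts the local CA and presents the client certificate."""
    # Trust only the root certificate generated for local development.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Present the client certificate and key so the server can authenticate the caller.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def fetch(url, ctx):
    """Send a GET request over the mutually authenticated connection."""
    with urllib.request.urlopen(url, context=ctx) as resp:
        return resp.status, resp.read()
```

A call such as fetch("https://localhost:8900/", build_mtls_context()) would then go through the TLS-enabled API port; the URL path here is only a placeholder. Hostname verification succeeds only if the server certificate covers the host you connect to, which depends on the contents of client_cert_config.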
HA configuration
To set up multiple coordinator nodes for a high-availability configuration, start with a single node, called the seed node. Create a separate configuration file for each additional coordinator instance. Each node should have a unique node_id field in its configuration file. The seed_node field should be set to the IP address and port of the original coordinator node.
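The per-node settings described above can be sketched in Python as follows. This only illustrates the two fields named in the paragraph (node_id and seed_node); the function name is hypothetical, integer node IDs are an assumption, and the "ip:port" form of seed_node is inferred from the description rather than taken from a reference configuration.

```python
def additional_node_overrides(seed_node_id, node_ids, seed_ip, seed_raft_port):
    """Build one override dict per additional coordinator node.

    seed_node_id is the node_id already used by the seed node; node_ids are
    the IDs chosen for the additional nodes (assumed to be integers).
    """
    all_ids = [seed_node_id, *node_ids]
    # Every coordinator, including the seed, needs its own node_id.
    if len(set(all_ids)) != len(all_ids):
        raise ValueError("node_id values must be unique across all coordinators")
    # Every additional node points at the seed node's address and raft port.
    seed_node = f"{seed_ip}:{seed_raft_port}"
    return [{"node_id": node_id, "seed_node": seed_node} for node_id in node_ids]
```

Each returned dictionary holds the values that would differ in one additional node's configuration file; the remaining settings would be carried over from the base configuration.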
Seed node:
New node (replace 10.0.0.10 with the actual seed node IP address; 8970 should match the configured raft_port of the seed node):