TL;DR — Quick Summary

Configure Elasticsearch for centralized log analysis, size the JVM heap correctly, manage log retention with index lifecycle management, and troubleshoot cluster health problems.

Centralized Logging with Elasticsearch

In modern infrastructure, applications are distributed across dozens of microservices, containers, and servers. Troubleshooting by SSHing into individual machines and grepping log files is no longer viable.

Elasticsearch (part of the ELK stack alongside Logstash and Kibana) solves this by indexing the logs your shippers collect into one central store, so you can search all of them quickly from a single place.

Prerequisites

  • A Linux server (Ubuntu 22.04 or RHEL 9 recommended).
  • At least 8 GB RAM (Elasticsearch is resource-heavy).
  • OpenJDK 17+ (the official Elasticsearch packages bundle their own JDK, so a separate install is usually unnecessary).
  • Root or sudo privileges.

Step-by-Step Installation

1. Add Repository and Install (Ubuntu/Debian)

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch -y

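Note: The installer prints a generated password for the elastic superuser and creates TLS certificates. Save this output! If you lose the password, you can reset it with /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic.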

2. Configure JVM Heap Size

Elasticsearch uses the Java Virtual Machine (JVM). Memory configuration is critical.

Edit /etc/elasticsearch/jvm.options.d/heap.options:

# Set both to the same value: at most 50% of total RAM, and no more than 31g.
-Xms4g
-Xmx4g
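
The sizing rule in the comment above can be sketched as a small shell helper. This is a minimal sketch, assuming total RAM is given in whole gigabytes; suggest_heap_gb and heap_options are hypothetical helper names, not tools shipped with Elasticsearch.

```shell
# Suggest a heap size in whole GB: half of total RAM, capped at 31.
# suggest_heap_gb is a hypothetical helper, not part of Elasticsearch.
suggest_heap_gb() {
  heap_gb=$(( $1 / 2 ))
  if [ "$heap_gb" -gt 31 ]; then heap_gb=31; fi
  if [ "$heap_gb" -lt 1 ]; then heap_gb=1; fi
  echo "$heap_gb"
}

# Print the two lines that belong in jvm.options.d/heap.options,
# using the same value for the initial (-Xms) and maximum (-Xmx) heap.
heap_options() {
  h=$(suggest_heap_gb "$1")
  printf '%s\n' "-Xms${h}g" "-Xmx${h}g"
}
```

For the 8 GB server from the prerequisites, heap_options 8 prints the -Xms4g and -Xmx4g lines shown above. Review the output before writing it to the options file rather than piping it in blindly.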

3. Configure the Cluster

Edit /etc/elasticsearch/elasticsearch.yml:

cluster.name: prod-logs-cluster
node.name: es-node-01
network.host: 192.168.1.50   # Use 0.0.0.0 for all interfaces, 127.0.0.1 for local testing
http.port: 9200

# Security (Auto-configured in 8.x)
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true

4. Start the Service

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

Verify it’s running:

curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic:YOUR_PASSWORD https://192.168.1.50:9200

Managing Log Data (Index Lifecycle Management)

Logs grow without bound. If you don’t manage them, Elasticsearch will eventually run out of disk space, mark the affected indices read-only, and stop accepting new logs.

Index Lifecycle Management (ILM) automates this.

Create an ILM Policy (via cURL or Kibana Dev Tools)

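This policy rolls over to a new index when the current one reaches 50 GB or is 30 days old, and permanently deletes each index 90 days after it rolls over (with rollover in use, min_age is measured from the rollover time, not from index creation).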

PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
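
A policy only takes effect on indices that reference it. One common way to do that is an index template that sets index.lifecycle.name. The sketch below is one possible setup, not the only one: the template name (logs_template), the index pattern (logs-app-*), the priority, and the use of a data stream are all assumptions to adapt to your own naming.

```shell
# Print an index template body that attaches logs_policy to a data stream.
# The pattern logs-app-* and priority 500 are illustrative assumptions.
logs_template_body() {
  cat <<'EOF'
{
  "index_patterns": ["logs-app-*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": {
      "index.lifecycle.name": "logs_policy"
    }
  }
}
EOF
}

# Usage against a live cluster (not run here; logs_template is a hypothetical name):
#   logs_template_body | curl --cacert /etc/elasticsearch/certs/http_ca.crt \
#     -u elastic:YOUR_PASSWORD -X PUT -H 'Content-Type: application/json' \
#     --data-binary @- https://192.168.1.50:9200/_index_template/logs_template
```

Documents written to a data stream must carry an @timestamp field. If you use classic index aliases instead of data streams, the template would also need index.lifecycle.rollover_alias.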

Troubleshooting Cluster Health

Check health: GET _cluster/health
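
From a shell, the same health check can be scripted. This is a minimal sketch, assuming the CA path and address used in the verification step; ES_URL, ES_CA, and ES_PASSWORD are hypothetical environment variable names, and the sed expression is a simple text match rather than a full JSON parser.

```shell
# Assumed defaults; override through the environment as needed.
ES_URL="${ES_URL:-https://192.168.1.50:9200}"
ES_CA="${ES_CA:-/etc/elasticsearch/certs/http_ca.crt}"

# GET a path from the cluster as the elastic user, trusting the cluster CA.
es_get() {
  curl --silent --show-error --cacert "$ES_CA" \
       -u "elastic:${ES_PASSWORD}" "${ES_URL}/$1"
}

# Extract the "status" value (green, yellow, or red) from health JSON on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage against a live cluster (not run here):
#   es_get _cluster/health | health_status
```

Keeping the parsing separate from the request makes it easy to reuse health_status in a cron job or monitoring check that alerts when the output is not green.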

“Yellow” Status

Symptom: All primary shards are active, but one or more replica shards are unassigned.
Cause: An index is configured to keep 1 replica, but the cluster has only 1 node. Elasticsearch never assigns a replica to the same node as its primary.
Fix (for single-node testing): Set replicas to 0.

PUT /*/_settings
{
  "index" : {
    "number_of_replicas" : 0
  }
}

“Red” Status

Symptom: One or more primary shards are unassigned, so some data is unavailable.
Cause: A node crashed, disks are full, or indices are corrupt.
Fix: Run GET _cluster/allocation/explain to see why Elasticsearch cannot assign the shard.
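
Before asking for an allocation explanation, it helps to know which shards are unassigned. The sketch below filters the text output of the _cat/shards API; unassigned_shards is a hypothetical helper name, and it assumes the columns were requested in the order index, shard, prirep, state, unassigned.reason.

```shell
# Keep only UNASSIGNED rows from _cat/shards output read on stdin.
# Prints: index, shard number, p (primary) or r (replica), and the reason.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# Usage against a live cluster (not run here):
#   curl --silent --cacert /etc/elasticsearch/certs/http_ca.crt \
#     -u elastic:YOUR_PASSWORD \
#     "https://192.168.1.50:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
#     | unassigned_shards
```

Rows marked p are unassigned primaries, which are what turn the cluster red. Rows marked r only make it yellow.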

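Disk Usage Exception (Flood-Stage Watermark)

Error: cluster_block_exception [FORBIDDEN/12/index read-only / allow delete (api)]
Cause: Disk usage on a node crossed the flood-stage watermark (95% by default). Elasticsearch marks every index with a shard on that node read-only (deletes are still allowed) so the node does not run out of disk entirely.
Fix:

  1. Free up disk space (delete old indices).
  2. Elasticsearch 7.4 and later lifts the block automatically once usage drops back below the high watermark (90% by default). If the block persists, remove it manually: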
PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
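
The 95% threshold is the last of three default disk watermarks: low (85%), high (90%), and flood stage (95%). A minimal sketch for checking where a data disk stands, assuming those defaults have not been changed in the cluster settings; watermark_level is a hypothetical helper name.

```shell
# Classify a disk-usage percentage against the default watermarks.
# Assumes defaults: low 85%, high 90%, flood stage 95%.
watermark_level() {
  used=$1
  if [ "$used" -ge 95 ]; then echo "flood_stage"
  elif [ "$used" -ge 90 ]; then echo "high"
  elif [ "$used" -ge 85 ]; then echo "low"
  else echo "ok"
  fi
}

# Usage on a node (not run here; assumes GNU df and the default package data path):
#   used=$(df --output=pcent /var/lib/elasticsearch | tail -n 1 | tr -dc '0-9')
#   watermark_level "$used"
```

Alerting at the low level leaves time to delete or roll over indices before the read-only block is ever applied.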

Summary

  • Set JVM heap (-Xms and -Xmx) to identical values, no more than 50% of system RAM and never above 31 GB.
  • Always use Index Lifecycle Management (ILM) to keep log indices from filling up your disks.
  • Diagnose Yellow/Red cluster states rapidly using the _cluster/allocation/explain API endpoint.