What is Elasticsearch and why is it used for log analysis?

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It is the heart of the Elastic Stack (ELK). It is used for log analysis because it can ingest, search, and parse massive volumes of unstructured log data in near real-time, making it invaluable for DevOps troubleshooting and security monitoring.

Why does my Elasticsearch cluster constantly run out of memory (OOM killer)?

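Elasticsearch requires significant RAM. By default, recent versions (7.11 and later) size the JVM heap automatically based on the node's roles and total memory.

Two different failures get called "out of memory":
- If the heap is too small for the workload, the JVM throws OutOfMemoryError and the node crashes.
- If the heap plus Elasticsearch's off-heap memory leaves too little RAM for the operating system, the Linux OOM killer terminates the process.

A heap that is too large also hurts. Above roughly 31GB (the exact cutoff depends on the JVM), the JVM loses compressed object pointers, which degrades performance. If you set the heap manually, always set Xms and Xmx to the same value, no more than 50% of total RAM and no more than 31GB.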
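The sketch below checks an existing heap configuration against these rules. It assumes the jvm.options format shown later in this guide: one `-Xms` or `-Xmx` flag per line, with a `g`, `m` or `k` suffix. `parse_heap` and `check_heap_options` are hypothetical helper names. The 31GB ceiling is the conservative limit this guide uses, not an exact JVM constant.

```python
import re

# Conservative compressed-oops ceiling used in this guide (assumption, not a JVM constant).
COMPRESSED_OOPS_LIMIT_BYTES = 31 * 1024**3

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_heap(value: str) -> int:
    """Convert a JVM size such as '4g' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", value.strip().lower())
    if not match:
        raise ValueError(f"unrecognized heap size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit, 1)


def check_heap_options(lines: list[str], total_ram_bytes: int) -> list[str]:
    """Return the problems found in jvm.options-style lines (empty list = OK)."""
    settings = {}
    for line in lines:
        line = line.strip()
        for flag in ("-Xms", "-Xmx"):
            # Comment lines start with '#', so they never match a flag.
            if line.startswith(flag):
                settings[flag] = parse_heap(line[len(flag):])
    if "-Xms" not in settings or "-Xmx" not in settings:
        return ["both -Xms and -Xmx must be set"]
    problems = []
    if settings["-Xms"] != settings["-Xmx"]:
        problems.append("-Xms and -Xmx differ")
    heap = settings["-Xmx"]
    if heap > total_ram_bytes // 2:
        problems.append("heap exceeds 50% of total RAM")
    if heap > COMPRESSED_OOPS_LIMIT_BYTES:
        problems.append("heap exceeds 31g; compressed object pointers may be lost")
    return problems
```

For example, `check_heap_options(["-Xms4g", "-Xmx4g"], 8 * 1024**3)` returns an empty list. That matches the 8 GB minimum in the prerequisites.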

What is 'yellow' or 'red' cluster status and how do I fix it?

Green means all primary and replica shards are allocated. Yellow means all primaries are allocated, but at least one replica is unassigned (common on single-node clusters). Red means at least one primary shard is unassigned. The data in that shard is unavailable, and it is lost if no copy can be recovered. Run `GET _cluster/allocation/explain` to diagnose why shards are unassigned.
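The status logic above can be sketched as a small function over the parsed JSON of `GET _cluster/health`. It reads only the `status`, `number_of_nodes` and `unassigned_shards` fields of that response. The function name and message wording are illustrative and not part of any Elasticsearch client.

```python
def diagnose_health(health: dict) -> str:
    """Summarize a parsed GET _cluster/health response in one line."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    nodes = health.get("number_of_nodes", 0)
    if status == "green":
        return "green: all primary and replica shards are allocated"
    if status == "yellow":
        # Primaries are all allocated, so every unassigned shard is a replica.
        hint = " (single node: replicas can never be assigned)" if nodes == 1 else ""
        return f"yellow: {unassigned} replica shard(s) unassigned{hint}"
    if status == "red":
        return (
            f"red: {unassigned} shard(s) unassigned, including at least one primary; "
            "run GET _cluster/allocation/explain"
        )
    raise ValueError(f"unexpected cluster status: {status!r}")
```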

Elasticsearch Setup and Centralized Log Analysis Guide

TL;DR — Quick Summary

Configure Elasticsearch for robust centralized log analysis, troubleshoot node failures, fix memory issues, and manage the index lifecycle.

Centralized Logging with Elasticsearch

In modern infrastructure, applications are distributed across dozens of microservices, containers, and servers. Troubleshooting by SSHing into individual machines and grepping log files is no longer viable.

Elasticsearch is part of the ELK stack, alongside Logstash and Kibana. Logstash (or Beats) collects and ships the logs, and Elasticsearch indexes them centrally. You can then search across all of them in near real time, typically through Kibana.

Prerequisites

A Linux server (Ubuntu 22.04 or RHEL 9 recommended).
At least 8 GB RAM (Elasticsearch is resource-heavy).
OpenJDK 17+ (bundled with the Elasticsearch packages, so a separate install is usually unnecessary).
Root or sudo privileges.

Step-by-Step Installation

1. Add Repository and Install (Ubuntu/Debian)

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch -y

Note: The installer automatically generates an elastic superuser password and TLS certificates. Save this output!

2. Configure JVM Heap Size

Elasticsearch uses the Java Virtual Machine (JVM). Memory configuration is critical.

Edit /etc/elasticsearch/jvm.options.d/heap.options:

# Set both to the same value: at most 50% of total RAM and no more than 31g.
-Xms4g
-Xmx4g

3. Configure the Cluster

Edit /etc/elasticsearch/elasticsearch.yml:

cluster.name: prod-logs-cluster
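# 8.x auto-configuration adds cluster.initial_master_nodes with the host name; keep it in sync with node.name
node.name: es-node-01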
network.host: 192.168.1.50   # Use 0.0.0.0 for all interfaces, 127.0.0.1 for local testing
http.port: 9200

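# Security (auto-configured in 8.x): the installer already appends these settings,
# including xpack.security.http.ssl, in an auto-configuration block at the end of
# this file. Don't declare them a second time; duplicate keys stop Elasticsearch from starting.
# xpack.security.enabled: true
# xpack.security.http.ssl.enabled: true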

4. Start the Service

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

Verify it’s running:

curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic:YOUR_PASSWORD https://192.168.1.50:9200
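If you prefer to script this check, the sketch below is a rough Python standard-library equivalent of the curl command. It assumes the same host, CA certificate path and `elastic` user. `build_request` and `check_node` are hypothetical helpers, and reading the CA file may require root privileges.

```python
import base64
import json
import ssl
import urllib.request

# Assumed values, copied from the curl command above; adjust for your node.
ES_URL = "https://192.168.1.50:9200"
CA_CERT = "/etc/elasticsearch/certs/http_ca.crt"


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """GET request with HTTP Basic auth, equivalent to curl's -u user:password."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_node(url: str, user: str, password: str, cafile: str) -> dict:
    """Call the root endpoint, verifying TLS against the cluster's CA (like --cacert)."""
    context = ssl.create_default_context(cafile=cafile)
    request = build_request(url, user, password)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)
```

After `import getpass`, you can call `check_node(ES_URL, "elastic", getpass.getpass(), CA_CERT)`. This keeps the password out of your shell history. The returned JSON includes `cluster_name` and `version.number`.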

Managing Log Data (Index Lifecycle Management)

Log volume only grows. If you don't manage it, the disks fill up. Once a node crosses the flood-stage disk watermark (95% by default), Elasticsearch makes the indices on that node read-only and stops accepting new logs.

Index Lifecycle Management (ILM) automates this by rolling indices over and deleting old ones according to a policy.

Create an ILM Policy (via cURL or Kibana Dev Tools)

This policy rolls the write index over to a new index when it reaches 50GB or 30 days old, whichever comes first. It then permanently deletes each index 90 days after it was rolled over.

PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Troubleshooting Cluster Health

Check health: GET _cluster/health

"Yellow" Status

Symptom: Primary shards are active, but replica shards are unassigned.
Cause: An index is configured to keep 1 replica, but the cluster has only 1 node. Elasticsearch never assigns a replica to the same node as its primary.
Fix (for single-node testing): Set replicas to 0. This affects existing indices only. New indices still take their replica count from their index template, or from the default of 1.

PUT /*/_settings
{
  "index" : {
    "number_of_replicas" : 0
  }
}
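Why replicas stay unassigned comes down to simple arithmetic. Each copy of a shard needs its own node, so a shard with R replicas needs R + 1 nodes. The sketch below counts what cannot be placed. The helper names are hypothetical, and the model ignores other allocation rules such as disk watermarks and allocation filtering.

```python
def unassignable_replicas_per_shard(number_of_nodes: int, number_of_replicas: int) -> int:
    """Replicas of one shard that can never be placed.

    Every copy of a shard (the primary plus each replica) must sit on a
    different node, so at most number_of_nodes - 1 replicas can be assigned.
    """
    if number_of_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return max(0, number_of_replicas - (number_of_nodes - 1))


def expected_unassigned_replicas(number_of_nodes: int, number_of_replicas: int,
                                 primary_shards: int) -> int:
    """Total unassignable replicas for an index with `primary_shards` primaries."""
    return primary_shards * unassignable_replicas_per_shard(number_of_nodes, number_of_replicas)
```

With one node and one replica, every shard has one unassignable replica. That is why setting `number_of_replicas` to 0 moves a single-node cluster from yellow to green.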

"Red" Status

Symptom: Primary shards are unassigned, so the data in them cannot be searched or written.
Cause: A node crashed or left the cluster, disks are full, or index data is corrupt.
Fix: Run GET _cluster/allocation/explain to see exactly why Elasticsearch is refusing to assign the shard.
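The explain response can be long. The sketch below pulls the useful parts out of its JSON. The field names (`unassigned_info.reason`, `allocate_explanation`, `node_allocation_decisions[].deciders`) come from the allocation explain API's response. The function name and output format are hypothetical.

Called without a body, the API explains an arbitrary unassigned shard. To target a specific shard, pass `index`, `shard` and `primary` in the request body.

```python
def summarize_allocation_explain(resp: dict) -> list[str]:
    """Condense a parsed GET _cluster/allocation/explain response into short lines."""
    kind = "primary" if resp.get("primary") else "replica"
    lines = [f"{resp.get('index')}[{resp.get('shard')}] ({kind}) is {resp.get('current_state')}"]
    reason = resp.get("unassigned_info", {}).get("reason")
    if reason:
        lines.append(f"unassigned reason: {reason}")
    if "allocate_explanation" in resp:
        lines.append(resp["allocate_explanation"])
    # Keep only the deciders that said NO: these are the blocking rules per node.
    for node in resp.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                lines.append(
                    f"{node.get('node_name')}: [{decider.get('decider')}] {decider.get('explanation')}"
                )
    return lines
```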

Disk Usage Exception (Flood-Stage Watermark)

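Error: cluster_block_exception [FORBIDDEN/12/index read-only / allow delete (api)]
Cause: Disk usage on a node crossed the flood-stage watermark (95% by default). Elasticsearch then applies a read-only (allow delete) block to every index with a shard on that node, so the disk cannot fill up completely.
Fix: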

Free up disk space (delete old indices).
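Manually unlock the indices. On Elasticsearch 7.4 and later, the block is released automatically once disk usage falls below the high watermark, so this step is mainly needed on older versions: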

PUT _all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
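Before and after freeing space, you can check how close the data path is to each watermark. The sketch assumes the default thresholds: 85% low, 90% high and 95% flood stage, configurable through `cluster.routing.allocation.disk.watermark.*`. It also assumes the default package data path `/var/lib/elasticsearch`. The helper names are hypothetical. Elasticsearch's own disk figures may differ slightly from what the OS reports, and newer versions also cap the headroom on very large disks.

```python
import shutil

# Default percentage watermarks (cluster.routing.allocation.disk.watermark.*).
# Assumption: defaults are in use; max_headroom caps on large disks are ignored.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def watermark_level(used_percent: float) -> str:
    """Classify a disk-usage percentage against the default watermarks."""
    if used_percent > FLOOD_STAGE:
        return "flood_stage: indices with shards here get a read-only/allow-delete block"
    if used_percent > HIGH:
        return "high: Elasticsearch tries to relocate shards away from this node"
    if used_percent > LOW:
        return "low: no new shards are allocated here (except primaries of new indices)"
    return "ok"


def data_path_usage(path: str = "/var/lib/elasticsearch") -> float:
    """Percentage of the filesystem holding `path` that is not available for new data."""
    usage = shutil.disk_usage(path)
    return 100.0 * (usage.total - usage.free) / usage.total
```

For a quick check on the node, run `print(watermark_level(data_path_usage()))`.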

Summary

Set JVM heap (-Xms and -Xmx) to identical values, at most 50% of system RAM and no more than 31GB.
Always use Index Lifecycle Management (ILM) to prevent log indices from filling up your disks.
Diagnose Yellow/Red cluster states rapidly using the _cluster/allocation/explain API endpoint.