As a senior software developer with 3.5 years of experience, I’ve worked extensively with Elasticsearch and Golang, focusing on optimizing performance for large-scale applications. In this article, we’ll explore strategies to improve Elasticsearch indexing performance using Golang, particularly with Elasticsearch version 8 and above.

Introduction

Elasticsearch is a powerful search engine widely used for its full-text search capabilities and real-time analytics. However, indexing large volumes of data efficiently can be challenging. This article will focus on optimizing indexing performance using Golang, including setting up the Elasticsearch client, utilizing the bulk API, and implementing concurrency with Goroutines.

Setting Up Elasticsearch with Golang

First, let’s set up Elasticsearch and the Go client.

Installing Elasticsearch

Download and install Elasticsearch from the official website or using a package manager. Ensure you have Elasticsearch version 8 or above.

Installing Go Client for Elasticsearch

We’ll use the official Go client for Elasticsearch.

go get github.com/elastic/go-elasticsearch/v8

Configuring the Client

Create a Go file that configures the Elasticsearch client and checks the connection.

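package main

import (
	"log"

	"github.com/elastic/go-elasticsearch/v8"
)

func main() {
	// Elasticsearch 8 enables TLS and authentication by default. This example
	// assumes a local node reachable over plain HTTP; otherwise use an https
	// address and set Username/Password (or APIKey) and CACert on the Config.
	cfg := elasticsearch.Config{
		Addresses: []string{
			"http://localhost:9200",
		},
	}

	es, err := elasticsearch.NewClient(cfg)
	if err != nil {
		log.Fatalf("Error creating the client: %s", err)
	}

	// Test the connection
	res, err := es.Info()
	if err != nil {
		log.Fatalf("Error getting response: %s", err)
	}
	defer res.Body.Close()

	log.Println(res)
}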

Efficient Indexing with the Bulk API

Indexing documents one by one is inefficient, especially with large datasets, because every document pays for its own HTTP round trip. The bulk API lets you index many documents in a single request, which removes most of that per-request overhead.
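The bulk request body is newline-delimited JSON: each document contributes an action line followed by a source line, and every line, including the last one, must end with a newline. The following is a minimal sketch of building such a body with the standard library only. The helper name buildBulkBody is an assumption made for illustration and is not part of the Elasticsearch client.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// Document mirrors the example type used in this article.
type Document struct {
	Title   string `json:"title"`
	Content string `json:"content"`
}

// buildBulkBody (a hypothetical helper) renders docs as the NDJSON payload
// expected by the bulk API: one action line and one source line per
// document, each terminated by a newline.
func buildBulkBody(index string, docs []Document) ([]byte, error) {
	// Marshal the action line once; json.Marshal escapes the index name.
	meta, err := json.Marshal(map[string]map[string]string{
		"index": {"_index": index},
	})
	if err != nil {
		return nil, err
	}

	var buf bytes.Buffer
	for _, doc := range docs {
		src, err := json.Marshal(doc)
		if err != nil {
			return nil, err
		}
		buf.Write(meta)
		buf.WriteByte('\n')
		buf.Write(src)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}
```

The returned bytes can be wrapped in bytes.NewReader and passed to es.Bulk, the same call used in the examples in this article.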

Implementing Bulk Indexing

Here’s how you can implement bulk indexing in Golang.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"

	"github.com/elastic/go-elasticsearch/v8"
)

type Document struct {
	Title   string `json:"title"`
	Content string `json:"content"`
}

func main() {
	cfg := elasticsearch.Config{
		Addresses: []string{
			"http://localhost:9200",
		},
	}
	es, err := elasticsearch.NewClient(cfg)
	if err != nil {
		log.Fatalf("Error creating the client: %s", err)
	}

	docs := []Document{
		{Title: "Document 1", Content: "This is the content of document 1"},
		{Title: "Document 2", Content: "This is the content of document 2"},
		// Add more documents as needed
	}

	// Build the newline-delimited body: one action line and one source
	// line per document, each ending with "\n".
	var buf bytes.Buffer
	for _, doc := range docs {
		meta := []byte(fmt.Sprintf(`{ "index" : { "_index" : "my-index" } }%s`, "\n"))
		data, err := json.Marshal(doc)
		if err != nil {
			log.Fatalf("Error marshaling document: %s", err)
		}
		data = append(data, "\n"...)
		buf.Grow(len(meta) + len(data))
		buf.Write(meta)
		buf.Write(data)
	}

	res, err := es.Bulk(bytes.NewReader(buf.Bytes()))
	if err != nil {
		log.Fatalf("Error getting response: %s", err)
	}
	defer res.Body.Close()

	// IsError only reflects the HTTP status of the request as a whole.
	if res.IsError() {
		log.Fatalf("Error indexing documents: %s", res.String())
	}

	// A successful bulk request can still contain per-document failures,
	// which are signalled by the top-level "errors" flag in the response.
	var result struct {
		Errors bool `json:"errors"`
	}
	if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
		log.Fatalf("Error parsing the response body: %s", err)
	}
	if result.Errors {
		log.Fatal("Some documents failed to index")
	}

	log.Println("Documents indexed successfully")
}
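A bulk request can return a successful HTTP status even when individual documents were rejected, for example because of a mapping conflict. The response body reports this through a top-level errors flag and a per-item error object. The sketch below shows one way to collect those failures using only the standard library. The helper name failedItems and the message format are assumptions for illustration, and the struct models only the response fields it needs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkResponse models only the parts of the bulk API response used here.
type bulkResponse struct {
	Errors bool `json:"errors"`
	Items  []map[string]struct {
		ID     string `json:"_id"`
		Status int    `json:"status"`
		Error  *struct {
			Type   string `json:"type"`
			Reason string `json:"reason"`
		} `json:"error"`
	} `json:"items"`
}

// failedItems (a hypothetical helper) parses a bulk response body and
// returns one message per document that Elasticsearch rejected.
func failedItems(body []byte) ([]string, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding bulk response: %w", err)
	}
	if !resp.Errors {
		return nil, nil
	}

	var failures []string
	for i, item := range resp.Items {
		// Each item is keyed by its action name, such as "index" or "create".
		for action, result := range item {
			if result.Error != nil {
				failures = append(failures, fmt.Sprintf("item %d (%s): status %d: %s: %s",
					i, action, result.Status, result.Error.Type, result.Error.Reason))
			}
		}
	}
	return failures, nil
}
```

To use it with the client, read res.Body fully (for instance with io.ReadAll) and pass the bytes to failedItems. The item index in each message matches the position of the document in the request, so failed documents can be retried or logged individually.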

Using Goroutines for Parallel Indexing

To further enhance performance, we can leverage Goroutines to index documents concurrently.

Implementing Concurrency

Here’s how to use Goroutines for parallel indexing.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"sync"

	"github.com/elastic/go-elasticsearch/v8"
)

type Document struct {
	Title   string `json:"title"`
	Content string `json:"content"`
}

// bulkIndex sends one chunk of documents in a single bulk request.
// It logs failures and returns instead of calling log.Fatalf, which would
// terminate the whole process while other Goroutines are still indexing.
func bulkIndex(es *elasticsearch.Client, docs []Document, wg *sync.WaitGroup) {
	defer wg.Done()

	var buf bytes.Buffer
	for _, doc := range docs {
		meta := []byte(fmt.Sprintf(`{ "index" : { "_index" : "my-index" } }%s`, "\n"))
		data, err := json.Marshal(doc)
		if err != nil {
			log.Printf("Error marshaling document: %s", err)
			return
		}
		data = append(data, "\n"...)
		buf.Grow(len(meta) + len(data))
		buf.Write(meta)
		buf.Write(data)
	}

	res, err := es.Bulk(bytes.NewReader(buf.Bytes()))
	if err != nil {
		log.Printf("Error getting response: %s", err)
		return
	}
	defer res.Body.Close()

	// IsError only reflects the HTTP status of the request as a whole.
	if res.IsError() {
		log.Printf("Error indexing documents: %s", res.String())
		return
	}

	// Per-document failures are signalled by the "errors" flag in the body.
	var result struct {
		Errors bool `json:"errors"`
	}
	if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
		log.Printf("Error parsing the response body: %s", err)
		return
	}
	if result.Errors {
		log.Printf("Some documents in a chunk of %d failed to index", len(docs))
		return
	}

	log.Printf("Indexed a chunk of %d documents", len(docs))
}

func main() {
	cfg := elasticsearch.Config{
		Addresses: []string{
			"http://localhost:9200",
		},
	}
	es, err := elasticsearch.NewClient(cfg)
	if err != nil {
		log.Fatalf("Error creating the client: %s", err)
	}

	docs := []Document{
		{Title: "Document 1", Content: "This is the content of document 1"},
		{Title: "Document 2", Content: "This is the content of document 2"},
		// Add more documents as needed
	}

	// Split the documents into chunks and index each chunk in its own Goroutine.
	var wg sync.WaitGroup
	chunkSize := 10
	for i := 0; i < len(docs); i += chunkSize {
		end := i + chunkSize
		if end > len(docs) {
			end = len(docs)
		}

		wg.Add(1)
		go bulkIndex(es, docs[i:end], &wg)
	}

	wg.Wait()
	log.Println("All bulk requests completed")
}
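Starting one Goroutine per chunk is fine for a handful of chunks, but with a large dataset it can open far more simultaneous bulk requests than the cluster can absorb. A common alternative is a fixed pool of workers that pull chunks from a channel. The sketch below illustrates that pattern with the standard library only and assumes Go 1.20 or later (generics and errors.Join). The name indexInChunks and the send callback are assumptions for illustration; send stands in for whatever performs a single bulk request.

```go
package main

import (
	"errors"
	"sync"
)

// indexInChunks (a hypothetical helper) splits docs into chunks of chunkSize
// and hands them to a fixed pool of workers. send performs one bulk request
// for one chunk, for example a wrapper around es.Bulk.
func indexInChunks[T any](docs []T, chunkSize, workers int, send func([]T) error) error {
	if chunkSize < 1 || workers < 1 {
		return errors.New("chunkSize and workers must be positive")
	}

	chunks := make(chan []T)
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)

	// Start a fixed number of workers so that concurrency stays bounded
	// no matter how many chunks there are.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range chunks {
				if err := send(chunk); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}
		}()
	}

	// Feed the chunks to the workers, then close the channel so they exit.
	for i := 0; i < len(docs); i += chunkSize {
		end := i + chunkSize
		if end > len(docs) {
			end = len(docs)
		}
		chunks <- docs[i:end]
	}
	close(chunks)
	wg.Wait()

	// errors.Join returns nil when no worker reported an error.
	return errors.Join(errs...)
}
```

With this shape, the number of workers becomes a tuning knob that is independent of the chunk size, and failures are returned to the caller instead of ending the process from inside a Goroutine.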

Best Practices and Performance Tips

Use Bulk API: Always use the bulk API for indexing large datasets to minimize the overhead of individual requests.

Tune Batch Size: Experiment with different batch sizes to find the optimal balance between request size and processing time.

Monitor Cluster Health: Regularly monitor Elasticsearch cluster health and node performance to identify bottlenecks.

Optimize Mapping: Define explicit mappings up front to avoid dynamic mapping updates, which can be costly.

Use Concurrency: Leverage Goroutines for concurrent indexing to make better use of CPU and network resources.

Adjust Refresh Interval: Set a longer refresh interval during bulk indexing to reduce the overhead of frequent refreshes, each of which writes a new segment that later has to be merged. For example, in the index settings:

{
  "index": {
    "refresh_interval": "30s"
  }
}
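The refresh interval shown above is a dynamic index setting, so it can be changed on an existing index through the update index settings API (PUT /<index>/_settings). The following is a minimal sketch that builds such a request with net/http rather than the Elasticsearch client. The helper name refreshIntervalRequest, the base URL, and the index name are assumptions for illustration, and authentication is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/url"
)

// refreshIntervalRequest (a hypothetical helper) builds a request for the
// update index settings API that sets index.refresh_interval.
func refreshIntervalRequest(baseURL, index, interval string) (*http.Request, error) {
	// Produces a body such as {"index":{"refresh_interval":"30s"}}.
	body, err := json.Marshal(map[string]map[string]string{
		"index": {"refresh_interval": interval},
	})
	if err != nil {
		return nil, err
	}

	endpoint := baseURL + "/" + url.PathEscape(index) + "/_settings"
	req, err := http.NewRequest(http.MethodPut, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

A typical pattern is to send this request (for example with http.DefaultClient.Do) before a large load with a longer interval such as "30s", and to send it again afterwards with the value the index normally uses, so that newly indexed documents become searchable promptly once the load is done.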

Conclusion

Improving Elasticsearch indexing performance with Golang involves using the bulk API, implementing concurrency with Goroutines, and following best practices for cluster and index management. By adopting these strategies, you can achieve significant performance gains and handle large-scale indexing efficiently.

Remember, the key to optimization is continuous monitoring and fine-tuning based on your specific use case and data characteristics. With the right approach, Elasticsearch and Golang can form a powerful combination for high-performance search and indexing applications.

Improving Elasticsearch Indexing Performance with Golang was originally published in Level Up Coding on Medium.