Have you ever wondered what happens to ongoing requests when you deploy a new version of a service and it gets restarted? Do these operations finish correctly, or are they abruptly terminated with an exception? If you’re unsure, it’s likely the latter. To ensure that your ongoing operations are complete before the service terminates, you need to design your system for graceful shutdown. This means giving the system some time to finish ongoing operations before it shuts down. In this post, I will explain why graceful shutdown is important in modern software development and how to implement it for a Filesystem Listener Service written in Go.

The idea of a graceful shutdown

A graceful shutdown is the process of shutting down a system, application, or service in a manner that allows it to complete its current tasks, release resources properly, and save its state, thereby minimizing disruption and data loss.

The idea of a graceful shutdown

Imagine a service that does not care about graceful shutdown.

Diagram for a service that does not follow a graceful shutdown approach

In the above diagram, we can see, that the termination signal (e.g., SIGTERM, SIGINT) interrupts all ongoing operations, and the information about them is lost.

How can we improve it?

Diagram for a service that follows a graceful shutdown approach

As soon as the termination signal is received, we give in-progress operations some time to complete (we also have to stop accepting new tasks). And after that grace period, we close the system. However, preparing a service for graceful shutdown does require a shift in design thinking:

Tracking Ongoing Operations: You need a mechanism to track and monitor ongoing operations. This might involve using counters, flags, or other indicators that can be checked to determine if any processes are still running.Signal Handling: Your application should be able to handle termination signals (e.g., SIGTERM, SIGINT) and respond appropriately.Waiting for Completion: Once a shutdown signal is received, the system needs to allow time for ongoing operations to complete.Resource Cleanup: Ensure that all resources (e.g., database connections, and file handles) are properly closed and cleaned up during shutdown.

Why it matters

The reason we should care about graceful shutdown is that we no longer treat our servers as pets, but rather as cattle. This means that servers can be killed or restarted at any time if needed. Traditionally, servers were treated like pets. Each server was unique, carefully maintained, named, and manually configured. If a server had an issue, it was individually tended to and fixed. In contrast, the modern approach treats servers like cattle. Each server is identical and can be quickly replaced if it fails. They are managed as a group rather than individually.

The cattle approach makes autoscaling possible, a popular method in containers’ orchestrators (like Kubernetes) where the number of service replicas adjusts based on demand. This approach allows replicas to be easily started or destroyed as needed.

We want to deliver new system versions in the DevOps culture organizations quite frequently. Continuous deployment practices aim for zero downtime during updates or scaling operations. Graceful shutdowns are essential to achieving this by allowing services to withdraw gracefully, thus preventing abrupt disconnections and errors.

How to implement it

Filesystem Listner Service

As an example of a graceful shutdown implementation, we present a simple File Listener Service written in Golong that watches for events (we employ fsnotify library) from the filesystem. When the file is created, modified, or deleted, an event is handled by the service.

package main

import (
“context”
“fmt”
“os”
“os/signal”
“strings”
“sync”
“syscall”
“time”
“github.com/fsnotify/fsnotify”
“github.com/rs/zerolog”
)

func main() {
logger, err := configureLogger(“INFO”)
if err != nil {
logger.Fatal().Err(err).Msg(“Cannot configure logger”)
}
ctx := logger.WithContext(context.Background())
watcher, err := fsnotify.NewWatcher()
if err != nil {
logger.Fatal().Err(err).Msg(“Cannot create filesystem watcher”)
}
listener := NewFileListener(watcher, FileHandler{})
go listener.Start(ctx, “./”)
<-make(chan int)
}

type FileHandler struct{}

type FileListener struct {
watcher *fsnotify.Watcher
handler FileHandler
ongoindHandlers sync.WaitGroup
}

func (dfh *FileHandler) Handle(ctx context.Context, event fsnotify.Event) {
// Do something with the event
time.Sleep(15 * time.Second)
}

func NewFileListener(
watcher *fsnotify.Watcher, handler FileHandler,
) *FileListener {
return &FileListener{
watcher: watcher,
handler: handler,
}
}

// Start trigger FilesystemListener to listen on file events within sourcePath directory indefinetely
func (fl *FileListener) Start(ctx context.Context, sourcePath string) error {
logger := zerolog.Ctx(ctx)
logger.Info().Str(“sourcePath”, sourcePath).Msg(“Filesystem listener started …”)
err := fl.watcher.Add(sourcePath)
if err != nil {
return fmt.Errorf(“cannot add %s path to filesystem listener: %w”, sourcePath, err)
}
go fl.listen(ctx)
return nil
}

func (fl *FileListener) listen(ctx context.Context) {
logger := zerolog.Ctx(ctx)
for {
select {
case event, ok := <-fl.watcher.Events:
if !ok {
logger.Info().Msg(“File listener stopped listening”)
return
}
go fl.handle(ctx, event)
case err, ok := <-fl.watcher.Errors:
if !ok {
logger.Info().Msg(“File listener stopped listening”)
return
}
logger.Error().Err(err).Msg(“File listener event error”)
}
}
}

func (fl *FileListener) handle(ctx context.Context, event fsnotify.Event) {
logger := zerolog.Ctx(ctx)
logger.Info().Str(“event”, event.String()).Msg(“File event received”)

fl.ongoindHandlers.Add(1)
defer fl.ongoindHandlers.Done()

gfl.handler.Handle(ctx, event)
logger.Info().Str(“event”, event.String()).Msg(“File event handled”)
}

We imitate long handling by time.Sleep:

func (dfh *FileHandler) Handle(ctx context.Context, event fsnotify.Event) {
// Do something with the event
time.Sleep(15 * time.Second)
}

The listener is initialized to watch for the directory we executed the service:

go listener.Start(ctx, “./”)

What happens when we have some ongoing task (handling created p.txt file), but suddenly stop the service (using ctrl+c).

{“time”:”2024-05-19T10:19:59Z”,”message”:”Filesystem listener started …”}
{“message”:”File event received”, “event”:”CREATE “p.txt””,”time”:”2024-05-19T10:20:03Z”,}
^Csignal: interrupt

The service was closed immediately and handling was interrupted.

How can we improve it using a graceful shutdown approach?

// Stop listening on events
func (fl *FileListener) Stop(ctx context.Context) error {
err := fl.watcher.Close()
logger := zerolog.Ctx(ctx)
logger.Info().Msg(“Filesystem listener closed”)
logger.Info().Msg(“Waiting for ongoing handlers to finish …”)
fl.ongoindHandlers.Wait()
logger.Info().Msg(“All ongoing handlers finished”)
return err
}

func main() {

go listener.Start(ctx, “./”)

//graceful shutdown
terminateSignal := make(chan os.Signal, 1)
signal.Notify(terminateSignal, os.Interrupt, syscall.SIGTERM, syscall.SIGINT, syscall.SIGHUP)
<-terminateSignal
logger.Info().Msg(“Received termination signal. Shutting down …”)
err = listener.Stop(ctx)
if err != nil {
logger.Fatal().Err(err).Msg(“Cannot stop listener”)
}
}

Termination signal information is passed to the terminatateSignal channel. When we get the message on the channel, we can perform a graceful shutdown using listener.Stop(ctx) method. Within this method, we stop waiting for new operations err := fl.watcher.Close() and also wait till all handlers finish ongoing tasks.

Now, we can check what happens when we interrupt the service while it is handling tasks:

{“time”:”2024-05-19T10:37:26Z”,”message”:”Filesystem listener started …”}
{“time”:”2024-05-19T10:37:31Z”,”message”:”File event received”, “event”:”CREATE “x.txt””}
^C
{“time”:”2024-05-19T10:37:34Z”,”message”:”Received termination signal. Shutting down …”}
{“time”:”2024-05-19T10:37:34Z”,”message”:”Filesystem listener closed”}
{“time”:”2024-05-19T10:37:34Z”,”message”:”Waiting for ongoing handlers to finish …”}
{“time”:”2024-05-19T10:37:34Z”,”message”:”File listener stopped listening”}
{“time”:”2024-05-19T10:37:46Z”,”message”:”File event handled”, “event”:”CREATE “x.txt””,}
{“time”:”2024-05-19T10:37:46Z”,”message”:”All ongoing handlers finished”}

The termination signal was received but did not force the service to close immediately. We have waited unit all ongoing operations were handled.

As we said before, we should give some time a service to complete the operation but we cannot wait indefinitely. To implement a grace period, we can use context with timeout:

gracePeriod := 15*time.Second
ctx, cancel := context.WithTimeout(ctx, gracePeriod)
defer cancel()
err = listener.Stop(ctx)

Conclusion

Graceful shutdown is a powerful technique that enhances the reliability of your system and ensures that continuous delivery works seamlessly. By properly handling termination signals, tracking ongoing operations, and allowing time for completion, you can maintain service quality even during frequent updates and restarts. Implementing graceful shutdown practices not only prevents data loss and service interruptions but also aligns with modern infrastructure approaches that treat servers as cattle, enabling efficient autoscaling and dynamic resource management.

The codebase covering the Filesystem Listener Service can be found here.

Graceful Shutdown: Why It Matters and How to Implement It was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.

​ Level Up Coding – Medium

about Infinite Loop Digital

We support businesses by identifying requirements and helping clients integrate AI seamlessly into their operations.

Gartner
Gartner Digital Workplace Summit Generative Al

GenAI sessions:

  • 4 Use Cases for Generative AI and ChatGPT in the Digital Workplace
  • How the Power of Generative AI Will Transform Knowledge Management
  • The Perils and Promises of Microsoft 365 Copilot
  • How to Be the Generative AI Champion Your CIO and Organization Need
  • How to Shift Organizational Culture Today to Embrace Generative AI Tomorrow
  • Mitigate the Risks of Generative AI by Enhancing Your Information Governance
  • Cultivate Essential Skills for Collaborating With Artificial Intelligence
  • Ask the Expert: Microsoft 365 Copilot
  • Generative AI Across Digital Workplace Markets
10 – 11 June 2024

London, U.K.