In many programming languages, using the assignment operator (=) to duplicate arrays or maps (or similar data structures) does not create a new, independent copy. Instead, it creates a new reference to the same underlying data. This can lead to unintended side effects when the original data is modified. In software terminology, we use the terms “deep copy” and “shallow copy” to describe different methods of duplicating objects:

A deep copy of an object creates a new object and recursively copies all objects found in the original. This means that the new object is a complete copy, with no shared references to the objects in the original.

A shallow copy of an object creates a new object but inserts references into it to the objects found in the original. Therefore, the new object is a copy of the original object’s structure but not of the objects that the structure references.
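To make the two definitions concrete, here is a minimal Go sketch. The helper names `shallowClone` and `deepClone` and the `map[string][]string` shape are assumptions chosen for illustration; they are not part of the examples later in this article. The map holds slices so that the difference between copying the structure and copying what it references becomes visible.

```go
package main

import "fmt"

// shallowClone builds a new map but reuses the slices stored in src,
// so both maps share the same underlying slice data.
func shallowClone(src map[string][]string) map[string][]string {
	dst := make(map[string][]string, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

// deepClone builds a new map and also copies every slice,
// so nothing is shared with src.
func deepClone(src map[string][]string) map[string][]string {
	dst := make(map[string][]string, len(src))
	for k, v := range src {
		dst[k] = append([]string(nil), v...)
	}
	return dst
}

func main() {
	original := map[string][]string{"Accept": {"text/html"}}

	shallow := shallowClone(original)
	deep := deepClone(original)

	// Overwrite an element of the slice held by the original map.
	original["Accept"][0] = "application/json"

	fmt.Println("shallow:", shallow["Accept"][0]) // shallow: application/json
	fmt.Println("deep:", deep["Accept"][0])       // deep: text/html
}
```

The shallow clone sees the change because it holds the same slice as the original, while the deep clone keeps the old value.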

In this short article, I would like to show how using the wrong copying approach could lead to unintended consequences in our programs.

Simple example

NOTE: We use Go to present it, but the same behavior can be seen in other popular languages like JavaScript, Python, etc.

What happens when we assign a map to another variable?

package main

import "fmt"

func main() {
	fmt.Println("SHALLOW COPY")
	headers := map[string]string{
		"Content-Type": "application/json",
	}

	// Assignment copies only the reference: fields and headers
	// now refer to the same underlying map.
	fields := headers

	fmt.Println("Fields:", fields)
	fmt.Println("Headers:", headers)
	fmt.Println("ADD AUTHORIZATION")
	fields["Authorization"] = "Bearer 123"
	fmt.Println("Fields:", fields)
	fmt.Println("Headers:", headers)
}

We initialized a headers map, assigned headers to fields, and then added Authorization to fields. Our intention was probably to copy the values from headers to fields and make changes that affect fields but not headers. However, we get the following output:

SHALLOW COPY
Fields: map[Content-Type:application/json]
Headers: map[Content-Type:application/json]
ADD AUTHORIZATION
Fields: map[Authorization:Bearer 123 Content-Type:application/json]
Headers: map[Authorization:Bearer 123 Content-Type:application/json]

By adding Authorization to fields, we modified not only fields but also headers. The reason is that a Go map variable holds a reference to the underlying data, so the assignment operator made only a shallow copy: it copied the reference to the map rather than the map's contents. Consequently, headers and fields point to the same object.

package main

import "fmt"

func main() {
	fmt.Println("DEEP COPY")
	headers := map[string]string{
		"Content-Type": "application/json",
	}
	fmt.Println("Headers:", headers)

	fields := deepCopy(headers)

	fmt.Println("ADD AUTHORIZATION")
	fields["Authorization"] = "Bearer 123"
	fmt.Println("Fields:", fields)
	fmt.Println("Headers:", headers)
}

// deepCopy returns a new map that contains the same entries as m
// but shares no storage with it.
func deepCopy(m map[string]string) map[string]string {
	newMap := make(map[string]string, len(m))
	for k, v := range m {
		newMap[k] = v
	}
	return newMap
}

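We replaced the assignment operator (shallow copy) with the deepCopy function (deep copy). Copying entry by entry is a full deep copy here because the keys and values are plain strings; a map that holds slices, pointers, or other maps would need those values copied recursively as well. Immediately after the copy, headers and fields hold the same values but are two different objects.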

DEEP COPY
Headers: map[Content-Type:application/json]
ADD AUTHORIZATION
Fields: map[Authorization:Bearer 123 Content-Type:application/json]
Headers: map[Content-Type:application/json]

headers is no longer bound to fields, so modifying fields does not affect headers.
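For a flat map like this one, the hand-written loop in deepCopy can also be expressed with the standard library. The sketch below assumes Go 1.21 or newer, where the `maps` package is available; the wrapper name `cloneHeaders` is hypothetical. Note that `maps.Clone` copies only the top level of the map, which is enough here because the values are strings.

```go
package main

import (
	"fmt"
	"maps"
)

// cloneHeaders returns an independent copy of a flat string map.
// maps.Clone copies only the top level; that is sufficient here
// because string values cannot be modified in place.
func cloneHeaders(h map[string]string) map[string]string {
	return maps.Clone(h)
}

func main() {
	headers := map[string]string{"Content-Type": "application/json"}

	fields := cloneHeaders(headers)
	fields["Authorization"] = "Bearer 123"

	fmt.Println("Fields:", fields)
	fmt.Println("Headers:", headers)
}
```

If the map values were themselves slices or maps, `maps.Clone` alone would leave those values shared, and a recursive copy would still be needed.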

Concurrent access

The example above was trivial. Things get tricky when an application handles many concurrent requests: errors become non-deterministic, and the cause of unexpected behavior is much harder to identify.

package main

import (
	"fmt"
	"math/rand/v2"
	"time"
)

func main() {
	fmt.Println("CONCURRENCY")
	headers := map[string]string{
		"Content-Type": "application/json",
	}
	c := client{
		headers: headers,
	}

	operations := 10
	for i := 0; i < operations; i++ {
		simulateLatency()
		go c.sendRequest(fmt.Sprintf("Bearer %d", i))
	}
	// Crude wait that gives all goroutines time to finish.
	time.Sleep(1 * time.Second)
}

type client struct {
	headers map[string]string
}

func (c *client) sendRequest(key string) {
	// Intentional bug: fields aliases c.headers, so every goroutine
	// writes to the same map. This is a data race.
	fields := c.headers
	fields["Authorization"] = key
	simulateLatency()
	fmt.Printf("key: %s, fields: %v\n", key, fields)
}

func simulateLatency() {
	// Sleep for a random duration between 0 and 4 ms.
	interval := rand.IntN(5)
	time.Sleep(time.Duration(interval) * time.Millisecond)
}

In this example, there is a small time gap between setting the Authorization field and printing the values (or sending a request over the network in a real system). Because fields is only a shallow copy of the headers map, another concurrent operation can overwrite the Authorization field during that gap, so the field no longer holds the expected value. Unsynchronized concurrent writes to a Go map are also a data race, and the runtime may abort the program with "fatal error: concurrent map writes" instead of printing wrong values.

For example, if operation 2 updates the Authorization key during that gap, operation 1 ends up sending the token that belongs to operation 2.

When we run the code, we get something like this:

CONCURRENCY
key: Bearer 0, fields: map[Authorization:Bearer 0 Content-Type:application/json]
key: Bearer 1, fields: map[Authorization:Bearer 1 Content-Type:application/json]
key: Bearer 2, fields: map[Authorization:Bearer 2 Content-Type:application/json]
key: Bearer 3, fields: map[Authorization:Bearer 4 Content-Type:application/json]
key: Bearer 5, fields: map[Authorization:Bearer 4 Content-Type:application/json]
key: Bearer 4, fields: map[Authorization:Bearer 4 Content-Type:application/json]
key: Bearer 6, fields: map[Authorization:Bearer 6 Content-Type:application/json]
key: Bearer 7, fields: map[Authorization:Bearer 7 Content-Type:application/json]
key: Bearer 8, fields: map[Authorization:Bearer 9 Content-Type:application/json]
key: Bearer 9, fields: map[Authorization:Bearer 9 Content-Type:application/json]

As expected, some of the requests carry an incorrect Authorization field (in this run, the requests with keys Bearer 3, Bearer 5, and Bearer 8), because headers is shared between concurrent operations.
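A mismatch such as the one for key Bearer 3 in the output above can be traced to a specific interleaving of two requests. The sketch below replays one such interleaving sequentially, without goroutines, so the result is deterministic. The function name `replay` and the labels "request A" and "request B" are hypothetical and serve only this illustration.

```go
package main

import "fmt"

// replay performs, in a fixed order, the steps that two concurrent
// requests might interleave, and returns what request A ends up sending.
func replay(shared map[string]string) string {
	fieldsA := shared // request A: aliases the shared map
	fieldsA["Authorization"] = "Bearer 3"

	fieldsB := shared // request B: aliases the same map
	fieldsB["Authorization"] = "Bearer 4"

	// Request A resumes after its latency and reads its "own" fields.
	return fieldsA["Authorization"]
}

func main() {
	headers := map[string]string{"Content-Type": "application/json"}
	fmt.Println("request A sends:", replay(headers)) // request A sends: Bearer 4
}
```

In the concurrent program the same steps happen in an order chosen by the scheduler and the simulated latency, which is why the mismatches differ from run to run.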

The solution here is simple: give each operation its own deep copy of headers instead of a shallow one. Although this seems straightforward when dealing with concurrent access to shared resources, even experienced developers can make mistakes. It is crucial to be extremely careful when duplicating mutable structures.
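One way the fix could look is sketched below, under a few assumptions: Go 1.21 or newer for `maps.Clone`, a hypothetical method `buildFields` in place of sendRequest, and a `sync.WaitGroup` instead of a fixed sleep. Each goroutine works on its own copy, and the shared headers map is only read after construction.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

type client struct {
	headers map[string]string // shared; treated as read-only after construction
}

// buildFields returns a per-request copy of the shared headers with the
// Authorization entry set, leaving c.headers untouched.
func (c *client) buildFields(key string) map[string]string {
	fields := maps.Clone(c.headers)
	fields["Authorization"] = key
	return fields
}

func main() {
	c := client{headers: map[string]string{"Content-Type": "application/json"}}

	const operations = 10
	results := make([]map[string]string, operations)

	var wg sync.WaitGroup
	for i := 0; i < operations; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot and its own map.
			results[i] = c.buildFields(fmt.Sprintf("Bearer %d", i))
		}(i)
	}
	wg.Wait()

	for i, fields := range results {
		fmt.Printf("key: Bearer %d, fields: %v\n", i, fields)
	}
	fmt.Println("shared headers:", c.headers)
}
```

With this change every printed key matches its Authorization value, and the shared headers map never gains an Authorization entry. Running the earlier version with Go's race detector (`go run -race`) is a practical way to catch this class of bug when it does not show up in the output.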

Conclusion

Failing to understand the difference between deep and shallow copying can lead to unintended side effects, especially in concurrent or multi-threaded environments where a shared state can be inadvertently modified. It is crucial to choose the correct type of copying based on the needs of your application to ensure data integrity and prevent subtle bugs.

How to Avoid a Common Pitfall in Concurrent Programming was originally published in Level Up Coding on Medium.