The missing Dictionary in Solidity
If you ever wanted to interact with smart contracts and build on top of the block-chain before you probably heard of Solidity. That’s the programing language behind the Ethereum smart contracts. Being a new language comes with missing some typical data-structures you got used to in other languages. In this post I’ll describe the journey of creating the an dictionary
data structure in Solidity.
Define: Dictionary
Coming from languages like javascript, ruby, java, c# and others, Dictionary
might represent different things usually known as hash
/map
/hashmap
/hashtable
etc… In this post I’ll define Dictionary
as a generic data structure that allows you to do these basic operations:
Set(key, value)
— store arbitrary value for a given key.
Get(key) -> value
— retrieve a value for a specific key.
Remove(key)
— remove a key.
Keys() -> [id1, id2, ...id-n]
— get a list of all keys. (The missing piece in Solidity).
The Current Problem
The easy part would be Set
and Get
as we can simply use the existed [mapping](http://solidity.readthedocs.io/en/develop/types.html#mappings)
data type for it. Think of mapping
as a hash table connecting a hashed key to a value. Hash tables will allow us to easily achieve O(1) efficiency for both Get
and Set
. The issue begins when we want to iterate over all the keys/values in the hash table.
In most of the modern languages we have a built-in .keys()
function allows us to easily retrieve all the keys of a map. In solidity we don’t have it:
Mappings are not iterable, but it is possible to implement a data structure on top of them. For an example, see iterable mapping. [source]
Ethereum offers a library that implement iterable mapping. This solution combine an array
together with mapping
so each time you add an item to the hash table you also add another item to the extra array. This structure has some few problems:
Data duplication — the Ids stored in both mapping and array. As you know, storing information on the blockchain is expensive so you want to minimize the footprint as much as possible.
Inefficient store— when you delete an item from the map, the array simply set it’s index to 0. assuming you stored 1M items and deleted them all, you’ll have an array with 1M indexes assigned to initial value.
Inefficient iteration & removal— since the array that tracks ids is in sequential order, getting the next item means looping through items that might be deleted so if for instance the indexes in the array between n
to n+x
deleted means you’ll have x
unneeded iteration operations that you wasted gas on just to get your next id.
Can we do better?
The question I love to ask in each job interview is relevant here more than ever — can we do better? Smart contract requires gas to work. Since gas priced by code efficiency and storage requirements, we have to come up with better, leaner, faster data structure.
The main problem on the solution above is choosing array as a data structure to store and track ids of the mapping. Can we find a more efficient way to keep all the data in one place (to avoid duplication) yet enforce connection between ids so we can iterate and output all keys?
First attempt
One data structure that might be a good start is a linked list. Linked list allows you easily track the next node so iteration will be very easy with no “gaps” between. Each node simply points on the next node.
linked-list nodes combined with hash-table. each list node represent hash key
Pros —no mode “gaps” or array copy in case of removing a node. as we’re not using an array we can easily delete an item just by changing the next
pointers.
Cons
- Efficiency — to get the next item of a given key, or to delete a node we will need to iterate over the linked-list items yielding O(N) for
remove
operation. - Data duplication — keys exists both in the hash table and linked list.
Final solution
To avoid data duplication of keys and to get better retrieval time what if we can combine linked-list with hash map as one data-structure? This new data structure will be represented as a hash-map where each value is a linked-list node pointing on the next node and also containing the data.
hash-table where each value is a linked-list node pointing on the next key
This data structure will combine the good of both linked-list and hash-table together.
Inserting or removing an item will require us to simply add/delete a key in the hash-table and adjust the node pointers — achieving O(1) performance (simple hash-map lookup).
Storing the nodes as part of the hash-map will also prevent a data-duplication among two different data structures as keys are saved only in one place (as keys of the mapping table) achieving less storage.
Future improvements
- Using double linked list to iterate over the array both ways.
- Storing a
size
variable to easily get the number of items. - Using hash table as a data-structure to host the double-linked-list nodes will allow us to add more operations like
insertBefore
,insertAfter
,insertBeginning
andinsertEnd
. - Guarantee keys order based on inserting strategy (append beginning/end).
- Extend the base node struct to allow different types as keys (like strings) and values.
Use it in your smart contract today
I created a repo of solidity utils (includes all the suggested improvements) so everyone can easily use it in their projects. To import the new dictionary data structure all you need to do is:
pragma solidity ^0.4.0;
// import the contract
import "github.com/sagivo/solidity-utils/contracts/lib/Dictionary.sol";
// have fun
contract Foo {
// declare and use new Dictionary structure
using Dictionary for Dictionary.Data;
Dictionary.Data private dic;
function Foo() public view returns (uint) {
dic.set(1, "value");
dic.set(2, "foo");
dic.set(123, "bar");
dic.set(1, "new value");
// get an item
dic.get(2); // => '0x666f6f' (byte hex of 'foo')
// get all keys
dic.keys(); // => [1, 2, 123]
}
}
That’s it. If you find it useful please give it a clap(s) or share with your friends ❤