I’m trying to figure out the best practice for what to put in storage vs. what to log in events (requiring users / the app client to run listeners). How big can storage get? What are the considerations?
Thx.
Persistent storage is held in a sparse Merkle tree with a depth of 256, so the theoretical maximum storage size is 2^256 32-byte words (although in practice, storage key collisions would occur with far fewer variables than that). This is a phenomenal amount of data; for all practical purposes, storage can be considered (potentially) infinitely large.
Storage is charged per use, as it is read from / written to, so there’s no inherent cost to having a lot of storage variables; but actually using those variables (reads and writes) is extremely expensive compared to other operations.
It’s generally good practice to only use storage where absolutely necessary. As you mention, logs can be used for exposing data. Contracts can also be designed to store only hashes of larger data structures, and require a function call to provide the actual data itself. This will generally be considerably cheaper than reading a large amount of data from storage.
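As a rough illustration of the store-only-a-hash pattern (a minimal Solidity sketch, assuming an EVM-style chain; the contract and function names are hypothetical):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical sketch: keep only a 32-byte commitment in storage and
// have callers supply the full payload as calldata when it is needed.
contract HashCommitment {
    bytes32 public dataHash; // one storage slot instead of the full blob

    function commit(bytes calldata data) external {
        dataHash = keccak256(data); // a single storage write
    }

    function useData(bytes calldata data) external view {
        // One storage read plus cheap hashing, regardless of payload size.
        require(keccak256(data) == dataHash, "wrong preimage");
        // ... operate on `data` here ...
    }
}
```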
Thx.
Is the charge proportional to the storage object size? I.e. can I read a smaller portion of the object via reference and only be charged for the size of that data, or must I load the whole object each time?
I don’t think I understand this. I could imagine maybe gzip’ing the content and storing the compressed data (?), but my understanding of hashes is that they are one-way functions. Can you elaborate on what you mean please?
thx again for the info
Is the charge proportional to the storage object size? I.e. can I read a smaller portion of the object via reference and only be charged for the size of that data, or must I load the whole object each time?
Storage reads occur one slot at a time. So if you have a large struct in storage, and you’re interested in accessing only one of its elements, you will only pay for one storage read/write.
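For instance (a hypothetical Solidity sketch; each `uint256` member of the struct below occupies its own slot, so reading one member costs one storage read):

```solidity
pragma solidity ^0.8.0;

contract SlotDemo {
    struct Account {
        uint256 balance;  // occupies its own slot
        uint256 nonce;    // separate slot
        uint256 lastSeen; // separate slot
    }

    mapping(address => Account) private accounts;

    // Reads only the slot holding `balance`: one storage read, not three.
    function balanceOf(address who) external view returns (uint256) {
        return accounts[who].balance;
    }
}
```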
I could imagine maybe gzip’ing the content and storing the compressed data (?), but my understanding of hashes is that they are one-way functions.
Imagine a contract stores a set of identities (addresses, let’s say), and when you call the contract you need to prove that you are one of those identities. The naive approach would be to store the whole set of addresses, and compare your address against each one in turn, in order to determine if there was a match. The storage cost of this would scale linearly with the number of addresses, as one storage read per address would be necessary.
What you could do instead is store the hash of the set of addresses, and have the caller provide the whole set as calldata. The contract would first hash the provided set to check that it is correct, and then check for inclusion against this set. This requires only one storage read.*
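A sketch of this pattern (hypothetical Solidity; `SetCommitment` and its function names are illustrative):

```solidity
pragma solidity ^0.8.0;

contract SetCommitment {
    bytes32 public setHash; // commitment to the whole address set

    constructor(address[] memory members) {
        setHash = keccak256(abi.encode(members));
    }

    // The caller supplies the full set as calldata; the contract checks it
    // against the stored hash (one storage read), then scans it off-storage.
    function isMember(address[] calldata members, address who)
        external view returns (bool)
    {
        require(keccak256(abi.encode(members)) == setHash, "wrong set");
        for (uint256 i = 0; i < members.length; i++) {
            if (members[i] == who) return true;
        }
        return false;
    }
}
```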
In general, these strategies are effective because calldata and calculation are much, much cheaper than storage.
*Even better would be to store the Merkle root of the set, so that the caller only has to provide a proof, whose size scales with log(n) rather than linearly with the size of the set.
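A corresponding sketch of the Merkle-root variant (again hypothetical Solidity; this assumes sorted-pair hashing, the convention used by e.g. OpenZeppelin’s MerkleProof library):

```solidity
pragma solidity ^0.8.0;

contract MerkleSet {
    bytes32 public root; // Merkle root of the address set: a single slot

    constructor(bytes32 root_) {
        root = root_;
    }

    // Verifies a log(n)-sized inclusion proof against the stored root.
    function isMember(bytes32[] calldata proof, address who)
        external view returns (bool)
    {
        bytes32 node = keccak256(abi.encodePacked(who));
        for (uint256 i = 0; i < proof.length; i++) {
            // Hash each (node, sibling) pair in sorted order.
            node = node < proof[i]
                ? keccak256(abi.encodePacked(node, proof[i]))
                : keccak256(abi.encodePacked(proof[i], node));
        }
        return node == root;
    }
}
```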
Gotcha – thanks.
My use-case involves message passing between users, so an inclusion proof (while very cool) is off the table. Sounds like a careful construction of storage + gzip might be the way to go, though I’ll have to assess the cost vs. the burden of using event listeners.
That sounds like a good candidate for logs and an indexer.