Hashing Apples, Bananas and Cherries

by muscaw on 12/11/2022, 11:32 AM with 10 comments

by kentonv on 12/11/2022, 4:18 PM

TL;DR:

Hash functions operate on byte strings, but sometimes you want to hash data structures. So you serialize the structure and hash the serialization.

You need to be very careful about how you serialize. It's critical that the serialization actually be unique to the particular input. E.g. if you have two different types of data structures that you hash, it's important that no instance of the first type has the same serialization as some instance of the second type. Another common problem is when people hash a structure containing multiple values by simply concatenating the values and hashing the concatenation. If you serialize both `["a", "bc"]` and `["ab", "c"]` as "abc", then they will have the same hash. That's bad!
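To make the concatenation problem concrete, here is a minimal Python sketch. The helper names `naive_hash` and `length_prefixed_hash` are made up for illustration, and the 8-byte big-endian length prefix is just one possible way to make the boundaries between values unambiguous, not the only one.

```python
import hashlib


def naive_hash(parts):
    # Broken: joins the values with no delimiter, so the boundaries are lost.
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


def length_prefixed_hash(parts):
    # Each value is preceded by its byte length (8 bytes, big-endian),
    # so the byte stream can be split back into the original values.
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# The naive scheme collides on the two lists; the length-prefixed one does not.
print(naive_hash(["a", "bc"]) == naive_hash(["ab", "c"]))
print(length_prefixed_hash(["a", "bc"]) == length_prefixed_hash(["ab", "c"]))
```

The first comparison is true because both lists serialize to "abc"; the second is false because the length prefixes differ.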

One way to think about this is to design your serialization such that it can be unambiguously parsed back to the original structure. It doesn't necessarily have to be convenient to parse, just possible. If you aren't experienced with designing serialization schemes, though, it may be best to use a common scheme like JSON or Protobuf. But, don't forget that if you have multiple types of structures, your serialization must specify its own type. For JSON, you could add a `"type": "MyType"` property. For Protobuf, define a single top-level type which is a big "oneof" (union) of all possible types, and always serialize as that top-level type.
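A rough sketch of the JSON variant in Python, under some assumptions of my own: the function name `hash_typed` and the `"fields"` wrapper are hypothetical, and `sort_keys=True` with compact separators is only there to get a deterministic encoding for simple values. It is not a full canonical JSON scheme (floats and Unicode normalization need more care).

```python
import hashlib
import json


def hash_typed(type_name, fields):
    # Embed the type name in the serialized form so two different
    # structure types can never produce the same bytes.
    payload = {"type": type_name, "fields": fields}
    # sort_keys and fixed separators make the output independent of
    # dict insertion order and whitespace choices (an assumption of
    # this sketch, not a complete canonicalization).
    encoded = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


apple = hash_typed("Apple", {"name": "x", "weight": 1})
banana = hash_typed("Banana", {"name": "x", "weight": 1})
print(apple != banana)
```

Two structures with identical fields but different type names hash differently, and reordering the keys of `fields` does not change the hash.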

by lmz on 12/11/2022, 2:00 PM

Isn't this something that is already solved by DER if you're using ASN.1 data structures?

by 082349872349872 on 12/11/2022, 1:09 PM

key slogan: Authenticate what is being meant, not what is being said.