How to hash columns in Trino

Somtimes you may have to join certain cols that have already been hashed to cols that are not hashed. Therefore, we need to hash the unhashed columns so we're able to join.

Hashed items are generally done using sha256 unless stated otherwise. Below is an example of how to hash a col using the functions provided by Trino.

Example

In this example we need to join the column uid with another column, to do so we need to join a few functions together like so:

to_hex(sha256(to_utf8(replace(uid, 'ID123-', ''))))

Let's break down those functions

  • to_hex() - Encodes binary into a hex string representation.
  • sha256() - Computes the SHA256 hash of binary.
  • to_utf8() - Encodes string into a UTF-8 varbinary representation.
  • replace() - Removes all instances of search from string.

Sometimes we don't need the replace() function this is generally used to remove a prefix. You can see the replace() function in action below:

select
    uid,
    replace(uid, 'ID123-', '') as uid_no_prefix
from aggr.idtable
where
    and type_uid = 'id123UID' limit 2;

-- returns --
+----------------------------------------------+------------------------------------------+
|uid                                           |uid_no_prefix                             |
+----------------------------------------------+------------------------------------------+
|ID123-aG9mbmVsZm5lZm5lbmZsd2Vm|aG9mbmVsZm5lZm5lbmZsd2Vm|
|ID123-cXdmd3Fmd3Fmd2Z3cWZ3cWZxd2Zxd2Z3cWZ3Zgo|cXdmd3Fmd3Fmd2Z3cWZ3cWZxd2Zxd2Z3cWZ3Zgo|
+----------------------------------------------+------------------------------------------+

Using the below query you're able to hash the original uid column, then the column with the prefix removed using replace() followed by the hashed column using all the functoins strung together.

select
    to_hex(sha256(to_utf8(replace(uid, 'ID123-', '')))) as uid_hash
from aggr.idtable
where
    and type_uid = 'id123UID';

When needed to hash something you only really need the single line.

to_hex(sha256(to_utf8(replace(uid, '<prefiex>', '')))) as <alias>