How to hash columns in Trino
Somtimes you may have to join certain cols that have already been hashed to cols that are not hashed. Therefore, we need to hash the unhashed columns so we're able to join.
Hashed items are generally done using sha256 unless stated otherwise. Below is an example of how to hash a col using the functions provided by Trino.
Example
In this example we need to join the column uid
with another column, to do so we need to join a few functions together like so:
to_hex(sha256(to_utf8(replace(uid, 'ID123-', ''))))
Let's break down those functions
to_hex()
- Encodesbinary
into a hex string representation.sha256()
- Computes the SHA256 hash ofbinary
.to_utf8()
- Encodesstring
into a UTF-8 varbinary representation.replace()
- Removes all instances ofsearch
fromstring
.
Sometimes we don't need the replace()
function this is generally used to remove a prefix. You can see the replace()
function in action below:
select
uid,
replace(uid, 'ID123-', '') as uid_no_prefix
from aggr.idtable
where
and type_uid = 'id123UID' limit 2;
-- returns --
+----------------------------------------------+------------------------------------------+
|uid |uid_no_prefix |
+----------------------------------------------+------------------------------------------+
|ID123-aG9mbmVsZm5lZm5lbmZsd2Vm|aG9mbmVsZm5lZm5lbmZsd2Vm|
|ID123-cXdmd3Fmd3Fmd2Z3cWZ3cWZxd2Zxd2Z3cWZ3Zgo|cXdmd3Fmd3Fmd2Z3cWZ3cWZxd2Zxd2Z3cWZ3Zgo|
+----------------------------------------------+------------------------------------------+
Using the below query you're able to hash the original uid
column, then the column with the prefix removed using replace()
followed by the hashed column using all the functoins strung together.
select
to_hex(sha256(to_utf8(replace(uid, 'ID123-', '')))) as uid_hash
from aggr.idtable
where
and type_uid = 'id123UID';
When needed to hash something you only really need the single line.
to_hex(sha256(to_utf8(replace(uid, '<prefiex>', '')))) as <alias>