ablog

不器用で落着きのない技術者のメモ

Redshift で列長に合わせて文字列を切詰める

You can't use SUBSTRING to predictably extract the prefix of a string that might contain multi-byte characters because you need to specify the length of a multi-byte string based on the number of bytes, not the number of characters. To extract the beginning segment of a string based on the length in bytes, you can CAST the string as VARCHAR(byte_length) to truncate the string, where byte_length is the required length. The following example extracts the first 5 bytes from the string 'Fourscore and seven'.

select cast('Fourscore and seven' as varchar(5));

varchar
-------
Fours
SUBSTRING function - Amazon Redshift

検証結果

dev=# CREATE TABLE IF NOT EXISTS public.varchar_tbl(
    test_case CHARACTER VARYING(300),
    varchar_col CHARACTER VARYING(6) ENCODE ZSTD collate case_insensitive
);
CREATE TABLE
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('substring multi-byte 4', substring('0123456789', 0, 4));
ERROR:  value too long for type character varying(6)
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('substring multi-byte 3', substring('0123456789', 0, 3));
INSERT 0 1
dev=# 
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('substring single-byte 8', substring('0123456789', 0, 8));
ERROR:  value too long for type character varying(6)
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('substring single-byte 7', substring('0123456789', 0, 7));
INSERT 0 1
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('substring single-byte 6', substring('0123456789', 0, 6));
INSERT 0 1
dev=# 
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('varchar() multi-byte 9', cast('0123456789' as varchar(9)));
ERROR:  value too long for type character varying(6)
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('varchar() multi-byte 8', cast('0123456789' as varchar(8)));
INSERT 0 1
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('varchar() multi-byte 7', cast('0123456789' as varchar(7)));
INSERT 0 1
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('varchar() multi-byte 6', cast('0123456789' as varchar(6)));
INSERT 0 1
dev=# 
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('varchar() single-byte 7', cast('0123456789' as varchar(7)));
ERROR:  value too long for type character varying(6)
dev=# insert into public.varchar_tbl(test_case, varchar_col) values('varchar() single-byte 6', cast('0123456789' as varchar(6)));
INSERT 0 1
dev=# 
dev=# select * from public.varchar_tbl;
        test_case        | varchar_col 
-------------------------+-------------
 substring multi-byte 3  | 01
 substring single-byte 7 | 012345
 substring single-byte 6 | 01234
 varchar() multi-byte 8  | 01
 varchar() multi-byte 7  | 01
 varchar() multi-byte 6  | 01
 varchar() single-byte 6 | 012345
(7 rows)