Notes on loading data from S3 into Aurora MySQL (LOAD DATA FROM S3) and unloading it back to S3 (SELECT INTO OUTFILE S3).
Procedure
- Create an IAM role
  - Role name: rds-s3-role
  - Permissions policy: AmazonS3FullAccess
  - Trusted entity: rds.amazonaws.com
- Create a DB cluster parameter group
  - Group name: aurora-mysql57-cluster-custom
  - Parameter group family: aurora-mysql5.7
  - Type: DB Cluster Parameter Group
- In the created parameter group "aurora-mysql57-cluster-custom", set the parameter "aws_default_s3_role" to the ARN of the IAM role created above: "arn:aws:iam::<AWS account id>:role/rds-s3-role".
- Create an Aurora MySQL cluster
  - Instance class: db.r4.8xlarge
  - Engine version: 5.7.12
- In the cluster's "Manage IAM roles" settings, attach "rds-s3-role".
- Install mysql and sysbench
$ curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | sudo bash
$ sudo yum -y install sysbench mysql
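The console steps above (IAM role, cluster parameter group, attaching the role to the cluster) can also be sketched with the AWS CLI. This is a sketch, not what was actually run for this memo: the role and group names come from the steps above, the cluster identifier `aurora-mysql57-r4-8xl` is inferred from the endpoint name, `<AWS-account-id>` is a placeholder, and the `aws` commands are echoed instead of executed since they need real credentials.

```shell
# Trust policy letting RDS assume the role (same trusted entity as above).
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rds.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Echo each command instead of running it; swap the body for `aws "$@"` to execute.
run() { echo "aws $*"; }

run iam create-role --role-name rds-s3-role \
    --assume-role-policy-document file://trust-policy.json
run iam attach-role-policy --role-name rds-s3-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
run rds create-db-cluster-parameter-group \
    --db-cluster-parameter-group-name aurora-mysql57-cluster-custom \
    --db-parameter-group-family aurora-mysql5.7 \
    --description "custom cluster params for S3 load/unload"
run rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name aurora-mysql57-cluster-custom \
    --parameters "ParameterName=aws_default_s3_role,ParameterValue=arn:aws:iam::<AWS-account-id>:role/rds-s3-role,ApplyMethod=immediate"
run rds add-role-to-db-cluster \
    --db-cluster-identifier aurora-mysql57-r4-8xl \
    --role-arn "arn:aws:iam::<AWS-account-id>:role/rds-s3-role"
```

Note that `aws_default_s3_role` is a cluster-level parameter, which is why the group type above is a DB Cluster Parameter Group rather than an instance-level one.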
$ sysbench /usr/share/sysbench/oltp_read_write.lua \
--db-driver=mysql \
--table-size=10000000 \
--mysql-host=aurora-mysql57-r4-8xl.cluster-********.ap-northeast-1.rds.amazonaws.com \
--mysql-user=awsuser \
--mysql-password=******** \
--mysql-db=mydb \
--db-ps-mode=disable \
prepare
$ mysql -h aurora-mysql57-r4-8xl.cluster-********.ap-northeast-1.rds.amazonaws.com -u awsuser -p
> create table sbtest100m (select a.* from sbtest1 a, sbtest1 b limit 100000000);
> select count(*) from sbtest100m;
5.7.12 awsuser: [mydb] 16:36> SELECT * FROM sbtest100m INTO OUTFILE S3 's3-ap-northeast-1://az-cp-dst/sbtest100m'
-> FIELDS TERMINATED BY ','
-> LINES TERMINATED BY '\n'
-> MANIFEST ON
-> OVERWRITE OFF;
Query OK, 100000000 rows affected (8 min 55.83 sec)
- Check the sizes of the objects unloaded to S3.
$ aws s3 ls --human-readable s3://az-cp-dst/sbtest100m
2019-02-19 15:19:13 377 Bytes sbtest100m.manifest
2019-02-19 15:10:09 6.0 GiB sbtest100m.part_00000
2019-02-19 15:13:05 6.0 GiB sbtest100m.part_00001
2019-02-19 15:16:08 6.0 GiB sbtest100m.part_00002
2019-02-19 15:19:05 249.4 MiB sbtest100m.part_00003
$ aws s3 cp s3://az-cp-dst/sbtest100m.part_00000 ./
$ head -1 sbtest100m.part_00000 > 1line
$ ls -l 1line
-rw-rw-r-- 1 ec2-user ec2-user 190 Feb 19 16:24 1line
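The 190-byte sample line gives a quick sanity check on the unload size. Taking the rounded sizes from the S3 listing (three full 6.0 GiB parts plus a 249.4 MiB tail; the full parts suggest the unload splits output at roughly 6 GB per file), the average row works out to just under 200 bytes, consistent with the sample line since row lengths vary slightly:

```shell
# Approximate average row size of the unloaded data, from the listing above.
awk 'BEGIN {
  gib = 2^30; mib = 2^20
  total = 3 * 6.0 * gib + 249.4 * mib   # three 6.0 GiB parts + 249.4 MiB tail
  rows  = 100000000
  printf "total bytes : %.0f\n", total
  printf "bytes/row   : %.1f\n", total / rows
}'
```

This prints an average of about 196 bytes per row, in line with the 190-byte sample line.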
5.7.12 awsuser: [mydb] 15:22> truncate table sbtest100m;
Query OK, 0 rows affected (1.28 sec)
5.7.12 awsuser: [mydb] 15:23> LOAD DATA FROM S3 MANIFEST 's3-ap-northeast-1://az-cp-dst/sbtest100m.manifest'
-> INTO TABLE sbtest100m
-> FIELDS TERMINATED BY ','
-> LINES TERMINATED BY '\n' (id, k, c, pad);
Query OK, 100000000 rows affected (22 min 11.60 sec)
Records: 100000000 Deleted: 0 Skipped: 0 Warnings: 0
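Converting the query timings above into throughput (simple arithmetic on the times already shown, nothing measured beyond them):

```shell
# Rough throughput for the 100-million-row run:
# unload took 8 min 55.83 s, load took 22 min 11.60 s.
awk 'BEGIN {
  rows   = 100000000
  unload = 8 * 60 + 55.83     # SELECT INTO OUTFILE S3, seconds
  load   = 22 * 60 + 11.60    # LOAD DATA FROM S3, seconds
  printf "unload: %.0f rows/s\n", rows / unload
  printf "load  : %.0f rows/s\n", rows / load
}'
```

Unloading runs at roughly 187k rows/s, loading at roughly 75k rows/s, so the load back in is about 2.5x slower than the unload.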
Addendum (2019/02/22)
1.8 GB (10 million rows)
> create table sbtest10m (select * from sbtest1);
> select count(*) from sbtest10m;
5.7.12 awsuser: [mydb] 16:03> SELECT * FROM sbtest10m INTO OUTFILE S3 's3-ap-northeast-1://az-cp-dst/sbtest10m'
-> FIELDS TERMINATED BY ','
-> LINES TERMINATED BY '\n'
-> MANIFEST ON
-> OVERWRITE OFF;
Query OK, 10000000 rows affected (55.03 sec)
5.7.12 awsuser: [mydb] 15:15> LOAD DATA FROM S3 MANIFEST 's3-ap-northeast-1://az-cp-dst/sbtest10m.manifest'
-> INTO TABLE sbtest10m
-> FIELDS TERMINATED BY ','
-> LINES TERMINATED BY '\n' (id, k, c, pad);
Query OK, 10000000 rows affected (2 min 16.17 sec)
Records: 10000000 Deleted: 0 Skipped: 0 Warnings: 0
9 GB (50 million rows)
> create table sbtest50m (select a.* from sbtest1 a, sbtest1 b limit 50000000);
> select count(*) from sbtest50m;
5.7.12 awsuser: [mydb] 14:55> SELECT * FROM sbtest50m INTO OUTFILE S3 's3-ap-northeast-1://az-cp-dst/sbtest50m'
-> FIELDS TERMINATED BY ','
-> LINES TERMINATED BY '\n'
-> MANIFEST ON
-> OVERWRITE OFF;
Query OK, 50000000 rows affected (4 min 18.54 sec)
5.7.12 awsuser: [mydb] 15:26> LOAD DATA FROM S3 MANIFEST 's3-ap-northeast-1://az-cp-dst/sbtest50m.manifest'
-> INTO TABLE sbtest50m
-> FIELDS TERMINATED BY ','
-> LINES TERMINATED BY '\n' (id, k, c, pad);
Query OK, 50000000 rows affected (11 min 9.39 sec)
Records: 50000000 Deleted: 0 Skipped: 0 Warnings: 0
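Putting the three runs side by side, rows per second stay nearly constant across 10M, 50M, and 100M rows, i.e. unload and load times scale roughly linearly with row count. A small awk table built from the query timings above:

```shell
# rows : unload seconds : load seconds, taken from the three runs above.
awk 'BEGIN {
  print "rows        unload r/s   load r/s"
  n = split("10000000:55.03:136.17 50000000:258.54:669.39 100000000:535.83:1331.60", runs, " ")
  for (i = 1; i <= n; i++) {
    split(runs[i], f, ":")
    printf "%-11d %-12.0f %-8.0f\n", f[1], f[1]/f[2], f[1]/f[3]
  }
}'
```

Unload throughput stays in the 182k–193k rows/s range and load throughput in the 73k–75k rows/s range across all three sizes.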