ablog

Notes from a clumsy, restless engineer

A Glue setup confined to a closed network

  • Glue connection (a connection for accessing S3 through the VPC endpoint)
    • Type: JDBC
    • JDBC URL: jdbc:mysql://dummy.com:1234/dummy # dummy
    • VPC ID: vpc-b******1 # any VPC
    • Subnet: subnet-1******b # any subnet
    • Security group: sg-0***************6 # any security group
    • Require SSL connection: false
    • Description: -
    • Username: dummy # dummy
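The same connection could also be defined in code. The following is a minimal sketch, assuming boto3 and its Glue `create_connection` API. The helper names, the connection name `s3-vpce-dummy`, and the `PASSWORD` placeholder are hypothetical and not taken from the settings above. The masked IDs are passed through as written and would need to be replaced with real ones.

```python
def build_connection_input(name, subnet_id, security_group_id):
    """Build a ConnectionInput dict for a dummy JDBC connection.

    The JDBC URL and credentials are placeholders. The parts that matter
    here are the subnet and security group, which decide where the Glue
    job's network interfaces are placed.
    """
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:mysql://dummy.com:1234/dummy",  # dummy
            "JDBC_ENFORCE_SSL": "false",
            "USERNAME": "dummy",  # dummy
            "PASSWORD": "dummy",  # assumption: placeholder, not in the original
        },
        # The VPC is implied by the subnet, so no VPC ID is passed here.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
        },
    }


def create_connection(connection_input):
    """Register the connection in Glue (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the builder above works without it

    glue = boto3.client("glue")
    glue.create_connection(ConnectionInput=connection_input)
```

Calling `build_connection_input("s3-vpce-dummy", "subnet-1******b", "sg-0***************6")` only builds the dict; nothing is sent to AWS until `create_connection` is called.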
  • Glue Job (PySpark)
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

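# Standard Glue job setup: Spark context, Glue context, Spark session
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init('test job')

# Read CSV from the landing bucket and write it to the main bucket as a single file
df = spark.read.csv("s3://datalake-landing/test")
df.coalesce(1).write.mode('overwrite').csv("s3://datalake-main/test")

# Commit the job (updates job bookmark state if bookmarks are enabled)
job.commit()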
  • S3 bucket policy (datalake-landing)
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Access-to-specific-VPCE-only",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::datalake-landing",
                "arn:aws:s3:::datalake-landing/*"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": "vpce-0a*************fa"
                }
            }
        }
    ]
}
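To illustrate what this Deny statement does, here is a simplified sketch in plain Python. It is not how IAM evaluates policies internally. The function `is_denied` and its arguments are hypothetical, and it models only the `StringNotEquals` condition on `aws:sourceVpce` for the three listed actions.

```python
DENIED_ACTIONS = {"s3:PutObject", "s3:GetObject", "s3:DeleteObject"}
ALLOWED_VPCE = "vpce-0a*************fa"  # masked in the original, kept as-is


def is_denied(action, source_vpce):
    """Return True if the Deny statement would match the request.

    source_vpce is None when the request did not come through a VPC
    endpoint. A negated operator such as StringNotEquals also matches
    when the key is absent, so such requests are denied as well.
    """
    if action not in DENIED_ACTIONS:
        # Actions outside the statement are not affected by this Deny.
        return False
    return source_vpce != ALLOWED_VPCE


# Request through the expected endpoint: the Deny does not apply.
print(is_denied("s3:GetObject", ALLOWED_VPCE))
# Request from outside the VPC endpoint: the Deny applies.
print(is_denied("s3:GetObject", None))
```

Note that not being denied is not the same as being allowed; the caller still needs an Allow from its IAM policy.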
  • VPC endpoint policy
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "*",
            "Resource": [
                "arn:aws:s3:::datalake-landing",
                "arn:aws:s3:::datalake-landing/*",
                "arn:aws:s3:::datalake-main",
                "arn:aws:s3:::datalake-main/*",
                "arn:aws:s3:::aws-glue-scripts-123456789012-ap-northeast-1",
                "arn:aws:s3:::aws-glue-scripts-123456789012-ap-northeast-1/*",
                "arn:aws:s3:::aws-glue-temporary-123456789012-ap-northeast-1",
                "arn:aws:s3:::aws-glue-temporary-123456789012-ap-northeast-1/*"
            ]
        }
    ]
}
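Assuming the subnet routes its S3 traffic through this endpoint, the policy has to cover every bucket the job touches, including the Glue script and temporary buckets. A small sketch for checking that, with the policy loaded as a Python dict. `uncovered_buckets` is a hypothetical helper, and it only does exact ARN matching against `Resource`, not full IAM wildcard evaluation.

```python
def uncovered_buckets(policy, buckets):
    """Return the buckets not fully listed in the policy's Allow statements.

    A bucket counts as covered only if both its bucket ARN and its
    object ARN pattern (bucket/*) appear in an Allow statement.
    """
    allowed = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        allowed.update(resources)

    if "*" in allowed:
        # A bare wildcard resource covers every bucket.
        return []

    missing = []
    for bucket in buckets:
        arn = f"arn:aws:s3:::{bucket}"
        if arn not in allowed or f"{arn}/*" not in allowed:
            missing.append(bucket)
    return missing
```

Passing the policy above together with `["datalake-landing", "datalake-main"]` returns an empty list; a bucket that is not listed comes back in the result.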