- Glue connection (used so that the job accesses S3 through the VPC endpoint)
  - Type: JDBC
  - JDBC URL: jdbc:mysql://dummy.com:1234/dummy # dummy
  - VPC ID: vpc-b******1 # any VPC
  - Subnet: subnet-1******b # any subnet
  - Security group: sg-0***************6 # any security group
  - Require SSL connection: false
  - Description: -
  - Username: dummy # dummy
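For reference, the same connection settings could be expressed through the Glue `create_connection` API. This is a minimal sketch, not the setup actually used here: the connection name and the password are assumptions (neither appears in the settings above), and the subnet and security group IDs are passed in as arguments because they are masked in the source.

```python
# Hypothetical connection name; the settings above do not give one.
CONNECTION_NAME = "s3-vpce-dummy-connection"


def build_connection_input(subnet_id, security_group_id):
    """Build the ConnectionInput dict for glue.create_connection."""
    return {
        "Name": CONNECTION_NAME,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:mysql://dummy.com:1234/dummy",  # dummy
            "USERNAME": "dummy",  # dummy
            "PASSWORD": "dummy",  # assumption: not shown in the settings above
            "JDBC_ENFORCE_SSL": "false",
        },
        # The VPC is implied by the subnet, so only the subnet and
        # security group are passed here.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
        },
    }


def create_connection(subnet_id, security_group_id):
    """Create the connection; requires AWS credentials and boto3."""
    import boto3  # imported lazily so the builder works without AWS access

    glue = boto3.client("glue")
    return glue.create_connection(
        ConnectionInput=build_connection_input(subnet_id, security_group_id)
    )
```

The JDBC URL is never actually connected to. The connection exists only to place the job's network interfaces in the chosen subnet.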
- Glue Job (PySpark)
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Set up the Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init('test job')

# Read CSV from the landing bucket and write it to the main bucket as a single file
df = spark.read.csv("s3://datalake-landing/test")
df.coalesce(1).write.mode('overwrite').csv("s3://datalake-main/test")
- S3 bucket policy (datalake-landing)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Access-to-specific-VPCE-only",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::datalake-landing",
        "arn:aws:s3:::datalake-landing/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpce": "vpce-0a*************fa"
        }
      }
    }
  ]
}
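To make the effect of this Deny statement concrete, here is a simplified sketch of how it matches a request. It is not the real IAM evaluation engine: action matching is done as a plain string comparison, and the function name `deny_applies` is made up for illustration. The VPC endpoint ID is masked in the source and is kept as-is.

```python
from fnmatch import fnmatchcase

# Values copied from the bucket policy above (endpoint ID is masked in the source).
ALLOWED_VPCE = "vpce-0a*************fa"
DENIED_ACTIONS = {"s3:PutObject", "s3:GetObject", "s3:DeleteObject"}
RESOURCES = [
    "arn:aws:s3:::datalake-landing",
    "arn:aws:s3:::datalake-landing/*",
]


def deny_applies(action, resource, source_vpce=None):
    """Return True if the Deny statement matches this (simplified) request.

    source_vpce is None when the request did not come through any VPC endpoint.
    """
    action_match = action in DENIED_ACTIONS
    resource_match = any(fnmatchcase(resource, pattern) for pattern in RESOURCES)
    # StringNotEquals is a negated operator: it also evaluates to true
    # when the aws:sourceVpce key is absent from the request context.
    condition_match = source_vpce is None or source_vpce != ALLOWED_VPCE
    return action_match and resource_match and condition_match
```

Under this reading, a `GetObject` on the landing bucket is denied unless it arrives through the listed endpoint, including when it arrives through no endpoint at all. Actions outside the list, such as `s3:ListBucket`, are not affected by this statement.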
- VPC endpoint policy
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "*",
      "Resource": [
        "arn:aws:s3:::datalake-landing",
        "arn:aws:s3:::datalake-landing/*",
        "arn:aws:s3:::datalake-main",
        "arn:aws:s3:::datalake-main/*",
        "arn:aws:s3:::aws-glue-scripts-123456789012-ap-northeast-1",
        "arn:aws:s3:::aws-glue-scripts-123456789012-ap-northeast-1/*",
        "arn:aws:s3:::aws-glue-temporary-123456789012-ap-northeast-1",
        "arn:aws:s3:::aws-glue-temporary-123456789012-ap-northeast-1/*"
      ]
    }
  ]
}
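The policy above lists every bucket twice, once as the bucket ARN and once with `/*` for its objects. The sketch below generates the same document from a list of bucket names, which makes it harder to forget one of the two forms. Assuming this policy is the one attached to the S3 VPC endpoint, it could then be applied with `modify_vpc_endpoint`; the function names and the endpoint ID argument are hypothetical.

```python
import json

# Bucket names taken from the policy above.
BUCKETS = [
    "datalake-landing",
    "datalake-main",
    "aws-glue-scripts-123456789012-ap-northeast-1",
    "aws-glue-temporary-123456789012-ap-northeast-1",
]


def build_endpoint_policy(buckets):
    """Build an allow-all policy scoped to the given buckets."""
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level ARN
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level ARN
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "*",
                "Resource": resources,
            }
        ],
    }


def apply_endpoint_policy(vpc_endpoint_id, buckets=BUCKETS):
    """Attach the policy to a VPC endpoint; requires AWS credentials and boto3."""
    import boto3  # imported lazily so the builder works without AWS access

    ec2 = boto3.client("ec2")
    return ec2.modify_vpc_endpoint(
        VpcEndpointId=vpc_endpoint_id,
        PolicyDocument=json.dumps(build_endpoint_policy(buckets)),
    )
```

Note that the Glue script and temporary buckets are included alongside the data buckets, since the job reaches them through the same endpoint.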