以下では解決しない。解決策が分かったら更新予定。
事象
EMR(emr-5.19.0) で PySpark を実行すると、"java.io.FileNotFoundException: /stderr (Permission denied)"、"java.io.FileNotFoundException: /stdout (Permission denied)" というエラーメッセージが出力される。
$ pyspark Python 2.7.14 (default, May 2 2018, 18:31:34) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2 Type "help", "copyright", "credits" or "license" for more information. log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: /stderr (Permission denied) at java.io.FileOutputStream.open0(Native Method) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.<init>(FileOutputStream.java:213) at java.io.FileOutputStream.<init>(FileOutputStream.java:133) at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120) at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108) at org.apache.spark.deploy.SparkSubmit$.initializeLogIfNecessary(SparkSubmit.scala:71) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:128) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) log4j:ERROR Either File or DatePattern options are not set for appender [DRFA-stderr]. log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: /stdout (Permission denied) at java.io.FileOutputStream.open0(Native Method) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.<init>(FileOutputStream.java:213) at java.io.FileOutputStream.<init>(FileOutputStream.java:133) at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.parseCatsAndRenderers(PropertyConfigurator.java:672) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:516) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:120) at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:108) at org.apache.spark.deploy.SparkSubmit$.initializeLogIfNecessary(SparkSubmit.scala:71) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:128) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
解決策
<property> <name>spark.yarn.app.container.log.dir</name> <value>/var/log/hadoop-yarn</value> </property>
- pyspark を実行してもエラーが出なくなる。
$ pyspark Python 2.7.12 (default, Sep 1 2016, 22:14:00) [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2 Type "help", "copyright", "credits" or "license" for more information. Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 18/11/23 13:01:53 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
補足
- CloudFromation(YAML) で書く場合は以下の通り。
Resources: cluster: Type: AWS::EMR::Cluster Properties: Configurations: - Classification: yarn-site ConfigurationProperties: spark.yarn.app.container.log.dir: /var/log/hadoop-yarn