ablog

Notes from a clumsy, restless engineer

PySpark uses Java regular expression syntax

Regex in PySpark internally uses Java regex. One common issue is escaping the backslash, because the pattern is evaluated by the Java regex engine while we pass a raw Python string to spark.sql. Take \d, which represents a digit in regex, and use Spark's regexp_extract to match digits.
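As a rough sketch of the escaping issue described in the quote, the block below writes the same digit pattern three ways. It is not taken from the quoted article: the table name `samples`, the column `value`, the sample rows, and the helper `run_demo` are all made up for illustration. It also assumes Spark's default SQL parser settings, under which backslashes inside SQL string literals are treated as escapes and therefore have to be doubled.

```python
# Pattern as the Java regex engine should receive it: "(", backslash, "d+", ")".
JAVA_PATTERN = r"(\d+)"

# Inside SQL text, Spark's parser (default settings, an assumption here) treats
# a backslash in a string literal as an escape, so the backslash is doubled.
# A raw Python string keeps both backslashes as written.
SQL_RAW = r"SELECT regexp_extract(value, '(\\d+)', 1) AS digits FROM samples"

# The same SQL text as a regular (non-raw) Python string needs four
# backslashes, because Python consumes one level of escaping first.
SQL_PLAIN = "SELECT regexp_extract(value, '(\\\\d+)', 1) AS digits FROM samples"


def run_demo():
    """Hypothetical helper: run both variants (requires PySpark and a JVM)."""
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[1]").appName("regex-demo").getOrCreate()
    )
    # Made-up sample data registered under the made-up view name "samples".
    df = spark.createDataFrame([("order-123",), ("no digits",)], ["value"])
    df.createOrReplaceTempView("samples")

    # DataFrame API: the pattern is handed to Java directly, with no SQL parsing.
    df.select(F.regexp_extract("value", JAVA_PATTERN, 1).alias("digits")).show()

    # SQL API: the doubled backslash should reach the regex engine as one.
    spark.sql(SQL_RAW).show()

    spark.stop()
```

`SQL_RAW` and `SQL_PLAIN` are the same string, so which one to use is a matter of taste; the raw form is usually easier to read. Calling `run_demo()` on a machine with PySpark installed should let you compare the DataFrame and SQL variants side by side.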

Data Wrangling in Pyspark with Regex | by somanath sankaran | Analytics Vidhya | Medium

Pattern (Java Platform SE 8)