Cassandra + Spark + Python

We will use the spark-cassandra-connector to bring Spark and Cassandra together. When you run a Spark job using this library, you need to include the corresponding Spark Package:

spark-submit --packages datastax:spark-cassandra-connector:2.4.0-s_2.11 …
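
For example, if your application code were in a file called cassandra_example.py (a hypothetical name, just for illustration), the full command might look something like this:

spark-submit --packages datastax:spark-cassandra-connector:2.4.0-s_2.11 cassandra_example.py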

You need to configure the SparkSession object to connect correctly to our cluster. Create your spark variable like this:

from pyspark.sql import SparkSession

cluster_seeds = ['199.60.17.32', '199.60.17.65']
spark = SparkSession.builder.appName('Spark Cassandra example') \
    .config('spark.cassandra.connection.host', ','.join(cluster_seeds)).getOrCreate()

With this done, you should be able to read Cassandra tables into DataFrames, or write DataFrames out to Cassandra tables:

df = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(table=table, keyspace=keyspace).load()
df.write.format("org.apache.spark.sql.cassandra") \
    .options(table=table, keyspace=keyspace).save()
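
To put the pieces together, here is a minimal end-to-end sketch. The keyspace and table names (demo_keyspace, observations, obs_counts) and the station column are hypothetical placeholders: it assumes the keyspace and both tables have already been created (e.g. in cqlsh), and that the output table's columns match the DataFrame being written.

from pyspark.sql import SparkSession

cluster_seeds = ['199.60.17.32', '199.60.17.65']
spark = SparkSession.builder.appName('Cassandra read/write example') \
    .config('spark.cassandra.connection.host', ','.join(cluster_seeds)).getOrCreate()

# Read an existing Cassandra table into a DataFrame.
observations = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(table='observations', keyspace='demo_keyspace').load()

# From here it is an ordinary DataFrame: any transformations work.
counts = observations.groupBy('station').count()

# Write the result to another (pre-created) table in the same keyspace.
counts.write.format("org.apache.spark.sql.cassandra") \
    .options(table='obs_counts', keyspace='demo_keyspace').save()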