Technology | April 2, 2015

Kindling Part 2: An Introduction to Spark with Cassandra

Erich Ess
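CREATE TABLE spark_demo.raw_files (filename text, line int, contents text, PRIMARY KEY (filename, line));
CREATE TABLE spark_demo.users (id int PRIMARY KEY, age int, gender text, occupation int, zip text);
CREATE TABLE spark_demo.movies (id int PRIMARY KEY, title text, genres text);
CREATE TABLE spark_demo.ratings (user_id int, movie_id int, rating float, PRIMARY KEY (user_id, movie_id));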
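The Spark Cassandra connector matches case-class fields to table columns by name, so field names and column names have to line up. As a rough sketch (an approximation modeled on the connector's documented camelCase-to-underscore convention, not its actual source), the hypothetical `toColumnName` helper below shows the kind of conversion involved, turning a field such as `UserId` into the column name `user_id`.

```scala
object ColumnNaming {
  // Hypothetical helper: approximates how a camelCase/PascalCase field name
  // could be mapped to a lowercase, underscore-separated CQL column name.
  def toColumnName(field: String): String =
    field
      .replaceAll("([a-z0-9])([A-Z])", "$1_$2") // "UserId" -> "User_Id"
      .toLowerCase                                 // "User_Id" -> "user_id"

  def main(args: Array[String]): Unit = {
    val fields = Seq("Id", "Filename", "UserId", "MovieId", "Rating")
    // Prints one "field -> column" pair per line.
    fields.foreach(f => println(s"$f -> ${toColumnName(f)}"))
  }
}
```

A field like `Line` would therefore look for a column named `line`, not `line_number`, which is why it pays to check a case class against the table definition before calling `cassandraTable`.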
import com.datastax.spark.connector._
case class RawFileData(Filename: String, Line: Int, Contents: String)
val raw_files = sc.cassandraTable[RawFileData]("spark_demo", "raw_files")
val raw_users = raw_files.filter( r => r.Filename == "users.dat" )
raw_users.first
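`raw_users.first` returns a single `RawFileData` whose `Contents` holds one `::`-delimited line. The sketch below splits a sample line the same way the `users` mapping does. It assumes the files are the MovieLens 1M `.dat` files, whose `users.dat` layout is `UserID::Gender::Age::Occupation::Zip-code`; that layout is consistent with the indices used in the code (gender at index 1, age at index 2).

```scala
object SplitDemo {
  // Sample line in the assumed users.dat layout (illustrative only).
  val sampleLine = "1::F::1::10::48067"

  def main(args: Array[String]): Unit = {
    // String.split takes a regex, but "::" has no metacharacters, so it splits literally.
    val fields: Array[String] = sampleLine.trim.split("::")
    println(fields.mkString("[", ", ", "]")) // [1, F, 1, 10, 48067]
  }
}
```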
case class User(Id: Int, Age: Int, Gender: String, Occupation: Int, Zip: String)
val users = raw_users.map( l => l.Contents.trim.split("::") ).map( v => User(Id = v(0).toInt, Age = v(2).toInt, Gender=v(1), Occupation=v(3).toInt, Zip=v(4)))
users.saveToCassandra("spark_demo", "users" )
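The `map` above throws on any line that does not have five fields or has a non-numeric ID, age, or occupation, which would fail the whole job. A minimal sketch of a more forgiving parser, assuming you would rather drop malformed lines than abort, wraps the conversion in `Try` and returns an `Option`; `parseUser` is a hypothetical name, and the `User` class mirrors the one defined above.

```scala
import scala.util.Try

object ParseUsers {
  // Mirrors the User case class used in the Spark shell.
  case class User(Id: Int, Age: Int, Gender: String, Occupation: Int, Zip: String)

  // Hypothetical helper: returns None instead of throwing when a line has the
  // wrong number of fields or a numeric field fails to parse.
  def parseUser(line: String): Option[User] = {
    val v = line.trim.split("::")
    if (v.length != 5) None
    else Try(User(Id = v(0).toInt, Age = v(2).toInt, Gender = v(1),
                  Occupation = v(3).toInt, Zip = v(4))).toOption
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1::F::1::10::48067", "not a user line", "2::M::x::16::70072")
    val users = lines.flatMap(l => parseUser(l)) // malformed lines are dropped
    println(users) // List(User(1,1,F,10,48067))
  }
}
```

In the shell, the same function could stand in for the two `map` calls, for example `raw_users.flatMap(l => parseUser(l.Contents))`.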
val raw_movies = raw_files.filter( r => r.Filename == "movies.dat" )
case class Movie(Id: Int, Title: String, Genres: String)
val movies = raw_movies.map( l => l.Contents.trim.split("::") ).map( v => Movie(Id = v(0).toInt, Title = v(1), Genres = v(2)))
movies.saveToCassandra("spark_demo", "movies")
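The `Genres` column is stored as one string. Assuming the MovieLens 1M `movies.dat` layout `MovieID::Title::Genres`, where genres are separated by `|`, splitting them later needs care: `split("|")` treats `|` as a regex alternation of two empty patterns and splits between every character. The sketch below (with a hypothetical `genreList` helper and an illustrative sample line) uses the `Char` overload of `split` instead, which splits on a literal `|`.

```scala
object ParseMovies {
  case class Movie(Id: Int, Title: String, Genres: String)

  // Sample line in the assumed movies.dat layout (illustrative only).
  val sampleLine = "1::Toy Story (1995)::Animation|Children's|Comedy"

  // Hypothetical helper: split the pipe-delimited Genres field on a literal '|'.
  def genreList(movie: Movie): List[String] = movie.Genres.split('|').toList

  def main(args: Array[String]): Unit = {
    val v = sampleLine.trim.split("::")
    val movie = Movie(Id = v(0).toInt, Title = v(1), Genres = v(2))
    println(genreList(movie)) // List(Animation, Children's, Comedy)
  }
}
```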
val raw_ratings = raw_files.filter( r => r.Filename == "ratings.dat" )
case class Rating(UserId: Int, MovieId: Int, Rating: Float)
val ratings = raw_ratings.map( l => l.Contents.trim.split("::") ).map( v => Rating(UserId = v(0).toInt, MovieId = v(1).toInt, Rating = v(2).toFloat))
ratings.saveToCassandra("spark_demo", "ratings")
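Once the ratings are typed as `Rating` objects, simple aggregates become easy to express. The sketch below computes an average rating per movie using plain Scala collections as a local stand-in for an RDD; the `averageByMovie` helper and the sample ratings are hypothetical. In the Spark shell, one way to express the same thing without collecting the data would be to `map` each rating to a `(MovieId, (rating, 1))` pair and combine them with `reduceByKey`.

```scala
object AverageRatings {
  case class Rating(UserId: Int, MovieId: Int, Rating: Float)

  // Hypothetical helper: average rating per movie, computed locally with
  // groupBy as a stand-in for a map + reduceByKey job on an RDD.
  def averageByMovie(ratings: Seq[Rating]): Map[Int, Double] =
    ratings
      .groupBy(_.MovieId)
      .map { case (movieId, rs) => movieId -> rs.map(_.Rating.toDouble).sum / rs.size }

  def main(args: Array[String]): Unit = {
    val ratings = Seq(Rating(1, 1193, 5f), Rating(2, 1193, 4f), Rating(1, 661, 3f))
    val avg = averageByMovie(ratings)
    println(s"1193 -> ${avg(1193)}, 661 -> ${avg(661)}")
  }
}
```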
val raw = sc.cassandraTable[RawFileData]("spark_demo", "raw_files").cache
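Calling `.cache` matters here because `raw_users`, `raw_movies`, and `raw_ratings` are all filtered from the same table: without caching, Spark recomputes an RDD's lineage, in this case a scan of `raw_files` in Cassandra, for every action. The plain-Scala analogy below is not Spark code; a hypothetical `scanRawFiles` function stands in for the table scan and counts how often it runs, contrasting re-reading for every filter with reading once and reusing the result.

```scala
object CacheAnalogy {
  case class RawFileData(Filename: String, Line: Int, Contents: String)

  // Counts how often the stand-in "table scan" runs.
  var scans = 0

  // Hypothetical stand-in for reading raw_files; in Spark this would be a Cassandra scan.
  def scanRawFiles(): Seq[RawFileData] = {
    scans += 1
    Seq(
      RawFileData("users.dat", 1, "1::F::1::10::48067"),
      RawFileData("movies.dat", 1, "1::Toy Story (1995)::Animation|Children's|Comedy"),
      RawFileData("ratings.dat", 1, "1::1193::5::978300760")
    )
  }

  val files = Seq("users.dat", "movies.dat", "ratings.dat")

  // Without reuse: each filter triggers its own scan, like an uncached RDD
  // being recomputed for every action.
  def withoutReuse(): Int = files.map(f => scanRawFiles().count(_.Filename == f)).sum

  // With reuse: scan once, keep the result, and filter it three times, which is
  // roughly what .cache provides once the RDD is materialized and fits in memory.
  def withReuse(): Int = {
    val raw = scanRawFiles()
    files.map(f => raw.count(_.Filename == f)).sum
  }

  def main(args: Array[String]): Unit = {
    scans = 0; withoutReuse(); println(s"without reuse: scans = $scans") // scans = 3
    scans = 0; withReuse(); println(s"with reuse: scans = $scans")       // scans = 1
  }
}
```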