Technology • December 19, 2018

When Rotten Tomatoes Isn't Enough: Twitter Sentiment Analysis with DSE Part 3

Amanda Moran

from cassandra.cluster import Cluster
cluster =

session.set_keyspace('dseanalyticsdemo')

for emotion in positiveNegative:

searchTerms = [searchTermSad, searchTermPos]

# Code from: https://stackoverflow.com/questions/33404752/removing-emojis-from-a-string-in-python
def cleanUpTweet(tweet):

cleanTweet=noretweet

return cleanTweet
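As a rough illustration of the kind of cleanup this cell performs, here is a minimal sketch using only Python's `re` module. The function name `clean_up_tweet_sketch` and the three patterns are assumptions made for illustration; the notebook's actual `cleanUpTweet` follows the Stack Overflow answer linked above and may differ in what it strips.

```python
import re

# Hypothetical patterns for this sketch; the notebook's exact rules are not shown here.
URL_PATTERN = re.compile(r"https?://\S+")
RETWEET_PATTERN = re.compile(r"^RT\s+@\w+:\s*")
NON_ASCII_PATTERN = re.compile(r"[^\x00-\x7F]+")


def clean_up_tweet_sketch(tweet):
    """Strip a leading retweet marker, URLs, and non-ASCII characters such as emojis."""
    no_retweet = RETWEET_PATTERN.sub("", tweet)
    no_urls = URL_PATTERN.sub("", no_retweet)
    ascii_only = NON_ASCII_PATTERN.sub("", no_urls)
    # Collapse the whitespace runs left behind by the removals.
    return " ".join(ascii_only.split())


raw = "RT @moviefan: Loved it \U0001F60D https://t.co/abc123"
print(clean_up_tweet_sketch(raw))  # Loved it
```

Dropping every non-ASCII character is blunt: it removes emojis, but it also removes accented letters, so a production version would likely want a narrower emoji pattern.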

access_token = os.environ['ACCESS_TOKEN']
access_token_secret = os.environ['ACCESS_TOKEN_SECRET']

api = tweepy.API(auth)
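Reading a key with `os.environ['ACCESS_TOKEN']` raises a bare `KeyError` if the variable was never exported. One optional way to get a clearer message is to check all required names up front. The helper `read_required_env` below is hypothetical and not part of the notebook; it only uses the two variable names that appear in this cell.

```python
import os


def read_required_env(names, environ=os.environ):
    """Return a dict of the requested variables, or raise listing the missing ones."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Usage with a stand-in mapping, so the sketch runs without real credentials.
example_env = {"ACCESS_TOKEN": "example-token", "ACCESS_TOKEN_SECRET": "example-secret"}
creds = read_required_env(["ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"], environ=example_env)
print(sorted(creds))
```

In the notebook itself you would call it without the `environ` argument so that it reads the real process environment.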

countTokens = udf(lambda words: len(words), IntegerType())

spark = SparkSession.builder.appName('demo').master("local").getOrCreate()

dfPos = tokenizedPos.select("tweet", "tweetwords").withColumn("tokens", countTokens(col("tweetwords")))

showDF(dfPos)
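To make the `tokens` column concrete, here is a plain-Python approximation of what happens to a single tweet: lowercase it, split it on whitespace, and count the pieces. The helper names are made up for this sketch, and Spark's `Tokenizer` may treat repeated whitespace differently from Python's `str.split()`.

```python
def tokenize_sketch(tweet):
    """Lowercase the tweet and split it on whitespace."""
    return tweet.lower().split()


def count_tokens(words):
    """Same idea as the countTokens UDF: the number of words in the list."""
    return len(words)


words = tokenize_sketch("I loved this movie")
print(words)                # ['i', 'loved', 'this', 'movie']
print(count_tokens(words))  # 4
```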

removerPos = StopWordsRemover(inputCol="tweetwords", outputCol="tweetnostopwords")
removedPos = removerPos.transform(dfPos)

dfPosStop = removedPos.select("tweet", "tweetwords", "tweetnostopwords").withColumn("tokens", countTokens(col("tweetwords"))).withColumn("notokens", countTokens(col("tweetnostopwords")))

showDF(dfPosStop)

removerSad = StopWordsRemover(inputCol="tweetwords", outputCol="tweetnostopwords")
removedSad = removerSad.transform(dfSad)

dfSadStop = removedSad.select("tweet", "tweetwords", "tweetnostopwords").withColumn("tokens", countTokens(col("tweetwords"))).withColumn("notokens", countTokens(col("tweetnostopwords")))

showDF(dfSadStop)
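The effect of removing stop words on the two count columns can be sketched without Spark. The stop list below is a tiny hand-picked sample, an assumption for this example only; it is not Spark's default English list, and `remove_stop_words_sketch` is a hypothetical helper.

```python
# Assumed sample stop list for illustration; Spark ships a much longer default list.
SAMPLE_STOP_WORDS = frozenset({"i", "the", "a", "this", "is", "it", "was", "and", "to", "of"})


def remove_stop_words_sketch(words, stop_words=SAMPLE_STOP_WORDS):
    """Keep only the words that are not in the stop list (case-insensitive)."""
    return [word for word in words if word.lower() not in stop_words]


tweet_words = ["i", "loved", "this", "movie"]
no_stop_words = remove_stop_words_sketch(tweet_words)
print(len(tweet_words), len(no_stop_words))  # 4 2
print(no_stop_words)                         # ['loved', 'movie']
```

Comparing the two lengths mirrors the `tokens` and `notokens` columns: the gap between them is the number of stop words that were dropped.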

<img alt="StopWordsRemover" data-entity-type="file" data-entity-uuid="d16a7d5f-8041-4d38-8cb9-c65cf6787ce0" src="https://www.datastax.com/sites/default/files/inline-images/Screen%20Shot%202018-12-17%20at%204.25.27%20PM.png" />

labels = ['Original Tweet', 'Sentiment Score', 'Positive', 'Assessments']

positiveTweetScores = pandas.DataFrame.from_records(poslist, columns=labels)
positiveTweetScores

<img alt="Sentiment Analysis using Python package Pattern" data-entity-type="file" data-entity-uuid="44ea780f-c684-41c5-bc67-93385c3fe683" src="https://www.datastax.com/sites/default/files/inline-images/Screen%20Shot%202018-12-17%20at%204.26.16%20PM.png" />

posrating = movieScore/(dfPos.count() - countPos)

display(Markdown('**{}** \n{}'.format("Positive Rating Average Score", posrating)))

display(Markdown('**{}** \n{}'.format("Negative Rating Average Score", sadrating)))
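The averages displayed here divide a summed score by the number of tweets that actually contributed to it. A minimal sketch of that arithmetic follows, assuming (and this is only an assumption about what `countPos` tracks) that tweets scored exactly 0 are left out of the denominator. The function name is hypothetical, and unlike the one-line division it returns `None` rather than failing when no tweet has a non-zero score.

```python
def average_sentiment_sketch(scores):
    """Average the non-zero scores; return None when every score is zero or the list is empty."""
    total = 0.0
    skipped = 0
    for score in scores:
        if score == 0:
            skipped += 1  # assumed to play the role of countPos
        else:
            total += score
    counted = len(scores) - skipped
    if counted == 0:
        return None
    return total / counted


print(average_sentiment_sketch([0.5, 0.0, 0.25, 0.75]))  # 0.5
```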

People Like This Movie!


JUMP TO SECTION

What Problem Are We Trying to Solve?

How Are We Going to Solve It?

Let's Walk Through the Notebook Cell by Cell

Cell 1: When Rotten Tomatoes Isn't Enough: Twitter Sentiment Analysis with DSE

Cell 2: Things to Set Up (more detail on this in Part 1 and Part 2)

Cell 3 and 4: Adding some Environment Variables

Cell 5: Importing Packages

Cell 6: Display Formatting Helper Function

Cell 7: Connect to DSE Analytics Cluster

Cell 8: Create Demo Keyspace

Cell 9: Set keyspace

Cell 10: Set Movie Title variable (change this to search for different movies!)

Cell 11: Create two tables in Apache Cassandra for the movie title

Cell 12: Setting up Search Terms

Cell 13: Function to Clean Up Each Tweet before it is inserted into DSE

Cell 14: Required Twitter Keys and Authorization

Cell 15: Pull Tweets from Twitter, Clean Up Tweets, Insert into DSE

Cell 16: Verify with select *

Cell 17: Create Spark Session and create DataFrames from DSE tables

Cell 18: Use Tokenizer to break up the sentences into individual words

Cell 19: Use StopWordsRemover to remove all stop words

Cell 20 and 21: Sentiment Analysis using Python package Pattern

Cell 22: Alright! Should I see this movie???
