Technology | December 19, 2018

When Rotten Tomatoes Isn't Enough: Twitter Sentiment Analysis with DSE Part 3

Amanda Moran
from cassandra.cluster import Cluster
cluster =
session.set_keyspace('dseanalyticsdemo')
for emotion in positiveNegative:
searchTerms = [searchTermSad, searchTermPos]
# Code from: https://stackoverflow.com/questions/33404752/removing-emojis-from-a-string-in-python
def cleanUpTweet(tweet):
cleanTweet=noretweet
return cleanTweet
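The body of `cleanUpTweet` is only partly shown above, so the following is a minimal sketch of what such a cleanup step might look like, not the original implementation. It assumes the function strips emojis (in the spirit of the linked Stack Overflow answer) and a leading `RT @user:` retweet marker, which the variable name `noretweet` hints at. The name `clean_up_tweet_sketch` and the exact emoji ranges are hypothetical.

```python
import re

# Emoji and pictograph ranges, in the spirit of the linked Stack Overflow answer.
# The exact ranges are an assumption and are not exhaustive.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicator symbols (flags)
    "]+",
    flags=re.UNICODE,
)


def clean_up_tweet_sketch(tweet):
    """Hypothetical stand-in for cleanUpTweet: strip emojis and a retweet prefix."""
    no_emoji = EMOJI_PATTERN.sub("", tweet)
    # Drop a leading "RT @user:" prefix, if present (assumed behavior).
    no_retweet = re.sub(r"^RT\s+@\w+:\s*", "", no_emoji)
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", no_retweet).strip()
```

Calling `clean_up_tweet_sketch("RT @fan: Loved it 😀 so much")` would return the text without the retweet prefix or the emoji, ready to be stored or tokenized.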
access_token = os.environ['ACCESS_TOKEN']
access_token_secret = os.environ['ACCESS_TOKEN_SECRET']
api = tweepy.API(auth)
countTokens = udf(lambda words: len(words), IntegerType())
spark = SparkSession.builder.appName('demo').master("local").getOrCreate()
dfPos = tokenizedPos.select("tweet", "tweetwords").withColumn("tokens", countTokens(col("tweetwords")))
showDF(dfPos)
removerPos = StopWordsRemover(inputCol="tweetwords", outputCol="tweetnostopwords")
removedPos = removerPos.transform(dfPos)
dfPosStop = (
    removedPos.select("tweet", "tweetwords", "tweetnostopwords")
    .withColumn("tokens", countTokens(col("tweetwords")))
    .withColumn("notokens", countTokens(col("tweetnostopwords")))
)
showDF(dfPosStop)
removerSad = StopWordsRemover(inputCol="tweetwords", outputCol="tweetnostopwords")
removedSad = removerSad.transform(dfSad)
dfSadStop = (
    removedSad.select("tweet", "tweetwords", "tweetnostopwords")
    .withColumn("tokens", countTokens(col("tweetwords")))
    .withColumn("notokens", countTokens(col("tweetnostopwords")))
)
showDF(dfSadStop)
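To make the `tokens` and `notokens` columns concrete, here is a plain-Python sketch of the same idea without Spark: split a tweet into words, drop stop words, and count both lists. The stop-word list below is a tiny illustrative assumption (Spark's `StopWordsRemover` uses a much longer default English list), and the helper names are hypothetical.

```python
# A tiny illustrative stop-word list. This is an assumption: Spark's
# StopWordsRemover ships a much longer default English list.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "was", "i", "to", "of", "and"}


def tokenize(tweet):
    """Lowercase and split on whitespace, roughly like a simple tokenizer."""
    return tweet.lower().split()


def remove_stop_words(words):
    """Keep only the words that are not in STOP_WORDS."""
    return [w for w in words if w not in STOP_WORDS]


def token_counts(tweet):
    """Return (tokens, notokens): word counts before and after stop-word removal."""
    words = tokenize(tweet)
    return len(words), len(remove_stop_words(words))
```

For a tweet such as "This movie was a delight", `token_counts` reports five words before removal and two after, which mirrors how `notokens` should never exceed `tokens` in the DataFrames above.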
[Image: StopWordsRemover]
labels = ['Original Tweet', 'Sentiment Score', 'Positive', 'Assessments']
positiveTweetScores = pandas.DataFrame.from_records(poslist, columns=labels)
positiveTweetScores
[Image: Sentiment Analysis using Python package Pattern]
posrating = movieScore/(dfPos.count() - countPos)
display(Markdown('**{}** \n{}'.format("Positive Rating Average Score", posrating)))
display(Markdown('**{}** \n{}'.format("Negative Rating Average Score", sadrating)))
People Like This Movie!
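The averaging step `movieScore/(dfPos.count() - countPos)` can be sketched in plain Python as follows. This is an interpretation, not the original code: it assumes `movieScore` is the sum of the per-tweet sentiment scores and `countPos` is the number of tweets left out of the average, taken here to be tweets with a score of exactly 0.0. The function name `average_rating` is hypothetical, and the zero-division guard is an addition.

```python
def average_rating(scores):
    """Average the non-zero sentiment scores.

    Assumption: a score of exactly 0.0 marks a tweet that carried no
    sentiment signal, so it is left out of the denominator.
    """
    total = sum(scores)                            # plays the role of movieScore
    skipped = sum(1 for s in scores if s == 0.0)   # plays the role of countPos
    scored = len(scores) - skipped                 # dfPos.count() - countPos
    if scored == 0:
        return 0.0  # avoid dividing by zero when nothing was scored
    return total / scored
```

Excluding unscored tweets keeps a large batch of neutral tweets from pulling the average toward zero, which seems to be the intent of subtracting `countPos` in the denominator.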