The role of Sentiment Analysis in Social Media Monitoring:

I’ve posted tutorial/resources about the Technical Side of Sentiment Analysis on this Blog. Here are the Links, if you need them:

LingPipe (Java Based) | Python | R language resource | Microsoft’s Tool “Social Analytics

Apart from this, I’ve used other Tools per project requirements and It’s been fun designing and developing projects on “Sentiment Analysis” primarily using Social Media Monitoring. Having worked with clients on projects that use “Sentiment Analysis” – I reflected about the role of Sentiment Analysis in Social Media Monitoring. And in this blog post, I am sharing these reflections:

What is Social Media Monitoring?

Social Media Monitoring is a process of “monitoring” conversations happening on social media channels about your brand/company.

Is it NEW? Not really. The idea of monitoring or gathering data about what is being talked about the brand/company is not new. Earlier, it was newspapers and magazine-articles and now, it’s the social media channels including online news, forums and blogs and thus the name given to this process is “Social Media Monitoring”

brand monitoring social media

What is Sentiment Analysis?

Analyzing data to categorize it under a “sentiment” (emotion).

Example. Is this review saying positive, negative or neutral thing about our product.

sentiment analysis positive negative neutral

side-note: Sentiment analysis is often categorized under “Big Data Analytics”.

What’s the Role of Sentiment Analysis in Social Media Monitoring?

We’ve seen that in social media monitoring, we gather all online conversations about a brand/product/company. Now wouldn’t it be great to take the data that we have and bucket it under “Positive”, “Negative” or “Neutral” categories for further analysis?

So few questions that can be answered after we have results from sentiment analysis:

1) Are people happy or sad about our product?

2) What do they like about our product?

3) What do they hate about our service?

4) Is there a trend or seasonality in sentiment data?

Among other business insights that may be not be easily answerable with just plain text data.

Thus sentiment analysis is one of the step in social media monitoring that assists in analyzing sentiment of all the conversations happening on the social web about a brand/product.

That’s about this for this post. Here’s a related post: Three Data Collection Tips for Social Media Analytics

your comments are very welcome!

Data Reporting ≠ Data Analysis

One of the key thing I’ve learned is importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:

data analysis and data reporting

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis” then you need reports. But you do not have to necessarily do data analysis if you want to do data reporting.

From a process standpoint, Here’s how you can visualize Data Reporting and Data Analysis:

data analysis and data reporting process

Let’s thing about this for a moment: Why do we need “analysis”?

We need it because TOOLS are really great at generating data reports. But it requires a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of seeing the data points and translating them into business insights is core of what is Data Analysis. Here’s how it looks visually:

Data analysis Data Reporting

Note after performing data analysis, we have information like Trends and Insights, Action items or Recommendations, Estimated impact on business that creates business value.

Conclusion:

Data Reporting ≠ Data Analysis

Quick Note about SAS’s acroynm SEMMA:

SEMMA is an acronym introduced by SAS which stands for:

Sample, Explore, Modify, Model and Assess.

I had recently posted about the Data Mining & Knowledge Discovery Process which had following sequential steps:

Raw Data => cleaning => sampling => Modeling => Testing

SEMMA follows the similar sequential steps as we had seen in the data mining process. So while Data Mining process is applicable to any data mining tool out their, SEMMA helps when you use SAS enterprise miner. In fact, it has helped me quickly find the data mining functions available in SAS tool:

sas sample explore modify model assess

Examples of Machine Generated Data from “Big Data” perspective:

I just researched about Machine Generated Data from the context of “Big data”, Here’s the list I compiled:

- Data sent from Satellites

- Temperature sensing devices

- Flood Detection/Sensing devices

- web logs

- location data

- Data collected by Toll sensors (context: Road Toll)

- Phone call records

- Financial

And a Futuristic one:

Imagine sensors on human bodies that continuously “monitor” health. How about if we use them to detect diabetes/cancer/other-diseases in their early phases. Possible? May be!

Interesting Fact:

Machine can generate data “faster” than humans. This characteristics makes it interesting to think about to analyze machine generate data and in some cases, how to analyze them in real-time or near real-time

Ending Note:

Search for Machine Generated Data, you’ll be able to find much more, it’s worth reading about from the context of Big Data.

Thanks:

http://www.dbms2.com/2010/04/08/machine-generated-data-example/

http://en.wikipedia.org/wiki/Machine-generated_data

http://tdwi.org/articles/2012/10/23/machine-generated-big-data.aspx

Data visualization: Cost of Hard Drive storage space

Here are the visualization:

1982 – 2009:

1982 2009 storage cost

2000 – 2008

2000 2008 storage cost

I grabbed data from: http://www.mkomo.com/cost-per-gigabyte And http://ns1758.ca/winch/winchest.html – Thanks!

Conclusion

Storage cost has drastically decreased. Mathematically, Storage cost has decreased exponentially. No wonder we can store lot’s of data for few dollars and no wonder that the age of Big Data has already arrived!

How to start Analyzing Twitter Data w/ R?

Over the past few weeks, I have posted notes about Analyzing Twitter Data w/ R, listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

Microsoft® HDInsight Preview for Windows: How to create a directory in Hadoop File System?

In this post, we’ll see how to create a directory in the Hadoop File System for HDInsight’s windows version.

Here are the steps:

1. You have the Microsoft® HDInsight Preview for Windows Installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7

2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Let’s start the Hadoop Command Line (can you see the Icon on the Desktop? Yes? Great! Open that!)

4. Here the command to create a directory looks like:

hadoop fs -mkdir /user/data/input

The above command creates /user/data/input

5. Let’s verify that the input directory was created under /user/data

hadoop fs -ls /user/data

hadoop file system list files in a directory create directory

Conclusion:
In this post, we saw how to create a directory in Hadoop (on windows) file system and also we saw how to list files/directory using the -ls command.

Related Articles:

 

Neologism is the new challenge for IT professionals, Here’s why:

What is Neologism?

Neologism means The coining or use of new words – And I believe it’s one of the challenge faced by IT professionals. Nowadays, we put our time & energy trying to get head around “new terms/words/trends”.

Let’s take couple of example(s):

Sometime back, we had cloud computing. Nowadays, its Big Data; In my mind – Big Data has been coined to mean following technologies/techniques under different contexts:

Big Data Unstrucutred External Text Public Data

Note: The above image is just for illustration purpose. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And Neologism is challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what does it “Exactly” mean

2) It means different things in different context

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving ball. The definition used today will change in future. So we always need a “working” definition for these terms.

Now, Don’t get me wrong, It’s fun trying to figure out what does it all mean and trying to gauge whether it matters to me and my organization or not! What do you think – as a Person in Information Technology, do you think that Neologism is one of the challenges faced by us? consider leaving a reply in the comment section!

Related Articles:

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

See what went into building WATSON, an advanced machine learning & natural language processing system powered by Big Data!

Do you know about Jeopardy! quiz show where a computer named Watson was able to beat world champions? No! Go watch it! Yes? Nice! Isn’t it a feat as grand as the one achieved by Deep blue (chess computer); if not less?

I am always interested in how such advanced computers was built. In case of Watson, It’s fascinating how technologies such as Natural language processing, machine learning & artificial intelligence backed by massive compute & storage power was able to beat two human world champions. And as a person interested in analytic’s and Big Data – I would classify this technology under Big Data and Advanced Data Analytics where computer analyzes lots of data to answer a question asked in a natural language. It also uses advanced machine learning algorithms. To that end, If you’re interested in getting an overview of what went into building WATSON, watch this:

If you’re as amazed as I am, considering sharing what amazed you about this technology via comment section:

There’s been a growing interest in Hadoop & Big Data, Here’s the Proof:

I like to keep an eye on Technology Trends. One of the ways I do that is by subscribing to leading magazines for articles – I may not always read the entire article but I definitely read the headlines to see what Industry is talking about. during last 12 months or so I have seen a lot of buzz around Big Data and I thought to myself – It would be nice to see a Trend line for Big Data. Taking it a step further, I am also interested in seeing if there is a correlation between growing trend in “Hadoop” and “Big Data”. Also, I wanted to see how it compares with the Terms like Business Intelligence and Data Science. With this, I turned to Google Trends to quickly create a Trend report to see the results.

Here’s the report:

Big Data Hadoop Business Intelligence

Here are some observations:

1) There’s a correlation between Trend of Big Data and Hadoop. In fact, it looks like growing interest in Hadoop fueled interest in “Big Data”.

2) Trend line of Big Data and Hadoop overtook that of Business Intelligence in Oct 2012 and sep 2012 respectively.

3) Decline in Trend line of Business Intelligence.

4) There seems to be a steady increase in Trend line for Business Analytics and Data Science.

And Here’s the Google Trend report URL: http://www.google.com/trends/explore#q=Big%20Data%2C%20Hadoop%2C%20Business%20Intelligence%2C%20Business%20Analytics%2C%20Data%20Science&cmpt=q

What do you think about these trends?