Data Analysis and In Memory Technologies, let’s connect the dots:

SPEED is one of the important aspect of Data Analysis. Wouldn’t it be great if you query a data source, you get your answers as soon as possible? Yes? Right! Of course, it depends on factors like the size of the data you are trying to query but wouldn’t it be great if it’s at “SPEED OF THOUGHT“?

So Here’s the Problem:

Databases are mostly disk based and so the bottleneck here is the speed at which can get access to data off the disks.

So what can you do?

Let’s put data in a RAM (memory) because data-access via memory is faster.

If it’s sounds so easy, why didn’t people do it earlier? And why are we talking about “In Memory” NOW?

1) BIGGER Data Size/sets and so today with more data, it takes more time to query data from databases. And so researchers have looked at other approaches. One of the effective approach they found is: In-memory

(And I am not ignoring the advances in Database Technologies like Parallel databases, But for the purpose of understanding “Why In-memory”, it’s important to realize the growing size of data sets and a viable alternative we have to tackle the problem: In memory. And also I am not saying that it’s the ONLY way to go. I am just trying to understand the significance of in-memory technologies. We, as data professionals, have lot’s of choices! And only after evaluating project requirements, we can talk about tools and techniques)

2)  PRICE of Memory: Was the price of RAM/memory higher than what it is today? So even though it was a great idea to put data in memory, it was cost-prohibitive.

So Let’s connect the dots: Data Analysis + In Memory Technologies:

What’s common between Microsoft’s PowerPivot, SAP HANA, Tableau and Qlikview?

1) Tools for Data-Analysis/Business-Intelligence 2) Their Back End data architecture is “In Memory”

So since Data Analysis needs SPEED and In-Memory Technologies solves this need – Data Analysis and Business Intelligence Tools adopted “In-memory” as their back-end data architecture. And next time, when you hear a vendor saying “in-memory”, you don’t have to get confused about what they’re trying to say. They’re just saying that we got you covered by giving you ability to query your data at “speed of thought” via our In-memory technologies so that you can go back to your (data) analysis.

That’s about it for this post. Here’s a related post: What’s the benefit of columnar databases?

your comments are very welcome!

The role of Sentiment Analysis in Social Media Monitoring:

I’ve posted tutorial/resources about the Technical Side of Sentiment Analysis on this Blog. Here are the Links, if you need them:

LingPipe (Java Based) | Python | R language resource | Microsoft’s Tool “Social Analytics

Apart from this, I’ve used other Tools per project requirements and It’s been fun designing and developing projects on “Sentiment Analysis” primarily using Social Media Monitoring. Having worked with clients on projects that use “Sentiment Analysis” – I reflected about the role of Sentiment Analysis in Social Media Monitoring. And in this blog post, I am sharing these reflections:

What is Social Media Monitoring?

Social Media Monitoring is a process of “monitoring” conversations happening on social media channels about your brand/company.

Is it NEW? Not really. The idea of monitoring or gathering data about what is being talked about the brand/company is not new. Earlier, it was newspapers and magazine-articles and now, it’s the social media channels including online news, forums and blogs and thus the name given to this process is “Social Media Monitoring”

brand monitoring social media

What is Sentiment Analysis?

Analyzing data to categorize it under a “sentiment” (emotion).

Example. Is this review saying positive, negative or neutral thing about our product.

sentiment analysis positive negative neutral

side-note: Sentiment analysis is often categorized under “Big Data Analytics”.

What’s the Role of Sentiment Analysis in Social Media Monitoring?

We’ve seen that in social media monitoring, we gather all online conversations about a brand/product/company. Now wouldn’t it be great to take the data that we have and bucket it under “Positive”, “Negative” or “Neutral” categories for further analysis?

So few questions that can be answered after we have results from sentiment analysis:

1) Are people happy or sad about our product?

2) What do they like about our product?

3) What do they hate about our service?

4) Is there a trend or seasonality in sentiment data?

Among other business insights that may be not be easily answerable with just plain text data.

Thus sentiment analysis is one of the step in social media monitoring that assists in analyzing sentiment of all the conversations happening on the social web about a brand/product.

That’s about this for this post. Here’s a related post: Three Data Collection Tips for Social Media Analytics

your comments are very welcome!

Unpivoting data using the data explorer preview for Excel 2010/2013:

Introduction:

Data Explorer add-in is amazing! It’s helps you: combine, find and re-shape your data in Excel 2010/2013. I’ve blogged about: 1) How to merge Table Data and 2) How to clean duplicate data and now in this blog post, I want to share a step-by-step on Unpivoting data using the Data Explorer add-in.

Before we begin, If you haven’t downloaded and installed the data explorer add-in for Excel 2010 & 2013, you can find Information about it here: http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx 

Problem:

What is un-pivoting? I hear you ask. Instead of explaining it, let me share an Image:

data explorer unpivot excel

BTW, the above data is from my Facebook Page Insights.

So our problem statement is (please refer to above Image): we are given table blue and we need to output table green. In other words, we need to Unpivot the data.

Solution:

Here are the steps:

1) Open Excel, Open Data Explorer add-in. And Connect to your data. Wait when you see the Query Editor.

2) (Optional) In the Query Editor, Rename the query. I renamed it to “Unpivot Data”. And this how my query editor looks:

data explorer unpivot excel 2

3) Now, Select the columns that need to be unpivoted > Right Click > Unpivot Column

Note that I’ve selected all columns that I want to UnPivot:

data explorer unpivot excel 3

4) You’ll see the updated results in the query editor window. I renamed the columns “Attribute” to “Age and Gender” and “value” to “reach”. If you want to rename the columns, select the column > Right click > rename.

data explorer unpivot excel 4

If everything looks OK, click on Done in the bottom right corner

5) There you have it, Unpivoted data in Excel 2010/2013 using Data Explorer add-in!

And then its super easy to create charts, Here’s one I created after I had unpivoted the data:

data explorer unpivot excel 5

Insight: For my blog, my Target Audience seems to Male between the age of 18-24 and then 25-34.

FYI: The Date Range of the Data Set of 1st Jan 2013 – 25th Apr 2013.

That’s about it for this post, Here are some Related articles:

Your comments are very welcome!

 

Merging/Joining datasets in Excel using Data Explorer add-in

Problem:

Merging/Joining/Combining data-sets in Excel has not been an easy task. There are third-party add-ins that makes it easy but out of the box, excel didn’t have an easy way to merge/join table data. But now with the Data Explorer add-in, we have an add-in that let’s us merge/join data in excel w/ few clicks.

If you haven’t downloaded and installed the data explorer add-in for Excel 2010 & 2013, you can find Information about it here: http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx 

Situation:

Input is Table 1 & Table 2. The output we need is merged Table.

Table 1:

Date Daily New number Month
1/1/2012 0 1
1/2/2012 0 1
1/3/2012 0 1
1/4/2012 0 1
1/5/2012 0 1
1/6/2012 0 1
1/7/2012 0 1
1/8/2012 0 1
1/9/2012 0 1

………………………………..

Table 2:

Month Month Name
1 January
2 February
3 March
4 April
5 May
6 June
7 July

Merged Table:

Date Daily New number Month Month Name
1/1/2012 0 1 January
1/2/2012 0 1 January
1/3/2012 0 1 January
1/4/2012 0 1 January
1/5/2012 0 1 January
1/6/2012 0 1 January
1/7/2012 0 1 January
1/8/2012 0 1 January
1/9/2012 0 1 January

………………………………

Solution:

Let’s see how data explorer can help us Join/Merge Table 1 & Table 2.

1) create query that connects to Table 1 & Table 2.

data sources explorer excel

2) Once you have queries that connect to the tables need to be merged, then click on Merge

3) Once you click on Merge, you’ll see a dialog:

Here you need to configure three things:

a) First Table

b) Second Table

c) Columns that will be used to merge/join data

In this case, this is how my merge dialog looks:

merge join excel data explorer

4) Once configured correctly, click on OK. You’ll see a dialog box where you can configure the output of the merged table. click on the new column to see the options that are available to you to configure the output of the merged table:

merge join excel data explorer 2

5) In this case, I’ve selected just one column month name that needs to be merged. You can also explore the aggregate tab in case you’ve numbers that needs merging.

merge join excel table data explorer 3

6) This is how the output looks:

merge join excel table data explorer 4

7) Rename the new column.

Select the new column > Right Click > Rename

8) Click Done if it looks OK.

9) The merged data is now available to you in Excel!

And one can analyze it!

Let’s see before and after. Note that instead of month numbers, we now have month names

merged data join table visualized excel 3

In this post, we saw how to merge/join/combine data from two different sources in Excel 2010.

Remove Duplicates in Excel Tables using Data Explorer Add-in:

In this blog post, we’ll see how you can remove duplicated and clean data in excel tables using Data Explorer Add-in.

Problem:

Our Excel Table has following Data:

Month Month Name
1 January
1 January
1 January
2 February
2 February
3 March

And we want to remove duplicates to make the data-set look like this:

Month Month Name
1 January
2 February
3 March

 

In real world data-sets, we wouldn’t have few rows but lot’s of rows and doing it manually wouldn’t be the wisest option. With that in mind, let’s look for a few-clicks solution that can help us remove duplicates.

Solution:

If you haven’t already, download the Data Explorer add-in preview available for Excel 2010 & 2013. It can do a lot more than removing duplicates – it’s a great add-in and it’ll save you lots of time especially if your job involves discovering, cleaning and combining data for analysis purposes. After you’re done installing the add-in, use the steps below to remove duplicates in an excel column:

1. Open Data in Excel. Switch to Data Explorer Tab

2. For the purpose of the demo, I am assuming that you already have the data in excel file. If not, you can connect to other sources via the add-in.

3. Data Explorer add-in > Excel Data> From Table

data explorer excel remove duplicates

4. After you’ve clicked on the From Table, a query editor will pop up:

excel data explorer query editor

5. Select both columns

(you can select both columns by: select first column > hold down the ctrl key and then click on second column)

6. Right click > Remove Duplicates

data explorer remove duplicates excel

7. click on done if you see that the duplicates have been removed correctly

data explorer excel remove duplicates 2

Conclusion:

In this blog post, we saw how to remove duplicates and clean data in Excel using the Data Explorer Preview add-in.

If you’ve not downloaded and installed the data explorer add-in for Excel 2010 & 2013, you can find Information about it here: http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx 

Note:

1) URL to download the add-in may change in future

2) The steps that I described may also change because as of today the ad-in is in “preview” stage and things may change in future.

PASS Business Analytics Conference Keynote Day #1

In this post, I’ll summarize the PASS Business Analytics Conference’s Keynote Day #1:

The structure of the Keynote:

PASSt Business Analytics Conference

One of the NEW challenges that Data Pros face today is complexity involved in building a BI solution. Following slides nicely represent the challenge from the Tools standpoint:

pass business analytics conference keynote hadoop

Image Courtesy: https://twitter.com/SQLGal/status/322342662013321216

Microsoft’s Goal is to SIMPLIFY the above situation

NEW Tools:

> Data Explorer (Excel add-in)

> Power View in Excel 2013

> Geo Flow

Key Take away from the demo’s was:

Power View is a great tool that you could use to extract insights from data.

E.g. Insights about Music Charts from Germany:

Now combine the power of Power View w/ the new capabilities like Data Explorer that let’s you find, combine & refine data via Data Explorer.

In the Demo, they combined data in hadoop w/ data in relational sources. This is Powerful!

And Also

The Preview for GeoFLow in Excel was announced!

They had a great demo on a pretty big touch device:

GEO FLOW For EXcel

Sorry for the poor image – but imagine a touch device of that size w/ an interactive data visualization that has 3D geo maps!

Conclusion:

They had a nice message at the end of the keynote:

 

An auto-refreshing Word Document that fetches data numbers from Excel:

In this post, I’ll show you how you can create a word document that:

1) Gets numbers from Excel

2) auto refreshes when the data gets changed/updated in Excel

Note: I am using Excel 2010 & Word 2010 for the purpose of this blog.

Here are the steps:

1. Scenario:

Let’s say we have a line in word that says “The revenue for year _____ was $ ______”

and we want to fetch data numbers for Year and revenue from Excel.

2. Let’s go!

here’s the data in Excel:

excel data numbers

3. In excel; copy A2 that has value 2012

4. switch to WORD:

now this is IMPORTANT. do NOT just paste it.

move your cursor to the destination position > right-click > paste options > Link and Merge formatting 

link and merge formatting

There’s also an option of link and keep source formatting, this is how it looks:

link and keep source formatting

5. repeat the steps for revenue

6. Close Word.

7. Open Excel and change the value of year to 2011 and revenue to 4000

8 Now open Word 2010, say yes if a dialog box pops up

And here you go:

linked word file to excel data

Conclusion:

In this post, we saw how to create an auto-refreshing word document that fetches data numbers from Excel.

How to add secondary axis in a chart in Excel 2010?

In this post, I’ll show you how you can add secondary axis in a chart in Excel 2010:

First up, Why do you need secondary axis?

Look at the Data and the chart that I just created:

creating a chart from Data excel 2010

Notice something? Column A is not “visible” – Is it? So Now let’s see the effect how adding a Secondary Axes:

secondary axis in a chart in excel 2010

Looks better? Here are the steps:

1. Select the chart

2. can you see charts tool options in the menu bar? Yes? Great!

3. Switch to Format Tab

4. select the column that you want to put on a secondary axes:

select column in format area data excel chart

5. After selecting the column, click on Format Selection

6. In the dialog box, select secondary axes:

secondary axis excel 2010 chartclick on close.

7. Can you see the secondary axis now? Great!

secondary axis in a chart in excel 2010

8. Let’s do one more thing!

Let’s change the chart type of the secondary axis. Here are the steps:

a. Make sure the column is selected from the format Tab

b. Switch to the Design Tab under the charts Tool

c. Click on change chart type and select the chart of your choice. I select a column chart and this is how it looks:

more than one chart type on a single chart in excel 2010

Conclusion:

In this post, we saw how to add a secondary axis. We also saw how to change the chart type of the secondary axis.

Resource: 12 recorded sessions from the 24hop business analytics edition are online! #passbac #msbi

Recently, PASS hosted a 24hop business analytics event:

And now, the 12 one hour sessions ranging from data visualization, predictive analytics to Big Data are online for you to watch! They also serve as “Trailer” for what you can expect at the PASS Business Analytics conference!

Here’s the URL: http://passbaconference.com/Sessions/SneakPeeks.aspx

And I was following some of these sessions live on the event day – and I can tell you, these sessions are great resources!

Also, I participated in the twitter contest (by Microsoft BI) that was happening along w/ the event – and this is what I got for my win!

24 hop twitter contest prize

hoodie w/ embedded earphones!

That’s about it for this post. Enjoy the recordings!

Resource: A great tutorial for Hadoop on local windows and Azure.

Here’s the resource: http://gettingstarted.hadooponazure.com/gettingStarted.html > “HDInsight Jumpstart”

The Tutorial will teach you how to analyze log files using Hadoop Tools like MapReduce, Hive, SQooP – check it out! It works with both HDInsight for local windows as well as Hadoop on Azure:

HDInsight hadoop on windows starting guide tutorial

Conclusion:

I hope this resource helps you get started on building an end-to-end solution with Hadoop on Windows/Azure.