How conditional formatting in Excel can help you save time answering business questions

Visual analytics is amazing: it helps “data enthusiasts” save time answering questions with data. Let’s look at one example. For this blog post, I’ll show how to do it in Excel 2010:

Problem:

Here’s the business question: What were the sales of Tea in the North region in 2012 Q1?

Here’s the data:

SALES DATA (2012 Q1) | East | West | Central | North | South
Coffee | $7,348.00 | $7,238.00 | $1,543.00 | $9,837.00 | $1,823.00
Tea | $9,572.00 | $8,235.00 | $3,057.00 | $8,934.00 | $13,814.00
Herbal Tea | $5,782.00 | $8,941.00 | $9,235.00 | $392.00 | $1,268.00
Espresso | $9,012.00 | $2,590.00 | $4,289.00 | $7,848.00 | $340.00

So it’s easy to read the answer straight off the data: $8,934

But let me CHANGE the business question:

WHICH Products in WHAT regions are doing the best?

Now this question is not as easy as the previous one. Why? Because you’ll have to go through each number manually, one by one, to answer it. Now imagine a bigger data set: it’ll take even more time.

Solution

What can Excel power users and data enthusiasts do to answer the new business question efficiently? Well, let’s see what conditional formatting can do:

Excel Visual Analytics Conditional formatting

Now, with Data Bars, it’s easier to just glance at the report and see the best-performing products and regions. For instance, it’s very easy to spot that Tea in the South is the best performer across all products and regions.

So how do you create data bars?

1. Select the data

2. Home > Conditional Formatting > Data Bars

Excel Visual Analytics Conditional formatting 2

3. Done! You’ll see this:

Excel Visual Analytics Conditional formatting

4. You can play with the other options here to see what best suits your needs. I just wanted to point out that there is a way to highlight the data that helps you save time answering business questions.
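If you’d rather answer the “WHICH products in WHAT regions” question with code, here is a minimal Python sketch. It uses the 2012 Q1 numbers from the table in this post; the variable names and the text-based bars are my own illustration of the general idea (bar length proportional to the value), not how Excel draws Data Bars internally.

```python
# Sales data (2012 Q1) copied from the table in this post.
regions = ["East", "West", "Central", "North", "South"]
sales = {
    "Coffee":     [7348, 7238, 1543, 9837, 1823],
    "Tea":        [9572, 8235, 3057, 8934, 13814],
    "Herbal Tea": [5782, 8941, 9235, 392, 1268],
    "Espresso":   [9012, 2590, 4289, 7848, 340],
}

# Flatten to (product, region, amount) so every cell can be ranked.
cells = [
    (product, region, amount)
    for product, row in sales.items()
    for region, amount in zip(regions, row)
]

# Scale every bar against the largest value in the table.
largest = max(amount for _, _, amount in cells)
BAR_WIDTH = 30  # characters used for the longest bar (arbitrary choice)


def bar(amount):
    """Return a text bar whose length is proportional to amount."""
    return "#" * round(BAR_WIDTH * amount / largest)


# Print the five best product/region combinations, best first.
top_cells = sorted(cells, key=lambda cell: cell[2], reverse=True)[:5]
for product, region, amount in top_cells:
    print(f"{product:<10} {region:<8} ${amount:>9,.2f} {bar(amount)}")
```

The first row printed is the best-performing product/region pair, and the bars give the same at-a-glance comparison as the formatted sheet.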

Conclusion:

Visual analytics is a great way to analyze data quickly. In most cases, the human brain is much faster at interpreting visual results than text or numbers, so why not use that to your advantage? And tools like Excel have built-in functionality to help you do that!

Tableau: Data Cleaning for Geographic Maps

Data cleaning is a major part of any analytics or data-visualization undertaking. If data cleaning is ignored, it leads to inaccurate reporting and thus suboptimal business decisions.

To that end, if you create a geographic map in Tableau, please check the accuracy of your data by going to:

Menu Bar > Map > Edit Locations

Let me give you some examples:

I had “State/Province” as the geographic role for my variable, and when I created a geographic map, it didn’t show anything for New York State! See before:

data cleaning geographic map before

So what did I do?

I navigated to Menu Bar > Map > Edit Locations:

data cleaning geographic map State

So I fixed it!

data cleaning geographic map Tableau

And after:

data cleaning geographic map after

Note that New York State is now lit up!

In the past, I’ve also entered latitude and longitude when needed. That was when Tableau could not recognize a few US cities and flagged them as “ambiguous”, so I entered the latitude and longitude to clean the data:

data cleaning geographic map city

Conclusion:

In this post, I described how you should check the data accuracy of a Tableau Geographic Map.

Three Data Visualizations I liked this week:

I have been working on creating dashboards for one of my projects. As part of the research, I looked at a few dashboards out there on the interwebs. Here are three that I liked:

1. Social Media & Sentiment Analysis:

What I like about this Dashboard is the creative use of Data via Sentiment Analysis:

sentiment analysis social media dashboard

2. Microsoft Research’s Viral Search Project:

What a creative way to visualize viral content!

visualize viral social network data microsoft viral search

3. Social Media Analytics Dashboard:

Nice one-page social dashboard!

social media analytics dashboard

Do you see the bottom-right part of the report that shows engagement levels by post type? If you want to compute it yourself, here’s my blog post on that: Social Media Analytics. Facebook Page Smackdown: Status updates vs Images?

 

Found something interesting by exploring a “List of companies by revenue” Data Set:

I like exploring data sets to find interesting patterns in them. To that end, I was exploring the “List of companies by revenue” data set, and I added a column to calculate Revenue/Employee:

And I found an outlier!

Here’s the outlier: Exor

Here’s why it’s interesting:

Its revenue in 2012 was 109.15 billion USD

And its number of employees is just 40!

Just think of the Revenue/Employee!

To put things in perspective, let’s compare that with its neighbors in the data set:

Rank | Company | Industry | Revenue in USD billion | Employees
48 | Koch Industries | Conglomerate | 110.00 | 60,000
49 | EXOR | Investment | 109.15 | 40
50 | Cardinal Health | Pharmaceuticals | 107.55 | 40,000
51 | CVS Caremark | Retail | 107.10 | 202,000
52 | IBM | Computer services | 106.92 | 433,362
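To make the Revenue/Employee column concrete, here is a small Python sketch that recomputes it for the five rows listed here. The function and variable names are my own; the numbers are the ones from the table.

```python
# Rows copied from the table: (rank, company, revenue in USD billion, employees).
companies = [
    (48, "Koch Industries", 110.00, 60_000),
    (49, "EXOR", 109.15, 40),
    (50, "Cardinal Health", 107.55, 40_000),
    (51, "CVS Caremark", 107.10, 202_000),
    (52, "IBM", 106.92, 433_362),
]


def revenue_per_employee_musd(revenue_billion, employees):
    """Revenue per employee, expressed in USD million."""
    return revenue_billion * 1_000 / employees


ratios = {
    name: revenue_per_employee_musd(revenue, employees)
    for _, name, revenue, employees in companies
}

# Print the companies from highest to lowest revenue per employee.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} {ratio:>10,.2f} USD million per employee")

# The company with the highest ratio is the outlier discussed in this post.
outlier = max(ratios, key=ratios.get)
```

With roughly the same revenue as its neighbors but only 40 employees, EXOR ends up more than a thousand times higher than any of the other four rows on this ratio.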

I spotted this by quickly creating a data visualization to explore the data set:

list of companies by revenue

And removing Trafigura, Vitol and Exor, this is what we have:

power view excel 2013 rank revenue employees

Observation: the oil and gas industry has a relatively high revenue/employee ratio.

That’s about it for this post. Thanks for reading about my data exploration!

Data visualization: List of largest IT companies in the world

I was going through the list of the largest IT companies in the world, and I thought it would be great to see it visually! So here it goes:

(created using Power View in Excel 2013)

power view scatter chart excel 2013

Configuration of Scatter Plot:

fields list power view scatter chart excel 2013

Some observations:

- Foxconn has a low revenue/employee ratio (my guess is that they employ a lot of workers at low cost for their electronics manufacturing plants)

- Samsung is ranked number 1 and Apple is ranked number 2, but Apple has a better revenue/employee ratio. Also, Apple’s market cap (represented by the size of the bubble) is greater than Samsung’s

- There’s a cluster comprising Microsoft, Google, Amazon, etc., and one more cluster of HP, Panasonic and IBM

And here’s the Data that I’ve used:

Rank | Company | Revenue (USD Billion) | Employees | Market cap
1 | Samsung Electronics | $188.10 | 221,726 | $200
2 | Apple | $156.50 | 76,100 | $427.62
3 | HP | $120.30 | 331,800 | $32.46
4 | Foxconn | $119 | 1,230,000 | $27.20
5 | IBM | $104.50 | 433,362 | $229.45
6 | Panasonic | $99.65 | 327,512 | $22.70
7 | Microsoft | $73.72 | 94,000 | $231.03
8 | Dell | $62.07 | 106,700 | $22.97
9 | Amazon.com | $61.09 | 88,400 | $120.03
10 | Fujitsu | $54.46 | 173,155 | $125.83
11 | Intel | $53.34 | 104,700 | $105.26
12 | Google | $50.17 | 53,546 | $248.31
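If you want to check the revenue/employee observations without a chart, here is a quick Python sketch that ranks the twelve companies by revenue per employee. The names in the code are my own; the figures are the revenue and employee columns from the table.

```python
# (company, revenue in USD billion, employees) copied from the table.
it_companies = [
    ("Samsung Electronics", 188.10, 221_726),
    ("Apple", 156.50, 76_100),
    ("HP", 120.30, 331_800),
    ("Foxconn", 119.00, 1_230_000),
    ("IBM", 104.50, 433_362),
    ("Panasonic", 99.65, 327_512),
    ("Microsoft", 73.72, 94_000),
    ("Dell", 62.07, 106_700),
    ("Amazon.com", 61.09, 88_400),
    ("Fujitsu", 54.46, 173_155),
    ("Intel", 53.34, 104_700),
    ("Google", 50.17, 53_546),
]

# Revenue per employee in plain USD.
revenue_per_employee = {
    name: revenue_billion * 1e9 / employees
    for name, revenue_billion, employees in it_companies
}

# Company names ordered from highest to lowest revenue per employee.
ranked = sorted(revenue_per_employee, key=revenue_per_employee.get, reverse=True)

for name in ranked:
    print(f"{name:<20} ${revenue_per_employee[name]:>12,.0f} per employee")
```

On these numbers Foxconn comes out last and Apple comes out ahead of Samsung, which matches what the scatter plot shows.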

 

Data visualization: Cost of Hard Drive storage space

Here are the visualizations:

1982 – 2009:

1982 2009 storage cost

2000 – 2008

2000 2008 storage cost

I grabbed data from: http://www.mkomo.com/cost-per-gigabyte And http://ns1758.ca/winch/winchest.html – Thanks!

Conclusion

Storage cost has decreased drastically. Mathematically, it has decreased exponentially. No wonder we can store lots of data for a few dollars, and no wonder the age of Big Data has already arrived!
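“Exponentially” has a concrete meaning here: if cost follows something like C(t) = C0 * exp(-k * t), then the logarithm of the cost plotted against the year is a straight line. Here is a minimal Python sketch of how you could check that on a cost-per-gigabyte series. Please note the series in the code is a synthetic placeholder generated from a formula (cost halving every two years), NOT the data from the sites above; swap in the real numbers to test the claim.

```python
import math


def fit_exponential(years, costs):
    """Least-squares fit of log(cost) = intercept + slope * year.

    Returns (slope, intercept). A negative slope means exponential decay.
    """
    n = len(years)
    logs = [math.log(cost) for cost in costs]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder series (NOT real data): cost halves every 2 years.
years = list(range(2000, 2009))
costs = [10.0 * 0.5 ** ((year - 2000) / 2) for year in years]

slope, intercept = fit_exponential(years, costs)

# Time it takes for the cost to drop by half, derived from the fitted slope.
halving_time = math.log(2) / -slope
print(f"cost halves roughly every {halving_time:.1f} years")
```

If the points of log(cost) against year sit close to the fitted line, “exponential” is a fair description; the halving time is then an easy number to quote.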

How to start Analyzing Twitter Data w/ R?

Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. I’m listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

Sentiment Analysis in R w/ Twitter data feeds

I followed instructions on this site to perform sentiment analysis about Starbucks from Twitter data feeds.

Here are the data visualizations:

1. Sentiment Analysis: Starbucks on Twitter

sentiment analysis starbucks on twitter

2. Comparison cloud:

comparison cloud data visualization
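To give a rough idea of how a simple word-list (lexicon) approach scores sentiment, here is a minimal sketch. It is written in Python rather than R, it is not necessarily the method used in the instructions I followed, and both the word lists and the sample tweets are tiny made-up placeholders, not real Twitter data.

```python
import re

# Tiny hypothetical word lists, for illustration only.
# Real sentiment lexicons contain thousands of words.
POSITIVE = {"love", "great", "good", "delicious", "awesome"}
NEGATIVE = {"hate", "bad", "awful", "terrible", "burnt"}


def sentiment_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# Made-up sample tweets (not real Twitter data).
tweets = [
    "I love the new latte, great way to start the day",
    "Awful service today and the coffee tasted burnt",
    "Waiting in line for coffee",
]

scores = [sentiment_score(tweet) for tweet in tweets]
for tweet, score in zip(tweets, scores):
    print(f"{score:+d}  {tweet}")
```

A score above zero counts as positive, below zero as negative, and zero as neutral; a histogram of such scores is one way to get a chart like the first visualization.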

That’s about it for this post. Here are some related tutorials:

If you want to install R on a Windows machine, here’s a tutorial: http://parasdoshi.com/2012/11/13/lets-install-r-rstudio-on-windows-machine/

If you want to try out Hadoop on Windows, Hive and the Hive Excel add-in w/ Twitter data, here’s a tutorial: http://parasdoshi.com/2012/11/16/how-to-load-twitter-data-into-hadoop-on-azure-cluster-and-then-analyze-it-via-hive-add-in-for-excel/

If you want to grab Twitter search data using R and export it to a tab-delimited file, here’s a tutorial: http://parasdoshi.com/2012/11/24/grab-twitter-search-data-using-r-and-export-to-a-tab-delimited-file/

Playing w/ the Occupational Employment Statistics Data-Set:

I found some data-sets on Occupational Employment Statistics on the Bureau of Labor Statistics site and played with them to see if I could find something interesting:

A few things about the data & visualizations that I am going to share:

  • US only
  • I downloaded the national-level data, but there’s also state-level data available if you’re interested in drilling down.
  • The reports that you see were created after I got a chance to “clean” the data-set a bit and created a data model that suited basic reporting on top of it.
  • For this blog post, I am going to play w/ May 2010 & 2011 data
  • With the help of the original data-set, you can drill down to get statistics about a particular job category if you want. For this blog post, I am going to share visualizations at the job-category level.
  • Click on the images to see a higher-resolution version.

With that, Here are some visualizations:

1) Job Category VS mean hourly salary:

1 Job category vs hourly salary mean bureau of labour statistics

2) Job Category VS number of employees:

2 Job category vs number of employees bureau of labour statistics

3) Scatter Plot:

X Axis: Number of employees

Y Axis: Wage (Mean Hourly Salary May 2011)

Size of Bubble: Wage (Mean Hourly Salary May 2011)

*Note: This may not be the best way to build the scatter plot, as I have used the same value (Mean Hourly Salary May 2011) twice. But since I was just playing w/ it, I went with what I had in the model.

Here’s the visualization:

3 scatter plot number of employees vs mean hourly wage may 2011 employment statistics

Some of the things I observed:

1) I belong to an industry (Computer and Mathematical occupations) which has a relatively high mean hourly wage.

2) Few people work in “farming, fishing & forestry occupations”, and they do not get paid much.

3) Lots of people work in “office administrative support occupations”, and they do not get paid much either.

4) Management occupations, legal occupations and computer & mathematical occupations have relatively high mean hourly wages.

Conclusion:

In this post, I played w/ Occupational Employment statistics data-sets and shared some visualizations.

In 2012, people from 162 countries visited ParasDoshi.com!

Visualizing data is powerful! Thanks to WordPress.com for sending me the 2012 report. A statistic that I found very encouraging was that people from 162 countries visited this blog! All thanks to the power of the interwebs!

Thanks, everyone, for the support. I appreciate it!

And here’s a beautiful data visualization:

162 countries blog visitors paras doshi 2012