Visual analytics is amazing – it helps “data enthusiasts” save time in answering questions using Data. Let’s see one such example. For the purpose of the blog post, I am going to show how to do it in Excel 2010:
Here’s the Business Question: What was sales of Tea in North Region in 2012 Q1
Here’s the data:
|SALES DATA(2012 Q1)
|| $ 7,348.00
|| $ 7,238.00
|| $ 1,543.00
|| $ 9,837.00
|| $ 1,823.00
|| $ 9,572.00
|| $ 8,235.00
|| $ 3,057.00
|| $ 8,934.00
|| $ 13,814.00
|| $ 5,782.00
|| $ 8,941.00
|| $ 9,235.00
|| $ 392.00
|| $ 1,268.00
|| $ 9,012.00
|| $ 2,590.00
|| $ 4,289.00
|| $ 7,848.00
|| $ 340.00
So it’s easy to give out answer using the data: $8934
But let me CHANGE the business question:
WHICH Products in WHAT regions are doing the best?
Now this questions is not as easy as the previous one? WHY? because you’ll have to manually go through each number in a linear fashion to answer the question. Now imagine a bigger data-set. It’ll take even more time.
What can Excel Power users and Data Enthusiasts do to answer the new business question in an efficient way? Well, let’s see what conditional formatting can do it:
Now with the Data Bars, it’s easier to just glance at the report and see best performing products and regions. For instance, it’s very easy to spot that Tea is performing best in South among all products and region.
So how do you create data bars?
1. Select the data
2. Home > Conditional Formatting > Data Bars
3.Done! you’ll see this:
4. You can play with other options here to see what suits the best for your needs. But I just wanted to point out that there is a way for you to highlight the data in a way that helps you save time in answering business questions using data
Visual analytics is a great way to quickly analyze data. In most cases, Human brain is much faster at interpreting the visual results as oppose to text/numbers – so why not use it to your advantage. And tools like Excel have inbuilt functionality to help you do that!
Data cleaning is a major part of any analytic’s/data-visualization undertaking. If data cleaning is ignored then it leads to inaccurate data reporting & thus suboptimal business decisions.
To that end, if you create a Tableau’s Geographic map, please check the accuracy of your data by going to:
Menu Bar > Map > Edit Locations
Let me give you some examples:
Now, I have “states/province” as my geographic role for the variable and when I created a geographic map, I created a geographic map it didn’t show any state for New York State! See Before:
So what did I do?
I navigated to Menu bar > Map > Edit locations:
So I fixed it!
Note that New York State is lighted up!
In the past, I’ve also have entered Latitude & Longitude if need be. This is when it was not able to recognize few US cities and it was saying “ambiguous” – I inputted Latitude & Longitude to clean the data:
In this post, I described how you should check the data accuracy of a Tableau Geographic Map.
I have been working on creating Dashboards for one of my projects. As a part of the research, I looked at few Dashboards out their on the inter-webs. Here are three of them that I liked:
1. Social Media & Sentiment Analysis:
What I like about this Dashboard is the creative use of Data via Sentiment Analysis:
2. Microsoft Research’s Viral Search Project:
What a creative way to visualize viral content!
3. Social Media analytic’s Dashboard:
Nice one page social dashbaord!
Do you see the bottom right part of the report that shows you engagement levels by post type, if you want to compute it – here’s my blog post on that: Social Media Analytics. Facebook Page Smackdown: Status updates vs Images?
I like exploring data sets to find interesting patterns from them. To that end, I was exploring a data-set: List of companies by revenue and I added a column to calculate Revenue/Employee to explore the dataset:
And I found an outlier!
Here’s the outlier: Exor
Here’s what it’s interesting:
It’s revenue in 2012 is: 109.15 billion USD
And number of employees is just 40!
Just think of Revenue/Employee !
To put things in perspective, Lets Compare that with its neighbor in the data-set:
Rank | Company | Industry | Revenue in USD billion | Employees
I got to know about this by quickly creating a data visualization to explore the data-set:
And removing Trafigura, Vitol and Exor, this is what we have:
Observation: oil and gas industry have relatively higher revenue/employee ration.
That’s about it for this post. Thanks for reading about my data exploration!
I was going through the list of largest IT companies in the world. And I thought, it would be great to see it visually! so here it goes:
[created using Power View in Excel 2013)
Configuration of Scatter Plot:
Some of observations:
- Foxconn has low revenue/employee ratio (I guess, it’s because they must be employing a lot of workers for their electronic manufacturing plant at low cost)
- Samsung is ranked number 1 and Apple is ranked 2. But apple has better revenue/employee ratio. Also Apple’s market cap (represented by Size of bubble) is greater then that of Samsung
- there’s a cluster that comprises of MS. Google, Amazon.. etc Also one more cluster of HP, Panasonic and IBM
And here’s the Data that I’ve used:
||Revenue (USD Billion)
Here are the visualization:
1982 – 2009:
2000 – 2008
I grabbed data from: http://www.mkomo.com/cost-per-gigabyte And http://ns1758.ca/winch/winchest.html – Thanks!
Storage cost has drastically decreased. Mathematically, Storage cost has decreased exponentially. No wonder we can store lot’s of data for few dollars and no wonder that the age of Big Data has already arrived!
Over the past few weeks, I have posted notes about Analyzing Twitter Data w/ R, listing them here:
1. Install R & RStudio
2. R code to download twitter data
3. Perform Sentiment Analysis on Twitter Data (in R)
I followed instructions on this site to perform sentiment analysis about Starbucks from Twitter data feeds.
Here are data visualizations:
1. Sentiment Analysis: Starbucks on Twitter
2. Comparison cloud:
That’s about it for this post, Here are some related tutorials:
If you want to Install R on windows machine, here’s a Tutorial: http://parasdoshi.com/2012/11/13/lets-install-r-rstudio-on-windows-machine/
If you want to try out out Hadoop on windows, Hive and Hive excel add-in w/ Twitter Data, Here’s a Tutorial: http://parasdoshi.com/2012/11/16/how-to-load-twitter-data-into-hadoop-on-azure-cluster-and-then-analyze-it-via-hive-add-in-for-excel/
If you want to Grab Twitter search data using R and export to a tab delimited file. Here’s a tutorial: http://parasdoshi.com/2012/11/24/grab-twitter-search-data-using-r-and-export-to-a-tab-delimited-file/
I found some data-sets on Occupational Employment Statistics on Bureau of Labor Statistics site and I played with it to see if I can find something interesting:
Few things about the data & visualization that I am going to share
- US only
- I downloaded the national level data But there’s also state level data available if you’re interested to drill down.
- The reports that you see where created after I got a chance to “clean” the data-set a bit and created a data model that suited basic reporting on top of it.
- For this blog post, I am going to play w/ May 2010 & 2011 data
- With the help of original data-set, you can drill down to get statistics about a particular Job Category if you want. For this blog-post, I am going to share visualizations that correspond to Job categories.
- click on images to see the higher resolution image.
With that, Here are some visualizations:
1) Job Category VS mean hourly salary:
2) Job Category VS number of employees:
3) Scatter Plot:
X Axis: Number of employees
Y – Axis: Wage (Mean Hourly Salary May 2011)
Size of Bubble: Wage (Mean Hourly Salary May 2011)
*Note: This may not be the best approach to create the Scatter Plot as I have used the same value (Mean Hourly Salary May 2011) twice – But since I was just playing w/ it, I went with what I had in the model.
Here’s the visualization:
Some of the things I observed:
1) I belong to an Industry (Computer and Mathematical occupations) which has relatively higher mean hourly wage.
2) There are few people working in “farming, fishing & forestry occupations” that do not get paid much.
3) There are lots of people working in “office administrative support occupations” that do not get paid much.
4) Management Occupations, Legal Occupations and computer & mathematical occupations have relatively higher mean hourly wages.
In this post, I played w/ Occupational Employment statistics data-sets and shared some visualizations.
Visualizing data is powerful! Thanks to WordPress.com for sending me the 2012′s report – a statistic that I found very encouraging was that people from 162 countries visited this blog! All thanks to the power of Inter webs!
Thanks everyone for the support, Appreciate it!
And Here’s a beautiful Data Visualization: