Top two key techniques to analyze data:

There are many techniques to analyze data. In this post, we’re going to talk about two that are critical for good data analysis: “Benchmarking” and “Segmentation”. Let’s talk a bit more about them:

1. Benchmarking

It means that when you analyze your numbers, you compare them against some point of reference. This helps you quickly add context to your analysis and assess whether a number is good or bad. This is super important! It adds meaning to your data!

Let’s look at an example. The CEO wants to see revenue numbers for 2014, and an analyst is tasked with creating this report. If you were the analyst, which report do you think would resonate more with the CEO? Left or right?

[Image: Benchmarking data providing context in analysis]

I hope the above example helped you understand the importance of providing context with your data.

Now, let’s briefly talk about where you get the data for a benchmark.

There are two main sources: 1) Internal & 2) External

The example that you saw above was using an Internal source as a benchmark.

An example of an external benchmark could be subscribing to industry news/data so that you understand how your business is doing compared to similar businesses. If your business sees a huge spike in sales, you need to know whether it’s just your business or an industry-wide phenomenon. For instance, in Q4 most e-commerce sites see a spike in their sales. They can understand what’s driving it only by looking at industry data and realizing that it’s shopping season!
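As a minimal sketch of an internal benchmark in Python, the snippet below compares one year's number against the previous year and reports the change. The function name `compare_to_benchmark` and the revenue figures are made up for illustration; they are not taken from the report shown above.

```python
def compare_to_benchmark(current, benchmark):
    """Return the absolute and percentage change of `current` versus `benchmark`."""
    if benchmark == 0:
        raise ValueError("benchmark must be non-zero to compute a percentage change")
    change = current - benchmark
    pct_change = change / benchmark * 100
    return change, pct_change


# Made-up revenue figures, used only to illustrate the comparison.
revenue_2013 = 5_000_000
revenue_2014 = 6_000_000

change, pct_change = compare_to_benchmark(revenue_2014, revenue_2013)
print(f"2014 revenue: ${revenue_2014:,} ({pct_change:+.1f}% vs. 2013)")
```

The same function works for an external benchmark: pass an industry figure as `benchmark` instead of last year's number.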

Now, let’s shift gears and talk about technique #2: Segmentation.

2. Segmentation

Segmentation means that you break your data into categories (a.k.a. segments) for analysis. So why do you want to do that? Looking at the data at an aggregated level is certainly helpful and helps you figure out the direction for your analysis, but the real magic and the most powerful insights usually come from analyzing the segments (or subsets of the data).

Let’s look at an example.

Let’s say the CEO of a company looks at profitability numbers. He sees $6.5M, which is $1M greater than last year’s number, so that’s great news, right? But does that mean everything is fine and there’s no scope for optimization? That can only be found out by segmenting the data, so he asks his analyst to look into it. After some experimentation and interviews with business leaders, the analyst finds an interesting insight by segmenting the data by customer and sales channel: even though the company is profitable, there is a huge opportunity to optimize profitability for customer segment #1 across all sales channels (especially channel #1, where there’s a $2M+ loss!). Here’s a visual:

[Image: Segmentation data, improving profitability of low-margin service offerings by customer]
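The same idea can be sketched in a few lines of Python. The records below are made up (they are not the numbers behind the visual), and `profit_by` is a hypothetical helper name. The point is only that a healthy total can hide a segment that loses money.

```python
from collections import defaultdict

# Made-up profit records (in $M); each row is one customer segment / sales channel pair.
records = [
    {"segment": "Segment 1", "channel": "Channel 1", "profit": -2.1},
    {"segment": "Segment 1", "channel": "Channel 2", "profit": -0.4},
    {"segment": "Segment 2", "channel": "Channel 1", "profit": 4.0},
    {"segment": "Segment 2", "channel": "Channel 2", "profit": 5.0},
]


def profit_by(records, *keys):
    """Sum profit for each distinct combination of the given keys."""
    totals = defaultdict(float)
    for row in records:
        totals[tuple(row[k] for k in keys)] += row["profit"]
    return dict(totals)


total = sum(row["profit"] for row in records)
print(f"Total profit: {total:.1f}M")  # the aggregate alone looks healthy

# Segmenting surfaces the combinations that lose money.
for group, profit in sorted(profit_by(records, "segment", "channel").items()):
    flag = "  <-- loss" if profit < 0 else ""
    print(f"{group}: {profit:.1f}M{flag}")
```

Calling `profit_by(records, "segment")` or `profit_by(records, "channel")` gives the one-dimensional cuts, which is usually where the experimentation starts.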

I hope that helps to show that segmentation is a very important technique in data analysis!

Conclusion:

In this post, we saw the benchmarking and segmentation techniques that you can apply in your daily data analysis tasks!

Answering a question using data: Are marketers around the globe shifting their dollars to digital ads?

According to the data shared by eMarketer, the traditional ad market is reaching saturation in five major economies (US, China, UK, Japan, Germany), while the digital ad market will see steady growth in some economies and explosive growth in the US and China. Even so, the traditional ad market will remain much bigger than the digital one in the US, while in China the digital ad market will overtake traditional ads in 2017.

So to answer the question: marketers are not decreasing their existing budgets for traditional ad channels, but the additional marketing budget dollars seem to be directed to the digital ad market.

Very interesting data-set, I encourage you to play with it!

Thanks Avinash Kaushik for sharing this interesting tool.

I was playing with the data using Excel & Tableau, here’s a public workbook if you’re interested: https://public.tableausoftware.com/profile/paras.doshi#!/vizhome/WorldWideAdSpend/Dashboard-DigitalAdSpendvsTraditionalAdSpend

[Image: YoY growth - Digital Ad Spend vs Traditional Ad Spend]

Now, it’s your turn! What insights do you get from this data?

#sqlpass webinar: “Data Analytics Explained for Business Leaders” on 1/15

A quick blog post to let you know about a #sqlpass webinar on 1/15.

Data Analytics Explained for Business Leaders

Thu, Jan 15 2015 12:00 (UTC-05:00) Eastern Time (US & Canada)

RSVP: http://bit.ly/PASSBAVC011515


Abstract:

The world is becoming more efficient. Today, seventy percent of the companies that graced the Fortune 1000 list a mere decade ago have vanished. Agility and survival are a function of innovation, culture, and the ability to predict the future. To that end, data analytics offers a lifeline, a means of survival that will drive productivity and continue to disrupt and redefine business. However, the resources available to today’s business leaders sit on two vastly different ends of the spectrum: on the one hand, highly technical academic resources, and on the other, largely fluffy overviews of value propositions and potentials. The state of the industry shouldn’t be surprising. The same dynamics played out in the early years of the internet. Software providers, technical leaders, and consulting firms greatly benefit from mystifying the world of data analytics into something that is incomprehensible. That lack of conceptual understanding is incredibly risky and propels the cost of analytics initiatives upwards. This webcast aims to bridge that gap between the technical data scientists and business leaders. Ultimately, this understanding will help to:

- Connect the strategic goals of business leaders with the capabilities of technical advisers
- Focus investments and initiatives within analytics and technology
- Distill immensely complex subject matter into comprehensible examples
- Accelerate the path to value and increase the ROI of analytics initiatives


Speaker Bio

Alex is a Predictive Analytics Architect in the Oil and Gas industry with a passion for distilling complexity into insights and evangelizing data science. His work has been featured on KDNuggets and he was recognized by DataScienceCentral as a top 180 blogger in 2014.

RSVP: http://bit.ly/PASSBAVC011515

I hope to see you there!

Example of using segmentation to identify low-margin service offerings:

Problem:

Need advanced data analytics techniques to analyze profitability data

Solution:

Here’s an example of how customer segmentation helped identify some low-margin service offerings:

[Image: Improve profitability of low-margin service offerings by customer]

Business Intelligence system – Customer Complaints – B2B company:

Analyzing customer complaints is crucial for customer service and sales teams. It helps them increase customer loyalty and fix quality issues. To that end, here’s a mockup:

Note: Drill-down reports are not shown, details are hidden to maintain confidentiality, and the numbers are made up.

[Image: Customer complaint dashboard, quality feedback]

How to get descriptive statistics in Excel?

Problem:

You are analyzing a dataset, and before modeling/analyzing you need to generate descriptive statistics on a field. You have the data loaded in Excel and wonder if there’s a way to do that in Excel.

Solution:

There’s an out-of-the-box solution that lets you generate descriptive statistics on a field. Here are the steps:

Note: for the purpose of this blog post, I am using Excel 2013, but the Analysis ToolPak is available in Excel 2007+.

1. Activate the “Data Analysis” ToolPak.

Follow these steps: File > Options > Add-ins > Manage: Excel Add-ins > “Go”

[Image: Excel data analysis toolpak]

2. Make sure to check the “Analysis ToolPak” checkbox.

3. Now you should see a “Data Analysis” option under the “Data” tab:

[Image: Excel Data Analysis Descriptive Statistics]

4. Now click on “Data Analysis” and select one of the following options:

Anova, Correlation, Covariance, Descriptive Statistics, Exponential Smoothing, F-Test Two-Sample for Variances, Fourier Analysis, Histogram, Moving Average, Random Number Generation, Rank and Percentile, Regression, Sampling, t-Test, z-Test.

In this case, let’s go with Descriptive Statistics, but you can see that you can perform other tasks as well.

5. Once you click on Descriptive Statistics, a dialog box will show up and you will have to enter some details, such as the input range to generate descriptive statistics for. Once you have filled in the details, click OK and Excel should generate the descriptive statistics for you!
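If you want to sanity-check the numbers outside Excel, a roughly similar summary can be produced with Python's standard `statistics` module. This is a minimal sketch with made-up sample values. It covers only a handful of common measures and does not try to reproduce the ToolPak's full output.

```python
import statistics

# Made-up sample values standing in for the Excel input range.
values = [12, 15, 11, 19, 15, 22, 14]

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "sample_std_dev": statistics.stdev(values),      # sample standard deviation (n - 1)
    "sample_variance": statistics.variance(values),  # sample variance (n - 1)
    "minimum": min(values),
    "maximum": max(values),
    "range": max(values) - min(values),
    "sum": sum(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```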

I hope that helps!

Conclusion:

In this post, we saw how to generate descriptive statistics in Microsoft Excel.

Author: Paras Doshi

Back to basics: Design your Business Intelligence system to have lowest level data even if it’s not asked!

Here’s a scenario:

A Business Intelligence (BI) system for Sales is being developed at a company. Here are the events that occur:

1) Based on the requirements, it is documented that the business needs to analyze sales numbers by product, month, customer and employee.

2) While designing the system, IT learns that the data is stored at the invoice level, but since the requirements document doesn’t say anything about having details down to the invoice level, they decide to aggregate the data before bringing it into their system.

3) They develop the BI system within the time frame and send it to the business for data validation.

4) Business analysts start looking at the BI system and find some numbers that don’t look right for a few products. They need to see the invoices for those products to make sure that the data is right, so they ask IT to give them invoice-level data.

5) IT realizes that even though the business had not explicitly requested invoice-level data, they do NEED the lowest level data! They realize it’s crucial for passing data validation. Also, they talk with their business analysts and find out that the analysts may sometimes need to drill down to the lowest level data to find insights that may be hidden at the aggregate level.

6) So IT decides to rework their solution. This increases the timeline and budget set for the project. Not only that, they have also lost the opportunity to gain the confidence of the business by missing the budget and timeline.

7) They learn to “Design BI system to have the lowest level data even if it’s not asked!” and decide to never make this mistake again!

To sum up: include the lowest level data in your BI system even if it’s not explicitly requested. This will save you time and build your credibility as a Business Intelligence developer/architect.
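A small Python sketch shows why the grain matters. The invoice rows, field names and helper functions below are all made up for illustration. Aggregates can always be derived from invoice-level rows, but the invoices behind a number can only be inspected if those rows were kept.

```python
from collections import defaultdict

# Made-up invoice-level rows: the lowest grain available in the source system.
invoices = [
    {"invoice_id": "INV-001", "product": "Widget", "month": "2014-01", "amount": 120.0},
    {"invoice_id": "INV-002", "product": "Widget", "month": "2014-01", "amount": 80.0},
    {"invoice_id": "INV-003", "product": "Gadget", "month": "2014-01", "amount": 200.0},
]


def sales_by_product_month(rows):
    """Aggregate invoice rows up to the (product, month) level used in reports."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["product"], row["month"])] += row["amount"]
    return dict(totals)


def drill_down(rows, product, month):
    """Return the invoices behind one aggregated number, for validation."""
    return [r for r in rows if r["product"] == product and r["month"] == month]


summary = sales_by_product_month(invoices)
print(summary[("Widget", "2014-01")])
print([r["invoice_id"] for r in drill_down(invoices, "Widget", "2014-01")])
```

If only `summary` had been loaded into the BI system, `drill_down` would have nothing to work with, which is the situation IT runs into in step 4.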

Business Intelligence Dashboard for Quality Managers

Business Goal:

Need to understand the patterns in Quality test results data across all plants.

Summary:

- The solution involved creating a Business Intelligence system that gathered data from multiple plants. I was involved in mentoring the IT team, and in the development and end-user training of a Business Intelligence Dashboard that used SQL Server Analysis Services as its data source.

- Dashboard development involved multiple checkpoint meetings with business leaders, since this was the first time they had a chance to visualize quality test results data consolidated from multiple plants. Since they were new to data visualization, I prepared in advance and created 3-4 relevant visualization templates to kick off each meeting.

Mockup:

(It is intended to look generic since I can’t discuss details. Also, drill-down capabilities were added to the dashboard to go down to the lowest granularity if needed.)

[Image: Quality Test Results Dashboard]

Back to basics: Continuous vs. Discrete variables and their importance in Data Visualization.

Take a look at the following chart. Do you see any issues with it?

[Image: Month trend line chart]

Notice that the month values are shown as “distinct” (discrete) values instead of as “continuous” values, which misleads the person looking at the chart. Agree? Great! You already know by instinct what continuous and discrete values are; we just need to label what you already know.

In the example used above, the “Date & Time” shown as “Sales Date” is a continuous value, since you can never state the “exact” time that an event occurred: 1/1/2008, 22 hours, 15 minutes, 7 seconds, 5 milliseconds…and it goes on. It’s continuous.

But let’s say you wanted to see Number of Units Sold vs. Product Name. Now that’s countable, isn’t it? You can say that we sold 150 units of Product X and 250 units of Product Y. In this case, Units Sold is a discrete value.

The chart shown above was treating Sales Date as discrete values and hence causing confusion. Let’s fix it, now that you know the difference between continuous and discrete variables:

[Image: Sales Date treated as a continuous variable]
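One common way a date ends up being treated as discrete is when it is stored as text. The Python sketch below assumes that cause (the post does not say what produced the first chart) and uses made-up month labels and sales values. Sorted as strings the months come out in the wrong order; parsed into dates they sit correctly on a continuous time axis.

```python
from datetime import datetime

# Made-up month labels stored as text ("month/year"), plus made-up sales values.
sales_by_month = {"10/2008": 140, "1/2008": 100, "11/2008": 150, "2/2008": 120}

# Sorting the labels as strings treats them as unrelated categories.
as_text = sorted(sales_by_month)

# Parsing them into dates puts them in chronological order on a time axis.
as_dates = sorted(datetime.strptime(label, "%m/%Y") for label in sales_by_month)

print(as_text)
print([f"{d.month}/{d.year}" for d in as_dates])
```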

Conclusion:

To develop effective data visualizations, it’s important to understand the data types of your data. In this post, you saw the difference between continuous and discrete variables and their importance in data visualization.

SQL Server Analysis Services – How to set the order by attribute sort key?

Problem:

How do you sort a dimension attribute by something other than its key or name column? How do you set the “OrderBy” property?

Example: You have created inventory age buckets 1-50, 51-100 and 101-150. If a business user uses this dimension attribute, the sorting won’t be logical: it would be 1-50, 101-150, 51-100. So how do you show the buckets in the logical order?

Solution:

1. Make sure that the table/view that you are bringing in has the sort key.

Example:

[Image: SSAS attribute order by sort key]

2. Now, switch to SSAS and open your dimension. I am assuming that you’ve already configured your data source view and that you are already bringing these columns into the dimension:

[Image: Dim Inventory in the SSAS data source view]

3. Let’s start by hiding the Aging Bucket Sort Key attribute so that it’s not visible to users. Change its AttributeHierarchyVisible property to False.

4. Now, switch to Attribute Relationships. Right-click on Aging Bucket and click on New Attribute Relationship. Then set the attribute relationship between Aging Bucket and Aging Bucket Sort Key.

[Image: Attribute Relationships SSAS]

And you should see something like this in your attribute relationship section:

[Image: SSAS Attribute Relationship Sort Key]

5. Now, one more thing to configure. Go back to the Dimension Structure section. Open the properties for the Aging Bucket attribute and change the OrderBy property to AttributeKey. Also, change the OrderByAttribute property to Aging Bucket Sort Key (in your case, choose the sort key that you have).

[Image: SSAS Order Sort by attribute property]

That’s it! After you process the model, you should see the attribute sorted by the sort key.
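To see why the sort key is needed, here is a small Python illustration of the ordering logic only. It is not SSAS configuration, and the field names `aging_bucket` and `sort_key` are made up. Sorting the bucket labels as text reproduces the illogical order from the problem statement, while sorting by a numeric key gives the order users expect.

```python
# Aging buckets paired with a numeric sort key, mirroring the idea of a sort-key column.
buckets = [
    {"aging_bucket": "101-150", "sort_key": 3},
    {"aging_bucket": "1-50", "sort_key": 1},
    {"aging_bucket": "51-100", "sort_key": 2},
]

# Ordering by the label compares text character by character.
by_name = sorted(b["aging_bucket"] for b in buckets)

# Ordering by the sort key gives the logical bucket order.
by_sort_key = [b["aging_bucket"] for b in sorted(buckets, key=lambda b: b["sort_key"])]

print(by_name)
print(by_sort_key)
```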

Conclusion:

In this post, you saw how to configure sort/order property of a dimension attribute.