Need advanced data analytics techniques to analyze profitability data
Here’s an example of how customer segmentation helped identify some low margin service offerings:
Here’s a scenario: you are analyzing a dataset, and before modeling/analyzing you need to generate descriptive statistics on a field. You have the data loaded in Excel and wonder if there’s a way to do that in Excel.
There’s an out-of-the-box solution that will support your need to generate descriptive statistics on a field. Here are the steps:
Note: for the purpose of this blog post, I am using Excel 2013, but the Data Analysis ToolPak is available in Excel 2007 and later.
1. Activate the “Data Analysis” ToolPak.
Follow these steps: File > Options > Add-ins > Manage: Excel Add-ins > “Go”
2. Make sure to check the “Analysis ToolPak” checkbox.
3. Now you should see a “Data Analysis” option under the “Data” tab:
4. Now click on “Data Analysis” and select one of the following options:
Anova, Correlation, Covariance, Descriptive Statistics, Exponential Smoothing, F-Test Two-Sample for Variances, Fourier Analysis, Histogram, Moving Average, Random Number Generation, Rank and Percentile, Regression, Sampling, t-Test, z-Test.
In this case, let’s go with Descriptive Statistics, but as you can see, you can perform other tasks as well.
5. Once you click on Descriptive Statistics, a dialog box will show up where you’ll enter some details, like the input range, to generate descriptive statistics. Once you have filled in the data needed, click OK and it should generate the descriptive statistics for you in Excel!
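If you’d rather script the same summary outside Excel, here’s a minimal Python sketch that computes the core numbers the Descriptive Statistics dialog reports. The field name and sample values below are made up for illustration:

```python
import statistics

# Hypothetical "Units Sold" values pulled from a worksheet column
units_sold = [12, 15, 11, 19, 15, 22, 18, 15, 9, 14]

summary = {
    "Mean": statistics.mean(units_sold),
    "Median": statistics.median(units_sold),
    "Mode": statistics.mode(units_sold),
    # Sample (not population) statistics, matching the ToolPak's defaults
    "Standard Deviation": statistics.stdev(units_sold),
    "Sample Variance": statistics.variance(units_sold),
    "Range": max(units_sold) - min(units_sold),
    "Minimum": min(units_sold),
    "Maximum": max(units_sold),
    "Sum": sum(units_sold),
    "Count": len(units_sold),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

This mirrors the one-column output table the ToolPak produces, so it’s an easy way to double-check Excel’s numbers.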
I hope that helps!
In this post, we saw how to generate descriptive statistics in Microsoft Excel.
Author: Paras Doshi
Here’s a scenario:
A Business Intelligence (BI) system for Sales is being developed at a company. Here are the events that occur:
1) Based on the requirements, it is documented that the business needs to analyze sales numbers by product, month, customer & employee.
2) While designing the system, IT learns that the data is stored at the invoice level, but since the requirements document doesn’t say anything about having details down to the invoice level, they decide to aggregate the data before bringing it into their system.
3) They develop the BI system within the time frame and send it to the business for data validation.
4) Business analysts start looking at the BI system and find some numbers that don’t look right for a few products. They need to see the invoices for those products to make sure the data is right, so they ask IT to give them invoice-level data.
5) IT realizes that even though the business had not requested invoice-level data explicitly, they do NEED the lowest-level data! They realize it’s crucial for passing data validation. Also, they talk with their business analysts and find out that users may sometimes need to drill down to the lowest-level data to find insights that may be hidden at the aggregate level.
6) So IT decides to rework their solution. This increases the timeline & budget set for the project. Not only that, they have lost the opportunity to gain the confidence of the business by missing the budget and timeline.
7) They learn to “design the BI system to have the lowest-level data even if it’s not asked for!” and decide never to make this mistake again.
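The lesson above – store the lowest grain and aggregate on demand – can be sketched in a few lines of Python. The invoice records here are made up, but the shape of the idea is the same:

```python
from collections import defaultdict

# Hypothetical invoice-level records: the lowest grain the source system has
invoices = [
    {"invoice": "INV-001", "product": "X", "month": "2024-01", "amount": 120.0},
    {"invoice": "INV-002", "product": "X", "month": "2024-01", "amount": 80.0},
    {"invoice": "INV-003", "product": "Y", "month": "2024-01", "amount": 200.0},
]

# Aggregate on demand for the report the business asked for...
sales_by_product = defaultdict(float)
for row in invoices:
    sales_by_product[row["product"]] += row["amount"]

# ...but because the invoice grain was preserved, validation can drill
# back down to the individual invoices behind any suspicious number.
def invoices_for(product):
    return [row for row in invoices if row["product"] == product]

print(dict(sales_by_product))
print([r["invoice"] for r in invoices_for("X")])
```

Had the data been aggregated before loading, the second step would be impossible without reworking the pipeline, which is exactly the trap the scenario describes.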
This concludes the post. It’s important to include the lowest-level data in your BI system even if it’s not explicitly requested – this will save you time & build your credibility as a Business Intelligence developer/architect.
Need to understand the patterns in Quality test results data across all plants.
- The solution involved creating a Business Intelligence system that gathered data from multiple plants. I was involved in mentoring the IT team, development, and end-user training for a Business Intelligence dashboard that used SQL Server Analysis Services as its data source.
- Dashboard development involved multiple checkpoint meetings with business leaders since this was the first time they had a chance to visualize quality test results data consolidated from multiple plants. Since they were new to data visualization, I used to prepare in advance and create 3-4 relevant visualization templates to kick off meetings.
(It is intended to look generic since I can’t discuss details. Also, drill-down capabilities had been added to the dashboard to go down to the lowest granularity if needed.)
Take a look at the following chart, do you see any issues with it?
Notice that the month values are shown as “discrete” values instead of “continuous” values, and that misleads the person looking at the chart. Agree? Great! You already know, based on your instincts, what continuous and discrete values are; we just need to label what you already know.
In the example used above, the “Date & Time” shown as “Sales Date” is a continuous value since you can never say the “exact” time the event occurred… 1/1/2008, 22 hours, 15 minutes, 7 seconds, 5 milliseconds… and it goes on… it’s continuous.
But let’s say you wanted to see Number of Units Sold vs. Product Name. Now that’s countable, isn’t it? You can say that we sold 150 units of Product X and 250 units of Product Y. In this case, Units Sold becomes a discrete value.
The chart shown above was treating Sales Date as discrete values and hence causing confusion… let’s fix it now that you know the difference between continuous and discrete variables:
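You can see the same discrete-vs-continuous distinction outside any charting tool. In this small Python sketch (month labels made up), dates treated as text sort in a misleading order, while the same months treated as real date values fall into chronological order, which is how a continuous axis would lay them out:

```python
from datetime import datetime

# Month labels treated as discrete text values sort alphabetically...
labels = ["Apr 2008", "Feb 2008", "Jan 2008", "Mar 2008"]
print(sorted(labels))  # alphabetical: Apr, Feb, Jan, Mar - misleading

# ...while the same months parsed as date values sort chronologically
dates = [datetime.strptime(s, "%b %Y") for s in labels]
print([d.strftime("%b %Y") for d in sorted(dates)])  # Jan, Feb, Mar, Apr
```

A charting tool makes the same choice under the hood: a field typed as text becomes a discrete axis, a field typed as date/time becomes a continuous one.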
To develop effective data visualizations, it’s important to understand the data types of your data. In this post, you saw the difference between continuous and discrete variables and their importance in data visualization.
How do you sort a dimension attribute by something other than the key or name column? How do you set the “OrderBy” property?
Example: You have created inventory age buckets 1-50, 51-100, 101-150, and if a business user uses this dimension attribute, the sorting won’t be logical. It would be 1-50, 101-150, 51-100 – so how do you show the buckets in logical order?
1. Make sure that the table/view you are bringing in has a sort key.
2. Let’s start by hiding the Aging Bucket Sort Key so that it’s not visible to users. Change its AttributeHierarchyVisible property to False.
3. Now, switch to the Attribute Relationships tab. Right-click on Aging Bucket, click New Attribute Relationship, and set the attribute relationship between Aging Bucket and Aging Bucket Sort Key.
And you should see something like this in your attribute relationship section:
4. Now, one more thing to configure. Go back to the Dimension Structure tab, open the Properties pane for the Aging Bucket attribute, and change the OrderBy property to AttributeKey. Also, change the OrderByAttribute property to Aging Bucket Sort Key (in your case, choose the sort key that you have).
That’s it! After you process the model, you should see the attribute sorted based on your sort key.
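The underlying problem and fix are easy to see in a few lines of Python (bucket labels and keys mirror the example above; the column names are illustrative):

```python
# Hypothetical (Aging Bucket, Aging Bucket Sort Key) rows from the dimension table
buckets = [
    {"bucket": "51-100", "sort_key": 2},
    {"bucket": "1-50", "sort_key": 1},
    {"bucket": "101-150", "sort_key": 3},
]

# Sorting by the label alone is alphabetical and looks wrong...
by_name = [b["bucket"] for b in sorted(buckets, key=lambda b: b["bucket"])]
print(by_name)  # ['1-50', '101-150', '51-100']

# ...sorting by the numeric key column gives the logical order, which is
# what setting OrderBy = AttributeKey achieves for the SSAS attribute
by_key = [b["bucket"] for b in sorted(buckets, key=lambda b: b["sort_key"])]
print(by_key)  # ['1-50', '51-100', '101-150']
```

The sort key does for the attribute exactly what the `sort_key` field does here: it gives the engine something numeric to order by instead of the text label.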
In this post, you saw how to configure sort/order property of a dimension attribute.
In part 1, I wrote about why it is important to enable business users to create their own BI reports. In this post, part 2, I am going to share the pre-training preparations before you start training. I’ve classified them into three categories: 1) Data 2) Tools 3) Culture. Let’s go through them:
Data assets need to be published before you start training. There should be a business-friendly analytic layer on top of your data sources. It could be something as simple as a Power Pivot model, or a SQL Server Analysis Services cube. As long as you have an analytics layer – you’re good! Do NOT grant access to transactional systems. I’ve seen a business analyst who was considered the go-to business expert for a system have trouble trying to create reports against the system’s relational data source – he struggled to get his head around the multiple tables, keys, and unfriendly field names. He got something up & running, but it was hard for him! What’s the lesson here? Try to make it as easy as possible for business users to use data – create an analytic layer over your data sources.
Apart from this, data integrity is very important! If users don’t trust the data, they are not going to use it. Invite a select set of business users to test the integrity of the data before you publish the data assets.
Also, the analytic layer that you develop should perform well. If it takes a minute to return a fairly simple result, you will have a hard time driving adoption.
What tools would you use to teach business users reporting? Of course, Excel is a top choice since many users are already familiar with it. Also, show them Power View using YOUR data – that may get them excited enough to learn Power View.
How about SSRS Report Builder & PerformancePoint Dashboard Designer? These are mainly targeted at IT developers, so it won’t be a great idea to train business users on these tools.
What about Power Pivot/Power Query? As noted in #1, business users face challenges trying to analyze relational data sources, so test your audience to see whether Power Pivot/Power Query is a fit. It might work well if they want to combine a couple of spreadsheets with IT’s data assets. But don’t expect business users to spend time learning Power Pivot & Power Query just to analyze data. Again, test it with your audience and see if they pick it up – some of your users may, great! But usually, you’ll have to create data assets (cubes/Power Pivot models/tabular models) to reach the masses!
Image Source: Economist & Tableau
The more time you spend understanding the culture, the more successful you will be in training users. That’s because the things you learn while trying to understand an organization’s culture will be useful in content creation, delivery mechanisms, target-audience selection & communicating the business value of data-driven decisions.
Let’s step back. What is culture? It is the set of characteristics of a group of people. What characteristics are you trying to find before you start training? Try answering the following questions:
a. Where are the “analysts”?
- Are they part of IT teams? Who requests reports from the “IT analysts”? (They are your target audience!)
- Are they part of business units? (Great! Make them efficient by removing manual data copy-pasting from their to-do list)
- Do they report to CxO’s/presidents/VP’s? (Great! request examples of data driven decisions)
b. Are there examples of value generated using data driven decisions?
- Understanding how the business uses data to generate value is very important. You will create content using these examples!
c. How comfortable are users learning new technology?
- Have they shown resistance in learning new systems?
- Are they used to receiving ready-to-consume reports? (Don’t expect them to change their behavior. Instead, figure out who creates the reports for them. Train them! Make them better!)
In summary, understand the culture of the organization; it will help you prepare before you start the training.
In this post, we saw three pre-training preparations (1. Data, 2. Tool Selection, 3. Culture) before you start training users.
In this series, I intend to document common questions asked by power users about Excel connected to SSAS cubes (or a data warehouse) after they go beyond the basic stage of understanding & using Row Labels, Column Labels, Report Filter & Values in Pivot Tables. This post is #3 of N:
1. Select the Pivot Table.
2. From the toolbar, go to PivotTable Tools > Design > Subtotals > Do Not Show Subtotals.
You can also choose to remove them just for rows or for columns; it depends on how you want to lay out the data.
Similarly, you can remove Grand Totals:
There’s also an alternate method for removing subtotals. Put your mouse over the hierarchy level for which you do not need subtotals, then right-click > remove the subtotal for that field name. But you’ll have to do that for every hierarchy level if you want to remove all subtotals.
A common question asked by power users is: how do you make sure that the Excel file is pulling the latest data from the cube? The good news is that Excel files configured to connect to a data source like an SSAS cube can be set to refresh automatically. Here’s how:
1. From the toolbar, Go to Data > Connections
In this post, we saw how to remove grand totals & subtotals, and how to configure automatic data refresh.
Long title! Let me explain. This post will help you solve the following problems if you run into them:
1) You are using the SSMS Import Data wizard to load data from a comma- (or tab-) separated value (CSV/TSV) file into a SQL Server table & you find that your source data values have double quotes, so you want to strip them before loading into the destination table.
2) You are using SSIS to load data from a CSV/TSV file into a SQL Server table & you want to strip the double quotes from the source fields before you load the data into the destination table.
1. After you’ve configured the flat file connection, you’ll reach a point where you’ll see the “Flat File Connection Manager” in SSIS. Or, in the SQL Server Import & Export data wizard, you’ll see a dialog box for configuring the flat file connection.
2. In the Text Qualifier field, enter “
3. That’s it! You’ve successfully configured the flat file connection manager to strip the double quotes.
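The Text Qualifier setting plays the same role as the quote character in any CSV parser. As a quick illustration, here’s a small Python sketch (sample data made up) using the standard library’s `csv` module, where setting `quotechar='"'` causes the quotes to delimit values and be stripped, just like the flat file connection manager does:

```python
import csv
import io

# A hypothetical CSV extract whose values are all wrapped in double quotes
raw = '"id","name","city"\n"1","Acme","Seattle"\n"2","Globex","Portland"\n'

# quotechar='"' is the csv-module equivalent of the Text Qualifier:
# the quotes mark value boundaries and do not appear in the parsed fields
rows = list(csv.reader(io.StringIO(raw), quotechar='"'))
print(rows[1])  # ['1', 'Acme', 'Seattle']
```

If you left the qualifier unset, every field would keep its surrounding quotes, which is exactly the symptom described in the problem statement above.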