Data Reporting ≠ Data Analysis

One of the key things I’ve learned is the importance of differentiating between the concepts of “Data Reporting” and “Data Analysis”. So, let’s first look at them visually:

[Image: data analysis and data reporting]

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis”, then you need reports. But you do not necessarily have to do data analysis if you want to do data reporting.

From a process standpoint, here’s how you can visualize Data Reporting and Data Analysis:

[Image: data analysis and data reporting process]

Let’s think about this for a moment: Why do we need “analysis”?

We need it because TOOLS are really great at generating data reports, but it takes a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of looking at the data points and translating them into business insights is the core of Data Analysis. Here’s how it looks visually:

[Image: data analysis and data reporting]

Note that after performing data analysis, we have information such as trends and insights, action items or recommendations, and the estimated impact on the business, all of which creates business value.

Conclusion:

Data Reporting ≠ Data Analysis

Resource: 12 recorded sessions from the 24hop business analytics edition are online! #passbac #msbi

Recently, PASS hosted a 24hop business analytics event:

And now, the 12 one-hour sessions, ranging from data visualization and predictive analytics to Big Data, are online for you to watch! They also serve as a “trailer” for what you can expect at the PASS Business Analytics conference!

Here’s the URL: http://passbaconference.com/Sessions/SneakPeeks.aspx

And I was following some of these sessions live on the event day – and I can tell you, these sessions are great resources!

Also, I participated in the Twitter contest (by Microsoft BI) that was happening along w/ the event, and this is what I got for my win!

[Image: 24hop Twitter contest prize]

A hoodie w/ embedded earphones!

That’s about it for this post. Enjoy the recordings!

Found something interesting by exploring a “List of companies by revenue” Data Set:

I like exploring data sets to find interesting patterns from them. To that end, I was exploring a data-set: List of companies by revenue and I added a column to calculate Revenue/Employee to explore the dataset:

And I found an outlier!

Here’s the outlier: Exor

Here’s why it’s interesting:

Its revenue in 2012 is: 109.15 billion USD

And the number of employees is just 40!

Just think of the Revenue/Employee!

To put things in perspective, let’s compare that with its neighbors in the data-set:

Rank | Company | Industry | Revenue in USD billion | Employees
48 | Koch Industries | Conglomerate | 110.00 | 60000
49 | EXOR | Investment | 109.15 | 40
50 | Cardinal Health | Pharmaceuticals | 107.55 | 40000
51 | CVS Caremark | Retail | 107.10 | 202000
52 | IBM | Computer services | 106.92 | 433362
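To make the Revenue/Employee column concrete, here is a minimal Python sketch that uses only the five rows from the table above. The names `companies` and `revenue_per_employee` are my own for illustration, and this is not the tool used for the original exploration.

```python
# Rows copied from the table above: (rank, company, revenue in USD billion, employees)
companies = [
    (48, "Koch Industries", 110.00, 60000),
    (49, "EXOR", 109.15, 40),
    (50, "Cardinal Health", 107.55, 40000),
    (51, "CVS Caremark", 107.10, 202000),
    (52, "IBM", 106.92, 433362),
]


def revenue_per_employee(revenue_billion, employees):
    """Return revenue per employee in USD million."""
    return revenue_billion * 1000 / employees


# Map each company name to its revenue per employee (USD million).
ratios = {name: revenue_per_employee(rev, emp) for _, name, rev, emp in companies}

# Print the companies from highest to lowest ratio so the outlier shows up first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {ratio:10.2f} USD million per employee")
```

Sorting by the ratio puts EXOR at the top by a wide margin, which is the same outlier the visualization surfaced.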

I got to know about this by quickly creating a data visualization to explore the data-set:

[Image: list of companies by revenue]

And after removing Trafigura, Vitol and Exor, this is what we have:

[Image: Power View in Excel 2013, rank, revenue and employees]

Observation: companies in the oil and gas industry have a relatively high revenue/employee ratio.

That’s about it for this post. Thanks for reading about my data exploration!

Quick Note about SAS’s acronym SEMMA:

SEMMA is an acronym introduced by SAS which stands for:

Sample, Explore, Modify, Model and Assess.

I had recently posted about the Data Mining & Knowledge Discovery Process which had following sequential steps:

Raw Data => cleaning => sampling => Modeling => Testing

SEMMA follows sequential steps similar to those we saw in the data mining process. So while the Data Mining process is applicable to any data mining tool out there, SEMMA helps when you use SAS Enterprise Miner. In fact, it has helped me quickly find the data mining functions available in the SAS tool:

[Image: SAS Sample, Explore, Modify, Model, Assess]

Back to basics: Data Mining and Knowledge Discovery Process

Once in a while I go back to basics to revisit some of the fundamental technology concepts that I’ve learned over the past few years. Today, I want to revisit the Data Mining and Knowledge Discovery Process:

Here are the steps:

1) Raw Data

2) Data Preprocessing (cleaning, sampling, transformation, integration, etc.)

3) Modeling (Building a Data Mining Model)

4) Testing the Model, a.k.a. assessing the Model

5) Knowledge Discovery

Here is the visualization:

[Image: knowledge discovery process in data mining]

Additional Note:

In the world of Data Mining and Knowledge Discovery, we’re looking for a specific type of intelligence from the data: patterns. This is important because patterns tend to repeat, so if we find patterns in our data, we can predict/forecast that such things may happen in the future.
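The five steps can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the data is made up, the “model” is a single threshold rather than a real data mining algorithm, and every function name here is hypothetical.

```python
import random

# 1) Raw data: made-up (x, label) records; None marks a missing value.
raw = [(1, 0), (2, 0), (None, 1), (3, 0), (4, 1), (5, 1),
       (6, 1), (7, 1), (2, None), (8, 1), (1, 0), (9, 1)]


def preprocess(rows):
    """2) Pre-processing (cleaning): drop records with missing values."""
    return [r for r in rows if None not in r]


def split(rows, test_fraction=0.25, seed=0):
    """2) Pre-processing (sampling): shuffle, then split into train and test."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def build_model(train):
    """3) Modeling: pick the threshold on x that best separates the labels."""
    def accuracy_at(t):
        return sum((x >= t) == bool(y) for x, y in train) / len(train)
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=accuracy_at)


def assess(threshold, test):
    """4) Testing: fraction of held-out records classified correctly."""
    return sum((x >= threshold) == bool(y) for x, y in test) / len(test)


clean = preprocess(raw)
train, test = split(clean)
threshold = build_model(train)
accuracy = assess(threshold, test)

# 5) Knowledge discovery: the pattern found is the learned threshold.
print(f"Pattern found: x >= {threshold} predicts label 1 "
      f"(accuracy on test data: {accuracy:.0%})")
```

The point of the sketch is the order of the steps: the model is built only on the training sample, and the test sample is held back to check whether the pattern repeats on data the model has not seen.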

Conclusion:

In this blog post, we saw the Knowledge Discovery and Data Mining process.

Three V’s of Big Data with Example:

In this blog-post, we will look at the Three V’s of Big Data with examples:

1. Volume:

TBs, PBs and ZBs of data get created:

From the webinar “How to Walk The Path from BI to Data Science: An interview with Michael Driscoll, data scientist and CEO of Metamarkets” – A global surge in Data

2. Velocity:

The speed at which information flows.

Example: 50 Million tweets per day!

[Image: Twitter, 50 million tweets per day]

(This was back in Nov. 2010; the number must have increased!)

UPDATE 23 Nov 2012: Wikipedia says 340 million tweets per day!

[Image: Twitter in 2012, 340 million tweets per day]

3. Variety:

All types of data are now being captured, whether in a structured format or not.

Example: Text from PDFs, emails, social network updates, voice calls, web traffic logs, sensor data, click streams, etc.

[Image: data variety in big data]

Image courtesy

And this may be followed by other V’s like V for Value.

Conclusion:

In this blog-post, we saw Three V’s of Big Data with Example.

Related Posts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Book Review: The Data Journalism Handbook

[Image: The Data Journalism Handbook book cover]

In this post, I am going to review The Data Journalism Handbook.

Earlier, I had shared an insight from the book with you. Here it is: the world has changed from “what’s new” to “what does it all mean”. This means that professionals who focus on reporting “what’s new” will soon be out of a job. They should start equipping themselves with analytics skills that help them uncover insights from all the news around us and help us all make sense of the information that’s all around us.

To that end, The Data Journalism Handbook is a great inspiration for journalists, and it seems it’s meant to encourage journalists to start embracing the change. It inspires journalists to think of stories and find data about them. So what’s in it for Data Geeks? It encourages Data Geeks to help journalists weave a story around the data that they found. The book also outlines resources that Data Geeks can use.

Now, two things I really liked about the book:

1. Examples & case-studies, lots of them! Very inspiring!

2. I came to know about tools that I didn’t know about before. I am going to use them!

You can read the book online (web version) for free here: http://datajournalismhandbook.org/

Five examples of Recommendation Systems on the web:

Recommendation systems are an application of Data Mining technologies. I have researched how to implement a recommendation system, and as a part of my research, I studied recommendation systems that are already out there on the Internet. Here are five examples of recommendation systems on the web:

1. Amazon

Customers Who Bought This Item Also Bought:

[Image: Amazon, Customers Who Bought This Item Also Bought]

Frequently Bought Together (an example of Market Basket Analysis, a.k.a. Association Rules):

[Image: Amazon, Frequently Bought Together]

2. LinkedIn

You should read this: How does LinkedIn’s recommendation system work? – it would open up your brain to “recommendation” opportunities around you!

Jobs you may like + Groups you may like + Companies you may follow:

[Image: LinkedIn recommendations for Groups, Jobs and Companies]

3. Netflix

Did you know about the Netflix Prize for improving their recommendation engine? If not, you should read about it!

Here’s their Movies you’ll love recommendation system:

[Image: Netflix recommendation system]

4. Twitter

People you may want to follow:

[Image: Twitter, who to follow recommendations]

5. Google

I do not have a screenshot, but I just wanted to point out that Google “personalizes” search results (a.k.a. recommends based on past behavior) using your search history. And you can switch that off, if you want: Turn off search history personalization

Conclusion

In this blog-post, we saw examples of recommendation systems. The key takeaway is that there is more than one approach to building a recommendation system. The approaches can be based on: 1. Past behavior; 2. Past behavior of “friends”; 3. The item that is being searched. And you can definitely mix and match!
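As a rough illustration of the item-based approach (in the spirit of “customers who bought this item also bought”), here is a small Python sketch that simply counts which items show up together in the same basket. The baskets are made up, `also_bought` is a hypothetical helper, and plain co-occurrence counting is my own simplification, not how any of the sites above actually do it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each set is one customer's basket.
baskets = [
    {"camera", "memory card", "tripod"},
    {"camera", "memory card"},
    {"camera", "camera bag"},
    {"memory card", "card reader"},
    {"camera", "memory card", "camera bag"},
]

# Count how often each ordered pair of items appears in the same basket.
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def also_bought(item, top_n=2):
    """Return the items most often bought together with `item`."""
    scores = {b: n for (a, b), n in co_counts.items() if a == item}
    # Highest count first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


print(also_bought("camera"))
```

A real system would normalize these counts (popular items co-occur with everything) and mix in the other approaches, such as the customer’s own past behavior.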

And I hope this post helped you understand an application of data mining that’s all around us! One question: where else do you see recommendation systems in action? Leave a comment!

Things I shared on Social Media Networks during Oct 19 – Nov 11

The goal of this series is to recap the conversations that I’m having on social networks, because I do not want my blog readers to miss them. So here is the recap of the last three weeks:

1. I was at SQL PASS 2012!

[Image: SQL PASS 2012, Paras Doshi]

2. A nice Dashboard!

[Image: Metro-fied Business Intelligence dashboard on Windows 8]

3. Learn to build an Enterprise Information management system using SSIS, DQS and MDS:

http://parasdoshi.com/2012/11/07/resource-learn-to-build-a-enterprise-information-management-system-using-data-quality-services-master-data-services-and-sql-server-integration-services/

[Image: Enterprise Information Management system using SSIS, DQS and MDS]

4. Fake Data!

5. I reached 2000 points on MSDN!

[Image: Paras Doshi reached 2000 points on MSDN]

6. A nice video by Jeremy Howard on Predictive Analytics:

7. A nice data visualization via the Data Mining add-in for Excel

[Image: data visualization via the Data Mining add-in for Excel]

8. Get started with Hadoop on Windows 7/Server!

Download here: http://parasdoshi.com/2012/10/27/getting-started-with-hdinsight-a-k-a-microsofts-big-data-hadoop-platform-on-local-windows-machine/

Demo here: http://parasdoshi.com/2012/11/02/end-to-end-demo-hadoop-hdinsight-hive-excel-power-view-azure-data-market/

[Image: Hadoop on Windows 7/Server]

9. I was at Give Camp 2012! If you do not know about “Give Camp”, then you should check it out!

Here’s last year’s (2011) post: http://parasdoshi.com/2011/10/24/i-gave-back-at-dallas-givecamp-and-why-i-think-every-software-professional-should-consider-doing-so-too/

[Image: Give Camp 2012]

Let’s connect and converse on any of these people networks: Facebook, Twitter, Google Plus, LinkedIn!

Data Mining Demo for Marketing vertical: How to create a Targeted mailing list?

Tools I’ll be using for the Demo:

Excel 2010

SQL Server 2012 (specifically SQL Server Analysis Services)

Data Mining Add-in for Excel

Sample data-set that comes with the Excel add-in

Scenario:

The Marketing Department needs to create a targeted mailing list.

What data do we need?

To create a targeted mailing list, we’ll need a historical data-set of customer purchase history.

What will we do with the data?

Based on the historical data-set, we’ll be able to find “patterns” in past consumer behavior. E.g. a single male going to college and living in Europe is likely to buy a bike. Then, using these patterns, we would classify NEW customers.

Technically, we’ll be using the classification method with the Microsoft Decision Trees algorithm.

(Read the difference between classification and clustering)
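Before jumping into Excel, here is a minimal Python sketch of the idea behind decision-tree classification: learn patterns from historical customers, then classify a new one. To be clear about the assumptions: the customer records are made up (they are not the add-in’s sample data), the tree is a small hand-rolled ID3-style tree, and it is not the Microsoft Decision Trees algorithm.

```python
from collections import Counter
from math import log2

# Made-up historical customers: (attributes, bought_bike).
history = [
    ({"marital": "single", "region": "Europe", "student": "yes"}, "yes"),
    ({"marital": "single", "region": "Europe", "student": "no"}, "yes"),
    ({"marital": "married", "region": "Europe", "student": "no"}, "no"),
    ({"marital": "married", "region": "Pacific", "student": "no"}, "no"),
    ({"marital": "single", "region": "Pacific", "student": "yes"}, "yes"),
    ({"marital": "married", "region": "Pacific", "student": "yes"}, "no"),
]


def entropy(rows):
    """Shannon entropy of the labels in `rows` (0 means all labels agree)."""
    counts = Counter(label for _, label in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def majority(rows):
    """Most common label in `rows`."""
    return Counter(label for _, label in rows).most_common(1)[0][0]


def partition(rows, attr):
    """Group rows by the value they have for `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0][attr], []).append(row)
    return groups


def build_tree(rows, attrs):
    """Recursively split on the attribute that leaves the least entropy."""
    if entropy(rows) == 0 or not attrs:
        return majority(rows)  # leaf: the predicted label

    def remainder(attr):
        groups = partition(rows, attr).values()
        return sum(len(g) / len(rows) * entropy(g) for g in groups)

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    branches = {value: build_tree(group, rest)
                for value, group in partition(rows, best).items()}
    return {"attr": best, "branches": branches, "default": majority(rows)}


def classify(tree, customer):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(customer[tree["attr"]], tree["default"])
    return tree


tree = build_tree(history, ["marital", "region", "student"])
new_customer = {"marital": "single", "region": "Europe", "student": "yes"}
print(classify(tree, new_customer))
```

In this toy data-set, marital status alone separates buyers from non-buyers, so the tree has a single split. The steps below do the same two things with the add-in: build a model from history, then query it for new customers.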

Let’s get in action!

STEP 1: Build a Model

Data Mining Tab > click on Classify:

[Screenshot]

Follow the steps:

[Screenshot]

Select the data:

[Screenshot]

In this case, since we want to predict the likelihood of buying a bike, our column to analyze is BikeBuyer:

[Screenshot]

For the demo, I am going to leave the defaults. There are “optimization” steps that you can do, but for the demo I am going to keep it super simple:

[Screenshot]

Name the model:

[Screenshot]

The model has been created!

[Screenshot]

STEP 2: Query the MODEL to predict the likelihood of a bike purchase by a new customer

[Screenshot]

Select the model:

[Screenshot]

Select the data:

[Screenshot]

Specify the columns that will be used in predicting the likelihood:

[Screenshot]

Add the column that will have the “predicted value”:

[Screenshot]

An example of Data Mining Extensions (DMX):

[Screenshot]

For the demo, I am just going to add the column to the existing table:

[Screenshot]

Yay! Here’s our targeted mailing list. See the last column:

Screenshot 1:

[Screenshot]

Screenshot 2:

[Screenshot]

Now what?

Marketers can now send “coupons” to ONLY those people who are most likely to buy a bike! And so that’s how you create a targeted mailing list using the Excel Data Mining add-in.