Data Analysis and In Memory Technologies, let’s connect the dots:

SPEED is one of the important aspect of Data Analysis. Wouldn’t it be great if you query a data source, you get your answers as soon as possible? Yes? Right! Of course, it depends on factors like the size of the data you are trying to query but wouldn’t it be great if it’s at “SPEED OF THOUGHT“?

So Here’s the Problem:

Databases are mostly disk based and so the bottleneck here is the speed at which can get access to data off the disks.

So what can you do?

Let’s put data in a RAM (memory) because data-access via memory is faster.

If it’s sounds so easy, why didn’t people do it earlier? And why are we talking about “In Memory” NOW?

1) BIGGER Data Size/sets and so today with more data, it takes more time to query data from databases. And so researchers have looked at other approaches. One of the effective approach they found is: In-memory

(And I am not ignoring the advances in Database Technologies like Parallel databases, But for the purpose of understanding “Why In-memory”, it’s important to realize the growing size of data sets and a viable alternative we have to tackle the problem: In memory. And also I am not saying that it’s the ONLY way to go. I am just trying to understand the significance of in-memory technologies. We, as data professionals, have lot’s of choices! And only after evaluating project requirements, we can talk about tools and techniques)

2)  PRICE of Memory: Was the price of RAM/memory higher than what it is today? So even though it was a great idea to put data in memory, it was cost-prohibitive.

So Let’s connect the dots: Data Analysis + In Memory Technologies:

What’s common between Microsoft’s PowerPivot, SAP HANA, Tableau and Qlikview?

1) Tools for Data-Analysis/Business-Intelligence 2) Their Back End data architecture is “In Memory”

So since Data Analysis needs SPEED and In-Memory Technologies solves this need – Data Analysis and Business Intelligence Tools adopted “In-memory” as their back-end data architecture. And next time, when you hear a vendor saying “in-memory”, you don’t have to get confused about what they’re trying to say. They’re just saying that we got you covered by giving you ability to query your data at “speed of thought” via our In-memory technologies so that you can go back to your (data) analysis.

That’s about it for this post. Here’s a related post: What’s the benefit of columnar databases?

your comments are very welcome!

A Social Media Analytics Sample Dashboard in Excel Powered by PowerPivot.

I found a great sample Dashboard on Social Media Analytics in Excel that is powered by PowerPivot. Here’s the screenshot of the Dashboard.

excel powerpivot twitter social media analytics dashboard 1

Here are the steps if you want to download and play with the Dashboard:

  1. Install Power Pivot add-in
  2. Download the “Analytics for Twitter” excel sample (powered by PowerPivot). Link: http://www.microsoft.com/en-us/download/details.aspx?id=26213
  3. It creates an “Analytics for Twitter” excel file on Desktop > Open it.
  4. The dashboard is powered by data it pulls in the Power Pivot:excel powerpivot twitter social media analytics dashboard
  5. You can change the search queries:
    a. Edit the default search terms:excel powerpivot twitter social media analytics dashboard
    b. Refresh Data:excel powerpivot refresh data
    c. Updated Dashboard!excel powerpivot twitter social media analytics dashboard

That’s about it. And here’s a Youtube Video showing some features in this sample:

Conclusion:
In this blog-post, I shared a great sample dashboard built on top of PowerPivot model.

 

PowerPivot: Quick Tip regarding Copy-Pasting Tables in PowerPivot 2010

Power Pivot Paras Doshi Microsoft Business Intelligence Excel 2010I was working on couple of PowerPivot models recently and both models needed a Date Table. So here’s what I did:

1. Imported the filtered version of the Date Table (DateStream) from Azure Data Market. Added Calculated Columns to the table. Let me call this Table1

2. I copied entire Table1 and pasted it to an another PowerPivot model – let’s call that Table2

 

After this:

And I went to happily create data models! Later, I while testing the model (that had Table2), I realized that it had lost the “formulas” – in other words when I copy-pasted from Table1 to Table2 in PowerPivot – it did not copy-paste the formulas but it just pasted the values.

So, 1. Copy Pasting a Table in a PowerPivot does not copy-paste’s the Formula’s in the columns.

And also it does not keep the “Data connection”. In this case, Table1 was connected to Azure Data source and I could “refresh” data if I chose to. But Table2 was “Static” – In other words, It lost the data connection with the data sources. So I could not “refresh/update” data from the data source if need be.

S0, 2. If you copy Paste a Table in a PowerPivot, the newly copied table does not keep the “Data connection” to data source.

 

I have also posted this here: http://beyondrelational.com/modules/1/justlearned/tips/17976/power-pivot-copy-paste-a-table-in-a-powerpivot-does-not-copy-pastes-the-formulas.aspx

Crunch more than 1 million rows in Excel 2010 with free addin called Power Pivot!

Lately, I have been talking to few business folks who do their own data analysis in excel (2010) and sometimes they run into the excel 2010 limit of 1 million rows. And so when I hear that, I talk about Power Pivot and I talk about what It can do and what it cannot and they are just amazed that there’s a FREE add-in that will help them crunch more than 1 million rows!

happy suprised business user excel power pivot

Image courtesy

You can explore more about this amazing add-in here: http://www.microsoft.com/en-us/bi/powerpivot.aspx

And Read more about pros/benefits of PowerPivot:

Top 5 Ways PowerPivot Helps Excel Pros

PowerPivot? But I use pivot tables in Excel

Things I shared on Social Media Networks during Oct 3 – Oct 10

I am fairly active on social network channels. I engage with people on Social networks and I thought It would be great if my blog readers also get a recap of conversations I am having on different social networks and so I’ve thought of compiling a “Things I shared on Social Media networks during the week” . Here is the recap for the week of OCT 3 – OCT 10:

1. Gaurang Patel commented about the Blog: “Five things I like about RescueTime” on FB page: i am using this one it is really fantastic . 

The blog is here: Five things I like about RescueTime:

Gaurang Patel on Five Things I like about RescueTime

2. Photo with a caption that I added:  That’s the CLOUD.

Originally shared by Windows Azure account

That's the cloud windows azure servers data centers

3. Quote my APJ Abdul Kalam (Ex-President of India) . Did you knew that in 2011, I got a chance to meet him! Read here: Met revered APJ Abdul Kalam (Ex President, India) at an event organized by Times Foundation

Quote by APJ Abdul Kalam

4. Nice Data Visualization: Originally shared by https://www.facebook.com/avinash.kaushik and link to the source of the data Visualization: http://www.geology.wisc.edu/homepages/g100s2/public_html/history_of_life.htm

Nice Data visualization paras doshi

5. “I was working on Business Intelligence project requirement analysis. One requirement that I saw across all department: Need a data mart (single version of truth)”

This post was shared by SolidQ on Google+

Enterprises need data mart and data warehouses

6. On G+, SolidQ shared my post that I wrote a while Back. Here’s the post: Step by Step guide to Export a SQL Azure Database to Azure storage via Import and Export CTP

SolidQ shared paras doshi blog on Google plus

BTW, did you knew the same blog post was Re-Tweeted by Scott Gu on 31st Dec 2011! Yup!

Scott Gu RT'ed Paras Doshi's Tweet

7. RescueTime tweeted:

Check the end of this post to see how RescueTime user @Paras_Doshi was able to cut his social networking time in half! http://buff.ly/PlQfdg

8. SolidQ shared this post: “Earlier Today: delivered a two-hour session on PowerPivot and Power View to a client. they had some very Interesting questions for their scenario!”

https://plus.google.com/u/0/105010538932095629627/posts/2doLbjrQLLo

paras doshi delivered a power pivot and power view session solidq

9. SolidQ shared this post on G+: https://plus.google.com/u/0/105010538932095629627/posts/dQVXEH7Zm1U

Data Visualization: Created HeatMap/TreeMap like the one shown below for a client. Looking forward to receiving their feedback. btw, I used the Black, Grey and white for the shades. image courtesy: http://www.labescape.com/info/articles/what-is-a-heat-map.html

paras doshi created a heat map for a client at SolidQ

Let’s connect and converse on any of these people networks!

paras doshi blog on facebookparas doshi twitter paras doshi google plus paras doshi linkedin

New Journal Article Published – Title: “Building an Ideal Tabular Model for Power View reports”

Part 2/2 of the series on building Ideal Models for Power View reports is live!

Summary: “In this article, we will first compare the PowerPivot and Tabular models, which will help you choose between these two models for your scenarios, and then we will study the reporting properties in a Tabular Model that you can configure to build an Ideal model for Power View reports.”

Read here: SolidQ Journal: Building an Ideal Tabular Model for Power View reports by Paras Doshi

Blog-Post about Part 1 is here: SolidQ Journal: Building Ideal PowerPivot Model for Power View reports

 

If you have any feedback or comments, please drop a comment or contact me.

PowerPivot Model: How to TEST or EDIT existing connections?

Have you ever had someone sent you a PowerPivot model and asked you to do something with it? And if so – may be, you would have to see what data source(s) the model is using and if applicable, you test the existing connections. If you find yourselves in such a situation, this blog post is for you:

1) Open the excel file and switch to PowerPivot Window

2) Now, switch to Design Tab > Click on Existing Connections:

existing connection design tab of powerpivot model

3) Here you’ll find the list of connections under “PowerPivot Data Connections” > Select the connection you wish to TEST or EDIT > Click on EDIT button

4)  Now here you can edit the data source. And if you click on “Test Connection”, you’ll be able to test it too.

edit data source connection power pivot model

5) After you’re done, click on SAVE.

And you’ll now see the “Existing Connections” Box again:

powerpivot select a connection to a data source that contains the data you want to import

Click CLOSE and you’re done, you have successfully edited or tested the existing connection.

 

 

PowerPivot Model: Why am I not seeing “Month Names” in correct logical order?

This blog post is for people who have seen the reports built on PowerPivot model where the Month Names are not in correct logical order. So instead of  “January, February, March, April …” (which is correct logical order), the order in the report would be displayed as “April, February, January, March..” (which is NOT correct).

This is what I am talking about:

powerpivot model month name not sorted correctly

Here, Month names are not sorted correctly, right? So how do we solve this? Let’s see this in this blog post!

Understanding the Sample Data-set

Optional: Download a sample data-set to practice what’s described in this blog-post: Download – Paras Doshi Blog’s sample data set

Now the data looks like this:

Date Daily New number Month Month Name
1/1/2012 0:00 0 1 January
1/2/2012 0:00 0 1 January
1/3/2012 0:00 0 1 January
1/4/2012 0:00 0 1 January
1/5/2012 0:00 0 1 January
1/6/2012 0:00 0 1 January
1/7/2012 0:00 0 1 January
….. …… …… ……

Note that “Daily new number” is used for the demo purpose. I had to anonymize the data before I could share it with you all!

Originally, the table had just two columns but since I wanted to add “time Intelligence” to the model. I added two columns Month and Month Name.

Now here’s the DAX behind these two columns:

Column DAX
Month MONTH(‘Sample Table’[Date])
Month Name RELATED(Month[MonthName])

There’s a relationship between Month Column of “Sample Table” and Month column of “Month” table

And here’s what Month Table looks like:

Month MonthName
1 January
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
11 November
12 December

Creating a Report on Top of Sample Data-set.

Now if you create the report of Month Name vs Daily New Number, it will look like:

powerpivot model month name not sorted correctly

Oops, Problem detected: Month name are not in correct order.

So now let’s solve it.

Solution to sort month name correctly

Let’s solve this issue in our PowerPivot Model. Here’s what you do:

1. Go to PowerPivot model, Select “Month Name” column from the sample table

2 Toolbar > Home > Sort By column > click on “Sort by column”

3. And set the properties in the dialog box as:

powerpivot model sort by column dialog box

Now, go back to the report and refresh the connection to the model.

Done!

powerpivot model month name are sorted correctly

This is so because by default since the month name is “Text” – it was sorted in A to Z format. But you saw how we can use the sort by column property in PowerPivot model to fix this issue.

That’s about it for the post! your feedback is welcome!

Want to Read More? Here are few links:

How to Solve Error: After Reboot, SharePoint sites says “Service Unavailable”

I had to reboot my Dev Machine on which I have my SQL Server 2012 Business Intelligence Setup. After reboot, It turns out that when I wanted to open the SharePoint Site (for Power View related work) – it gives me error:

“Service Unavailable

HTTP Error 503. The Service is unavailable”

sharepoint service unavailable http error 503

Now, I did quick searches found this and this

Turns out, for my scenario – the default application pool had stopped:

default application pool stopped

So this is what I did to solve the error:

1) Verified that the account and password for the App Pool is correct

2) Started it.

This is how you do it:

Start > IIS Manager > Application pools > Locate Default Application Pool that stopped > Verify that Account and Password are correct. > Start it.

After doing that, the SharePoint site started working again!

Related articles:

What’s the benefit of columnar databases?

I hear you ask: “why are you writing about columnar databases?”. To answer that, I have spent some time researching about PowerPivot and Tabular Model these days which is powered by what Microsoft calls xVelocity (previously called Vertipaq) engine. It’s a columnar in memory Engine. And curiosity got better of me when I read the word: Columnar and I wondered what’s the benefit of Columnar Databases (aka column oriented databases) ? And why do they just not use the row-oriented database that powers OLTP workload and to answer the questions I did some research and here’s what I found:

First of all, let’s understand the difference between how row-oriented database and column-oriented database stores data:

Consider that we have a Table like this:

ID Quarter ….. Sales
1 Q1 ….. 100
2 Q1 ….. 120
3 Q1 ….. 110
4 Q1 ….. 130
5 Q2 ….. 150
6 Q2 ….. 100

Now, in a row-oriented databases, it’s stored like this:

1 Q1 …. 100
2 Q1 …. 120

..

So on.

 

And in a column oriented databases, it’s stored like this:

1 2 3 4 5 6

 

Q1 Q1 Q1 Q1 Q2 Q2

 

100 120 110 130 150 100

 

Benefits of columnar approach:

Imagine that we want to compute the aggregation for column “Sales”. So if we have row-oriented database then we have to get access to every “page” and then extract the value of column from each row. Lot of work, right? But in columnar databases, if we want to compute aggregations over Sales column then we just access the page that has ALL values of sales column. Does it not reduce the IO by not accessing lot of pages?

I know there’s more to the story and this is just a over-simplification of the process – But you get the point, don’t you?

Thus remember that:

Column-oriented systems are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, because reading that smaller subset of data can be faster than reading all data. - wikipedia

 

The OTHER benefit is in the level of compression that can be achieved. Let’s see this concept in simplest of terms.

consider the following data stored in columnar database:

Q1 Q1 Q1 Q1 Q1……30 more times Q2 Q2 Q2…….40 more times

The above data can be compressed by using the form:

value Begin position End position
Q1 1 34
Q2 35 76

And as you can see, we stored 76 values using just 9 values. So when they say we do 10x compression – this is how they do it!

This was an example of Run-length encoding.

Technically there are algorithms like: Run Length Encoding, Dictionary Encoding, GZIP, LZ compression that are used to compress data. The engine (like xVelocity) decides the best algorithm to use.

so let’s connect the dots.

In-memory technologies need that ENTIRE data is loaded into MEMORY before processing. It’s beneficial IF we can compress the data and so more data can be loaded into the same amount of memory. Also, in the realm of OLAP, aggregating  needs to be as efficient as it can be and as we have seen columnar databases are efficient at computing aggregations over many rows of a given column. So there’s certainly some serious benefits of using columnar databases in OLAP scenario’s and NOW I GET IT that why xVelocity is columnar and not row-oriented. Do not get me wrong, I am not saying that row oriented databases are bad, but in context of OLAP, column stores offer some benefits. And in general, the row oriented databases and column oriented databases have their set of pros and cons and there’s no superior way of doing things. Also, column stores does not need to be in-memory or in-memory does not need be to columnar. In case of xVelocity – it’s in-memory + columnar. But again In-memory does NOT equal to columnar databases.

Conclusion

Thus, in this blog post:

we’ve seen:

1) How do columnar databases store data?

2) Benefits of column stores

What we’ve not seen:

1) we’ve not explored the disadvantages and caveats of columnar databases.

Related Articles: