SQL Server 2014!

SQL Server 2014 was announced in the TechEd keynote!

SQL Server 2014 TechEd Keynote

So while the focus of SQL Server 2012 was in-memory OLAP, the focus of this new release seems to be in-memory OLTP (along with cloud and big data).

Here’s the blog post: http://blogs.technet.com/b/dataplatforminsider/archive/2013/06/03/sql-server-2014-unlocking-real-time-insights.aspx (also, thanks for the picture!)



Resource: Introduction to Data Science by Prof Bill Howe, UW

The Introduction to Data Science course taught by Bill Howe just started on the Coursera platform. Having taken the Data Intensive Computing in the Cloud course at UW, also taught by Prof Bill Howe, I can say that this course should be a great resource too!

Check it out: https://www.coursera.org/course/datasci

Introduction to Data Science

Resource: A great tutorial for Hadoop on local Windows and Azure

Here’s the resource: http://gettingstarted.hadooponazure.com/gettingStarted.html > “HDInsight Jumpstart”

The tutorial will teach you how to analyze log files using Hadoop tools like MapReduce, Hive and Sqoop. Check it out! It works with both HDInsight on a local Windows machine and Hadoop on Azure:

HDInsight hadoop on windows starting guide tutorial


I hope this resource helps you get started on building an end-to-end solution with Hadoop on Windows/Azure.

Event Recap: SQL Saturday 185 Trinidad!

I was selected to be a speaker at SQL Saturday Trinidad! It was amazing because I not only got a chance to interact with the wonderful people who are part of the SQL Server community there, but also visited some beautiful places on this Caribbean island!

I visited Trinidad in January, just before their carnival season! Even though people were busy preparing for carnival, it was great to see them attend an entire day of SQL Server training:

SQL Saturday 185 trinidad attendees

And here’s me presenting on “Why Big Data Matters”:

(Thanks Niko for the photo!)

paras presenting on big data

And after the event, I also got a chance to experience the beauty of this Caribbean island!

view trinidad island port of spain

port of spain sql saturday

Thank you SQL Saturday 185 Team for a memorable time!

Presentation Slides: The slides had been posted for the attendees prior to my presentation and if you want you can view them here:


Examples of Machine Generated Data from “Big Data” perspective:

I recently researched machine-generated data in the context of “Big Data”. Here’s the list I compiled:

- Data sent from Satellites

- Temperature sensing devices

- Flood Detection/Sensing devices

- Web logs

- Location data

- Data collected by Toll sensors (context: Road Toll)

- Phone call records

- Financial

And a Futuristic one:

Imagine sensors on human bodies that continuously “monitor” health. What if we used them to detect diabetes, cancer and other diseases in their early phases? Possible? Maybe!

Interesting Fact:

Machines can generate data “faster” than humans. This characteristic makes it interesting to think about how to analyze machine-generated data and, in some cases, how to analyze it in real time or near real time.

Ending Note:

Search for “Machine Generated Data” and you’ll find many more examples. It’s worth reading about in the context of Big Data.





Data visualization: Cost of Hard Drive storage space

Here are the visualizations:

1982 – 2009:

1982 2009 storage cost

2000 – 2008:

2000 2008 storage cost

I grabbed the data from http://www.mkomo.com/cost-per-gigabyte and http://ns1758.ca/winch/winchest.html. Thanks!


Storage cost has decreased drastically. Mathematically, it has decreased exponentially. No wonder we can store lots of data for a few dollars, and no wonder the age of Big Data has already arrived!
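To make the “exponential” claim a bit more concrete, here is a minimal Python sketch. If cost falls exponentially, then log(cost) falls along a straight line over time, and the slope of that line tells you how many years it takes for the cost to halve. The function names are my own, and the series below is synthetic (generated from a formula, NOT taken from the two data sources above), so treat this as an illustration of the idea rather than a result.

```python
import math

def fit_exponential_decay(years, costs):
    """Least-squares fit of log(cost) = a + b * year. Returns (a, b)."""
    n = len(years)
    logs = [math.log(c) for c in costs]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx            # slope: change in log(cost) per year
    a = mean_y - b * mean_x  # intercept
    return a, b

def halving_time(b):
    """Years needed for the cost to halve, given a negative slope b."""
    return -math.log(2) / b

# Synthetic, made-up series for illustration only:
# the cost starts at 10.0 and halves every 2 years.
years = list(range(2000, 2009))
costs = [10.0 * 0.5 ** ((y - 2000) / 2) for y in years]

a, b = fit_exponential_decay(years, costs)
print(round(halving_time(b), 2))
```

Because the synthetic series was built to halve every two years, the fit should recover a halving time of about two years. With real cost-per-gigabyte numbers you would pass in your own (year, cost) pairs instead.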

How to start Analyzing Twitter Data w/ R?

Over the past few weeks, I have posted notes about Analyzing Twitter Data w/ R, listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

Hadoop on Windows: How to Browse the Hadoop Filesystem?

This blog post applies to Microsoft® HDInsight Preview on a Windows machine. In this post, we’ll see how you can browse HDFS (the Hadoop Distributed File System).

1. I am assuming Hadoop Services are working without issues on your machine.

2. Now, can you see the Hadoop Name Node Status icon on your desktop? Yes? Great! Open it (it opens in a browser).

3. Here’s what you’ll see:

Hadoop File System Browse

4. Can you see the “Browse the filesystem” link? Click on it. You’ll see:

hadoop file system name node status windows

5. I’ve been using /user/data lately, so let me browse to see what’s inside this directory:

user data hadoop sqoop hive mapreduce

6. You can also type the location into the text box labeled Goto.

7. If you’re on the command line, you can list the root of the file system with the command:

hadoop fs -ls /

hadoop command line list all files system

And if you want to browse the files inside a particular directory, pass its path to the same command (for example, hadoop fs -ls /user/data):

hadoop command line sqoop mapreduce hdfs file system
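If you want to work with the listing in a script rather than read it on screen, here is a minimal Python sketch that parses the text printed by hadoop fs -ls. It assumes the usual column layout (permissions, replication, owner, group, size, date, time, path) with a “Found N items” header line. The names HdfsEntry and parse_ls are my own, and the sample listing is made up for illustration; it is not output from my machine.

```python
from typing import List, NamedTuple

class HdfsEntry(NamedTuple):
    is_dir: bool
    owner: str
    size: int
    path: str

def parse_ls(output: str) -> List[HdfsEntry]:
    """Parse the text printed by `hadoop fs -ls` into a list of entries."""
    entries = []
    for line in output.splitlines():
        # Assumed columns: perms, replication, owner, group, size, date, time, path
        parts = line.split(None, 7)
        # Skip the "Found N items" header and anything else that does not fit.
        if len(parts) != 8 or not parts[4].isdigit():
            continue
        perms, _repl, owner, _group, size, _date, _time, path = parts
        entries.append(HdfsEntry(perms.startswith("d"), owner, int(size), path))
    return entries

# Hypothetical listing, made up for this example.
SAMPLE = """Found 2 items
drwxr-xr-x   - paras supergroup          0 2013-01-20 10:15 /user/data/sqoopstudent1
-rw-r--r--   1 paras supergroup        120 2013-01-20 10:20 /user/data/notes.txt
"""

for entry in parse_ls(SAMPLE):
    kind = "dir " if entry.is_dir else "file"
    print(kind, entry.size, entry.path)
```

One way to feed it real data is to redirect the command output to a text file from the Hadoop Command Line and read that file in Python.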

Official Resource:

HDFS File System Shell Guide


In this post, we saw how to browse the Hadoop file system via the Hadoop Command Line and the Hadoop Name Node Status page.


Microsoft® HDInsight Preview for Windows: How to use Sqoop to load data into HDFS from SQL Server?

In this post, we’ll see how to use Sqoop to load data into HDFS from SQL Server.

With that, here are the steps:

1. You have the Microsoft® HDInsight Preview for Windows Installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7

2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Before we begin, decide on three things:

3a. The username and password that Sqoop will use to log in to the SQL Server database. If you create a new username and password, test it via SSMS before you proceed.

3b. The table that you want to load into HDFS.

In my case, it’s this table:

sql table to be loaded into hadoop hdfs from sql server

3c. The target directory in HDFS. In my case, I want it to be /user/data/sqoopstudent1

You can create it with the command: hadoop fs -mkdir /user/data/sqoopstudent1

[to learn about how to create directory, read: How to create a directory in Hadoop File System? ]

4. Now let’s start the Hadoop Command Line (can you see the icon on the desktop? Yes? Great! Open that!)

5. Navigate to: c:\Hadoop\sqoop-1.4.2\bin>

*This path may change in future versions; in any case, navigate to the bin folder under SQOOP_HOME.

6. Run dir command to see various files under this directory.

sqoop list files under the HOMe directory import export

Also you can run sqoop help for more information on the command that we are about to run.

sqoop list of commands help

7. Now here’s the command to load data from SQL Server into HDFS:

c:\Hadoop\sqoop-1.4.2\bin>sqoop import --connect "jdbc:sqlserver://localhost;database=UniversityDB;username=sqoop;password=**********" --table student --target-dir /user/data/sqoopstudent1 -m 1

sqoop command to load data from sql server to hadoop file system

8. After successfully running the above command, let’s browse the file in HDFS!

sqoop see the content of the file
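The import command from step 7 has several parts, so here is a minimal Python sketch that assembles it from the three things decided in step 3. It only builds and prints the argument list; it does not run Sqoop. The function name build_sqoop_import is my own, and "<password>" is a placeholder, not a real value. The flags themselves (--connect, --table, --target-dir, -m) are the ones used in the command above.

```python
def build_sqoop_import(server, database, username, password,
                       table, target_dir, mappers=1):
    """Return the argument list for a `sqoop import` from SQL Server into HDFS."""
    # JDBC connection string in the same shape as the one used in step 7.
    connect = (
        f"jdbc:sqlserver://{server};database={database};"
        f"username={username};password={password}"
    )
    return [
        "sqoop", "import",
        "--connect", connect,        # where to read from
        "--table", table,            # which table to load
        "--target-dir", target_dir,  # where to write in HDFS
        "-m", str(mappers),          # number of parallel map tasks
    ]

# Values from this post; the password is a placeholder.
args = build_sqoop_import(
    server="localhost",
    database="UniversityDB",
    username="sqoop",
    password="<password>",
    table="student",
    target_dir="/user/data/sqoopstudent1",
)

# Quote the connection string (it contains semicolons) when printing.
print(" ".join(f'"{a}"' if ";" in a else a for a in args))
```

Keeping the pieces as separate parameters makes it easier to reuse the same command for another table or target directory without retyping the connection string.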

That’s about it for this post!


Thanks to Aviad Ezra, who answered my question on this MSDN thread: An error while trying to use Sqoop on HDInsight to import data from SQL server to HDFS


In this post, we saw how to load data into Hadoop from SQL Server using Sqoop (SQL-to-Hadoop).
