Getting Started with Big Data Fundamentals

Most of us have heard the term “Big Data” by now. What is Big Data? Is it a database? Is it storage? Many people may not understand what Big Data really means, or may be using the term in the wrong context. This blog post (and upcoming posts on the same topic) is an effort to explain Big Data in simple words. So, let’s begin.


What is Big Data?

Does Big Data mean data that is big in size? Well, that’s how the name sounds. But the term “Big Data” doesn’t actually refer to the size of the data sets. It refers to the solutions used to extract value from the data sets: solutions involving new architectures and technologies.

It does not matter whether the data is big or small; these new methods are applicable to every data set. Even if you have a small data set, you can use these solutions to manage the data better and extract useful information from it.

Now the next question is – when should we start using these solutions to get the best out of them?

Well, before I explain the actual factors to be considered, I must say that you should NOT consider “size” as the primary factor when opting for Big Data solutions. The three considerations of Big Data, also known as the three Vs, are:

  1. Volume
  2. Variety, and
  3. Velocity

Volume describes the size, that is, the amount of data generated.

Variety refers to the actual contents of the data set. There can be multiple sources for the data sets, and these sources might all use different formats. That brings variety to the data sets.

Velocity is the frequency at which data is generated, captured and made available to users or other systems for consumption. Nowadays this is the key factor that plays an important role in opting for Big Data solutions. It has also led to the evolution of new frameworks, like Apache Spark, Amazon Kinesis, etc.
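To make the three Vs a little more concrete, here is a small Python sketch that measures them for a handful of records. It is only an illustration: the Record class, the three_vs function, and the sample source names are all made up for this post, and real systems define and measure these factors in their own ways.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # All fields are hypothetical, chosen only for this illustration.
    source: str       # where the record came from
    fmt: str          # payload format, e.g. "json" or "csv"
    size_bytes: int   # payload size in bytes
    timestamp: float  # arrival time in seconds


def three_vs(records):
    """Return (volume, variety, velocity) for a list of records."""
    # Volume: total amount of data, here measured in bytes.
    volume = sum(r.size_bytes for r in records)
    # Variety: number of distinct formats seen across all sources.
    variety = len({r.fmt for r in records})
    # Velocity: records per second over the observed time span.
    times = [r.timestamp for r in records]
    span = max(times) - min(times) if len(times) > 1 else 0.0
    velocity = len(records) / span if span > 0 else 0.0
    return volume, variety, velocity


sample = [
    Record("web_logs", "json", 512, 0.0),
    Record("sensors", "csv", 128, 0.5),
    Record("web_logs", "json", 256, 1.0),
    Record("crm_export", "xml", 1024, 2.0),
]

volume, variety, velocity = three_vs(sample)
print(volume, variety, velocity)
```

Notice that this sample is tiny in volume, yet it already has three formats arriving at two records per second. That is the point made earlier: size alone does not decide whether these solutions are worth using.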
