Hive Performance Tuning

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL syntax. To learn how to use Hive, please read https://cwiki.apache.org/confluence/display/Hive/Tutorial…

Spark ML Classification

Quoting Wikipedia, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership…

Spark and JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. We can define complex nested…

Spark AI Summit Europe – The Experience

I attended the Spark AI Summit 2018 in London from 3rd to 4th October. Well, it was in London and I couldn’t have missed the opportunity to know/meet the latest…

Monitoring Spark Application with Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. At its core is a time series data model built on key-value pairs. In this blog post, we will use Prometheus to monitor a…

Apache Spark Unit Testing

Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. Apache Spark is included in almost all of the Hadoop distributions. Apache Spark is the hottest…

Hadoop to explore data

Big data by definition denotes datasets that are so large or complex that traditional data processing application frameworks and software are inadequate to deal with them. Hadoop is the answer…
