Hive Performance Tuning
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL syntax. To learn how to use Hive, see https://cwiki.apache.org/confluence/display/Hive/Tutorial…

Apache Spark Unit Testing
Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. It is included in almost all Hadoop distributions. Apache Spark is the hottest…

Hadoop to explore data
Big data by definition denotes datasets that are so large or complex that traditional data processing application frameworks and software are inadequate to deal with them. Hadoop is the answer…