Downloads: 117

Research Paper | Computer Science & Engineering | India | Volume 7 Issue 9, September 2018

BigData: A Case Study of Spark Mllib and Hive

Shubhajoy Das

Abstract: The volume of data generated has increased tremendously in the past decade, driven by social networks, sensor networks, geographic information systems, financial institutions, and supply chains. Computer storage capacity has grown to keep pace, but disk access speeds have not improved at the same rate as disk capacity. Big Data frameworks address this by analysing massive amounts of data in a distributed environment that is both horizontally and vertically scalable. Data sets with trillions of rows can be analysed quickly to provide valuable insights. Cloud service providers such as Amazon and Alibaba Cloud have made robust infrastructure available for Big Data. We study Apache Hive and Spark MLlib in profiling a Stack Overflow dataset, and the collaborative filtering algorithm in Spark MLlib for movie recommendations.

Keywords: BigData, Spark MLlib, Collaborative Filtering, Hadoop, Spark, Apache, Hive, Amazon AWS, HDFS

Edition: Volume 7 Issue 9, September 2018

Pages: 865 - 868

Variable Size Bin Packing Algorithm for IoT

Kshitija Kalaskar


Downloads: 105

Informative Article | Computer Science & Engineering | India | Volume 5 Issue 11, November 2016

Pages: 1482 - 1485

Overview of Big Data

Nivedita Manohar

