ConfusedCoders

hive

Data Engineering Part 1 – How to become a Big Data Engineer

January 15, 2019 | February 11, 2019 | Nikita Sharma | AWS, data engineering, ETL, hive, spark

Hey readers, I am a data science student and have recently started learning more about data engineering. Data Science […]

Query S3 data via Hive on local box

December 28, 2018 | December 29, 2018 | Nikita Sharma | hive

In the last post we discussed how to generate synthetic data. Here we will talk about how to query […]

Part 2: How to create EMR cluster with Apache Spark and Apache Zeppelin

October 28, 2018 | October 28, 2018 | Nikita Sharma | AWS, data engineering, hive, spark

This is part 2 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. In the […]

Debugging: Hive dynamic partition error: [Fatal Error] total number of created files now is 100028, which exceeds 100000. Killing the job.

September 6, 2016 | April 24, 2017 | Yash Sharma | hive

[Fatal Error] total number of created files now is 900320, which exceeds 900000. Killing the job. tldr; quick fix – […]

Hive – Selected data import/query – Files and folders (mapred.input.dir.recursive)

December 25, 2013 | May 27, 2014 | Yash Sharma | hive

By default, a data import in Hive expects a directory name, specified by the LOCATION keyword in the query. By default Hive […]
