AWS cost monitoring for visibility and $$$ control
If you are a student working with a few available credits to get you through your Uni projects, cost is going to be a… Read More »AWS cost monitoring for visibility and $$$ control
Hey readers, I have been learning Data Engineering for the last few months and thought I would share what I have learned with you all. Recently I made a project… Read More »Real world application project for Big Data – with Apache Spark and AWS-EMR
Hey readers, I am a Data Science student and I have recently started learning more about Data Engineering. Data Science and Data Engineering teams co-exist… Read More »Data Engineering Part 1 – How to become a Big Data Engineer
A lot of the time, we want some synthetic data to start our journey into data analysis. In this post we will discuss how to generate… Read More »How to generate synthetic log data for data analysis
Here is a collection of Spark configs that have helped make job runs faster. Most of the configs come with trade-offs but work very… Read More »Handpicked Spark configs to make the job runs faster
This is a 3-post series on querying Kaggle data on an EMR cluster. I will be using Apache Zeppelin for the data exploration, and internally… Read More »Query Kaggle data via Apache Spark and Zeppelin via EMR cluster
This is part-3 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. In the first part we saw how to copy Kaggle data… Read More »Part 3: Query Kaggle data via Apache Zeppelin
This is part-2 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. In the first part we saw how… Read More »Part 2: How to create EMR cluster with Apache Spark and Apache Zeppelin
This is part-1 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. This post provides a brief description of how to… Read More »Part 1: How to copy Kaggle data to Amazon S3