ConfusedCoders

the world is opensource
  • data engineering
    • ETL
      • hive
      • pig
    • distributed systems
      • spark
      • hadoop
      • hbase
      • apache drill
    • airflow
    • data storage
      • mongodb
    • search
      • solr
  • data science
    • deep learning
    • machine learning
    • visualization
  • general programming
    • whitepapers
    • open source
    • mobile
    • raspberrypi
    • data structures
    • golang
    • java
      • design patterns
      • hibernate
    • random
      • life
  • about us
    • Nikita Sharma | Data Science Student

How to work on Kaggle data on your local Jupyter Notebook

  • Nikita Sharma
  • November 18, 2018
  • data science

This post briefly describes how to use Kaggle data on your local Jupyter Notebook. Env details: Ubuntu, Python 3.6.3. Steps: We need these steps…

Handpicked Spark configs to make the job runs faster

  • Yash Sharma
  • November 9, 2018
  • AWS, data engineering, spark

Here is a collection of Spark configs that have helped make jobs run faster. Most of the configs come with trade-offs but work very…

Certification – Applied Text Mining in Python

  • Nikita Sharma
  • November 5, 2018 (updated June 11, 2019)
  • certification

Certification details: https://www.coursera.org/account/accomplishments/verify/2SQPBTEAW3VR

Course Description: This course will introduce the learner to text mining and text manipulation basics. The course begins with an understanding…

Query Kaggle data via Apache Spark and Zeppelin via EMR cluster

  • Nikita Sharma
  • October 29, 2018
  • AWS, data engineering, spark

This is a three-post series on querying Kaggle data on an EMR cluster. I will be using Apache Zeppelin for the data exploration, and internally…

Part 3: Query Kaggle data via Apache Zeppelin

  • Nikita Sharma
  • October 29, 2018 (updated February 24, 2019)
  • AWS, data engineering, spark

This is part 3 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. In the first part we saw how to copy Kaggle data…

Part 2: How to create EMR cluster with Apache Spark and Apache Zeppelin

  • Nikita Sharma
  • October 28, 2018
  • AWS, data engineering, hive, spark

This is part 2 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. In the first part we saw how…

Part 1: How to copy Kaggle data to Amazon S3

  • Nikita Sharma
  • October 25, 2018 (updated October 28, 2018)
  • AWS, data engineering

This is part 1 of the blog series – How to analyze Kaggle data with Apache Spark and Zeppelin. This post provides a brief description of how to…

Certification – Applied Machine Learning in Python

  • Nikita Sharma
  • October 23, 2018 (updated June 11, 2019)
  • certification

Certification details: https://www.coursera.org/account/accomplishments/verify/B7M4BW23K8R6

Course Description: This course will introduce the learner to applied machine learning, focusing more on the techniques and methods than on…

Takeaways from Sydney MeasureCamp – Unconference for Analytics and DataScience

  • Nikita Sharma
  • October 20, 2018 (updated October 21, 2018)
  • data science

I attended the Sydney MeasureCamp conference today (20th October 2018). The conference took place at Google's Sydney office. It was a great experience and here…

Cleaning data for data visualisation

  • Nikita Sharma
  • October 18, 2018
  • data science, visualization

This short post explains how to clean data by handling the missing values present in a dataframe. Data cleaning is the process of ensuring that…

Certification – Applied Plotting, Charting & Data Representation in Python

  • Nikita Sharma
  • October 6, 2018 (updated June 4, 2019)
  • certification

Certification details: https://www.coursera.org/account/accomplishments/verify/9CDEFKKBK9EN

Course Description: This course will introduce the learner to information visualization basics, with a focus on reporting and charting using the…

Certification – Introduction to Data Science in Python

  • Nikita Sharma
  • September 23, 2018 (updated June 4, 2019)
  • certification

Certification details: https://www.coursera.org/account/accomplishments/verify/RCHZWM8QBJEW

Course Description: This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques…
