Part II - Analysis, Cleaning & Visualization

This is Part II of my Getting Started with Data Science series. It covers EDA on the Iowa Housing Prices data from the Kaggle competition House Prices: Advanced Regression Techniques. In this write-up, we tackle the problem of predicting the sale price of houses located in Ames, Iowa, using 79 explanatory variables that describe almost every aspect of each house.

Part I - EDA on Titanic Survival Problem
Part II - EDA on Iowa Housing Prices Prediction
Part III - Model Building, Evaluation, and Ensembling

A detailed notebook on comprehensive EDA can be found at reyscode/start-datascience.

First, let us import the necessary libraries used in this notebook.

    import numpy as np
    import pandas as pd

    # Jupyter magic: render plots inline in the notebook
    %matplotlib inline
    import matplotlib.pyplot as plt
    import seaborn as sns
    color = sns.color_palette()
    sns.set_style('darkgrid')

    # Silence warnings (from sklearn and seaborn) by replacing warnings.warn with a no-op
    import warnings
    def ignore_warn(*args, **kwargs):
        pass
    warnings.warn = ignore_warn

    # Statistics
    from scipy import stats
    from scipy.stats import norm, skew

    # Load the training and test sets, then preview the first rows
    train = pd.read_csv('./data/train.csv')
    test = pd.read_csv('./data/test.csv')
    train.head()

We fill missing values with 0 where the column is numeric and with 'None' where it is categorical. Later, we will use a label encoder to convert these categorical variables into numerical form; I will explain that in my next write-up on model building and evaluation.
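The missing-value rule described above (0 for numeric inputs, 'None' for categorical ones) can be sketched roughly as follows. This is a minimal illustration and not necessarily the notebook's exact code: the helper name `fill_missing` and the tiny `toy` frame are hypothetical, and the two column names are used only as examples of a numeric and a categorical column. Note that pandas reads an integer column containing missing values as float, so the check here is for any numeric dtype rather than integers specifically.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaNs replaced: 0 in numeric columns, 'None' elsewhere."""
    filled = df.copy()
    for col in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[col]):
            # Numeric column: a missing value is treated as zero
            filled[col] = filled[col].fillna(0)
        else:
            # Categorical (string) column: a missing value becomes the label 'None'
            filled[col] = filled[col].fillna("None")
    return filled


# Hypothetical stand-in for the real training data
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],
    "Alley": ["Grvl", np.nan, np.nan],
})
toy_filled = fill_missing(toy)
print(toy_filled)
```

The same helper could be applied to both `train` and `test` so that the two sets are cleaned consistently. Filling with the string 'None' keeps "feature absent" as its own category, which a label encoder can later map to an integer like any other level.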