I have no counter-argument for why we would use anything other than import pandas as pd
Import Data
pd.read_csv(csvfile)
Above that, I usually define the path first, e.g. file = r'c:\datasets\csvfile.csv',
because I keep my .ipynb files in Git and the datasets stay outside the repository.
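Putting those two lines together, a minimal sketch (the path here is just a placeholder):
import pandas as pd

file = r'c:\datasets\csvfile.csv'   # placeholder path, kept outside the repo
df = pd.read_csv(file)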
pd.read_excel(excelfile)
to work with Excel files.
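For example, reading one tab out of a workbook (the path and sheet name are made up):
# sheet_name picks a single tab; reading .xlsx needs openpyxl installed
df = pd.read_excel(r'c:\datasets\report.xlsx', sheet_name='Sales')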
pd.read_sql(query, connection_object)
I don't use this as often these days, since Power BI or Tableau is faster for one-off work, but it is still handy when the process has to be repeated.
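A minimal sketch using SQLite from the standard library; the database file and table name here are hypothetical:
import sqlite3
import pandas as pd

conn = sqlite3.connect(r'c:\datasets\sales.db')   # hypothetical database file
df = pd.read_sql('SELECT * FROM sales', conn)     # hypothetical table
conn.close()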
pd.read_json(jsonfile)
to work with JSON; it is fast and very useful when you need to manipulate JSON files again and again.
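For example, assuming a records-style file, i.e. a list of objects (the path is a placeholder):
# orient='records' expects JSON shaped like [{"a": 1}, {"a": 2}]
df = pd.read_json(r'c:\datasets\data.json', orient='records')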
Export Data
Normally, after we import the data, we put it into a dataframe called df, like
df = pd.read_csv('csv.csv')
so when we export, we start from df.
df.to_csv(csvfile)
export to CSV to continue working with the data in another program
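For example (the path is a placeholder; index=False keeps the row index out of the file):
df.to_csv(r'c:\datasets\output.csv', index=False)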
df.to_excel(excelfile)
mostly used when we do a quick ETL job
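For example, dumping the result of a quick ETL step into a workbook (path and sheet name are placeholders):
df.to_excel(r'c:\datasets\output.xlsx', sheet_name='clean', index=False)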
df.to_json(jsonfile)
to write the JSON back out after editing it
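For example, writing the edited data back out as a list of records (the path is a placeholder):
df.to_json(r'c:\datasets\output.json', orient='records')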
Inspect Data
df.head()
look at the first 5 rows
df.tail()
look at the last 5 rows
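Both accept an optional row count if 5 rows is not enough:
df.head(10)   # first 10 rows
df.tail(3)    # last 3 rows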
df.shape
check the number of rows and columns; it returns something like (20640, 10), meaning this dataframe has 20,640 rows and 10 columns
df.info()
we mostly use .info() rather than .shape because, on top of the row count, it shows the column names, data types, and non-null counts we need to keep working
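Roughly what df.info() prints for the (20640, 10) example above, with the per-column lines abridged:
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 20640 entries, 0 to 20639
# Data columns (total 10 columns):
#  #   Column   Non-Null Count   Dtype
# ... one line per column, followed by a dtypes summary and memory usage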
Clean Data
df.dropna()
a quick and easy way to get rid of rows that contain null values
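Note that dropna() returns a new dataframe by default, so assign the result back; the column name below is hypothetical:
df = df.dropna()                              # drop rows with any null value
df = df.dropna(subset=['total_bedrooms'])     # or only rows where this (hypothetical) column is null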