top of page

Python Pandas cheat sheet

Writer's picture: Soo ReedSoo Reed

Updated: Oct 2, 2022

Python Pandas Library Cheat Sheet for a Beginner

This is a python pandas cheat sheet for a quick lookup and review. For more information and explanation, please visit pandas doc: https://pandas.pydata.org/docs/reference/frame.html

Loading data:

  1. Import: Import pandas as pd

  2. Read CSV: pd.read_csv(‘file_name.csv’)

  3. Union all multiple dataframes: pd.concat([df1, df2, df3, …])

  4. Merge:

  5. pd.merge(table1, table2)

  6. table1.merge(table2)

  7. Using left_on or right_on: pd.merge(table1, table2, left_on: table1_column, right_on: table2_column)

  8. Useful option: suffixes = [table1_suffix, table2_suffix]

  9. Join types: pd.merge(table1, table2, how=’join_type’)

  10. Join types = [‘left’, ‘right’, ‘outer’]  

  11. Default join type = ‘inner’

Inspecting data:

  1. Inspection: 

  2. dataframe.columns

  3. dataframe.head()

  4. dataframe.describe()  

  5. dataframe.info()

  6. dataframe.dtypes

  7. dataframe.isnull().sum() << shows you the # of rows per column with null value

  8. Duplicates:

  9. Finding duplicates: dataframe.duplicated() 

  10. Removing duplicates: dataframe.drop_duplicates()

  11. Getting the column information using aggregate: 

  12. Available: sum(), mean(), median(), max(), min(), std(), nunique(), unique()

  13. dataframe.column_name.aggregate_function(): ex) dataframe.column_name.sum()

  14. Group by:

  15. One groupby: dataframe.groupby(‘column1’).column2.count()

  16. Multiple groupby: dataframe.groupby([‘column1’, ‘column2’]).column3.count()

  17. Selecting columns: 

  18. One column: 

  19. Dataframe[‘column1’]

  20. If the column name is one word: dataframe.column1  

  21. Multiple columns: dataframe[[‘column1’,’column2’,’column3’]]

Looking up:

  1. Filtering dataframe using loc or iloc:

  2. loc: using label. EX) dataframe.loc[]

  3. iloc: using index. EX) dataframe.iloc[]

  4. Getting the result for specific values: 

  5. One: dataframe[dataframe.column_name == ‘specific value]

  6. You can use other logical operations like: <, >, != 

  7. Multiple: use format of: 

  8. And: () & ()

  9. Or: () | ()

Cleaning and Tidying data:

  1. Column name change: dataframe.rename(columns = {‘old_column_name’:’new_column_name’, ‘old_column_name2’:’new_column_name2’})

  2. Create a new column: 

  3. New_column_value can be one value: ex)  dataframe[‘new_column_name’] = ‘Yes’

  4. New_column_value can be a simple calc: ex)  dataframe[‘new_column_name’] = dataframe[column1] * dataframe[column2]

  5. New_column_value can be more complex calc using lambda: ex) dataframe[‘new_column_name’] = dataframe.apply( lambda x: x[‘column1’]**2 if x[‘column2’] > (x[‘column1’]*2) else x[‘column2’]*3)

  6. Dealing with missing data: 

  7. Drop them: dataframe.dropna()

  8. Fill them: 

  9. One column: dataframe.column.fillna(value)

  10. Multiple columns: dataframe.fillna(value = {column1: value1, column2:value2})

  11. Pivot: 

  12. dataframe.pivot(index = , columns = , values = ).reset_index()

  13. pd.melt(dataframe, id_vars = , var_name = , value_name = )

Export:

  1. Export: 

  2. To csv: dataframe.to_csv()

  3. To json: dataframe.to_json()


2 views0 comments

Recent Posts

See All

Comments


©2022 by datasnooze. Proudly created with Wix.com

bottom of page