Python Pandas - Joining and merging DataFrame
Exercise - Pandas Joining and merging DataFrame
import pandas as pd
import numpy as np
- Write a Pandas program to join the two given dataframes along rows and assign all data.
df1_1 = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})
df1_2 = pd.DataFrame({
'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
'marks': [201, 200, 198, 219, 201]})
pd.concat([df1_1,df1_2],axis=0)
#df1_1.append(df1_2,ignore_index=False)
student_id | name | marks | |
---|---|---|---|
0 | S1 | Danniella Fenton | 200 |
1 | S2 | Ryder Storey | 210 |
2 | S3 | Bryce Jensen | 190 |
3 | S4 | Ed Bernal | 222 |
4 | S5 | Kwame Morin | 199 |
0 | S4 | Scarlette Fisher | 201 |
1 | S5 | Carla Williamson | 200 |
2 | S6 | Dante Morse | 198 |
3 | S7 | Kaiser William | 219 |
4 | S8 | Madeeha Preston | 201 |
- Write a Pandas program to join the two given dataframes along columns and assign all data.
pd.concat([df1_1,df1_2],axis=1)
student_id | name | marks | student_id | name | marks | |
---|---|---|---|---|---|---|
0 | S1 | Danniella Fenton | 200 | S4 | Scarlette Fisher | 201 |
1 | S2 | Ryder Storey | 210 | S5 | Carla Williamson | 200 |
2 | S3 | Bryce Jensen | 190 | S6 | Dante Morse | 198 |
3 | S4 | Ed Bernal | 222 | S7 | Kaiser William | 219 |
4 | S5 | Kwame Morin | 199 | S8 | Madeeha Preston | 201 |
- Write a Pandas program to append rows to an existing DataFrame and display the combined data.
s3 = pd.Series(['S6', 'Scarlette Fisher', 205], index=['student_id', 'name', 'marks'])
s3
student_id S6
name Scarlette Fisher
marks 205
dtype: object
df1_1.append(s3,ignore_index=True)
student_id | name | marks | |
---|---|---|---|
0 | S1 | Danniella Fenton | 200 |
1 | S2 | Ryder Storey | 210 |
2 | S3 | Bryce Jensen | 190 |
3 | S4 | Ed Bernal | 222 |
4 | S5 | Kwame Morin | 199 |
5 | S6 | Scarlette Fisher | 205 |
- Write a Pandas program to append a list of dictioneries or series to a existing DataFrame and display the combined data.
d4 = [{'student_id': 'S6', 'name': 'Scarlette Fisher', 'marks': 203},
{'student_id': 'S7', 'name': 'Bryce Jensen', 'marks': 207}]
d4
[{'student_id': 'S6', 'name': 'Scarlette Fisher', 'marks': 203},
{'student_id': 'S7', 'name': 'Bryce Jensen', 'marks': 207}]
df1_1.append(d4,ignore_index=True)
student_id | name | marks | |
---|---|---|---|
0 | S1 | Danniella Fenton | 200 |
1 | S2 | Ryder Storey | 210 |
2 | S3 | Bryce Jensen | 190 |
3 | S4 | Ed Bernal | 222 |
4 | S5 | Kwame Morin | 199 |
5 | S6 | Scarlette Fisher | 203 |
6 | S7 | Bryce Jensen | 207 |
- Write a Pandas program to join the two given dataframes along rows and merge with another dataframe along the common column id.
df5_1 = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})
df5_2 = pd.DataFrame({
'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
'marks': [201, 200, 198, 219, 201]})
df5_3 = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3', 'S4', 'S5', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13'],
'exam_id': [23, 45, 12, 67, 21, 55, 33, 14, 56, 83, 88, 12]})
pd.merge(pd.concat([df5_1,df5_2],ignore_index=True),df5_3,how='inner',on=['student_id'])
student_id | name | marks | exam_id | |
---|---|---|---|---|
0 | S1 | Danniella Fenton | 200 | 23 |
1 | S2 | Ryder Storey | 210 | 45 |
2 | S3 | Bryce Jensen | 190 | 12 |
3 | S4 | Ed Bernal | 222 | 67 |
4 | S4 | Scarlette Fisher | 201 | 67 |
5 | S5 | Kwame Morin | 199 | 21 |
6 | S5 | Carla Williamson | 200 | 21 |
7 | S7 | Kaiser William | 219 | 55 |
8 | S8 | Madeeha Preston | 201 | 33 |
- Write a Pandas program to join the two dataframes using the common column of both dataframes.
df5_1.merge(df5_2,how='outer',on='student_id')
student_id | name_x | marks_x | name_y | marks_y | |
---|---|---|---|---|---|
0 | S1 | Danniella Fenton | 200.0 | NaN | NaN |
1 | S2 | Ryder Storey | 210.0 | NaN | NaN |
2 | S3 | Bryce Jensen | 190.0 | NaN | NaN |
3 | S4 | Ed Bernal | 222.0 | Scarlette Fisher | 201.0 |
4 | S5 | Kwame Morin | 199.0 | Carla Williamson | 200.0 |
5 | S6 | NaN | NaN | Dante Morse | 198.0 |
6 | S7 | NaN | NaN | Kaiser William | 219.0 |
7 | S8 | NaN | NaN | Madeeha Preston | 201.0 |
- Write a Pandas program to join the two dataframes with matching records from both sides where available.
df5_1.merge(df5_2,how='outer',on='student_id')
student_id | name_x | marks_x | name_y | marks_y | |
---|---|---|---|---|---|
0 | S1 | Danniella Fenton | 200.0 | NaN | NaN |
1 | S2 | Ryder Storey | 210.0 | NaN | NaN |
2 | S3 | Bryce Jensen | 190.0 | NaN | NaN |
3 | S4 | Ed Bernal | 222.0 | Scarlette Fisher | 201.0 |
4 | S5 | Kwame Morin | 199.0 | Carla Williamson | 200.0 |
5 | S6 | NaN | NaN | Dante Morse | 198.0 |
6 | S7 | NaN | NaN | Kaiser William | 219.0 |
7 | S8 | NaN | NaN | Madeeha Preston | 201.0 |
- Write a Pandas program to join (left join) the two dataframes using keys from left dataframe only.
d8_1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'P': ['P0', 'P1', 'P2', 'P3'],
'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
d8_2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'R': ['R0', 'R1', 'R2', 'R3'],
'S': ['S0', 'S1', 'S2', 'S3']})
pd.merge(d8_1,d8_2,how='left',on=['key1','key2'])
key1 | key2 | P | Q | R | S | |
---|---|---|---|---|---|---|
0 | K0 | K0 | P0 | Q0 | R0 | S0 |
1 | K0 | K1 | P1 | Q1 | NaN | NaN |
2 | K1 | K0 | P2 | Q2 | R1 | S1 |
3 | K1 | K0 | P2 | Q2 | R2 | S2 |
4 | K2 | K1 | P3 | Q3 | NaN | NaN |
- Write a Pandas program to join two dataframes using keys from right dataframe only.
pd.merge(d8_1,d8_2,how='right',on=['key1','key2'])
key1 | key2 | P | Q | R | S | |
---|---|---|---|---|---|---|
0 | K0 | K0 | P0 | Q0 | R0 | S0 |
1 | K1 | K0 | P2 | Q2 | R1 | S1 |
2 | K1 | K0 | P2 | Q2 | R2 | S2 |
3 | K2 | K0 | NaN | NaN | R3 | S3 |
- Write a Pandas program to merge two given datasets using multiple join keys.
pd.merge(d8_1,d8_2,on=['key1','key2'])
key1 | key2 | P | Q | R | S | |
---|---|---|---|---|---|---|
0 | K0 | K0 | P0 | Q0 | R0 | S0 |
1 | K1 | K0 | P2 | Q2 | R1 | S1 |
2 | K1 | K0 | P2 | Q2 | R2 | S2 |
- Write a Pandas program to create a new DataFrame based on existing series, using specified argument and override the existing columns names.
s11_1 = pd.Series([0, 1, 2, 3], name='col1')
s11_2 = pd.Series([0, 1, 2, 3])
s11_3 = pd.Series([0, 1, 4, 5], name='col3')
pd.concat([s11_1,s11_2,s11_3],axis=1,keys=['Column1','Columns2','Column3'])
Column1 | Columns2 | Column3 | |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |
2 | 2 | 2 | 4 |
3 | 3 | 3 | 5 |
- Write a Pandas program to create a combination from two dataframes where a column id combination appears more than once in both dataframes.
df12_1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'P': ['P0', 'P1', 'P2', 'P3'],
'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
df12_2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'R': ['R0', 'R1', 'R2', 'R3'],
'S': ['S0', 'S1', 'S2', 'S3']})
pd.merge(df12_1,df12_2,on='key1')
key1 | key2_x | P | Q | key2_y | R | S | |
---|---|---|---|---|---|---|---|
0 | K0 | K0 | P0 | Q0 | K0 | R0 | S0 |
1 | K0 | K1 | P1 | Q1 | K0 | R0 | S0 |
2 | K1 | K0 | P2 | Q2 | K0 | R1 | S1 |
3 | K1 | K0 | P2 | Q2 | K0 | R2 | S2 |
4 | K2 | K1 | P3 | Q3 | K0 | R3 | S3 |
- Write a Pandas program to combine the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
df13_1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=['K0', 'K1', 'K2'])
df13_2 = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
'D': ['D0', 'D2', 'D3']},
index=['K0', 'K2', 'K3'])
df13_1.join(df13_2,how='outer')
A | B | C | D | |
---|---|---|---|---|
K0 | A0 | B0 | C0 | D0 |
K1 | A1 | B1 | NaN | NaN |
K2 | A2 | B2 | C2 | D2 |
K3 | NaN | NaN | C3 | D3 |
- Write a Pandas program to merge two given dataframes with different columns.
df14_1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'P': ['P0', 'P1', 'P2', 'P3'],
'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
df14_2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'R': ['R0', 'R1', 'R2', 'R3'],
'S': ['S0', 'S1', 'S2', 'S3']})
pd.concat([df14_1,df14_2],axis=0,sort=False,ignore_index=True)
key1 | key2 | P | Q | R | S | |
---|---|---|---|---|---|---|
0 | K0 | K0 | P0 | Q0 | NaN | NaN |
1 | K0 | K1 | P1 | Q1 | NaN | NaN |
2 | K1 | K0 | P2 | Q2 | NaN | NaN |
3 | K2 | K1 | P3 | Q3 | NaN | NaN |
4 | K0 | K0 | NaN | NaN | R0 | S0 |
5 | K1 | K0 | NaN | NaN | R1 | S1 |
6 | K1 | K0 | NaN | NaN | R2 | S2 |
7 | K2 | K0 | NaN | NaN | R3 | S3 |
- Write a Pandas program to Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame.
df15_1 = pd.DataFrame({'A': [None, 0, None], 'B': [3, 4, 5]})
df15_2 = pd.DataFrame({'A': [1, 1, 3], 'B': [3, None, 3]})
df15_1.combine_first(df15_2)
A | B | |
---|---|---|
0 | 1.0 | 3 |
1 | 0.0 | 4 |
2 | 3.0 | 5 |