Python Pandas - Joining and merging DataFrame

6 minute read

Exercise - Pandas Joining and merging DataFrame

import pandas as pd
import numpy as np
  1. Write a Pandas program to join the two given dataframes along rows and assign all data.
df1_1 = pd.DataFrame({
        'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
         'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
        'marks': [200, 210, 190, 222, 199]})

df1_2 = pd.DataFrame({
        'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
        'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
        'marks': [201, 200, 198, 219, 201]})
pd.concat([df1_1,df1_2],axis=0)
#df1_1.append(df1_2,ignore_index=False)
student_id name marks
0 S1 Danniella Fenton 200
1 S2 Ryder Storey 210
2 S3 Bryce Jensen 190
3 S4 Ed Bernal 222
4 S5 Kwame Morin 199
0 S4 Scarlette Fisher 201
1 S5 Carla Williamson 200
2 S6 Dante Morse 198
3 S7 Kaiser William 219
4 S8 Madeeha Preston 201
  1. Write a Pandas program to join the two given dataframes along columns and assign all data.
pd.concat([df1_1,df1_2],axis=1)
student_id name marks student_id name marks
0 S1 Danniella Fenton 200 S4 Scarlette Fisher 201
1 S2 Ryder Storey 210 S5 Carla Williamson 200
2 S3 Bryce Jensen 190 S6 Dante Morse 198
3 S4 Ed Bernal 222 S7 Kaiser William 219
4 S5 Kwame Morin 199 S8 Madeeha Preston 201
  1. Write a Pandas program to append rows to an existing DataFrame and display the combined data.
s3 = pd.Series(['S6', 'Scarlette Fisher', 205], index=['student_id', 'name', 'marks'])
s3
student_id                  S6
name          Scarlette Fisher
marks                      205
dtype: object
df1_1.append(s3,ignore_index=True)
student_id name marks
0 S1 Danniella Fenton 200
1 S2 Ryder Storey 210
2 S3 Bryce Jensen 190
3 S4 Ed Bernal 222
4 S5 Kwame Morin 199
5 S6 Scarlette Fisher 205
  1. Write a Pandas program to append a list of dictioneries or series to a existing DataFrame and display the combined data.
d4 = [{'student_id': 'S6', 'name': 'Scarlette Fisher', 'marks': 203},
         {'student_id': 'S7', 'name': 'Bryce Jensen', 'marks': 207}]
d4
[{'student_id': 'S6', 'name': 'Scarlette Fisher', 'marks': 203},
 {'student_id': 'S7', 'name': 'Bryce Jensen', 'marks': 207}]
df1_1.append(d4,ignore_index=True)
student_id name marks
0 S1 Danniella Fenton 200
1 S2 Ryder Storey 210
2 S3 Bryce Jensen 190
3 S4 Ed Bernal 222
4 S5 Kwame Morin 199
5 S6 Scarlette Fisher 203
6 S7 Bryce Jensen 207
  1. Write a Pandas program to join the two given dataframes along rows and merge with another dataframe along the common column id.
df5_1 = pd.DataFrame({
        'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
         'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
        'marks': [200, 210, 190, 222, 199]})
df5_2 = pd.DataFrame({
        'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
        'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
        'marks': [201, 200, 198, 219, 201]})

df5_3 = pd.DataFrame({
        'student_id': ['S1', 'S2', 'S3', 'S4', 'S5', 'S7', 'S8', 'S9', 'S10', 'S11', 'S12', 'S13'],
        'exam_id': [23, 45, 12, 67, 21, 55, 33, 14, 56, 83, 88, 12]})
pd.merge(pd.concat([df5_1,df5_2],ignore_index=True),df5_3,how='inner',on=['student_id'])
student_id name marks exam_id
0 S1 Danniella Fenton 200 23
1 S2 Ryder Storey 210 45
2 S3 Bryce Jensen 190 12
3 S4 Ed Bernal 222 67
4 S4 Scarlette Fisher 201 67
5 S5 Kwame Morin 199 21
6 S5 Carla Williamson 200 21
7 S7 Kaiser William 219 55
8 S8 Madeeha Preston 201 33
  1. Write a Pandas program to join the two dataframes using the common column of both dataframes.
df5_1.merge(df5_2,how='outer',on='student_id')
student_id name_x marks_x name_y marks_y
0 S1 Danniella Fenton 200.0 NaN NaN
1 S2 Ryder Storey 210.0 NaN NaN
2 S3 Bryce Jensen 190.0 NaN NaN
3 S4 Ed Bernal 222.0 Scarlette Fisher 201.0
4 S5 Kwame Morin 199.0 Carla Williamson 200.0
5 S6 NaN NaN Dante Morse 198.0
6 S7 NaN NaN Kaiser William 219.0
7 S8 NaN NaN Madeeha Preston 201.0
  1. Write a Pandas program to join the two dataframes with matching records from both sides where available.
df5_1.merge(df5_2,how='outer',on='student_id')
student_id name_x marks_x name_y marks_y
0 S1 Danniella Fenton 200.0 NaN NaN
1 S2 Ryder Storey 210.0 NaN NaN
2 S3 Bryce Jensen 190.0 NaN NaN
3 S4 Ed Bernal 222.0 Scarlette Fisher 201.0
4 S5 Kwame Morin 199.0 Carla Williamson 200.0
5 S6 NaN NaN Dante Morse 198.0
6 S7 NaN NaN Kaiser William 219.0
7 S8 NaN NaN Madeeha Preston 201.0
  1. Write a Pandas program to join (left join) the two dataframes using keys from left dataframe only.
d8_1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'P': ['P0', 'P1', 'P2', 'P3'],
                     'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
d8_2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
pd.merge(d8_1,d8_2,how='left',on=['key1','key2'])
key1 key2 P Q R S
0 K0 K0 P0 Q0 R0 S0
1 K0 K1 P1 Q1 NaN NaN
2 K1 K0 P2 Q2 R1 S1
3 K1 K0 P2 Q2 R2 S2
4 K2 K1 P3 Q3 NaN NaN
  1. Write a Pandas program to join two dataframes using keys from right dataframe only.
pd.merge(d8_1,d8_2,how='right',on=['key1','key2'])
key1 key2 P Q R S
0 K0 K0 P0 Q0 R0 S0
1 K1 K0 P2 Q2 R1 S1
2 K1 K0 P2 Q2 R2 S2
3 K2 K0 NaN NaN R3 S3
  1. Write a Pandas program to merge two given datasets using multiple join keys.
pd.merge(d8_1,d8_2,on=['key1','key2'])
key1 key2 P Q R S
0 K0 K0 P0 Q0 R0 S0
1 K1 K0 P2 Q2 R1 S1
2 K1 K0 P2 Q2 R2 S2
  1. Write a Pandas program to create a new DataFrame based on existing series, using specified argument and override the existing columns names.
s11_1 = pd.Series([0, 1, 2, 3], name='col1')
s11_2 = pd.Series([0, 1, 2, 3])
s11_3 = pd.Series([0, 1, 4, 5], name='col3')
pd.concat([s11_1,s11_2,s11_3],axis=1,keys=['Column1','Columns2','Column3'])
Column1 Columns2 Column3
0 0 0 0
1 1 1 1
2 2 2 4
3 3 3 5
  1. Write a Pandas program to create a combination from two dataframes where a column id combination appears more than once in both dataframes.
df12_1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'P': ['P0', 'P1', 'P2', 'P3'],
                     'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
df12_2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
pd.merge(df12_1,df12_2,on='key1')
key1 key2_x P Q key2_y R S
0 K0 K0 P0 Q0 K0 R0 S0
1 K0 K1 P1 Q1 K0 R0 S0
2 K1 K0 P2 Q2 K0 R1 S1
3 K1 K0 P2 Q2 K0 R2 S2
4 K2 K1 P3 Q3 K0 R3 S3
  1. Write a Pandas program to combine the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
df13_1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      'B': ['B0', 'B1', 'B2']},
                     index=['K0', 'K1', 'K2'])

df13_2 = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])
df13_1.join(df13_2,how='outer')
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2
K3 NaN NaN C3 D3
  1. Write a Pandas program to merge two given dataframes with different columns.
df14_1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'P': ['P0', 'P1', 'P2', 'P3'],
                     'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
df14_2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
pd.concat([df14_1,df14_2],axis=0,sort=False,ignore_index=True)
key1 key2 P Q R S
0 K0 K0 P0 Q0 NaN NaN
1 K0 K1 P1 Q1 NaN NaN
2 K1 K0 P2 Q2 NaN NaN
3 K2 K1 P3 Q3 NaN NaN
4 K0 K0 NaN NaN R0 S0
5 K1 K0 NaN NaN R1 S1
6 K1 K0 NaN NaN R2 S2
7 K2 K0 NaN NaN R3 S3
  1. Write a Pandas program to Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame.
df15_1 = pd.DataFrame({'A': [None, 0, None], 'B': [3, 4, 5]})
df15_2 = pd.DataFrame({'A': [1, 1, 3], 'B': [3, None, 3]})
df15_1.combine_first(df15_2)
A B
0 1.0 3
1 0.0 4
2 3.0 5