Python Pandas - DataFrame

29 minute read

Exercise - Python Pandas DataFrame

import pandas as pd
import numpy as np
  1. Write a Pandas program to create and display a DataFrame from a specified dictionary data.
data1= {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
df1=pd.DataFrame(data1)
print(df1)
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
  1. Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df2=pd.DataFrame(exam_data,index=labels,columns=['attempts','name','qualify','score'])
df2
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to display a summary of the basic information about a specified DataFrame and its data.
df2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
attempts    10 non-null int64
name        10 non-null object
qualify     10 non-null object
score       8 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
df2.describe()
attempts score
count 10.000000 8.000000
mean 1.900000 13.562500
std 0.875595 4.693746
min 1.000000 8.000000
25% 1.000000 9.000000
50% 2.000000 13.500000
75% 2.750000 17.125000
max 3.000000 20.000000
  1. Write a Pandas program to get the first 3 rows of a given DataFrame.
df2.head(3)
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
    1. Write a Pandas program to select the ‘name’ and ‘score’ columns from the above DataFrame.
df2[['name','score']]
name score
a Anastasia 12.5
b Dima 9.0
c Katherine 16.5
d James NaN
e Emily 9.0
f Michael 20.0
g Matthew 14.5
h Laura NaN
i Kevin 8.0
j Jonas 19.0
  1. Write a Pandas program to select the specified columns and rows from a given data frame.
df2[['name','score']].iloc[[1,3,5,6],:]
name score
b Dima 9.0
d James NaN
f Michael 20.0
g Matthew 14.5
  1. Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.
df2[df2.attempts>2]
attempts name qualify score
b 3 Dima no 9.0
d 3 James no NaN
f 3 Michael yes 20.0
  1. Write a Pandas program to count the number of rows and columns of a DataFrame.
df2.columns
Index(['attempts', 'name', 'qualify', 'score'], dtype='object')
print("Number of rows : {} \nNumber of columns : {}".format(df2.shape[0],df2.shape[1]))
Number of rows : 10
Number of columns : 4
print("Number of rows : {} \nNumber of columns : {}".format(len(df2),len(df2.columns)))
Number of rows : 10
Number of columns : 4
  1. Write a Pandas program to select the rows where the score is missing, i.e. is NaN.
df2[df2.score.isna()]
attempts name qualify score
d 3 James no NaN
h 1 Laura no NaN
  1. Write a Pandas program to select the rows the score is between 15 and 20 (inclusive).
df2[(df2.score>=15) & (df2.score<=20)]
attempts name qualify score
c 2 Katherine yes 16.5
f 3 Michael yes 20.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to select the rows where number of attempts in the examination is less than 2 and score greater than 15.
df2[(df2.attempts<2) & (df2.score>15)]
attempts name qualify score
j 1 Jonas yes 19.0
  1. Write a Pandas program to change the score in row ‘d’ to 11.5.
df12=df2.copy()
df12.loc['d','score']=11.5
df12.loc['d']
attempts        3
name        James
qualify        no
score        11.5
Name: d, dtype: object
  1. Write a Pandas program to calculate the sum of the examination attempts by the students.
df2.attempts.sum()
19
  1. Write a Pandas program to calculate the mean score for each different student in DataFrame.
df2.score.mean()
13.5625
  1. Write a Pandas program to append a new row ‘k’ to data frame with given values for each column. Now delete the new row and return the original DataFrame.
df15=df2.copy()
df15.loc['k']=[1,'Suresh','yes',15.5]
df15
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
k 1 Suresh yes 15.5
  1. Write a Pandas program to sort the DataFrame first by ‘name’ in descending order, then by ‘score’ in ascending order.
df2.sort_values(['name','score'],ascending=[False,True])
attempts name qualify score
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
c 2 Katherine yes 16.5
j 1 Jonas yes 19.0
d 3 James no NaN
e 2 Emily no 9.0
b 3 Dima no 9.0
a 1 Anastasia yes 12.5
  1. Write a Pandas program to replace the ‘qualify’ column contains the values ‘yes’ and ‘no’ with True and False.
df17=df2.copy()
df17.qualify=df17.qualify.apply(lambda x:True if x=='yes' else False)
df17
attempts name qualify score
a 1 Anastasia True 12.5
b 3 Dima False 9.0
c 2 Katherine True 16.5
d 3 James False NaN
e 2 Emily False 9.0
f 3 Michael True 20.0
g 1 Matthew True 14.5
h 1 Laura False NaN
i 2 Kevin False 8.0
j 1 Jonas True 19.0
  1. Write a Pandas program to change the name ‘James’ to ‘Suresh’ in name column of the DataFrame.
df18=df2.copy()
df18.name=df18.name.replace('James','Suresh')
df18
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 Suresh no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to delete the ‘attempts’ column from the DataFrame.
df19=df2.copy()
df19.drop('attempts',axis=1)
name qualify score
a Anastasia yes 12.5
b Dima no 9.0
c Katherine yes 16.5
d James no NaN
e Emily no 9.0
f Michael yes 20.0
g Matthew yes 14.5
h Laura no NaN
i Kevin no 8.0
j Jonas yes 19.0
  1. Write a Pandas program to insert a new column in existing DataFrame.
df20=df2.copy()
color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']
df20['color']=color
df20
attempts name qualify score color
a 1 Anastasia yes 12.5 Red
b 3 Dima no 9.0 Blue
c 2 Katherine yes 16.5 Orange
d 3 James no NaN Red
e 2 Emily no 9.0 White
f 3 Michael yes 20.0 White
g 1 Matthew yes 14.5 Blue
h 1 Laura no NaN Green
i 2 Kevin no 8.0 Green
j 1 Jonas yes 19.0 Red
  1. Write a Pandas program to iterate over rows in a DataFrame.
exam_data = [{'name':'Anastasia', 'score':12.5}, {'name':'Dima','score':9}, {'name':'Katherine','score':16.5}]
df21=pd.DataFrame(exam_data)
for index,row in df21.iterrows():
    print(row['name'],row['score'])
Anastasia 12.5
Dima 9.0
Katherine 16.5
  1. Write a Pandas program to get list from DataFrame column headers.
df2.columns.tolist()
['attempts', 'name', 'qualify', 'score']
  1. Write a Pandas program to rename columns of a given DataFrame.
df23= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df23
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 9
df23.columns=['Column1','Column2','Column3']
df23
Column1 Column2 Column3
0 1 4 7
1 2 5 8
2 3 6 9
df23=df23.rename(columns={'Column1':'Columns1'})
df23
Columns1 Column2 Column3
0 1 4 7
1 2 5 8
2 3 6 9
  1. Write a Pandas program to select rows from a given DataFrame based on values in some columns.
df24= pd.DataFrame({'col1': [1, 4,3,4,5], 'col2': [4, 5, 6,7,8], 'col3': [7, 8, 9,0,1]})
df24[df24.col1==4]
col1 col2 col3
1 4 5 8
3 4 7 0
  1. Write a Pandas program to change the order of a DataFrame columns.
df25=df24[['col3','col2','col1']]
df25
col3 col2 col1
0 7 4 1
1 8 5 4
2 9 6 3
3 0 7 4
4 1 8 5
  1. Write a Pandas program to add one row in an existing DataFrame.
df26=df24.copy()
df26.loc[len(df26)]=[10,11,12]
df26
col1 col2 col3
0 1 4 7
1 4 5 8
2 3 6 9
3 4 7 0
4 5 8 1
5 10 11 12
  1. Write a Pandas program to write a DataFrame to CSV file using tab separator.
df24.to_csv('27.csv',sep='\t',index=False)
df27=pd.read_csv('27.csv')
df27
col1\tcol2\tcol3
0 1\t4\t7
1 4\t5\t8
2 3\t6\t9
3 4\t7\t0
4 5\t8\t1
  1. Write a Pandas program to count city wise number of people from a given of data set (city, name of the person).
df28 = pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'city': ['California', 'Los Angeles', 'California', 'California', 'California', 'Los Angeles', 'Los Angeles', 'Georgia', 'Georgia', 'Los Angeles']})

df28
name city
0 Anastasia California
1 Dima Los Angeles
2 Katherine California
3 James California
4 Emily California
5 Michael Los Angeles
6 Matthew Los Angeles
7 Laura Georgia
8 Kevin Georgia
9 Jonas Los Angeles
df28.groupby('city').size().reset_index(name='Number of people')
city Number of people
0 California 4
1 Georgia 2
2 Los Angeles 4
  1. Write a Pandas program to delete DataFrame row(s) based on given column value.
df29=df24.copy()
df29=df29[df29.col2!=5]
df29
col1 col2 col3
0 1 4 7
2 3 6 9
3 4 7 0
4 5 8 1
  1. Write a Pandas program to widen output display to see more columns.
pd.set_option('display.max_rows',500)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)
print(df24)
   col1  col2  col3
0     1     4     7
1     4     5     8
2     3     6     9
3     4     7     0
4     5     8     1
  1. Write a Pandas program to select a row of series/dataframe by given integer index.
df24.loc[[2]]
col1 col2 col3
2 3 6 9
  1. Write a Pandas program to replace all the NaN values with Zero’s in a column of a dataframe.
df32=df2.copy()
df32.fillna(0)
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no 0.0
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no 0.0
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to convert index in a column of the given dataframe.
df33=df2.copy()
df33.reset_index()
index attempts name qualify score
0 a 1 Anastasia yes 12.5
1 b 3 Dima no 9.0
2 c 2 Katherine yes 16.5
3 d 3 James no NaN
4 e 2 Emily no 9.0
5 f 3 Michael yes 20.0
6 g 1 Matthew yes 14.5
7 h 1 Laura no NaN
8 i 2 Kevin no 8.0
9 j 1 Jonas yes 19.0
print(df33.reset_index().to_string(index=False))
index  attempts       name qualify  score
    a         1  Anastasia     yes   12.5
    b         3       Dima      no    9.0
    c         2  Katherine     yes   16.5
    d         3      James      no    NaN
    e         2      Emily      no    9.0
    f         3    Michael     yes   20.0
    g         1    Matthew     yes   14.5
    h         1      Laura      no    NaN
    i         2      Kevin      no    8.0
    j         1      Jonas     yes   19.0
#Back to original be setting index once again
df33.reset_index().set_index('index')
attempts name qualify score
index
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to set a given value for particular cell in DataFrame using index value.
df2
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
df34=df2.copy()
df34.loc['i',:].score=10.2
C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py:5208: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
df34
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
df34.set_value('i','score',10.2)
C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  """Entry point for launching an IPython kernel.
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 10.2
j 1 Jonas yes 19.0
df34.at['i','score']=10.2
df34
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 10.2
j 1 Jonas yes 19.0
df34.iat[8,3]
10.2
  1. Write a Pandas program to count the NaN values in one or more columns in DataFrame.
df2.isna().sum().sum()
2
  1. Write a Pandas program to drop a list of rows from a specified DataFrame.
df36=df24.copy()
df36.drop([2,4],axis=0)
col1 col2 col3
0 1 4 7
1 4 5 8
3 4 7 0
  1. Write a Pandas program to reset index in a given DataFrame.
df37=df2.copy()
df37.drop(['a','b']).reset_index()
index attempts name qualify score
0 c 2 Katherine yes 16.5
1 d 3 James no NaN
2 e 2 Emily no 9.0
3 f 3 Michael yes 20.0
4 g 1 Matthew yes 14.5
5 h 1 Laura no NaN
6 i 2 Kevin no 8.0
7 j 1 Jonas yes 19.0
df37
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to divide a DataFrame in a given ratio.
df38=pd.DataFrame(np.random.randn(10,2))
df38
0 1
0 1.248443 -1.241437
1 1.350421 -1.539320
2 -0.118075 1.586380
3 0.126995 1.419173
4 -0.062786 0.070672
5 -0.077202 0.442252
6 -0.547482 0.472672
7 -0.154437 -0.138206
8 0.634787 1.480123
9 -0.029771 0.834099
df38_70per=df38.sample(frac=.7,random_state=0)
df38_30per=df38.drop(df38_70per.index)
df38_70per
0 1
2 -0.118075 1.586380
8 0.634787 1.480123
4 -0.062786 0.070672
9 -0.029771 0.834099
1 1.350421 -1.539320
6 -0.547482 0.472672
7 -0.154437 -0.138206
df38_30per
0 1
0 1.248443 -1.241437
3 0.126995 1.419173
5 -0.077202 0.442252
  1. Write a Pandas program to combining two series into a DataFrame.
s39_1 = pd.Series(['100', '200', 'python', '300.12', '400'])
s39_2 = pd.Series(['10', '20', 'php', '30.12', '40'])
pd.concat([s39_1,s39_2],axis=1)
0 1
0 100 10
1 200 20
2 python php
3 300.12 30.12
4 400 40
  1. Write a Pandas program to shuffle a given DataFrame rows.
df40=df2.copy()
df40.sample(frac=1)
attempts name qualify score
g 1 Matthew yes 14.5
f 3 Michael yes 20.0
d 3 James no NaN
a 1 Anastasia yes 12.5
e 2 Emily no 9.0
h 1 Laura no NaN
i 2 Kevin no 8.0
c 2 Katherine yes 16.5
j 1 Jonas yes 19.0
b 3 Dima no 9.0
df40
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
  1. Write a Pandas program to convert DataFrame column type from string to datetime.
s41 = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'])
s41
0    3/11/2000
1    3/12/2000
2    3/13/2000
dtype: object
pd.to_datetime(s41)
0   2000-03-11
1   2000-03-12
2   2000-03-13
dtype: datetime64[ns]
  1. Write a Pandas program to rename a specific column name in a given DataFrame.
df42= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df42.rename(columns={'col2':'column2'})
col1 column2 col3
0 1 4 7
1 2 5 8
2 3 6 9
  1. Write a Pandas program to get a list of a specified column of a DataFrame.
df42.col2.tolist()
[4, 5, 6]
  1. Write a Pandas program to create a DataFrame from a Numpy array and specify the index column and column headers.
pd.DataFrame(np.random.randn(30).reshape(10,3),index=np.linspace(0,9,10,dtype='int'),columns=['column1','columns2','columns3'])
column1 columns2 columns3
0 -0.345403 2.167887 0.495698
1 1.146200 -1.078212 0.231268
2 -0.619157 1.287958 0.216466
3 -0.859700 -0.735332 0.437529
4 0.395431 1.002545 -0.999019
5 -0.474449 -0.106919 -2.089148
6 0.375528 0.173526 -1.103849
7 -1.468913 1.206882 -0.983365
8 -0.751145 -0.289252 -0.346848
9 0.923659 1.038505 -1.394882
  1. Write a Pandas program to find the row for where the value of a given column is maximum.
df45 = pd.DataFrame({'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})

df45
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
#Solution 1 (not efficient)
df45[df45.col1==df45.col1.max()].index[0]
4
df45['col1'].idxmax()
4
df45['col2'].idxmax()
3
df45['col3'].idxmax()
2
  1. Write a Pandas program to check whether a given column is present in a DataFrame or not.
col='Col4'
if col in df45.columns.tolist():
    print(col,'present in DataFrame')
else:
    print(col,'not present in DataFrame')
Col4 not present in DataFrame
  1. Write a Pandas program to get the specified row value of a given DataFrame.
df45.iloc[0]
col1    1
col2    4
col3    7
Name: 0, dtype: int64
  1. Write a Pandas program to get the datatypes of columns of a DataFrame.
df2.dtypes
attempts      int64
name         object
qualify      object
score       float64
dtype: object
  1. Write a Pandas program to append data to an empty DataFrame.
df49=pd.DataFrame()
d49=pd.DataFrame({'col1':range(3),'col2':range(3)})
df49.append(d49)
col1 col2
0 0 0
1 1 1
2 2 2
  1. Write a Pandas program to sort a given DataFrame by two or more columns.
df50=df2.copy()
df50.sort_values(['attempts','name'],ascending=[True,False])
attempts name qualify score
g 1 Matthew yes 14.5
h 1 Laura no NaN
j 1 Jonas yes 19.0
a 1 Anastasia yes 12.5
i 2 Kevin no 8.0
c 2 Katherine yes 16.5
e 2 Emily no 9.0
f 3 Michael yes 20.0
d 3 James no NaN
b 3 Dima no 9.0
  1. Write a Pandas program to convert the datatype of a given column (floats to ints).
df51=df2.copy()
df51.dtypes
attempts      int64
name         object
qualify      object
score       float64
dtype: object
df51.fillna(0).astype({'score':int}).dtypes
attempts     int64
name        object
qualify     object
score        int32
dtype: object
  1. Write a Pandas program to remove infinite values from a given DataFrame.
df52 = pd.DataFrame([1000, 2000, 3000, -4000, np.inf, -np.inf])
df52
0
0 1000.0
1 2000.0
2 3000.0
3 -4000.0
4 inf
5 -inf
df52.replace([np.inf,-np.inf],np.NaN)
0
0 1000.0
1 2000.0
2 3000.0
3 -4000.0
4 NaN
5 NaN
  1. Write a Pandas program to insert a given column at a specific column index in a DataFrame.
df53 = pd.DataFrame({'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})

df53
col2 col3
0 4 7
1 5 8
2 6 12
3 9 1
4 5 11
col1= [1, 2, 3, 4, 7]
idx=0
df53.insert(loc=idx,column='col1',value=col1)
df53
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
  1. Write a Pandas program to convert a given list of lists into a Dataframe.
d54=[[2, 4], [1, 3]]
pd.DataFrame(d54,columns=['col1','col2'])
col1 col2
0 2 4
1 1 3
  1. Write a Pandas program to group by the first column and get second column as lists in rows.
df55=pd.DataFrame({'col1':['C1','C1','C2','C2','C2','C3','C2'],'col2':[1,2,3,3,4,6,5]})
g55=df55.groupby('col1')['col2']
#Wrong Solution (or not gettring desired result)
for ele in g55:
    print(ele[1])
0    1
1    2
Name: col2, dtype: int64
2    3
3    3
4    4
6    5
Name: col2, dtype: int64
5    6
Name: col2, dtype: int64
#Correct Solution
df55.groupby('col1')['col2'].apply(list)
col1
C1          [1, 2]
C2    [3, 3, 4, 5]
C3             [6]
Name: col2, dtype: object
  1. Write a Pandas program to get column index from column name of a given DataFrame.
df45.columns.tolist().index('col2')
1
df45.columns.get_loc('col2')
1
  1. Write a Pandas program to count number of columns of a DataFrame.
df45.shape[1]
3
len(df45.columns)
3
  1. Write a Pandas program to select all columns, except one given column in a DataFrame.
#Solution 1
df45.drop('col3',axis=1)
col1 col2
0 1 4
1 2 5
2 3 6
3 4 9
4 7 5
#Solution 2
df45.loc[:,df45.columns!='col3']
col1 col2
0 1 4
1 2 5
2 3 6
3 4 9
4 7 5
  1. Write a Pandas program to get first n records of a DataFrame.
df45.head(3)
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
  1. Write a Pandas program to get last n records of a DataFrame.
df45.iloc[-3:,:]
col1 col2 col3
2 3 6 12
3 4 9 1
4 7 5 11
df45.tail(3)
col1 col2 col3
2 3 6 12
3 4 9 1
4 7 5 11
  1. Write a Pandas program to get topmost n records within each group of a DataFrame.
df61 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})

df61
col1 col2 col3
0 1 4 7
1 2 5 5
2 3 6 8
3 4 9 12
4 7 5 1
5 11 0 11
df61.nlargest(3,'col1')
col1 col2 col3
5 11 0 11
4 7 5 1
3 4 9 12
df61.nlargest(3,'col2')
col1 col2 col3
3 4 9 12
2 3 6 8
1 2 5 5
df61.nlargest(3,'col3')
col1 col2 col3
3 4 9 12
5 11 0 11
2 3 6 8
  1. Write a Pandas program to remove first n rows of a given DataFrame.
df62=df61.copy()
df62
col1 col2 col3
0 1 4 7
1 2 5 5
2 3 6 8
3 4 9 12
4 7 5 1
5 11 0 11
df62=df62.tail(-3)
df62
col1 col2 col3
3 4 9 12
4 7 5 1
5 11 0 11
  1. Write a Pandas program to remove last n rows of a given DataFrame.
df63=df61.copy()
df63=df63.iloc[:-3,:]
df63
col1 col2 col3
0 1 4 7
1 2 5 5
2 3 6 8
  1. Write a Pandas program to add a prefix or suffix to all columns of a given DataFrame.
df64 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});

df64_1=df64.copy()
df64_1.columns='A_'+df64.columns
df64_1
A_W A_X A_Y A_Z
0 68 78 84 86
1 75 85 94 97
2 86 96 89 96
3 80 80 83 72
4 66 86 86 83
df64_2=df64.copy()
df64_2.columns=df64.columns+'_1'
df64_2
W_1 X_1 Y_1 Z_1
0 68 78 84 86
1 75 85 94 97
2 86 96 89 96
3 80 80 83 72
4 66 86 86 83
#sol2
df64.add_prefix('A_')
A_W A_X A_Y A_Z
0 68 78 84 86
1 75 85 94 97
2 86 96 89 96
3 80 80 83 72
4 66 86 86 83
df64.add_suffix('_1')
W_1 X_1 Y_1 Z_1
0 68 78 84 86
1 75 85 94 97
2 86 96 89 96
3 80 80 83 72
4 66 86 86 83
  1. Write a Pandas program to reverse order (rows, columns) of a given DataFrame.
df64[df64.columns[::-1]]
#df64.iloc[:,::-1] same answer
Z Y X W
0 86 84 78 68
1 97 94 85 75
2 96 89 96 86
3 72 83 80 80
4 83 86 86 66
df64.iloc[::-1]
W X Y Z
4 66 86 86 83
3 80 80 83 72
2 86 96 89 96
1 75 85 94 97
0 68 78 84 86
print("Reverse row order and reset index:")
df64.iloc[::-1].reset_index(drop=True)
Reverse row order and reset index:
W X Y Z
0 66 86 86 83
1 80 80 83 72
2 86 96 89 96
3 75 85 94 97
4 68 78 84 86
  1. Write a Pandas program to select columns by data type of a given DataFrame.
df66 = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'date_of_birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': [18.5, 21.2, 22.5, 22, 23]
})
df66
name date_of_birth age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Syed Wharton 15/09/1997 23.0
df66.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
name             5 non-null object
date_of_birth    5 non-null object
age              5 non-null float64
dtypes: float64(1), object(2)
memory usage: 248.0+ bytes
df66.select_dtypes(include='number')
age
0 18.5
1 21.2
2 22.5
3 22.0
4 23.0
df66.select_dtypes(include='object')
name date_of_birth
0 Alberto Franco 17/05/2002
1 Gino Mcneill 16/02/1999
2 Ryan Parkes 25/09/1998
3 Eesha Hinton 11/05/2002
4 Syed Wharton 15/09/1997
  1. Write a Pandas program to split a given DataFrame into two random subsets.
df67=df66.copy()
df67_1=df67.sample(frac=0.7,random_state=0)
df67_2=df67.drop(df67_1.index)
df67_1
name date_of_birth age
2 Ryan Parkes 25/09/1998 22.5
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
3 Eesha Hinton 11/05/2002 22.0
df67_2
name date_of_birth age
4 Syed Wharton 15/09/1997 23.0
  1. Write a Pandas program to rename all columns with the same pattern of a given DataFrame.
df68 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_of_Birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
df68
Name Date_of_Birth Age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Syed Wharton 15/09/1997 23.0
df68.columns=df68.columns.str.lower().str.strip()
df68
name date_of_birth age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Syed Wharton 15/09/1997 23.0
  1. Write a Pandas program to merge datasets and check uniqueness.
df69 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
df69_1=df69.copy()
df69_1=df69_1.drop([0,1])
df69_1
Name Date_Of_Birth Age
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Syed Wharton 15/09/1997 23.0
df69_2=df69.copy()
df69_2=df69_2.drop([2])
df69_2
Name Date_Of_Birth Age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
3 Eesha Hinton 11/05/2002 22.0
4 Syed Wharton 15/09/1997 23.0
#one_to_one
pd.merge(df69_1,df69_2,validate='one_to_one')
Name Date_Of_Birth Age
0 Eesha Hinton 11/05/2002 22.0
1 Syed Wharton 15/09/1997 23.0
#one_to_many
pd.merge(df69_1,df69_2,validate='one_to_many')
Name Date_Of_Birth Age
0 Eesha Hinton 11/05/2002 22.0
1 Syed Wharton 15/09/1997 23.0
#many_to_one
pd.merge(df69_1,df69_2,validate='many_to_one')
Name Date_Of_Birth Age
0 Eesha Hinton 11/05/2002 22.0
1 Syed Wharton 15/09/1997 23.0
#many_to_many
pd.merge(df69_1,df69_2,validate='many_to_many')
Name Date_Of_Birth Age
0 Eesha Hinton 11/05/2002 22.0
1 Syed Wharton 15/09/1997 23.0
  1. Write a Pandas program to convert continuous values of a column in a given DataFrame to categorical.
df70 = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton', 'Kierra Gentry'],
      'age': [18, 22, 85, 50, 80, 5]
})
df70
name age
0 Alberto Franco 18
1 Gino Mcneill 22
2 Ryan Parkes 85
3 Eesha Hinton 50
4 Syed Wharton 80
5 Kierra Gentry 5
df70['age_group']=pd.cut(df70.age,bins=[0,18,65,95],labels=['kids','adult','elderly'])
df70
name age age_group
0 Alberto Franco 18 kids
1 Gino Mcneill 22 adult
2 Ryan Parkes 85 elderly
3 Eesha Hinton 50 adult
4 Syed Wharton 80 elderly
5 Kierra Gentry 5 kids
  1. Write a Pandas program to display memory usage of a given DataFrame and every column of the DataFrame.
df70.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 334.0+ bytes
df70.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes
df70.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes
df70.memory_usage(deep=True)
Index        128
name         416
age           48
age_group    311
dtype: int64
df70.memory_usage(deep=False)
Index        128
name          48
age           48
age_group    110
dtype: int64
  1. Write a Pandas program to combine many given series to create a DataFrame.
s72_1=pd.Series('php python jawa c# c++'.split())
s72_2=pd.Series([1,2,3,4,5])
pd.concat([s72_1,s72_2],axis=1)
0 1
0 php 1
1 python 2
2 jawa 3
3 c# 4
4 c++ 5
pd.DataFrame({'col1':s72_1,'col2':s72_2})
col1 col2
0 php 1
1 python 2
2 jawa 3
3 c# 4
4 c++ 5
  1. Write a Pandas program to create DataFrames that contains random values, contains missing values, contains datetime values and contains mixed values.
pd.DataFrame(np.random.randn(40).reshape(10,4),columns='A B C D'.split())
A B C D
0 0.976226 -1.231195 -1.927390 1.687527
1 0.265455 -0.872881 -1.234580 0.025491
2 2.309553 -0.368772 1.281644 0.903505
3 -0.490682 -0.934150 0.395868 -0.614944
4 1.270408 0.105810 -0.934098 -0.193174
5 -0.412797 0.123316 0.656035 -0.801055
6 -0.627806 -0.182581 0.200891 -0.471121
7 -0.705074 0.416434 0.424676 0.495411
8 -0.157094 -1.339826 1.379757 1.843449
9 1.265529 -0.211037 0.542715 1.862077
pd.util.testing.makeDataFrame()
A B C D
mIrUvYIpau -0.132329 0.369235 -2.518745 0.463750
IIuIlUG3TZ 0.224135 1.173692 -1.402806 -0.634084
lhz7LziqT6 -1.747848 1.112314 -1.173867 -1.275242
bDRGRAVZ5L -1.125598 1.167309 0.119561 -0.315563
tltm1jawR2 -0.170738 -0.020117 0.807885 -1.399174
gyHU2hW6dO 0.228735 -0.685193 -0.241533 -1.230713
GH6l1o4kp8 -0.683374 -1.373173 1.563031 -1.344920
Ee15AMPrrZ -0.229806 -1.190730 0.691445 -0.614836
CVtTzdhYOH -1.806060 0.126193 1.921333 0.655736
u7yGpt7ojr 0.418048 -0.560950 -2.247065 2.067856
JQRGGsJRHG -1.116554 0.614687 -0.367641 0.160849
6NTUAVqlrk 0.622822 0.119354 -1.052081 -0.347517
8XKVbhaoCO 1.380594 -0.907565 -1.298444 0.627115
HSlwYeeFDf 0.566009 -0.185294 0.437134 -1.943550
441fk4pWhD 0.214447 -1.183068 1.761380 -0.652720
VdiWlWttWB 0.743793 1.722755 0.757516 0.615187
ls77KaEvOq -1.344870 0.364236 0.093453 -0.609153
5K0bPzfMFi -1.065251 0.841662 1.479194 -0.731203
AXsuHNuwHo -0.827250 1.965530 0.773863 0.665516
T0GlToPXZi -0.320390 -0.728882 2.018289 -0.578019
22vFAApOA5 -0.494500 0.550730 -1.067188 0.044000
gPiu7GA3im -0.521648 -0.856955 0.709654 -0.932031
OZnBwP9gm9 -0.181220 -1.985956 0.794047 -0.335244
9U23Nv30l1 0.126521 -0.269881 -0.475025 0.873421
NLLYWOAAPo -1.399080 -0.614755 -0.431522 -1.314192
0snfJwr8Zi 1.104287 -0.747049 -0.494870 -0.280476
hyIRHRTG6c 0.447759 0.375540 -1.348494 -1.576017
nO4Ybp7ia5 -0.665945 1.525348 0.291121 0.763574
OU81k9dvGD 0.897210 0.587101 1.375282 -0.842839
ktS4vr3Pqy -0.270603 -0.081540 -1.243459 0.965349
pd.util.testing.makeMissingDataframe()
A B C D
K1jtgA2gtF -1.142478 1.478004 0.777470 0.517304
Fz0OYcxMQn -0.096315 0.070175 0.176539 -0.511371
7gLTJCMX3w -0.236882 0.453078 -0.588427 1.412213
Id0qucUCeB NaN -1.088763 -0.637960 1.000012
MlJfZSTHJj -0.732373 1.112463 0.441055 0.148622
p74iwf5CWk 0.880165 NaN 0.319050 -0.100719
k4wEIqvtKd -1.076164 -1.052221 NaN 0.817084
6gdeN8A4gg 1.160957 0.109959 0.587556 1.699185
c0GHnKP4vD -1.127555 0.933663 0.527032 -0.450704
0yltZq25JS -0.177030 0.225647 0.413793 1.491172
BAGqtJef0S 0.356970 0.621645 -1.182119 -0.008877
iOriKpo3am 0.734046 -0.004511 0.921544 0.719030
Ogd3TaGzC0 -0.076480 -1.460346 0.026264 -0.775999
XZp591LpQ7 -0.998655 NaN 1.518520 0.453103
nvd2jdiBX5 1.592946 -1.306997 -0.168547 NaN
ICJxEvHfx4 -0.577977 0.387412 -0.594324 0.294138
mLyqvY1KVk 0.901325 -1.560872 0.160145 -1.170798
23TmyM4Vjt 0.291432 -0.810652 -0.184815 0.211070
J2JBz9nc2N 1.818592 NaN NaN -0.257770
77utwjDcEg NaN 0.183574 0.111528 1.381484
yAg15dn3KU -0.259229 -0.201913 -1.912119 -0.271751
yRbk019rAz -0.913102 -0.120397 -0.721780 0.725017
U5oGxD3WnW -0.565617 0.579980 NaN 0.031062
sNGtSOERD7 NaN 0.074556 1.525357 -1.744255
ubpFxokUim 1.045186 1.165908 -1.138931 -0.082937
b6ZCgA1dZD 1.620180 0.962883 0.966021 1.709217
3lYiadTEpV -0.147411 0.950925 -0.122691 -0.973400
qvn2AUqCgF 1.207141 0.611017 -0.733806 0.072065
dqO35ESFKx 2.076349 0.519654 NaN 0.217811
am4HPhiP5I -1.403172 NaN 0.797241 -0.641350
pd.util.testing.makeTimeDataFrame()
A B C D
2000-01-03 1.003781 -0.356442 -0.690759 -2.543469
2000-01-04 0.307821 -0.425427 -0.134576 0.824121
2000-01-05 -0.976996 -1.113531 -0.522290 -0.567436
2000-01-06 0.438595 0.129363 -0.433556 1.080008
2000-01-07 -0.001516 -0.291726 -1.008141 -0.605705
2000-01-10 0.015357 -1.016492 -0.162590 -2.282291
2000-01-11 0.523147 0.587595 0.451137 -0.031271
2000-01-12 0.917431 1.109996 -0.561045 0.305243
2000-01-13 1.150708 -0.069645 -0.561491 -0.165408
2000-01-14 1.122621 -1.035067 1.201935 -0.690251
2000-01-17 0.813662 0.924708 0.545710 -0.002078
2000-01-18 1.379715 -0.229399 -0.726494 0.493481
2000-01-19 0.622288 -0.757303 0.689828 0.229719
2000-01-20 -0.680374 1.011196 -0.551390 -1.713440
2000-01-21 -1.524046 -1.010875 -0.997332 0.666093
2000-01-24 -0.941409 1.025098 -0.905963 1.082745
2000-01-25 0.042881 1.382499 1.049391 1.074299
2000-01-26 0.967413 -0.923113 2.007157 0.628600
2000-01-27 -0.051459 1.172377 0.464885 1.129185
2000-01-28 0.533879 -1.320932 0.503104 -0.695140
2000-01-31 -1.133007 0.437374 0.505606 -1.571305
2000-02-01 0.979238 -0.721377 0.234655 -0.061103
2000-02-02 0.231642 -0.627750 -0.133623 0.748030
2000-02-03 -1.431866 1.060292 -0.040169 0.229246
2000-02-04 -0.350627 -1.165422 0.436717 0.446511
2000-02-07 -1.886733 0.926730 0.604955 -1.034671
2000-02-08 1.205402 1.982943 -0.653149 -0.675256
2000-02-09 0.212245 -0.374800 -0.472396 1.577047
2000-02-10 -0.732806 -1.332025 1.173203 0.602202
2000-02-11 -0.681866 -0.170969 -1.366112 0.779899
pd.util.testing.makeMixedDataFrame()
A B C D
0 0.0 0.0 foo1 2009-01-01
1 1.0 1.0 foo2 2009-01-02
2 2.0 0.0 foo3 2009-01-05
3 3.0 1.0 foo4 2009-01-06
4 4.0 0.0 foo5 2009-01-07
  1. Write a Pandas program to fill missing values in time series data.
s74 = {"c1":[120, 130 ,140, 150, np.nan, 170], "c2":[7, np.nan, 10, np.nan, 5.5, 16.5]}
pd.util.testing.makeDateIndex()[:6]
DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-10'], dtype='datetime64[ns]', freq='B')
df74=pd.DataFrame(s74,index=pd.util.testing.makeDateIndex()[:6])
df74
c1 c2
2000-01-03 120.0 7.0
2000-01-04 130.0 NaN
2000-01-05 140.0 10.0
2000-01-06 150.0 NaN
2000-01-07 NaN 5.5
2000-01-10 170.0 16.5
df74.interpolate()
c1 c2
2000-01-03 120.0 7.00
2000-01-04 130.0 8.50
2000-01-05 140.0 10.00
2000-01-06 150.0 7.75
2000-01-07 160.0 5.50
2000-01-10 170.0 16.50
  1. Write a Pandas program to use a local variable within a query.
df75 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});

df75
W X Y Z
0 68 78 84 86
1 75 85 94 97
2 86 96 89 96
3 80 80 83 72
4 66 86 86 83
maxx = df75["W"].max()
df75.query("W < @maxx")
W X Y Z
0 68 78 84 86
1 75 85 94 97
3 80 80 83 72
4 66 86 86 83
#Solution 2 (without Query)
df75[df75.W<df75.W.max()]
W X Y Z
0 68 78 84 86
1 75 85 94 97
3 80 80 83 72
4 66 86 86 83
  1. Write a Pandas program to clean object column with mixed data of a given DataFrame using regular expression.
df76 = pd.DataFrame({"agent": ["a001", "a002", "a003", "a003", "a004"], "purchase":[4500.00, 7500.00, "$3000.25", "$1250.35", "9000.00"]})

df76
agent purchase
0 a001 4500
1 a002 7500
2 a003 $3000.25
3 a003 $1250.35
4 a004 9000.00
df76.purchase.apply(type)
0    <class 'float'>
1    <class 'float'>
2      <class 'str'>
3      <class 'str'>
4      <class 'str'>
Name: purchase, dtype: object
df76.purchase.replace("[$]","",regex=True).astype('float').apply(type)
0    <class 'float'>
1    <class 'float'>
2    <class 'float'>
3    <class 'float'>
4    <class 'float'>
Name: purchase, dtype: object
  1. Write a Pandas program to get the numeric representation of an array by identifying distinct values of a given column of a dataframe.
df77 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill'],
    'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
df77
Name Date_Of_Birth Age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Gino Mcneill 15/09/1997 23.0
a77_1=df77.Name.unique()
d77={}
for ind,name in df77.iterrows():
    print(ind,name[0])
    if name[0] not in d77.keys():
        d77[name[0]]=ind
0 Alberto Franco
1 Gino Mcneill
2 Ryan Parkes
3 Eesha Hinton
4 Gino Mcneill
d77
{'Alberto Franco': 0, 'Gino Mcneill': 1, 'Ryan Parkes': 2, 'Eesha Hinton': 3}
df77.replace({'Name':d77})
Name Date_Of_Birth Age
0 0 17/05/2002 18.5
1 1 16/02/1999 21.2
2 2 25/09/1998 22.5
3 3 11/05/2002 22.0
4 1 15/09/1997 23.0
#solution 1
df77['Name'].map(d77).tolist()
[0, 1, 2, 3, 1]
#Solution 2
df77.replace({'Name':d77}).Name.tolist()
[0, 1, 2, 3, 1]
#one step Solution
pd.factorize(df77.Name)
(array([0, 1, 2, 3, 1], dtype=int64),
 Index(['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton'], dtype='object'))
  1. Write a Pandas program to replace the current value in a dataframe column based on last largest value. If the current value is less than last largest value replaces the value with 0.
df78=pd.DataFrame({'rnum':[23, 21, 27, 22, 34, 33, 34, 31, 25, 22, 34, 19, 31, 32, 19]})
df78
rnum
0 23
1 21
2 27
3 22
4 34
5 33
6 34
7 31
8 25
9 22
10 34
11 19
12 31
13 32
14 19
a78_1=[-np.inf]
for ele in df78.rnum:
    if ele<np.amax(a78_1):
        a78_1.append(0)
    else:
        a78_1.append(ele)
    #print(ele)
print(a78_1[1:])
[23, 0, 27, 0, 34, 0, 34, 0, 0, 0, 34, 0, 0, 0, 0]
#Solution 2
df78.rnum.where(df78.rnum.eq(df78.rnum.cummax()),0)
0     23
1      0
2     27
3      0
4     34
5      0
6     34
7      0
8      0
9      0
10    34
11     0
12     0
13     0
14     0
Name: rnum, dtype: int64
  1. Write a Pandas program to create a DataFrame from the clipboard (data from an Excel spreadsheet or a Google Sheet).
df79=pd.read_clipboard()
df79
Python Pandas DataFrame
  1. Write a Pandas program to check for inequality of two given DataFrames.
df80_1 = pd.DataFrame({'W':[68,75,86,80,None],'X':[78,85,None,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df80_2 = pd.DataFrame({'W':[78,75,86,80,None],'X':[78,85,96,80,76], 'Y':[84,84,89,83,86],'Z':[86,97,96,72,83]});

df80_1!=df80_2
#~(df80_1==df80_2)
#df80_1.ne(df80_2)
W X Y Z
0 True False False False
1 False False True False
2 False True False False
3 False False False False
4 True True False False
  1. Write a Pandas program to get lowest n records within each group of a given DataFrame.
df81 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})

df81.nsmallest(3,'col1')
col1 col2 col3
0 1 4 7
1 2 5 5
2 3 6 8
df81.nsmallest(3,'col2')
col1 col2 col3
5 11 0 11
0 1 4 7
1 2 5 5
df81.nsmallest(3,'col3')
col1 col2 col3
4 7 5 1
1 2 5 5
0 1 4 7
#df81.nsmallest(3,columns=['col2','col1']) # It is taking first passed column into account even if we pass multiple columns