Python Pandas - DataFrame
Exercise - Python Pandas DataFrame
import pandas as pd
import numpy as np
- Write a Pandas program to create and display a DataFrame from a specified dictionary data.
data1= {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
df1=pd.DataFrame(data1)
print(df1)
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
- Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df2=pd.DataFrame(exam_data,index=labels,columns=['attempts','name','qualify','score'])
df2
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to display a summary of the basic information about a specified DataFrame and its data.
df2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
attempts    10 non-null int64
name        10 non-null object
qualify     10 non-null object
score       8 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
df2.describe()
| attempts | score | |
|---|---|---|
| count | 10.000000 | 8.000000 | 
| mean | 1.900000 | 13.562500 | 
| std | 0.875595 | 4.693746 | 
| min | 1.000000 | 8.000000 | 
| 25% | 1.000000 | 9.000000 | 
| 50% | 2.000000 | 13.500000 | 
| 75% | 2.750000 | 17.125000 | 
| max | 3.000000 | 20.000000 | 
- Write a Pandas program to get the first 3 rows of a given DataFrame.
df2.head(3)
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
- 
    - Write a Pandas program to select the ‘name’ and ‘score’ columns from the above DataFrame.
 
df2[['name','score']]
| name | score | |
|---|---|---|
| a | Anastasia | 12.5 | 
| b | Dima | 9.0 | 
| c | Katherine | 16.5 | 
| d | James | NaN | 
| e | Emily | 9.0 | 
| f | Michael | 20.0 | 
| g | Matthew | 14.5 | 
| h | Laura | NaN | 
| i | Kevin | 8.0 | 
| j | Jonas | 19.0 | 
- Write a Pandas program to select the specified columns and rows from a given data frame.
df2[['name','score']].iloc[[1,3,5,6],:]
| name | score | |
|---|---|---|
| b | Dima | 9.0 | 
| d | James | NaN | 
| f | Michael | 20.0 | 
| g | Matthew | 14.5 | 
- Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.
df2[df2.attempts>2]
| attempts | name | qualify | score | |
|---|---|---|---|---|
| b | 3 | Dima | no | 9.0 | 
| d | 3 | James | no | NaN | 
| f | 3 | Michael | yes | 20.0 | 
- Write a Pandas program to count the number of rows and columns of a DataFrame.
df2.columns
Index(['attempts', 'name', 'qualify', 'score'], dtype='object')
print("Number of rows : {} \nNumber of columns : {}".format(df2.shape[0],df2.shape[1]))
Number of rows : 10
Number of columns : 4
print("Number of rows : {} \nNumber of columns : {}".format(len(df2),len(df2.columns)))
Number of rows : 10
Number of columns : 4
- Write a Pandas program to select the rows where the score is missing, i.e. is NaN.
df2[df2.score.isna()]
| attempts | name | qualify | score | |
|---|---|---|---|---|
| d | 3 | James | no | NaN | 
| h | 1 | Laura | no | NaN | 
- Write a Pandas program to select the rows the score is between 15 and 20 (inclusive).
df2[(df2.score>=15) & (df2.score<=20)]
| attempts | name | qualify | score | |
|---|---|---|---|---|
| c | 2 | Katherine | yes | 16.5 | 
| f | 3 | Michael | yes | 20.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to select the rows where number of attempts in the examination is less than 2 and score greater than 15.
df2[(df2.attempts<2) & (df2.score>15)]
| attempts | name | qualify | score | |
|---|---|---|---|---|
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to change the score in row ‘d’ to 11.5.
df12=df2.copy()
df12.loc['d','score']=11.5
df12.loc['d']
attempts        3
name        James
qualify        no
score        11.5
Name: d, dtype: object
- Write a Pandas program to calculate the sum of the examination attempts by the students.
df2.attempts.sum()
19
- Write a Pandas program to calculate the mean score for each different student in DataFrame.
df2.score.mean()
13.5625
- Write a Pandas program to append a new row ‘k’ to data frame with given values for each column. Now delete the new row and return the original DataFrame.
df15=df2.copy()
df15.loc['k']=[1,'Suresh','yes',15.5]
df15
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
| k | 1 | Suresh | yes | 15.5 | 
- Write a Pandas program to sort the DataFrame first by ‘name’ in descending order, then by ‘score’ in ascending order.
df2.sort_values(['name','score'],ascending=[False,True])
| attempts | name | qualify | score | |
|---|---|---|---|---|
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| j | 1 | Jonas | yes | 19.0 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| b | 3 | Dima | no | 9.0 | 
| a | 1 | Anastasia | yes | 12.5 | 
- Write a Pandas program to replace the ‘qualify’ column contains the values ‘yes’ and ‘no’ with True and False.
df17=df2.copy()
df17.qualify=df17.qualify.apply(lambda x:True if x=='yes' else False)
df17
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | True | 12.5 | 
| b | 3 | Dima | False | 9.0 | 
| c | 2 | Katherine | True | 16.5 | 
| d | 3 | James | False | NaN | 
| e | 2 | Emily | False | 9.0 | 
| f | 3 | Michael | True | 20.0 | 
| g | 1 | Matthew | True | 14.5 | 
| h | 1 | Laura | False | NaN | 
| i | 2 | Kevin | False | 8.0 | 
| j | 1 | Jonas | True | 19.0 | 
- Write a Pandas program to change the name ‘James’ to ‘Suresh’ in name column of the DataFrame.
df18=df2.copy()
df18.name=df18.name.replace('James','Suresh')
df18
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | Suresh | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to delete the ‘attempts’ column from the DataFrame.
df19=df2.copy()
df19.drop('attempts',axis=1)
| name | qualify | score | |
|---|---|---|---|
| a | Anastasia | yes | 12.5 | 
| b | Dima | no | 9.0 | 
| c | Katherine | yes | 16.5 | 
| d | James | no | NaN | 
| e | Emily | no | 9.0 | 
| f | Michael | yes | 20.0 | 
| g | Matthew | yes | 14.5 | 
| h | Laura | no | NaN | 
| i | Kevin | no | 8.0 | 
| j | Jonas | yes | 19.0 | 
- Write a Pandas program to insert a new column in existing DataFrame.
df20=df2.copy()
color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']
df20['color']=color
df20
| attempts | name | qualify | score | color | |
|---|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | Red | 
| b | 3 | Dima | no | 9.0 | Blue | 
| c | 2 | Katherine | yes | 16.5 | Orange | 
| d | 3 | James | no | NaN | Red | 
| e | 2 | Emily | no | 9.0 | White | 
| f | 3 | Michael | yes | 20.0 | White | 
| g | 1 | Matthew | yes | 14.5 | Blue | 
| h | 1 | Laura | no | NaN | Green | 
| i | 2 | Kevin | no | 8.0 | Green | 
| j | 1 | Jonas | yes | 19.0 | Red | 
- Write a Pandas program to iterate over rows in a DataFrame.
exam_data = [{'name':'Anastasia', 'score':12.5}, {'name':'Dima','score':9}, {'name':'Katherine','score':16.5}]
df21=pd.DataFrame(exam_data)
for index,row in df21.iterrows():
    print(row['name'],row['score'])
Anastasia 12.5
Dima 9.0
Katherine 16.5
- Write a Pandas program to get list from DataFrame column headers.
df2.columns.tolist()
['attempts', 'name', 'qualify', 'score']
- Write a Pandas program to rename columns of a given DataFrame.
df23= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df23
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 9 | 
df23.columns=['Column1','Column2','Column3']
df23
| Column1 | Column2 | Column3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 9 | 
df23=df23.rename(columns={'Column1':'Columns1'})
df23
| Columns1 | Column2 | Column3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 9 | 
- Write a Pandas program to select rows from a given DataFrame based on values in some columns.
df24= pd.DataFrame({'col1': [1, 4,3,4,5], 'col2': [4, 5, 6,7,8], 'col3': [7, 8, 9,0,1]})
df24[df24.col1==4]
| col1 | col2 | col3 | |
|---|---|---|---|
| 1 | 4 | 5 | 8 | 
| 3 | 4 | 7 | 0 | 
- Write a Pandas program to change the order of a DataFrame columns.
df25=df24[['col3','col2','col1']]
df25
| col3 | col2 | col1 | |
|---|---|---|---|
| 0 | 7 | 4 | 1 | 
| 1 | 8 | 5 | 4 | 
| 2 | 9 | 6 | 3 | 
| 3 | 0 | 7 | 4 | 
| 4 | 1 | 8 | 5 | 
- Write a Pandas program to add one row in an existing DataFrame.
df26=df24.copy()
df26.loc[len(df26)]=[10,11,12]
df26
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 4 | 5 | 8 | 
| 2 | 3 | 6 | 9 | 
| 3 | 4 | 7 | 0 | 
| 4 | 5 | 8 | 1 | 
| 5 | 10 | 11 | 12 | 
- Write a Pandas program to write a DataFrame to CSV file using tab separator.
df24.to_csv('27.csv',sep='\t',index=False)
df27=pd.read_csv('27.csv')
df27
| col1\tcol2\tcol3 | |
|---|---|
| 0 | 1\t4\t7 | 
| 1 | 4\t5\t8 | 
| 2 | 3\t6\t9 | 
| 3 | 4\t7\t0 | 
| 4 | 5\t8\t1 | 
- Write a Pandas program to count city wise number of people from a given of data set (city, name of the person).
df28 = pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'city': ['California', 'Los Angeles', 'California', 'California', 'California', 'Los Angeles', 'Los Angeles', 'Georgia', 'Georgia', 'Los Angeles']})
df28
| name | city | |
|---|---|---|
| 0 | Anastasia | California | 
| 1 | Dima | Los Angeles | 
| 2 | Katherine | California | 
| 3 | James | California | 
| 4 | Emily | California | 
| 5 | Michael | Los Angeles | 
| 6 | Matthew | Los Angeles | 
| 7 | Laura | Georgia | 
| 8 | Kevin | Georgia | 
| 9 | Jonas | Los Angeles | 
df28.groupby('city').size().reset_index(name='Number of people')
| city | Number of people | |
|---|---|---|
| 0 | California | 4 | 
| 1 | Georgia | 2 | 
| 2 | Los Angeles | 4 | 
- Write a Pandas program to delete DataFrame row(s) based on given column value.
df29=df24.copy()
df29=df29[df29.col2!=5]
df29
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 2 | 3 | 6 | 9 | 
| 3 | 4 | 7 | 0 | 
| 4 | 5 | 8 | 1 | 
- Write a Pandas program to widen output display to see more columns.
pd.set_option('display.max_rows',500)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)
print(df24)
   col1  col2  col3
0     1     4     7
1     4     5     8
2     3     6     9
3     4     7     0
4     5     8     1
- Write a Pandas program to select a row of series/dataframe by given integer index.
df24.loc[[2]]
| col1 | col2 | col3 | |
|---|---|---|---|
| 2 | 3 | 6 | 9 | 
- Write a Pandas program to replace all the NaN values with Zero’s in a column of a dataframe.
df32=df2.copy()
df32.fillna(0)
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | 0.0 | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | 0.0 | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to convert index in a column of the given dataframe.
df33=df2.copy()
df33.reset_index()
| index | attempts | name | qualify | score | |
|---|---|---|---|---|---|
| 0 | a | 1 | Anastasia | yes | 12.5 | 
| 1 | b | 3 | Dima | no | 9.0 | 
| 2 | c | 2 | Katherine | yes | 16.5 | 
| 3 | d | 3 | James | no | NaN | 
| 4 | e | 2 | Emily | no | 9.0 | 
| 5 | f | 3 | Michael | yes | 20.0 | 
| 6 | g | 1 | Matthew | yes | 14.5 | 
| 7 | h | 1 | Laura | no | NaN | 
| 8 | i | 2 | Kevin | no | 8.0 | 
| 9 | j | 1 | Jonas | yes | 19.0 | 
print(df33.reset_index().to_string(index=False))
index  attempts       name qualify  score
    a         1  Anastasia     yes   12.5
    b         3       Dima      no    9.0
    c         2  Katherine     yes   16.5
    d         3      James      no    NaN
    e         2      Emily      no    9.0
    f         3    Michael     yes   20.0
    g         1    Matthew     yes   14.5
    h         1      Laura      no    NaN
    i         2      Kevin      no    8.0
    j         1      Jonas     yes   19.0
#Back to original be setting index once again
df33.reset_index().set_index('index')
| attempts | name | qualify | score | |
|---|---|---|---|---|
| index | ||||
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to set a given value for particular cell in DataFrame using index value.
df2
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
df34=df2.copy()
df34.loc['i',:].score=10.2
C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py:5208: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value
df34
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
df34.set_value('i','score',10.2)
C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  """Entry point for launching an IPython kernel.
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 10.2 | 
| j | 1 | Jonas | yes | 19.0 | 
df34.at['i','score']=10.2
df34
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 10.2 | 
| j | 1 | Jonas | yes | 19.0 | 
df34.iat[8,3]
10.2
- Write a Pandas program to count the NaN values in one or more columns in DataFrame.
df2.isna().sum().sum()
2
- Write a Pandas program to drop a list of rows from a specified DataFrame.
df36=df24.copy()
df36.drop([2,4],axis=0)
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 4 | 5 | 8 | 
| 3 | 4 | 7 | 0 | 
- Write a Pandas program to reset index in a given DataFrame.
df37=df2.copy()
df37.drop(['a','b']).reset_index()
| index | attempts | name | qualify | score | |
|---|---|---|---|---|---|
| 0 | c | 2 | Katherine | yes | 16.5 | 
| 1 | d | 3 | James | no | NaN | 
| 2 | e | 2 | Emily | no | 9.0 | 
| 3 | f | 3 | Michael | yes | 20.0 | 
| 4 | g | 1 | Matthew | yes | 14.5 | 
| 5 | h | 1 | Laura | no | NaN | 
| 6 | i | 2 | Kevin | no | 8.0 | 
| 7 | j | 1 | Jonas | yes | 19.0 | 
df37
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to divide a DataFrame in a given ratio.
df38=pd.DataFrame(np.random.randn(10,2))
df38
| 0 | 1 | |
|---|---|---|
| 0 | 1.248443 | -1.241437 | 
| 1 | 1.350421 | -1.539320 | 
| 2 | -0.118075 | 1.586380 | 
| 3 | 0.126995 | 1.419173 | 
| 4 | -0.062786 | 0.070672 | 
| 5 | -0.077202 | 0.442252 | 
| 6 | -0.547482 | 0.472672 | 
| 7 | -0.154437 | -0.138206 | 
| 8 | 0.634787 | 1.480123 | 
| 9 | -0.029771 | 0.834099 | 
df38_70per=df38.sample(frac=.7,random_state=0)
df38_30per=df38.drop(df38_70per.index)
df38_70per
| 0 | 1 | |
|---|---|---|
| 2 | -0.118075 | 1.586380 | 
| 8 | 0.634787 | 1.480123 | 
| 4 | -0.062786 | 0.070672 | 
| 9 | -0.029771 | 0.834099 | 
| 1 | 1.350421 | -1.539320 | 
| 6 | -0.547482 | 0.472672 | 
| 7 | -0.154437 | -0.138206 | 
df38_30per
| 0 | 1 | |
|---|---|---|
| 0 | 1.248443 | -1.241437 | 
| 3 | 0.126995 | 1.419173 | 
| 5 | -0.077202 | 0.442252 | 
- Write a Pandas program to combining two series into a DataFrame.
s39_1 = pd.Series(['100', '200', 'python', '300.12', '400'])
s39_2 = pd.Series(['10', '20', 'php', '30.12', '40'])
pd.concat([s39_1,s39_2],axis=1)
| 0 | 1 | |
|---|---|---|
| 0 | 100 | 10 | 
| 1 | 200 | 20 | 
| 2 | python | php | 
| 3 | 300.12 | 30.12 | 
| 4 | 400 | 40 | 
- Write a Pandas program to shuffle a given DataFrame rows.
df40=df2.copy()
df40.sample(frac=1)
| attempts | name | qualify | score | |
|---|---|---|---|---|
| g | 1 | Matthew | yes | 14.5 | 
| f | 3 | Michael | yes | 20.0 | 
| d | 3 | James | no | NaN | 
| a | 1 | Anastasia | yes | 12.5 | 
| e | 2 | Emily | no | 9.0 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| j | 1 | Jonas | yes | 19.0 | 
| b | 3 | Dima | no | 9.0 | 
df40
| attempts | name | qualify | score | |
|---|---|---|---|---|
| a | 1 | Anastasia | yes | 12.5 | 
| b | 3 | Dima | no | 9.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| d | 3 | James | no | NaN | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| i | 2 | Kevin | no | 8.0 | 
| j | 1 | Jonas | yes | 19.0 | 
- Write a Pandas program to convert DataFrame column type from string to datetime.
s41 = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'])
s41
0    3/11/2000
1    3/12/2000
2    3/13/2000
dtype: object
pd.to_datetime(s41)
0   2000-03-11
1   2000-03-12
2   2000-03-13
dtype: datetime64[ns]
- Write a Pandas program to rename a specific column name in a given DataFrame.
df42= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df42.rename(columns={'col2':'column2'})
| col1 | column2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 9 | 
- Write a Pandas program to get a list of a specified column of a DataFrame.
df42.col2.tolist()
[4, 5, 6]
- Write a Pandas program to create a DataFrame from a Numpy array and specify the index column and column headers.
pd.DataFrame(np.random.randn(30).reshape(10,3),index=np.linspace(0,9,10,dtype='int'),columns=['column1','columns2','columns3'])
| column1 | columns2 | columns3 | |
|---|---|---|---|
| 0 | -0.345403 | 2.167887 | 0.495698 | 
| 1 | 1.146200 | -1.078212 | 0.231268 | 
| 2 | -0.619157 | 1.287958 | 0.216466 | 
| 3 | -0.859700 | -0.735332 | 0.437529 | 
| 4 | 0.395431 | 1.002545 | -0.999019 | 
| 5 | -0.474449 | -0.106919 | -2.089148 | 
| 6 | 0.375528 | 0.173526 | -1.103849 | 
| 7 | -1.468913 | 1.206882 | -0.983365 | 
| 8 | -0.751145 | -0.289252 | -0.346848 | 
| 9 | 0.923659 | 1.038505 | -1.394882 | 
- Write a Pandas program to find the row for where the value of a given column is maximum.
df45 = pd.DataFrame({'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})
df45
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 12 | 
| 3 | 4 | 9 | 1 | 
| 4 | 7 | 5 | 11 | 
#Solution 1 (not efficient)
df45[df45.col1==df45.col1.max()].index[0]
4
df45['col1'].idxmax()
4
df45['col2'].idxmax()
3
df45['col3'].idxmax()
2
- Write a Pandas program to check whether a given column is present in a DataFrame or not.
col='Col4'
if col in df45.columns.tolist():
    print(col,'present in DataFrame')
else:
    print(col,'not present in DataFrame')
Col4 not present in DataFrame
- Write a Pandas program to get the specified row value of a given DataFrame.
df45.iloc[0]
col1    1
col2    4
col3    7
Name: 0, dtype: int64
- Write a Pandas program to get the datatypes of columns of a DataFrame.
df2.dtypes
attempts      int64
name         object
qualify      object
score       float64
dtype: object
- Write a Pandas program to append data to an empty DataFrame.
df49=pd.DataFrame()
d49=pd.DataFrame({'col1':range(3),'col2':range(3)})
df49.append(d49)
| col1 | col2 | |
|---|---|---|
| 0 | 0 | 0 | 
| 1 | 1 | 1 | 
| 2 | 2 | 2 | 
- Write a Pandas program to sort a given DataFrame by two or more columns.
df50=df2.copy()
df50.sort_values(['attempts','name'],ascending=[True,False])
| attempts | name | qualify | score | |
|---|---|---|---|---|
| g | 1 | Matthew | yes | 14.5 | 
| h | 1 | Laura | no | NaN | 
| j | 1 | Jonas | yes | 19.0 | 
| a | 1 | Anastasia | yes | 12.5 | 
| i | 2 | Kevin | no | 8.0 | 
| c | 2 | Katherine | yes | 16.5 | 
| e | 2 | Emily | no | 9.0 | 
| f | 3 | Michael | yes | 20.0 | 
| d | 3 | James | no | NaN | 
| b | 3 | Dima | no | 9.0 | 
- Write a Pandas program to convert the datatype of a given column (floats to ints).
df51=df2.copy()
df51.dtypes
attempts      int64
name         object
qualify      object
score       float64
dtype: object
df51.fillna(0).astype({'score':int}).dtypes
attempts     int64
name        object
qualify     object
score        int32
dtype: object
- Write a Pandas program to remove infinite values from a given DataFrame.
df52 = pd.DataFrame([1000, 2000, 3000, -4000, np.inf, -np.inf])
df52
| 0 | |
|---|---|
| 0 | 1000.0 | 
| 1 | 2000.0 | 
| 2 | 3000.0 | 
| 3 | -4000.0 | 
| 4 | inf | 
| 5 | -inf | 
df52.replace([np.inf,-np.inf],np.NaN)
| 0 | |
|---|---|
| 0 | 1000.0 | 
| 1 | 2000.0 | 
| 2 | 3000.0 | 
| 3 | -4000.0 | 
| 4 | NaN | 
| 5 | NaN | 
- Write a Pandas program to insert a given column at a specific column index in a DataFrame.
df53 = pd.DataFrame({'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})
df53
| col2 | col3 | |
|---|---|---|
| 0 | 4 | 7 | 
| 1 | 5 | 8 | 
| 2 | 6 | 12 | 
| 3 | 9 | 1 | 
| 4 | 5 | 11 | 
col1= [1, 2, 3, 4, 7]
idx=0
df53.insert(loc=idx,column='col1',value=col1)
df53
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 12 | 
| 3 | 4 | 9 | 1 | 
| 4 | 7 | 5 | 11 | 
- Write a Pandas program to convert a given list of lists into a Dataframe.
d54=[[2, 4], [1, 3]]
pd.DataFrame(d54,columns=['col1','col2'])
| col1 | col2 | |
|---|---|---|
| 0 | 2 | 4 | 
| 1 | 1 | 3 | 
- Write a Pandas program to group by the first column and get second column as lists in rows.
df55=pd.DataFrame({'col1':['C1','C1','C2','C2','C2','C3','C2'],'col2':[1,2,3,3,4,6,5]})
g55=df55.groupby('col1')['col2']
#Wrong Solution (or not gettring desired result)
for ele in g55:
    print(ele[1])
0    1
1    2
Name: col2, dtype: int64
2    3
3    3
4    4
6    5
Name: col2, dtype: int64
5    6
Name: col2, dtype: int64
#Correct Solution
df55.groupby('col1')['col2'].apply(list)
col1
C1          [1, 2]
C2    [3, 3, 4, 5]
C3             [6]
Name: col2, dtype: object
- Write a Pandas program to get column index from column name of a given DataFrame.
df45.columns.tolist().index('col2')
1
df45.columns.get_loc('col2')
1
- Write a Pandas program to count number of columns of a DataFrame.
df45.shape[1]
3
len(df45.columns)
3
- Write a Pandas program to select all columns, except one given column in a DataFrame.
#Solution 1
df45.drop('col3',axis=1)
| col1 | col2 | |
|---|---|---|
| 0 | 1 | 4 | 
| 1 | 2 | 5 | 
| 2 | 3 | 6 | 
| 3 | 4 | 9 | 
| 4 | 7 | 5 | 
#Solution 2
df45.loc[:,df45.columns!='col3']
| col1 | col2 | |
|---|---|---|
| 0 | 1 | 4 | 
| 1 | 2 | 5 | 
| 2 | 3 | 6 | 
| 3 | 4 | 9 | 
| 4 | 7 | 5 | 
- Write a Pandas program to get first n records of a DataFrame.
df45.head(3)
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 8 | 
| 2 | 3 | 6 | 12 | 
- Write a Pandas program to get last n records of a DataFrame.
df45.iloc[-3:,:]
| col1 | col2 | col3 | |
|---|---|---|---|
| 2 | 3 | 6 | 12 | 
| 3 | 4 | 9 | 1 | 
| 4 | 7 | 5 | 11 | 
df45.tail(3)
| col1 | col2 | col3 | |
|---|---|---|---|
| 2 | 3 | 6 | 12 | 
| 3 | 4 | 9 | 1 | 
| 4 | 7 | 5 | 11 | 
- Write a Pandas program to get topmost n records within each group of a DataFrame.
df61 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})
df61
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 5 | 
| 2 | 3 | 6 | 8 | 
| 3 | 4 | 9 | 12 | 
| 4 | 7 | 5 | 1 | 
| 5 | 11 | 0 | 11 | 
df61.nlargest(3,'col1')
| col1 | col2 | col3 | |
|---|---|---|---|
| 5 | 11 | 0 | 11 | 
| 4 | 7 | 5 | 1 | 
| 3 | 4 | 9 | 12 | 
df61.nlargest(3,'col2')
| col1 | col2 | col3 | |
|---|---|---|---|
| 3 | 4 | 9 | 12 | 
| 2 | 3 | 6 | 8 | 
| 1 | 2 | 5 | 5 | 
df61.nlargest(3,'col3')
| col1 | col2 | col3 | |
|---|---|---|---|
| 3 | 4 | 9 | 12 | 
| 5 | 11 | 0 | 11 | 
| 2 | 3 | 6 | 8 | 
- Write a Pandas program to remove first n rows of a given DataFrame.
df62=df61.copy()
df62
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 5 | 
| 2 | 3 | 6 | 8 | 
| 3 | 4 | 9 | 12 | 
| 4 | 7 | 5 | 1 | 
| 5 | 11 | 0 | 11 | 
df62=df62.tail(-3)
df62
| col1 | col2 | col3 | |
|---|---|---|---|
| 3 | 4 | 9 | 12 | 
| 4 | 7 | 5 | 1 | 
| 5 | 11 | 0 | 11 | 
- Write a Pandas program to remove last n rows of a given DataFrame.
df63=df61.copy()
df63=df63.iloc[:-3,:]
df63
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 5 | 
| 2 | 3 | 6 | 8 | 
- Write a Pandas program to add a prefix or suffix to all columns of a given DataFrame.
df64 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df64_1=df64.copy()
df64_1.columns='A_'+df64.columns
df64_1
| A_W | A_X | A_Y | A_Z | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 2 | 86 | 96 | 89 | 96 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
df64_2=df64.copy()
df64_2.columns=df64.columns+'_1'
df64_2
| W_1 | X_1 | Y_1 | Z_1 | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 2 | 86 | 96 | 89 | 96 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
#sol2
df64.add_prefix('A_')
| A_W | A_X | A_Y | A_Z | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 2 | 86 | 96 | 89 | 96 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
df64.add_suffix('_1')
| W_1 | X_1 | Y_1 | Z_1 | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 2 | 86 | 96 | 89 | 96 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
- Write a Pandas program to reverse order (rows, columns) of a given DataFrame.
df64[df64.columns[::-1]]
#df64.iloc[:,::-1] same answer
| Z | Y | X | W | |
|---|---|---|---|---|
| 0 | 86 | 84 | 78 | 68 | 
| 1 | 97 | 94 | 85 | 75 | 
| 2 | 96 | 89 | 96 | 86 | 
| 3 | 72 | 83 | 80 | 80 | 
| 4 | 83 | 86 | 86 | 66 | 
df64.iloc[::-1]
| W | X | Y | Z | |
|---|---|---|---|---|
| 4 | 66 | 86 | 86 | 83 | 
| 3 | 80 | 80 | 83 | 72 | 
| 2 | 86 | 96 | 89 | 96 | 
| 1 | 75 | 85 | 94 | 97 | 
| 0 | 68 | 78 | 84 | 86 | 
print("Reverse row order and reset index:")
df64.iloc[::-1].reset_index(drop=True)
Reverse row order and reset index:
| W | X | Y | Z | |
|---|---|---|---|---|
| 0 | 66 | 86 | 86 | 83 | 
| 1 | 80 | 80 | 83 | 72 | 
| 2 | 86 | 96 | 89 | 96 | 
| 3 | 75 | 85 | 94 | 97 | 
| 4 | 68 | 78 | 84 | 86 | 
- Write a Pandas program to select columns by data type of a given DataFrame.
df66 = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'date_of_birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': [18.5, 21.2, 22.5, 22, 23]
})
df66
| name | date_of_birth | age | |
|---|---|---|---|
| 0 | Alberto Franco | 17/05/2002 | 18.5 | 
| 1 | Gino Mcneill | 16/02/1999 | 21.2 | 
| 2 | Ryan Parkes | 25/09/1998 | 22.5 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 4 | Syed Wharton | 15/09/1997 | 23.0 | 
df66.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
name             5 non-null object
date_of_birth    5 non-null object
age              5 non-null float64
dtypes: float64(1), object(2)
memory usage: 248.0+ bytes
df66.select_dtypes(include='number')
| age | |
|---|---|
| 0 | 18.5 | 
| 1 | 21.2 | 
| 2 | 22.5 | 
| 3 | 22.0 | 
| 4 | 23.0 | 
df66.select_dtypes(include='object')
| name | date_of_birth | |
|---|---|---|
| 0 | Alberto Franco | 17/05/2002 | 
| 1 | Gino Mcneill | 16/02/1999 | 
| 2 | Ryan Parkes | 25/09/1998 | 
| 3 | Eesha Hinton | 11/05/2002 | 
| 4 | Syed Wharton | 15/09/1997 | 
- Write a Pandas program to split a given DataFrame into two random subsets.
df67=df66.copy()
df67_1=df67.sample(frac=0.7,random_state=0)
df67_2=df67.drop(df67_1.index)
df67_1
| name | date_of_birth | age | |
|---|---|---|---|
| 2 | Ryan Parkes | 25/09/1998 | 22.5 | 
| 0 | Alberto Franco | 17/05/2002 | 18.5 | 
| 1 | Gino Mcneill | 16/02/1999 | 21.2 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
df67_2
| name | date_of_birth | age | |
|---|---|---|---|
| 4 | Syed Wharton | 15/09/1997 | 23.0 | 
- Write a Pandas program to rename all columns with the same pattern of a given DataFrame.
df68 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_of_Birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
df68
| Name | Date_of_Birth | Age | |
|---|---|---|---|
| 0 | Alberto Franco | 17/05/2002 | 18.5 | 
| 1 | Gino Mcneill | 16/02/1999 | 21.2 | 
| 2 | Ryan Parkes | 25/09/1998 | 22.5 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 4 | Syed Wharton | 15/09/1997 | 23.0 | 
df68.columns=df68.columns.str.lower().str.strip()
df68
| name | date_of_birth | age | |
|---|---|---|---|
| 0 | Alberto Franco | 17/05/2002 | 18.5 | 
| 1 | Gino Mcneill | 16/02/1999 | 21.2 | 
| 2 | Ryan Parkes | 25/09/1998 | 22.5 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 4 | Syed Wharton | 15/09/1997 | 23.0 | 
- Write a Pandas program to merge datasets and check uniqueness.
df69 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
df69_1=df69.copy()
df69_1=df69_1.drop([0,1])
df69_1
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 2 | Ryan Parkes | 25/09/1998 | 22.5 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 4 | Syed Wharton | 15/09/1997 | 23.0 | 
df69_2=df69.copy()
df69_2=df69_2.drop([2])
df69_2
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | Alberto Franco | 17/05/2002 | 18.5 | 
| 1 | Gino Mcneill | 16/02/1999 | 21.2 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 4 | Syed Wharton | 15/09/1997 | 23.0 | 
#one_to_one
pd.merge(df69_1,df69_2,validate='one_to_one')
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 1 | Syed Wharton | 15/09/1997 | 23.0 | 
#one_to_many
pd.merge(df69_1,df69_2,validate='one_to_many')
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 1 | Syed Wharton | 15/09/1997 | 23.0 | 
#many_to_one
pd.merge(df69_1,df69_2,validate='many_to_one')
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 1 | Syed Wharton | 15/09/1997 | 23.0 | 
#many_to_many
pd.merge(df69_1,df69_2,validate='many_to_many')
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 1 | Syed Wharton | 15/09/1997 | 23.0 | 
- Write a Pandas program to convert continuous values of a column in a given DataFrame to categorical.
df70 = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton', 'Kierra Gentry'],
      'age': [18, 22, 85, 50, 80, 5]
})
df70
| name | age | |
|---|---|---|
| 0 | Alberto Franco | 18 | 
| 1 | Gino Mcneill | 22 | 
| 2 | Ryan Parkes | 85 | 
| 3 | Eesha Hinton | 50 | 
| 4 | Syed Wharton | 80 | 
| 5 | Kierra Gentry | 5 | 
df70['age_group']=pd.cut(df70.age,bins=[0,18,65,95],labels=['kids','adult','elderly'])
df70
| name | age | age_group | |
|---|---|---|---|
| 0 | Alberto Franco | 18 | kids | 
| 1 | Gino Mcneill | 22 | adult | 
| 2 | Ryan Parkes | 85 | elderly | 
| 3 | Eesha Hinton | 50 | adult | 
| 4 | Syed Wharton | 80 | elderly | 
| 5 | Kierra Gentry | 5 | kids | 
- Write a Pandas program to display memory usage of a given DataFrame and every column of the DataFrame.
df70.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 334.0+ bytes
df70.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes
df70.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes
df70.memory_usage(deep=True)
Index        128
name         416
age           48
age_group    311
dtype: int64
df70.memory_usage(deep=False)
Index        128
name          48
age           48
age_group    110
dtype: int64
- Write a Pandas program to combine many given series to create a DataFrame.
s72_1=pd.Series('php python jawa c# c++'.split())
s72_2=pd.Series([1,2,3,4,5])
pd.concat([s72_1,s72_2],axis=1)
| 0 | 1 | |
|---|---|---|
| 0 | php | 1 | 
| 1 | python | 2 | 
| 2 | jawa | 3 | 
| 3 | c# | 4 | 
| 4 | c++ | 5 | 
pd.DataFrame({'col1':s72_1,'col2':s72_2})
| col1 | col2 | |
|---|---|---|
| 0 | php | 1 | 
| 1 | python | 2 | 
| 2 | jawa | 3 | 
| 3 | c# | 4 | 
| 4 | c++ | 5 | 
- Write a Pandas program to create DataFrames that contains random values, contains missing values, contains datetime values and contains mixed values.
pd.DataFrame(np.random.randn(40).reshape(10,4),columns='A B C D'.split())
| A | B | C | D | |
|---|---|---|---|---|
| 0 | 0.976226 | -1.231195 | -1.927390 | 1.687527 | 
| 1 | 0.265455 | -0.872881 | -1.234580 | 0.025491 | 
| 2 | 2.309553 | -0.368772 | 1.281644 | 0.903505 | 
| 3 | -0.490682 | -0.934150 | 0.395868 | -0.614944 | 
| 4 | 1.270408 | 0.105810 | -0.934098 | -0.193174 | 
| 5 | -0.412797 | 0.123316 | 0.656035 | -0.801055 | 
| 6 | -0.627806 | -0.182581 | 0.200891 | -0.471121 | 
| 7 | -0.705074 | 0.416434 | 0.424676 | 0.495411 | 
| 8 | -0.157094 | -1.339826 | 1.379757 | 1.843449 | 
| 9 | 1.265529 | -0.211037 | 0.542715 | 1.862077 | 
pd.util.testing.makeDataFrame()
| A | B | C | D | |
|---|---|---|---|---|
| mIrUvYIpau | -0.132329 | 0.369235 | -2.518745 | 0.463750 | 
| IIuIlUG3TZ | 0.224135 | 1.173692 | -1.402806 | -0.634084 | 
| lhz7LziqT6 | -1.747848 | 1.112314 | -1.173867 | -1.275242 | 
| bDRGRAVZ5L | -1.125598 | 1.167309 | 0.119561 | -0.315563 | 
| tltm1jawR2 | -0.170738 | -0.020117 | 0.807885 | -1.399174 | 
| gyHU2hW6dO | 0.228735 | -0.685193 | -0.241533 | -1.230713 | 
| GH6l1o4kp8 | -0.683374 | -1.373173 | 1.563031 | -1.344920 | 
| Ee15AMPrrZ | -0.229806 | -1.190730 | 0.691445 | -0.614836 | 
| CVtTzdhYOH | -1.806060 | 0.126193 | 1.921333 | 0.655736 | 
| u7yGpt7ojr | 0.418048 | -0.560950 | -2.247065 | 2.067856 | 
| JQRGGsJRHG | -1.116554 | 0.614687 | -0.367641 | 0.160849 | 
| 6NTUAVqlrk | 0.622822 | 0.119354 | -1.052081 | -0.347517 | 
| 8XKVbhaoCO | 1.380594 | -0.907565 | -1.298444 | 0.627115 | 
| HSlwYeeFDf | 0.566009 | -0.185294 | 0.437134 | -1.943550 | 
| 441fk4pWhD | 0.214447 | -1.183068 | 1.761380 | -0.652720 | 
| VdiWlWttWB | 0.743793 | 1.722755 | 0.757516 | 0.615187 | 
| ls77KaEvOq | -1.344870 | 0.364236 | 0.093453 | -0.609153 | 
| 5K0bPzfMFi | -1.065251 | 0.841662 | 1.479194 | -0.731203 | 
| AXsuHNuwHo | -0.827250 | 1.965530 | 0.773863 | 0.665516 | 
| T0GlToPXZi | -0.320390 | -0.728882 | 2.018289 | -0.578019 | 
| 22vFAApOA5 | -0.494500 | 0.550730 | -1.067188 | 0.044000 | 
| gPiu7GA3im | -0.521648 | -0.856955 | 0.709654 | -0.932031 | 
| OZnBwP9gm9 | -0.181220 | -1.985956 | 0.794047 | -0.335244 | 
| 9U23Nv30l1 | 0.126521 | -0.269881 | -0.475025 | 0.873421 | 
| NLLYWOAAPo | -1.399080 | -0.614755 | -0.431522 | -1.314192 | 
| 0snfJwr8Zi | 1.104287 | -0.747049 | -0.494870 | -0.280476 | 
| hyIRHRTG6c | 0.447759 | 0.375540 | -1.348494 | -1.576017 | 
| nO4Ybp7ia5 | -0.665945 | 1.525348 | 0.291121 | 0.763574 | 
| OU81k9dvGD | 0.897210 | 0.587101 | 1.375282 | -0.842839 | 
| ktS4vr3Pqy | -0.270603 | -0.081540 | -1.243459 | 0.965349 | 
pd.util.testing.makeMissingDataframe()
| A | B | C | D | |
|---|---|---|---|---|
| K1jtgA2gtF | -1.142478 | 1.478004 | 0.777470 | 0.517304 | 
| Fz0OYcxMQn | -0.096315 | 0.070175 | 0.176539 | -0.511371 | 
| 7gLTJCMX3w | -0.236882 | 0.453078 | -0.588427 | 1.412213 | 
| Id0qucUCeB | NaN | -1.088763 | -0.637960 | 1.000012 | 
| MlJfZSTHJj | -0.732373 | 1.112463 | 0.441055 | 0.148622 | 
| p74iwf5CWk | 0.880165 | NaN | 0.319050 | -0.100719 | 
| k4wEIqvtKd | -1.076164 | -1.052221 | NaN | 0.817084 | 
| 6gdeN8A4gg | 1.160957 | 0.109959 | 0.587556 | 1.699185 | 
| c0GHnKP4vD | -1.127555 | 0.933663 | 0.527032 | -0.450704 | 
| 0yltZq25JS | -0.177030 | 0.225647 | 0.413793 | 1.491172 | 
| BAGqtJef0S | 0.356970 | 0.621645 | -1.182119 | -0.008877 | 
| iOriKpo3am | 0.734046 | -0.004511 | 0.921544 | 0.719030 | 
| Ogd3TaGzC0 | -0.076480 | -1.460346 | 0.026264 | -0.775999 | 
| XZp591LpQ7 | -0.998655 | NaN | 1.518520 | 0.453103 | 
| nvd2jdiBX5 | 1.592946 | -1.306997 | -0.168547 | NaN | 
| ICJxEvHfx4 | -0.577977 | 0.387412 | -0.594324 | 0.294138 | 
| mLyqvY1KVk | 0.901325 | -1.560872 | 0.160145 | -1.170798 | 
| 23TmyM4Vjt | 0.291432 | -0.810652 | -0.184815 | 0.211070 | 
| J2JBz9nc2N | 1.818592 | NaN | NaN | -0.257770 | 
| 77utwjDcEg | NaN | 0.183574 | 0.111528 | 1.381484 | 
| yAg15dn3KU | -0.259229 | -0.201913 | -1.912119 | -0.271751 | 
| yRbk019rAz | -0.913102 | -0.120397 | -0.721780 | 0.725017 | 
| U5oGxD3WnW | -0.565617 | 0.579980 | NaN | 0.031062 | 
| sNGtSOERD7 | NaN | 0.074556 | 1.525357 | -1.744255 | 
| ubpFxokUim | 1.045186 | 1.165908 | -1.138931 | -0.082937 | 
| b6ZCgA1dZD | 1.620180 | 0.962883 | 0.966021 | 1.709217 | 
| 3lYiadTEpV | -0.147411 | 0.950925 | -0.122691 | -0.973400 | 
| qvn2AUqCgF | 1.207141 | 0.611017 | -0.733806 | 0.072065 | 
| dqO35ESFKx | 2.076349 | 0.519654 | NaN | 0.217811 | 
| am4HPhiP5I | -1.403172 | NaN | 0.797241 | -0.641350 | 
pd.util.testing.makeTimeDataFrame()
| A | B | C | D | |
|---|---|---|---|---|
| 2000-01-03 | 1.003781 | -0.356442 | -0.690759 | -2.543469 | 
| 2000-01-04 | 0.307821 | -0.425427 | -0.134576 | 0.824121 | 
| 2000-01-05 | -0.976996 | -1.113531 | -0.522290 | -0.567436 | 
| 2000-01-06 | 0.438595 | 0.129363 | -0.433556 | 1.080008 | 
| 2000-01-07 | -0.001516 | -0.291726 | -1.008141 | -0.605705 | 
| 2000-01-10 | 0.015357 | -1.016492 | -0.162590 | -2.282291 | 
| 2000-01-11 | 0.523147 | 0.587595 | 0.451137 | -0.031271 | 
| 2000-01-12 | 0.917431 | 1.109996 | -0.561045 | 0.305243 | 
| 2000-01-13 | 1.150708 | -0.069645 | -0.561491 | -0.165408 | 
| 2000-01-14 | 1.122621 | -1.035067 | 1.201935 | -0.690251 | 
| 2000-01-17 | 0.813662 | 0.924708 | 0.545710 | -0.002078 | 
| 2000-01-18 | 1.379715 | -0.229399 | -0.726494 | 0.493481 | 
| 2000-01-19 | 0.622288 | -0.757303 | 0.689828 | 0.229719 | 
| 2000-01-20 | -0.680374 | 1.011196 | -0.551390 | -1.713440 | 
| 2000-01-21 | -1.524046 | -1.010875 | -0.997332 | 0.666093 | 
| 2000-01-24 | -0.941409 | 1.025098 | -0.905963 | 1.082745 | 
| 2000-01-25 | 0.042881 | 1.382499 | 1.049391 | 1.074299 | 
| 2000-01-26 | 0.967413 | -0.923113 | 2.007157 | 0.628600 | 
| 2000-01-27 | -0.051459 | 1.172377 | 0.464885 | 1.129185 | 
| 2000-01-28 | 0.533879 | -1.320932 | 0.503104 | -0.695140 | 
| 2000-01-31 | -1.133007 | 0.437374 | 0.505606 | -1.571305 | 
| 2000-02-01 | 0.979238 | -0.721377 | 0.234655 | -0.061103 | 
| 2000-02-02 | 0.231642 | -0.627750 | -0.133623 | 0.748030 | 
| 2000-02-03 | -1.431866 | 1.060292 | -0.040169 | 0.229246 | 
| 2000-02-04 | -0.350627 | -1.165422 | 0.436717 | 0.446511 | 
| 2000-02-07 | -1.886733 | 0.926730 | 0.604955 | -1.034671 | 
| 2000-02-08 | 1.205402 | 1.982943 | -0.653149 | -0.675256 | 
| 2000-02-09 | 0.212245 | -0.374800 | -0.472396 | 1.577047 | 
| 2000-02-10 | -0.732806 | -1.332025 | 1.173203 | 0.602202 | 
| 2000-02-11 | -0.681866 | -0.170969 | -1.366112 | 0.779899 | 
pd.util.testing.makeMixedDataFrame()
| A | B | C | D | |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | foo1 | 2009-01-01 | 
| 1 | 1.0 | 1.0 | foo2 | 2009-01-02 | 
| 2 | 2.0 | 0.0 | foo3 | 2009-01-05 | 
| 3 | 3.0 | 1.0 | foo4 | 2009-01-06 | 
| 4 | 4.0 | 0.0 | foo5 | 2009-01-07 | 
- Write a Pandas program to fill missing values in time series data.
s74 = {"c1":[120, 130 ,140, 150, np.nan, 170], "c2":[7, np.nan, 10, np.nan, 5.5, 16.5]}
pd.util.testing.makeDateIndex()[:6]
DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-10'], dtype='datetime64[ns]', freq='B')
df74=pd.DataFrame(s74,index=pd.util.testing.makeDateIndex()[:6])
df74
| c1 | c2 | |
|---|---|---|
| 2000-01-03 | 120.0 | 7.0 | 
| 2000-01-04 | 130.0 | NaN | 
| 2000-01-05 | 140.0 | 10.0 | 
| 2000-01-06 | 150.0 | NaN | 
| 2000-01-07 | NaN | 5.5 | 
| 2000-01-10 | 170.0 | 16.5 | 
df74.interpolate()
| c1 | c2 | |
|---|---|---|
| 2000-01-03 | 120.0 | 7.00 | 
| 2000-01-04 | 130.0 | 8.50 | 
| 2000-01-05 | 140.0 | 10.00 | 
| 2000-01-06 | 150.0 | 7.75 | 
| 2000-01-07 | 160.0 | 5.50 | 
| 2000-01-10 | 170.0 | 16.50 | 
- Write a Pandas program to use a local variable within a query.
df75 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df75
| W | X | Y | Z | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 2 | 86 | 96 | 89 | 96 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
maxx = df75["W"].max()
df75.query("W < @maxx")
| W | X | Y | Z | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
#Solution 2 (without Query)
df75[df75.W<df75.W.max()]
| W | X | Y | Z | |
|---|---|---|---|---|
| 0 | 68 | 78 | 84 | 86 | 
| 1 | 75 | 85 | 94 | 97 | 
| 3 | 80 | 80 | 83 | 72 | 
| 4 | 66 | 86 | 86 | 83 | 
- Write a Pandas program to clean object column with mixed data of a given DataFrame using regular expression.
df76 = pd.DataFrame({"agent": ["a001", "a002", "a003", "a003", "a004"], "purchase":[4500.00, 7500.00, "$3000.25", "$1250.35", "9000.00"]})
df76
| agent | purchase | |
|---|---|---|
| 0 | a001 | 4500 | 
| 1 | a002 | 7500 | 
| 2 | a003 | $3000.25 | 
| 3 | a003 | $1250.35 | 
| 4 | a004 | 9000.00 | 
df76.purchase.apply(type)
0    <class 'float'>
1    <class 'float'>
2      <class 'str'>
3      <class 'str'>
4      <class 'str'>
Name: purchase, dtype: object
df76.purchase.replace("[$]","",regex=True).astype('float').apply(type)
0    <class 'float'>
1    <class 'float'>
2    <class 'float'>
3    <class 'float'>
4    <class 'float'>
Name: purchase, dtype: object
- Write a Pandas program to get the numeric representation of an array by identifying distinct values of a given column of a dataframe.
df77 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill'],
    'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})
df77
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | Alberto Franco | 17/05/2002 | 18.5 | 
| 1 | Gino Mcneill | 16/02/1999 | 21.2 | 
| 2 | Ryan Parkes | 25/09/1998 | 22.5 | 
| 3 | Eesha Hinton | 11/05/2002 | 22.0 | 
| 4 | Gino Mcneill | 15/09/1997 | 23.0 | 
a77_1=df77.Name.unique()
d77={}
for ind,name in df77.iterrows():
    print(ind,name[0])
    if name[0] not in d77.keys():
        d77[name[0]]=ind
0 Alberto Franco
1 Gino Mcneill
2 Ryan Parkes
3 Eesha Hinton
4 Gino Mcneill
d77
{'Alberto Franco': 0, 'Gino Mcneill': 1, 'Ryan Parkes': 2, 'Eesha Hinton': 3}
df77.replace({'Name':d77})
| Name | Date_Of_Birth | Age | |
|---|---|---|---|
| 0 | 0 | 17/05/2002 | 18.5 | 
| 1 | 1 | 16/02/1999 | 21.2 | 
| 2 | 2 | 25/09/1998 | 22.5 | 
| 3 | 3 | 11/05/2002 | 22.0 | 
| 4 | 1 | 15/09/1997 | 23.0 | 
#solution 1
df77['Name'].map(d77).tolist()
[0, 1, 2, 3, 1]
#Solution 2
df77.replace({'Name':d77}).Name.tolist()
[0, 1, 2, 3, 1]
#one step Solution
pd.factorize(df77.Name)
(array([0, 1, 2, 3, 1], dtype=int64),
 Index(['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton'], dtype='object'))
- Write a Pandas program to replace the current value in a dataframe column based on last largest value. If the current value is less than last largest value replaces the value with 0.
df78=pd.DataFrame({'rnum':[23, 21, 27, 22, 34, 33, 34, 31, 25, 22, 34, 19, 31, 32, 19]})
df78
| rnum | |
|---|---|
| 0 | 23 | 
| 1 | 21 | 
| 2 | 27 | 
| 3 | 22 | 
| 4 | 34 | 
| 5 | 33 | 
| 6 | 34 | 
| 7 | 31 | 
| 8 | 25 | 
| 9 | 22 | 
| 10 | 34 | 
| 11 | 19 | 
| 12 | 31 | 
| 13 | 32 | 
| 14 | 19 | 
a78_1=[-np.inf]
for ele in df78.rnum:
    if ele<np.amax(a78_1):
        a78_1.append(0)
    else:
        a78_1.append(ele)
    #print(ele)
print(a78_1[1:])
[23, 0, 27, 0, 34, 0, 34, 0, 0, 0, 34, 0, 0, 0, 0]
#Solution 2
df78.rnum.where(df78.rnum.eq(df78.rnum.cummax()),0)
0     23
1      0
2     27
3      0
4     34
5      0
6     34
7      0
8      0
9      0
10    34
11     0
12     0
13     0
14     0
Name: rnum, dtype: int64
- Write a Pandas program to create a DataFrame from the clipboard (data from an Excel spreadsheet or a Google Sheet).
df79=pd.read_clipboard()
df79
| Python | Pandas | DataFrame | 
|---|
- Write a Pandas program to check for inequality of two given DataFrames.
df80_1 = pd.DataFrame({'W':[68,75,86,80,None],'X':[78,85,None,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df80_2 = pd.DataFrame({'W':[78,75,86,80,None],'X':[78,85,96,80,76], 'Y':[84,84,89,83,86],'Z':[86,97,96,72,83]});
df80_1!=df80_2
#~(df80_1==df80_2)
#df80_1.ne(df80_2)
| W | X | Y | Z | |
|---|---|---|---|---|
| 0 | True | False | False | False | 
| 1 | False | False | True | False | 
| 2 | False | True | False | False | 
| 3 | False | False | False | False | 
| 4 | True | True | False | False | 
- Write a Pandas program to get lowest n records within each group of a given DataFrame.
df81 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})
df81.nsmallest(3,'col1')
| col1 | col2 | col3 | |
|---|---|---|---|
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 5 | 
| 2 | 3 | 6 | 8 | 
df81.nsmallest(3,'col2')
| col1 | col2 | col3 | |
|---|---|---|---|
| 5 | 11 | 0 | 11 | 
| 0 | 1 | 4 | 7 | 
| 1 | 2 | 5 | 5 | 
df81.nsmallest(3,'col3')
| col1 | col2 | col3 | |
|---|---|---|---|
| 4 | 7 | 5 | 1 | 
| 1 | 2 | 5 | 5 | 
| 0 | 1 | 4 | 7 | 
#df81.nsmallest(3,columns=['col2','col1']) # It is taking first passed column into account even if we pass multiple columns
 
  
  
