This is an example of how to extract customer information, such as the credit card number from an Ecommerce using Python.
We start loading the data and showing the first 10 observations. We also can see the number of columns (14) and rows (10000) of the dataset.
import pandas as pd
ecom = pd.read_csv('C:/07 - R Website/dataset/Graph/Ecommerce Purchases')
ecom.head(10)
## Address ... Purchase Price
## 0 16629 Pace Camp Apt. 448\nAlexisborough, NE 77... ... 98.14
## 1 9374 Jasmine Spurs Suite 508\nSouth John, TN 8... ... 70.73
## 2 Unit 0065 Box 5052\nDPO AP 27450 ... 0.95
## 3 7780 Julia Fords\nNew Stacy, WA 45798 ... 78.04
## 4 23012 Munoz Drive Suite 337\nNew Cynthia, TX 5... ... 77.82
## 5 7502 Powell Mission Apt. 768\nTravisland, VA 3... ... 25.15
## 6 93971 Conway Causeway\nAndersonburgh, AZ 75107 ... 88.56
## 7 260 Rachel Plains Suite 366\nCastroberg, WV 24... ... 44.25
## 8 2129 Dylan Burg\nNew Michelle, ME 28650 ... 59.54
## 9 3795 Dawson Extensions\nLake Tinafort, ID 88739 ... 95.63
##
## [10 rows x 14 columns]
ecom.columns
## Index(['Address', 'Lot', 'AM or PM', 'Browser Info', 'Company', 'Credit Card',
## 'CC Exp Date', 'CC Security Code', 'CC Provider', 'Email', 'Job',
## 'IP Address', 'Language', 'Purchase Price'],
## dtype='object')
len(ecom.columns) # give the number of columns
## 14
len(ecom.index) # give the number of rows
## 10000
Now we can calculate the average, highest and lowest purchase price from the Purchase Price variable iside the Ecommerce dataset. Moreover, we can find from the Language column how many customers came speak english.
ecom['Purchase Price'].mean()
## 50.34730200000025
ecom['Purchase Price'].max()
## 99.99
ecom['Purchase Price'].min()
## 0.0
ecom[ecom['Language']=='en']['Language'].count()
## 1098
Here, we calculate how manu people made a purchase both at AM and PM from the variable called AM or PM. We can also calculate the 10 most common job titles of the Ecommerce customers from the Job column.
ecom['AM or PM'].value_counts()
## PM 5068
## AM 4932
## Name: AM or PM, dtype: int64
ecom['Job'].value_counts().head(10) # top job of customers
## Interior and spatial designer 31
## Lawyer 30
## Social researcher 28
## Purchasing manager 27
## Research officer, political party 27
## Designer, jewellery 27
## Dietitian 26
## Designer, fashion/clothing 26
## Social worker 26
## Charity fundraiser 26
## Name: Job, dtype: int64
We can also find the email of the person with a specific credit card number. Moreover we can find how many peole have an American Express as their credit card and made a purchase above $80. Moreover, how many people have a credit card that expired on 2020.
ecom[ecom['Credit Card']==4926535242672853]['Email']
## 1234 bondellen@williams-garza.com
## Name: Email, dtype: object
ecom[(ecom['CC Provider']=='American Express') & (ecom['Purchase Price']>80)].count()
## Address 174
## Lot 174
## AM or PM 174
## Browser Info 174
## Company 174
## Credit Card 174
## CC Exp Date 174
## CC Security Code 174
## CC Provider 174
## Email 174
## Job 174
## IP Address 174
## Language 174
## Purchase Price 174
## dtype: int64
sum(ecom['CC Exp Date'].apply(lambda exp: exp[3:]== '20'))
## 988
Now we are searching from the most popular, top 5, email provider used to log-in in the Ecommerce.
# email = ecom['Email'].iloc[0]
# email.split('@')[1]
ecom['Email'].apply(lambda emailnum: emailnum.split('@')[1]).value_counts().head(5)
## hotmail.com 1638
## yahoo.com 1616
## gmail.com 1605
## smith.com 42
## williams.com 37
## Name: Email, dtype: int64