Mode Analytics: Python Tutorial (5)

Lesson 5 covers filtering data with boolean indexes. The Mode tutorial can be found here. Lesson 5 uses the Watsi data set that was used in lessons 3 and 4. It expands on the practice of web analytics and the techniques that can be used to understand web usage. Below is a brief summary of lesson 5.

Segmentation: breaking data down into subsets (e.g., breaking down page view data by referrer).

Boolean indexes can be used to filter data. A boolean index is a Series of True/False values, one per row. Putting it in square brackets after the DataFrame name creates a new DataFrame containing only the rows where the index is True. See below:

filtered_data = data[data['column_name'] == 'value']

Once the filtered DataFrame is created, it can be used to further examine questions about that specific subset of data.
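Here is a minimal sketch of that pattern on a toy DataFrame. The column values are made up for illustration and are not taken from the Watsi data.

```python
import pandas as pd

# Hypothetical toy data standing in for the Watsi page-view table
views = pd.DataFrame({
    "title": ["Home", "Donate", "Home", "About"],
    "referrer_domain": ["google.com", "reddit.com", "reddit.com", "google.com"],
})

# Comparing a column to a value yields a boolean Series (the index)
home_mask = views["title"] == "Home"

# Indexing with the boolean Series keeps only the rows where it is True
home_views = views[home_mask]

# The filtered DataFrame can then be examined like any other
referrer_counts = home_views["referrer_domain"].value_counts()
print(referrer_counts)
```

Giving the boolean index its own name is optional, but it makes each step easier to inspect in a notebook.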

Method .str.contains(): This method returns a boolean index indicating whether each value in a column contains a specific string. It is case sensitive unless the case parameter is changed, e.g. .str.contains('text', case=False).
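A small sketch of the difference the case parameter makes. The referrer strings here are invented for illustration.

```python
import pandas as pd

# Hypothetical referrer strings, made up for illustration
referrers = pd.Series([
    "https://example.com/Medical-news",
    "https://example.com/crowdfunding",
    "",
])

# Default behaviour is case sensitive, so "Medical" does not match "medical"
case_sensitive = referrers.str.contains("medical")

# case=False ignores capitalisation
case_insensitive = referrers.str.contains("medical", case=False)

print(case_sensitive.tolist())
print(case_insensitive.tolist())
```

Note that .str.contains() treats the pattern as a regular expression by default. For a plain word like this it makes no difference, but regex=False can be passed when the search text contains special characters.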

Method .tolist(): This method converts a pandas Series to a Python list.
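A quick sketch of the conversion, using a made-up Series. This is handy when the notebook display of a Series cuts off long strings and the full values are needed.

```python
import pandas as pd

# Hypothetical user agent values, made up for illustration
agents = pd.Series(["Mozilla/5.0", "IEMobile/10.0"], name="user_agent")

# Convert the Series to a plain Python list
agent_list = agents.tolist()

print(type(agent_list).__name__)
print(agent_list)
```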

Query string: A query string is the part of a URL that follows the “?”, and it is often where search terms are stored. This can be used to understand what search terms people used to find specific webpages. For example, a URL whose query string contains “crowd funding for medical treatment” shows that this phrase was searched.
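As a rough sketch, the search terms can be pulled out of a URL with Python's standard urllib.parse module. The URL and the parameter name q are assumptions made up for this example. Real referrers may store the search under a different parameter.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical referrer URL, made up for illustration
url = "https://www.example.com/search?q=crowd+funding+for+medical+treatment&hl=en"

# Everything after the "?" is the query string
query = urlparse(url).query

# parse_qs splits the query string into a dict of parameter -> list of values
params = parse_qs(query)

# "q" is assumed to hold the search terms; "+" is decoded to a space
search_terms = params["q"][0]
print(search_terms)
```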

My code for lesson 5 is below.

# Python Notebook - Python Tutorial: Lesson 5

datasets[0].head(n=5) # Code to access data produced by SQL query (not written by me)

# Prepping a DataFrame
import pandas as pd

data = datasets[0] # This creates a variable for the SQL query results
data = data.fillna('') # Replaces missing values with empty strings


# Filtering data with boolean indexing
data['title']== 'Watsi | Fund medical treatments for people around the world' # Boolean index for views on homepage
homepage_index = (data['title'] == 'Watsi | Fund medical treatments for people around the world') # Assign index a variable name
watsi_homepage = data[homepage_index] # Selects only 'true' rows from homepage index
watsi_homepage['referrer'].value_counts()[:15] # Returns top 15 referrers to watsi homepage
# problem with above is that it's messy (many links from google)
watsi_homepage['referrer_domain'].value_counts()[:15] # combines referrers from same domain

# Practice problem: Select all the pageviews originating from the Reddit domain, and see where traffic is landing within Watsi.
reddit_index = data['referrer_domain'] == 'reddit.com' # Assumes Reddit traffic is recorded as 'reddit.com'
watsi_reddit = data[reddit_index] # Selects only 'true' rows from reddit index
watsi_reddit['title'].value_counts()  # Returns counts of the pages where Reddit traffic lands
# the index and selection steps above can be combined like this instead:
watsi_reddit = data[data['referrer_domain'] == 'reddit.com']

# Partial matching text with .str.contains()
medical_referrer_index = data['referrer'].str.contains('medical')
medical_referrals = data[medical_referrer_index]
medical_referrals['referrer'].tolist() # Returns a list instead of a pandas series

# Practice problem: Find the records with a referrer link containing "crowdfund"
crowdfund_index = data['referrer'].str.contains('crowdfund') # creates a boolean index that is true where referrer contains "crowdfund"
data[crowdfund_index]['referrer'].tolist() # Selects index and 'referrer' column and returns a list

# Practice problem: Find the users who visited the site on a Windows phone using `user_agent`. Output the full string values.
windows_index = data['user_agent'].str.contains('IEMobile') # creates an index if user_agent contains "IEMobile"
data[windows_index]['user_agent'].tolist() # Selects index and 'user_agent' column and returns a list


