Lesson 4 covers counting values and basic plotting in Python. The Mode tutorial for lesson 4 can be found here. The data set used in this tutorial is the same as in tutorial 3.
This tutorial introduces web analytics and what the practice can be used for. Web analytics uses data such as page views or number of visitors to understand and improve web usage. Mode breaks web analytics down into three primary goals: (1) finding and monitoring trends, (2) finding and monitoring outliers, and (3) tracking how different metrics change over time. Web analytics can be a highly effective tool when applied to businesses or market research.
Method .value_counts(): used to count the number of times a value occurs in a categorical column.
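As a minimal sketch of what .value_counts() returns, the example below counts page titles in a small made-up DataFrame. The `views` variable and its rows are hypothetical stand-ins for the Watsi data that Mode loads for you.

```python
import pandas as pd

# Hypothetical page-view records (not the real Watsi data)
views = pd.DataFrame({
    "title": ["Home", "Donate", "Home", "About", "Home", "Donate"],
})

# Count how often each title appears; the result is a Series
# sorted from most frequent to least frequent
title_counts = views["title"].value_counts()
print(title_counts)
```

The index of the result holds the distinct values and the data holds their counts, which is why the output can be sliced or plotted directly.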
This tutorial begins to use more complicated lines of code, which is why Mode recommends breaking complex statements down and examining their individual steps. To do this, add the methods and functions one at a time, either by adding a new cell for each step or by adding to the same cell and re-running the statement after each addition.
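One way to picture this step-by-step approach is to assign each stage of a chained statement to its own variable and inspect it. This is only a sketch on made-up data; the names `views`, `step1`, `step2`, and `step3` are hypothetical.

```python
import pandas as pd

# Hypothetical page-view records (not the real Watsi data)
views = pd.DataFrame({
    "title": ["Home", "Donate", "Home", "About", "Home", "Donate"],
})

# Step 1: select a single column, which gives a Series
step1 = views["title"]
print(type(step1))

# Step 2: count occurrences of each value in that Series
step2 = step1.value_counts()
print(step2)

# Step 3: keep only the two most frequent values
step3 = step2[:2]
print(step3)
```

In a notebook, each step could live in its own cell so that the intermediate output is visible before the next method is added.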
Method .plot(): From pandas, this method creates charts from DataFrame and Series objects (see tutorial 3 for more information on these). To choose your plot type, pass the keyword kind= followed by the type you want (e.g., kind='bar'). Other keywords, such as title=, change other plot features.
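The sketch below shows the kind= and title= keywords on a small made-up Series. It assumes matplotlib is installed, since pandas uses it for plotting; the data and the `counts` name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical counts (not the real Watsi data)
counts = pd.Series(["Home", "Donate", "Home", "About", "Home"]).value_counts()

# kind= picks the chart type; title= sets the chart title
ax = counts.plot(kind="barh", title="Most visited pages (sample data)")

# .plot() returns a matplotlib Axes, so labels can be adjusted afterwards
ax.set_xlabel("Page views")
```

Swapping kind="barh" for kind="bar" draws vertical bars from the same data.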
Missing data is a common occurrence in any data set. When a column has a high frequency of missing values, as the Watsi referrers column does, it is worth noting and investigating.
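A quick way to see how much of a column is missing is to count the null values before filling them in. The sketch below uses a made-up `referrer_domain` column; the values are hypothetical and do not reflect the real proportions in the Watsi data.

```python
import pandas as pd

# Hypothetical referrer data with some missing entries
views = pd.DataFrame({
    "referrer_domain": ["google.com", None, None, "facebook.com", None],
})

# Number of missing values in the column
missing = views["referrer_domain"].isna().sum()

# Share of rows that are missing (the mean of True/False values)
share = views["referrer_domain"].isna().mean()

# dropna=False keeps missing values as their own row in the counts
counts = views["referrer_domain"].value_counts(dropna=False)

print(missing, share)
print(counts)
```

By default .value_counts() leaves missing values out, so checking them separately (or passing dropna=False) avoids overlooking them.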
Below is my code for lesson 4.
# Python Notebook - Python Tutorial: Lesson 4
datasets[0].head(n=5) # Code to access data produced by SQL query (not written by me); in Mode, datasets is a list, so [0] selects the first query's results
# Prepping a DataFrame
import pandas as pd
data = datasets[0] # This creates a variable for the SQL query results
data = data.fillna('') # Replaces missing values with empty strings
# Counting with categorical columns
data['title'].value_counts()[:20] # Selects the 20 most frequent titles
# Visualizing data
data['title'].value_counts()[:20].plot(kind='barh') # Plots a horizontal bar chart of the most visited pages
# Traffic volume here forms a logarithmic, or long-tail, distribution
# Practice problem: What were the 15 most popular website sections? Bonus Points for creating a plot!
# Practice problem: What websites most commonly referred users to Watsi's pages? Create a frequency distribution bar chart.
data['referrer_domain'].value_counts()[:20] # 20 most common referrers
data['referrer_domain'].value_counts()[:20].plot(kind='bar') # Frequency distribution bar chart
data['referrer_domain'].value_counts()[:20].plot(kind='bar', title='Top Referrers to Watsi') # Title added
# Practice problem: When people visited Watsi, what devices were they using? Plot the relative counts of pageviews from each platform.
data['platform'].value_counts()[:20].plot(kind='barh', title='Top platforms used to visit Watsi')
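The last practice problem asks for relative counts. One possible reading, sketched here on made-up data, is to pass normalize=True to .value_counts() so that each platform is reported as a share of all pageviews rather than a raw count. This is an alternative to the raw-count plot, not necessarily the answer Mode intends; the `views` data is hypothetical.

```python
import pandas as pd

# Hypothetical platform data (not the real Watsi data)
views = pd.DataFrame({
    "platform": ["Desktop", "iPhone", "Desktop", "Android", "Desktop"],
})

# normalize=True returns proportions that sum to 1 instead of raw counts
platform_share = views["platform"].value_counts(normalize=True)
print(platform_share)
```

The resulting Series can be plotted with .plot(kind='barh') in the same way as the raw counts.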