Advanced Analysis of Categorical Data in Python

My client’s bank automatically categorizes charges on their credit card but does so using many (dozens!) of rather specific and often inaccurate categories. This isn’t super helpful for the way my client would like to track spending and the bank’s web interface offers limited functionality to manually change how these categories work. Additionally, the bank’s analytics dashboard lacks sufficient detail to be a source of insight. In this project, I want to re-categorize each record in a way that works for my client. I also want to build some visualizations that enable tracking spending in meaningful and actionable ways.

Also, I’m using dummy data for demonstration purposes, of course.

Imports

I’ll start by importing everything I’ll need for the whole project.

import pandas as pd
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.patches as mpatches
import numpy as np

Data Acquisition and Loading

From the bank’s website, my client was able to download about 14 months of credit card data in several CSV files, and I have placed those files into a directory in my development environment. I will use glob to collect the paths of all of those files into one list.

files = glob.glob('F:/dataProjects/2023/data/charges/*.csv')

Now I will make a list of Pandas DataFrames (df). There will be one DataFrame for each CSV file in the data directory.

dfs = []
for file in files:
    data = pd.read_csv(file)
    dfs.append(data)

Next I will concatenate all of the DataFrames in the list, making one big DataFrame. This method works because all of the files have the same column structure.

new_df = pd.concat(dfs)
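
The loop and the concatenation can also be collapsed into a couple of lines. The sketch below is only an illustration: it writes two tiny CSV files into a temporary directory (the file names and values are invented) so that it runs anywhere, and it passes ignore_index=True, an optional extra that gives the combined DataFrame a fresh 0..n-1 index.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway directory with two small CSV files (invented demo data).
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({'Date': ['2023-01-05'], 'Amount': [-12.50]}).to_csv(
    os.path.join(tmp_dir, 'part1.csv'), index=False)
pd.DataFrame({'Date': ['2023-02-07'], 'Amount': [-3.25]}).to_csv(
    os.path.join(tmp_dir, 'part2.csv'), index=False)

# glob returns a list of matching paths; sorted() makes the order predictable.
demo_files = sorted(glob.glob(os.path.join(tmp_dir, '*.csv')))

# Read every file and stack the rows into a single DataFrame.
demo_df = pd.concat((pd.read_csv(f) for f in demo_files), ignore_index=True)
print(demo_df)
```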

Data Wrangling and Exploration

Now that we have the data loaded into a pandas DataFrame, let’s inspect the first few rows to see what the data look like.

print(new_df.head(10))

Looking at these first 10 rows, there are several things I want to do. First, I want to sort the DataFrame by the “Date” field. It looks like it’s in order but just to avoid any funny business, I will make it explicit. Once the data are sorted, I will re-index so the indexes follow our new order, and then I want to get rid of the original index column; we don’t need that anymore.

# Sort the df by Date
new_df = new_df.sort_values('Date')

# Re-index so the new df is properly indexed. drop=True discards the
# original indexes in the same step; we don't need those anymore.
new_df = new_df.reset_index(drop=True)

I ultimately want to use the “Category” field for the new categories I am going to create, so let’s go ahead and rename the existing field now so we can retain the original categories for comparison.

new_df = new_df.rename(columns={'Category': 'Old_Category'})

I also want to only include rows that have a value of “Posted” in the “Status” field. Other values in this column include “Pending” (which will often be different dollar amounts when the charges post, such as restaurant charges for the bill amount without the tip that are later updated with the grand total) and “Declined” (which have “0” as the dollar amount). I only want to deal with these final charge amounts.

new_df = new_df.loc[new_df['Status'] == 'Posted']

I also don’t want my client to have to think about which dates they already have data for in the local directory. We just want to grab some data from the bank’s interface and produce some useful metrics, so we’ll handle any duplicate data in the script.

new_df.drop_duplicates(inplace=True)

Looking at the first 10 rows of the DataFrame again, I’m curious about the “Amount” column. The amounts shown are negative. Further investigation revealed amounts of zero for declined transactions (which we have already removed), positive amounts that zero out cancelled transactions, and positive amounts for transactions such as inbound transfers and cash-back rewards. As we only want to track charges, let’s keep only the transactions with an amount less than zero.

new_df = new_df.loc[new_df['Amount'] < 0]

Now the entire “Amount” field has negative numbers. Let’s take the absolute value so we’re dealing with positive numbers.

new_df['Amount'] = new_df.Amount.abs()

I also see that there’s a very rare value of “Category Pending” in the “Old_Category” field. This happened only twice in the whole dataset, on a couple of small charges which would not have been categorized the same way if they hadn’t been stuck in Pending Purgatory. To avoid the need to create special rules for cases like this, let’s not include these in our analysis. My client and I agree this will have a negligible effect on our results.

new_df = new_df.loc[new_df['Old_Category'] != 'Category Pending']
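
For reference, the filtering steps above can be gathered into a single helper. This is a minimal sketch on invented rows, and clean_charges is a hypothetical name, not part of the project code. It combines the three row filters into one boolean mask and takes an explicit copy before changing the “Amount” column.

```python
import pandas as pd

# Invented rows that mimic the columns used in this project.
demo = pd.DataFrame({
    'Date': ['2023-01-02', '2023-01-02', '2023-01-03',
             '2023-01-04', '2023-01-05', '2023-01-06'],
    'Status': ['Posted', 'Posted', 'Pending', 'Declined', 'Posted', 'Posted'],
    'Amount': [-20.00, -20.00, -15.00, 0.00, 5.00, -7.50],
    'Old_Category': ['Groceries', 'Groceries', 'Restaurants',
                     'Gas', 'Transfer', 'Category Pending'],
})


def clean_charges(df):
    """Hypothetical helper: apply the same filters as above in one pass."""
    keep = (
        (df['Status'] == 'Posted')                    # final amounts only
        & (df['Amount'] < 0)                          # charges are negative
        & (df['Old_Category'] != 'Category Pending')  # rare pending rows
    )
    # .copy() gives an independent DataFrame, so the original is not modified.
    out = df.loc[keep].drop_duplicates().copy()
    out['Amount'] = out['Amount'].abs()               # report charges as positive
    return out


cleaned = clean_charges(demo)
print(cleaned)
```

In this toy data only the duplicated grocery charge survives, once, with a positive amount.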

Since we’re going to be working with monthly data, let’s go ahead and make a new column that will store each month as a category. This will make things much easier later on.

date_list = []
for i in new_df.Date:
    date_list.append(i[5:7] + '/' + i[2:4])
new_df['Year_Month'] = date_list
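
The loop above slices each date string by position, which assumes the “Date” values look like “YYYY-MM-DD”. As a sketch under that same assumption, the column can also be built without an explicit loop, either with vectorized string slicing or by parsing the dates first. The rows below are invented for the demonstration.

```python
import pandas as pd

# Invented dates in the 'YYYY-MM-DD' layout the slicing above relies on.
demo = pd.DataFrame({'Date': ['2022-11-30', '2023-01-15', '2023-01-16']})

# Vectorized string slicing: the same characters the loop picks out.
demo['Year_Month'] = demo['Date'].str[5:7] + '/' + demo['Date'].str[2:4]

# Alternative: parse the dates and format them, which fails loudly on bad input.
demo['Year_Month_parsed'] = pd.to_datetime(demo['Date']).dt.strftime('%m/%y')

print(demo)
```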

Building the New Categories

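Now let’s do the heavy lifting of creating the new categories. The rules below match on one of three things: the bank’s original category (“Old_Category”), the “Description” field, or a substring of the “Original Description” field. Each rule writes to the new “Category” column, and because the rules run top to bottom, a later rule overrides an earlier one for any row that matches both. Together they reduce the number of charge categories and improve categorization accuracy.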

new_df.loc[new_df['Old_Category']== 'Restaurants', 'Category'] = 'Restaurants'
new_df.loc[new_df['Old_Category']== 'Food & Dining', 'Category'] = 'Restaurants'
new_df.loc[new_df['Old_Category']== 'Fast Food', 'Category'] = 'Restaurants'
new_df.loc[new_df['Old_Category']== 'Alcohol & Bars', 'Category'] = 'Restaurants'
new_df.loc[new_df['Old_Category']== 'Food Dining', 'Category'] = 'Restaurants'
new_df.loc[new_df['Description']== 'Denny\'s', 'Category'] = 'Restaurants'
new_df.loc[new_df['Old_Category']== 'Utilities', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Gas', 'Category'] = 'Car'
new_df.loc[new_df['Description']== 'Internet', 'Category'] = 'Home'
new_df.loc[new_df['Description']== 'Walmart', 'Category'] = 'Walmart'
new_df.loc[new_df['Description']== 'Amazon', 'Category'] = 'Amazon'
new_df.loc[new_df['Description']== 'Ron Gas Co.', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Groceries', 'Category'] = 'Groceries'
new_df.loc[new_df['Description']== 'Cream of Ice', 'Category'] = 'Restaurants'
new_df.loc[new_df['Old_Category']== 'Auto & Transport', 'Category'] = 'Car'
new_df.loc[new_df['Old_Category']== 'Babysitter & Daycare', 'Category'] = 'Daycare'
new_df.loc[new_df['Old_Category']== 'Books', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Clothing', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Education', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Electronics & Software', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Hair', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Health & Fitness', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Eyecare', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Home', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Home Improvement', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Home Services', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Legal', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Dollar Tree', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Original Description'].str.contains('ADD ON'), 'Category'] = 'Entertainment'
new_df.loc[new_df['Original Description'].str.contains('XSP 123'), 'Category'] = 'Entertainment'
new_df.loc[new_df['Original Description'].str.contains('Amazon'), 'Category'] = 'Amazon'
new_df.loc[new_df['Original Description'].str.contains('amzn.com'), 'Category'] = 'Amazon'
new_df.loc[new_df['Original Description'].str.contains('AMAZON.COM'), 'Category'] = 'Amazon'
new_df.loc[new_df['Original Description'].str.contains('AMZN'), 'Category'] = 'Amazon'
new_df.loc[new_df['Description']== 'Dinosaur Run', 'Category'] = 'Entertainment'
new_df.loc[new_df['Old_Category']== 'Television', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Entertainment', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Spa & Massage', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Pharmacy', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Doctor', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Old_Category']== 'Hobbies', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Parking', 'Category'] = 'Car'
new_df.loc[new_df['Old_Category']== 'Hotel', 'Category'] = 'Travel'
new_df.loc[new_df['Old_Category']== 'Travel', 'Category'] = 'Travel'
new_df.loc[new_df['Old_Category']== 'Air Travel', 'Category'] = 'Travel'
new_df.loc[new_df['Old_Category']== 'Pets', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Pet Food & Supplies', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Shipping', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Mobile Phone', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Taxes', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Gym', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Description']== 'Electrify', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Lunch Lady Hair Nets', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Montango Express', 'Category'] = 'Entertainment'
new_df.loc[new_df['Description']== 'Belts', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'The Store', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Starbucks', 'Category'] = 'Restaurants'
new_df.loc[new_df['Description']== 'Google', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Original Description'].str.contains('BLUE CRAB'), 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Hangar Shop', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Spirit Halloween', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Lulu Studio', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Dillard\'s', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Onsite News', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Rental Car & Taxi', 'Category'] = 'Travel'
new_df.loc[new_df['Description']== 'Season Basics', 'Category'] = 'Entertainment'
new_df.loc[new_df['Description']== 'California Shop', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description'].str.contains('Progressive'), 'Category'] = 'Car'
new_df.loc[new_df['Old_Category']== 'Mortgage & Rent', 'Category'] = 'Home'
new_df.loc[new_df['Original Description'].str.contains('Build a bear'), 'Category'] = 'Entertainment'
new_df.loc[new_df['Original Description'].str.contains('CROSS AND DIME'), 'Category'] = 'Entertainment'
new_df.loc[new_df['Original Description'].str.contains('THE BATTING CAGE'), 'Category'] = 'Entertainment'
new_df.loc[new_df['Original Description'].str.contains('FAIR TICKETS'), 'Category'] = 'Entertainment'
new_df.loc[new_df['Old_Category']== 'Personal Care', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Description']== 'Thumbtack', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Suite', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Glass Jars', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Service & Parts', 'Category'] = 'Car'
new_df.loc[new_df['Old_Category']== 'Babysitter Daycare', 'Category'] = 'Daycare'
new_df.loc[new_df['Description']== 'Square Tippy', 'Category'] = 'Restaurants'
new_df.loc[new_df['Description']== 'Pods', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Business Services', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Jared Galleria', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Original Description'].str.contains('GIFTSHOPGAYLORD'), 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Five Below', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Rover.com', 'Category'] = 'Home'
new_df.loc[new_df['Old_Category']== 'Transfer', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Veterinary', 'Category'] = 'Home'
new_df.loc[new_df['Description']== 'Dorian Studio Inc', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Fees & Charges', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Movies & Dvds', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'JCPenney', 'Category'] = 'Health & Beauty'
new_df.loc[new_df['Description']== 'Family Dollar', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Vacation', 'Category'] = 'Travel'
new_df.loc[new_df['Old_Category']== 'Newspapers & Magazines', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Old_Category']== 'Sporting Goods', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Description']== 'Mission Depot', 'Category'] = 'General Merchandise'
new_df.loc[new_df['Original Description'].str.contains('SIX FLAGS'), 'Category'] = 'Entertainment'
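
A list of rules this long can also be kept as data and applied in a loop. The sketch below is one possible arrangement, not the project’s actual code: the rows, the “Shopping” category, and the names category_map, description_map and substring_rules are invented for illustration. It groups the rules by type and applies them in a fixed priority, which is a simplification of the line-by-line order used above.

```python
import pandas as pd

# Invented rows; only the columns the rules look at are included.
demo = pd.DataFrame({
    'Old_Category': ['Fast Food', 'Gas', 'Shopping', 'Shopping', 'Hobbies'],
    'Description': ['Burger Place', 'Fuel Stop', 'Walmart', 'Some Shop', 'Craft Shop'],
    'Original Description': ['BURGER 001', 'FUEL 22', 'WALMART 9', 'AMZN Mktp US', 'CRAFTS'],
})

# Rule type 1: old bank category -> new category (a small excerpt).
category_map = {'Fast Food': 'Restaurants', 'Gas': 'Car', 'Hobbies': 'General Merchandise'}

# Rule type 2: exact match on Description.
description_map = {'Walmart': 'Walmart'}

# Rule type 3: substring match on Original Description.
substring_rules = [('AMZN', 'Amazon')]

# Apply in order of increasing priority, so later rules override earlier ones.
demo['Category'] = demo['Old_Category'].map(category_map)
for description, new_cat in description_map.items():
    demo.loc[demo['Description'] == description, 'Category'] = new_cat
for pattern, new_cat in substring_rules:
    # regex=False treats the pattern literally; na=False skips missing text.
    hit = demo['Original Description'].str.contains(pattern, regex=False, na=False)
    demo.loc[hit, 'Category'] = new_cat

print(demo[['Description', 'Category']])
```

Passing regex=False matters for patterns such as 'amzn.com', where a dot would otherwise match any character.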

Finally, if there are any records that we didn’t catch in our new categorization scheme, let’s call them “Uncategorized” so we know where to direct our efforts to improve the process.

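# Fill only the new Category column; other columns keep their missing values.
new_df['Category'] = new_df['Category'].fillna('Uncategorized')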
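
To see where those efforts should go, one option is to count the original categories behind the uncategorized rows. This is a small sketch on invented data (the “Gifts” and “Charity” categories are made up); it fills only the “Category” column and then tallies what was missed.

```python
import pandas as pd

# Invented rows: one categorized charge and three that no rule caught.
demo = pd.DataFrame({
    'Old_Category': ['Gas', 'Gifts', 'Gifts', 'Charity'],
    'Description': ['Fuel Stop', 'Card Shop', 'Toy Shop', 'Food Bank'],
    'Category': ['Car', None, None, None],
})

# Label the rows that no rule matched.
demo['Category'] = demo['Category'].fillna('Uncategorized')

# Count the old categories behind the uncategorized rows, most frequent first.
missed = demo.loc[demo['Category'] == 'Uncategorized', 'Old_Category'].value_counts()
print(missed)
```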

Let’s check our results. Here’s a quick process check to see how the new categorization scheme performed on our dataset.

num_old_cats = str(len(new_df['Old_Category'].unique()))
num_new_cats = str(len(new_df['Category'].unique()))
print('We started with ' + num_old_cats + ' categories and reduced it down to ' + num_new_cats + '. Nice!')
print('Here are the new categories and their value counts:')
new_df.Category.value_counts()

Data Visualization

Let’s first get the total date range of the data we have loaded by creating two variables that store the first and last date in the “Date” field. This works because we sorted the DataFrame by date earlier. We’ll use these variables to build our first graph.

first_date = new_df['Date'].iloc[0]
last_date = new_df['Date'].iloc[-1]
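
If the sort order cannot be relied on, the same two values can be read with min() and max(). This is a sketch on invented dates, assuming the “Date” values are “YYYY-MM-DD” strings, which compare correctly as plain text.

```python
import pandas as pd

# Invented dates, deliberately out of order.
demo = pd.DataFrame({'Date': ['2023-03-01', '2022-12-15', '2023-01-20']})

# ISO-formatted date strings sort correctly as text,
# so min() and max() give the earliest and latest dates.
first_date_demo = demo['Date'].min()
last_date_demo = demo['Date'].max()
print(first_date_demo, last_date_demo)
```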

And now we can create our first graph:

plt.rcParams['figure.figsize'] = (20,10)
plot_order = new_df.groupby('Category')['Amount'].sum().sort_values(ascending=False).index.values
sns.set(font_scale=2)
sns.barplot(x=new_df['Amount'], y=new_df['Category'], estimator=sum, ci=None, color='purple', order=plot_order).set(title='All Charges in All Categories (' + first_date + ' - ' + last_date + ')', xlabel='Amount (USD)')

Now that we’ve taken a broad view of all of the data that has been loaded, let’s build into our code the ability to capture a “rolling year” so as a user adds new data to the data directory, we’re always looking at the last year of data for much of our analysis. We’ll do the same thing to capture the most recent month so we can deep-dive on those charges.

yearMonthList = new_df.Year_Month.unique()
mostRecentMonth = yearMonthList[-1]
rollingYear = yearMonthList[-12:]

# Make a new df with only the last 12 months in it.
lastYear_df = new_df.loc[new_df['Year_Month'].isin(rollingYear)]

# Create a new df with only the rows from the most recent month
mostRecentMonth_df = new_df.loc[new_df['Year_Month']== mostRecentMonth]
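
The rolling window depends on unique() returning the months in order of first appearance, which is chronological only because the DataFrame was sorted by date. The sketch below illustrates the idea on 14 invented months, with one charge per month.

```python
import pandas as pd

# 14 invented months (01/22 through 02/23), already in date order.
demo = pd.DataFrame({
    'Year_Month': pd.date_range('2022-01-01', periods=14, freq='MS').strftime('%m/%y'),
    'Amount': range(1, 15),
})

# unique() keeps values in order of first appearance, so on date-sorted
# data the last entries are the most recent months.
year_month_list = demo['Year_Month'].unique()
most_recent = year_month_list[-1]
rolling_year = year_month_list[-12:]

# Keep only the rows that fall inside the 12-month window.
last_year_demo = demo.loc[demo['Year_Month'].isin(rolling_year)]
print(most_recent, rolling_year[0], len(last_year_demo))
```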

Next, let’s capture all of the charged amounts for each month and compute a sum for each month. Then we will compute the mean (average) amount charged on a monthly basis.

# Get all the months
month_1 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[0]]
month_2 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[1]]
month_3 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[2]]
month_4 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[3]]
month_5 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[4]]
month_6 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[5]]
month_7 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[6]]
month_8 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[7]]
month_9 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[8]]
month_10 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[9]]
month_11 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[10]]
month_12 = lastYear_df.loc[lastYear_df['Year_Month']== rollingYear[11]]

# sum all the amounts for the months and store them in a list
sum_list = [month_1['Amount'].sum(), month_2['Amount'].sum(), month_3['Amount'].sum(), month_4['Amount'].sum(), month_5['Amount'].sum(), month_6['Amount'].sum(), month_7['Amount'].sum(), month_8['Amount'].sum(), month_9['Amount'].sum(), month_10['Amount'].sum(), month_11['Amount'].sum(), month_12['Amount'].sum()]

# Compute the mean and store it in a variable.
mean_line = np.mean(sum_list)
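
The same monthly sums and their mean can be computed with a groupby, which also works when fewer than 12 months of data are available. This is a minimal sketch on invented numbers, not a replacement for the variables defined above.

```python
import pandas as pd

# Invented charges across three months.
demo = pd.DataFrame({
    'Year_Month': ['01/23', '01/23', '02/23', '03/23'],
    'Amount': [10.0, 5.0, 30.0, 15.0],
})

# sort=False keeps the months in order of appearance
# (chronological when the data are sorted by date).
monthly_totals = demo.groupby('Year_Month', sort=False)['Amount'].sum()

# Mean of the monthly totals.
mean_monthly = monthly_totals.mean()
print(monthly_totals)
print(mean_monthly)
```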

Now we will use the monthly sums and mean to generate some visualizations:

# Generate Seaborn bar graph with mean line
sns.set(font_scale=2)
purple_p = mpatches.Patch(color='purple', label='Average Monthly Charges')
sns.barplot(x=lastYear_df['Year_Month'], y=lastYear_df['Amount'], estimator=sum, ci=None, color='lightgreen').set(title='Monthly Credit Card Charges Over the Last Year', xlabel='Month', ylabel='Amount (USD)')
plt.axhline(mean_line, color='purple', linewidth=2)
plt.legend(handles=[purple_p], facecolor='white')

# Generate Seaborn Facetgrid with bar graphs
sns.set(font_scale=0.8)
g = sns.FacetGrid(lastYear_df, col='Category', sharex=False, sharey=False, col_wrap=3, height=3, aspect=1.5)
g.map_dataframe(sns.barplot, x='Year_Month', y='Amount', estimator=sum, ci=None, color='red')
g.fig.subplots_adjust(top=0.9)
g.fig.suptitle('Monthly Charges in Each Category', fontsize=22)

Next, we will quickly capture all of the unique categories in a list to be used in our next visualization.

catList = lastYear_df.Category.unique()

And now we’ll do some processing to capture the rows that contain each of the categories for each of the months individually.
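
The long-hand version that follows indexes catList[0] through catList[10], so it assumes exactly eleven categories in the rolling year. As a hedged aside, the same subsets could be held in a dictionary keyed by (month, category), which adapts to however many categories exist. The data and the name subsets below are invented for illustration.

```python
import pandas as pd

# Invented charges in two months and two categories.
demo = pd.DataFrame({
    'Year_Month': ['01/23', '01/23', '01/23', '02/23'],
    'Category': ['Car', 'Home', 'Car', 'Car'],
    'Amount': [10.0, 20.0, 5.0, 7.0],
})

# One sub-DataFrame per (month, category) pair that actually occurs.
subsets = {key: group for key, group in demo.groupby(['Year_Month', 'Category'])}

# Look up one combination; a missing pair falls back to an empty frame.
empty = demo.iloc[0:0]
jan_car = subsets.get(('01/23', 'Car'), empty)
feb_home = subsets.get(('02/23', 'Home'), empty)
print(jan_car['Amount'].sum(), len(feb_home))
```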

# Get all the cats in the first month
month_1_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[0])]
month_1_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[1])]
month_1_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[2])]
month_1_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[3])]
month_1_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[4])]
month_1_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[5])]
month_1_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[6])]
month_1_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[7])]
month_1_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[8])]
month_1_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[9])]
month_1_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[0]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the second month
month_2_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[0])]
month_2_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[1])]
month_2_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[2])]
month_2_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[3])]
month_2_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[4])]
month_2_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[5])]
month_2_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[6])]
month_2_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[7])]
month_2_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[8])]
month_2_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[9])]
month_2_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[1]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the third month
month_3_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[0])]
month_3_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[1])]
month_3_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[2])]
month_3_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[3])]
month_3_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[4])]
month_3_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[5])]
month_3_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[6])]
month_3_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[7])]
month_3_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[8])]
month_3_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[9])]
month_3_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[2]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the fourth month
month_4_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[0])]
month_4_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[1])]
month_4_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[2])]
month_4_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[3])]
month_4_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[4])]
month_4_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[5])]
month_4_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[6])]
month_4_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[7])]
month_4_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[8])]
month_4_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[9])]
month_4_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[3]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the fifth month
month_5_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[0])]
month_5_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[1])]
month_5_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[2])]
month_5_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[3])]
month_5_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[4])]
month_5_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[5])]
month_5_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[6])]
month_5_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[7])]
month_5_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[8])]
month_5_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[9])]
month_5_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[4]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the sixth month
month_6_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[0])]
month_6_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[1])]
month_6_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[2])]
month_6_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[3])]
month_6_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[4])]
month_6_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[5])]
month_6_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[6])]
month_6_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[7])]
month_6_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[8])]
month_6_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[9])]
month_6_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[5]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the seventh month
month_7_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[0])]
month_7_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[1])]
month_7_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[2])]
month_7_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[3])]
month_7_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[4])]
month_7_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[5])]
month_7_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[6])]
month_7_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[7])]
month_7_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[8])]
month_7_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[9])]
month_7_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[6]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the eighth month
month_8_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[0])]
month_8_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[1])]
month_8_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[2])]
month_8_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[3])]
month_8_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[4])]
month_8_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[5])]
month_8_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[6])]
month_8_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[7])]
month_8_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[8])]
month_8_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[9])]
month_8_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[7]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the ninth month
month_9_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[0])]
month_9_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[1])]
month_9_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[2])]
month_9_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[3])]
month_9_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[4])]
month_9_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[5])]
month_9_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[6])]
month_9_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[7])]
month_9_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[8])]
month_9_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[9])]
month_9_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[8]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the tenth month
month_10_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[0])]
month_10_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[1])]
month_10_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[2])]
month_10_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[3])]
month_10_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[4])]
month_10_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[5])]
month_10_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[6])]
month_10_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[7])]
month_10_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[8])]
month_10_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[9])]
month_10_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[9]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the eleventh month
month_11_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[0])]
month_11_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[1])]
month_11_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[2])]
month_11_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[3])]
month_11_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[4])]
month_11_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[5])]
month_11_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[6])]
month_11_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[7])]
month_11_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[8])]
month_11_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[9])]
month_11_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[10]) & (lastYear_df['Category']== catList[10])]
# Get all the cats in the twelfth month
month_12_cat_1 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[0])]
month_12_cat_2 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[1])]
month_12_cat_3 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[2])]
month_12_cat_4 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[3])]
month_12_cat_5 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[4])]
month_12_cat_6 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[5])]
month_12_cat_7 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[6])]
month_12_cat_8 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[7])]
month_12_cat_9 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[8])]
month_12_cat_10 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[9])]
month_12_cat_11 = lastYear_df.loc[(lastYear_df['Year_Month']== rollingYear[11]) & (lastYear_df['Category']== catList[10])]

Now let’s create a list for each category containing the sums of charges for each month in the last 12 months.

cat_1_sums = [month_1_cat_1['Amount'].sum(), month_2_cat_1['Amount'].sum(), month_3_cat_1['Amount'].sum(), month_4_cat_1['Amount'].sum(), month_5_cat_1['Amount'].sum(), month_6_cat_1['Amount'].sum(), month_7_cat_1['Amount'].sum(), month_8_cat_1['Amount'].sum(), month_9_cat_1['Amount'].sum(), month_10_cat_1['Amount'].sum(), month_11_cat_1['Amount'].sum(), month_12_cat_1['Amount'].sum()]
cat_2_sums = [month_1_cat_2['Amount'].sum(), month_2_cat_2['Amount'].sum(), month_3_cat_2['Amount'].sum(), month_4_cat_2['Amount'].sum(), month_5_cat_2['Amount'].sum(), month_6_cat_2['Amount'].sum(), month_7_cat_2['Amount'].sum(), month_8_cat_2['Amount'].sum(), month_9_cat_2['Amount'].sum(), month_10_cat_2['Amount'].sum(), month_11_cat_2['Amount'].sum(), month_12_cat_2['Amount'].sum()]
cat_3_sums = [month_1_cat_3['Amount'].sum(), month_2_cat_3['Amount'].sum(), month_3_cat_3['Amount'].sum(), month_4_cat_3['Amount'].sum(), month_5_cat_3['Amount'].sum(), month_6_cat_3['Amount'].sum(), month_7_cat_3['Amount'].sum(), month_8_cat_3['Amount'].sum(), month_9_cat_3['Amount'].sum(), month_10_cat_3['Amount'].sum(), month_11_cat_3['Amount'].sum(), month_12_cat_3['Amount'].sum()]
cat_4_sums = [month_1_cat_4['Amount'].sum(), month_2_cat_4['Amount'].sum(), month_3_cat_4['Amount'].sum(), month_4_cat_4['Amount'].sum(), month_5_cat_4['Amount'].sum(), month_6_cat_4['Amount'].sum(), month_7_cat_4['Amount'].sum(), month_8_cat_4['Amount'].sum(), month_9_cat_4['Amount'].sum(), month_10_cat_4['Amount'].sum(), month_11_cat_4['Amount'].sum(), month_12_cat_4['Amount'].sum()]
cat_5_sums = [month_1_cat_5['Amount'].sum(), month_2_cat_5['Amount'].sum(), month_3_cat_5['Amount'].sum(), month_4_cat_5['Amount'].sum(), month_5_cat_5['Amount'].sum(), month_6_cat_5['Amount'].sum(), month_7_cat_5['Amount'].sum(), month_8_cat_5['Amount'].sum(), month_9_cat_5['Amount'].sum(), month_10_cat_5['Amount'].sum(), month_11_cat_5['Amount'].sum(), month_12_cat_5['Amount'].sum()]
cat_6_sums = [month_1_cat_6['Amount'].sum(), month_2_cat_6['Amount'].sum(), month_3_cat_6['Amount'].sum(), month_4_cat_6['Amount'].sum(), month_5_cat_6['Amount'].sum(), month_6_cat_6['Amount'].sum(), month_7_cat_6['Amount'].sum(), month_8_cat_6['Amount'].sum(), month_9_cat_6['Amount'].sum(), month_10_cat_6['Amount'].sum(), month_11_cat_6['Amount'].sum(), month_12_cat_6['Amount'].sum()]
cat_7_sums = [month_1_cat_7['Amount'].sum(), month_2_cat_7['Amount'].sum(), month_3_cat_7['Amount'].sum(), month_4_cat_7['Amount'].sum(), month_5_cat_7['Amount'].sum(), month_6_cat_7['Amount'].sum(), month_7_cat_7['Amount'].sum(), month_8_cat_7['Amount'].sum(), month_9_cat_7['Amount'].sum(), month_10_cat_7['Amount'].sum(), month_11_cat_7['Amount'].sum(), month_12_cat_7['Amount'].sum()]
cat_8_sums = [month_1_cat_8['Amount'].sum(), month_2_cat_8['Amount'].sum(), month_3_cat_8['Amount'].sum(), month_4_cat_8['Amount'].sum(), month_5_cat_8['Amount'].sum(), month_6_cat_8['Amount'].sum(), month_7_cat_8['Amount'].sum(), month_8_cat_8['Amount'].sum(), month_9_cat_8['Amount'].sum(), month_10_cat_8['Amount'].sum(), month_11_cat_8['Amount'].sum(), month_12_cat_8['Amount'].sum()]
cat_9_sums = [month_1_cat_9['Amount'].sum(), month_2_cat_9['Amount'].sum(), month_3_cat_9['Amount'].sum(), month_4_cat_9['Amount'].sum(), month_5_cat_9['Amount'].sum(), month_6_cat_9['Amount'].sum(), month_7_cat_9['Amount'].sum(), month_8_cat_9['Amount'].sum(), month_9_cat_9['Amount'].sum(), month_10_cat_9['Amount'].sum(), month_11_cat_9['Amount'].sum(), month_12_cat_9['Amount'].sum()]
cat_10_sums = [month_1_cat_10['Amount'].sum(), month_2_cat_10['Amount'].sum(), month_3_cat_10['Amount'].sum(), month_4_cat_10['Amount'].sum(), month_5_cat_10['Amount'].sum(), month_6_cat_10['Amount'].sum(), month_7_cat_10['Amount'].sum(), month_8_cat_10['Amount'].sum(), month_9_cat_10['Amount'].sum(), month_10_cat_10['Amount'].sum(), month_11_cat_10['Amount'].sum(), month_12_cat_10['Amount'].sum()]
cat_11_sums = [month_1_cat_11['Amount'].sum(), month_2_cat_11['Amount'].sum(), month_3_cat_11['Amount'].sum(), month_4_cat_11['Amount'].sum(), month_5_cat_11['Amount'].sum(), month_6_cat_11['Amount'].sum(), month_7_cat_11['Amount'].sum(), month_8_cat_11['Amount'].sum(), month_9_cat_11['Amount'].sum(), month_10_cat_11['Amount'].sum(), month_11_cat_11['Amount'].sum(), month_12_cat_11['Amount'].sum()]
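The filter-then-sum steps above can also be written more compactly. What follows is only a minimal sketch, not the code this project runs. It assumes a DataFrame with the same 'Year_Month', 'Category', and 'Amount' columns as lastYear_df, and it uses a made-up toy_df and a made-up rolling_year list in place of the real data. pivot_table builds one row per month and one column per category, and reindex adds any month that has no charges at all.

```python
import pandas as pd

# Hypothetical toy data standing in for lastYear_df (same column names).
toy_df = pd.DataFrame({
    'Year_Month': ['2023-01', '2023-01', '2023-02', '2023-02', '2023-02'],
    'Category': ['Groceries', 'Dining', 'Groceries', 'Groceries', 'Travel'],
    'Amount': [50.0, 20.0, 30.0, 10.0, 100.0],
})

# Hypothetical stand-in for rollingYear; '2023-03' has no charges at all.
rolling_year = ['2023-01', '2023-02', '2023-03']

# One row per month, one column per category; month/category pairs
# with no charges are filled with 0.
monthly_sums = toy_df.pivot_table(
    index='Year_Month',
    columns='Category',
    values='Amount',
    aggfunc='sum',
    fill_value=0,
)

# Make sure every month in the rolling window appears, even if empty.
monthly_sums = monthly_sums.reindex(index=rolling_year, fill_value=0)

# Equivalent of one cat_N_sums list: the monthly totals for one category.
groceries_sums = monthly_sums['Groceries'].tolist()

print(monthly_sums)
print(groceries_sums)
```

With a table like this, adding or removing a category changes the number of columns rather than the number of lines of code.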

Now let’s compute the monthly averages for each category and store that answer in a list.

monthly_avgs = [sum(sums) / len(sums) for sums in (cat_1_sums, cat_2_sums, cat_3_sums, cat_4_sums, cat_5_sums, cat_6_sums, cat_7_sums, cat_8_sums, cat_9_sums, cat_10_sums, cat_11_sums)]

Next let’s create a DataFrame with the monthly averages list and apply an index of the list categories, then transpose the DataFrame so the categories appear as columns and the monthly averages appear as values in those columns.

avgs_df = pd.DataFrame(monthly_avgs, index=catList).transpose()
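As a cross-check on the averaging and transposing steps, here is a small sketch under stated assumptions. It starts from a hypothetical months-by-categories table of sums (monthly_sums, with made-up values) rather than from the cat_N_sums lists. Taking the column-wise mean gives one average per category, and to_frame().T produces the same one-row, categories-as-columns shape as avgs_df.

```python
import pandas as pd

# Hypothetical table of monthly sums: rows are months, columns are categories.
monthly_sums = pd.DataFrame(
    {'Groceries': [50.0, 40.0, 0.0], 'Dining': [20.0, 0.0, 10.0]},
    index=['2023-01', '2023-02', '2023-03'],
)

# Column-wise mean. Months with no charges (0) still count toward the
# average, which matches sum(cat_sums) / len(cat_sums).
monthly_avgs = monthly_sums.mean()

# One row, with categories as columns (the same shape as avgs_df).
avgs_wide = monthly_avgs.to_frame().T

print(avgs_wide)
```

Note that a month with zero spending in a category pulls that category's average down. That is the behavior of the list-based calculation as well, since every list has one entry per month.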

And now we’re finally able to produce the visualization.

# Seaborn bar graph for the total charges in each category for the
# most recent month and also a point plot for the average monthly
# charges for each category.

yellow_patch = mpatches.Patch(color='yellow', label='Charges by Category for the month of ' + mostRecentMonth)
blue_patch = mpatches.Patch(color='blue', label='1-Year Rolling Monthly Average for Each Category')
sns.set(font_scale=2)
plot_order = mostRecentMonth_df.groupby('Category')['Amount'].sum().sort_values(ascending=False).index.values
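# Note: newer seaborn releases deprecate join= (use linestyle='none') and ci= (use errorbar=None).
sns.pointplot(data=avgs_df, join=False, color='blue', order=plot_order, orient='h')
# Take both x and y from mostRecentMonth_df so the bars only reflect the most recent month.
sns.barplot(data=mostRecentMonth_df, x='Amount', y='Category', estimator=sum, ci=None, color='yellow', order=plot_order, zorder=0.9).set(title='How ' + mostRecentMonth + ' Credit Card Charges Compare to a 1-Year Rolling Monthly Average', xlabel='Amount (USD)')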
plt.legend(handles=[yellow_patch, blue_patch], facecolor='white')

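This next visualization shows multiple boxplots, one for each category observed in the most recent month. Each plot shows the distribution of individual charge amounts in that category: the box spans the first to the third quartile, the line inside the box marks the median, the whiskers extend to the most extreme charges within 1.5 times the interquartile range of the box, and any charges beyond the whiskers are drawn as individual outlier points.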

# Seaborn FacetGrid with boxplots showing the charges in each
# category.
sns.set(font_scale=1.2)
h = sns.FacetGrid(mostRecentMonth_df, col='Category', sharex=False, sharey=False, col_wrap=3, height=3, aspect=1.5)
h.map_dataframe(sns.boxplot, x='Amount', color='cyan')
h.fig.subplots_adjust(top=0.9)
h.fig.suptitle('Amount Distribution of Charges in Each Category for the month of ' + mostRecentMonth, fontsize=22)
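To read the boxplots numerically, the quantities they draw can be computed directly. This is a minimal sketch on made-up data (toy_df is hypothetical and only shares the 'Category' and 'Amount' column names with the real DataFrame). It assumes the default whisker rule of 1.5 times the interquartile range, and the column names q1, median, q3, iqr, lower_fence, and upper_fence are my own labels.

```python
import pandas as pd

# Hypothetical toy charges; only the column names mirror the real data.
toy_df = pd.DataFrame({
    'Category': ['Groceries'] * 5 + ['Dining'] * 4,
    'Amount': [10.0, 20.0, 30.0, 40.0, 200.0, 5.0, 15.0, 25.0, 35.0],
})

# Quartiles per category: one row per category, one column per quantile.
summary = (
    toy_df.groupby('Category')['Amount']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ['q1', 'median', 'q3']

# Interquartile range and the fences that bound the whiskers.
summary['iqr'] = summary['q3'] - summary['q1']
summary['lower_fence'] = summary['q1'] - 1.5 * summary['iqr']
summary['upper_fence'] = summary['q3'] + 1.5 * summary['iqr']

# Charges outside the fences are the points a boxplot draws as outliers.
with_fences = toy_df.join(summary[['lower_fence', 'upper_fence']], on='Category')
outliers = with_fences[
    (with_fences['Amount'] < with_fences['lower_fence'])
    | (with_fences['Amount'] > with_fences['upper_fence'])
]

print(summary)
print(outliers[['Category', 'Amount']])
```

In this toy example the single 200.00 grocery charge falls above the upper fence, so it is the only charge flagged as an outlier.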

Conclusion

The purpose of this project was to create an improved categorization scheme for my client’s credit card charges and to build a more robust set of data visualizations that enable actionable insight from the data. Both the data wrangling and the visualizations adapt to new data: a user just needs to add files to the data directory and rerun the script to regenerate all of the visualizations seen here.

Over time, the categorization scheme will need some maintenance. However, the script does not discard data that it cannot categorize; those charges simply appear under a category called “Uncategorized”.

For whatever reason, the bank’s analytics system broke this data into 55 original categories. Further, the analytics dashboard provided on the bank’s website is insufficient to enable meaningful conclusions about the data that can inform credit card charging habits. I think this project demonstrates a successful approach to remedying these issues and will add a lot of value to my client’s personal finance tracking workflow.