{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use Case Tutorial 3: Customer Interest Clustering\n", "\n", "This is a tutorial on how to perform customer clustering based on the interests and purchases of customers. \n", "\n", "Marketing teams frequently are interested in this analysis.\n", "\n", "We'll show how graph analytics can be used to gain insights about the interests of customers by finding communities of customers who've bought similar products. \n", "\n", "We'll accomplish this by creating a bipartite graph of customers and products, using a graph projection to create a graph of customers linked to other customers who've bought the same product, and using Louvain community detection to find the communities.\n", "\n", "We'll be using ecommerce transaction data from a U.K. retailer provided by the University of California, Irvine. The data can be found [here](https://www.kaggle.com/carrie1/ecommerce-data)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Preprocessing\n", "\n", "Let's first look at the data.\n", "\n", "First, we'll need to import some libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import metagraph as mg\n", "import pandas as pd\n", "import networkx as nx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see what the data looks like." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
053636585123AWHITE HANGING HEART T-LIGHT HOLDER612/1/2010 8:262.5517850.0United Kingdom
153636571053WHITE METAL LANTERN612/1/2010 8:263.3917850.0United Kingdom
253636584406BCREAM CUPID HEARTS COAT HANGER812/1/2010 8:262.7517850.0United Kingdom
353636584029GKNITTED UNION FLAG HOT WATER BOTTLE612/1/2010 8:263.3917850.0United Kingdom
453636584029ERED WOOLLY HOTTIE WHITE HEART.612/1/2010 8:263.3917850.0United Kingdom
\n", "
" ], "text/plain": [ " InvoiceNo StockCode Description Quantity \\\n", "0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 \n", "1 536365 71053 WHITE METAL LANTERN 6 \n", "2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 \n", "3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 \n", "4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 \n", "\n", " InvoiceDate UnitPrice CustomerID Country \n", "0 12/1/2010 8:26 2.55 17850.0 United Kingdom \n", "1 12/1/2010 8:26 3.39 17850.0 United Kingdom \n", "2 12/1/2010 8:26 2.75 17850.0 United Kingdom \n", "3 12/1/2010 8:26 3.39 17850.0 United Kingdom \n", "4 12/1/2010 8:26 3.39 17850.0 United Kingdom " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "RAW_DATA_CSV = './data/ecommerce/data.csv' # https://www.kaggle.com/carrie1/ecommerce-data\n", "data_df = pd.read_csv(RAW_DATA_CSV, encoding=\"ISO-8859-1\")\n", "data_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's clean the data to make sure there aren't any missing values. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
053636585123AWHITE HANGING HEART T-LIGHT HOLDER62010-12-01 08:26:002.5517850United Kingdom
153636571053WHITE METAL LANTERN62010-12-01 08:26:003.3917850United Kingdom
253636584406BCREAM CUPID HEARTS COAT HANGER82010-12-01 08:26:002.7517850United Kingdom
353636584029GKNITTED UNION FLAG HOT WATER BOTTLE62010-12-01 08:26:003.3917850United Kingdom
453636584029ERED WOOLLY HOTTIE WHITE HEART.62010-12-01 08:26:003.3917850United Kingdom
\n", "
" ], "text/plain": [ " InvoiceNo StockCode Description Quantity \\\n", "0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 \n", "1 536365 71053 WHITE METAL LANTERN 6 \n", "2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 \n", "3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 \n", "4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 \n", "\n", " InvoiceDate UnitPrice CustomerID Country \n", "0 2010-12-01 08:26:00 2.55 17850 United Kingdom \n", "1 2010-12-01 08:26:00 3.39 17850 United Kingdom \n", "2 2010-12-01 08:26:00 2.75 17850 United Kingdom \n", "3 2010-12-01 08:26:00 3.39 17850 United Kingdom \n", "4 2010-12-01 08:26:00 3.39 17850 United Kingdom " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_df.InvoiceDate = pd.to_datetime(data_df.InvoiceDate, format=\"%m/%d/%Y %H:%M\")\n", "data_df.drop(data_df.index[data_df.CustomerID != data_df.CustomerID], inplace=True)\n", "assert len(data_df[data_df.isnull().any(axis=1)])==0, \"Raw data contains NaN\"\n", "data_df = data_df.astype({'CustomerID': int}, copy=False)\n", "data_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that some of these transactions are for returns (denoted by negative quantity values)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
141C536379DDiscount-12010-12-01 09:41:0027.5014527United Kingdom
154C53638335004CSET OF 3 COLOURED FLYING DUCKS-12010-12-01 09:49:004.6515311United Kingdom
235C53639122556PLASTERS IN TIN CIRCUS PARADE-122010-12-01 10:24:001.6517548United Kingdom
236C53639121984PACK OF 12 PINK PAISLEY TISSUES-242010-12-01 10:24:000.2917548United Kingdom
237C53639121983PACK OF 12 BLUE PAISLEY TISSUES-242010-12-01 10:24:000.2917548United Kingdom
\n", "
" ], "text/plain": [ " InvoiceNo StockCode Description Quantity \\\n", "141 C536379 D Discount -1 \n", "154 C536383 35004C SET OF 3 COLOURED FLYING DUCKS -1 \n", "235 C536391 22556 PLASTERS IN TIN CIRCUS PARADE -12 \n", "236 C536391 21984 PACK OF 12 PINK PAISLEY TISSUES -24 \n", "237 C536391 21983 PACK OF 12 BLUE PAISLEY TISSUES -24 \n", "\n", " InvoiceDate UnitPrice CustomerID Country \n", "141 2010-12-01 09:41:00 27.50 14527 United Kingdom \n", "154 2010-12-01 09:49:00 4.65 15311 United Kingdom \n", "235 2010-12-01 10:24:00 1.65 17548 United Kingdom \n", "236 2010-12-01 10:24:00 0.29 17548 United Kingdom \n", "237 2010-12-01 10:24:00 0.29 17548 United Kingdom " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_df[data_df.Quantity < 1].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though customers may have returned these products, they did initially purchase the products (which reflects an interest in the product), so we’ll keep the initial purchases. However, we’ll remove the return transactions (which will also remove any discount transactions as well)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountry
053636585123AWHITE HANGING HEART T-LIGHT HOLDER62010-12-01 08:26:002.5517850United Kingdom
153636571053WHITE METAL LANTERN62010-12-01 08:26:003.3917850United Kingdom
253636584406BCREAM CUPID HEARTS COAT HANGER82010-12-01 08:26:002.7517850United Kingdom
353636584029GKNITTED UNION FLAG HOT WATER BOTTLE62010-12-01 08:26:003.3917850United Kingdom
453636584029ERED WOOLLY HOTTIE WHITE HEART.62010-12-01 08:26:003.3917850United Kingdom
\n", "
" ], "text/plain": [ " InvoiceNo StockCode Description Quantity \\\n", "0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 \n", "1 536365 71053 WHITE METAL LANTERN 6 \n", "2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 \n", "3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 \n", "4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 \n", "\n", " InvoiceDate UnitPrice CustomerID Country \n", "0 2010-12-01 08:26:00 2.55 17850 United Kingdom \n", "1 2010-12-01 08:26:00 3.39 17850 United Kingdom \n", "2 2010-12-01 08:26:00 2.75 17850 United Kingdom \n", "3 2010-12-01 08:26:00 3.39 17850 United Kingdom \n", "4 2010-12-01 08:26:00 3.39 17850 United Kingdom " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_df.drop(data_df.index[data_df.Quantity <= 0], inplace=True)\n", "data_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Community Detection\n", "\n", "Let's now find the communities of customers with similar purchases / interests. \n", "\n", "First, we'll need to create a bipartite graph of customers and products. \n", "\n", "Let's grab the default resolver." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the nodes of the bipartite graph we're going to create. 
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "customer_ids = data_df['CustomerID']\n", "stock_codes = data_df['StockCode']" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 17850\n", "1 17850\n", "2 17850\n", "3 17850\n", "4 17850\n", "Name: CustomerID, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "customer_ids.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 85123A\n", "1 71053\n", "2 84406B\n", "3 84029G\n", "4 84029E\n", "Name: StockCode, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stock_codes.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our customer ids are ints, but our stock codes are not ints. \n", "\n", "Ideally, our graph will have nodes of all the same type since some hardware backends might require this. This isn't strictly necessary here, but it's good practice to do this in order to avoid any potential problems any specific backend might have. \n", "\n", "We can make our graph nodes all have the same type by mapping our original customer ids and stock codes to node ids and making a graph of those node ids. We can do this with `metagraph.NodeLabels`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "all_nodes = pd.concat([customer_ids, stock_codes]).unique()\n", "node_ids = range(len(all_nodes))\n", "node_labels = mg.NodeLabels(node_ids, all_nodes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`node_labels` maps the customer ids or stock codes to node ids as shown below. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17850" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_customer_id = customer_ids.iloc[0]\n", "first_customer_id" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_customer_id_node_id = node_labels[first_customer_id]\n", "first_customer_id_node_id" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'85123A'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_stock_code = stock_codes.iloc[0]\n", "first_stock_code" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4339" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_stock_code_node_id = node_labels[first_stock_code]\n", "first_stock_code_node_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`node_labels.ids` maps node ids to customer ids or stock codes as shown below. 
" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17850" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "node_labels.ids[first_customer_id_node_id]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'85123A'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "node_labels.ids[first_stock_code_node_id]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "assert node_labels.ids[first_customer_id_node_id] == first_customer_id\n", "assert node_labels.ids[first_stock_code_node_id] == first_stock_code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now create our bipartite graph." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "customer_id_node_ids = [node_labels[customer_id] for customer_id in customer_ids]\n", "stock_code_node_ids = [node_labels[stock_code] for stock_code in stock_codes]\n", "edges = zip(customer_id_node_ids, stock_code_node_ids)\n", "\n", "nx_bipartite_graph = nx.Graph()\n", "nx_bipartite_graph.add_edges_from(edges)\n", "bipartite_graph = mg.wrappers.BipartiteGraph.NetworkXBipartiteGraph(\n", " nx_bipartite_graph, \n", " [customer_id_node_ids, stock_code_node_ids]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll need to use a graph projection to create a graph of customers linked to other customers who've bought the same product." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "customer_similarity_graph = mg.algos.bipartite.graph_projection(bipartite_graph, 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have an unweighted bipartite graph. Louvain community detection requires weights. 
In practice a more elegant weighting might be used, but for this tutorial we'll simply give every edge a weight of 1." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "customer_similarity_graph = mg.algos.util.graph.assign_uniform_weight(\n", " customer_similarity_graph, \n", " 1.0\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use Louvain community detection to find communities of customers with similar purchases." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "community_labels, modularity_score = mg.algos.clustering.louvain_community(\n", " customer_similarity_graph\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`community_labels` maps node IDs to community labels.\n", "\n", "Let's check its type and see which community labels were assigned." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(community_labels)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0, 1, 2, 3}" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(community_labels.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now add each customer's community label to our dataframe." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InvoiceNoStockCodeDescriptionQuantityInvoiceDateUnitPriceCustomerIDCountryCustomerCommunityLabel
17408055173922916HERB MARKER THYME22011-05-04 10:58:000.6518118United Kingdom0
33756056645022977DOLLY GIRL CHILDRENS EGG CUP122011-09-12 16:12:001.2515673United Kingdom2
34633656718422173METAL 4 HOOK HANGER FRENCH CHATEAU12011-09-18 15:41:003.2916033United Kingdom0
49311957815522208WOOD STAMP SET THANK YOU12011-11-23 11:32:000.8312748United Kingdom0
40674057182822812PACK 3 BOXES CHRISTMAS PANNETONE82011-10-19 11:52:001.9516440United Kingdom2
48280957748423295SET OF 12 MINI LOAF BAKING CASES12011-11-20 11:52:000.8313536United Kingdom2
30381356355522201FRYING PAN BLUE POLKADOT12011-08-17 13:21:004.2516755United Kingdom3
52556158063223552BICYCLE PUNCTURE REPAIR KIT22011-12-05 12:16:002.0816360United Kingdom2
40637757174722585PACK OF 6 BIRDY GIFT TAGS122011-10-19 10:59:001.2513849United Kingdom2
34476356709723355HOT WATER BOTTLE KEEP CALM82011-09-16 13:23:004.9513323United Kingdom0
\n", "
" ], "text/plain": [ " InvoiceNo StockCode Description Quantity \\\n", "174080 551739 22916 HERB MARKER THYME 2 \n", "337560 566450 22977 DOLLY GIRL CHILDRENS EGG CUP 12 \n", "346336 567184 22173 METAL 4 HOOK HANGER FRENCH CHATEAU 1 \n", "493119 578155 22208 WOOD STAMP SET THANK YOU 1 \n", "406740 571828 22812 PACK 3 BOXES CHRISTMAS PANNETONE 8 \n", "482809 577484 23295 SET OF 12 MINI LOAF BAKING CASES 1 \n", "303813 563555 22201 FRYING PAN BLUE POLKADOT 1 \n", "525561 580632 23552 BICYCLE PUNCTURE REPAIR KIT 2 \n", "406377 571747 22585 PACK OF 6 BIRDY GIFT TAGS 12 \n", "344763 567097 23355 HOT WATER BOTTLE KEEP CALM 8 \n", "\n", " InvoiceDate UnitPrice CustomerID Country \\\n", "174080 2011-05-04 10:58:00 0.65 18118 United Kingdom \n", "337560 2011-09-12 16:12:00 1.25 15673 United Kingdom \n", "346336 2011-09-18 15:41:00 3.29 16033 United Kingdom \n", "493119 2011-11-23 11:32:00 0.83 12748 United Kingdom \n", "406740 2011-10-19 11:52:00 1.95 16440 United Kingdom \n", "482809 2011-11-20 11:52:00 0.83 13536 United Kingdom \n", "303813 2011-08-17 13:21:00 4.25 16755 United Kingdom \n", "525561 2011-12-05 12:16:00 2.08 16360 United Kingdom \n", "406377 2011-10-19 10:59:00 1.25 13849 United Kingdom \n", "344763 2011-09-16 13:23:00 4.95 13323 United Kingdom \n", "\n", " CustomerCommunityLabel \n", "174080 0 \n", "337560 2 \n", "346336 0 \n", "493119 0 \n", "406740 2 \n", "482809 2 \n", "303813 3 \n", "525561 2 \n", "406377 2 \n", "344763 0 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_df['CustomerCommunityLabel'] = data_df.CustomerID.map(\n", " lambda customer_id: community_labels[node_labels[customer_id]]\n", ")\n", "data_df.sample(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have clusters of customers who've bought similar products and can market to these interests. 
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }