{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Use Case Tutorial 3: Customer Interest Clustering\n",
    "\n",
    "This is a tutorial on how to perform customer clustering based on the interests and purchases of customers. \n",
    "\n",
    "Marketing teams frequently are interested in this analysis.\n",
    "\n",
    "We'll show how graph analytics can be used to gain insights about the interests of customers by finding communities of customers who've bought similar products. \n",
    "\n",
    "We'll accomplish this by creating a bipartite graph of customers and products, using a graph projection to create a graph of customers linked to other customers who've bought the same product, and using Louvain community detection to find the communities.\n",
    "\n",
    "We'll be using ecommerce transaction data from a U.K. retailer provided by the University of California, Irvine. The data can be found [here](https://www.kaggle.com/carrie1/ecommerce-data)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Preprocessing\n",
    "\n",
    "Let's first look at the data.\n",
    "\n",
    "First, we'll need to import some libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import metagraph as mg\n",
    "import pandas as pd\n",
    "import networkx as nx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what the data looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceNo</th>\n",
       "      <th>StockCode</th>\n",
       "      <th>Description</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>InvoiceDate</th>\n",
       "      <th>UnitPrice</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>536365</td>\n",
       "      <td>85123A</td>\n",
       "      <td>WHITE HANGING HEART T-LIGHT HOLDER</td>\n",
       "      <td>6</td>\n",
       "      <td>12/1/2010 8:26</td>\n",
       "      <td>2.55</td>\n",
       "      <td>17850.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>536365</td>\n",
       "      <td>71053</td>\n",
       "      <td>WHITE METAL LANTERN</td>\n",
       "      <td>6</td>\n",
       "      <td>12/1/2010 8:26</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>536365</td>\n",
       "      <td>84406B</td>\n",
       "      <td>CREAM CUPID HEARTS COAT HANGER</td>\n",
       "      <td>8</td>\n",
       "      <td>12/1/2010 8:26</td>\n",
       "      <td>2.75</td>\n",
       "      <td>17850.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>536365</td>\n",
       "      <td>84029G</td>\n",
       "      <td>KNITTED UNION FLAG HOT WATER BOTTLE</td>\n",
       "      <td>6</td>\n",
       "      <td>12/1/2010 8:26</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>536365</td>\n",
       "      <td>84029E</td>\n",
       "      <td>RED WOOLLY HOTTIE WHITE HEART.</td>\n",
       "      <td>6</td>\n",
       "      <td>12/1/2010 8:26</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850.0</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  InvoiceNo StockCode                          Description  Quantity  \\\n",
       "0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   \n",
       "1    536365     71053                  WHITE METAL LANTERN         6   \n",
       "2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   \n",
       "3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   \n",
       "4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   \n",
       "\n",
       "      InvoiceDate  UnitPrice  CustomerID         Country  \n",
       "0  12/1/2010 8:26       2.55     17850.0  United Kingdom  \n",
       "1  12/1/2010 8:26       3.39     17850.0  United Kingdom  \n",
       "2  12/1/2010 8:26       2.75     17850.0  United Kingdom  \n",
       "3  12/1/2010 8:26       3.39     17850.0  United Kingdom  \n",
       "4  12/1/2010 8:26       3.39     17850.0  United Kingdom  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RAW_DATA_CSV = './data/ecommerce/data.csv' # https://www.kaggle.com/carrie1/ecommerce-data\n",
    "data_df = pd.read_csv(RAW_DATA_CSV, encoding=\"ISO-8859-1\")\n",
    "data_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's clean the data to make sure there aren't any missing values. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceNo</th>\n",
       "      <th>StockCode</th>\n",
       "      <th>Description</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>InvoiceDate</th>\n",
       "      <th>UnitPrice</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>536365</td>\n",
       "      <td>85123A</td>\n",
       "      <td>WHITE HANGING HEART T-LIGHT HOLDER</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>2.55</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>536365</td>\n",
       "      <td>71053</td>\n",
       "      <td>WHITE METAL LANTERN</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>536365</td>\n",
       "      <td>84406B</td>\n",
       "      <td>CREAM CUPID HEARTS COAT HANGER</td>\n",
       "      <td>8</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>2.75</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>536365</td>\n",
       "      <td>84029G</td>\n",
       "      <td>KNITTED UNION FLAG HOT WATER BOTTLE</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>536365</td>\n",
       "      <td>84029E</td>\n",
       "      <td>RED WOOLLY HOTTIE WHITE HEART.</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  InvoiceNo StockCode                          Description  Quantity  \\\n",
       "0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   \n",
       "1    536365     71053                  WHITE METAL LANTERN         6   \n",
       "2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   \n",
       "3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   \n",
       "4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   \n",
       "\n",
       "          InvoiceDate  UnitPrice  CustomerID         Country  \n",
       "0 2010-12-01 08:26:00       2.55       17850  United Kingdom  \n",
       "1 2010-12-01 08:26:00       3.39       17850  United Kingdom  \n",
       "2 2010-12-01 08:26:00       2.75       17850  United Kingdom  \n",
       "3 2010-12-01 08:26:00       3.39       17850  United Kingdom  \n",
       "4 2010-12-01 08:26:00       3.39       17850  United Kingdom  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_df.InvoiceDate = pd.to_datetime(data_df.InvoiceDate, format=\"%m/%d/%Y %H:%M\")\n",
    "data_df.drop(data_df.index[data_df.CustomerID != data_df.CustomerID], inplace=True)\n",
    "assert len(data_df[data_df.isnull().any(axis=1)])==0, \"Raw data contains NaN\"\n",
    "data_df = data_df.astype({'CustomerID': int}, copy=False)\n",
    "data_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that some of these transactions are for returns (denoted by negative quantity values)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceNo</th>\n",
       "      <th>StockCode</th>\n",
       "      <th>Description</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>InvoiceDate</th>\n",
       "      <th>UnitPrice</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>141</th>\n",
       "      <td>C536379</td>\n",
       "      <td>D</td>\n",
       "      <td>Discount</td>\n",
       "      <td>-1</td>\n",
       "      <td>2010-12-01 09:41:00</td>\n",
       "      <td>27.50</td>\n",
       "      <td>14527</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>154</th>\n",
       "      <td>C536383</td>\n",
       "      <td>35004C</td>\n",
       "      <td>SET OF 3 COLOURED  FLYING DUCKS</td>\n",
       "      <td>-1</td>\n",
       "      <td>2010-12-01 09:49:00</td>\n",
       "      <td>4.65</td>\n",
       "      <td>15311</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>235</th>\n",
       "      <td>C536391</td>\n",
       "      <td>22556</td>\n",
       "      <td>PLASTERS IN TIN CIRCUS PARADE</td>\n",
       "      <td>-12</td>\n",
       "      <td>2010-12-01 10:24:00</td>\n",
       "      <td>1.65</td>\n",
       "      <td>17548</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>236</th>\n",
       "      <td>C536391</td>\n",
       "      <td>21984</td>\n",
       "      <td>PACK OF 12 PINK PAISLEY TISSUES</td>\n",
       "      <td>-24</td>\n",
       "      <td>2010-12-01 10:24:00</td>\n",
       "      <td>0.29</td>\n",
       "      <td>17548</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>237</th>\n",
       "      <td>C536391</td>\n",
       "      <td>21983</td>\n",
       "      <td>PACK OF 12 BLUE PAISLEY TISSUES</td>\n",
       "      <td>-24</td>\n",
       "      <td>2010-12-01 10:24:00</td>\n",
       "      <td>0.29</td>\n",
       "      <td>17548</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    InvoiceNo StockCode                       Description  Quantity  \\\n",
       "141   C536379         D                          Discount        -1   \n",
       "154   C536383    35004C   SET OF 3 COLOURED  FLYING DUCKS        -1   \n",
       "235   C536391     22556    PLASTERS IN TIN CIRCUS PARADE        -12   \n",
       "236   C536391     21984  PACK OF 12 PINK PAISLEY TISSUES        -24   \n",
       "237   C536391     21983  PACK OF 12 BLUE PAISLEY TISSUES        -24   \n",
       "\n",
       "            InvoiceDate  UnitPrice  CustomerID         Country  \n",
       "141 2010-12-01 09:41:00      27.50       14527  United Kingdom  \n",
       "154 2010-12-01 09:49:00       4.65       15311  United Kingdom  \n",
       "235 2010-12-01 10:24:00       1.65       17548  United Kingdom  \n",
       "236 2010-12-01 10:24:00       0.29       17548  United Kingdom  \n",
       "237 2010-12-01 10:24:00       0.29       17548  United Kingdom  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_df[data_df.Quantity < 1].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Though customers may have returned these products, they did initially purchase the products (which reflects an interest in the product), so we’ll keep the initial purchases. However, we’ll remove the return transactions (which will also remove any discount transactions as well)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceNo</th>\n",
       "      <th>StockCode</th>\n",
       "      <th>Description</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>InvoiceDate</th>\n",
       "      <th>UnitPrice</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>536365</td>\n",
       "      <td>85123A</td>\n",
       "      <td>WHITE HANGING HEART T-LIGHT HOLDER</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>2.55</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>536365</td>\n",
       "      <td>71053</td>\n",
       "      <td>WHITE METAL LANTERN</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>536365</td>\n",
       "      <td>84406B</td>\n",
       "      <td>CREAM CUPID HEARTS COAT HANGER</td>\n",
       "      <td>8</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>2.75</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>536365</td>\n",
       "      <td>84029G</td>\n",
       "      <td>KNITTED UNION FLAG HOT WATER BOTTLE</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>536365</td>\n",
       "      <td>84029E</td>\n",
       "      <td>RED WOOLLY HOTTIE WHITE HEART.</td>\n",
       "      <td>6</td>\n",
       "      <td>2010-12-01 08:26:00</td>\n",
       "      <td>3.39</td>\n",
       "      <td>17850</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  InvoiceNo StockCode                          Description  Quantity  \\\n",
       "0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   \n",
       "1    536365     71053                  WHITE METAL LANTERN         6   \n",
       "2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   \n",
       "3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   \n",
       "4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   \n",
       "\n",
       "          InvoiceDate  UnitPrice  CustomerID         Country  \n",
       "0 2010-12-01 08:26:00       2.55       17850  United Kingdom  \n",
       "1 2010-12-01 08:26:00       3.39       17850  United Kingdom  \n",
       "2 2010-12-01 08:26:00       2.75       17850  United Kingdom  \n",
       "3 2010-12-01 08:26:00       3.39       17850  United Kingdom  \n",
       "4 2010-12-01 08:26:00       3.39       17850  United Kingdom  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_df.drop(data_df.index[data_df.Quantity <= 0], inplace=True)\n",
    "data_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Community Detection\n",
    "\n",
    "Let's now find the communities of customers with similar purchases / interests. \n",
    "\n",
    "First, we'll need to create a bipartite graph of customers and products. \n",
    "\n",
    "Let's grab the default resolver."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at the nodes of the bipartite graph we're going to create. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "customer_ids = data_df['CustomerID']\n",
    "stock_codes = data_df['StockCode']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    17850\n",
       "1    17850\n",
       "2    17850\n",
       "3    17850\n",
       "4    17850\n",
       "Name: CustomerID, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "customer_ids.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    85123A\n",
       "1     71053\n",
       "2    84406B\n",
       "3    84029G\n",
       "4    84029E\n",
       "Name: StockCode, dtype: object"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stock_codes.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our customer ids are ints, but our stock codes are not ints. \n",
    "\n",
    "Ideally, our graph will have nodes of all the same type since some hardware backends might require this. This isn't strictly necessary here, but it's good practice to do this in order to avoid any potential problems any specific backend might have. \n",
    "\n",
    "We can make our graph nodes all have the same type by mapping our original customer ids and stock codes to node ids and making a graph of those node ids. We can do this with `metagraph.NodeLabels`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_nodes = pd.concat([customer_ids, stock_codes]).unique()\n",
    "node_ids = range(len(all_nodes))\n",
    "node_labels = mg.NodeLabels(node_ids, all_nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`node_labels` maps the customer ids or stock codes to node ids as shown below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17850"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_customer_id = customer_ids.iloc[0]\n",
    "first_customer_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_customer_id_node_id = node_labels[first_customer_id]\n",
    "first_customer_id_node_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'85123A'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_stock_code = stock_codes.iloc[0]\n",
    "first_stock_code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4339"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "first_stock_code_node_id = node_labels[first_stock_code]\n",
    "first_stock_code_node_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`node_labels.ids` maps node ids to customer ids or stock codes as shown below. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17850"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "node_labels.ids[first_customer_id_node_id]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'85123A'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "node_labels.ids[first_stock_code_node_id]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert node_labels.ids[first_customer_id_node_id] == first_customer_id\n",
    "assert node_labels.ids[first_stock_code_node_id] == first_stock_code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now create our bipartite graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "customer_id_node_ids = [node_labels[customer_id] for customer_id in customer_ids]\n",
    "stock_code_node_ids = [node_labels[stock_code] for stock_code in stock_codes]\n",
    "edges = zip(customer_id_node_ids, stock_code_node_ids)\n",
    "\n",
    "nx_bipartite_graph = nx.Graph()\n",
    "nx_bipartite_graph.add_edges_from(edges)\n",
    "bipartite_graph = mg.wrappers.BipartiteGraph.NetworkXBipartiteGraph(\n",
    "    nx_bipartite_graph, \n",
    "    [customer_id_node_ids, stock_code_node_ids]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we'll need to use a graph projection to create a graph of customers linked to other customers who've bought the same product."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "customer_similarity_graph = mg.algos.bipartite.graph_projection(bipartite_graph, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have an unweighted bipartite graph. Louvain community detection requires weights. A more elegant approach might be taken in practice, but we'll simply assign every edge to have a weight of 1 for this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "customer_similarity_graph = mg.algos.util.graph.assign_uniform_weight(\n",
    "    customer_similarity_graph, \n",
    "    1.0\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll need to use Louvain community detection to find similar communities based on purchased products."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "community_labels, modularity_score = mg.algos.clustering.louvain_community(\n",
    "    customer_similarity_graph\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`community_labels` is a mapping from node IDs to their community labels. \n",
    "\n",
    "Let's see how many / what community labels we have."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(community_labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0, 1, 2, 3}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "set(community_labels.values())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now merge the labels into our dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>InvoiceNo</th>\n",
       "      <th>StockCode</th>\n",
       "      <th>Description</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>InvoiceDate</th>\n",
       "      <th>UnitPrice</th>\n",
       "      <th>CustomerID</th>\n",
       "      <th>Country</th>\n",
       "      <th>CustomerCommunityLabel</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>174080</th>\n",
       "      <td>551739</td>\n",
       "      <td>22916</td>\n",
       "      <td>HERB MARKER THYME</td>\n",
       "      <td>2</td>\n",
       "      <td>2011-05-04 10:58:00</td>\n",
       "      <td>0.65</td>\n",
       "      <td>18118</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>337560</th>\n",
       "      <td>566450</td>\n",
       "      <td>22977</td>\n",
       "      <td>DOLLY GIRL CHILDRENS EGG CUP</td>\n",
       "      <td>12</td>\n",
       "      <td>2011-09-12 16:12:00</td>\n",
       "      <td>1.25</td>\n",
       "      <td>15673</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346336</th>\n",
       "      <td>567184</td>\n",
       "      <td>22173</td>\n",
       "      <td>METAL 4 HOOK HANGER FRENCH CHATEAU</td>\n",
       "      <td>1</td>\n",
       "      <td>2011-09-18 15:41:00</td>\n",
       "      <td>3.29</td>\n",
       "      <td>16033</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>493119</th>\n",
       "      <td>578155</td>\n",
       "      <td>22208</td>\n",
       "      <td>WOOD STAMP SET THANK YOU</td>\n",
       "      <td>1</td>\n",
       "      <td>2011-11-23 11:32:00</td>\n",
       "      <td>0.83</td>\n",
       "      <td>12748</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>406740</th>\n",
       "      <td>571828</td>\n",
       "      <td>22812</td>\n",
       "      <td>PACK 3 BOXES CHRISTMAS PANNETONE</td>\n",
       "      <td>8</td>\n",
       "      <td>2011-10-19 11:52:00</td>\n",
       "      <td>1.95</td>\n",
       "      <td>16440</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>482809</th>\n",
       "      <td>577484</td>\n",
       "      <td>23295</td>\n",
       "      <td>SET OF 12 MINI LOAF BAKING CASES</td>\n",
       "      <td>1</td>\n",
       "      <td>2011-11-20 11:52:00</td>\n",
       "      <td>0.83</td>\n",
       "      <td>13536</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>303813</th>\n",
       "      <td>563555</td>\n",
       "      <td>22201</td>\n",
       "      <td>FRYING PAN BLUE POLKADOT</td>\n",
       "      <td>1</td>\n",
       "      <td>2011-08-17 13:21:00</td>\n",
       "      <td>4.25</td>\n",
       "      <td>16755</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>525561</th>\n",
       "      <td>580632</td>\n",
       "      <td>23552</td>\n",
       "      <td>BICYCLE PUNCTURE REPAIR KIT</td>\n",
       "      <td>2</td>\n",
       "      <td>2011-12-05 12:16:00</td>\n",
       "      <td>2.08</td>\n",
       "      <td>16360</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>406377</th>\n",
       "      <td>571747</td>\n",
       "      <td>22585</td>\n",
       "      <td>PACK OF 6 BIRDY GIFT TAGS</td>\n",
       "      <td>12</td>\n",
       "      <td>2011-10-19 10:59:00</td>\n",
       "      <td>1.25</td>\n",
       "      <td>13849</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>344763</th>\n",
       "      <td>567097</td>\n",
       "      <td>23355</td>\n",
       "      <td>HOT WATER BOTTLE KEEP CALM</td>\n",
       "      <td>8</td>\n",
       "      <td>2011-09-16 13:23:00</td>\n",
       "      <td>4.95</td>\n",
       "      <td>13323</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       InvoiceNo StockCode                         Description  Quantity  \\\n",
       "174080    551739     22916                   HERB MARKER THYME         2   \n",
       "337560    566450     22977        DOLLY GIRL CHILDRENS EGG CUP        12   \n",
       "346336    567184     22173  METAL 4 HOOK HANGER FRENCH CHATEAU         1   \n",
       "493119    578155     22208            WOOD STAMP SET THANK YOU         1   \n",
       "406740    571828     22812    PACK 3 BOXES CHRISTMAS PANNETONE         8   \n",
       "482809    577484     23295    SET OF 12 MINI LOAF BAKING CASES         1   \n",
       "303813    563555     22201            FRYING PAN BLUE POLKADOT         1   \n",
       "525561    580632     23552        BICYCLE PUNCTURE REPAIR KIT          2   \n",
       "406377    571747     22585           PACK OF 6 BIRDY GIFT TAGS        12   \n",
       "344763    567097     23355          HOT WATER BOTTLE KEEP CALM         8   \n",
       "\n",
       "               InvoiceDate  UnitPrice  CustomerID         Country  \\\n",
       "174080 2011-05-04 10:58:00       0.65       18118  United Kingdom   \n",
       "337560 2011-09-12 16:12:00       1.25       15673  United Kingdom   \n",
       "346336 2011-09-18 15:41:00       3.29       16033  United Kingdom   \n",
       "493119 2011-11-23 11:32:00       0.83       12748  United Kingdom   \n",
       "406740 2011-10-19 11:52:00       1.95       16440  United Kingdom   \n",
       "482809 2011-11-20 11:52:00       0.83       13536  United Kingdom   \n",
       "303813 2011-08-17 13:21:00       4.25       16755  United Kingdom   \n",
       "525561 2011-12-05 12:16:00       2.08       16360  United Kingdom   \n",
       "406377 2011-10-19 10:59:00       1.25       13849  United Kingdom   \n",
       "344763 2011-09-16 13:23:00       4.95       13323  United Kingdom   \n",
       "\n",
       "        CustomerCommunityLabel  \n",
       "174080                       0  \n",
       "337560                       2  \n",
       "346336                       0  \n",
       "493119                       0  \n",
       "406740                       2  \n",
       "482809                       2  \n",
       "303813                       3  \n",
       "525561                       2  \n",
       "406377                       2  \n",
       "344763                       0  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_df['CustomerCommunityLabel'] = data_df.CustomerID.map(\n",
    "    lambda customer_id: community_labels[node_labels[customer_id]]\n",
    ")\n",
    "data_df.sample(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have clusters of customers who've bought similar products and can market to these interests. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
