{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use Case Tutorial 1: Well-Connected US Regions\n", "\n", "This tutorial shows how to find the U.S. regions that are best connected by air travel.\n", "\n", "The U.S. Bureau of Transportation Statistics provides data on monthly air travel from all certificated U.S. air carriers and makes it available [here](https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=258). The 2018 air travel data used for this tutorial can be downloaded [here](https://transtats.bts.gov/ftproot/TranStatsData/403537556_T_T100D_MARKET_US_CARRIER_ONLY.zip). We chose 2018 data to avoid any impact COVID-19 might’ve had on travel.\n", "\n", "We will use this data to determine which U.S. regions are best connected, using betweenness centrality as the measure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Preprocessing\n", "\n", "Before looking at the data, we’ll need to import some libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import metagraph as mg\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s see what the data looks like." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PASSENGERSFREIGHTMAILDISTANCEUNIQUE_CARRIERAIRLINE_IDUNIQUE_CARRIER_NAMEORIGIN_AIRPORT_IDORIGIN_AIRPORT_SEQ_IDORIGIN_CITY_MARKET_ID...DEST_AIRPORT_SEQ_IDDEST_CITY_MARKET_IDDESTDEST_CITY_NAMEDEST_STATE_ABRDEST_STATE_FIPSDEST_STATE_NMDEST_WACMONTHUnnamed: 26
00.0410.00.0616.0WN19393.0Southwest Airlines Co.13851138510333851...106930230693BNANashville, TNTN47Tennessee546NaN
10.0184.00.02592.0WN19393.0Southwest Airlines Co.14307143070530721...128920832575LAXLos Angeles, CACA6California916NaN
20.087.00.02445.0WN19393.0Southwest Airlines Co.14679146790333570...102570230257ALBAlbany, NYNY36New York226NaN
30.010.00.0432.0WN19393.0Southwest Airlines Co.14730147300333044...129920632600LITLittle Rock, ARAR5Arkansas716NaN
40.0100.00.0129.0WN19393.0Southwest Airlines Co.14747147470330559...140570234057PDXPortland, OROR41Oregon926NaN
\n", "

5 rows × 27 columns

\n", "
" ], "text/plain": [ " PASSENGERS FREIGHT MAIL DISTANCE UNIQUE_CARRIER AIRLINE_ID \\\n", "0 0.0 410.0 0.0 616.0 WN 19393.0 \n", "1 0.0 184.0 0.0 2592.0 WN 19393.0 \n", "2 0.0 87.0 0.0 2445.0 WN 19393.0 \n", "3 0.0 10.0 0.0 432.0 WN 19393.0 \n", "4 0.0 100.0 0.0 129.0 WN 19393.0 \n", "\n", " UNIQUE_CARRIER_NAME ORIGIN_AIRPORT_ID ORIGIN_AIRPORT_SEQ_ID \\\n", "0 Southwest Airlines Co. 13851 1385103 \n", "1 Southwest Airlines Co. 14307 1430705 \n", "2 Southwest Airlines Co. 14679 1467903 \n", "3 Southwest Airlines Co. 14730 1473003 \n", "4 Southwest Airlines Co. 14747 1474703 \n", "\n", " ORIGIN_CITY_MARKET_ID ... DEST_AIRPORT_SEQ_ID DEST_CITY_MARKET_ID DEST \\\n", "0 33851 ... 1069302 30693 BNA \n", "1 30721 ... 1289208 32575 LAX \n", "2 33570 ... 1025702 30257 ALB \n", "3 33044 ... 1299206 32600 LIT \n", "4 30559 ... 1405702 34057 PDX \n", "\n", " DEST_CITY_NAME DEST_STATE_ABR DEST_STATE_FIPS DEST_STATE_NM DEST_WAC \\\n", "0 Nashville, TN TN 47 Tennessee 54 \n", "1 Los Angeles, CA CA 6 California 91 \n", "2 Albany, NY NY 36 New York 22 \n", "3 Little Rock, AR AR 5 Arkansas 71 \n", "4 Portland, OR OR 41 Oregon 92 \n", "\n", " MONTH Unnamed: 26 \n", "0 6 NaN \n", "1 6 NaN \n", "2 6 NaN \n", "3 6 NaN \n", "4 6 NaN \n", "\n", "[5 rows x 27 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "RAW_DATA_CSV = './data/airtravel/raw_data.csv' # https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=258\n", "raw_data_df = pd.read_csv(RAW_DATA_CSV)\n", "raw_data_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A city market is a region that an airport supports. 
For example, New York City has many airports (and it’s sometimes cheaper to fly into and out of different airports), but all of its airports serve the same region / city market.\n", "\n", "Since we’re mostly concerned with where passengers end up going (and not which airport they choose), we will view city markets as the regions of interest.\n", "\n", "We will define a region as being well-connected if many people travel in and out of it.\n", "\n", "Let’s drop the columns we don’t need for finding well-connected regions, along with any flight paths that carried zero passengers (these are usually flights transporting packages)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PASSENGERSORIGIN_AIRPORT_IDORIGIN_AIRPORT_SEQ_IDORIGIN_CITY_MARKET_IDORIGINORIGIN_CITY_NAMEORIGIN_STATE_ABRORIGIN_STATE_NMDEST_AIRPORT_IDDEST_AIRPORT_SEQ_IDDEST_CITY_MARKET_IDDESTDEST_CITY_NAMEDEST_STATE_ABRDEST_STATE_NM
444471.012523125230632523JNUJuneau, AKAKAlaska11545115450131545ELVElfin Cove, AKAKAlaska
444481.012523125230632523JNUJuneau, AKAKAlaska11619116190231619EXIExcursion Inlet, AKAKAlaska
444491.012610126100132610KAEKake, AKAKAlaska10204102040130204AGNAngoon, AKAKAlaska
444501.011298112980630194DFWDallas/Fort Worth, TXTXTexas11292112920230325DENDenver, COCOColorado
444511.015991159910235991YAKYakutat, AKAKAlaska14828148280534828SITSitka, AKAKAlaska
\n", "
" ], "text/plain": [ " PASSENGERS ORIGIN_AIRPORT_ID ORIGIN_AIRPORT_SEQ_ID \\\n", "44447 1.0 12523 1252306 \n", "44448 1.0 12523 1252306 \n", "44449 1.0 12610 1261001 \n", "44450 1.0 11298 1129806 \n", "44451 1.0 15991 1599102 \n", "\n", " ORIGIN_CITY_MARKET_ID ORIGIN ORIGIN_CITY_NAME ORIGIN_STATE_ABR \\\n", "44447 32523 JNU Juneau, AK AK \n", "44448 32523 JNU Juneau, AK AK \n", "44449 32610 KAE Kake, AK AK \n", "44450 30194 DFW Dallas/Fort Worth, TX TX \n", "44451 35991 YAK Yakutat, AK AK \n", "\n", " ORIGIN_STATE_NM DEST_AIRPORT_ID DEST_AIRPORT_SEQ_ID \\\n", "44447 Alaska 11545 1154501 \n", "44448 Alaska 11619 1161902 \n", "44449 Alaska 10204 1020401 \n", "44450 Texas 11292 1129202 \n", "44451 Alaska 14828 1482805 \n", "\n", " DEST_CITY_MARKET_ID DEST DEST_CITY_NAME DEST_STATE_ABR \\\n", "44447 31545 ELV Elfin Cove, AK AK \n", "44448 31619 EXI Excursion Inlet, AK AK \n", "44449 30204 AGN Angoon, AK AK \n", "44450 30325 DEN Denver, CO CO \n", "44451 34828 SIT Sitka, AK AK \n", "\n", " DEST_STATE_NM \n", "44447 Alaska \n", "44448 Alaska \n", "44449 Alaska \n", "44450 Colorado \n", "44451 Alaska " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "RELEVANT_COLUMNS = [\n", " 'PASSENGERS',\n", " 'ORIGIN_AIRPORT_ID', 'ORIGIN_AIRPORT_SEQ_ID', 'ORIGIN_CITY_MARKET_ID', 'ORIGIN', 'ORIGIN_CITY_NAME', 'ORIGIN_STATE_ABR', 'ORIGIN_STATE_NM',\n", " 'DEST_AIRPORT_ID', 'DEST_AIRPORT_SEQ_ID', 'DEST_CITY_MARKET_ID', 'DEST', 'DEST_CITY_NAME', 'DEST_STATE_ABR', 'DEST_STATE_NM',\n", "]\n", "relevant_df = raw_data_df[RELEVANT_COLUMNS]\n", "relevant_df = relevant_df[relevant_df.PASSENGERS != 0.0]\n", "relevant_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We’ll want to have our data in an edge list format where the city markets are the nodes so that we can import this data into `metagraph`.\n", "\n", "We’ll use betweenness centrality to determine connectedness since it is a metric of how many shortest paths go through a 
node. For betweenness centrality to serve our goal, edges that carry more passengers need lower weights, so that the shortest paths are the ones following the busiest routes. More refined weightings might be considered in practice, but for simplicity we’ll use `1/number_of_passengers` as the edge weight in this tutorial.\n", "\n", "We’ll create an edge list with such weights using pandas." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ORIGIN_CITY_MARKET_IDDEST_CITY_MARKET_IDPASSENGERSINVERSE_PASSENGER_COUNT
030005303494.00.250000
1300053121410.00.100000
23000531517193.00.005181
330005357317.00.142857
430006300565.00.200000
\n", "
" ], "text/plain": [ " ORIGIN_CITY_MARKET_ID DEST_CITY_MARKET_ID PASSENGERS \\\n", "0 30005 30349 4.0 \n", "1 30005 31214 10.0 \n", "2 30005 31517 193.0 \n", "3 30005 35731 7.0 \n", "4 30006 30056 5.0 \n", "\n", " INVERSE_PASSENGER_COUNT \n", "0 0.250000 \n", "1 0.100000 \n", "2 0.005181 \n", "3 0.142857 \n", "4 0.200000 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "passenger_flow_df = relevant_df[['ORIGIN_CITY_MARKET_ID', 'DEST_CITY_MARKET_ID', 'PASSENGERS']]\n", "passenger_flow_df = passenger_flow_df.groupby(['ORIGIN_CITY_MARKET_ID', 'DEST_CITY_MARKET_ID']) \\\n", " .PASSENGERS.sum() \\\n", " .reset_index()\n", "passenger_flow_df['INVERSE_PASSENGER_COUNT'] = passenger_flow_df.PASSENGERS.map(lambda passenger_count: 1/passenger_count)\n", "assert len(passenger_flow_df[passenger_flow_df.INVERSE_PASSENGER_COUNT != passenger_flow_df.INVERSE_PASSENGER_COUNT]) == 0, \"Edge list has NaN weights.\"\n", "passenger_flow_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the data has city market IDs and don’t have names because an airport can serve regions containing multiple cities, it’d be useful to get a mapping from city market IDs to city names and airports so that we can contextualize our findings." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AIRPORTCITY_NAME
CITY_MARKET_ID
30005{05A}{Little Squaw, AK}
30006{06A}{Kizhuyak, AK}
30007{KLW}{Klawock, AK}
30009{HOM, 09A}{Homer, AK}
30010{1B1}{Hudson, NY}
\n", "
" ], "text/plain": [ " AIRPORT CITY_NAME\n", "CITY_MARKET_ID \n", "30005 {05A} {Little Squaw, AK}\n", "30006 {06A} {Kizhuyak, AK}\n", "30007 {KLW} {Klawock, AK}\n", "30009 {HOM, 09A} {Homer, AK}\n", "30010 {1B1} {Hudson, NY}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "origin_city_market_id_info_df = relevant_df[['ORIGIN_CITY_MARKET_ID', 'ORIGIN', 'ORIGIN_CITY_NAME']] \\\n", " .rename(columns={'ORIGIN_CITY_MARKET_ID': 'CITY_MARKET_ID',\n", " 'ORIGIN': 'AIRPORT',\n", " 'ORIGIN_CITY_NAME': 'CITY_NAME'})\n", "dest_city_market_id_info_df = relevant_df[['DEST_CITY_MARKET_ID', 'DEST', 'DEST_CITY_NAME']] \\\n", " .rename(columns={'DEST_CITY_MARKET_ID': 'CITY_MARKET_ID',\n", " 'DEST': 'AIRPORT',\n", " 'DEST_CITY_NAME': 'CITY_NAME'})\n", "city_market_id_info_df = pd.concat([origin_city_market_id_info_df, dest_city_market_id_info_df])\n", "city_market_id_info_df = city_market_id_info_df.groupby('CITY_MARKET_ID').agg({'AIRPORT': set, 'CITY_NAME': set})\n", "city_market_id_info_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Which region is travelled through the most?\n", "\n", "We’re going to determine which region is travelled through the most using betweenness centrality as it measures exactly that. There are a variety of algorithms to choose from, but we’ll stick to using solely betweenness centrality for this tutorial.\n", "\n", "We’ll first create a metagraph graph for the data." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "passenger_flow_edge_map = mg.wrappers.EdgeMap.PandasEdgeMap(passenger_flow_df,\n", " 'ORIGIN_CITY_MARKET_ID', \n", " 'DEST_CITY_MARKET_ID', \n", " 'INVERSE_PASSENGER_COUNT',\n", " is_directed=True)\n", "passenger_flow_graph = mg.algos.util.graph.build(passenger_flow_edge_map)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we use the inverse passenger count as the weights to ensure that the shortest paths are the paths that have the most passengers.\n", "\n", "Let’s calculate the betweenness centrality." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "betweenness_centrality = mg.algos.centrality.betweenness(passenger_flow_graph, normalize=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s look at the results and find the highest scores (which would give us the city market IDs that are most travelled through)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "number_of_best_scores = 15\n", "best_betweenness_centrality_node_vector = mg.algos.util.nodemap.sort(betweenness_centrality, ascending=False, limit=number_of_best_scores)\n", "best_betweenness_centrality_node_set = mg.algos.util.nodeset.from_vector(best_betweenness_centrality_node_vector)\n", "best_betweenness_centrality_node_to_score_map = mg.algos.util.nodemap.select(betweenness_centrality, best_betweenness_centrality_node_set)\n", "best_betweenness_centrality_node_to_score_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have a mapping between city market IDs and their centrality scores in `best_betweenness_centrality_node_to_score_map`, which is a `NumpyNodeMap`. 
Since `NumpyNodeMap` stores its mapping in a non-trivial fashion for performance reasons, inspecting its internals to view the mapping's values is not straightforward. Luckily, metagraph allows us to translate it to a Python dictionary, which is significantly easier to inspect." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{30070: 62402.0,\n", " 30113: 75327.0,\n", " 30154: 56833.0,\n", " 30194: 121807.0,\n", " 30299: 349232.0,\n", " 30325: 107586.0,\n", " 30397: 144922.0,\n", " 30466: 45699.0,\n", " 30559: 465677.0,\n", " 30977: 206250.0,\n", " 31517: 90409.0,\n", " 31703: 337885.0,\n", " 32457: 46094.0,\n", " 32467: 48068.0,\n", " 32575: 494817.0}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "best_betweenness_centrality_node_to_score_map = mg.translate(best_betweenness_centrality_node_to_score_map, mg.types.NodeMap.PythonNodeMapType)\n", "best_betweenness_centrality_node_to_score_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the city market IDs with the best scores, let’s find out which regions those city market IDs correspond to using the mapping from city market IDs to city names and airports we made earlier." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BETWEENNESS_CENTRALITY_SCOREAIRPORTCITY_NAME
CITY_MARKET_ID
32575494817.0{LAX, SMO, SNA, HHR, LGB, BUR, ONT, VNY}{Santa Ana, CA, Los Angeles, CA, Van Nuys, CA,...
30559465677.0{BFI, SEA, LKE, KEH}{Kenmore, WA, Seattle, WA}
30299349232.0{ANC, DQL, MRI}{Anchorage, AK}
31703337885.0{LGA, ISP, EWR, JRB, HPN, JRA, JFK, TSS, SWF}{Islip, NY, New York, NY, Newark, NJ, Newburgh...
30977206250.0{LOT, GYY, ORD, PWK, DPA, MDW}{Chicago/Romeoville, IL, Chicago, IL, Gary, IN}
30397144922.0{FTY, ATL, PDK, QMA}{Kennesaw, GA, Atlanta, GA}
30194121807.0{RBD, ADS, FWH, FTW, AFW, DAL, DFW}{Dallas/Fort Worth, TX, Dallas, TX, Fort Worth...
30325107586.0{APA, DEN}{Denver, CO}
3151790409.0{FBK, EIL, MTX, A01, FAI}{Fairbanks/Ft. Wainwright, AK, Fairbanks, AK}
3011375327.0{BET}{Bethel, AK}
3007062402.0{KDK, ADQ}{Kodiak, AK}
3015456833.0{ACK}{Nantucket, MA}
3246748068.0{FXE, FLL, OPF, TMB, MIA, MPB}{Miami, FL, Fort Lauderdale, FL}
3245746094.0{OAK, CCR, SFO, SJC}{San Jose, CA, San Francisco, CA, Oakland, CA,...
3046645699.0{AZA, AZ3, PHX, GYR, SCF}{Goodyear, AZ, Phoenix, AZ, Glendale, AZ}
\n", "
" ], "text/plain": [ " BETWEENNESS_CENTRALITY_SCORE \\\n", "CITY_MARKET_ID \n", "32575 494817.0 \n", "30559 465677.0 \n", "30299 349232.0 \n", "31703 337885.0 \n", "30977 206250.0 \n", "30397 144922.0 \n", "30194 121807.0 \n", "30325 107586.0 \n", "31517 90409.0 \n", "30113 75327.0 \n", "30070 62402.0 \n", "30154 56833.0 \n", "32467 48068.0 \n", "32457 46094.0 \n", "30466 45699.0 \n", "\n", " AIRPORT \\\n", "CITY_MARKET_ID \n", "32575 {LAX, SMO, SNA, HHR, LGB, BUR, ONT, VNY} \n", "30559 {BFI, SEA, LKE, KEH} \n", "30299 {ANC, DQL, MRI} \n", "31703 {LGA, ISP, EWR, JRB, HPN, JRA, JFK, TSS, SWF} \n", "30977 {LOT, GYY, ORD, PWK, DPA, MDW} \n", "30397 {FTY, ATL, PDK, QMA} \n", "30194 {RBD, ADS, FWH, FTW, AFW, DAL, DFW} \n", "30325 {APA, DEN} \n", "31517 {FBK, EIL, MTX, A01, FAI} \n", "30113 {BET} \n", "30070 {KDK, ADQ} \n", "30154 {ACK} \n", "32467 {FXE, FLL, OPF, TMB, MIA, MPB} \n", "32457 {OAK, CCR, SFO, SJC} \n", "30466 {AZA, AZ3, PHX, GYR, SCF} \n", "\n", " CITY_NAME \n", "CITY_MARKET_ID \n", "32575 {Santa Ana, CA, Los Angeles, CA, Van Nuys, CA,... \n", "30559 {Kenmore, WA, Seattle, WA} \n", "30299 {Anchorage, AK} \n", "31703 {Islip, NY, New York, NY, Newark, NJ, Newburgh... \n", "30977 {Chicago/Romeoville, IL, Chicago, IL, Gary, IN} \n", "30397 {Kennesaw, GA, Atlanta, GA} \n", "30194 {Dallas/Fort Worth, TX, Dallas, TX, Fort Worth... \n", "30325 {Denver, CO} \n", "31517 {Fairbanks/Ft. Wainwright, AK, Fairbanks, AK} \n", "30113 {Bethel, AK} \n", "30070 {Kodiak, AK} \n", "30154 {Nantucket, MA} \n", "32467 {Miami, FL, Fort Lauderdale, FL} \n", "32457 {San Jose, CA, San Francisco, CA, Oakland, CA,... 
\n", "30466 {Goodyear, AZ, Phoenix, AZ, Glendale, AZ} " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "best_betweenness_centrality_scores_df = pd.DataFrame(best_betweenness_centrality_node_to_score_map.items()).rename(columns={0:'CITY_MARKET_ID', 1:'BETWEENNESS_CENTRALITY_SCORE'}).set_index('CITY_MARKET_ID')\n", "best_betweenness_centrality_scores_df.join(city_market_id_info_df).sort_values('BETWEENNESS_CENTRALITY_SCORE', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is mostly what we'd expect: highly populated areas like Los Angeles are among the most travelled-through regions.\n", "\n", "However, it's surprising that Anchorage is more travelled through than a hub like Dallas!\n", "\n", "There’s a good explanation for why Anchorage is so heavily travelled through: because Alaska is so sparsely populated, a well-connected road infrastructure was never built. As a result, air travel is often the only way to get between cities in Alaska. More information can be found [here](https://en.wikipedia.org/wiki/List_of_airports_in_Alaska)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 4 }