Understanding spatial relationships in the locations of Tokyo convenience stores
When strolling around Tokyo you'll often pass numerous convenience stores, locally known as "konbini", which makes sense given there are over 56,000 convenience stores in Japan. Often there will be different chains of convenience store located very close to one another; it's not uncommon to see stores around the corner from each other or on opposite sides of the street. Given Tokyo's population density, it's understandable for competing businesses to be pushed close together. However, could there be any relationships between which chains of convenience stores are found near one another?
The goal will be to collect location data for several convenience store chains in a Tokyo neighbourhood, to understand whether there are any relationships between which chains are co-located with one another. This will require:
- The ability to query the locations of different convenience stores in Tokyo, so that we can retrieve each store's name and location
- Finding which convenience stores are co-located with one another within a pre-defined radius
- Using the data on co-located stores to derive association rules
- Plotting and visualising results for inspection
Let's begin!
For our use case we want to find convenience stores in Tokyo, so first we need to do a little homework on what the common store chains are. A quick Google search tells me that the main chains are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.
Now that we know what we're looking for, let's turn to OSMnx, a great Python package for querying data from OpenStreetMap (OSM). According to OSM's schema, we should be able to find the store name in either the 'brand:en' or 'brand' field.
We can start by importing some useful libraries for getting our data, and defining a function that returns a table of locations for a given convenience store chain within a specified area:
import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.

    Parameters:
        place (str): a location (i.e., 'Tokyo, Japan')
        tags (dict): key of an entity attribute in OSM (i.e., 'brand:en') and its value (i.e., the store chain name)
    Returns:
        results (DataFrame): table of latitude and longitude with entity value
    '''
    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0, 3], bounding.iloc[0, 1], bounding.iloc[0, 2], bounding.iloc[0, 0]
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    point = point.set_crs(crs=4326, allow_override=True)
    point = point[point.geometry.within(location)]
    # Making sure we are dealing with points
    point['geometry'] = point['geometry'].apply(lambda x: x.centroid if type(x) == Polygon else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']
    results = pd.DataFrame({'name': list(point['name']),
                            'longitude': list(point['geometry'].x),
                            'latitude': list(point['geometry'].y)}
                           )
    results['name'] = list(tags.values())[0]
    return results
convenience_stores = point_finder(place='Shinjuku, Tokyo',
                                  tags={"brand:en": " "})
We can pass each convenience store name in turn and combine the results into a single table of store name, longitude and latitude. For our use case we can focus on the Shinjuku neighbourhood in Tokyo, and see what the abundance of each convenience store looks like:
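As a sketch of that combining step (with made-up per-chain tables standing in for real `point_finder` output, e.g. `[point_finder('Shinjuku, Tokyo', {'brand:en': c}) for c in chains]`), it might look like:

```python
import pandas as pd

# Hypothetical per-chain tables; real ones would come from point_finder
frames = [
    pd.DataFrame({'name': ['FamilyMart'] * 3,
                  'longitude': [139.700, 139.705, 139.710],
                  'latitude': [35.690, 35.693, 35.695]}),
    pd.DataFrame({'name': ['7-Eleven'] * 2,
                  'longitude': [139.702, 139.708],
                  'latitude': [35.691, 35.694]}),
    pd.DataFrame({'name': ['Lawson'],
                  'longitude': [139.706],
                  'latitude': [35.692]}),
]

# Stack into a single table and count how many stores each chain has
convenience_stores = pd.concat(frames, ignore_index=True)
chain_counts = convenience_stores['name'].value_counts()
print(chain_counts)
```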
Clearly FamilyMart and 7-Eleven dominate in store frequency, but how does this look spatially? Plotting geospatial data is pretty straightforward with Kepler.gl, which includes a nice interface for creating visualisations that can be saved as HTML objects or rendered directly in Jupyter notebooks:
Now that we have our data, the next step is to find the nearest neighbours of each convenience store. To do this, we will use Scikit-Learn's 'BallTree' class to find the names of the nearest convenience stores within a two-minute walking radius. We're not interested in how many stores count as nearest neighbours, so we will simply look at which convenience store chains fall within the defined radius.
from collections import Counter
from sklearn.neighbors import BallTree

# Convert locations to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)
# Create a BallTree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')
# Find nearest neighbours within a 2-minute walking radius
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)
# Replace the neighbour indices with store names
df = pd.DataFrame({'indices': list(is_within)})
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]
# Create temporary index column
convenience_stores = convenience_stores.reset_index()
# Set temporary index column as index
convenience_stores = convenience_stores.set_index('index')
# Create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()
# Replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))
# Append back to the original df
convenience_stores['neighbours'] = df['indices']
# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]
# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])
# Count each store's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]
# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]
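The radius logic can be sanity-checked on made-up coordinates (not real store locations) with a minimal BallTree query:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Three invented stores: A and B roughly 100 m apart, C a few kilometres away
coords = np.array([
    [35.6900, 139.7000],   # store A
    [35.6909, 139.7000],   # store B, about 100 m north of A
    [35.7100, 139.7500],   # store C, far away
])
tree = BallTree(np.radians(coords), metric='haversine')

# 168 m (about a 2-minute walk) expressed as radians on a ~6,371 km Earth radius
ind = tree.query_radius(np.radians(coords), r=168 / 6371000)

# Store A's neighbourhood contains itself and B; store C only finds itself
print([list(i) for i in ind])
```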
If we wanted to improve the accuracy of our work, we could swap the haversine distance for something more accurate (e.g., walking times calculated using networkx), but we'll keep things simple.
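As a sketch of that alternative on an invented toy street graph (in practice osmnx can build a real walking network from OSM; the node names and times below are made up), walking-time reachability with networkx could look like:

```python
import networkx as nx

# Toy street graph; edge weights are invented walking times in seconds
G = nx.Graph()
G.add_edge('store_a', 'corner', walk_time=45)
G.add_edge('corner', 'store_b', walk_time=60)
G.add_edge('store_a', 'store_c', walk_time=200)

# Everything reachable from store_a within a 2-minute (120 s) walk
reachable = nx.single_source_dijkstra_path_length(
    G, 'store_a', cutoff=120, weight='walk_time')
print(reachable)
```

Here store_b is reachable in 105 s via the corner, while store_c (200 s away) falls outside the cutoff.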
This gives us a DataFrame where each row corresponds to a location, with a binary count of which convenience store chains are nearby:
We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules with the Apriori algorithm. We set a minimum support of 5%, so that we only learn rules related to frequent occurrences in our dataset (i.e., commonly co-located convenience store chains). We use the metric 'lift' when deriving rules; lift is the ratio of the proportion of locations that contain both the antecedent and consequent relative to the expected support under the assumption of independence.
from mlxtend.frequent_patterns import association_rules, apriori

# Compute frequent itemsets with apriori
frequent_set = apriori(output_df, min_support=0.05, use_colnames=True)
# Create rules
rules = association_rules(frequent_set, metric='lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)
This gives us the following results table:
We will now interpret these association rules to draw some high-level takeaways. To interpret this table, it's best to read more about association rules, using these links:
Okay, back to the table.
Support tells us how often different convenience store chains are actually found together. We can therefore say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift above 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. Meanwhile, the association between 7-Eleven and Lawson shows a higher lift but with a lower confidence.
Daily Yamazaki has a low support near our cutoff and shows a weak relationship with the location of FamilyMart, given by a lift slightly above 1.
Other rules refer to combinations of convenience stores. For example, when a 7-Eleven and FamilyMart are already co-located, a high lift value of 1.42 suggests a strong association with Lawson.
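To make these metrics concrete, lift and confidence can be computed by hand from the supports; the numbers below are illustrative rather than taken from the actual results table:

```python
# Illustrative supports (fractions of locations), not the real table values
support_a = 0.40    # locations with a 7-Eleven nearby
support_b = 0.45    # locations with a FamilyMart nearby
support_ab = 0.31   # locations with both nearby

# Lift: observed co-occurrence relative to what independence would predict
lift = support_ab / (support_a * support_b)
# Confidence: probability of the consequent given the antecedent
confidence = support_ab / support_a
print(lift, confidence)
```

A lift above 1 here means the two chains co-occur more often than independent placement would predict.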
If we had simply stopped at finding the nearest neighbours of each store location, we would not have been able to determine anything about the relationships between these stores.
One example of why geospatial association rules can be insightful for businesses is in identifying new store locations. If a convenience store chain is opening a new location, association rules can help identify which stores are likely to co-occur.
The value in this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, as demonstrated by the association rules, it would make sense for both of these chains to pay closer attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.
In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was achieved by extracting data from OpenStreetMap, finding nearest-neighbour convenience store chains, visualising the data on maps, and creating association rules using the Apriori algorithm.
Thanks for reading!