Cutting Buildings
When building high quality POI (points of interest) maps, one thing that a consumer would want is high quality polygons for the places. Businesses keep moving, listings keep changing  buildings are relatively static in this otherwise dynamic world.
When you think about streams of data coming, you'd be wise to account for noise. Specially when it comes to POI data. Let's start by looking at a couple of malls:
If you are wondering what wkt
means. Wikipedia says:
Wellknown text (WKT) is a text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems.
It's essentially a way of representing 2D geometries. In our example, we only need to bother ourselves with POLYGON
. It's a list of (x, y)
coordinates which correspond to (longitude, latitude)
pairs. A convention that a lot geospatial libraries use is longitudefirst. To visualize the wkt
you can visualize it at this website.
As I have said before, these are the relatively static components, buildings rarely change. The things that do change often on the other hand are POI data, here's a made up example:
Let's take a look at how this would look like, with more (near reallife) number of places inside a mall:
The task at hand is to divide the building to accommodate the POI.
Voronoi diagram
One strategy that seems intuitive to start with, is that each point should own the area closest to it. This is where Voronoi comes in. It does exactly this.
In mathematics, a Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other.
There are a lot of applications for Voronoi. It's awesome because:
 It's relatively fast O(nlog n) if we use Fortune's Algorithm.
 Not a lot of edge cases to handle.
 Fair splits for points.
Let's take a look at how the result looks like, if we apply Voronoi on our synthesized POI in buildings.
Does it mirror the real world?
While Voronoi cuts up the building in a reasonable way, it has a major drawback. It does not reflect how the building plans are laid out in the real world. This is because, it does not take the shape of the building into consideration. One of my colleague described the cut buildings as being very Picassoesque.
So, we need to think of this problem with some real world consideration. Data that we get from suppliers is besteffort. It is still noisy. So sometimes it make sense for us to start fresh.
Blank Slate
Let's start with a blank slate, in our case, blank buildings. Ignoring the points for a seconds, let's think about the following question: If you were to accommodate N stores inside the buildings. Assuming all the stores are the same (and no elevation), how would you divide the building?
I think a good place to start is splitting the building into block shapes, and minimizing the unused area. This is definitely easier said than done for a general building. We'll start and take a look at how involved the process is.
Algorithm Outline

We first need to find how the building is oriented, for this we use PCAPCA

We then try and cut the building into NN pieces, where we try to increase the mean area of each lot, under the constraint tha cuts can be made only parallel to the PCAaxes.
PCA
PCA finds a set of dimensions such that they are orthogonal and ranked according to the variance of data along them. This means that, more important principle axis occurs first (more important = more variance/more spread out data).
So, if we find the PCA for the building outline, we get a set of axes that we can then make blocks on. Let's look at the axes that we find for our buildings.
After the transformation, you can use dynamic programming to divide the polygon into contiguous blocks to allocate to the POI. You can see visualize it as the grids between the points.
Piecewise linear regression
One other interesting experiment we've tried is to use spline or piecewise linear regression on the points to see if placing the centroids of the POI on the line segments would lead to interesting outcomes. The extension to this experiment that I'd like to try is to buffer the boundaries of polygons inwards to see if that leads to better fits. It makes sense as we'd like to favor segments be inside the polygons rather than outside as you will see below.
Note: The blue line is the regular linear regression line.
You seldom operate with perfect data, it's critical to salvage the most of what you have. Hope you had fun reading this.I will be uploading the code to my github soon. Let me know if you think of any other techniques to tackle this problem.
A lot of ideas here are thanks to my colleagues at SafeGraph.