How to use R, QGIS, GRASS GIS, and PostGIS to identify neighborhoods and build polygon shapefiles
While reading The Image of the City, I realized that neighborhoods are really just streets. If addresses share a street name and sit in roughly the same area, then for most purposes that is accurate enough to seed a Voronoi diagram. I will show you how I’ve done this.
First, you will need lots of addresses. Luckily there is an open and free database, OpenAddresses, which has half a billion of them. That’s not all of them, but it is a good starting point.
We will then use R to do some data processing. You can use R with the open source version of RStudio. You don’t have to do it this way but it is a nice little development environment which has some convenient features.
You’ll start out with a file that looks like the grey dots in this picture:
(Tongyeong, South Gyeongsang Province)
Normalize data using R
In this step we’ll condense the grey dots into the red dots of the previous picture.
Note that the OpenAddresses data is not formatted the same way everywhere, so for each country you’ll need to adjust your “.R” script to pick the right city/district/region column.
This is the code that I used to process the South Korea data. I was going to just paste the code directly here, but the font on this website looks horrible, so I will use GitHub.
After you run the code you’ll end with a file that looks like this:
So they are still just points, but the magic of running aggregate() on latitude and longitude is that each point is now the mean location of all addresses along that road. That should put each neighborhood point near the center of all the interesting things that are happening within that neighborhood.
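To make the aggregate() step concrete, here is a minimal sketch in base R. It is not the script linked above: the column names (street, district, lon, lat) and the sample rows are made up for illustration, and real OpenAddresses files will need their own column mapping. Grouping by district as well as street keeps two roads that share a name in different places from being averaged together.

```r
# Made-up sample data; column names and values are assumptions for illustration.
addresses <- data.frame(
  street   = c("Road A", "Road A", "Road A", "Road B", "Road B", "Road A"),
  district = c("District 1", "District 1", "District 1",
               "District 1", "District 1", "District 2"),
  lon      = c(128.420, 128.424, 128.428, 128.430, 128.434, 129.000),
  lat      = c(34.840, 34.842, 34.844, 34.850, 34.854, 35.000)
)

# One output row per (street, district) pair.
# lon and lat become the mean of every address in that group.
street_points <- aggregate(cbind(lon, lat) ~ street + district,
                           data = addresses, FUN = mean)

print(street_points)
```

The result has three rows: “Road A” and “Road B” in District 1, plus a separate “Road A” in District 2 that keeps its own point.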
We then use the Voronoi function in QGIS or GRASS GIS to turn all those points into polygons. Both are free and available for all major platforms. After running the algorithm you should end up with something that looks like this:
To make it even more accurate, clip the generated Voronoi polygons with a shapefile of the administrative boundary for the country or region that you are looking at. Then you’ll end up with something like this:
For United States data the process had to change a little, because the data was in a different format and there was more of it. I now process the individual files first and then group them together. Originally I was processing and grouping them within the same script.
After running the first script, move all the files that start with ‘p.’ into a folder named ‘pfiles’ and run the second script. I still haven’t been able to create the Voronoi diagram for all the points in the US, though. There are so many that I gave up after 12 hours of running the algorithm on my laptop. I’ll get around to it one of these days.
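The two-pass idea can be sketched in base R as follows. This is an illustration and not the actual pair of scripts: the function names process_file and combine_files, the column names (street, city, lon, lat) and the sample rows are all assumptions. One design choice here is also an assumption: the first pass stores sums and counts instead of means, so that a street split across several files still gets an exact mean in the second pass. For brevity the sketch writes the ‘p.’ files straight into the pfiles folder instead of moving them by hand.

```r
# Pass 1 (hypothetical helper): reduce one raw file to per-street sums and counts.
process_file <- function(path, out_dir) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  raw$n <- 1
  partial <- aggregate(cbind(lon, lat, n) ~ street + city, data = raw, FUN = sum)
  write.csv(partial, file.path(out_dir, paste0("p.", basename(path))),
            row.names = FALSE)
}

# Pass 2 (hypothetical helper): merge every 'p.' file and turn sums into means.
combine_files <- function(pfiles_dir) {
  paths <- list.files(pfiles_dir, pattern = "^p\\.", full.names = TRUE)
  parts <- do.call(rbind, lapply(paths, read.csv, stringsAsFactors = FALSE))
  totals <- aggregate(cbind(lon, lat, n) ~ street + city, data = parts, FUN = sum)
  data.frame(street = totals$street, city = totals$city,
             lon = totals$lon / totals$n, lat = totals$lat / totals$n)
}

# Demo with two tiny made-up files in a temporary working directory.
work <- tempfile("neighborhoods")
raw_dir <- file.path(work, "raw")
pfiles_dir <- file.path(work, "pfiles")
dir.create(raw_dir, recursive = TRUE)
dir.create(pfiles_dir, recursive = TRUE)

write.csv(data.frame(street = c("Main St", "Main St"), city = "Springfield",
                     lon = c(-90.0, -90.2), lat = c(40.0, 40.2)),
          file.path(raw_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(street = "Main St", city = "Springfield",
                     lon = -90.4, lat = 40.4),
          file.path(raw_dir, "b.csv"), row.names = FALSE)

for (f in list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)) {
  process_file(f, pfiles_dir)
}
result <- combine_files(pfiles_dir)
print(result)
```

Because each raw file is reduced on its own, only the much smaller ‘p.’ files have to be held in memory together during the second pass.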
You can also do the same things within Oracle Spatial and Graph, ArcGIS, and PostGIS. It’s fundamentally similar and you should have no problems translating the above to those programs.