
Maps & Data

[Slideshow maps: Methodology Map; Properties High-Low Cluster-Outlier; Land Use Buffer Analysis; OLS Residual Map 1]

Given the theoretical background, the next step was to find relevant data. The data used for this research were obtained from the Massachusetts State GIS online resources. The following data layers were used: Mass. Boundaries, Land Use, Roadways, and Rail Lines; Point Layers for Employers, Transit Stops, and Highway Exits; Mass. Real Verified Property Sales (2004-2012); and Mass. Census Tracts 2010. The maps above were created for each part of the analysis. The main question is how transportation features affect the demand for housing.

 

The property sales layer contains the assessment and sale price for all the properties sold over eight years. Sale prices thus act as an indicator of housing demand, which provides the basis for developing a regression model. First, to understand the spatial distribution of property sales throughout the Boston Metropolitan Area, a High-Low Cluster/Outlier test was run on the sale price data. This test assesses clustering based on attribute values (here, sale price) rather than on feature locations alone. Visualizing where high and low sale prices cluster shows where housing demand may be highest and lowest. The Cluster/Outlier map shows significant High-High clustering in western suburbs such as Weston, Newton, and Concord, and similarly along the coastline. In other words, the map shows where the most expensive and least expensive properties are concentrated. Low-High outliers, which are lower-priced sales surrounded by high-priced neighbors, appear only in the heart of Downtown Boston. This suggests that property prices in the central business district (CBD) are among the highest, which is consistent with the monocentric city model and the inverse relationship between distance from the CBD and land rents. What exactly causes this spatial clustering still needs to be explained.
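To make the High-High and Low-High labels concrete, here is a minimal sketch of a local Moran's I style calculation. It assumes the Cluster/Outlier test is of this family, which the text does not state explicitly. The function name, the k-nearest-neighbour weights, and the toy prices and coordinates are all hypothetical, and the sketch omits the permutation-based significance test that a GIS package would normally apply.

```python
import numpy as np

def local_moran_quadrants(values, coords, k=2):
    """Return local Moran's I and a quadrant label (HH, HL, LH, LL) per point."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = len(x)
    z = x - x.mean()              # deviation of each price from the mean
    m2 = (z ** 2).sum() / n       # variance term used to scale the statistic

    # Pairwise distances, with self-distances excluded from the neighbour search.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)

    # Row-standardized weights: each point's k nearest neighbours share weight 1.
    w = np.zeros((n, n))
    nearest = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    w[rows, nearest.ravel()] = 1.0 / k

    lag = w @ z                   # average deviation among the neighbours
    local_i = z * lag / m2
    labels = np.where(z >= 0,
                      np.where(lag >= 0, "HH", "HL"),
                      np.where(lag >= 0, "LH", "LL"))
    return local_i, labels

# Toy data (invented): three expensive sales, three cheap sales far away,
# and one cheap sale sitting next to the expensive group.
prices = [900, 950, 1000, 200, 250, 300, 150]
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (1, 1)]
local_i, labels = local_moran_quadrants(prices, points, k=2)
```

In this toy case the last sale is cheap but its neighbours are expensive, so it falls in the Low-High quadrant with a negative local statistic, which is the pattern the map reports for Downtown Boston.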

 

The cluster/outlier map shows hot and cold spots, which can now be spatially related to other features that affect housing demand. High-High clusters along the seashore, for example, are most likely the result of the high premium on oceanfront property. From Boston to MetroWest, however, the High-High clustering has no obvious explanation other than the general knowledge that MetroWest tends to be an affluent area. To begin the next step of the analysis, the multi-ring buffer was used to clip the land use features of interest (residential, commercial, industrial). The land use map shows the location of commercial centers (red shading) and the housing density in and around the city. A few visual takeaways are that commercial land use almost always locates along a major roadway or transit center, and that housing density decreases by ring 3, giving way to mainly low-density residential use outside the third ring. The monocentric city model continues to hold, in that the higher demand for land closer to the CBD results in high-density living.

 

In addition to land use, transportation and employment features were considered, especially in relation to the spatial distribution of property sale prices. Quantifying these features within each analysis ring helps home in on Boston's transportation and housing demands at varying distances from the city center. In particular, it offers a comparison between the current distribution of transport services and the potential demand.
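The ring-by-ring quantification described above can be sketched as follows. This is only an illustration: the ring radii, the city-center coordinates, and the feature list are hypothetical, and the sketch assumes projected coordinates so that straight-line distance is meaningful.

```python
import math
from collections import Counter

def ring_index(point, center, ring_edges):
    """Return the 1-based ring containing the point, or None beyond the outer ring."""
    dist = math.dist(point, center)
    for i, outer in enumerate(ring_edges, start=1):
        if dist <= outer:
            return i
    return None

def count_by_ring(features, center, ring_edges):
    """Count features of each kind inside each analysis ring."""
    counts = Counter()
    for kind, point in features:
        ring = ring_index(point, center, ring_edges)
        if ring is not None:
            counts[(ring, kind)] += 1
    return counts

# Hypothetical inputs: outer ring radii in km and a handful of point features.
center = (0.0, 0.0)
ring_edges = [5.0, 10.0, 20.0]
features = [
    ("transit_stop", (1.0, 1.0)),
    ("transit_stop", (2.0, 3.0)),
    ("employer", (6.0, 0.0)),
    ("highway_exit", (0.0, 15.0)),
    ("employer", (30.0, 0.0)),   # outside the outermost ring, so not counted
]
counts = count_by_ring(features, center, ring_edges)
```

Dividing such counts by ring area or ring population would give a rough density measure that can be compared against sale prices in the same ring.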

 

 

 

Regression Model Applications

 

Formal and informal regression analyses were carried out in this project to model transportation and housing demand drivers more precisely. The last map in the above slideshow, "OLS Residual Map 1," shows the results of an OLS regression model with the following specification:

 

Total Property Sales in Census Tract = β0 + β1 (Census Tract Population) + β2 (Average Assessment Price) + β3 (Area of Tract) + ε

 

Each of these independent variables arguably affects the total sales count. This initial regression was significant, and tract population had the largest positive effect on total sales. Areas shaded in red and blue represent regions where the observed values are above or below, respectively, the value the model predicts for that location. The key interpretation is that the current model does not fully capture the drivers of property sales in these shaded regions; other features or phenomena are at work.
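As a rough illustration of how such a model and its residual map are derived, the sketch below fits the same three-variable specification by ordinary least squares with NumPy. The tract values are invented for the example and are not the project's data; the intercept term and the variable names are assumptions.

```python
import numpy as np

# Hypothetical tract-level values, invented purely for illustration.
population = np.array([2.5, 4.1, 3.3, 5.2, 1.8, 6.0])      # thousands of residents
avg_assessment = np.array([3.1, 4.5, 2.8, 5.2, 3.9, 6.1])  # hundreds of thousands of dollars
area = np.array([1.2, 0.8, 2.5, 0.6, 4.0, 0.9])            # square km
total_sales = np.array([120.0, 210.0, 150.0, 260.0, 95.0, 330.0])

# Design matrix with an intercept column followed by the three regressors.
X = np.column_stack([np.ones_like(population), population, avg_assessment, area])

# Least-squares estimate of the coefficients.
coef, *_ = np.linalg.lstsq(X, total_sales, rcond=None)

# Residual = observed minus predicted; its sign decides the map shading.
fitted = X @ coef
residuals = total_sales - fitted
shading = np.where(residuals > 0, "above predicted", "below predicted")
```

Tracts with positive residuals sold more properties than the three regressors predict, and tracts with negative residuals sold fewer. Spatial clustering of same-signed residuals is the usual hint that a relevant variable is missing from the model.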

 

In terms of the regressors, it was easier to think of the "optimal" regression first, given perfect data, and then manipulate the available datasets to build an equation. Another conceptualization defines the dependent variable as the average property sale price for a specific location and controls for demand drivers such as transportation accessibility and proximity to jobs and leisure activities. An analysis can be conducted at the town level, or even by census tract, but for the purpose of this research, regression models will be fitted to each analysis ring. Ultimately, developing a significant and inclusive model requires an accurate assessment of transportation demand.

 

 

 
