More Statistics on Sacramento Housing

In a previous post, I showed a pretty simple regression analysis of housing prices and house size for my zip code, using the zip code as an easy way to include location in the output. This post uses PostGIS and geographic data from the City of Sacramento to run a regression analysis (using the R statistical programming language) on the city's designated neighborhoods. The raw real estate data comes from the Sacramento Bee. After describing the model, I'll apply it to the last few months of home sales (not used in developing the model) and see how well it predicts the results.
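The post doesn't show the PostGIS step. At its core, a spatial join like this is a point-in-polygon test: each sale's location is checked against the city's neighborhood polygons. In PostGIS this would typically be done with ST_Contains or ST_Within. As a rough illustration only, here is a pure-Python ray-casting version. The two square "neighborhoods" and their coordinates are hypothetical. A real join would also use projected coordinates and a spatial index.

```python
# Hypothetical sketch of the point-in-polygon test behind a spatial join.
# Polygons are lists of (x, y) vertices; names and coordinates are made up.

def point_in_polygon(x, y, polygon):
    """Ray casting: cast a ray from (x, y) toward +x and count edge crossings.
    An odd count means the point is inside. Points exactly on an edge may go either way."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(x, y, neighborhoods):
    """Return the first neighborhood whose polygon contains the point, else '<none>'."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(x, y, polygon):
            return name
    return "<none>"


# Two made-up unit squares side by side
neighborhoods = {
    "Square A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "Square B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(assign_neighborhood(0.5, 0.5, neighborhoods))
```

Sales that fall outside every polygon get the label "<none>" here. That is presumably similar in spirit to the `<none>` row in the table below.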

The model is based on the number of bedrooms, size in square feet, and the neighborhood the house is in. Adding the number of bathrooms did not improve the model's explanatory power.
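The R call behind the table isn't shown. Presumably it is something like lm() with the neighborhood as a factor. By default, R codes a factor with treatment contrasts: one 0/1 indicator column per neighborhood, except for a baseline level whose effect is absorbed into the intercept. That is why each neighborhood coefficient in the table reads as a difference from that baseline.

The sketch below is a hypothetical pure-Python stand-in for such a fit, using ordinary least squares via the normal equations. The field names are assumptions. R's lm() itself uses a more numerically stable QR decomposition.

```python
# Hypothetical sketch: treatment-coded design matrix plus ordinary least squares.
# Field names ("bed", "size", "neighborhood") are assumptions, not the post's data.

def design_matrix(rows, baseline):
    """Columns: intercept, bed, size, then one 0/1 indicator per non-baseline neighborhood.
    The baseline must be a neighborhood present in the data, or the matrix is singular."""
    levels = sorted({r["neighborhood"] for r in rows} - {baseline})
    X = []
    for r in rows:
        indicators = [1.0 if r["neighborhood"] == lvl else 0.0 for lvl in levels]
        X.append([1.0, float(r["bed"]), float(r["size"])] + indicators)
    return X, ["(Intercept)", "bed", "size"] + levels


def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)
```

Typical usage would be `X, names = design_matrix(sales, baseline=...)` followed by `beta = ols(X, prices)`. Here `sales` and `prices` are hypothetical inputs.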

Term Coefficient Std. Error t value Pr(>|t|) Significance
(Intercept) 11,775 50,759 0.232 0.82
bed −9,222 2,365 −3.899 0.00 ***
size 87 4 20.476 0.00 ***
Alhambra Triangle 20,481 71,523 0.286 0.77
Alkali Flat 98,118 87,531 1.121 0.26
Avondale 22,051 52,208 0.422 0.67
Ben Ali 33,502 58,389 0.574 0.57
Boulevard Park 200,821 71,501 2.809 0.01 **
Brentwood 31,761 53,817 0.59 0.56
Campus Commons 156,432 53,600 2.918 0.00 **
Cannon Industrial Park −39,781 71,512 −0.556 0.58
Carleton Tract 54,935 61,916 0.887 0.38
Central Oak Park −4,059 51,335 −0.079 0.94
College/Glen 63,569 51,705 1.229 0.22
Colonial Heights 34,289 53,286 0.643 0.52
Colonial Manor 21,227 52,915 0.401 0.69
Colonial Village 16,931 51,730 0.327 0.74
Creekside 22,176 51,819 0.428 0.67
Curtis Park 142,551 53,202 2.679 0.01 **
Del Paso Heights −5,721 52,253 −0.109 0.91
Downtown 123,107 58,433 2.107 0.04 *
East Del Paso Heights −18,106 51,520 −0.351 0.73
East Sacramento 271,534 51,255 5.298 0.00 ***
Elmhurst 141,890 55,407 2.561 0.01 *
Fairgrounds 106,751 87,565 1.219 0.22
Florin Fruitridge Industrial Park 1,188,225 87,657 13.555 0.00 ***
Freeport Manor 26,348 54,612 0.482 0.63
Fruitridge Manor 6,308 51,589 0.122 0.90
Gardenland 4,246 54,042 0.079 0.94
Gateway West 61,528 51,260 1.2 0.23
Glen Elder 22,371 53,281 0.42 0.67
Glenwood Meadows 12,106 52,205 0.232 0.82
Golf Course Terrace 6,828 51,980 0.131 0.90
Greenhaven 119,278 52,314 2.28 0.02 *
Hagginwood 3,271 51,434 0.064 0.95
Heritage Park 46,186 52,024 0.888 0.37
Hollywood Park 110,589 53,821 2.055 0.04 *
Johnson Heights −16,810 87,547 −0.192 0.85
Land Park 246,520 52,891 4.661 0.00 ***
Lawrence Park −7,709 58,364 −0.132 0.89
Little Pocket 84,943 62,215 1.365 0.17
Mangan Park 67,665 61,896 1.093 0.27
Mansion Flats 132,978 57,398 2.317 0.02 *
Marshall School 254,363 57,398 4.432 0.00 ***
Meadowview 11,394 50,780 0.224 0.82
Med Center 116,584 55,934 2.084 0.04 *
Metro Center 67,041 65,325 1.026 0.30
Midtown / Winn Park /Capital Avenue 225,673 62,032 3.638 0.00 ***
Natomas Creek 65,283 51,367 1.271 0.20
Natomas Crossing 45,258 52,269 0.866 0.39
Natomas Park 66,546 51,387 1.295 0.20
New Era Park 175,602 62,098 2.828 0.00 **
Newton Booth 232,174 65,360 3.552 0.00 ***
<none> 60,292 55,908 1.078 0.28
Noralto 8,362 53,284 0.157 0.88
North City Farms 35,728 53,016 0.674 0.50
Northgate 34,770 52,611 0.661 0.51
North Oak Park 26,439 52,270 0.506 0.61
Northpointe 6,774 54,379 0.125 0.90
Oak Knoll 47,911 59,812 0.801 0.42
Old North Sacramento −23,970 53,159 −0.451 0.65
Old Sacramento 163,225 87,657 1.862 0.06 .
Parker Homes 13,734 58,360 0.235 0.81
Parkway 3,678 51,142 0.072 0.94
Pocket 106,144 51,254 2.071 0.04 *
Raley Industrial Park 16,773 57,312 0.293 0.77
Regency Park 60,649 51,437 1.179 0.24
Richardson Village 21,114 55,362 0.381 0.70
River Gardens 30,205 56,515 0.534 0.59
River Park 197,027 52,929 3.722 0.00 ***
Robla 8,525 51,635 0.165 0.87
RP Sports Complex 20,782 54,623 0.38 0.70
Sierra Oaks 215,607 56,732 3.8 0.00 ***
South City Farms 21,527 55,405 0.389 0.70
Southeast Village 5,970 52,907 0.113 0.91
South Hagginwood −8,753 51,696 −0.169 0.87
South Land Park 153,143 51,730 2.96 0.00 **
South Natomas 18,966 50,838 0.373 0.71
South Oak Park −7,677 51,940 −0.148 0.88
Southside Park 61,911 87,577 0.707 0.48
Strawberry Manor 10,889 52,524 0.207 0.84
Sundance Lake 54,388 52,921 1.028 0.30
Swanston Estates 49,717 53,604 0.927 0.35
Tahoe Park 70,237 51,977 1.351 0.18
Tahoe Park East 12,716 71,477 0.178 0.86
Tahoe Park South 82,362 54,625 1.508 0.13
Tallac Village 10,730 53,431 0.201 0.84
Upper Land Park 270,586 71,548 3.782 0.00 ***
Valley Hi / North Laguna 21,471 50,755 0.423 0.67
Village 12 70,025 52,229 1.341 0.18
Village 14 −92,908 71,715 −1.296 0.20
Village 7 70,633 51,926 1.36 0.17
West Del Paso Heights −344 52,897 −0.006 0.99
Westlake 55,083 51,816 1.063 0.29
West Tahoe Park 83,351 57,350 1.453 0.15
Willowcreek 54,207 51,955 1.043 0.30
Wills Acres 10,654 55,358 0.192 0.85
Woodbine −19,783 53,807 −0.368 0.71
Woodlake 71,336 61,928 1.152 0.25
Youngs Heights −21,183 61,896 −0.342 0.73
Z’berg Park 59,836 57,372 1.043 0.30

Residual standard error: 71470 on 2693 degrees of freedom

Multiple R-squared: 0.5953, Adjusted R-squared: 0.5803

F-statistic: 39.61 on 100 and 2693 DF, p-value: < 2.2e-16

So what do all the numbers mean?

In this data, size and location are the two most important factors determining house price. Not every location has a statistically significant effect on the sales price. In fact, only 21 of the 99 locations are statistically significant, as marked by asterisks; the more asterisks, the stronger the significance. The analysis relies on the city's definition of neighborhoods, which vary greatly in size and number of houses. As a result, some of the statistically significant neighborhoods rest on a very small number of sales (fewer than 10), so their coefficients deserve caution. But some of the best-known neighborhoods, such as East Sacramento, Land Park, and the Pocket, both raise the price of a home and have a substantial number of sales.
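The asterisks follow R's default significance codes: *** for p below 0.001, ** below 0.01, * below 0.05, and . below 0.1. The Pr(>|t|) column above is rounded to two decimals, and that rounding hides some detail. Boulevard Park (t = 2.809) and Elmhurst (t = 2.561) both show 0.01, yet one gets ** and the other *. Their unrounded p-values fall just below and just above 0.01.

The sketch below recovers the star codes from the t values. It uses a normal approximation to the t distribution rather than R's exact t-based calculation. That is an assumption, but with 2,693 residual degrees of freedom the difference is tiny.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a t statistic via the normal approximation
    (close to R's exact t-based value when degrees of freedom are large)."""
    return math.erfc(abs(t) / math.sqrt(2))


def signif_stars(p):
    """R-style significance codes. Treating each cutoff as a strict '<'
    is an assumption about how values exactly at a cutoff are handled."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""


# t values taken from the regression table
for name, t in [("Boulevard Park", 2.809), ("Elmhurst", 2.561), ("Old Sacramento", 1.862)]:
    p = two_sided_p(t)
    print(f"{name}: p = {p:.4f} {signif_stars(p)}")
```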

For every additional square foot of space, the price goes up by $87; 87 is the coefficient by which the house's size is multiplied. So for a 1,400 square foot home, size adds $121,800 to the price. The standard error measures the uncertainty in that coefficient, and adding or subtracting it gives a rough range that the model supports. With a standard error of ±4, the size contribution for that house could vary between $116,200 and $127,400. To predict the value of another home with this model, multiply the number of bedrooms and the size by their respective coefficients, add the coefficient for the home's neighborhood, and add the intercept. The result is the mean predicted price.
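Here is the size arithmetic for the 1,400 square foot example, using the size coefficient and its standard error from the table. A band of ±1 standard error corresponds to roughly a 68% confidence interval for the coefficient. For roughly 95%, you would use about ±1.96 standard errors instead.

```python
size_coef = 87   # dollars per square foot, from the regression table
size_se = 4      # standard error of the size coefficient
house_size = 1400

contribution = size_coef * house_size
low = (size_coef - size_se) * house_size   # coefficient minus one standard error
high = (size_coef + size_se) * house_size  # coefficient plus one standard error
print(contribution, low, high)  # 121800 116200 127400
```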

One interesting part of the output is the coefficient for the number of bedrooms. Taken at face value, a coefficient of −9,222 suggests that a home's price could be raised simply by knocking down walls to reduce the number of bedrooms. This is clearly not the case. The regression produced a negative coefficient because the number of bedrooms is closely correlated with size, a situation called multicollinearity. Holding size constant, an extra bedroom just means the same square footage is split into smaller rooms. While multicollinearity makes it hard to understand how each component contributes to the result, it does not necessarily hurt an estimate or forecast based on the model.
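The post doesn't report how strongly bedrooms and size are correlated. One common way to gauge multicollinearity is the variance inflation factor (VIF). For just two predictors, the VIF is 1 / (1 − r²), where r is their correlation. A common rule of thumb treats values above 5 or 10 as a warning sign.

The bedroom and size numbers below are made up purely for illustration. In the real model, the VIF for bedrooms would also account for size and the neighborhood indicators.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """Variance inflation factor when the model has only these two predictors."""
    r = pearson(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical houses: bedrooms rise with square footage
bedrooms = [2, 3, 3, 4, 4, 5]
sizes = [1000, 1300, 1450, 1700, 1900, 2300]
print(round(pearson(bedrooms, sizes), 3), round(vif_two_predictors(bedrooms, sizes), 1))
```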

Overall, the model does a fairly good job. The adjusted R-squared of 0.5803 means that the model explains about 58% of the variation in house prices. The remaining 42% comes from factors the model doesn't include, such as lot size or quality.
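The summary statistics in the regression output are internally consistent, and they can be checked by hand. The residual degrees of freedom imply 2,693 + 100 + 1 = 2,794 sales. The 100 predictors are bedrooms, size, and 98 neighborhood indicators. Adjusted R-squared penalizes plain R-squared for that number of predictors.

```python
r2 = 0.5953        # multiple R-squared from the output
p = 100            # predictors: bed, size, and 98 neighborhood indicators
df_resid = 2693    # residual degrees of freedom
n = df_resid + p + 1  # number of sales: 2,794

adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
f_stat = (r2 / p) / ((1 - r2) / df_resid)
print(round(adj_r2, 4), round(f_stat, 2))  # 0.5803 39.61
```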

To demonstrate the model, I've picked a house more or less at random from those for sale on Yahoo's real estate site. The house is 1,482 square feet, has 3 bedrooms, and is in the East Sacramento neighborhood.

To translate these statistics into an estimate, use the formula:

intercept + bedrooms * bed coefficient + size * size coefficient + neighborhood coefficient

Which is:

11,775 + 3 * −9,222 + 1482 * 87 + 271,534 = 384,577
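The same point estimate can be written as a small function. Only the coefficients used in this example are included. For the baseline neighborhood, which has no row in the table, the neighborhood coefficient would be 0.

```python
INTERCEPT = 11_775
BED = -9_222            # dollars per bedroom, holding size fixed
SIZE = 87               # dollars per square foot
EAST_SACRAMENTO = 271_534  # neighborhood coefficient from the table


def predict(bedrooms, square_feet, neighborhood_coef):
    """Mean predicted price from the fitted linear model."""
    return INTERCEPT + bedrooms * BED + square_feet * SIZE + neighborhood_coef


print(predict(3, 1482, EAST_SACRAMENTO))  # 384577
```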

An important point in understanding regression output is that, although it is often expressed as a rather precise result (in this case $384,577), a better way to interpret it is to use the standard errors of the coefficients to produce a range. The high end of the range is:

intercept + bedrooms * (bed coefficient + bed standard error) + size * (size coefficient + size standard error) + (neighborhood coefficient + neighborhood standard error)

Or in numbers:

11,775 + 3 * (−9,222 + 2,365) + 1482 * (87 + 4) + (271,534 + 51,255) = 448,855

The low end of the range is calculated by subtracting each standard error from its coefficient:

11,775 + 3 * (−9,222 − 2,365) + 1482 * (87 − 4) + (271,534 − 51,255) = 320,299
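The high and low calculations can be folded into one function that shifts every coefficient by the same number of standard errors. As in the post, the intercept's own standard error is left out.

This band is a heuristic, not a formal interval. It ignores the correlation between coefficients and the residual variation of individual sales. In R, `predict(fit, newdata, interval = "prediction")` would give a proper prediction interval, where `fit` and `newdata` are placeholders.

```python
INTERCEPT = 11_775
# (coefficient, standard error) pairs from the regression table
BED = (-9_222, 2_365)
SIZE = (87, 4)
EAST_SACRAMENTO = (271_534, 51_255)


def estimate(bedrooms, square_feet, neighborhood, shift):
    """Price with every coefficient moved by `shift` standard errors.
    shift=0 gives the point estimate; +1 and -1 give the high and low ends.
    The intercept's standard error is not applied."""
    bed_coef, bed_se = BED
    size_coef, size_se = SIZE
    nbhd_coef, nbhd_se = neighborhood
    return (INTERCEPT
            + bedrooms * (bed_coef + shift * bed_se)
            + square_feet * (size_coef + shift * size_se)
            + (nbhd_coef + shift * nbhd_se))


for shift in (-1, 0, 1):
    print(shift, estimate(3, 1482, EAST_SACRAMENTO, shift))
# -1 320299
# 0 384577
# 1 448855
```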

So the model's range runs from about $320k to $449k, with a most likely value of about $385k. Zillow estimates the property at $331,800 and reports an asking price of $357,950. For this house, both fall within the range, if on the low end. I think Zillow's Zestimate is a bit misleading, since it gives only a single number and not a range. In another post I will apply the model to July, August, and September data (the model data was from June 10 to June 11).
