Become a Data Scientist:

Interactive Data Exploration

Scroll down to start your journey.

This is Dallas, TX

... and these are the houses that sold in late August, 2022.

This is one of the more expensive homes that were sold in Dallas.

This is one of the cheaper homes that were sold in Fort Worth.

Fewer homes were sold in the final week of August than in the third week.

This chart shows houses sold by week. Click on any of the bars to see the specific houses sold.

[Interactive bar chart: houses sold by date range (8/12/22, 8/13/22-8/14/22, 8/15/22-8/19/22, 8/20/22-8/21/22, 8/22/22-8/26/22, 8/27/22-8/28/22, 8/29/22)]

Average Sales Price was $484,881

You may be wondering how you can better understand the Dallas housing market. You might want to know what the normal price for a house would be in this area. If we took the houses that were sold at the end of August, and sorted them from low to high, the middle value (the median) is what we want.

To do this, we are going to use a programming language called Python. Python can be run in Jupyter notebooks to save interactive results and charts. Here we are going to use a free service from Google to run the notebook in a new tab.

To use the notebook, click here to download the tab-separated file containing the housing data.

As I show in the notebook, the code to calculate the average price looks like this:

import pandas as pd
# read in the supplied data file 
path="./dfw-aug-2022-house-data.tsv"
mergedf = pd.read_csv(path, sep="\t")

# The data file also contains rental listings; exclude them.
# Casting to str keeps the comparison valid whether pandas parsed
# the column as booleans or as text.
histdf = mergedf[mergedf['is_for_rent'].astype(str) == "False"]
# We are not looking at vacant land, apartments,
# or multi-family buildings to determine
# the price of a single-family home.
histdf = histdf[~histdf['home_type'].isin(['LOT', 'APARTMENT', 'MULTI_FAMILY'])]

histdf['pending_price'].mean()
# 484881.5379241517

Median Price was $399,900

An average doesn't tell the entire story of what is going on with housing prices. If we break up the prices into smaller ranges and plot the number of homes in each, we can see that the middle price is actually closer to $400,000. Sales of a few expensive homes pull the average higher.
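The gap between the average and the median can be reproduced on a small scale. The sketch below uses a handful of made-up prices (not rows from the data file) to show how one expensive sale raises the mean while the middle value stays put. On the filtered DataFrame from the earlier snippet, the equivalent call would presumably be histdf['pending_price'].median().

```python
import pandas as pd

# Hypothetical sale prices, chosen only to illustrate skew;
# these are not rows from the tutorial's data file.
prices = pd.Series([250_000, 320_000, 399_900, 450_000, 1_500_000])

# Sorting from low to high and taking the middle element
# gives the median by hand (the list has an odd length).
ordered = prices.sort_values().reset_index(drop=True)
middle_value = ordered[len(ordered) // 2]

# pandas computes the same statistics directly.
mean_price = prices.mean()
median_price = prices.median()

print(f"mean:   {mean_price:,.0f}")
print(f"median: {median_price:,.0f}")
```

The single 1,500,000 sale is enough to push the mean well above the median, which is the same effect described for the Dallas data.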

Click anywhere on the chart below to look at homes in different price ranges. Click on the globe icon in the upper right of your screen to zoom in, and click back on the book icon in the upper right hand corner to continue.

[Interactive histogram: number of homes sold by sale price; x-axis from 0 to 1,400,000 in steps of 200,000]

In Python, a package called matplotlib can draw a chart similar to the one above with the following code:

import matplotlib.pyplot as plt

# Build the bin edges: 0, 200000, 400000, ... up to 1600000
price = 0
price_list = [0]
increment = 200000
while price < 1500000:
	price = price + increment
	price_list.append(price)
def histogram(column_name, price_bins):
	fig, ax = plt.subplots()
	counts, bins, patches = ax.hist(histdf[column_name], bins=price_bins)
	
	# Set the ticks to be at the edges of the bins.
	ax.set_xticks(bins)
	ax.set_xticklabels(ax.get_xticks(), rotation = 45)
	
	plt.subplots_adjust(bottom=0.15)
	plt.show()

histogram('pending_price', price_list)
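To see the numbers behind the bars, the same bin edges can be used to count homes per price range without drawing anything. This is a minimal sketch on made-up prices, not the tutorial's data; it assumes pandas' cut function is an acceptable stand-in for the binning that ax.hist performs.

```python
import pandas as pd

# Hypothetical prices for illustration only.
prices = pd.Series(
    [150_000, 210_000, 380_000, 399_900, 405_000, 790_000, 1_250_000]
)

# Same bin edges the while loop above produces: 0, 200000, ..., 1600000.
price_bins = list(range(0, 1_600_001, 200_000))

# right=False makes each interval [low, high), which matches how
# ax.hist bins values (ax.hist additionally includes the right edge
# of the last bin).
binned = pd.cut(prices, bins=price_bins, right=False)
counts = binned.value_counts().sort_index()

# Homes in a single price range, similar to clicking one bar.
in_range = prices[(prices >= 200_000) & (prices < 400_000)]

print(counts)
print(in_range.tolist())
```

Each row of counts corresponds to the height of one bar, and in_range holds the prices that fall inside the second bar.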

Houses sold at an average of $213/sq. ft.

Now that we have the histogram function, we can reuse it to build a similar chart that groups home sales by price per square foot. Again, click anywhere on the chart to examine homes in a specific price range.

[Interactive histogram: number of homes sold by price per square foot; x-axis from 0 to 500 in steps of 50]

The code to create this chart in Python is shown below:

# Build the bin edges: 0, 50, 100, ... up to 500
price = 0
price_list = [0]
increment = 50
while price < 500:
	price = price + increment
	price_list.append(price)

histogram('psf', price_list)

We can also take a look at data by bedroom. As I discuss in depth in the notebook, the "box part" of the box plot shows the values between the 25th percentile and the 75th percentile. The line in the middle of the box represents the median. For this chart, I have simplified a typical box plot; the lines that extend from the box plot simply show the minimum and maximum price per square foot for houses with different numbers of bedrooms.
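The five values that make up each simplified box (minimum, 25th percentile, median, 75th percentile, maximum) can be computed per bedroom count with a groupby. The sketch below runs on made-up rows; the "bedrooms" column name is an assumption, since the tutorial does not name that column, while "psf" is the price-per-square-foot column used earlier.

```python
import pandas as pd

# Hypothetical rows for illustration; "bedrooms" is an assumed
# column name, "psf" is price per square foot.
homes = pd.DataFrame({
    "bedrooms": [2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "psf": [150, 180, 210, 160, 200, 220, 260, 190, 230, 300],
})

grouped = homes.groupby("bedrooms")["psf"]

# One row per bedroom count, one column per box-plot statistic.
box_stats = pd.DataFrame({
    "min": grouped.min(),
    "q25": grouped.quantile(0.25),
    "median": grouped.median(),
    "q75": grouped.quantile(0.75),
    "max": grouped.max(),
})

print(box_stats)
```

The q25 and q75 columns give the bottom and top of each box, median gives the line inside it, and min and max give the ends of the lines that extend from the box.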

Click on any of the light green boxes below to look at houses that have a certain number of bedrooms.

[Interactive box plot: price per square foot by number of bedrooms]

At this point, feel free to click on the globe in order to further explore real estate sales in Dallas-Fort Worth.

Once you have inspected the data and mastered the programming concepts in this data science tutorial, you are ready to start building these interactive applications yourself.

Click here to switch between the tutorial and the interactive globe.