Getting Daily Mean Temperature from Historical Weather Data

Cloud Jake
4 min readOct 31, 2020

--

Line graph of Min, Max, and Mean temperature over time

I’ve been working on a few home automation projects using historical weather data from the Big Query NOAA Public Weather Dataset to schedule automations, estimate energy usage, or predict when I’ll get my next home heating oil delivery. To take things a step further, I need to know the average mean temperature on a given date.

To do this, we’ll take the last 20 years of NOAA weather data for a given weather station and calculate the mean mean ;) (average temperature by day using the mean temperature values per day). We could just as easily calculate the mean high and low temperature for each day as well.

Let’s get started!

If you have not already done so, be sure to get familiar with the Big Query NOAA GSOD public dataset using my previously published tutorial Finding the Closest Weather Stations. Be sure that you are using a weather station with (complete) “good data”.

You will need the following information:

  • stn — Station number for the location
  • wban — Historical “Weather Bureau Air Force Navy” number

Let’s start with a simple query to get the mean temperature for each day of the year for the last full year:

SELECT 
CAST(temp as Numeric) as meantemp,
CONCAT(year,mo,da) as thedate # Construct a date field
FROM
`bigquery-public-data.noaa_gsod.gsod2019`
WHERE
max != 9999.9 # code for missing data
AND stn = '725037'
AND wban = '94745'
ORDER BY thedate

We CAST the temp value from a FLOAT to NUMERIC to avoid issues when performing calculations (despite what IEEE says). As you can see, we output a simple list of dates (thedate) with the mean temperature per date (temp).

Big Query results table of meantemp by thedate.

Now that we’ve gotten the mean temperature by date, let’s expand to a period that include several years of data so we can calculate a mean value per day from historical data. Using the BETWEEN operator, we can quickly and easily select a number of years at once.

SELECT 
CAST(temp as Numeric) as meantemp,
CONCAT(mo,da) as thedate # Get rid of the Year
FROM
`bigquery-public-data.noaa_gsod.gsod20*` # Use wildcard to select many tables at once
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '00' AND '19' # Limit wildcard tables to 2000-2019
AND stn = '725037'
AND wban = '94745'

We should now have 20 mean temperature values per date:

Big Query Output table of meantemp and thedate

Big Query Output table of meantemp and thedate

To make this data useful, we we are going to need a way to GROUP data by date (minus the year) in MMDD format. In order to get the mean temperature by date, we are going to need to SUM the meantemp and divide by the COUNT (or number of occurrences) of thedate. By using the COUNT, we cover ourselves in case of missing data.

SELECT 
SUM(CAST(temp as Numeric))/COUNT(temp) as mean_mean_temp,
CONCAT(mo,da) as thedate # Get rid of the Year
FROM
`bigquery-public-data.noaa_gsod.gsod20*`
# Use wildcard to select many tables at once
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '00' AND '19'
# Limit wildcard tables to 2000-2019
AND stn = '725037'
AND wban = '94745'
GROUP BY thedate
ORDER BY thedate

The result is 366 rows (Don’t forget about the leap years ;) of the mean mean temperature by date.

Big Query results table of mean mean temperature by day of the year.

We can easily do the same for high (max) and low (min) temperature by day as well.

SELECT 
SUM(CAST(temp as Numeric))/COUNT(temp) as mean_mean_temp,
SUM(CAST(min as Numeric))/COUNT(min) as mean_min_temp,
SUM(CAST(max as Numeric))/COUNT(max) as mean_max_temp,
CONCAT(mo,da) as thedate # Get rid of the Year
FROM
`bigquery-public-data.noaa_gsod.gsod20*`
# Use wildcard to select many tables at once
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '00' AND '19'
# Limit wildcard tables to 2000-2019
AND stn = '725037'
AND wban = '94745'
GROUP BY thedate
ORDER BY thedate
Big Query results table of average mean, min, and max temp by day of the year.

Since this query uses over 3GB to process, we might consider saving this as its own Big Query table for use is later queries. Let’s click the “Save Results” button and select “BigQuery table”.

Big Query save query results dialog.

--

--

No responses yet