Time zones and databases don't mix

I've been working quite closely with relational databases lately. For me this means PostgreSQL. One fact I have known intuitively but never proved out before is that you shouldn't store timestamps with a time zone.

The thing that got me started trying to understand the SQL standard and timestamps was looking at the PostgreSQL driver for Golang. Here is the relevant function from encode.go.

// formatTs formats t as time.RFC3339Nano and appends time zone seconds if
// needed.
func formatTs(t time.Time) (b []byte) {
    b = []byte(t.Format(time.RFC3339Nano))
    // Need to send dates before 0001 A.D. with " BC" suffix, instead of the
    // minus sign preferred by Go.
    // Beware, "0000" in ISO is "1 BC", "-0001" is "2 BC" and so on
    bc := false
    if t.Year() <= 0 {
        // flip year sign, and add 1, e.g: "0" will be "1", and "-10" will be "11"
        t = t.AddDate((-t.Year())*2+1, 0, 0)
        bc = true
    }
    b = []byte(t.Format(time.RFC3339Nano))
    if bc {
        b = append(b, "  BC"...)
    }

    _, offset := t.Zone()
    offset = offset % 60
    if offset == 0 {
        return b
    }

    if offset < 0 {
        offset = -offset
    }

    b = append(b, ':')
    if offset < 10 {
        b = append(b, '0')
    }
    return strconv.AppendInt(b, int64(offset), 10)
}

The purpose of this function is to take an instance of time.Time and return it encoded as a slice of bytes. This is then transmitted over the connection to the database server. This should be relatively easy! After all a time is just a number isn't it? Well, not always. There are a large number of things at play here.

  • The instance of time.Time has a time zone offset associated with it.
  • The database column could be declared TIMESTAMP WITH TIME ZONE or could be declared TIMESTAMP WITHOUT TIME ZONE.
  • The caller may have already done some sort of conversion to account for time zones.

The behavior of formatTs is easy enough to understand.

  1. It formats the time value in RFC3339 with nanoseconds. This format includes the time zone offset in hours and minutes.
  2. It deals with oddities around the year zero and AD/BC terminology.
  3. It divides the time zone offset (in seconds) by 60, taking the remainder. This gives any leftover seconds from the above time zone.
  4. It checks if the remainder is zero. If it is, it returns the encoded value immediately.
  5. It appends the time zone reaminder to the result and returns the encoded value.

As you can see, formatTs attempts to handle time zones. However, it doesn't really cover all of the points I made above. This is because formatTs doesn't know about the destination column's type or if the destination is even a column. After all the following is valid SQL.

SELECT now() > date '2015-01-20';

All this does is check if the current time is after 2015-01-20 and return a boolean value. No data is stored on the server. It turns out this question of "correct" handling of time.Time and RDBMS has been brought up before. At this point you've probably figured out there is no correct answer. But there is a correct way to handle this problem. Don't ever use SQL's TIMESTAMP WITH TIME ZONE

Just try it

The easiest way to understand the problems around a particular software's feature is to just try it. Let's create the simplest table possible with both a TIMESTAMP WITH OUT TIME ZONE and a TIMESTAMP WITH TIME ZONE column.

CREATE TABLE demo ( 
    id serial unique,
    ts_at timestamp without time zone not null,
    ts_tz_at timestamp with time zone not null);

This creates a table with nothing more than an identifier column and two timestamps. One has a time zone while the other does not. Now let us add a row to see what happens.

ericu=# INSERT INTO demo (ts_at,ts_tz_at) VALUES(NOW(),NOW());
INSERT 0 1
ericu=# select * from demo;
 id |           ts_at            |           ts_tz_at            
----+----------------------------+-------------------------------
  1 | 2015-01-20 19:08:39.027437 | 2015-01-20 19:08:39.027437-06
(1 row)

At this point is worth mentioning that my /etc/timezone file on my computer has a value of "America/Chicago". Presently, that time zone has an offset of negative 6 hours from UTC. The ts_tz_at column displays the time and the time zone. In other words, it stored the complete result from calling NOW(). But the ts_at column doesn't display the time zone. But the timestamp is otherwise identical. So, it lost some of the data from the call to NOW().

In order to properly use the ts_at column we need to store the timestamp using the same time zone in all cases. The correct SQL to do this is

ericu=# INSERT INTO demo(ts_at,ts_tz_at) VALUES(CURRENT_TIMESTAMP at time zone 'utc', NOW());
INSERT 0 1
ericu=# select * from demo;
 id |           ts_at            |           ts_tz_at            
----+----------------------------+-------------------------------
  1 | 2015-01-20 19:08:39.027437 | 2015-01-20 19:08:39.027437-06
  2 | 2015-01-21 01:15:51.250127 | 2015-01-20 19:15:51.250127-06
(2 rows)

By using CURRENT_TIMESTAMP at time zone 'utc' the timestamp returned is always in the UTC time zone. Since UTC is +0 or zero hours from UTC there is no additional piece of information to be stored. At this point, you may be thinking that you should always declare columns to be TIMESTAMP WITH TIME ZONE. That way you would never lose any data. There are actually an ample number of reasons why this is a horrible idea.

Storing time zones requires developers to ask questions

If you store a time zone in your database with your timestamps you require developers to ask a question every time they need to store data into the database. Let's consider a table that is used to store records of each customer's purchase. It would probably have an approved_at column storing the purchase's approval time and date. If the approved_at column can store a time zone, which one should a developer use? Here are a couple answers to that question.

  • The time zone of the user. After all, you want your website to be convenient for the user.
  • The time zone of the business. Whenever business analysts run sales reports they always want to see times relative to their local time of day.
  • The time zone of your payment processor. Your payment processor is on the other side of the world, but you could store your timestamps in their time zone.

Even if the developer can answer this question themselves there still exists the mental burden of considering the question. But really, all of the above options are reasonable. How do you know that two developers will arrive at the same conclusion the same question?

Time zones change every year for many people of the world

Earlier I told you that my time zone was "America/Chicago". Why didn't I say -6 UTC? The time zone I'm in really isn't six hours behind UTC. It is six hours behind UTC on this day in January. Sometime this year it will be five hours behind UTC. This is all thanks to Daylight Savings Time. For non-western readers, each year most of North America and Europe set their clock forward by one hour in the spring and back by one hour in the fall. The reasons for this have been lost to history, but the practice is here to stay. The supposed advantage to this is being able to enjoy more hours of daylight at one point in the year or another. This is particularly ironic for parts of the United States such as Alaska that have an exceptionally long daylight period each day during the summer.

The point to realize here is that the correct time zone to display to your many of your users can change at least twice annually. Are you going to update all database records with timestamps? PostgreSQL attempts to handle Daylight Savings Time. This only works if you specify the time zone as full name such as "America/Chicago" and not "CST". The formatTs function from Golang's Postgres driver does not do this. It sends the time zone as a numeric offset. This means that Postgres does not know how to apply daylight savings time for your timestamps if you updated them using that library.

Time zones are a cultural phenomenon

There are plenty of time zone "maps" of sorts. There is a great one on WikiMedia. You might think that you can just look at a user's position relative to the Prime Meridian and compute their time zone. This is flatly wrong. Take a look at China's time zone. It's just one enormous time zone. This is contrast to the United States being chopped up into multiple time zones and spanning almost the same degrees of longitude on the globe. This is because the People's Republic of China decided time zones were nonsense. After 1949 it was decided that the multiple time zones in use at China were a waste of resources. All of China moved to using +8 UTC and has been there ever since.

The Pacific Republic of Kiribati acquired territory from the United States after its independence. This new territory meant that it was then a country split across the International Date Line. Due to this, the weekend started earlier for some of the country. This made various types of business within the nation difficult to carry on as the weekend came earlier for one part of the country than the other. The decision was made to to adopt +14 UTC as their time zone for some areas. From a scientific perspective this does not make any sense. There are only 24 hours in the day, so a time zone offset should never be more than +12 or less than -12 from UTC. But from a cultural perspective this is perfectly logical: There is no longer an offset of nearly an entire day across the nation of Kiribati.

The take away from this is to realize that time zones are a political and cultural phenomenom. Are you going to update all database records each time public opinion changes?

Time zones are a maintenance task that cannot be predicted

With the above facts in mind, consider now how to plan for database maintenance related to time zones. You can't. There is no sane way to plan out maintenance around time zones. You could throw thousands of man hours per year at the problem and still not have it under control. If you could predict changes in cultural norms you would be in a better position to profit in the financial industry than in the software industry.

How should I be storing my timestamps?

You should always store timestamps in a column type of TIMESTAMP WITH OUT TIME ZONE and storing the timestamp using the UTC time zone. This way no one ever has to ask about how timestamps are stored in your database. Time zones are a problem to be solved outside of the RDBMS.

How can I show my users times in their locale?

There are plenty of ways to do this. You could store an integer number that represents the users desired time zone as the number of seconds away from UTC. You would only have to store this on a single record that represented each user or the user's preferences. Whenever a user changes his or her preferences only a single row has to be updated. This doesn't solve the problem of DST, but that is not the point I am making. My point is that you don't want to leave time zones to be decided by your RDBMS. Instead, you should use the RDBMS to store data and let some other piece of software decide how it should be displayed to the user.

Are all my problems solved?

Sorting out proper usage of time zones is one thing, but it doesn't solve all the problems around times and dates. Here a few of the fun ones you are likely to run into if you deal with times and dates that aren't from the modern era.

  • Historically the day was considered to end at sundown instead of midnight.
  • Most of the world uses the Gregorian calendar, except the part that doesn't. The year 2015 spans the years 1436 and 1437 on the Islamic Calendar.
  • Of the countries using the Gregorian calendar, all of them previously used the Julian Calendar. The first countries changed as early the 16th century but the last countries didn't change until the 20th century. Between 10 and 13 calendar days were skipped depending on when they changed.
  • France briefly used the "French Republican Calendar" in the late 18th and early 19th centuries. It intentionally has no resemblance to any other calendars having features such as a week with ten days.

If you still aren't convinced there is a great page in the PostgreSQL manual about this topic.


© Eric Urban 2015