Materialized views with PostgreSQL

Database views can range in complexity and cost from a simple single table select query to multiple tables joins/aggregate views.  If you find that some of your high-cost views are being run on a frequent basis and are having a large impact on your database performance, you may want to consider materialized views.

To achieve near instantaneous access to the data results of these high-cost views, materialized views can be used to do the work in advance and store the results inside a table ready for rapid access.

Serious candidates for materialized views are those views that are based on tables that don’t change very often. Since materialized views are basically a snapshot of the result of a complex query at a point in time, you will want to use several possible refresh strategies for periodic updates:

    * Force a refresh every time data is being selected so will always return the most recent “live data”
    * X seconds since the last refresh and data is being selected
    * A preset time interval of X seconds since the last refresh, whether or not data was requested or not.

While other commercialized databases are able to auto-materialize views, you can achieve the same results in PostgreSQL by using triggers and PL/pgsql language.

Jonathan Gardner’s “Materialized views in PostgreSQL” post cover this topic well.  I noticed that it’s been written in 2004 and am wondering what the progress is on this?  Does the latest PostgreSQL version natively support this yet?  Still some more research to do ahead.  The awesome PostgreSQL General Bits over at Varlena has more on this topic.  (still written in 2004!!)

Essentially, Jonathan covers three different kinds of materialized views and each comes with their pros/cons:

Snapshot materialized views:

   Where you just simply remove all the rows of the materialized view table and refill the table with the updated data from a view.  You can do a complete refresh of the materialized view as needed.  This approach is good for views with underlying tables that may significantly change over time but the latest data doesn’t need to be immediately accessed by the users (maybe updating once or twice a day is satisfactory enough).
  However, if there were only a few changes in the underlying tables, the very expensive operation of a complete refresh from scratch would be somewhat of overkill.  Another  downside of this approach is that during the wait between each complete refresh of the table, the data of the materialized view can start to get out of sync with the underlying tables when the data starts to change.

Eager materialized views:
   As the name implies, this approach eagerly keeps the materialized view up to date as much as possible by using triggers on the underlying tables of a view.  The triggers take care of each row of the materialized view during the inserting, deleting, or updating.  The net result is that the materialized view is always kept up to date as data changes in the underlying tables.
    This approach makes sense if you expect only a few updates to the underlying tables over time.  This could also affect performance for the worse if all of sudden, some of the underlying tables get hit with many updates over a short period of time. The database will struggle to keep the materialized view constantly in sync with the underlying tables.

Very Lazy materialized view:

  This materialized view is the trade-off you would have if you met halfway between the two approaches listed above.  Essentially, this materialized view is the same as eager materialized view but instead of immediately updating the materialized view, a log of changes is kept.  When the time comes, you can start the refresh and the log of changes will be immediately applied towards the materialized view.  This works very nicely if there are only a few updates which can be applied quickly.  The updating would be much less expensive than a complete refresh using a snapshot materialized view.
  On the opposite side of the coin, if by the time you were ready to refresh the materialized view, there were many updates logged to be made against the materialized view, they all can be grouped together and applied in one go.  This approach avoids the expensive overhead associated with eager materialized view when a high number of updates hit the system.  The downside of a very lazy materialized view is the same problem with out of sync data during the wait between each refresh, once the underlying tables start to change.
  However, with this approach you would be able to better manage the overhead by controlling how often to refresh the very lazy materialized view!  You could first check to see how many updates need to be applied to the materialized view.  If there are only a few, you could have them immediately executed.  If there is a large log of updates to be carried out, you could force a wait until the opportune time when the database server load is lower before unloading them on to the materialized view.

If you decide to use a materialized view and need to choose which approach to use, you need to ask yourself “How up-to-date does the copy of the data need to be and how expensive is it to keep it updated?”

This entry was posted in Databases. Bookmark the permalink.