A modern data warehouse platform acts as the centralized data reporting foundation for any data-driven organization.

By Vadim Orlov, Matt Kosovec, Joe Swanson, and Jacob Crell

If you’re not already building a data warehouse, chances are you’re considering one, or maintaining an existing one, in order to centralize and tap into actionable data within your organization.

This whitepaper guides decision makers through the important factors at play when assessing data warehouse platforms and provides tangible metrics to support their decisions. The performance-per-dollar approach offers an effective way to evaluate platforms by focusing on two metrics that matter to every customer: performance and cost of ownership. These metrics tend to matter most because they quantify the operational cost of a platform at the required performance and scale. To make the comparison easier, we combined them into a single performance-per-dollar metric, described below.
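To make the idea concrete before the full definition, here is a minimal sketch of a performance-per-dollar calculation. It assumes the metric takes the simple form of benchmark throughput divided by hourly platform cost; the exact formulation we used is described later in this paper, and the numbers below are hypothetical rather than measured results.

# Minimal sketch of a performance-per-dollar calculation. Assumes the
# metric is benchmark throughput (queries per hour) divided by the
# platform's hourly cost; all figures here are hypothetical.

def performance_per_dollar(total_query_seconds: float,
                           query_count: int,
                           hourly_cost_usd: float) -> float:
    """Queries completed per dollar of compute spend."""
    queries_per_hour = query_count / (total_query_seconds / 3600)
    return queries_per_hour / hourly_cost_usd

# Example: 22 benchmark queries finishing in 540 seconds on a cluster
# billed at $4.80/hour comes out to roughly 30.6 queries per dollar.
print(performance_per_dollar(540, 22, 4.80))

Note how the metric penalizes a platform that is twice as fast but three times as expensive, which is exactly the trade-off this comparison is meant to surface.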

As a group of data and analytics practitioners, we wanted to see how vendor claims held up and which aspects turned out to be not so scientific. This is not a rigorous scientific study, but simply an exercise our data and analytics team undertook to make well-informed recommendations to organizations choosing the data warehouse platform that best fits their needs.

In this paper, you will learn: 

  • How to use a metric focused purely on cost of performance to make an objective decision when selecting a data warehouse technology
  • How to make an apples-to-apples comparison between data warehouse tools that differ in features and functions
  • How to make a more informed decision about which platform to choose, and build a business case for it

Evaluating 5 leading data warehouse platforms

For this comparison, we started by selecting four leading data warehousing platforms used by our data engineering practice at West Monroe to implement data solutions. This was a narrowly focused analysis based on cost of performance; we do not claim to have tested features comprehensively. The data engineering solutions we deliver include dashboarding, data warehousing, data source integration, and advanced analytics such as machine learning and AI. These solutions require a foundational platform that offers a wide array of features alongside powerful performance. To round out the list to five platforms, we added the new, GPU-based Brytlyt platform. GPU computing is a fast-evolving field, and we wanted hands-on experience to see how it compares with more traditional columnar databases.

For consistency, we chose to standardize on AWS cloud infrastructure where possible, minimizing any differences in infrastructure architectures.

Platforms and key features

SQL Server

  • SQL Server 2017 Enterprise running on an Amazon EC2 instance
  • Testing focused exclusively on columnstore tables with compression; see the sketch after this list
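For readers less familiar with SQL Server's columnar option, the sketch below shows what this means in practice: a table becomes columnar, and automatically compressed, once a clustered columnstore index is created on it. The server, database, and table names are hypothetical.

# Minimal sketch: creating a compressed columnstore table in SQL Server
# through pyodbc. Server, database, and table names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-ec2-host;DATABASE=benchmark;Trusted_Connection=yes;",
    autocommit=True,
)

# A clustered columnstore index stores the entire table column by
# column and compresses each column segment automatically.
conn.cursor().execute("""
    CREATE TABLE dbo.fact_sales (
        sale_id   BIGINT,
        sale_date DATE,
        amount    DECIMAL(18, 2)
    );
    CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_sales ON dbo.fact_sales;
""")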

Azure SQL DW

  • Fully managed cloud data warehouse based on SQL Server
  • Columnar storage with compression
  • Leverages Massively Parallel Processing (MPP); see the distribution sketch after this list
  • Scales into the petabyte range
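As a rough illustration of the MPP model, the sketch below creates a hash-distributed table, the mechanism Azure SQL DW uses to spread rows across compute nodes so work on the distribution key runs in parallel. Connection details and object names are hypothetical.

# Minimal sketch: a hash-distributed columnstore table in Azure SQL DW
# through pyodbc. Connection details and object names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydw.database.windows.net;DATABASE=dw;UID=loader;PWD=...",
    autocommit=True,
)

# DISTRIBUTION = HASH(customer_id) assigns each row to a compute node
# by hashing the key, so joins and aggregations on customer_id can run
# in parallel without shuffling rows between nodes.
conn.cursor().execute("""
    CREATE TABLE dbo.fact_orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(18, 2)
    )
    WITH (DISTRIBUTION = HASH(customer_id), CLUSTERED COLUMNSTORE INDEX);
""")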

Amazon Redshift

  • Based on PostgreSQL 8 with an Amazon-developed columnar storage extension
  • MPP query processing
  • Machine learning extensions
  • Scales into the petabyte range
  • Recently announced concurrency scaling, a feature that automatically adds and removes capacity as needed to serve a changing query workload; see the sketch after this list
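Concurrency scaling is enabled per WLM queue rather than per query. As an illustration, the sketch below flips that switch through boto3; the parameter group name is hypothetical, and the WLM JSON shape should be verified against current AWS documentation.

# Illustrative sketch: enabling Redshift concurrency scaling on a WLM
# queue with boto3. The parameter group name is hypothetical; verify
# the wlm_json_configuration shape against current AWS documentation.
import json
import boto3

redshift = boto3.client("redshift")

# Setting "concurrency_scaling" to "auto" on a queue lets Redshift add
# transient clusters when queries queue up and remove them when idle.
wlm = [{"query_concurrency": 5, "concurrency_scaling": "auto"}]

redshift.modify_cluster_parameter_group(
    ParameterGroupName="benchmark-params",  # hypothetical
    Parameters=[{
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(wlm),
        "ApplyType": "dynamic",
    }],
)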

Snowflake

  • Fully relational ANSI SQL data warehouse-as-a-service in AWS/Azure clouds
  • Columnar storage with compression
  • Automatic clustering removes the need to re-cluster data when loading into tables
  • Usage-based pricing for compute and storage, with compute fully separated from storage; see the pricing sketch after this list
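To make usage-based pricing concrete: Snowflake compute cost is a function of warehouse size, running time, and the contracted credit price. The sketch below uses Snowflake's published credit rates per warehouse size (doubling with each step up from X-Small); the dollar figure per credit is hypothetical, as it varies by edition and region.

# Minimal sketch of Snowflake's usage-based compute pricing. Credit
# rates per warehouse size double with each step up from X-Small; the
# price per credit below is hypothetical (it varies by edition/region).
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

def compute_cost_usd(size: str, hours: float, usd_per_credit: float) -> float:
    """Compute-only cost: credits consumed times the credit price."""
    return CREDITS_PER_HOUR[size] * hours * usd_per_credit

# Example: a Medium warehouse running 3 hours at $3.00 per credit.
print(compute_cost_usd("M", 3, 3.00))  # 36.0 USD

Because storage is billed separately and compute suspends when idle, the same workload can be benchmarked at several warehouse sizes and priced directly into the performance-per-dollar metric.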

Brytlyt

  • Graphics Processing Unit (GPU)-accelerated database built as a PostgreSQL extension
  • Integrates with PostgreSQL through a GPU-memory external table
  • Scales by adding nodes and GPU resources

 
