
Embucket Integration Test Framework

This project provides a test framework for running integration tests and benchmarks against Embucket (a Snowflake-compatible database) using the industry-standard ClickBench and TPC-H workloads.

Overview

The test framework consists of three main scripts:

  • make.sh - Core utilities for database setup, Docker management, and data operations
  • clickbench.sh - ClickBench dataset management and benchmarking
  • tpch.sh - TPC-H dataset management and benchmarking

Architecture

The framework uses Docker Compose to orchestrate the following services:

  • Embucket (port 3000) - The database under test, exposing a Snowflake-compatible interface
  • MinIO (ports 9000, 9001) - S3-compatible object storage for testing cloud data integration
  • Toxiproxy (port 8474) - Network proxy for simulating latency and failures
  • MC (MinIO Client) - Automated MinIO bucket setup

All data is stored in the storage/ directory, which is mounted into Docker containers and gitignored.
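
Once the stack is up (see Quick Start below), you can sanity-check that each service is listening on its documented port. The MinIO health endpoint below is standard; the Embucket root path and the Toxiproxy version endpoint are assumptions, so treat this as a sketch.

# Check each service's port after `sh make.sh up`
curl -s -o /dev/null -w "Embucket:  %{http_code}\n" http://localhost:3000/
curl -s -o /dev/null -w "MinIO:     %{http_code}\n" http://localhost:9000/minio/health/live
curl -s -o /dev/null -w "Toxiproxy: %{http_code}\n" http://localhost:8474/version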

Prerequisites

  • Docker and Docker Compose
  • Python with virtual environment support
  • Snowflake CLI

Quick Start

  1. Start Docker services:

    sh make.sh up
  2. Initialize database and schema:

    sh make.sh setup
  3. Load benchmark data (choose one):

    ClickBench (web analytics benchmark):

    sh clickbench.sh clickbench_partitioned

    TPC-H (decision support benchmark):

    # First, manually place TPC-H parquet files in storage/tpch/100/
    sh tpch.sh tpch_setup
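
Once loading finishes, a quick row count confirms the data is in place. The ClickBench table name matches the examples later in this README; the TPC-H line assumes the tables land in the same demo.embucket schema.

# ClickBench row count
sh make.sh snowsql "SELECT COUNT(*) FROM demo.embucket.hits"

# TPC-H row count (schema qualification is an assumption)
sh make.sh snowsql "SELECT COUNT(*) FROM demo.embucket.lineitem"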

Core Commands

make.sh Commands

  • sh make.sh install_snowflake - Install Snowflake CLI
  • sh make.sh up - Start Docker Compose services
  • sh make.sh down - Stop Docker Compose services
  • sh make.sh volume - Create S3-based external volume
  • sh make.sh volume_file - Create file-based external volume
  • sh make.sh database - Create demo database
  • sh make.sh schema - Create schema
  • sh make.sh setup - Run the complete database and schema setup
  • sh make.sh snowsql "query" - Execute Snowflake SQL
  • sh make.sh sparksql "query" - Execute Spark SQL
  • sh make.sh equality table1 table2 - Compare data between tables

clickbench.sh Commands

  • sh clickbench.sh cb_download_partitioned - Download partitioned ClickBench data
  • sh clickbench.sh cb_download_single - Download single ClickBench file
  • sh clickbench.sh cb_create_table - Create ClickBench table schema
  • sh clickbench.sh cb_copy_into_partitioned - Load partitioned data
  • sh clickbench.sh cb_copy_into_single - Load single file data
  • sh clickbench.sh clickbench_partitioned - Full partitioned setup
  • sh clickbench.sh clickbench_single - Full single file setup
  • sh clickbench.sh clickbench_spark_partitioned - Create Spark Iceberg table
  • sh clickbench.sh benchmark - Run ClickBench queries and measure performance
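
A typical ClickBench run chains the full setup with the benchmark command:

# Load the partitioned dataset, then run the ClickBench query suite
sh clickbench.sh clickbench_partitioned
sh clickbench.sh benchmark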

tpch.sh Commands

  • sh tpch.sh volume_local_file - Create local file-based external volume
  • sh tpch.sh tpch_create_tables - Create all TPC-H table schemas
  • sh tpch.sh tpch_copy_into_tables - Load data from mounted storage (/storage/tpch/100/)
  • sh tpch.sh tpch_copy_into_tables_file - Load data from local filesystem (tpch/100/)
  • sh tpch.sh tpch_setup - Create tables and load data (complete setup)
  • sh tpch.sh benchmark - Run TPC-H queries from tpch/queries/ and measure performance

Data Preparation: TPC-H data must be placed manually, as Parquet files, in storage/tpch/100/ (for loading via the mounted storage path) or tpch/100/ (for loading from the local filesystem). The script expects these files:

  • customer.parquet
  • orders.parquet
  • lineitem.parquet
  • nation.parquet
  • region.parquet
  • part.parquet
  • supplier.parquet
  • partsupp.parquet

Example Usage:

# Ensure TPC-H data files are in place
# (manually copy parquet files to storage/tpch/100/ or tpch/100/)

# Create tables and load data
sh tpch.sh tpch_setup

# Run benchmark
sh tpch.sh benchmark

Creating Test Files

Test files should follow the pattern in tests/example.sh:

#!/bin/bash
source ./make.sh
source ./clickbench.sh

# Start services
up

# Initialize Snowflake
setup

# Load test data
clickbench_partitioned
clickbench_spark_partitioned

# Run test queries
snowsql "SELECT watchid FROM demo.spark.hits LIMIT 100;"
sparksql "SELECT watchid FROM demo.embucket.hits LIMIT 100;"

# Verify data equality
equality demo.embucket.hits demo.spark.hits

# Cleanup
down

Test File Structure

  1. Start services - Use sh make.sh up to start Docker containers
  2. Initialize - Run sh make.sh setup to create Snowflake resources
  3. Load data - Choose appropriate data loading function
  4. Execute tests - Run your specific test queries
  5. Verify results - Use sh make.sh equality or custom validation
  6. Cleanup - Use sh make.sh down to stop services
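
For example, a minimal TPC-H test following the same six steps might look like the sketch below. It assumes the TPC-H Parquet files are already in storage/tpch/100/ and that unqualified table names resolve via the demo.embucket defaults in config.toml.

#!/bin/bash
source ./make.sh
source ./tpch.sh

up                                         # 1. Start services
setup                                      # 2. Initialize database and schema
tpch_setup                                 # 3. Create TPC-H tables and load data
snowsql "SELECT COUNT(*) FROM lineitem;"   # 4. Execute test queries
# 5. Verify results (add equality or custom checks here)
down                                       # 6. Cleanup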

Available Data Loading Options

  • sh clickbench.sh clickbench_partitioned - Load all 100 partitioned files
  • sh clickbench.sh clickbench_partitioned_small - Load only first partition for testing
  • sh clickbench.sh clickbench_single - Load single large file
  • sh clickbench.sh clickbench_spark_partitioned - Create corresponding Spark tables

Storage Configuration

Two storage types are configured:

  • S3 storage (mybucket) - MinIO-based object storage
  • File storage (local) - Local filesystem access

Both point to the same data location for testing different ingestion paths.
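
The corresponding external volumes are created with the make.sh commands listed above:

# S3-backed external volume (MinIO bucket "mybucket")
sh make.sh volume

# File-backed external volume (local filesystem)
sh make.sh volume_file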

Usage Examples

Basic Integration Test

sh tests/example.sh

Custom Test Creation

# Create new test file
cp tests/example.sh tests/my_test.sh
# Edit to add your specific test logic
# Run your test
sh tests/my_test.sh

Manual Operations

# Start only the infrastructure
sh make.sh up
sh make.sh setup

# Load specific dataset
sh clickbench.sh clickbench_single

# Run custom queries
sh make.sh snowsql "SELECT COUNT(*) FROM demo.embucket.hits"
sh make.sh sparksql "SELECT COUNT(*) FROM demo.spark.hits"

# Compare results
sh make.sh equality demo.embucket.hits demo.spark.hits

# Cleanup
sh make.sh down

Configuration Files

config.toml

Snowflake CLI configuration pointing to the local Embucket instance:

[connections.dev]
host = "localhost"
port = 3000
user = "user"
password = "password"
database = "demo"
schema = "embucket"
warehouse = "warehouse"
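
With SNOWFLAKE_HOME pointing at the project directory (the scripts set this for you), the dev connection can also be exercised directly through the Snowflake CLI. A minimal connectivity check might look like this, assuming the CLI was installed via sh make.sh install_snowflake:

# Run a trivial query over the dev connection defined in config.toml
SNOWFLAKE_HOME="$PWD" snow sql -c dev -q "SELECT 1"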

docker-compose.yaml

Defines all services (Embucket, MinIO, Toxiproxy, MC) with port mappings and volume mounts to the storage/ directory.

s3.sh

Sets environment variables for S3/MinIO access:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION

Source this file with . ./s3.sh when needed for manual S3 operations.
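
After sourcing it, the credentials can be used for ad-hoc inspection of the MinIO bucket, for example with the AWS CLI (assuming it is installed); the bucket name comes from the Storage Configuration section above:

# List the contents of the MinIO bucket used for S3-based loading
. ./s3.sh
aws --endpoint-url http://localhost:9000 s3 ls s3://mybucket/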

Test Suite

For detailed information about the test suite and available test scripts, see TESTING.md.

Available test files:

  • tests/example.sh - Basic integration test
  • tests/clickbench.sh - ClickBench benchmark
  • tests/clickbench_file.sh - File-based storage test
  • tests/tpch.sh - TPC-H benchmark
  • tests/merge.sh - MERGE operations test

Environment Variables

The scripts automatically handle:

  • SNOWFLAKE_HOME - Set to current project directory
  • Virtual environment activation via venv.sh
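
If you need to reproduce this manually, for example to run ad-hoc snow commands outside the scripts, the equivalent shell steps are roughly:

# A sketch of what the scripts do before issuing queries (not the exact code)
export SNOWFLAKE_HOME="$PWD"   # so the Snowflake CLI finds config.toml here
. ./venv.sh                    # activate the Python virtual environment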
