Iceberg Catalog Federation
- Miguel Diaz
- Mar 02, 2026
- 5 min read
- Databricks
In complex enterprise environments, data rarely lives in a single system. Modern organizations have information distributed across Databricks, Snowflake, AWS Glue, and other specialized catalogs. Traditionally, to create analytics that combine these sources, teams have had to duplicate data across systems (expensive and prone to inconsistencies) or create complex ETL pipelines to synchronize information (slow and difficult to maintain).
Apache Iceberg catalog federation solves this problem by allowing queries to span tables distributed across multiple catalogs as if they were a unified system, without copying or moving data. This capability eliminates unnecessary data duplication and enables truly integrated analytics by accessing data directly where it's stored.
What is Iceberg Catalog Federation?
Catalog federation in Apache Iceberg allows multiple catalogs containing Iceberg tables to work in a coordinated manner through the REST Catalog Protocol. This standard protocol acts as a common API that enables engines like Spark, Trino, or Flink to access Iceberg tables stored in different catalogs (AWS Glue, Databricks Unity Catalog, Snowflake, Nessie) as if they were a single system.
The key is that Iceberg uses a standardized metadata format and the REST protocol to coordinate operations between catalogs. This means that a query can simultaneously access Iceberg tables in Databricks and Snowflake, leveraging the native capabilities of each system while maintaining the transactional consistency and schema evolution that characterize Iceberg.
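To make the "common API" idea concrete, here is a minimal sketch of how an engine builds the standard REST Catalog Protocol endpoint used to load a table. The host name is a made-up example; only the `/v1/namespaces/{ns}/tables/{table}` path shape comes from the Iceberg REST specification.

```python
# Illustrative sketch: the REST Catalog Protocol exposes tables through a
# standard URL shape, so any engine (Spark, Trino, Flink) can resolve a
# table the same way regardless of which catalog backs it.

def load_table_endpoint(base_url: str, namespace: str, table: str) -> str:
    """Build the REST catalog URL used to fetch a table's metadata pointer."""
    return f"{base_url}/v1/namespaces/{namespace}/tables/{table}"

url = load_table_endpoint(
    "https://catalog.example.com/api",  # hypothetical catalog endpoint
    "analytics",
    "customer_history",
)
# -> https://catalog.example.com/api/v1/namespaces/analytics/tables/customer_history
```

Because every catalog that implements the protocol answers at the same paths, the engine doesn't need vendor-specific client code to discover a table.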
How Does Iceberg Catalog Federation Work?
Iceberg catalog federation operates through an intelligent flow that optimizes access to distributed data:
🔍 Intelligent Discovery
Unity Catalog automatically detects that the queried table resides in an external federated catalog (like Snowflake) during query planning.
📋 Metadata Negotiation
Databricks queries the external system to verify it’s an Iceberg table and obtain the location of metadata.json and data files.
⚡ Optimized Direct Access
Instead of sending the complete query to the external system, Databricks reads directly from storage (S3, Azure, GCS) using local compute.
🔄 Automatic Fallback
If the table doesn’t meet criteria for direct access, the system automatically falls back to traditional federation via JDBC as backup.
This hybrid approach combines the flexibility of accessing multiple catalogs with the efficiency of reading directly from storage, avoiding costly data transfers between systems.
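The four-step flow above boils down to a routing decision per table. The following toy sketch captures that decision; the criteria names are illustrative simplifications, not the exact rules Databricks applies.

```python
# Hedged sketch of the hybrid access decision: read Iceberg tables directly
# from storage when possible, otherwise fall back to traditional federation.

def choose_access_path(table: dict) -> str:
    """Return 'direct' when the table qualifies for direct storage reads,
    otherwise 'jdbc_fallback' (ship the query to the external engine)."""
    is_iceberg = table.get("format") == "iceberg"
    storage_reachable = table.get("storage_accessible", False)
    if is_iceberg and storage_reachable:
        return "direct"        # read metadata + Parquet straight from S3/Azure/GCS
    return "jdbc_fallback"     # traditional federation via JDBC as backup

assert choose_access_path({"format": "iceberg", "storage_accessible": True}) == "direct"
assert choose_access_path({"format": "proprietary"}) == "jdbc_fallback"
```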
Practical Implementation: Databricks and Snowflake
To fully understand Iceberg Catalog Federation, let’s imagine a company with data distributed across two independent Iceberg catalogs that need to work together.
The Problem to Solve
Company XYZ has:
- Databricks (Unity Catalog): Iceberg tables with real-time transactional data
- Snowflake (Horizon Catalog): Iceberg tables with historical data and predictive analytics
- Need: Queries that combine both datasets without duplicating information
💡 The Solution: Iceberg Catalog Federation
Step 1: Configuration Fundamentals
-- Why External Location?
-- Iceberg stores data in storage (S3/Azure/GCS) + metadata in catalogs
-- For federation, Unity Catalog needs direct access to shared storage
CREATE EXTERNAL LOCATION snowflake_iceberg_storage
URL 's3://company-xyz-iceberg/snowflake-tables/'
WITH (STORAGE CREDENTIAL iceberg_access_credential);
🔍 What’s happening here?
- Iceberg Catalog Federation requires both catalogs to access the same file storage
- Unity Catalog needs permissions to read the Parquet files where Snowflake stores Iceberg tables
- This isn’t just a connection - it’s direct access to the shared Iceberg format
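To see why shared storage access matters, here is a sketch of the file layout of an Iceberg table under that external location. The bucket and table names mirror the running example; the version and file names are made up, but the `metadata/` plus `data/` split is the standard Iceberg layout.

```python
# Sketch of the Iceberg file layout both catalogs must be able to reach.
# File names are illustrative; the directory structure is standard Iceberg.

TABLE_ROOT = "s3://company-xyz-iceberg/snowflake-tables/customer_history"

layout = {
    "metadata": f"{TABLE_ROOT}/metadata/v3.metadata.json",    # table metadata
    "manifest_list": f"{TABLE_ROOT}/metadata/snap-123.avro",  # snapshot's manifest list
    "data": f"{TABLE_ROOT}/data/part-00000.parquet",          # actual rows
}

# The EXTERNAL LOCATION + STORAGE CREDENTIAL above grant Unity Catalog read
# access to everything under TABLE_ROOT -- that is what enables direct reads.
for path in layout.values():
    assert path.startswith("s3://company-xyz-iceberg/snowflake-tables/")
```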
Step 2: Establish Communication Channel Between Catalogs
-- Specialized connection for Iceberg Catalog Federation
CREATE CONNECTION snowflake_iceberg_catalog
TYPE SNOWFLAKE
OPTIONS (
host 'company-xyz.snowflakecomputing.com',
port '443',
user 'databricks_user',
warehouse 'ANALYTICS_WH' -- Only for metadata queries
)
WITH CREDENTIAL oauth_iceberg_federation;
🔍 What’s happening here?
- This connection is NOT for transferring data - it’s for coordinating Iceberg metadata
- Snowflake will tell Unity Catalog: “This table is in Iceberg format, here are its metadata”
- REST Catalog Protocol allows both catalogs to “speak the same Iceberg language”
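A sketch of what that metadata coordination looks like on the wire: per the Iceberg REST specification, loading a table returns a result carrying a `metadata-location` pointer plus the metadata itself. All values below are invented for illustration.

```python
# Sketch of the response shape when a table is loaded over the REST Catalog
# Protocol (a LoadTableResult in the Iceberg REST spec); values are made up.

load_table_result = {
    "metadata-location": "s3://company-xyz-iceberg/snowflake-tables/"
                         "customer_history/metadata/v3.metadata.json",
    "metadata": {
        "format-version": 2,
        "table-uuid": "11111111-2222-3333-4444-555555555555",
        "location": "s3://company-xyz-iceberg/snowflake-tables/customer_history",
    },
}

# Only this pointer travels over the connection -- never the data itself.
# The engine then reads metadata and Parquet files directly from storage.
metadata_pointer = load_table_result["metadata-location"]
```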
Step 3: Create the Federated Catalog (Here’s where the magic happens!)
-- Official registration of external catalog in the federated ecosystem
CREATE FOREIGN CATALOG snowflake_iceberg_federated
USING CONNECTION snowflake_iceberg_catalog
OPTIONS (
database 'ICEBERG_CATALOG',
storage_location 's3://company-xyz-iceberg/snowflake-tables/',
authorized_paths 's3://company-xyz-iceberg/snowflake-tables/'
);
🔍 What’s happening here?
- Unity Catalog now “knows” that another federated Iceberg catalog exists
- snowflake_iceberg_federated becomes a valid namespace in Unity Catalog
- Snowflake's Iceberg tables appear as if they were local in Databricks
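Once the foreign catalog is registered, the federated tables resolve through Unity Catalog's ordinary three-level namespace. A minimal sketch, using the names from the example:

```python
# Sketch: a three-level name splits into catalog.schema.table, and the
# catalog part tells Unity Catalog to route metadata requests to Snowflake.

def resolve(full_name: str) -> dict:
    catalog, schema, table = full_name.split(".")
    return {"catalog": catalog, "schema": schema, "table": table}

ref = resolve("snowflake_iceberg_federated.analytics.customer_history")
# ref["catalog"] identifies the federated Snowflake catalog; the rest of
# the name behaves exactly like a local Unity Catalog table reference.
```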
Step 4: The Federated Query - Where You See the Power
-- One query, multiple Iceberg catalogs!
SELECT
-- Real-time data from Unity Catalog
txn.transaction_id,
txn.customer_id,
txn.amount,
txn.transaction_timestamp,
-- Historical data from Snowflake Catalog (federated)
hist.customer_ltv,
hist.risk_score,
hist.predicted_churn_probability
FROM main.realtime.transactions txn -- Iceberg table in Unity Catalog
INNER JOIN snowflake_iceberg_federated.analytics.customer_history hist -- Iceberg table in Snowflake
ON txn.customer_id = hist.customer_id
WHERE txn.transaction_timestamp >= CURRENT_TIMESTAMP() - INTERVAL 1 HOUR
AND hist.risk_score < 0.3;
Iceberg Catalog Federation Execution Flow
Phase 1: Intelligent Catalog Discovery
Unity Catalog analyzes: "snowflake_iceberg_federated.analytics.customer_history"
→ Detects: "It's an Iceberg table in federated catalog"
→ Activates: Iceberg federation protocol
Phase 2: Iceberg Metadata Negotiation
Unity Catalog → Snowflake: "Give me metadata for customer_history Iceberg table"
Snowflake → Unity Catalog: "Here's metadata.json location + Iceberg schema"
Unity Catalog: "Perfect, it's compatible - proceeding with direct access"
Phase 3: Direct Access to Iceberg Format
Unity Catalog reads DIRECTLY:
- metadata.json from Snowflake's Iceberg table
- The snapshot's manifest list (an Avro file) to get active data files
- Specific Parquet files according to query filters
- Everything using standard Iceberg format (no conversions!)
Phase 4: Unified Execution in Databricks
Databricks Compute processes:
- Local Iceberg table: main.realtime.transactions
- Remote Iceberg table: customer_history (read from Snowflake's S3)
- Optimized JOIN using statistics from both Iceberg tables
- Result: One query, two catalogs, unified Iceberg format
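Phases 2-4 can be simulated end to end with a toy metadata chain: resolve the snapshot's manifest list, then prune data files whose statistics cannot match the query filter (`hist.risk_score < 0.3` in the example). The structures below are simplified stand-ins for real Iceberg metadata.

```python
# Toy simulation of metadata-driven pruning: only data files whose column
# statistics can satisfy the filter are fetched from storage.

snapshot = {
    "manifest_list": [
        {"file": "part-0001.parquet", "risk_score_min": 0.0, "risk_score_max": 0.2},
        {"file": "part-0002.parquet", "risk_score_min": 0.5, "risk_score_max": 0.9},
    ]
}

def prune(snapshot: dict, max_risk: float) -> list:
    """Keep only data files whose min stat can satisfy risk_score < max_risk."""
    return [
        entry["file"]
        for entry in snapshot["manifest_list"]
        if entry["risk_score_min"] < max_risk
    ]

files_to_read = prune(snapshot, max_risk=0.3)
# Only part-0001.parquet survives the statistics check, so only that file
# is read from S3 -- the core of "optimized direct access".
```

This is why direct access beats shipping the whole query over JDBC: file-level statistics in the Iceberg metadata let the local engine skip most of the remote data before reading a single byte of it.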
The Result: Truly Federated Iceberg Catalogs
What we just accomplished:
- Zero data duplication: Data remains in its original location
- Unified format: Both tables speak “native Iceberg”
- Optimized performance: Direct file access, no data transfer
- Total transparency: Federation is invisible to the end user
info
Both systems understand the standard Iceberg metadata format, enabling intelligent coordination between distributed catalogs without losing the native advantages of each platform.
Featured Use Cases
- Gradual migration: Companies selectively migrate workloads from Snowflake to Databricks while maintaining access to historical data without costly duplication.
- Cross-platform analytics: Data science teams access data distributed across multiple systems using unified Databricks tools.
- Specialized catalogs: Different departments maintain specialized catalogs while enabling integrated corporate analytics.
Conclusion
Iceberg Catalog Federation fundamentally changes how distributed data is managed. Organizations no longer need to choose between costly data duplication and isolated silos. With Iceberg's standard metadata format as a "common language," multiple specialized catalogs work as a unified system, enabling gradual modernization, cost optimization, and cross-platform analytics without complex ETL. The future of enterprise data is distributed, and this technology lays the foundation for ecosystems where physical location becomes transparent.
tip
To implement Iceberg Catalog Federation in your organization, start with a low-risk pilot use case, establishing federation between two catalogs with non-critical datasets to validate configuration and performance before full rollout.