Cloudflare Pipelines now supports Terraform for full lifecycle management. You can create streams, configure R2 Data Catalog sinks, and manage Apache Iceberg tables entirely through Infrastructure as Code, which gives you analytics without egress fees and with very little manual operations work.
What You Will Learn
- Configuring an R2 bucket with Data Catalog using Terraform
- Creating pipeline streams, sinks, and SQL transformations
- Authenticating with scoped API tokens for Iceberg tables
- Querying data with R2 SQL, Spark, PyIceberg, and DuckDB
Why Cloudflare Pipelines + R2 Data Catalog?
Cloudflare Pipelines lets you ingest streaming data via Workers or HTTP endpoints, transform it with SQL, and write it to R2 as Apache Iceberg tables. The R2 Data Catalog, now available in the Cloudflare Terraform provider (v5.19.0+), manages those Iceberg tables with built-in compaction, time travel, and ACID transaction support.
The key advantage? Zero egress fees. Your analytics queries run against data stored in R2, and you never pay to move that data out. Combined with Terraform for Infrastructure as Code, you can version-control your entire data pipeline stack.
For production workloads, always use scoped API tokens instead of global account tokens. The Terraform configuration below demonstrates the least-privilege approach with specific permission groups for the pipeline sink.
Prerequisites
- Cloudflare account with R2 and Pipelines enabled
- Terraform v1.6+ installed locally
- Cloudflare provider v5.19.0+ in your Terraform configuration
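- API token with R2 Admin Read & Write permissions, plus whatever permissions your account requires to manage Pipelines and create account API tokens, since the configuration below creates both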
Terraform Configuration: Complete Pipeline Setup
This end-to-end Terraform configuration creates a complete data pipeline: an R2 bucket with the data catalog enabled, a scoped API token for the sink, and the stream, sink, and pipeline resources that ingest JSON data into an Apache Iceberg table.
Provider Setup
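terraform {
  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 5.19"
    }
  }
}

# Inputs referenced throughout this configuration
variable "cloudflare_api_token" {
  type      = string
  sensitive = true
}

variable "account_id" {
  type = string
}

provider "cloudflare" {
  api_token = var.cloudflare_api_token
}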
R2 Bucket with Data Catalog Enabled
resource "cloudflare_r2_bucket" "data_lake" {
  account_id = var.account_id
  name       = "analytics-data-lake"
}

resource "cloudflare_r2_data_catalog" "iceberg_catalog" {
  account_id = var.account_id
  bucket     = cloudflare_r2_bucket.data_lake.name
}
Scoped API Token for Pipeline Sink
data "cloudflare_account_api_token_permission_groups_list" "sink_permissions" {
  # Review the groups this returns before applying: the token should carry
  # only the R2 permissions the sink needs.
  filter {
    name = "R2"
  }
}

resource "cloudflare_account_api_token" "pipeline_token" {
  # Account-scoped like the other resources in this configuration
  account_id = var.account_id
  name       = "pipeline-sink-token"
  policies = [{
    effect = "allow"
    resources = {
      "com.cloudflare.r2.bucket.${cloudflare_r2_bucket.data_lake.id}" = "*"
    }
    permission_groups = data.cloudflare_account_api_token_permission_groups_list.sink_permissions.permission_groups[*].id
  }]
}
Pipeline Stream and Sink
resource "cloudflare_pipeline_stream" "events_stream" {
  account_id = var.account_id
  name       = "analytics-events-stream"
}
resource "cloudflare_pipeline_sink" "iceberg_sink" {
  account_id = var.account_id
  name       = "iceberg-sink"
  # No pipeline_id here: the pipeline resource already references this sink
  # through sink_ids, so pointing back at it would create a dependency cycle.
  sink_type = "r2_data_catalog"
  sink_config {
    r2_data_catalog {
      bucket    = cloudflare_r2_bucket.data_lake.name
      namespace = "analytics"
      table     = "events"
      format    = "parquet"
    }
  }
}
resource "cloudflare_pipeline" "main_pipeline" {
  account_id = var.account_id
  name       = "analytics-pipeline"
  stream_id  = cloudflare_pipeline_stream.events_stream.id
  sink_ids   = [cloudflare_pipeline_sink.iceberg_sink.id]
  sql        = <<-EOT
    SELECT
      event_id,
      event_type,
      timestamp,
      user_id,
      properties,
      created_at
    FROM stream
  EOT
}
This configuration creates a pipeline that receives JSON events via HTTP, applies SQL transformations, and writes the results as Apache Iceberg tables to R2 Data Catalog. The sink automatically creates the namespace and table if they do not exist.
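To make the expected input concrete, here is a minimal sketch of building events that carry every column the pipeline SQL selects. The field names come from the SELECT list above; the value types (string IDs, ISO 8601 timestamps, a nested properties object) and the choice to send a JSON array are assumptions, and build_event and encode_batch are hypothetical helper names. The stream's HTTP endpoint is not shown here because it is not part of the configuration above.

```python
import json
import uuid
from datetime import datetime, timezone

# Column names taken from the pipeline's SELECT list.
EVENT_FIELDS = (
    "event_id",
    "event_type",
    "timestamp",
    "user_id",
    "properties",
    "created_at",
)


def build_event(event_type, user_id, properties=None):
    """Return one event dict carrying every field the pipeline SQL selects."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "timestamp": now,  # assumption: ISO 8601 string
        "user_id": user_id,
        "properties": properties or {},  # assumption: free-form JSON object
        "created_at": now,
    }


def encode_batch(events):
    """Serialize events as a JSON array, rejecting events with missing fields."""
    events = list(events)
    for event in events:
        missing = [field for field in EVENT_FIELDS if field not in event]
        if missing:
            raise ValueError(f"event is missing fields: {missing}")
    return json.dumps(events)


if __name__ == "__main__":
    batch = [build_event("page_view", "user-123", {"path": "/pricing"})]
    print(encode_batch(batch))
```

Validating the batch on the client side keeps malformed events from reaching the stream, where a missing column would otherwise only surface later in the Iceberg table.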
Querying Iceberg Tables
Once your data lands in Iceberg tables, you can query it using multiple engines. R2 Data Catalog exposes a standard Iceberg REST catalog interface, compatible with:
- R2 SQL: built-in SQL query interface in the Cloudflare dashboard
- PyIceberg: Python library for Iceberg table operations
- Apache Spark: Scala and PySpark connectors available
- DuckDB: fast OLAP queries with Iceberg catalog support
Example: Connecting with PyIceberg
from pyiceberg.catalog import load_catalog

# Placeholders: use the catalog URI and warehouse name reported for your
# bucket's Data Catalog, and an API token allowed to read the catalog and bucket.
CATALOG_URI = "<CATALOG_URI>"
WAREHOUSE = "<WAREHOUSE_NAME>"
TOKEN = "<API_TOKEN>"

catalog = load_catalog(
    "r2_catalog",
    type="rest",
    uri=CATALOG_URI,
    warehouse=WAREHOUSE,
    token=TOKEN,
)

# Namespace "analytics" and table "events" match the sink configuration
table = catalog.load_table("analytics.events")
print(table.scan().to_pandas())
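A full table scan like the one above reads every row and column. As a sketch of narrowing it, the helper below passes a row filter, a column selection, and a row limit to the table's scan method. The scan_events name and its defaults are hypothetical, the column names are taken from the pipeline SQL, and the string filter syntax should be checked against your PyIceberg version.

```python
def scan_events(table, event_type, columns=("event_id", "user_id", "timestamp"), limit=100):
    """Scan rows of a single event type, reading only the requested columns.

    `table` is expected to be a PyIceberg table such as the one returned by
    catalog.load_table("analytics.events").
    """
    # Keep the filter string well-formed: refuse values that would break quoting.
    if "'" in event_type:
        raise ValueError("event_type must not contain single quotes")

    scan = table.scan(
        row_filter=f"event_type == '{event_type}'",  # evaluated against Iceberg metadata first
        selected_fields=tuple(columns),              # column projection
        limit=limit,                                 # cap on returned rows
    )
    return scan.to_pandas()
```

Called as scan_events(table, "page_view", limit=10), this returns a pandas DataFrame with at most ten rows and three columns, which is usually a cheaper first look at a large table than converting the whole scan.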
R2 Sink vs Iceberg Sink
You can choose between two sink types depending on your data format requirements:
- R2 Data Catalog sink: writes Apache Iceberg tables and supports only the Parquet format. JSON output is not supported for Iceberg tables.
- R2 bucket sink: writes raw Parquet or JSON files to R2. To use it, replace the sink resource with an R2 sink, which requires R2 S3-compatible credentials instead of a catalog token. Choose this sink if you need JSON output.
Clean Up Resources
When you no longer need the pipeline, destroy the resources in the correct order to avoid orphaned dependencies:
# Destroy in reverse order of creation
terraform destroy -target cloudflare_pipeline.main_pipeline
terraform destroy -target cloudflare_pipeline_sink.iceberg_sink
terraform destroy -target cloudflare_pipeline_stream.events_stream
terraform destroy -target cloudflare_account_api_token.pipeline_token
terraform destroy -target cloudflare_r2_data_catalog.iceberg_catalog
terraform destroy -target cloudflare_r2_bucket.data_lake
Final Verdict
Cloudflare Pipelines + R2 Data Catalog with Terraform provides a production-ready data pipeline architecture. You get managed Iceberg tables with ACID transactions, zero egress fees, and full Infrastructure as Code support. The scoped API tokens ensure least-privilege security, while the REST catalog interface lets you connect your preferred query engines.
Last Updated: May 06, 2026 | Source: Cloudflare Developers Documentation (Official Website)