Announcing Native Flink SQL Support in Flinkflow

We’re excited to announce that Flinkflow now natively supports Apache Flink SQL as a first-class type: sql step in the YAML DSL. This means Data Analysts, SQL-native developers, and anyone comfortable with SQL can now contribute streaming logic directly into production Flinkflow pipelines — no Java, no Maven, no boilerplate.

Why Flink SQL?

SQL is the most widely understood data language in the world. By bringing Flink SQL into Flinkflow’s declarative model, we’re expanding the platform’s accessibility to an entirely new audience:

Data Analysts who think in SQL can now author production streaming transformations.
Existing Flink SQL users can embed their queries in Flinkflow YAML and gain Kubernetes-native deployment, GitOps workflows, and Agentic AI capabilities.
Polyglot teams can mix SQL with Python, Java, and Camel in a single pipeline — using the right tool for each step.

What You Can Do with `type: sql`

🔍 Filtering & Transformation

Write standard SQL queries against your streaming data:

- type: sql
  name: filter-active-users
  properties:
    schema.userId: "string"
    schema.age: "int"
    schema.status: "string"
    query: "SELECT userId, age FROM input WHERE status = 'active' AND age >= 18"

📊 Windowed Aggregations

Perform time-windowed analytics with watermark support:

- type: sql
  name: revenue-per-window
  properties:
    schema.productId: "string"
    schema.eventTime: "timestamp"
    schema.revenue: "double"
    watermark.column: "eventTime"
    watermark.delay: "5"
    query: |
      SELECT window_start, window_end, productId, SUM(revenue) AS total_revenue
      FROM TABLE(TUMBLE(TABLE input, DESCRIPTOR(eventTime), INTERVAL '1' MINUTE))
      GROUP BY window_start, window_end, productId

🔗 Multi-Table Joins

Join data from multiple upstream sources with per-table schemas:

- type: sql
  name: enriched-transactions
  inputs: [transactions, accounts]
  properties:
    schema.transactions.txId: "string"
    schema.transactions.accountId: "string"
    schema.transactions.amount: "double"
    schema.accounts.accountId: "string"
    schema.accounts.ownerName: "string"
    schema.accounts.riskLevel: "string"
    query: |
      SELECT t.txId, a.ownerName, t.amount, a.riskLevel
      FROM transactions t
      JOIN accounts a ON t.accountId = a.accountId
      WHERE t.amount > 10000.0

📤 Changelog Output

Control how stateful query results are emitted downstream:

- type: sql
  name: running-totals
  properties:
    schema.userId: "string"
    schema.amount: "double"
    outputMode: "changelog"
    query: "SELECT userId, SUM(amount) AS total_spent FROM input GROUP BY userId"

Output modes: append (default), changelog (for stateful aggregations), or auto (FlinkFlow chooses).

Schema Validation at Load Time

Unlike standalone Flink SQL where schema errors surface at runtime, FlinkFlow validates your entire SQL step configuration at pipeline load time:

✅ Schema type validation (string, int, long, float, double, boolean, timestamp, decimal)
✅ Watermark column existence and type checking
✅ Output mode validation
✅ Multi-table schema namespace verification
✅ Missing query detection

Errors are caught before your job ever touches a Flink cluster.

Fits Right Into the Polyglot Model

The SQL step is not a separate system — it’s a native step type that integrates seamlessly with FlinkFlow’s polyglot engine. A single pipeline can flow through SQL, Python, Java, and Camel steps:

Source → SQL (filter/join) → Python (ML scoring) → Camel (routing) → Sink

This is how modern data teams work: each contributor uses the language they know best, and FlinkFlow orchestrates the entire pipeline declaratively.

How to Get Started

If you’re already using Flinkflow, simply add type: sql steps to your existing YAML pipelines. No new dependencies or configuration required.

If you’re new to Flinkflow:

Clone the repository: git clone https://github.com/talwegai/flinkflow
Explore SQL examples in the examples/ directory
Run locally: ./scripts/run-local.sh your-sql-pipeline.yaml
Read the full docs at talwegai.github.io/flinkflow

What’s Next

This is just the beginning of our SQL journey. On the roadmap:

Temporal table joins for slowly changing dimension enrichment
SQL-based Flowlets — reusable, parameterized SQL components
Catalog integration (Hive Metastore, AWS Glue) for shared schema management
Interactive SQL console in the Flinkflow dashboard

We’d love your feedback. Join us on Zulip or open an issue on GitHub.

Flinkflow: SQL, Python, Java, and Camel — one pipeline, zero boilerplate.

Announcing Native Flink SQL Support in Flinkflow

Why Flink SQL?

What You Can Do with `type: sql`

🔍 Filtering & Transformation

📊 Windowed Aggregations

🔗 Multi-Table Joins

📤 Changelog Output

Schema Validation at Load Time

Fits Right Into the Polyglot Model

How to Get Started

What’s Next

Related Posts

Real-Time Cybersecurity: Low-Latency Log Joins & Enrichment

Integrating Real-Time Data Streams with AI Models: The Agentic Future

Why FlinkFlow Outshines Flink SQL: A Practical Comparison

Getting Started with Flinkflow: Democratizing Real-Time Data Streaming

Why Flink SQL?

What You Can Do with type: sql

🔍 Filtering & Transformation

📊 Windowed Aggregations

🔗 Multi-Table Joins

📤 Changelog Output

Schema Validation at Load Time

Fits Right Into the Polyglot Model

How to Get Started

What’s Next

Related Posts

Real-Time Cybersecurity: Low-Latency Log Joins & Enrichment

Integrating Real-Time Data Streams with AI Models: The Agentic Future

Why FlinkFlow Outshines Flink SQL: A Practical Comparison

Getting Started with Flinkflow: Democratizing Real-Time Data Streaming

What You Can Do with `type: sql`