
They say curiosity killed the cat, but in my case, it just killed my weekend. It started innocently enough: "How is Iceberg's Merge-on-Read implemented in StarRocks?" I asked myself, blissfully unaware that I was about to descend into a rabbit hole of queues, bitmaps, and anti-joins that would make Alice's adventure look like a casual stroll.
You see, most people encounter Iceberg MoR tables and think, "Cool, it handles deletes efficiently." Normal people. Reasonable people. But not me. No, I had to know how StarRocks actually pulls this off. What started as a simple grep for "iceberg" in the codebase turned into a three-day odyssey through Java optimizers, Thrift serialization, and C++ scanners that left me questioning my life choices and developing an unhealthy relationship with queue data structures.
The thing about MoR is that it sounds deceptively simple: "Just track the deletes and apply them when reading." Sure, just like building a house is "just stacking some bricks." But when you have position deletes that need bitmaps, equality deletes that need hash joins, multiple equality ID combinations that need separate queues, and all of this needs to work at scale while being parallelizable… well, suddenly you're not in Shoreditch anymore — you're deep in the code wilderness.
So grab your coffee (you'll need it), and let me take you on the journey I wish someone had documented before I started. We'll trace how a simple SELECT statement becomes an elaborate dance of frontend orchestration and backend execution, where delete files are first-class citizens and queues are the unsung heroes of the story.
Warning:
Side effects may include newfound appreciation for query optimizers and an irresistible urge to explain queue-based architectures at parties. All code in this article relates to StarRocks 3.5.4.
Deletes are different
Iceberg's merge-on-read mode revolves around two kinds of delete files: positional deletes and equality deletes. StarRocks treats them fundamentally differently.
Two flavours
Positional deletes are handled immediately during scanning. They encode (file_path, pos) pairs that tell the scanner exactly which rows to skip. The backend builds a bitmap and checks it inline - deleted rows never even materialize into chunks.
Equality deletes take a completely different path. They don't point at positions but carry key values (like user_id = 42). These become separate scan ranges that later participate in anti-joins to filter out matching rows.
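To make the two shapes concrete, here is an illustrative Java sketch (not StarRocks code; the paths, field IDs, and values are made up) of what a single row of each delete-file flavour logically carries:
// Illustrative only: the logical payload of the two delete-file flavours.
record PositionDelete(String filePath, long pos) {}   // "skip row `pos` of this exact data file"
record EqualityDelete(int fieldId, Object value) {}   // "drop every row where this field equals this value"

var posDelete = new PositionDelete("s3://bucket/data/00001.parquet", 42L);
var eqDelete = new EqualityDelete(5, 42L);            // field 5 could be user_id, for example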

Why the different treatment?
Position deletes are cheap: just a bitmap check in the hot path. Equality deletes require matching on arbitrary columns, which needs the full power of the hash join machinery. The frontend decides which path each file takes:
// IcebergMORParams.java - The routing decision
enum ScanTaskType {
    DATA_FILE_WITHOUT_EQ_DELETE, // Fast path - no equality deletes
    DATA_FILE_WITH_EQ_DELETE,    // Needs anti-join treatment
    EQ_DELETE,                   // The equality delete files themselves
    ORIGIN                       // Tables without identifier columns
}
Schema evolution resilience
Because Iceberg supports schema and partition spec evolution, delete files reference columns by field IDs rather than names. If you rename a column from customer_id to client_id, the equality deletes still work because they reference field ID 5, not the column name. StarRocks preserves these IDs throughout.
These IDs and spec metadata are then shipped to the backend via Thrift:
// gensrc/thrift/PlanNodes.thrift#L311
enum TIcebergFileContent {
    DATA,
    POSITION_DELETES,
    EQUALITY_DELETES,
}
// gensrc/thrift/PlanNodes.thrift#L317
struct TIcebergDeleteFile {
    1: optional string full_path
    2: optional Descriptors.THdfsFileFormat file_format
    3: optional TIcebergFileContent file_content
    4: optional i64 length
}
// gensrc/thrift/PlanNodes.thrift#L340
struct THdfsScanRange {
    21: optional list<Types.TSlotId> delete_column_slot_ids;
    28: optional map<Types.TSlotId, Exprs.TExpr> extended_columns; // e.g. spec_id
}
This ensures that equality deletes remain valid even if columns are renamed or repartitioned.
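As a quick illustration of why field IDs matter, here is a small sketch using the Apache Iceberg Java API (the field ID 5 and the column names are illustrative): an equality delete written before a rename still resolves to the right column.
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

// The table's current schema after renaming customer_id -> client_id;
// the column keeps field ID 5 across the rename.
Schema current = new Schema(
        Types.NestedField.required(5, "client_id", Types.LongType.get()));

// The equality delete file references field ID 5, not a column name:
String column = current.findField(5).name();   // -> "client_id"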
In short:
- Positional deletes are consumed directly in backend scanners with a skip bitmap.
- Equality deletes are not; they're carried as separate ranges and applied later.
- Field IDs and spec IDs make equality deletes robust to schema evolution.
So, the rest is optional; feel free to stop here and save your mental health.

Frontend: constructing MoR scan ranges
When StarRocks plans a query against an Iceberg merge-on-read table, the frontend decides how to split work between data file scans with positional deletes and stand-alone equality delete ranges.
Data files with positional deletes attached
When the frontend encounters a data file, it attaches only the positional delete files and skips equality ones:
// IcebergConnectorScanRangeSource.java#L155
private List<TScanRangeLocations> buildScanRanges(FileScanTask task, Long partitionId)
        throws AnalysisException {
    THdfsScanRange hdfsScanRange = buildScanRange(task, task.file(), partitionId);
    List<TIcebergDeleteFile> posDeleteFiles = new ArrayList<>();
    for (DeleteFile deleteFile : task.deletes()) {
        FileContent content = deleteFile.content();
        if (content == FileContent.EQUALITY_DELETES) {
            continue; // Skip these - they'll be handled separately
        }
        TIcebergDeleteFile target = new TIcebergDeleteFile();
        target.setFull_path(deleteFile.path().toString());
        target.setFile_content(TIcebergFileContent.POSITION_DELETES);
        target.setLength(deleteFile.fileSizeInBytes());
        posDeleteFiles.add(target);
    }
    if (!posDeleteFiles.isEmpty()) {
        hdfsScanRange.setDelete_files(posDeleteFiles);
    }
    return Lists.newArrayList(buildTScanRangeLocations(hdfsScanRange));
}
The Queue Orchestra: Managing scan tasks at scale
Here's where things get interesting. StarRocks doesn't just blindly process files one by one. Instead, it orchestrates scan tasks through a sophisticated queue system that categorizes and routes work based on delete file associations.

Enter the IcebergRemoteSourceTrigger
Think of IcebergRemoteSourceTrigger as a traffic controller at a busy intersection. It receives file scan tasks from Iceberg and routes them to the appropriate processing lanes:
// IcebergRemoteSourceTrigger.java#L39
public class IcebergRemoteSourceTrigger {
    private final RemoteFileInfoSource delegate;
    // The three main highways for our scan tasks
    private Optional<Deque<RemoteFileInfo>> dataFileWithEqDeleteQueue = Optional.empty();
    private Optional<Deque<RemoteFileInfo>> dataFileWithoutEqDeleteQueue = Optional.empty();
    private Optional<Deque<RemoteFileInfo>> eqDeleteQueue = Optional.empty();
    // For the complicated case: multiple equality ID combinations
    private final Map<IcebergMORParams, Deque<RemoteFileInfo>> queues = new HashMap<>();
}
Why separate queues? Performance. Clean data files (no equality deletes) can zoom through the fast lane. Files with equality deletes need to merge at the anti-join intersection. And equality delete files themselves? They're building the roadblock that the anti-join will use.
The dispatch logic
The trigger's dispatch method is where the routing happens. It's like a sorting hat for scan tasks:
// IcebergRemoteSourceTrigger.java#L99
public synchronized void trigger() {
    if (!delegate.hasMoreOutput()) {
        return;
    }
    IcebergRemoteFileInfo remoteFileInfo = delegate.getOutput().cast();
    FileScanTask scanTask = remoteFileInfo.getFileScanTask();
    // Check what kind of deletes we're dealing with
    List<DeleteFile> eqDeleteFiles = scanTask.deletes().stream()
            .filter(f -> f.content() == FileContent.EQUALITY_DELETES)
            .collect(Collectors.toList());
    // Route to the appropriate queue
    if (eqDeleteFiles.isEmpty()) {
        // Fast lane - no equality deletes!
        dataFileWithoutEqDeleteQueue.map(queue -> queue.add(remoteFileInfo));
    } else {
        // Needs anti-join processing
        dataFileWithEqDeleteQueue.map(queue -> queue.add(remoteFileInfo));
        if (!needToCheckEqualityIds) {
            // Simple case: all equality deletes use the same columns
            eqDeleteQueue.map(queue -> queue.add(remoteFileInfo));
        } else {
            // Complex case: different equality ID combinations
            // A data file might have multiple eq delete files with different IDs
            // We need to fill ALL corresponding queues
            // (more on this nightmare scenario below)
        }
    }
}
The multiple equality IDs nightmare
Here's a fun scenario that'll keep you up at night. Imagine a table where different equality delete files use different identifier columns:
- Delete file 1: uses column user_id
- Delete file 2: uses columns user_id and timestamp
- Delete file 3: uses column transaction_id
A single data file might need to be checked against all three! Miss one queue, and you've got phantom data appearing in your results. The comment in the code puts it perfectly:
// IcebergRemoteSourceTrigger.java#L119-122
// now the different equality ids have its own file info queue.
// but a data file may corresponding to many eq delete files.
// and these eq delete files may not have the same equality ids,
// we should fill in all the corresponding queue.
// otherwise delete table scan with certain equality ids can get no results
and miss the join result.
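To make that fan-out concrete, here is a JDK-only sketch (the real code keys its queues by IcebergMORParams; reducing the key to the set of equality field IDs is a simplification of mine):
import java.util.*;

// Sketch: route one data file into every queue whose equality-ID combination
// appears among that file's equality delete files.
Map<Set<Integer>, Deque<String>> queuesByEqualityIds = Map.of(
        Set.of(5), new ArrayDeque<>(),        // deletes keyed on user_id
        Set.of(5, 7), new ArrayDeque<>(),     // deletes keyed on user_id + timestamp
        Set.of(9), new ArrayDeque<>());       // deletes keyed on transaction_id

String dataFile = "s3://bucket/data/00001.parquet";
List<Set<Integer>> idsOfItsEqDeletes = List.of(Set.of(5), Set.of(9));

for (Set<Integer> ids : idsOfItsEqDeletes) {
    // Missing one of these adds is exactly the "phantom data" failure mode described above.
    queuesByEqualityIds.get(ids).add(dataFile);
}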
When does the rewrite trigger?
The IcebergEqualityDeleteRewriteRule has specific conditions before it rewrites anything:
// IcebergEqualityDeleteRewriteRule.java#L75
@Override
public boolean check(final OptExpression input, OptimizerContext context) {
    Operator op = input.getOp();
    if (!op.isLogical()) {
        return false;
    }
    LogicalIcebergScanOperator scan = (LogicalIcebergScanOperator) op;
    // Already processed? Skip it
    if (scan.isFromEqDeleteRewriteRule()) {
        return false;
    }
    // No equality deletes? Nothing to do here
    if (!scan.hasEqualityDeletes()) {
        return false;
    }
    return true;
}
Incremental processing: Not drowning in metadata
For tables with thousands of files, loading all metadata upfront would be like downloading an entire Netflix series when you just want to watch one episode. StarRocks implements incremental scan range generation to avoid this:

The batching mechanism
// IcebergConnectorScanRangeSource.java#L121
@Override
public List<TScanRangeLocations> getSourceOutputs(int maxSize) {
    try (Timer ignored = Tracers.watchScope(EXTERNAL, "ICEBERG.getScanFiles")) {
        List<TScanRangeLocations> res = new ArrayList<>();
        while (hasMoreOutput() && res.size() < maxSize) {
            RemoteFileInfo remoteFileInfo = remoteFileInfoSource.getOutput();
            IcebergRemoteFileInfo icebergRemoteFileInfo = remoteFileInfo.cast();
            FileScanTask fileScanTask = icebergRemoteFileInfo.getFileScanTask();
            res.addAll(toScanRanges(fileScanTask));
        }
        return res;
    }
}
The maxSize parameter (controlled by connector_incremental_scan_ranges_size) determines how many scan tasks to generate per batch. Too small, and you're constantly asking for more work. Too large, and you risk OOM on metadata.
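A hypothetical driver loop (dispatchToBackends is a name I made up, not StarRocks code) shows how such batching is meant to be consumed:
// Hypothetical consumer: drain the scan-range source in bounded batches.
int batchSize = 500;   // connector_incremental_scan_ranges_size
while (scanRangeSource.hasMoreOutput()) {
    List<TScanRangeLocations> batch = scanRangeSource.getSourceOutputs(batchSize);
    dispatchToBackends(batch);   // hypothetical: hand this batch to the scheduler
}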
Async metadata fetching
When enable_connector_incremental_scan_ranges is enabled, StarRocks can fetch metadata asynchronously:
// IcebergScanNode.java#L163-172
if (enableIncrementalScanRanges) {
    remoteFileInfoSource = GlobalStateMgr.getCurrentState()
            .getMetadataMgr().getRemoteFilesAsync(icebergTable, params);
} else {
    List<RemoteFileInfo> splits = GlobalStateMgr.getCurrentState()
            .getMetadataMgr().getRemoteFiles(icebergTable, params);
    remoteFileInfoSource = new RemoteFileInfoDefaultSource(splits);
}
The async version returns immediately with a handle, letting the query start executing while metadata loads in the background. It's like ordering your coffee on the app while you're still driving to the store.
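Conceptually it boils down to something like this JDK-only sketch (loadManifests is a made-up stand-in for the metadata listing, not a StarRocks API):
import java.util.concurrent.*;

// Sketch: load file metadata in the background while planning starts
// consuming whatever has already arrived.
BlockingQueue<RemoteFileInfo> buffer = new LinkedBlockingQueue<>();
ExecutorService pool = Executors.newSingleThreadExecutor();
pool.submit(() -> {
    for (RemoteFileInfo info : loadManifests(icebergTable)) {   // hypothetical loader
        buffer.add(info);
    }
});
// The planner never waits for the full listing; it polls incrementally
// (null simply means nothing has arrived yet):
RemoteFileInfo next = buffer.poll();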

Smart optimizations: Not all deletes are created equal
Position delete pruning
Here's a clever trick. Not every position delete file affects every data file. StarRocks checks if a position delete actually references the current data file:
// IcebergConnectorScanRangeSource.java#L201-210
for (DeleteFile del : fileScanTask.deletes()) {
    if (del.content() == FileContent.POSITION_DELETES) {
        int filePathId = 2147483546; // Field ID of file_path in position delete files (Iceberg spec)
        // No referenced data file and differing file_path bounds?
        // Then this is a partition-level delete spanning multiple files - it might affect this one.
        if (del.referencedDataFile() == null &&
                del.lowerBounds() != null &&
                !del.lowerBounds().get(filePathId).equals(
                        del.upperBounds().get(filePathId))) {
            res.add(fileScanTask);
        }
        // File-level delete that targets a different file? Skip it!
    }
}
This optimization can save massive amounts of I/O when position deletes are file-specific rather than partition-wide.
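The test itself is simple; here is a simplified sketch (plain strings stand in for Iceberg's serialized bounds, and the paths are made up):
// Sketch: a position delete whose file_path lower and upper bounds are equal
// can only reference that single data file; skip it for every other file.
String currentDataFile = "s3://bucket/data/00007.parquet";
String lowerBound = "s3://bucket/data/00001.parquet";
String upperBound = "s3://bucket/data/00001.parquet";

boolean targetsSingleFile = lowerBound.equals(upperBound);
boolean mayAffectCurrentFile = !targetsSingleFile || lowerBound.equals(currentDataFile);
if (!mayAffectCurrentFile) {
    // Prune: no need to read this delete file at all for currentDataFile.
}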
Manifest pruning
Before even looking at files, StarRocks can prune entire manifests:
// Session variable: enable_prune_iceberg_manifest
// Skips manifests that don't overlap with query predicates
It's like checking the table of contents before reading every chapter of a book.
The Physical Plan: IcebergScanNode
After the optimizer splits the logical scan into multiple operations, each becomes an IcebergScanNode in the physical plan. This is where the MoR strategy becomes concrete:
// fe/fe-core/src/main/java/com/starrocks/planner/IcebergScanNode.java#L60
public class IcebergScanNode extends ScanNode {
    private final IcebergTableMORParams tableFullMORParams;
    private final IcebergMORParams morParams;
    // ...
}
Each IcebergScanNode instance knows its specific role through these parameters. The critical setup happens in two methods:
Setting up the queue-specific view
// IcebergScanNode.java#L130
if (morParams != IcebergMORParams.EMPTY) {
    boolean needToCheckEqualityIds = tableFullMORParams.size() != 3;
    IcebergRemoteSourceTrigger trigger = new IcebergRemoteSourceTrigger(
            remoteFileInfoSource, morParams, needToCheckEqualityIds);
    Deque<RemoteFileInfo> remoteFileInfoDeque = trigger.getQueue(morParams);
    remoteFileInfoSource = new QueueIcebergRemoteFileInfoSource(trigger, remoteFileInfoDeque);
}
This wraps the file source with a queue-specific view. Each scan node only sees files relevant to its role — a clean separation of concerns.
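Conceptually, the wrapper looks something like the following minimal sketch (not the actual QueueIcebergRemoteFileInfoSource; hasMoreInput() is a stand-in name for however the trigger reports pending work):
import java.util.Deque;

// Minimal sketch: expose a plain file-info source while draining one specific
// queue, pulling the shared trigger whenever that queue runs dry.
class QueueBackedSource {
    private final IcebergRemoteSourceTrigger trigger;
    private final Deque<RemoteFileInfo> queue;

    QueueBackedSource(IcebergRemoteSourceTrigger trigger, Deque<RemoteFileInfo> queue) {
        this.trigger = trigger;
        this.queue = queue;
    }

    boolean hasMoreOutput() {
        while (queue.isEmpty() && trigger.hasMoreInput()) {   // stand-in for the real check
            trigger.trigger();   // may fill this queue or a sibling's
        }
        return !queue.isEmpty();
    }

    RemoteFileInfo getOutput() {
        return queue.poll();
    }
}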
The incremental vs batch decision
The scan node can operate in two modes:
// IcebergScanNode.java#L119
if (enableIncrementalScanRanges) {
    // Async mode - returns immediately with a handle
    remoteFileInfoSource = GlobalStateMgr.getCurrentState().getMetadataMgr()
            .getRemoteFilesAsync(icebergTable, params);
} else {
    // Batch mode - loads all metadata upfront
    List<RemoteFileInfo> splits = GlobalStateMgr.getCurrentState().getMetadataMgr()
            .getRemoteFiles(icebergTable, params);
    remoteFileInfoSource = new RemoteFileInfoDefaultSource(splits);
    // ... then wraps with queue if needed
}
Backend Execution: Where Theory Meets Reality
The BE doesn't know what an "IcebergScanNode" is — and it doesn't need to. Through Thrift serialization, the Java planner's sophisticated MoR strategy becomes simple HDFS scan ranges with delete files attached.

The Translation Layer
When the FE's IcebergScanNode serializes itself, it becomes an ordinary HDFS_SCAN_NODE:
// IcebergScanNode.java#L292
@Override
protected void toThrift(TPlanNode msg) {
    msg.node_type = TPlanNodeType.HDFS_SCAN_NODE;
    THdfsScanNode tHdfsScanNode = new THdfsScanNode();
    tHdfsScanNode.setTuple_id(desc.getId().asInt());
    msg.hdfs_scan_node = tHdfsScanNode;
}
The Iceberg-specific magic travels in the scan ranges:
// PlanNodes.thrift#L340
struct THdfsScanRange {
    21: optional list<Types.TSlotId> delete_column_slot_ids;
    22: optional list<TIcebergDeleteFile> delete_files; // The secret sauce
    28: optional map<Types.TSlotId, Exprs.TExpr> extended_columns;
}
IcebergScanNode produces a TPlanNode, which is serialized into THdfsScanNode and a list of TScanRange objects. Each scan range carries metadata for positional deletes, equality deletes, and extended columns before being dispatched for execution.
Position Deletes: The Bitmap Tarantella
When a scanner encounters position delete files, IcebergDeleteBuilder springs into action:
// iceberg_delete_builder.cpp#L66-72
Status IcebergDeleteBuilder::fill_skip_rowids(const ChunkPtr& chunk) const {
    const ColumnPtr& file_path = chunk->get_column_by_slot_id(k_delete_file_path.id);
    const ColumnPtr& pos = chunk->get_column_by_slot_id(k_delete_file_pos.id);
    for (int i = 0; i < chunk->num_rows(); i++) {
        if (file_path->get(i).get_slice() == _params.path) {
            _deletion_bitmap->add_value(pos->get(i).get_int64());
        }
    }
    return Status::OK();
}
The builder reads the delete file (Parquet or ORC), extracts file_path and pos columns, and builds a bitmap. These special columns are identified by magic slot IDs:
// iceberg_delete_builder.cpp#L35
static const IcebergColumnMeta k_delete_file_path{
        .id = INT32_MAX - 101, // Magic number recognized by BE
        .col_name = "file_path",
        .type = TPrimitiveType::VARCHAR
};
The bitmap is then checked during data file scanning — deleted positions simply never materialize into chunks. It's elegant: a simple bitmap check in the hot path instead of complex join logic.
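The hot-path check reduces to something like this JDK sketch (a java.util.BitSet standing in for the real deletion bitmap):
import java.util.BitSet;

// Sketch: positions present in the bitmap never make it into an output chunk.
BitSet deleted = new BitSet();
deleted.set(3);    // row 3 of this data file was deleted
deleted.set(17);

long[] positions = {0, 1, 2, 3, 4, 17, 18};
for (long pos : positions) {
    if (deleted.get((int) pos)) {
        continue;  // skip the deleted row; no join machinery involved
    }
    // emit the row at `pos` into the output chunk
}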
Equality Deletes: The Hash Join Tango
For equality deletes, the BE leverages its existing hash join machinery, but with a twist. The HashJoiner recognizes it's handling MoR through a special parameter:
// hash_joiner.cpp#L82 (constructor)
HashJoiner::HashJoiner(const HashJoinerParam& param)
        : _mor_reader_mode(param._mor_reader_mode), // MoR awareness!
          // ... other initialization
The hash table parameters get MoR-specific configuration:
// hash_joiner.cpp#L148
void HashJoiner::_init_hash_table_param(HashTableParam* param, RuntimeState* state) {
    param->mor_reader_mode = _mor_reader_mode; // Propagate MoR mode
    // ... rest of configuration
}
The join operates as an anti-join: data files on the build side, equality delete files on the probe side. Rows that match equality delete keys get filtered out.
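Whichever side builds the hash table, the observable semantics reduce to key membership, as in this JDK-only sketch (values are illustrative):
import java.util.List;
import java.util.Set;

// Sketch: equality delete keys form the filter set; data rows whose key
// matches are dropped, everything else survives (anti-join semantics).
Set<Long> deletedUserIds = Set.of(42L, 1001L);           // from equality delete files
List<Long> dataUserIds = List.of(7L, 42L, 99L, 1001L);   // from the data file

List<Long> survivors = dataUserIds.stream()
        .filter(id -> !deletedUserIds.contains(id))
        .toList();                                        // [7, 99]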
The Metrics Trail
The BE tracks MoR overhead separately from regular scan metrics:
// iceberg_delete_builder.cpp#L229
void IcebergDeleteBuilder::update_delete_file_io_counter(RuntimeProfile* parent_profile, ...) {
    const std::string ICEBERG_TIMER = "ICEBERG_V2_MOR";
    ADD_COUNTER(parent_profile, ICEBERG_TIMER, TUnit::NONE);
    // Track delete file I/O separately
    ADD_CHILD_COUNTER(parent_profile, "MOR_AppIOBytesRead", TUnit::BYTES, prefix);
    ADD_CHILD_TIMER(parent_profile, "MOR_AppIOTime", prefix);
    // Track bitmap operations
    ADD_CHILD_COUNTER(parent_profile, "MOR_FSIOBytesRead", TUnit::BYTES, prefix);
    // ... more metrics
}
For hash joins handling equality deletes:
// hash_joiner.cpp#L40
void HashJoinProbeMetrics::prepare(RuntimeProfile* runtime_profile) {
    search_ht_timer = ADD_TIMER(runtime_profile, "SearchHashTableTime");
    other_join_conjunct_evaluate_timer = ADD_TIMER(runtime_profile, "OtherJoinConjunctEvaluateTime");
    // Tracks the anti-join overhead
}
Position deletes go down the IcebergDeleteBuilder path, while equality deletes pass through a hash joiner. Both lanes produce cleaned ranges that feed into result chunks and execution metrics.
Why This Design Works
The BE's approach is pragmatic:
- Reuse existing infrastructure: HDFS scanners for file reading, hash joiners for equality deletes
- Simple abstractions: Delete files are just metadata attached to scan ranges
- Performance isolation: Position deletes use fast bitmaps, equality deletes use proven hash join code
- Clear metrics: MoR overhead is tracked separately, making performance tuning straightforward
The BE doesn't need to understand Iceberg's complex MoR semantics. It just sees:
- Files to scan with bitmaps to check (position deletes)
- Two sets of files to join (equality deletes)
This separation of concerns — FE handles complexity, BE handles execution — is what makes StarRocks' MoR implementation both sophisticated and efficient.
The Layered Abstraction
What makes StarRocks' design distinctive is this clean layering:
- Optimization Layer (IcebergEqualityDeleteRewriteRule): Decides on the MoR strategy
- Planning Layer (IcebergScanNode): Creates queue-based file sources
- Execution Layer (IcebergDeleteBuilder and HashJoin): Consumes pre-organized scan ranges
The QueueIcebergRemoteFileInfoSource wrapper is the key abstraction - it presents a unified interface while actually pulling from specific queues. The backend executors don't need to understand the full MoR complexity; they just process scan ranges.
This contrasts with other engines:
- Spark: Makes delete handling decisions at task runtime
- Trino: Builds delete filters on-demand during execution
- Dremio: Integrates delete handling into vectorized execution
StarRocks trades runtime flexibility for predictability and simpler backend execution.
Putting it all together: The complete pipeline
Let's trace a query through the entire system:

The FE applies IcebergEqualityDeleteRewriteRule, generates TPlanNode variants for data with and without equality deletes, and serializes them into THdfsScanRange. In the BE/CN, positional deletes are applied with IcebergDeleteBuilder, equality deletes flow through a hash joiner, and the final cleaned chunks are merged into query results.
- Query arrives: SELECT * FROM iceberg_table WHERE date = '2025-01-01'
- Optimizer kicks in: IcebergEqualityDeleteRewriteRule examines the table's delete files
- Scan node splits: If equality deletes exist, creates multiple scan operators: one for data files without equality deletes, one for data files with equality deletes, one for each unique equality delete column combination
- Queue initialization: IcebergRemoteSourceTrigger sets up the appropriate queues
- Metadata fetching begins: Async if enabled, pulling manifest files
- Scan task routing: Each FileScanTask gets sorted into its queue
- Incremental generation: Scan ranges produced in batches of 500 (or configured size)
- Backend execution: Fast-lane files processed directly with position delete bitmaps, equality delete files scanned for their key values, anti-joins applied for data files with equality deletes
- Results merged: All paths converge to produce the final result set
Configuration knobs you should know
# Enable incremental processing for large tables
enable_connector_incremental_scan_ranges
# Batch size for scan range generation (memory vs latency tradeoff)
connector_incremental_scan_ranges_size
# Async queue size for parallel metadata fetching
connector_remote_file_async_queue_size
# Prune manifests that don't match predicates
enable_prune_iceberg_manifest
# Handle tables with partition evolution
enable_read_iceberg_equality_delete_with_partition_evolution
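For example, assuming the first two are exposed as session variables in your deployment, they can be toggled per session over the MySQL protocol (host, port, and credentials below are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Toggle incremental scan-range generation for one session, then run the query.
try (Connection conn = DriverManager.getConnection(
        "jdbc:mysql://fe-host:9030/iceberg_catalog.iceberg_db", "user", "password");
     Statement stmt = conn.createStatement()) {
    stmt.execute("SET enable_connector_incremental_scan_ranges = true");
    stmt.execute("SET connector_incremental_scan_ranges_size = 500");
    stmt.executeQuery("SELECT * FROM iceberg_table WHERE date = '2025-01-01'");
}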
The beauty of this design
What makes StarRocks' approach elegant is the separation of concerns:
- Frontend handles all the metadata complexity and routing logic
- Backend just executes simple operations: scan with bitmap, or scan for anti-join
- Queues provide natural parallelism and work distribution
- Incremental processing keeps memory bounded regardless of table size
It turns Iceberg's complex MoR semantics into well-defined, parallelizable operations. The backend doesn't need to understand the full complexity of Iceberg's specification — it just needs to execute bitmap checks for position deletes and hash joins for equality deletes, using work packages the frontend has carefully prepared and labeled with MoR parameters.
The key insight is that the BE understands what to do (check bitmaps, perform anti-joins) but doesn't need to understand why (Iceberg's MoR specification, time-travel semantics, etc.). The FE handles the complexity of deciding which files need which treatment.
