Support for Window Functions
Initial (phase 1) Query runtime for window functions with ORDER BY within the OVER() clause (#10449)
Support for the ranking ROW_NUMBER() window function (#10527, #10587)
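As an illustration of what ROW_NUMBER() with an ORDER BY inside OVER() computes, here is a minimal sketch in plain Python. It is not Pinot code; the function name `row_number`, the column names, and the sample rows are hypothetical and only show the semantics: rows are split into partitions, ordered within each partition, and numbered from 1.

```python
from collections import defaultdict
from operator import itemgetter


def row_number(rows, partition_by, order_by):
    """Return copies of rows with a 1-based 'row_number' assigned per partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    numbered = []
    for key in partitions:
        # Order rows inside the partition, then number them starting at 1.
        ordered = sorted(partitions[key], key=itemgetter(order_by))
        for position, row in enumerate(ordered, start=1):
            numbered.append({**row, "row_number": position})
    return numbered


rows = [
    {"team": "a", "score": 30},
    {"team": "b", "score": 10},
    {"team": "a", "score": 20},
]
result = row_number(rows, partition_by="team", order_by="score")
```

Each partition restarts the numbering, which is what distinguishes ROW_NUMBER() from a global row counter.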
Set Operations Support
Support SetOperations (UNION, INTERSECT, MINUS) compilation in query planner (#10535)
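The three set operations can be pictured with Python sets of row tuples. This is a sketch of the distinct (non-ALL) semantics only, with hypothetical sample rows; it says nothing about how the planner compiles these operations.

```python
def union(left, right):
    """UNION: rows present in either input, duplicates removed."""
    return set(left) | set(right)


def intersect(left, right):
    """INTERSECT: rows present in both inputs."""
    return set(left) & set(right)


def minus(left, right):
    """MINUS: rows present in the left input but not in the right."""
    return set(left) - set(right)


left_rows = [(1, "a"), (2, "b"), (2, "b")]
right_rows = [(2, "b"), (3, "c")]

union_rows = union(left_rows, right_rows)
intersect_rows = intersect(left_rows, right_rows)
minus_rows = minus(left_rows, right_rows)
```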
Timestamp and Date Operations
Support TIMESTAMP type and date ops functions (#11350)
Aggregate Functions
Support more aggregation functions that are currently implementable (#11208)
Support multi-value aggregation functions (#11216)
Support Sketch based functions (#)
Make Intermediate Stage Worker Assignment Tenant Aware (#10617)
Evaluate literal expressions during query parsing, enabling more efficient query execution. (#11438)
Added support for partition parallelism in partitioned table scans, allowing for more efficient data retrieval (#11266).
Add more tuple sketch scalar functions and integration tests for the multi-stage engine (#11517)
Multistage engine enhancements
Turn on v2 engine by default (#10543)
Introduced the ability to stream leaf stage blocks for more efficient data processing (#11472).
Early terminate SortOperator if there is a limit (#11334)
Implement ordering for SortExchange (#10408)
Table level Access Validation, QPS Quota, Phase Metrics for multistage queries (#10534)
Support partition based leaf stage processing (#11234)
Populate queryOption down to leaf (#10626)
Pushdown explain plan queries from the controller to the broker (#10505)
Enhanced the multi-stage group-by executor to support limiting the number of groups, improving query performance and resource utilization (#11424).
Improved resilience and reliability of the multi-stage join operator, now with added support for hash join right table protection (#11401).
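To give a feel for why a LIMIT helps the sort path mentioned above, the sketch below selects the first N rows in sort order without sorting the whole input. It uses a bounded top-N selection from Python's standard library as a stand-in; the actual mechanism in #11334 may differ, and `sort_with_limit` is a hypothetical name.

```python
import heapq


def sort_with_limit(rows, key, limit):
    """Return the `limit` smallest rows by `key`, in sorted order.

    heapq.nsmallest keeps at most `limit` candidates while scanning the
    input, so memory stays bounded by the limit rather than the input size.
    """
    return heapq.nsmallest(limit, rows, key=key)


rows = [5, 1, 4, 2, 3]
top_two = sort_with_limit(rows, key=lambda value: value, limit=2)
```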
Multistage engine bug fixes
Fix Predicate Pushdown by Using Rule Collection (#10409)
Try fixing mailbox cancel race condition (#10432)
Catch Throwable to Propagate Proper Error Message (#10438)
Fix tenant detection issues (#10546)
Handle Integer.MIN_VALUE in hashCode based FieldSelectionKeySelector (#10596)
Improve error message in case of non-existent table queried from the controller (#10599)
Derive SUM return type to be PostgreSQL compatible (#11151)
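The Integer.MIN_VALUE item in the list above refers to a well-known pitfall: in Java, `Math.abs(Integer.MIN_VALUE)` is still negative, so `Math.abs(hashCode) % n` can yield a negative partition index. The sketch below emulates Java's 32-bit arithmetic in Python to show the problem and one common remedy (masking the sign bit). The helper names are hypothetical, and the fix applied in #10596 may take a different form.

```python
INT_MIN = -2**31


def java_abs(value):
    """Emulate Java's Math.abs on a 32-bit int, including its overflow."""
    result = abs(value)
    # Wrap into the signed 32-bit range, as Java int arithmetic does.
    return (result + 2**31) % 2**32 - 2**31


def java_mod(dividend, divisor):
    """Emulate Java's % operator, whose result takes the sign of the dividend."""
    remainder = abs(dividend) % abs(divisor)
    return -remainder if dividend < 0 else remainder


def naive_partition(hash_code, num_partitions):
    # Breaks for INT_MIN because java_abs(INT_MIN) is still negative.
    return java_mod(java_abs(hash_code), num_partitions)


def safe_partition(hash_code, num_partitions):
    # Clearing the sign bit always gives a non-negative value.
    return (hash_code & 0x7FFFFFFF) % num_partitions
```

With three partitions, `naive_partition(INT_MIN, 3)` comes out negative, while `safe_partition(INT_MIN, 3)` stays within the range 0 to 2.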
Index SPI
Adds the ability to include new index types at runtime in Apache Pinot. This makes it possible to add third-party indexes, including proprietary ones. More details here
Null value support for Pinot queries
NULL support for ORDER BY, DISTINCT, GROUP BY, value transform functions and filtering.
Upsert enhancements
Delete support in upsert enabled tables (#10703)
Support added to extend upserts and allow deleting records from a realtime table. The design details can be found here.
Preload segments with upsert snapshots to speed up table loading (#11020)
Adds a feature to preload segments from a table that uses the upsert snapshot feature. Segments with validDocIds snapshots can be preloaded more efficiently, which speeds up table loading and therefore server restarts.
TTL configs for upsert primary keys (#10915)
Adds support for specifying expiry TTL for upsert primary key metadata cleanup.
Segment compaction for upsert real-time tables (#10463)
Adds a new minion task to compact segments belonging to a real-time table with upserts.
Pinot Spark Connector for Spark3 (#10394)
Added spark3 support for Pinot Spark Connector (#10394)
Also added support to pass pinot query options to spark connector (#10443)
PinotDataBufferFactory and new PinotDataBuffer implementations (#10528)
Adds new implementations of PinotDataBuffer that use the Unsafe Java APIs and the foreign memory APIs. Also adds PinotDataBufferFactory, which allows plugging in custom PinotDataBuffer implementations.
Query functions enhancements
Add PercentileKLL aggregation function (#10643)
Support for ARG_MIN and ARG_MAX Functions (#10636)
Refactor argmin/max to exprmin/max and make it Calcite compliant (#11296)
JSON and CLP encoded message ingestion and querying
Add clpDecode transform function for decoding CLP-encoded fields. (#10885)
Add CLPDecodeRewriter to make it easier to call clpDecode with a column-group name rather than the individual columns. (#11006)
Add SchemaConformingTransformer to transform records with varying keys to fit a table's schema without dropping fields. (#11210)
Tier level index config override (#10553)
Allows overriding index configs at tier level, allowing for more flexible index configurations for different tiers.
Ingestion connectors and features
Kinesis stream header extraction (#9713)
Extract record keys, headers and metadata from Pulsar sources (#10995)
Realtime pre-aggregation for Distinct Count HLL & Big Decimal (#10926)
Added support to skip unparseable records in the csv record reader (#11487)
Null support for protobuf ingestion. (#11553)
UI enhancements
Adds persistence of authentication details in the browser session. This means that even if you refresh the app, you will still be logged in until the authentication session expires (#10389)
AuthProvider logic updated to decode the access token and extract user name and email. This information will now be available in the app for features to consume. (#10925)
Pinot docker image improvements and enhancements
Make Pinot base build and runtime images support Amazon Corretto and MS OpenJDK (#10422)
Support multi-arch pinot docker image (#10429)
Update dockerfile with recent jdk distro changes (#10963)
Operational improvements
Rebalance
Rebalance status API (#10359)
Tenant-level rebalance and status tracking APIs (#11128)
Config to use customized broker query thread pool (#10614)
Adds new configuration options that allow the use of a bounded thread pool and the allocation of capacity for it.
This feature allows better management of broker resources.
Drop results support (#10419)
Adds a parameter to queryOptions to drop the resultTable from the response. This mode can be used to troubleshoot a query (which may have sensitive data in the result) using metadata only.
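As a rough sketch of how such a query option might be attached to a request, the snippet below builds a JSON body only and sends nothing. The body shape (`sql` plus a `queryOptions` string), the option key `dropResults`, and the table name `myTable` are assumptions for illustration; check PR #10419 and the Pinot documentation for the exact names.

```python
import json


def build_query_request(sql, drop_results=False):
    """Build a JSON request body for a SQL query (assumed body shape)."""
    body = {"sql": sql}
    if drop_results:
        # 'dropResults' is an assumed option key, not confirmed by these notes.
        body["queryOptions"] = "dropResults=true"
    return json.dumps(body)


payload = build_query_request("SELECT COUNT(*) FROM myTable", drop_results=True)
```

With the option set, the response is expected to carry only query metadata, which keeps sensitive rows out of logs and tickets while troubleshooting.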
Make column order deterministic in segment (#10468)
In segment metadata and index map, store columns in alphabetical order so that the result is deterministic. Segments generated before/after this PR will have different CRC, so during the upgrade, we might get segments with different CRC from old and new consuming servers. For the segment consumed during the upgrade, some downloads might be needed.
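The CRC effect described above can be reproduced in miniature: a checksum over the same column names changes when their order changes, and becomes stable once the order is fixed by sorting. This sketch uses `zlib.crc32` over a newline-joined list of hypothetical column names; Pinot computes its segment CRC over the actual segment files, not like this.

```python
import zlib


def metadata_crc(columns):
    """Checksum a column list; the result depends on the order of the columns."""
    data = "\n".join(columns).encode("utf-8")
    return zlib.crc32(data)


insertion_order = ["timestamp", "userId", "clicks"]
other_order = ["clicks", "userId", "timestamp"]

crc_unsorted = metadata_crc(insertion_order)
crc_sorted_a = metadata_crc(sorted(insertion_order))
crc_sorted_b = metadata_crc(sorted(other_order))
```

Here `crc_sorted_a` and `crc_sorted_b` match regardless of the original order, while `crc_unsorted` differs from both.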
Allow configuring helix timeouts for EV dropped in Instance manager (#10510)
Adds options to configure helix timeouts
external.view.dropped.max.wait.ms - The duration of time in milliseconds to wait for the external view to be dropped. Default - 20 minutes.
external.view.check.interval.ms - The period in milliseconds in which to ping ZK for latest EV state.
Enable case insensitivity by default (#10771)
This PR makes Pinot case insensitive by default and removes the deprecated property enable.case.insensitive.pql.
Newly added APIs and client methods
Add Server API to get tenant pools (#11273)
Add new broker query point for querying multi-stage engine (#11341)
Add a new controller endpoint for segment deletion with a time window (#10758)
New API to get tenant tags (#10937)
Instance retag validation check api (#11077)
Use PUT request to enable/disable table/instance (#11109)
Update the pinot tenants tables api to support returning broker tagged tables (#11184)
Add requestId for BrokerResponse in pinot-broker and java-client (#10943)
Provide results in CompletableFuture for java clients and expose metrics (#10326)
Cleanup and backward incompatible changes
High level consumers are no longer supported
Cleanup HLC code (#11326)
Remove support for High level consumers in Apache Pinot (#11017)
Type information preservation of query literals
[feature] [backward-incompat] [null support # 2] Preserve null literal information in literal context and literal transform (#10380)
String versions of numerical values are no longer accepted. For example, "123" won't be treated as a numerical anymore.
Controller job status ZNode path update
Moving Zk updates for reload, force_commit to their own Znodes which … (#10451)
The status of previously completed reload jobs will not be available after this change is deployed.
Metric names for mutable indexes to change
Implement mutable index using index SPI (#10687)
Due to a change in the IndexType enum used for some logs and metrics in mutable indexes, the metric names may change slightly.
Update in controller API to enable / disable / drop instances
Update getTenantInstances call for controller and separate POST operations on it (#10993)
Change in substring query function definition
Change substring to comply with standard sql definition (#11502)
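For reference, the standard SQL form SUBSTRING(text FROM start FOR length) uses a 1-based start position and an optional length. The sketch below models that definition in Python, on the assumption that this is the behavior the change targets; `sql_substring` is a hypothetical helper, and the exact Pinot signature should be confirmed in #11502.

```python
def sql_substring(text, start, length=None):
    """Model standard SQL SUBSTRING: 1-based start, optional length."""
    if length is not None and length < 0:
        raise ValueError("length must not be negative")

    # Requested window is [start, end) in 1-based positions.
    end = len(text) + 1 if length is None else start + length

    # Clip the window to the positions that exist in the string.
    begin = max(start, 1)
    end = min(end, len(text) + 1)
    if begin >= end:
        return ""
    return text[begin - 1:end - 1]
```

For example, `sql_substring("pinot", 2, 3)` returns `"ino"`. Queries that relied on a zero-based start or on an end index instead of a length would need to be reviewed after upgrading.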
Full list of features added
Allow queries on multiple tables of same tenant to be executed from controller UI #10336
Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig #10352