Skip to main content

6 posts tagged with "databricks"

Databricks platform related topics and usage

View All Tags

Spice v1.0-stable (Jan 20, 2025)

ยท 8 min read
William Croxson
Senior Software Engineer at Spice AI

๐ŸŽ‰ After 47 releases, Spice.ai OSS has reached production readiness with the 1.0-stable milestone!

The core runtime and features such as query federation, query acceleration, catalog integration, search and AI-inference have all graduated to stable status along with key component graduations across data connectors, data accelerators, catalog connectors, and AI model providers.

Highlights in v1.0-stableโ€‹

Breaking Changesโ€‹

  • Default Runtime Version: The CLI will install the GPU accelerated AI-capable Runtime by default (if supported), when running spice install or spice run. To force-install the non-GPU version, run spice install ai --cpu.

  • Default OpenAI Model: The default OpenAI model has updated to gpt-4o-mini.

  • Identifier Normalization: Unquoted identifiers such as table names are no longer normalized to lowercase. Identifiers will now retain their exact case as provided.

  • Sandboxed Docker Image: The Runtime Docker Image now runs the spiced process as the nobody user in a minimal chroot sandbox.

  • Insecure S3 and ABFS endpoints: The S3 and ABFS connectors now enforce insecure endpoint checks, preventing HTTP endpoints unless allow_http is explicitly enabled. Refer to the documentation for details.

Spice v1.0-rc.2 (Dec 16, 2024)

ยท 8 min read
William Croxson
Senior Software Engineer at Spice AI

Announcing the release of Spice v1.0-rc.2 ๐Ÿ”—

Spice v1.0.0-rc.2 is the second release candidate for the first major version of Spice.ai OSS. This release continues to build on the stability of Spice for production use, including key Data Connector graduations, bug fixes, and AI features.

Highlights in v1.0-rc.2โ€‹

  • MS SQL and File Data Connectors: Graduated from Alpha to Beta.

  • GraphQL and Databricks Delta Lake Data Connectors: Graduated from Beta to Release Candidate.

  • gospice SDK Release: The Spice Go SDK has updated to v7.0, adding support for refreshing datasets and upgrading dependencies.

  • Azure AI Support: Added support for both LLMs and embedding models. Example spicepod.yml configuration:

embeddings:
- name: azure
from: azure:text-embedding-3-small
params:
endpoint: https://your-resource-name.openai.azure.com
azure_api_version: 2024-08-01-preview
azure_deployment_name: text-embedding-3-small
azure_api_key: ${ secrets:SPICE_AZURE_API_KEY }
models:
- name: azure
from: azure:gpt-4o-mini
params:
endpoint: https://your-resource-name.openai.azure.com
azure_api_version: 2024-08-01-preview
azure_deployment_name: gpt-4o-mini
azure_api_key: ${ secrets:SPICE_AZURE_TOKEN }

Accelerate subsets of columns: Spice now supports acceleration for specific columns from a federated source. Specify the desired columns directly in the Refresh SQL for more selective and efficient data acceleration.

Example spicepod.yaml configuration:

datasets:
- from: s3://spiceai-demo-datasets/taxi_trips/2024/
name: taxi_trips
params:
file_format: parquet
acceleration:
refresh_sql: SELECT tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, total_amount FROM taxi_trips

Spice v0.15.2-alpha (July 15, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The v0.15.2-alpha minor release focuses on enhancing stability, performance, and introduces Catalog Providers for streamlined access to Data Catalog tables. Unity Catalog, Databricks Unity Catalog, and the Spice.ai Cloud Platform Catalog are supported in v0.15.2-alpha. The reliability of federated query push-down has also been improved for the MySQL, PostgreSQL, ODBC, S3, Databricks, and Spice.ai Cloud Platform data connectors.

Highlights in v0.15.2-alphaโ€‹

Catalog Providers: Catalog Providers streamline access to Data Catalog tables. Initial catalog providers supported are Databricks Unity Catalog, Unity Catalog and Spice.ai Cloud Platform Catalog.

For example, to configure Spice to connect to tpch tables in the Spice.ai Cloud Platform Catalog use the new catalogs: section in the spicepod.yml:

catalogs:
- name: spiceai
from: spiceai
include:
- tpch.*
sql> show tables
+---------------+--------------+---------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------+---------------+------------+
| spiceai | tpch | region | BASE TABLE |
| spiceai | tpch | part | BASE TABLE |
| spiceai | tpch | customer | BASE TABLE |
| spiceai | tpch | lineitem | BASE TABLE |
| spiceai | tpch | partsupp | BASE TABLE |
| spiceai | tpch | supplier | BASE TABLE |
| spiceai | tpch | nation | BASE TABLE |
| spiceai | tpch | orders | BASE TABLE |
| spice | runtime | query_history | BASE TABLE |
+---------------+--------------+---------------+------------+

Time: 0.001866958 seconds. 9 rows.

ODBC Data Connector Push-Down: The ODBC Data Connector now supports query push-down for joins, improving performance for joined datasets configured with the same odbc_connection_string.

Improved Spicepod Validation Improved spicepod.yml validation has been added, including warnings when loading resources with duplicate names (datasets, views, models, embeddings).

Spice v0.15.1-alpha (July 8, 2024)

ยท 4 min read
Luke Kim
Founder and CEO of Spice AI

The v0.15.1-alpha minor release focuses on enhancing stability, performance, and usability. Memory usage has been significantly improved for the postgres and duckdb acceleration engines which now use stream processing. A new Delta Lake Data Connector has been added, sharing a delta-kernel-rs based implementation with the Databricks Data Connector supporting deletion vectors.

Highlightsโ€‹

Improved memory usage for PostgreSQL and DuckDB acceleration engines: Large dataset acceleration with PostgreSQL and DuckDB engines has reduced memory consumption by streaming data directly to the accelerated table as it is read from the source.

Delta Lake Data Connector: A new Delta Lake Data Connector has been added for using Delta Lake outside of Databricks.

ODBC Data Connector Streaming: The ODBC Data Connector now streams results, reducing memory usage, and improving performance.

GraphQL Object Unnesting: The GraphQL Data Connector can automatically unnest objects from GraphQL queries using the unnest_depth parameter.

Spice v0.11.1-alpha (April 22, 2024)

ยท 3 min read
Luke Kim
Founder and CEO of Spice AI

The v0.11.1-alpha release introduces retention policies for accelerated datasets, native Windows installation support, and integration of catalog and schema settings for the Databricks Spark connector. Several bugs have also been fixed for improved stability.

Highlightsโ€‹

  • Retention Policies for Accelerated Datasets: Automatic eviction of data from accelerated time-series datasets when a specified temporal column exceeds the retention period, optimizing resource utilization.

  • Windows Installation Support: Native Windows installation support, including upgrades.

  • Databricks Spark Connect Catalog and Schema Settings: Improved translation between DataFusion and Spark, providing better Spark Catalog support.