A classic data lakehouse is built on open-source table formats such as Iceberg or Hudi and integrates seamlessly with big data platforms like Apache Spark and event buses like Apache Kafka or Amazon Kinesis. The popularity of the data lakehouse stems from its ability to combine the quality, speed, and simple SQL access of data warehouses with the cost-effectiveness, scalability, and support for unstructured data of data lakes.
With the advent of generative AI models and the potential of techniques such as retrieval-augmented generation (RAG) in combination with fine-tuning or pre-training custom LLMs, a new paradigm emerged in 2023: AI-infused lakehouses. These platforms use generative AI for code generation, natural language queries, semantic search, and LLM callouts from SQL, while also enhancing governance and automating documentation.
How do lakehouses adapt to the integration of new AI capabilities?
The live demonstration will include the continuous ingestion of IoT events through a declarative, serverless data pipeline. It will process events originating from hundreds of phones in the audience, amounting to around 100 million events per day.
This talk is for data architects who are not afraid of some code, for data engineers who love open source and cloud services, and for practitioners who enjoy a fun end-to-end demo. The Databricks Lakehouse is used for the demos.
Frank Munz
Frank Munz works as a principal TM engineer at Databricks. 
Before joining Databricks, Frank authored three computer science books and played a pivotal role in establishing technical evangelism and developer relations for Amazon Web Services across Germany, Austria, and Switzerland. 
Frank has presented at top-notch conferences on every continent except Antarctica, due to his aversion to cold temperatures. He has been invited to speak at events such as re:Invent, Devoxx, KubeCon, Voxxed Days, Big Things, and JavaOne. He holds a Ph.D. summa cum laude in Computer Science from TU Munich and, once upon a time, worked as a data scientist in a group that received a Nobel Prize for demonstrating the link between certain cancers and viruses.