DDL: Data Mesh - Lessons from the Field

DataHub
14 Mar 2024, 47:09

Summary

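TLDR In this episode of the DDL show, Darren Hacken, engineering director at AutoTrader, talks with host Shirshanka, co-founder and CTO of Acryl, about how the data field has evolved and about the concept of data mesh. Darren shares his own career path and explains how AutoTrader improved its data capabilities by decentralizing its data teams. They also discuss implementing data mesh, including how data products and metadata management enable better data governance and observability. Darren is optimistic about the future of data mesh and believes it will help organizations build and use data products in a more decentralized way.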

Takeaways

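  • 🎉 Darren Hacken is an engineering director at AutoTrader, the UK's largest automotive platform, where he heads up platform and data.
  • 🚀 Darren disliked data work early in his career, but became passionate about the field as big data technology emerged.
  • 🌐 AutoTrader's data teams are relatively decentralized, with several platform teams and data teams focused on specific problem domains.
  • 🔄 The data teams evolved from centralized to decentralized, reflecting how data management had to adapt as the organization scaled.
  • 🤖 Data mesh is a socio-technical concept that emphasizes culture and team structure, and describes how to achieve decentralization.
  • 🛠️ While implementing data mesh, AutoTrader ran into a gap between tooling that assumes centralization and the need to decentralize.
  • 📊 Using Kubernetes and DataHub, AutoTrader is building data product thinking and practices to improve data discoverability and governance.
  • 🔧 Implementing data mesh brought new challenges around naming data products and data modeling practices.
  • 🌟 Darren sees the data product concept as the most powerful part of data mesh because it helps organize and use data better.
  • 🚫 Implementing data mesh does not happen overnight; it takes time and continued technical progress to overcome the current challenges.
  • 🔮 Looking ahead, Darren expects data mesh and data products to further drive data use and innovation inside organizations, especially in AI and ML.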

Q & A

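  • What is Darren Hacken's current role?

    -Darren Hacken is an engineering director at AutoTrader, responsible for platform and data.

  • What is AutoTrader's main business?

    -AutoTrader is an automotive marketplace and technology platform. It is the UK's largest automotive platform, covering buying and selling cars and related services.

  • What is Darren Hacken's view of the data field?

    -Darren cares deeply about the data field. He believes data can shape and change organizations, and that the field keeps growing with the development of AI, ML and similar technologies.

  • What shifts has Darren Hacken gone through in his career?

    -Early in his career Darren disliked data work because it meant repetitive work in GUI-based ETL tools. As big data technology emerged, he found the field increasingly attractive, and it became the area he loves.

  • What is a data product, as Darren Hacken describes it?

    -A data product bundles data together with related capabilities. It helps an organization manage and use data more effectively and supports data discovery, analysis and governance.

  • How do AutoTrader's data teams operate?

    -AutoTrader's data teams are decentralized. There are several platform teams and data teams, each focused on a different business domain such as advertising, user behavior or vehicle pricing, and they build data products and provide self-serve analytics.

  • How does Darren Hacken view data governance and metadata management?

    -Darren sees data governance and metadata management as key needs once data is decentralized, especially establishing clear ownership and dependencies between data products and ensuring data quality and security.

  • Which technologies does Darren Hacken mention?

    -Darren mentions dbt, Kubernetes and DataHub, which AutoTrader uses to create, manage and govern data products.

  • What does Darren Hacken hope for in the future of the data field?

    -Darren hopes the data product concept becomes more widely adopted, and he would like to see more technology that supports decentralization so that data management and governance become easier.

  • What challenges does Darren Hacken see in the data field?

    -Darren sees the challenge as keeping data quality and data practices at a high standard, and maintaining those standards without a central team. Data naming and modeling are also ongoing challenges.

  • What does Darren Hacken think about data contracts?

    -Darren finds data contracts an interesting area. For now AutoTrader uses them mostly implicitly, relying on standardized approaches and validators to detect schema changes, and he is open to how data contracts develop in the future.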
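
The last answer mentions validators that detect schema changes. As a rough illustration only, and not AutoTrader's actual validator, the sketch below compares two versions of a schema and reports the changes that could break downstream consumers. It assumes a schema can be represented as a plain mapping from column name to type name; the function name `breaking_changes` and the example columns are hypothetical.

```python
def breaking_changes(old_schema, new_schema):
    """Compare two {column: type} mappings and list consumer-breaking changes.

    Removed columns and changed types are reported. Newly added columns are
    treated as non-breaking in this sketch, which is an assumption.
    """
    changes = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            changes.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            changes.append(
                f"type change: {column} {old_type} -> {new_schema[column]}"
            )
    return changes


if __name__ == "__main__":
    # Hypothetical schemas for two versions of the same data product output.
    old = {"vehicle_id": "string", "price": "integer"}
    new = {"vehicle_id": "string", "price": "float", "mileage": "integer"}
    for change in breaking_changes(old, new):
        print(change)
```

A check like this could run in a build pipeline and fail the build when the returned list is non-empty, which is one way an implicit data contract can be enforced without a separate contract document.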

Outlines

00:00

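🎤 Opening and introductions

This segment covers the opening of the show. The host expresses his excitement about the topic and welcomes the guest, Darren Hacken. Darren is an engineering director at AutoTrader, heading up platform and data. The host, Shirshanka, is co-founder and CTO of Acryl and founder of the DataHub project. Darren shares how he got into data and how he went from disliking data work to being passionate about it.

05:01

🔍 How the data teams are structured and run

Darren describes the structure of AutoTrader's data teams, including the platform teams and the data teams focused on specific domains. He emphasizes that the data teams are decentralized and explains how the data platform supports data capabilities across the organization. He also covers how the data teams interact with other teams and how teams are formed around problems.

10:03

🌐 Understanding and practicing data mesh

Darren shares his understanding of data mesh as a socio-technical practice and a cultural shift. He talks about where data mesh came from and how it helps organizations decentralize. He discusses how AutoTrader started applying data mesh principles, in particular moving the technical architecture from a centralized model toward more decentralized data products.

15:04

🛠️ Governing data products and the challenges involved

Darren discusses the challenges encountered while implementing data mesh, particularly around data governance, metadata management and observability. He points out where existing tooling falls short in supporting decentralization, and shares how AutoTrader uses metadata and DataHub to address these problems.

20:05

🔄 Creating and managing data products

Darren explains how AutoTrader uses Kubernetes as a control plane to create and manage data products. He discusses handling data product metadata through automation and code, and shares how DataHub is used to collect metadata and connect data products.

25:07

🤔 Challenges and the road ahead for data mesh

Darren explores the architectural strain data mesh can put on an organization, and how to keep data practices at a high standard without a central team. He also mentions the challenges of data naming and modeling, and how AutoTrader uses data contracts implicitly to deal with them.

30:09

🚀 The outlook for data mesh

Darren is excited about the future of data mesh and data products. He expects data product thinking to help organizations make better use of data, and data mesh to help shorten time to market and improve responsiveness. He also notes that AI and data products reinforce each other, and is optimistic about where the technology is heading.

35:11

🙌 Closing and thanks

At the end of the show, host Shirshanka thanks Darren for taking part and sharing his experience, and looks forward to future collaboration. They discuss the future of data products and data mesh, and how communities and open-source projects can move these ideas forward.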

Mindmap

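Views on data products
Personal views on data mesh
Industry trends in data mesh
AI and data mesh
Applying data contracts
Data governance challenges
The outlook for data products
AutoTrader's practice
Challenges of implementing data mesh
Darren's view of data mesh
Definition of data mesh
Interaction between teams
The role of data teams
Organizational structure
Darren's career path
Guest: Darren Hacken
Host: Shirshanka
Industry trends and personal reflections
Future outlook and challenges
Understanding and practicing data mesh
AutoTrader's data teams and culture
Show introduction and guest background
Data mesh in practice and its future outlook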

Keywords

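💡Data Mesh

Data mesh is a data architecture pattern that improves data management inside an organization by decentralizing data ownership and responsibility. In the video, Darren describes data mesh as a method and a set of principles for achieving decentralization, and for improving how data is used and governed through data product thinking.

💡Data Products

A data product packages data together with related capabilities so that users can easily access and use it. In the video, Darren emphasizes the importance of data products and how they help an organization understand and use its data.

💡Metadata

Metadata is data about data. It describes the content, origin, format and structure of data. In the video, Darren discusses the key role metadata plays in implementing data mesh and data products, especially for data governance and observability.

💡Data Governance

Data governance is a set of processes, policies and standards meant to ensure the quality, availability and consistency of data. In the video, Darren discusses how metadata and data products improve data governance when implementing data mesh.

💡Data Platform

A data platform is the collection of technology and tools that supports storing, processing, analyzing and visualizing data. In the video, Darren, who looks after AutoTrader's data platform, shares how they build and maintain a platform that supports data mesh.

💡Data Teams

Data teams are groups of specialists focused on data work, including data engineers, data scientists and analysts. In the video, Darren discusses how AutoTrader's data teams are organized around business domains and problems, and how they work in a data mesh model.

💡Data Ownership

Data ownership is the responsibility for managing and controlling a data asset. In a data mesh architecture, ownership is typically decentralized: each team or department manages the data that belongs to its business domain.

💡Data Quality

Data quality refers to the accuracy, completeness and consistency of data. In the video, Darren emphasizes how data products and metadata improve data quality in a data mesh model, and how technical means help ensure the data is reliable.

💡Data Discovery

Data discovery is the process of finding relevant and valuable information within a large body of data. It matters especially in a data mesh architecture because it helps users understand and use decentralized data.

💡Data Contracts

A data contract is an agreement between a data producer and its consumers about the format, structure and usage rules of the data. In the video, Darren talks about the role of data contracts in keeping data products compatible and consistent with each other.

💡Data Architecture

Data architecture is the blueprint for an organization's data, covering how data is stored, processed, managed and used. In the video, Darren discusses data mesh as a data architecture pattern that helps organizations manage data in a decentralized way.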

Highlights

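Darren shares his passion for the data field and describes his role and responsibilities at AutoTrader.

Darren recounts the shift in his career, from initially disliking data work to becoming a leader in the field.

AutoTrader's data teams are decentralized, with dedicated teams for domains such as advertising and user behavior.

Darren explains the data product concept and how data products enable collaboration and data sharing between teams.

The challenges AutoTrader faced in building its data platform, especially around technology boundaries and data governance.

Darren discusses the concept of data mesh and how it helps organizations decentralize data.

Darren shares AutoTrader's experience implementing data mesh, including the technical challenges and the cultural change.

A discussion of why data governance, metadata management and observability matter when implementing data mesh.

Darren describes using Kubernetes as the control plane for data products and how automation improves efficiency.

A discussion of the future of data mesh and how it affects data use and product development inside organizations.

Darren's outlook on the role and future of data products and data contracts in data mesh.

A discussion of the challenges of data mesh, including how to maintain data quality and the practical difficulties.

Darren shares his view on the future of the data mesh concept and how it adapts to a changing technology landscape.

A discussion of how data mesh helps organizations use data better and improve the speed and quality of decisions.

Darren is optimistic about the future of data mesh and data products, and looks forward to further technical progress.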

Transcripts

00:18

[Music]

00:23

[Music]

00:41

[Music]

01:05

[Music]

01:08

hello everyone and welcome to episode

01:11

four of the DDL show I am so excited

01:16

that we're going to be talking about a

01:17

topic that used to be exciting and has

01:21

stopped being exciting and that itself

01:23

is exciting so I'm super excited to

01:25

bring on Darren Hacken uh I think our

01:28

first conversation Darren was literally

01:32

on the data mesh learning group first

01:34

time we met um and so it's it's kind of

01:37

a full circle I'm super excited to

01:40

welcome you to the show Darren is an

01:41

engineering director heading up uh

01:43

platform and data at AutoTrader and I'm

01:46

your host Shirshanka co-founder and CTO

01:49

at uh Acryl and founder of the DataHub

01:52

project so Darren tell us uh about

01:54

yourself and how you got into Data hi

01:57

shash thank you for having me today um

02:00

yeah so my name is Darren I work for a

02:02

company in the UK in the United Kingdom

02:04

called AutoTrader so we're an

02:07

automotive Marketplace and Technology

02:09

platform that drives it's the UK's

02:11

largest um Automotive platform so buying

02:14

and selling cars that kind of thing and

02:17

one of the areas I deeply deeply care

02:18

about is is the data space um so here at

02:21

AutoTrader I kind of look after our kind

02:24

of data platform um the capabilities

02:27

that we need in order to surface data

02:29

been working in data a long time now

02:31

maybe eight nine years um my funny

02:37

story I vowed I would never work in data

02:40

because when I started my career I

02:43

worked in fintech for in a in a data

02:46

team and I absolutely hated it because

02:48

it was all GUI-based ETL tools and I

02:53

got out of there as fast as I possibly could

02:54

and said never again I love engineering

02:57

I you know I'm a coder I need to get

03:00

away and do this other thing you don't

03:03

like pointing and clicking clearly I

03:05

didn't like pointing and clicking I like

03:07

I like code um and then it kind of got

03:10

really sexy and big data and technology

03:13

changed and I think it's one of the most

03:15

exciting areas of Technology now so

03:18

never say never is probably my I always

03:20

find that a funny kind of starting point

03:22

for me in terms of data to leave a

03:24

role and go never again and here I am

03:27

um so yeah passionate about data really

03:30

think it's one of them things that

03:32

really can shape and change

03:33

organizations it's um and it's it's

03:36

growing all the time right with things

03:37

like AI and LLMs and hype cycles around

03:40

things like that but yeah thanks for

03:43

having me they do say data has gravity

03:45

and you know uh normally it's like

03:47

pulling other data close to it but uh

03:51

clearly people also get attracted to it

03:53

and can never leave I was literally the

03:55

same way uh well I never went to data

03:57

and I wasn't able to leave so I was um

04:01

you know an engineer on the um online

04:03

data infrastructure teams right so I was

04:05

uh doing uh display ads and uh doing

04:08

real-time bidding on ads at Yahoo and

04:12

then I uh was offered the uh chance of a

04:16

lifetime to go rebuild LinkedIn's data

04:19

infrastructure and I didn't actually

04:21

know what data meant at that point I was

04:23

scared of databases honestly because you

04:25

know it's hard to build something that's

04:27

supposed to be a source of Truth like

04:29

wait you're responsible for actually

04:31

making sure the write actually made it

04:32

to disk and it actually got flushed and

04:34

was replicated three times so that no

04:37

one loses an update well that seems like

04:39

a hard problem so you know that was my

04:42

mission impossible that I went to

04:43

LinkedIn for and I never left I've just

04:45

been in data this whole time so can

04:48

totally relate you never escape the

04:51

gravity you do not um so well so you're

04:55

you're leading big uh teams at

04:58

AutoTrader right now you know platform and

05:00

data tell me a little bit about what

05:03

that team does because you know as I

05:05

have talked to so many data leaders

05:08

around the world it seems clear to me

05:10

that all data teams are similar but not

05:13

all teams are exactly the same so maybe

05:16

walk our audience through what does the

05:19

data team do and who are the surrounding

05:21

teams and how do they interact with them

05:24

yeah um so we've so interestingly AutoTrader

05:28

as an org has been around for about

05:31

40 years so they started as a magazine

05:34

you could go into your you know local

05:36

store and find the magazine and pick it

05:39

up so that's interestingly means that as

05:41

Technologies evolved throughout the

05:42

decades you know they've gone through

05:44

many chapters of of it um but today

05:48

we're relatively decentralized in terms

05:50

of our data team setup and you know

05:52

we'll get into that I guess a little bit

05:53

more when we talk about data mesh today

05:57

um but we have a kind of platform team

06:00

so we have several platform teams and we

06:02

have a platform team um predominantly

06:04

built made up of Engineers and kind of

06:06

SRE you know folks and they build um

06:11

what we call our data platform and that

06:13

is the kind of product name I guess for

06:15

the bundling of

06:17

technology which would would help Drive

06:20

data capabilities across the

06:21

organization you know that might be

06:23

building data products which we can get

06:25

into later it could be um metadata

06:28

management how to create security

06:30

policies with data um but crucially

06:32

their play is about building

06:34

capabilities that let other people um

06:36

use these capabilities and and build

06:38

technology and other than that we try to

06:40

keep data teams closer to um the domain

06:44

of of a of an area or a problem so we

06:47

may have data teams who focus a lot on

06:50

like advertising or user Behavior maybe

06:53

more around like vehicles and pricing

06:55

and fulfillment type problems um but we

06:58

we tend to have kind of Engineers or

07:00

Engineers that specialize in data um

07:03

scientists and analysts so they they're

07:06

kind of as a discipline together and

07:08

manage together from a craft perspective

07:10

but then in terms of how how they work

07:12

together we tend to form them

07:14

around problems um pricing as I said

07:18

earlier and things like that and they

07:19

would maybe do analytics self- serve

07:22

analytics um product analytics machine

07:26

learning um you know feature engineering

07:30

very much that kind of thing and we're

07:31

trying to keep it as close to kind of

07:33

engineering as as possible so very much

07:35

a decentralized play or that's been our

07:38

current our current generation of

07:41

peopleware and team topologies um got it got

07:44

it and by the way for the audience who's

07:48

listening in um definitely uh feel free

07:51

to ask questions we'll we'll try to pull

07:53

them up uh as they come in so you know

07:55

this is meant to be me talking to Darren

07:57

and Darren talking to me and all of you

07:59

being uh having the ability to kind of

08:01

participate in the conversation so um

08:04

definitely as we keep talking about this

08:06

topic uh keep asking questions and we'll

08:08

try to pull them up and um combine them

08:11

so Darren you talked a little bit about

08:13

how the teams were structured it

08:15

definitely resonated with kind of how uh

08:17

LinkedIn evolved over the over the years

08:20

I was there we started out uh with uh a

08:24

single data team that was uh responsible

08:27

for both platform as well as

08:30

uh business so you know they were

08:33

responsible for making decisions like

08:34

what warehousing technology to use and

08:37

how to go about it and then but also

08:39

building the executive dashboard and

08:42

building the foundational data sets we

08:45

had so many debates about whether to

08:47

call them foundational or gold but the

08:50

concept was still the same you build

08:52

kind of the the the canonical business

08:55

model on top of which you want all um

08:59

insights as well as you know analytics

09:02

as well as AI to be derived from and

09:04

then over the years we definitely had a

09:07

lot of stress with that centralization

09:10

and had to kind of split apart the

09:13

responsibilities uh we ended up going to

09:16

a model where there was essentially a

09:18

data unaware or semantics unaware team

09:21

that was fully responsible just for the

09:24

platform and um sub teams that emerged

09:28

out of those out of that original team

09:31

that sometimes got fully embedded into

09:34

product delivery teams to actually um

09:37

essentially have a local Loop where

09:39

product gets built data comes out of it

09:43

and then the whole Loop of creating

09:46

insights models and features and then

09:48

shipping it back into the product was

09:50

all owned and operated by um a specific

09:53

team so it looks like that's kind of

09:54

where you've ended up as well yeah in

09:58

fact that's spookily similar I mean we

10:00

started definitely more centralized and

10:03

then these teams sort of came out of

10:06

that more centralized model so like we

10:08

we built a team about user behavior and

10:10

advertising kind of build that that went

10:13

really well and then they felt a lot

10:15

more connected and it did evolve like

10:16

that um and and a lot of this I think

10:18

just spawns from scale really so I mean

10:22

my organization is definitely not at

10:23

the figures where you were previously

10:25

working Shirshanka but we definitely find

10:28

that you know the more hungry an

10:29

organization gets for data eventually

10:32

you you simply can't keep up with this

10:34

centralized team with this scarcity of

10:36

resource and everyone fighting over the

10:37

same thing gets really hard to think

10:39

about you know do I invest in the

10:41

finance team do I uh invest in our

10:44

advertising or our marketing team so

10:45

like eventually like partitioning almost

10:47

your resource in some way feels

10:50

inevitable that you have to to otherwise

10:52

it becomes it becomes so

10:55

hard cool so let's let's let's talk

10:58

about the topic of

10:59

the day what does data mesh mean to you

11:02

then now that we've kind of understood

11:03

how the teams have evolved and what your

11:06

uh teams are doing day to day yeah and I

11:09

think it's a really good point that we

11:11

started around teams and culture

11:14

actually because that is really what I

11:17

think the heart of what data mesh is um so

11:21

I I used to work um for Thoughtworks

11:23

where Zhamak also kind of came up with

11:26

the the data mesh thing um kind of came

11:28

from and I I wasn't working at the time

11:31

but I remember reading it and we've

11:34

we're already on this journey of like we

11:36

need to decentralize and our platform is

11:39

really important to us and we need

11:41

capabilities and we want more people to

11:43

do that and in fact you know we were

11:46

succeeding at decentralizing and scaling

11:49

um but I think when we did that we were

11:51

entering new spaces where a lot of

11:53

people hadn't really talked about it so

11:55

for me data mesh one of the things that

11:57

it means it's a you know socio technical

12:00

thing a cultural thing it's like devops

12:02

really or something like that for me

12:04

she's done a great job describing how

12:07

to

12:09

um you know get there like data products

12:13

and all this kind of thing but one of

12:14

the great things I think that Zhamak did with

12:17

talking about data mesh was build a

12:18

lexicon a grammar a way of us all

12:21

communicating to each other like

12:22

Shirshanka me and you met on a on a data

12:25

mesh you know community and immediately

12:28

we we were able to speak at a level that

12:31

we simply wouldn't have been able to

12:32

maybe if we would have met five years

12:34

ago and try to have the same

12:36

conversation um so a lot of it's that

12:38

for me that's what data mesh is it's

12:39

about it's a it's a method or an

12:41

architectural pattern or set of

12:43

principles or guidelines about how you

12:45

could achieve decentralization and and

12:48

move away from this this Central team

12:51

and kind of break apart from it um and

12:53

that has been and that has been the big

12:56

draw right to of of the concepts because

12:59

a lot of people relate to it uh and kind

13:02

of resonate with it and then that from

13:06

that um what is it Summit of Hope comes

13:09

the the valley of Despair where you you

13:13

start figuring out okay how do I

13:15

translate this idea into reality and how

13:19

much do I need to change um so walk us

13:22

through your journey of like how have

13:24

you implemented data mesh how have you

13:26

taken these principles and brought them

13:29

to life or at least attempted to bring

13:31

them to life and we'll see how you feel

13:32

about it like would you give yourself an

13:34

a grade or a b-grade we we'll we'll

13:37

figure that out later but what have you

13:38

done to bring it to life so so at the point

13:44

when we started trying to apply data

13:47

mesh um we were in this place where we

13:49

we decentralized some of our teams but

13:52

our technology underneath is still very

13:54

much centralized and shared so almost

13:56

like a monolith with teams contributing

13:59

to it but everything was partitioned or

14:03

structured around technology so we'd end

14:05

up with I don't know a DBT project or

14:07

something right or we had a monolith

14:09

around spark jobs and things it's very

14:12

technology partitioned um and then when

14:15

we started looking at data mesh we were

14:17

really excited because one of the big

14:19

things that we took out of it was this term

14:22

data product and we're like great we've

14:24

now found this this this language to

14:27

describe how we were going to try and

14:29

break things down like before that we

14:31

were trying to break break you know lots

14:33

of data down into chunks of data but we

14:36

just couldn't think of like the wording

14:37

gave us a lot more power to to start

14:39

communicating so we we started trying to

14:41

break down our DBT monolith essentially

14:45

into Data products um and that's been

14:47

one of our journeys of like breaking it

14:49

partitioning it and doing that so that

14:51

was the big starting point of doing that

14:55

um so it was very much like we had some

14:57

teams that were decentralized and then

14:59

like how do we almost catch the

15:01

technology up so DBT was the starting

15:04

point of

15:06

that so you went from a monolithic repo

15:09

where all of your transformation logic

15:12

was being

15:14

hosted to chopping it up and um

15:17

splitting it up uh across multiple

15:19

different teams um great so once you did

15:23

that what did you then

15:25

find well then you find that the tooling

15:28

and system that we've got today has some

15:31

gaps when you start to think about

15:33

decentralization like a lot of the

15:35

technologies that we use in the data

15:36

space do promote very much very Central

15:39

centralized approach um like I think

15:42

it's becoming a little bit less popular

15:43

but you know Airflow it'd be like one

15:45

Airflow for your whole

15:46

organization DBT might say one big

15:49

project even though they are saying

15:50

that less now but there was definitely a

15:52

period where like you know that was the

15:54

that was the popular approach so we you

15:58

broke things apart

15:59

and now you've got gaps between data

16:01

products where you've got DBT and DBT

16:05

and now you've got gaps and that's where

16:06

you really start to realize that there

16:08

are other requirements that start to

16:10

come in that you need and two big ones

16:12

that felt obvious for us were around

16:15

data governance metadata kind of knowing

16:19

more about these data products at a

16:21

meta level observability and

16:25

how you define that and also how you

16:27

start creating security policy between

16:29

them so it's the classic thing of when

16:31

organizations move to microservices like

16:34

all of a sudden like monitoring between

16:36

things things breaking in you know in

16:38

the infrastructure level between the the

16:40

network protocol starts to

16:42

happen I think the data world is not

16:46

there and is catching up and I think it

16:49

will one day but today they were some of

16:51

the gaps that we started to see um so

16:54

like by breaking down I'll give you an

16:55

example so like by breaking down DBT so you

16:58

have this monolith with maybe I don't

17:01

know 50 people working on an area of a

17:03

monolith and then you break that down

17:04

into Data products you then start to

17:07

realize well we didn't really have clear

17:09

ownership with that like who owned it

17:11

like people were contributing together

17:13

as maintainers maybe but who owns who

17:16

owns this data asset who actually who is

17:18

the team that do it and that's where we

17:20

started to realize well you need kind of

17:23

metadata over the top to start labeling

17:25

things like that or we also had this

17:27

other symptom coming out because we had

17:29

all of our code in one place it was very

17:31

easy for like team a and Team B to use

17:33

data between each other and not really

17:35

realize and start creating dependencies

17:38

so then we were almost trying to start

17:40

using metadata to say well who should be

17:42

allowed to use my data product and that

17:46

stuff starts to get teased out so cross

17:49

cross team discoverability cross team

17:54

lineage and visibility and some sort of

17:59

understandability and governance and

18:00

observability

18:02

across uh started to become an important

18:05

need for you yeah exactly like if you're

18:08

a analyst or a scientist when it was all

18:10

in one monolith they essentially just

18:13

open the browser expand and try to find

18:16

data they were looking for and then when

18:18

we break things out more into Data

18:20

products and you've not got that kind of

18:22

ability we started to see people kind of

18:24

move into slack and looking for tribal

18:26

knowledge and being like hey does anyone

18:27

know where I can

18:29

find this data product I used to see it

18:31

in the monolith somewhere where is it

18:33

now who owns it and things like that so

18:35

this is where like

18:36

discoverability um lineage became even

18:39

more critical who was the owner should

18:42

this person change this code or should

18:44

it be only the owner that kind of thing

18:46

and these were really positive things

18:47

for us actually but when it was one

18:50

monolith we just couldn't we couldn't

18:51

really see that we were kind of missing

18:53

some of these quality components I guess

18:56

to to data

18:57

management so what did you end up uh

19:00

using for that uh to solve that

19:02

problem um so initially um we started

19:06

really

19:06

simple and what we did is we we used um

19:10

there's like a meta block in DBT and we

19:13

started to define metadata at that level and

19:16

then we started building kind of CI or

19:18

tooling around it in our in our build

19:20

processes to grab that metadata and

19:22

and make

19:23

decisions and that sort of gave us the

19:26

confidence the confirmation right that

19:29

this hypothesis we had that metadata was

19:31

going to like a metadata aware

19:34

environment was going to help drive a

19:37

lot of automation a lot of um data

19:39

management decisions right through

19:41

systems not through humans and then we

19:43

ended up um moving to uh to DataHub to

19:46

Acryl and and using that as the the

19:48

product to start collecting uh metadata

19:51

and and like building this kind of

19:54

connections between data products and

19:56

treating that as a first class citizen
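
The build-time use of metadata described above can be sketched roughly as follows. This is a minimal illustration and not AutoTrader's actual tooling: it assumes each model's metadata (for example the contents of a DBT `meta` block) has already been loaded into a plain dictionary, and the keys `owner`, `data_product` and `allowed_consumers` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """One model plus the metadata attached to it (hypothetical shape)."""
    name: str
    meta: dict = field(default_factory=dict)
    depends_on: list = field(default_factory=list)  # names of upstream models


def check_models(models):
    """Return human-readable violations; an empty list means the build may pass."""
    by_name = {m.name: m for m in models}
    problems = []
    for m in models:
        # Rule 1: every model must declare an owning team and a data product.
        for key in ("owner", "data_product"):
            if not m.meta.get(key):
                problems.append(f"{m.name}: missing meta.{key}")
        # Rule 2: reading across a data product boundary requires being on
        # the upstream product's allow-list.
        mine = m.meta.get("data_product")
        for upstream_name in m.depends_on:
            upstream = by_name.get(upstream_name)
            if upstream is None:
                continue  # unknown upstream (e.g. a raw source), not checked here
            theirs = upstream.meta.get("data_product")
            if mine and theirs and mine != theirs:
                allowed = upstream.meta.get("allowed_consumers", [])
                if mine not in allowed:
                    problems.append(
                        f"{m.name}: {mine} is not an allowed consumer of {theirs}"
                    )
    return problems


if __name__ == "__main__":
    # Hypothetical models spread across three data products.
    models = [
        Model("vehicle_prices", {"owner": "pricing-team",
                                 "data_product": "pricing",
                                 "allowed_consumers": ["advertising"]}),
        Model("ad_performance", {"owner": "ads-team",
                                 "data_product": "advertising"},
              ["vehicle_prices"]),
        Model("revenue", {"data_product": "finance"}, ["vehicle_prices"]),
    ]
    for problem in check_models(models):
        print(problem)
```

The point of the sketch is that ownership and cross-team dependencies become decisions a system can make from metadata, rather than something people have to police by hand.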

19:59

now you started the conversation talking

20:02

about not liking point-and-click

20:05

experiences and not liking you know

20:08

being in the UI too much if you could

20:09

avoid it so how have you tried to apply

20:12

those same principles in how you've

20:14

implemented data mesh like are your data

20:17

owners and data producers and consumers

20:19

kind of going into Acryl and typing out a

20:22

ton of documentation or annotating data

20:25

sets or like how how are they bridging

20:29

these two worlds between you know the

20:32

the product experience and you know the

20:35

the DBT meta yaml you know

20:38

yeah yeah I wish I wish we'd fully solved

20:41

this I mean our preference is always to

20:43

try and do as much in version control as

20:46

possible um so like one of the big

20:48

initial challenges we had that has made

20:51

this journey feel frankly quite slow is

20:54

there's no there's no tooling that

20:55

exists to create a data product that I'm

20:58

aware of or if there is it's it's it's

21:00

hot off the press so um we're heavy

21:03

users of Kubernetes that's how we manage a

21:06

lot of our services um and for those on

21:08

today that don't know much about Kubernetes

21:11

one of the great things that it has is

21:13

is almost like a resource manifest or

21:17

it's got a database underneath it where

21:18

if I want to create a resource in the

21:20

cloud I can create this resource and it

21:22

is like a a YAML definition and I can I

21:25

can do that so what we started to do was

21:28

Define definitions for data products

21:30

create them as resources and store them

21:32

in in Kube um and Kubernetes is very

21:35

nice because it also has like events so

21:37

it can send events when new resources

21:39

created and when they're updated and all

21:42

that kind of thing so we've gone to this

21:44

place where we've provisioned data

21:45

products and then we've automated

21:48

creating them so again that's very um

21:51

kind of data products as code I guess

21:53

and we try to do the same for as much as

21:55

we can with tools like metadata systems

21:59

and and other things um and that's

22:01

mostly to have that governance of of the

22:04

metadata so like The Meta to to be

22:07

active with the metadata and to automate

22:09

things with it we need so you've

22:11

essentially used Kubernetes as your

22:13

control plane for for data and anytime

22:17

any changes happen you've got your

22:19

operators kind of publishing metadata as

22:21

events that comes into Acryl and that's

22:24

why everything stays fresh and live I

22:26

think uh that's that's essentially how

22:29

we've implemented it so and then yeah we

22:31

use like like a broader view so like

22:33

you've almost got like the

22:35

infrastructure view in Kube and then the

22:39

data product view is almost um wrapped

22:42

around that and we we use data hub for

22:44

that to kind of fully complete the

22:46

picture so if I'm a product manager or

22:48

somebody like that they would gravitate

22:50

more towards the viewing in data Hub my

22:54

data platform team probably gravitate

22:56

more into the kind of Kube world because

22:58

they're looking at you know like

23:00

BigQuery provisioning Snowflake

23:01

provisioning um object storage service

23:05

service accounts that kind of thing
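
The "data products as code" flow discussed above can be sketched as below. This is a simplified illustration under stated assumptions, not AutoTrader's implementation and not the Kubernetes or DataHub API: `ControlPlane` is an in-memory stand-in for a resource store that emits change events, `Catalog` is an in-memory stand-in for a metadata catalog, and the `DataProduct` fields are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataProduct:
    """Declarative definition of a data product (hypothetical fields)."""
    name: str
    owner: str
    domain: str
    outputs: tuple = ()


class ControlPlane:
    """In-memory stand-in for a resource store that emits change events."""

    def __init__(self):
        self._resources = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable invoked as callback(event, product)."""
        self._subscribers.append(callback)

    def apply(self, product):
        """Create or update a resource; notify subscribers only on real changes."""
        previous = self._resources.get(product.name)
        if previous == product:
            return None  # definition unchanged, so no event is emitted
        self._resources[product.name] = product
        event = "updated" if previous is not None else "created"
        for callback in self._subscribers:
            callback(event, product)
        return event


class Catalog:
    """In-memory stand-in for a metadata catalog kept fresh by events."""

    def __init__(self):
        self.entries = {}
        self.log = []

    def on_event(self, event, product):
        self.entries[product.name] = asdict(product)
        self.log.append((event, product.name))


if __name__ == "__main__":
    plane = ControlPlane()
    catalog = Catalog()
    plane.subscribe(catalog.on_event)
    plane.apply(DataProduct("vehicle-pricing", "pricing-team", "pricing",
                            ("prices_daily",)))
    print(catalog.entries)
```

Because the definition is declarative and applying it is idempotent, the same definition can live in version control and be re-applied safely, while the catalog view stays in sync through events instead of manual entry.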

23:07

right right so you fully embraced kind

23:09

of this shift left philosophy of

23:11

defining data metadata all of these

23:14

things as code checking them in

23:16

versioning them and I guess you're

23:19

waiting for the promised land where this

23:23

and the product experience kind of bidirectionally

23:26

work with each other and are able to to

23:28

you know stay in sync and you know you

23:30

can kind of live between the two worlds

23:33

uh without

23:34

losing uh

23:36

context yeah and I think I think that's

23:38

one of the big challenges with data mesh

23:42

today is just I guess it's like the cost

23:45

still is very high to to apply these

23:49

principles um but you don't need to

23:52

apply them all in one go like I mean

23:55

we've been kind of progressing towards

23:58

this this as a as a kind of Journey but

24:01

I I really still hope that we will start

24:03

to see more emerging technology um it's

24:06

just really hard because like as I kind

24:09

of said before we moved into a data mesh

24:11

world every technology almost is very

24:14

they own one piece of the stack so you

24:17

have a one company that just own

24:20

scheduling airflow one company that own

24:22

transformation for example and it's

24:25

really hard because you kind of need to

24:27

Pivot that round and have somebody just

24:29

say we'll let you define data products

24:32

and they're going to span multiple

24:34

Technologies that's a hard problem but

24:36

it it doesn't feel

24:38

unsolvable I mean we we did it right

24:41

internally um other companies are doing

24:44

this it it must be possible um and and

24:48

even I I mean I feel promised today that

24:49

things like data Hub exist and we have

24:51

far better observability tools yeah like

24:54

five years ago I we we didn't have any

24:56

observability tools really that even

24:59

remotely close to anything you could get

25:01

in terms of monitoring a microservice it

25:04

just didn't exist yeah so I feel hopeful

25:07

but it's we know it's a journey we all

25:08

have to kind of we go on together

25:11

definitely a journey I mean we started

25:12

out with um you know data Hub the

25:14

project started out first with saying

25:16

let's just bring visibility let there be

25:19

light I guess is how we started out with

25:21

like let's actually shine a light on all

25:24

the corners of your data stack and to

25:26

you know right now we're just talking

25:28

dbt right but in general we talk dbt and

25:31

upstream and further Upstream in fact

25:34

when we look at our Telemetry and we

25:35

look at what are the things that people

25:37

are connecting DataHub up to guess what

25:39

is the number one source that people

25:41

connected to

25:43

Postgres so Postgres is still winning

25:45

and is still dominant because a lot of

25:47

data lives on Postgres and so you know

25:49

that width and breadth is kind of the

25:51

central piece that we went after making

25:53

sure visibility was kind of prioritized

25:56

and now we're starting to see stories

25:57

where people are using it for

25:59

definitional data uh Checkout.com for

26:02

example has done data products and they

26:04

define them and register them on

26:07

DataHub and on the back of the change stream

26:10

once you have a data product registered

26:12

they're starting to provision stuff in

26:14

the back like they're starting to

26:15

provide access or even set up those

26:18

tables so I think we're starting to see

26:21

that next step that you were alluding to

26:23

happen uh Even in our community so we've

26:25

talked a lot about kind of things you

26:27

did the technology you used let's talk

26:30

about the things that didn't work the

26:32

things you haven't yet

26:35

implemented so I think one of the

26:37

hardest things that's come from data

26:41

mesh um

26:43

is the architectural strain that it can

26:47

it can put on an organization so like

26:49

we've we've decentralized and now we

26:52

have like data teams focused on domains

26:54

and other things and that goes well but

26:56

it also it's much harder to encode at a

27:00

platform level what good looks like for

27:03

for architecture around software

27:05

and even more so for um for data

27:08

products like what do you name a data

27:12

product like when

27:15

people did data warehousing and they had

27:17

you know facts and dimensions there are

27:19

books and books and books telling you

27:23

recommended practices about how to name

27:26

tables based on certain characteristics

27:28

of the table I'm yet to find like

27:32

the you know like if I think about my

27:34

background in building apis you'd have

27:35

like different design patterns for them

27:37

and all that like we're still we're

27:39

still lacking that so we spend a lot of

27:41

time trying to think about like what do

27:43

we call this is it performance data is

27:46

it metrics is it like what are these

27:48

words that we should use but then also

27:51

generally encode um design patterns in them

27:54

because when you've got smaller units of

27:55

data like the design patterns that you

27:57

would create them is kind of different

27:59

if it was one humongous thing that you

28:01

would have applied these to so that's

28:03

been really difficult and then I think

28:06

not having that centralized team it it's

28:09

it's much harder to keep the sort of the

28:12

quality of the practices high when

28:14

it's decentralized it just takes a lot

28:16

of work in a way um that you wouldn't

28:20

need in a centralized team and we also

28:22

see this other scenario where if like if

28:24

a data team's closer to the product team

28:26

and they're also very skilled at

28:28

engineering you know they might say oh

28:31

well maybe we could stop doing that data

28:33

work and they could do some more product

28:34

work for example so you get like

28:36

stresses like that where you know you've

28:38

kind of decentralized and generalized

28:39

more and that tension now is sometimes

28:41

you want specialized and you want to

28:44

hold on to that and then there's there's

28:46

just that challenge that

28:47

as a as a data leader you wouldn't have

28:49

had when you just have a box and say

28:52

that's that's your centralized data team

28:54

like it's it's allocated so that's

28:56

definitely one of the the challenges

28:58

that we've we've come under and quality

29:00

is a big part of that like how do you

29:03

how do you kind of work out if this data

29:05

product is of a higher quality than

29:07

another

29:09

one um we're starting to make a lot of

29:11

progress with that with with metadata

29:12

again where we're starting to kind of

29:14

label you know heuristics like number

29:16

of incidents owners and building up kind

29:20

of almost like a metadata quality

29:23

framework yeah yeah
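
As a rough illustration of the metadata quality framework described here, the sketch below scores data products from a few labeled characteristics. The class, the field names, and the weights are all assumptions made for the example; the conversation only mentions number of incidents and owners as inputs.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductMetadata:
    """Hypothetical metadata labels attached to one data product."""
    name: str
    owners: list = field(default_factory=list)
    incidents_last_90_days: int = 0
    has_description: bool = False


def quality_score(product: DataProductMetadata) -> int:
    """Return a score from 0 to 100; the weights are illustrative only."""
    score = 0
    if product.owners:
        score += 40  # someone is accountable for the product
    if product.has_description:
        score += 20  # consumers can tell what the product is for
    # Start from 40 points and lose 10 per recent incident, never below zero.
    score += max(0, 40 - 10 * product.incidents_last_90_days)
    return score


def rank_by_quality(products):
    """Order data products from highest to lowest score."""
    return sorted(products, key=quality_score, reverse=True)
```

A score like this only makes two products comparable on the same labels; which labels matter and how they are weighted would still be a decision for the teams involved.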

29:26

exactly cool I think it's uh I mean

29:29

don't beat yourself up too much naming

29:31

is hard I think it's one of the two

29:33

problems that are hard about computer

29:34

science so I think it'll continue to be

29:36

a a challenge wait till you get to

29:38

caching data products and then that'll

29:40

be uh the next hard problem but data

29:43

modeling completely agree um in fact

29:46

even at LinkedIn you know where we like

29:48

I said went through this journey of

29:50

going from centralized to trying to

29:52

decentralize and we faced it in the

29:54

microservices world very quickly uh we

29:56

started realizing data modeling

29:58

practices started uh fracturing and um

30:01

the initial reaction was really get

30:04

control back and the first thing

30:06

we did was form a Data Model Review

30:08

Committee and for you know any LinkedIn

30:10

alumni or you know existing LinkedIn

30:13

employees if you're you know tuned in you

30:15

might kind of start to shudder and

30:17

jitter because you know DMRC or Data

30:19

Model Review Committee was a very uh

30:22

traumatic experience for the whole

30:24

company it was it was great for

30:26

centralized control but resulted in a

30:28

lot of delays um as as products went to

30:31

production because um halfway through

30:34

you know shipping a product you would

30:35

certainly get caught and and get told

30:39

that you have to go back and redesign

30:41

your your schema or your um your your

30:43

data

30:44

model and so you know that team spent

30:49

kind of a year or two asserting control

30:52

and then the next year or two trying to

30:55

Tool themselves out of existence and so

30:58

so I think we'll we'll see that kind of

31:00

pattern emerge in in real world

31:03

deployments as well where we'll see

31:04

these Central teams kind of have those

31:07

anxiety attacks when they start

31:09

decentralizing try to assert control

31:11

through gatekeeping but then realize

31:14

gatekeeping doesn't work and so you have

31:16

to kind of tool yourself out of

31:18

existence by just finding a way to

31:20

declare what good looks like finding a

31:22

way to describe what that good looks

31:25

like in a programmatic way and then

31:27

provide it to a platform that can then

31:29

make that thing happen so I think still

31:33

for me in my opinion something that uh

31:36

you know some folks like us are doing a

31:38

little bit of that in our product but I

31:39

think the future uh is is kind of being

31:42

able to auto-describe what good looks

31:44

like and then uh being able to

31:47

standardize those practices without

31:49

needing uh too much human gatekeeping to

31:52

happen um would love to pull up uh one

31:54

of the questions from the audience at

31:56

this point before we uh drop into kind

31:58

of uh chatting about the future um you

32:02

know we have a question from the

32:03

audience around like are you using data

32:06

contracts if you are um we've talked a

32:08

lot about data products um and

32:12

thankfully we didn't talk about what a

32:14

data product is because that would be a

32:15

whole different hour conversation but

32:18

let's uh let's move and talk about data

32:21

contracts are you using them how was the

32:23

contract structured are you using a

32:26

standard template or something else
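
To make the question concrete, here is a minimal sketch of one shape a data contract could take: a versioned list of typed fields with a named owner, plus a check that flags changes likely to break consumers. The `Field` and `Contract` classes and the `breaking_changes` function are hypothetical names invented for this example; they are not a standard template and not the structure used by any team in this conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column or attribute promised by the contract (hypothetical)."""
    name: str
    type: str


@dataclass(frozen=True)
class Contract:
    """A versioned promise from a producer team to its consumers."""
    product: str
    version: int
    owner: str
    fields: tuple


def breaking_changes(old: Contract, new: Contract) -> list:
    """List changes in `new` that could break consumers reading `old`."""
    problems = []
    new_by_name = {f.name: f for f in new.fields}
    for f in old.fields:
        current = new_by_name.get(f.name)
        if current is None:
            problems.append(f"removed field: {f.name}")
        elif current.type != f.type:
            problems.append(f"type changed: {f.name} {f.type} -> {current.type}")
    return problems
```

Run in a producer's build pipeline, a check like this moves the conversation about a schema change to before it ships, rather than after a downstream consumer breaks. Adding a new field is treated as safe here, which is an assumption that suits consumers who ignore unknown fields.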

32:29

yeah so this feels like we what the way

32:34

um so we we kind of try to left shift a

32:36

lot of contracts or rigor around this

32:40

stuff so like for

32:41

example we had a real big push that if

32:44

we were going to ingest um data into the

32:47

analytical plane into our our data

32:49

platform we would expect that everything

32:52

was using Avro schemas and using Kafka

32:55

therefore like you're kind of shifting

32:57

responsibility back to a producer team

33:00

they need to you know make sure there is

33:02

a really good contract there and we have

33:04

got to this place where we've got

33:05

versioning and and really good things

33:07

like that and and and checks um so we

33:10

have that but between data products this

33:12

is definitely one of the areas that I'm

33:13

very very interested in is is we've been

33:16

doing data contracts probably more

33:17

implicitly so we have a bunch of

33:20

standardized um like metadata tests

33:22

and and validators um and we we try to

33:25

detect things um in a very automated way

33:27

around

33:28

detecting schema changes and then kind

33:29

of triggering that for someone to

33:31

look into but I'm very much interested

33:33

in in in where the industry is moving

33:35

now where we're thinking about data

33:38

contracts um I find this I find this

33:41

space quite surprising because

33:42

everyone's talking about it or was but

33:44

it was very revolutionary not

33:48

evolutionary and as a as a as a

33:50

technologist as an engineer this feels

33:52

like yeah like this this is a very much

33:55

a an obvious thing that I would do if I

33:58

was building an API for example I would

33:59

expect a contract and a schema in

34:01

place but as as in most instances the

34:04

the data world is always a bit nascent to

34:07

to some of these practices um but it it

34:10

seemed really positive and it's worked

34:12

in a lot of other software engineering

34:14

domains I think there's definitely a

34:16

bunch of uh so so thanks for for that

34:20

um uh response I think there's

34:24

definitely a lot

34:26

of

34:30

sociological processes that happen in

34:32

the data world where we get attracted to

34:36

A New Concept a new term we all rally

34:39

around

34:41

it uh we try to make that a

34:44

reality and then you know in a couple of

34:46

years disillusionment starts setting in

34:50

because hard problems continue to stay

34:52

hard and then um often a new term

34:56

emerges and then we all rally around

34:59

that um the the hard problems I think

35:02

around data quality data governance

35:04

metadata management they have continued

35:06

to stay hard and challenging um and I

35:10

think at different points in the

35:13

evolution of the data industry we've

35:14

kind of picked up different phrases as

35:16

the rallying cry to go and do something

35:19

about it so in fact I think about data

35:21

mesh as an example of something like that

35:24

I think about data contracts or

35:26

something similar to that so you know

35:28

Gartner I think earlier this week or

35:30

maybe last week uh published an update

35:32

saying data mesh is now dead or it's

35:35

about to disappear we're no

35:37

longer tracking it or something like

35:38

that so what is your advice for others

35:41

that are thinking of either starting

35:44

continuing or abandoning their data mesh

35:50

strategies yeah I I could

35:52

have I can't tell if this is just you

35:55

know a marketing cycle around you know

35:58

like I'm I'm I'm expecting the articles

36:00

of data mesh is dead long live data mesh

36:02

next you know the classic tropes I

36:04

mean I think I think Ryan Dolly actually

36:07

did a quick video last week or this week

36:11

yeah so it's it's I think modern data

36:13

stack is dead so that's the new

36:15

marketing cycle and then data mesh is

36:18

is is either dead or about to be dead so

36:20

that's the other other conversation

36:23

doing the rounds but anyways what what

36:25

do you think well well I think that

36:28

throughout all of my career as a

36:30

technologist as an engineer we we go

36:34

organizations have a level of

36:36

centralization or decentralization

36:38

around technology period um so so to me

36:43

data mesh was all about this idea that

36:46

particular organizations don't want to

36:48

be centralized it becomes too much of a

36:50

constraint for them to succeed with data

36:53

which isn't for everybody like I

36:54

wouldn't recommend you know an

36:56

organization of five people like a

36:58

startup should go all in on data mesh I

37:01

think it comes with a particular amount

37:02

of size right and need um you kind of

37:05

outgrow the central team just like most

37:08

other technology problems come with

37:09

scale so to me data mesh is all about

37:12

giving us that language and a set of

37:14

principles and values about how to

37:15

succeed at

37:17

decentralization maybe they aren't

37:19

right maybe some are good and some

37:21

aren't quite as good or they can't

37:23

really quite get there because

37:24

technology isn't ready today may be

37:27

ready in three years five years or maybe

37:29

it never will but I I can't imagine a

37:32

world where there isn't a reasonable

37:34

number of sizable organizations that need

37:37

to be more decentralized with how they

37:39

work with data um so I kind of don't

37:43

worry if it wins or dies I suppose on

37:45

some level because of that the bit which

37:47

I do really hope succeeds is that the

37:51

technology gets there rises to the occasion

37:53

to make it easier and I think the biggest

37:57

reason I'm passionate about that is one

37:59

of the things that has changed for my

38:01

organization by doing data mesh is the

38:05

the data product thinking the product

38:07

thinking over data like when we went and

38:10

built some of our data products you kind

38:12

of um how to describe this like

38:15

sometimes like data is so fine grained

38:17

it's like grains of sand you know on the

38:20

beach and you can't you can't see like I

38:23

could build a castle right out of this

38:25

or anything so like when we start to

38:28

like group data and do that and catalog

38:30

and structure things you go actually we

38:34

could go further here you know um like

38:36

as an example we we we get lots of

38:38

observations about car sales and other

38:40

things and then we sort of realize when

38:43

we could see the shape of that that

38:45

actually we could go further and we

38:46

could start to bring more like

38:48

confidence intervals on sales and look

38:50

ahead and do forecasting and we could

38:52

always do that the data was there to

38:54

always do that by having shapes around

38:56

data products that was the exciting

38:58

thing so I think if to kind of conclude

39:01

that I think if it wins or succeeds I

39:04

think it's about if any of that lives on

39:07

and I think it should for any

39:09

organization that's that's the key thing

39:11

so if data mesh falls out of

39:13

popularity I imagine there'll just be

39:15

another architectural blueprint about

39:18

how to do decentralization because I

39:20

can't I can't see that going away in

39:22

every company around the world so long

39:24

live data mesh principles yeah yeah sure

39:28

okay great um so on that note you know

39:32

you've you've

39:33

got uh the wind behind your back you've

39:35

done um data as code metadata as code

39:39

shift left you're you're kind of doing

39:41

control plane for data from what it

39:43

seems like with you know with Kubernetes

39:45

as your um provider layer what are you

39:49

excited for uh about the future is it

39:52

continuing this decentralization game

39:55

and kind of making self-serve data and then um

39:59

high quality data across teams a reality

40:01

or is it AI or is it like what what does

40:05

the future look like for your

40:08

teams well I think everybody is excited

40:10

about LLMs aren't they so isn't that a

40:13

given um I think for data mesh and data

40:16

products I think

40:19

I'm uh although we've made a lot of

40:21

progress there's just so much to do um

40:25

like we run a lot of our operational

40:27

services our microservices our APIs and

40:28

our you know operational systems on Kubernetes

40:32

and we have applied a lot of these same

40:33

principles that we're applying to the

40:36

data world today

40:38

operationally and it's it's completely

40:40

transformed how our organization

40:42

operates you know how we deploy Services

40:45

just how mature we are as a technology

40:47

business and I just see such an obvious

40:50

road to keep going that way with data

40:52

mesh and you know increase our profits

40:55

improve our time to market find better

40:58

and more engaging ways of using our

41:00

data um you know shorten Cycles around

41:03

how we do an ml product and go to market

41:07

so that's probably one of the largest

41:10

areas that excites me with data mesh

41:12

it's the possibilities of what we get

41:13

more out of our

41:14

data where we're seeing this happen a

41:17

lot today is is probably now starting to

41:19

show up in more of our business areas so

41:22

like more emphasis on say marketing more

41:26

emphasis on um customer experience like

41:29

how we bring data into these spaces more

41:31

and by unbundling from the central team

41:33

and building data teams and data

41:35

products around them with STS we'll unlock

41:37

a lot more things with them than we

41:39

would have done when we had that that

41:41

that kind of centralized team with

41:43

everybody fighting over it um so that's

41:45

what excites me and and possibly llms

41:49

depending on if it's a a bubble or not

41:52

and on how you how you view that I guess

41:54

in fact a couple of episodes ago I was

41:56

chatting with uh Hema who's heading up

41:59

Kumo.ai and um we were talking about

42:02

the the the needs of AI and you know

42:05

guess what metadata is one of the

42:07

biggest things that AI teams need to do

42:10

couple of things well one is

42:12

reproducibility and understandability

42:14

and

42:15

explainability um and the second thing

42:17

is actually prompt engineering and you

42:19

know real um uh you know for RAG

42:22

architectures and stuff like that so

42:24

it's actually kind of interesting how

42:25

the worlds of metadata and AI are also

42:28

kind of coming together around the same

42:29

time with respect to data mesh I

42:31

definitely feel like the tooling like

42:34

you said coming together is is kind of a

42:36

theme I'm excited even within our

42:39

existing community and customer base uh

42:43

we're seeing a lot more um cases where

42:47

people our customers are talking about

42:49

data products they're actually using

42:50

data products in the catalog for real

42:53

not just asking about it so we've gone

42:56

from the stage where people would just

42:57

be interested because we had data

42:59

products like DataHub had data

43:01

products like maybe a year ago at this

43:04

point you know we've gone from seeing

43:06

people just asking about data products

43:07

to actually having them and using them

43:10

and asking more so you know data product

43:13

lineage input output ports all of these

43:15

things that we've promised to the

43:17

community are actually going to come out

43:19

this year I'm really excited to see what

43:21

uh folks do with it um and the same

43:23

thing with data contracts you know we've

43:25

kind of had it in the product for a

43:26

while while but we're starting to see

43:28

people really want to connect these two

43:31

together and to see them um stitch

43:33

together last Town Hall we had a

43:36

gentleman Stefan uh from KPN actually

43:39

talk about you know a full-on data

43:41

product spec that he's working on and

43:43

developing on top of DataHub so it's

43:45

it's it's pretty exciting to

43:47

see uh kind of despite the overall hype

43:51

cycle around the term data mesh kind of

43:53

going down definitely seeing a lot of

43:55

the practical implementations

43:57

coming to life and I think what you said

44:01

about um you know human comprehension

44:04

being a hard problem and so the more and

44:06

more data we generate and create as

44:09

Industries as companies the need to

44:13

simplify and communicate using simpler

44:16

terminology simpler more um coalesced uh

44:21

objects that we can all rally around and

44:24

govern uh has definitely been uh

44:27

uh an advantage that we've all gained a

44:31

relational person would be like well

44:33

that's what schemas in databases are

44:35

supposed to be and they're probably

44:37

right uh but I think in as we've split

44:40

apart databases and kind of fractured

44:43

the world into graphql schemas and open

44:45

API schemas and Kafka topics and S3

44:48

buckets and a bunch of snowflake tables

44:51

I think needing that logical layer on

44:54

top has kind of definitely uh come back

44:57

in and it's it is showing promise and I

44:59

think the I'm excited for the future of

45:02

trying to stay polyglot and poly you know

45:07

implementation at the lower layers because that's

45:10

where Innovation happens while still

45:12

staying harmonized and uh logical at the

45:16

at the understanding there yeah I think

45:19

uh DataHub-like metadata systems but

45:21

especially DataHub have the potential to be

45:25

the control plane that we have to

45:27

develop like and I think that's really

45:29

exciting and I do hope that I don't know

45:31

Acryl the DataHub community the open

45:33

source Community realize that and build

45:36

around it like by by having these

45:39

materialized in a system we can build

45:42

you know the ability to make data

45:43

products and provision them in different

45:45

Cloud environments a lot more trivial

45:47

than it is today um and I do feel like

45:50

there will be a point when there'll be

45:51

enough technology available to do this

45:54

that you will really see it explode it

45:57

just needs that that moment but more

45:59

people are doing it

46:01

then it's probably you probably realize

46:04

like Z and the bit that I think we'll

46:06

live with with with or without data mesh

46:08

is data products this this idea of

46:12

encapsulating some of your data and

46:13

describing it as a as a system I feel

46:17

like that is a really powerful tool um

46:19

to come out of of data mesh as as a

46:21

vocabulary as a concept as a way of us

46:24

you know Building Systems from in in a

46:26

more decentralized way than a star

46:28

schema for

46:30

example well it's been a pleasure having

46:32

you on the show Darren and it's uh it's

46:34

been such a great chat I almost didn't

46:37

uh keep track of time we went a good 15

46:40

minutes over but hopefully yeah the

46:42

audience had a had a good time uh

46:44

listening and uh thanks for your

46:46

questions uh we'll we'll see you on the

46:48

internet as they say uh and um you know

46:52

it's been it's been a pleasure talking

46:54

to you Darren and it's been a pleasure

46:55

collaborating with you over the years as

46:57

well so looking forward to uh building

47:00

great things together thanks for having

47:02

me Shashanka and thanks to everybody that

47:04

joined and listened and the questions

47:06

that we got thank you very

47:07

much
