Data Discovery with Shinji Kim

Metadata

  • Author: Software Engineering Daily
  • Full Title: Data Discovery with Shinji Kim
  • Category: #Type/Highlight/Podcast

Highlights

  • The Snowflake of Cloud Computing?

    Summary: Snowflake is a cloud data warehouse that separates compute from storage. It lets companies put all of their data in one place, even history spanning years or decades, because storage is billed cheaply on its own and compute is billed only for the queries that actually run. The company recently went public.

    Transcript:

    Speaker 2: We did think for a while that everything was going to be done with these streaming systems. They give you a way to calculate the coolest thing over time. You have, like, a variable that's the coolest thing over time, and then all the streams wash over that variable and just refresh it with the coolest thing.

    Speaker 1: That's true. At the same time, there are just too many dimensions to recalculate, right?

    Speaker 2: As we'd have to stuff everything into the data warehouse. Yes.

    Speaker 1: So you do have to, you know, store it in a columnar way. And I think the other big part is starting to look at compute and storage separately, which I think has been another big breakthrough.

    Speaker 2: Everybody always says that Snowflake is the breakthrough, because it separates compute and storage. And I'm always like, what? I don't understand what you're talking about. What does that even mean? I mean, don't I have a microprocessor and, like, a hard disk? Isn't that separation of compute and storage?

    Speaker 1: I think it comes down to the cost. Traditionally, the only data warehouse, or really the first cloud data warehouse, used to be Redshift, and they charge for however much data you have. Enterprises have tons of data, and most companies are accumulating more. It's like, well, I have all this data, and I do want to put it all in one place so I have the option to run my analysis over the last three months, six months, a year, or five years. But if I were to run that in Redshift, or traditional data warehousing more broadly, I'd have to pay for all five years' worth of data, just for that storage, even though I'm not doing anything with most of it. Usually the most important data for me is the last month's data, for instance, right? Whereas Snowflake came in and said, hey, that's OK. You can put in as much data as you want, you can run that query over only the last month, and we will just charge for that computation, regardless of the amount of data you already have gathered. (Time 0:29:08)
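
Speaker 2's streaming picture, a single variable that every incoming event "washes over", is a running aggregate. Here is a minimal Python sketch of that pattern, assuming a toy event stream and score function that are not from the episode:

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

def running_best(events: Iterable[T], score: Callable[[T], float]) -> Iterator[T]:
    """Yield the best event seen so far as each new event arrives.

    One variable holds the current winner ("the coolest thing over time"),
    and every event in the stream refreshes it if it scores higher.
    """
    best: Optional[T] = None
    best_score = float("-inf")
    for event in events:
        s = score(event)
        if s > best_score:
            best, best_score = event, s
        yield best

# Hypothetical usage: events are (name, popularity) pairs.
stream = [("a", 3.0), ("b", 7.5), ("c", 5.1)]
for current in running_best(stream, score=lambda e: e[1]):
    print(current)  # ('a', 3.0), then ('b', 7.5), then ('b', 7.5)
```

Speaker 1's objection is that you would need one such variable per dimension combination (per region, per segment, per time window, and so on), which quickly becomes untenable and pushes the data into a warehouse that can recompute aggregates on demand.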
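
The "columnar way" Speaker 1 mentions is about layout: storing each field contiguously so a query can scan one column without reading whole records. A toy Python illustration of the difference, not how any particular warehouse physically stores data:

```python
# Row layout: each record is stored together, so summing one field
# still touches every full record.
rows = [{"ts": i, "user": f"u{i}", "amount": float(i)} for i in range(5)]
total_rows = sum(r["amount"] for r in rows)

# Columnar layout: each field is stored contiguously, so an aggregate
# over "amount" reads only that one array and skips "ts" and "user".
columns = {
    "ts": [r["ts"] for r in rows],
    "user": [r["user"] for r in rows],
    "amount": [r["amount"] for r in rows],
}
total_columns = sum(columns["amount"])

assert total_rows == total_columns
```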
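
Speaker 1's cost argument can be made concrete with back-of-the-envelope arithmetic. Every number below is invented purely for illustration; actual Redshift and Snowflake pricing is more involved:

```python
# Hypothetical unit prices, chosen only to illustrate the shape of the argument.
storage_per_tb_month = 25.0    # cheap, decoupled object storage
compute_per_tb_scanned = 5.0   # pay only for data a query actually scans
coupled_per_tb_month = 250.0   # bundled compute+storage sized to hold everything

total_tb = 60.0   # roughly five years of history kept in one place
recent_tb = 1.0   # the last month, which is all the query needs

# Decoupled model: cheap storage for everything, compute billed per query.
decoupled = total_tb * storage_per_tb_month + recent_tb * compute_per_tb_scanned

# Coupled model: the cluster is sized (and billed) for all five years,
# whether or not a query ever touches the older data.
coupled = total_tb * coupled_per_tb_month

print(f"decoupled: ${decoupled:,.0f}/month  coupled: ${coupled:,.0f}/month")
```

Under these made-up numbers the decoupled model costs $1,505 a month against $15,000, which is the point Speaker 1 is making: you keep the option of querying five years of data while mostly paying for the month you actually analyze.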