Data Collection and Storage: Here, data is collected from various sources, such as sensors and social media, and is stored in its raw format.
Data Preparation: Data is prepared for analysis. By this, I mean making sure that the data is in the right format, for instance by cleaning it to remove null or missing values.
Data Exploration and Visualization: Here, data is explored to derive insights, after which the results are visualized.
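The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a real workflow: the readings, the field names (`sensor_id`, `temperature`) and the helper functions `collect`, `prepare`, `explore` and `visualize` are all made up for the example.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from sensors (made-up data).
RAW_READINGS = [
    {"sensor_id": "a", "temperature": 21.5},
    {"sensor_id": "b", "temperature": None},  # a missing value
    {"sensor_id": "a", "temperature": 22.5},
    {"sensor_id": "b", "temperature": 19.0},
]


def collect():
    """Collection and storage: return the data in its raw format."""
    return list(RAW_READINGS)


def prepare(records):
    """Preparation: drop records whose temperature is missing."""
    return [r for r in records if r["temperature"] is not None]


def explore(records):
    """Exploration: compute the average temperature per sensor."""
    by_sensor = {}
    for r in records:
        by_sensor.setdefault(r["sensor_id"], []).append(r["temperature"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def visualize(summary):
    """Visualization: a crude text bar chart, one '#' per degree."""
    return [
        f"{sensor}: {'#' * round(avg)} {avg}"
        for sensor, avg in sorted(summary.items())
    ]


raw = collect()
clean = prepare(raw)
summary = explore(clean)
for line in visualize(summary):
    print(line)
```

In practice each step would be backed by real tooling (a database or data lake for storage, a charting library for visualization), but the order of the steps stays the same.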
Data engineers lay the groundwork that makes any data science activity possible.
Data, in a phrase popularized by The Economist, is the new oil.
Crude oil is extracted from an oil field. It is then sent to a distillation unit, where it is separated into several products that are then sent to their users. Some pipes go straight to airports to deliver kerosene. Others go to gas storage facilities to deliver gasoline, which is stored in big tanks before being distributed to gas stations. There are many pipelines tying all this together.
Data engineers handle data in much the same way that oil is processed: it is extracted from its sources, processed, and delivered to wherever it is needed.
Companies ingest data from many different sources, and that data needs to be processed and stored in various ways. To handle that, we need data pipelines.
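To make the idea a little more concrete, here is a toy pipeline in Python. It is a sketch under loud assumptions: the two sources, the record layout, and the functions `ingest`, `process` and `store` are hypothetical names invented for this example, and a plain dictionary stands in for real storage systems.

```python
from collections import defaultdict


# Hypothetical sources; in practice these would be APIs, files, or queues.
def sensor_source():
    yield {"source": "sensor", "value": " 21.5 "}
    yield {"source": "sensor", "value": "22.0"}


def social_source():
    yield {"source": "social", "value": "Great product!"}


def ingest(*sources):
    """Extract: pull records from every source into one stream."""
    for source in sources:
        yield from source()


def process(records):
    """Process: normalise each record according to where it came from."""
    for record in records:
        if record["source"] == "sensor":
            # Sensor readings arrive as text; convert them to numbers.
            yield {"source": "sensor", "value": float(record["value"])}
        else:
            # Text posts are lower-cased so they can be compared later.
            yield {"source": record["source"], "value": record["value"].lower()}


def store(records):
    """Store: route each record to a destination keyed by its source."""
    destinations = defaultdict(list)
    for record in records:
        destinations[record["source"]].append(record["value"])
    return dict(destinations)


# Chain the stages together, like pipes between the field and the tanks.
stores = store(process(ingest(sensor_source, social_source)))
print(stores)
```

Each stage only knows about the stage before it, just as a pipe only connects two points. That is what lets a new source or a new destination be added without rewriting the rest of the pipeline.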