Skip to content

Better Extract/Transform/Load (ETL) Practices in Data Warehousing (Part 2 of 2)

Metadata

  • Author: codemag.com
  • Title: Better Extract/Transform/Load (ETL) Practices in Data Warehousing (Part 2 of 2)
  • Reference: https://codemag.com/article/1803051
  • Category: #Type/Highlight/Article

Highlights

  • Every aspect of data warehousing is about context. β€” Updated on 2021-12-16 16:34:29 β€” Group: #Topic/Dev

  • Every aspect of data warehousing is about context. β€” Updated on 2021-12-16 16:38:49 β€” Group: #Topic/Dev

  • Common Mistakes in Data Warehousing and ETL Applications β€” Updated on 2021-12-16 16:39:01 β€” Group: #Topic/Dev

  • Insufficient Logging Insufficient time spent vetting and testing the incremental extract process Insufficient time spent profiling the source data Not having a full understanding of the transactional grain (level of detail) of all source data Not taking enough advantage of newer technologies Less than optimal SQL code in the ETL layer Uneasiness/challenges in getting the business to agree on rules Not retaining historical data Building a solution that works but doesn’t get used Not using the Kimball model (or any model) and not using accepted patterns Not running ideas by team members β€” Updated on 2021-12-16 16:39:11 β€” Group: #Topic/Dev

  • Not having a full understanding of the transactional grain (level of detail) of all source data: I can’t stress enough that if you don’t have a firm understanding of the grain and cardinality of the source data, you’ll never be able to confidently design extract logic. β€” Updated on 2021-12-16 16:39:39 β€” Group: #Topic/Dev

  • Not retaining historical data: Here’s a cautionary tale that every data warehouse person should remember. Years ago, a company wanted to do a complete refresh of data from the source systems. Unfortunately, the source system had archived their data, and as it turned out, some of the backup archives weren’t available. (Yes, the situation was a mess). Had the ETL processes retained copies of what they’d extracted along the way before transforming any of the data in the staging area, the data warehouse team might not have needed to do a complete refresh from the source systems. If you’re a DBA, you’re probably feeling your blood pressure rising over the thought of storage requirements. Yes, good data warehouses sometimes need high storage capacity. β€” Updated on 2021-12-16 16:39:56 β€” Group: #Topic/Dev

  • Agile methodologies, Sprint processes, and frequent meaningful prototypes are tools and approaches. By themselves, they don’t guarantee success. β€” Updated on 2021-12-16 16:40:03 β€” Group: #Topic/Dev

  • Operational Data Store (ODS) Databases β€” Updated on 2021-12-16 16:40:17 β€” Group: #Topic/Dev

  • So, when determining the anticipated growth of your databases, factor in these four items:

Any new or anticipated subject matter data, data feeds, etc. Backups Historical/periodic snapshots Redundant information in staging areas and operational/intermediary areas β€” Updated on 2021-12-16 16:40:43 β€” Group: #Topic/Dev

  • Change Data Capture and Temporal Tables and Their Place in the Data Warehouse Universe β€” Updated on 2021-12-16 16:40:57 β€” Group: #Topic/Dev

  • Implement database triggers to capture inserted/updated/deleted data. Implement logic in stored procedures (or in the application layer) to capture inserted/updated/deleted values. Purchase a third-party tool, such as SQLAudit. β€” Updated on 2021-12-16 16:41:30 β€” Group: #Topic/Dev

  • Data Lineage Documentation β€” Updated on 2021-12-16 16:41:58 β€” Group: #Topic/Dev

  • Producing good documentation can be very time-consuming. Producing good (and up-to-date) documentation can be especially difficult if the team constantly faces change/enhancement requests. Different team members can have reasonable disagreements about what constitutes good and useful documentation. β€” Updated on 2021-12-16 16:42:13 β€” Group: #Topic/Dev

  • In some cases, I’ve been able to β€œcut to the chase” of hashing out business requirements by building prototypes and showing them to users. Without this approach, we might never have identified some of the more intricate details. β€” Updated on 2021-12-16 16:42:45 β€” Group: #Topic/Dev