Chapter 1 - Data Sources
Contents
- Terms
- Structured vs. Unstructured Data
- Database vs. Database Schema
- One-to-Many Relationships
- Many-to-Many Relationships
- Database Normalization
- Dimensional Data Warehouses
Terms
Term | Definition | Notes |
---|---|---|
API | Application Programming Interface | |
SQL | Structured Query Language | |
RDBMS | Relational Database Management System | |
ODBC | Open Database Connectivity | Uses drivers to standardize the interface between software applications and databases. |
ERD | Entity Relationship Diagram | |
Foreign Key | Column (or set of columns) in one table that references the primary key of another table, enforced as a constraint | |
Structured vs. Unstructured Data
Unstructured:
- Text Documents
- Images
- etc.
Structured:
- Tabular
- Spreadsheets
- Database
- etc.
Database vs. Database Schema
- Database = Collection of tables that hold the data
- Database Schema = Stores table information and relationships (i.e., defines the structure of the database rather than its contents)
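To make the distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`patients`, `appointments`, and so on) are assumptions chosen for illustration. It defines a structure, then reads that structure back from SQLite's `sqlite_master` catalog while the tables themselves are still empty.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE appointments (
        appointment_id INTEGER PRIMARY KEY,
        patient_id     INTEGER NOT NULL REFERENCES patients (patient_id),
        visit_date     TEXT NOT NULL
    );
    """
)

# The schema: SQLite stores each table's definition in sqlite_master.
schema = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for name, _ in schema]

# The data: the structure is fully defined, yet no rows exist.
patient_count = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]

print(table_names)    # ['appointments', 'patients']
print(patient_count)  # 0
```

Other database systems expose the same kind of structural information through their own catalogs, so the `sqlite_master` query is specific to SQLite.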
One-to-Many Relationships
- An entity appears exactly once in one table but can be referenced by multiple rows in another table
- e.g., a patients table and an appointments table: each patient has one row in patients but can have many rows in appointments
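The patient and appointment example can be sketched with Python's built-in `sqlite3` module. The column names and sample rows below are assumptions made for illustration. Each appointment row carries a foreign key back to exactly one patient, and the final insert shows the constraint rejecting an appointment for a patient who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE appointments (
        appointment_id INTEGER PRIMARY KEY,
        patient_id     INTEGER NOT NULL REFERENCES patients (patient_id),
        visit_date     TEXT NOT NULL
    );
    INSERT INTO patients VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO appointments VALUES
        (1, 1, '2022-10-01'),
        (2, 1, '2022-10-15'),
        (3, 2, '2022-10-20');
    """
)

# One patient row, many appointment rows.
visits_per_patient = conn.execute(
    """
    SELECT p.name, COUNT(a.appointment_id)
    FROM patients AS p
    LEFT JOIN appointments AS a ON a.patient_id = p.patient_id
    GROUP BY p.patient_id, p.name
    ORDER BY p.patient_id
    """
).fetchall()
print(visits_per_patient)  # [('Ana', 2), ('Ben', 1)]

# An appointment pointing at a non-existent patient violates the foreign key.
try:
    conn.execute("INSERT INTO appointments VALUES (4, 99, '2022-11-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```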
Many-to-Many Relationships
- Connection between entities where records on each side of the relationship can connect to multiple records on the other side
- A junction (or associative) table is needed to capture the pairs of related rows
- Reduces the amount of redundant data stored in the database
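A junction table can be sketched as follows, again with Python's `sqlite3` module. The `students`, `courses`, and `enrollments` tables are hypothetical and not taken from the source. The junction table holds one row per related pair, so neither student nor course details need to be repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Junction table: one row per (student, course) pair.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students (student_id),
        course_id  INTEGER NOT NULL REFERENCES courses (course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO courses  VALUES (10, 'SQL'), (20, 'Statistics');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Joining through the junction table recovers the related pairs.
pairs = conn.execute(
    """
    SELECT s.name, c.title
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
    """
).fetchall()
print(pairs)  # [('Ana', 'SQL'), ('Ana', 'Statistics'), ('Ben', 'SQL')]
```

Here Ana is linked to two courses and the SQL course is linked to two students, which is the "many on both sides" case described above.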
Database Normalization
- The practice of organizing tables so that redundant data is not stored in the database
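As a rough illustration of the idea, the sketch below uses plain Python structures rather than a real database. The order and customer fields are made up for the example. It splits a flat set of rows, where customer details repeat on every order, into two structures linked by `customer_id`, then rebuilds the flat view with a join-like lookup.

```python
# Denormalized: the customer's name and email are repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": 7, "customer_name": "Ana",
     "email": "ana@example.com", "total": 20.0},
    {"order_id": 2, "customer_id": 7, "customer_name": "Ana",
     "email": "ana@example.com", "total": 35.5},
    {"order_id": 3, "customer_id": 8, "customer_name": "Ben",
     "email": "ben@example.com", "total": 12.0},
]

# Normalized: each customer is stored once; orders keep only the key.
customers = {}
orders = []
for row in flat_orders:
    customers[row["customer_id"]] = {
        "name": row["customer_name"],
        "email": row["email"],
    }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],
        "total": row["total"],
    })

# Looking customers up by key (the equivalent of a join) rebuilds the flat view.
rebuilt = [
    {
        "order_id": o["order_id"],
        "customer_id": o["customer_id"],
        "customer_name": customers[o["customer_id"]]["name"],
        "email": customers[o["customer_id"]]["email"],
        "total": o["total"],
    }
    for o in orders
]

print(len(customers))          # 2
print(rebuilt == flat_orders)  # True
```

With the customer stored once, a change of email address touches a single record instead of every order row.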
Dimensional Data Warehouses
- Often contain data from multiple underlying sources
    - May contain raw and summary data
    - Can contain historical data logs, etc.
- Star schema design (pg 7)
    - Divides data into facts and dimensions
    - Fact table = metadata of an entity plus its measures
    - Dimension table = properties of the entity that you can group or “slice and dice” the fact records by, use to get further information, etc.
- Table grain
    - Level of detail; what set of columns makes a row unique
- Database roles
    - SMEs = subject matter experts
    - DBAs = database administrators
    - ETL engineers = people who extract, transform, and load data from a source system into a data warehouse
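The star schema and table grain ideas above can be sketched with Python's `sqlite3` module. The `dim_product` and `fact_sales` tables, their columns, and the sample rows are assumptions for illustration, not taken from the book. The fact table's grain is one row per product per day, expressed as a composite primary key, and the dimension table supplies the attribute used to slice the measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension: descriptive properties to group or filter by.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL
    );

    -- Fact: measures at a grain of one row per product per day.
    CREATE TABLE fact_sales (
        sale_date  TEXT    NOT NULL,
        product_id INTEGER NOT NULL REFERENCES dim_product (product_id),
        quantity   INTEGER NOT NULL,
        amount     REAL    NOT NULL,
        PRIMARY KEY (sale_date, product_id)
    );

    INSERT INTO dim_product VALUES
        (1, 'Apple', 'Fruit'), (2, 'Pear', 'Fruit'), (3, 'Kale', 'Vegetable');
    INSERT INTO fact_sales VALUES
        ('2022-10-01', 1, 3, 6.0),
        ('2022-10-01', 3, 1, 2.5),
        ('2022-10-02', 1, 2, 4.0),
        ('2022-10-02', 2, 4, 10.0);
    """
)

# "Slice and dice": aggregate the fact measures by a dimension attribute.
sales_by_category = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
print(sales_by_category)  # [('Fruit', 20.0), ('Vegetable', 2.5)]

# The grain: a second row for the same product and day is not allowed.
try:
    conn.execute("INSERT INTO fact_sales VALUES ('2022-10-01', 1, 1, 2.0)")
    duplicate_grain_rejected = False
except sqlite3.IntegrityError:
    duplicate_grain_rejected = True
print(duplicate_grain_rejected)  # True
```

Declaring the grain as a primary key is one design choice among several; warehouses often document the grain without enforcing it through a constraint.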
Appendix: Links and References
- 2022-10-28
- SQL for Data Scientists
- 3-Resources/Tools/Developer Tools/Data Stack/Procedural Languages/SQL
- Databases
Jimmy Briggs jimmy.briggs@jimbrig.com | 2022