Context: Databases are data stores that are in themselves, organized mechanisms for storing, managing and efficiently retrieving raw data. Databases have other features that allow data to be compiled in the form of reports. Stored data can be collated or statistically counted so that a business can learn or make decisions based on how the business intelligence data is analyzed. Other features are related to importing or consuming more data in automated (hands free) ways.
Note: This glossary section serves to clarify any terms used in related posts on this site or in external internet literature; business jargon, engineering vocabulary or technical terms and meanings associated with the use of database data stores are covered here.
--- --- ---
Of the many database products on the market, some are open source and some support additional features like business intelligence support, scripting / programming support, here is a list of some database engines / database tools.
Access, Cassandra, Clipper, dBase, FireBird, HBase, IBM DB2, IBM Informix, Ingress, InterBase, InterSystems Cache, Lotus, MySQL, MongoDB, Neo4j graph dbase, Oracle, Paradox, PostgreSQL, SQLite, SQL Server, Teradata, Visual Foxpro, and there are many more …
--- --- ---
Data Ingested / Data Ingestion
The act of consuming / importing / absorbing data into a companies data store database; this can be a computerised / automatic hands free process. It’s common to discuss Data Ingestion in the context of database migration or in the context of consuming large amounts of data for use in reports or data intelligence processing.
--- --- ---
Data is copied from a source system and translated, transformed or restructured into new data values, these replacement data values might also be formatted differently and / or stored as different data types to be compatible with the destination database system.
Transformation Scenarios: –
- Source data needs to be merged with data values from another database source before being stored into a destination system (ssis Union / Aggregate transformations, Merge / Join transformation)
- Composite data values needs to be split into several separate column values (ssis Conditional Split or Derived Column Transformation)
- Faulty data values need to be fixed and corrected or cleansed (ssis Data Conversion, Convert or Cast, Derived column)
- Null or missing values need to be replaced with defined business default values.
- Certain / targeted data values need to be filtered out and not stored into the destination system (ssis Lookup transformations, Sort transformation).
Transformation algorithms and automation processes convert data or information before storing the transformed values into the destination system.
Data mappings are used to identify where data comes from and where it should be directed to; a data dictionary is used to classify the target data types, the format of stored data and the rules for data transformation. The data dictionary is also used to clarify ownership and administration responsibilities for data wherever it is stored.
--- --- ---
3 steps to consume data into a business data store
ETL is used in database systems where we require additional data to be consumed or merged with company data for use in business intelligence / analysis of financials / marketing / commodities prices and market trends / …
For example one might want to consume data from a government or European union data provider and use this data in conjunction with company data to provide insights into market trends / identifying new prospects or business opportunities / understanding market share or demographic changes over time / …
Extract: Covers the data extraction of data records from an external data source such as the ONS Office for national statistics web site / ECB European Central Bank / some other European data source / …
Transform: A set of data processing algorithms or rules to transform the data from the source into a form that is compatible with the company data store database. In addition it’s often necessary to clean data records so that empty / missing data values are transformed into default values that can be used by any computer programs or company web sites without creating additional problems.
Load: The loading of this new data into the companies data store database. In addition it’s often necessary to ensure that duplicate data records are removed.
--- --- ---
RDBMS (Relational database management system)
A data store that stores data inside data containers called data tables. A database contains a set of tables containing data fitted into predefined categories. Each table contains one or more data categories in columns.
A table is a collection of data elements organized in terms of rows and columns. A table is also considered as convenient representation of relations. Data must be stored and presented as relations, i.e. tables that have relationships with each other, e.g., primary/foreign keys.