Context: Databases are data stores that are in themselves, organized mechanisms for storing, managing and efficiently retrieving raw data. Databases have other features that allow data to be compiled in the form of reports. Stored data can be collated or statistically counted so that a business can learn or make decisions based on how the business intelligence data is analyzed. Other features are related to importing or consuming more data in automated (hands free) ways.
Note: This glossary section serves to clarify any terms used in related posts on this site or in external internet literature; business jargon, engineering vocabulary or technical terms and meanings associated with the use of databases, data stores, data repositories and warehouses are covered here.
--- ----- ---
A guideline for standardizing how data is entered, validated, corrected, or applied. Having agreed rules are fundamental in ensuring quality data. Business rules are often defined in a data dictionary and serve as a focus and discussion point between stack holders, engineers and technical staff. A business rule defines the steps for entering, verifying, or correcting data; they also become a valuable resource for test engineers who need to define production release test algorithms as part of an acceptance test for a new system.
A data dictionary is a catalogue of information that describes a database by detailing what types of data are collected within a data store. The dictionary provides detailed information about business data, its format, structure, business meanings and standardised definitions of data items, and allowable values. For use within data processing systems, a data dictionary itemizes business meanings ascribed to missing or default field values within raw data tables. This can also include any conventions associated with lookup or default values that will be presented and used in computer programs or documentation. A dictionary helps developers by giving them guidance on how to present business data for use in front end data processing systems, business reports or web sites.Data dictionaries are used during database design or database data migration work, they are used by designers, acceptance test users and administrators of data warehouses and data stores used within information management processing systems.
Of the many database products on the market, some are open source and some support additional features like business intelligence support, scripting / programming support. Here is a list of some database engines / database tools.
Access, Cassandra, Clipper, dBase, FireBird, HBase, IBM DB2, IBM Informix, Ingress, InterBase, InterSystems Cache, Lotus, MySQL, MongoDB, Neo4j graph dbase, Oracle, Paradox, PostgreSQL, SQLite, SQL Server, Teradata, Visual Foxpro, and there are many more …
Data Import (Data Ingested / Data Ingestion)
The act of consuming / importing / absorbing data into a companies data store database; this can be a computerised hands free automatic process. It’s common to discuss Data Ingestion in the context of database migration or in the context of consuming large amounts of data for use in business reports or data intelligence processing.
Data is copied from a source system and translated, transformed or restructured into new data values, these replacement data values might also be formatted differently and / or stored as different data types to be compatible with the destination database system.
Transformation Scenarios: –
- Source data needs to be merged with data values from another database source before being stored into a destination system (ssis Union / Aggregate transformations, Merge / Join transformation)
- Composite data values needs to be split into several separate column values (ssis Conditional Split or Derived Column Transformation)
- Faulty data values need to be fixed and corrected or cleansed (ssis Data Conversion, Convert or Cast, Derived column)
- Null or missing values need to be replaced with defined business default values.
- Certain / targeted data values need to be filtered out and not stored into the destination system (ssis Lookup transformations, Sort transformation).
Transformation algorithms and automation processes convert data or information before storing the transformed values into the destination system.
Data mappings are used to identify where data comes from and where it should be directed to; a data dictionary is used to classify the target data types, the format of stored data and the rules for data transformation. The data dictionary is also used to clarify ownership and administration responsibilities for data wherever it is stored.
Data Warehouse – DWH
A central repository of all, or a significant portion of, the data that a business or company collects. A data warehouse generally provides access to data over a time span. It has many inflows of data from business data sources that are collated and used for business intelligence analysis, market research analytics, operational reports, … Data warehouses are designed to give a long-range view of data over time, they can be however, expensive to scale, and do not excel at handling raw, unstructured, or complex data. If constructed well, they can aggregate large amounts of data generated from multiple sources (from business operational and external databases) into a format that is easier to understand, to facilitate making executive decisions in near real time. There can be commercial and competitive advantages to having access to large stores of such data.
DBA Database Administrator
What is a DBA, the clue is in the name, the DBA is a role that is responsible for administering databases. A Database Administrator is not a designer or a development role, although, with respect, I have come across the occasional Database Administrator who has mastered TSQL, and has created maintenance reports to watchdog monitor systems that they are responsible for.
DBA Typical work activities: Computer software installation, system or program application configuration, upgrading, maintenance of data warehouses, databases, data stores, network infrastructure, firewalls, security configurations, … In large companies, system administrators are generally responsible for the hardware infrastructure and operating system on a given server, in smaller companies however, these activities come under the auspices of the DBA. DBA’s are responsible for installation of database software and often they are additionally tasked with installing most other kinds of computer software on company computers as well. This includes applying software updates and software patches. Often when the business has an IT development team, developers will work with the DBA and coordinate software rollouts and updates to the rest of the business. Developers are responsible for supplying training to DBA’s on how their software works, how software is to be installed and maintained, but there after, it is the DBA’s job to ensure that the business receives a working and maintained system.
Troubleshooting – Technical Support: When working in a business under the role of developer, the challenge to release software updates to the business affects many parties. Like most developers, we need the DBA to be part of the team to manage or help out with installing updates and software patches. But, we also need the DBA to be involved as part of the technical support for applications that are supplied into the business. In SME or micro businesses, the DBA can be a port of call for team leads and department managers when things go wrong. It’s quite natural and not unusual for a DBA to become aware of a software malfunction or training problem when business users are having difficulties. Collaboration between DBA’s and IT developers is very important for an effective operation.
DBA Typical responsibilities: Protecting business data, this includes backup and recovery of system data. Data security from the perspective of black-hat-crackers (data breaches and commercial data theft) and also from the perspective of maintaining the business as usual status for business systems.
Responsibility Exceptions – additional roles – for SME’s or micro businesses
For large enterprises, there is another role called network and hardware administrator who will pickup responsibilities for storage and capacity planning, otherwise for a small business, the DBA is often the go-to-person for these roles.
Optional Responsibilities: Storage and Capacity Planning for email systems, data storage on hardware infrastructure. Network and website security, preventing data breaches and illegal access to networks, servers and company data files.
3 steps to consume data into a business data store
ETL is used in database systems where we require additional data to be consumed or merged with company data for use in business intelligence / analysis of financials / marketing / commodities prices and market trends / …
For example one might want to consume data from a government or European union data provider and use this data in conjunction with company data to provide insights into market trends / identifying new prospects or business opportunities / understanding market share or demographic changes over time / …
step 1 – Extract: Covers the data extraction of data records from an external data source such as the ONS Office for national statistics web site / ECB European Central Bank / some other European data source / …
step 2 – Transform: A set of data processing algorithms or rules to transform the data from the source into a form that is compatible with the company data store database. In addition it’s often necessary to clean data records so that empty / missing data values are transformed into default values that can be used by any computer programs or company web sites without creating additional problems.
step 3 – Load: The loading of this new data into the companies data store database. In addition it’s often necessary to ensure that duplicate data records are removed.
RDBMS (Relational database management system)
A data store that stores data inside data containers called data tables. A database contains a set of tables containing data fitted into predefined categories. Each table contains one or more data categories in columns.
A table is a collection of data elements organized in terms of rows and columns. A table is also considered as convenient representation of relations. Data must be stored and presented as relations, i.e. tables that have relationships with each other, e.g., primary/foreign keys.