Data Analysis

B2B Database Migration what’s involved – P2 Data Analysis

.

Aspects of database migration analysis and associated work planning activities, we explore how to tackle some of the challenges and problems for building a successful migration strategy.

Business Analysis and Data Analysis are part of the preparation for migration of information systems that utilizes business data.

Conducting the data analysis part of the migration equation, we focus on assessing where data originates from, how it gets transformed, where it gets transferred to and the data flow. Such analysis covers timing, scheduling, ownership and management, security, how data gets used, data format, data quality, …

We first need to model how data will be transformed and integrated into existing business data processing activities.

It’s possible that a BA Business Analyst may also be involved in the project. If this is the case, they will have captured information on operational problems with using internal processes or computing applications. They will understand some of the departmental dependencies on business information to be migrated. Any data intelligence or workflow task processing that has been captured will not have a technical perspective, but the BA will know who all the stakeholders are; they will have a grasp of many of the departmental processes surrounding the use of business data. Information available from the BA will be from the perspective of users and how people do their work. This analysis will incorporate aspirations of key stakeholders and will feed into the overall understanding of how existing tools and application programs are utilized.

Migration of business database systems

Pure Migration being the process of translating X into Y without changing

    • the desired outputs,
    • or the business features,
    • or the project scope,

and without necessitating that the inputs to the system change.

In contrast to a BA, a Data Analyst will need to dig deeper and capture a technical understanding concerning system integration, data processing algorithms, scheduling and automatic data processing features that are in play within the business system.

So: What does a data analyst do? What is involved in their work-flow and what do they deliver to the business?

Migration Topics

..
.
Data analysis perspective: Working with Database Administrators DBAs and other technical staff, we want to discover how data gets into the database or data warehouse. Topical questions such as Data inflow – numbers of data sources, data volume, frequency of data updates, need to be addressed. We want to understand how data values are transformed and translated, when data is ingested into a warehouse or any database systems. Alongside this analysis, there is the question of change management; we need to capture a sense of any existing processing flaws or problems that are in play, and we need to grasp how the business performs its system integration and production release testing.

This is a lot to take on board; the best way forward is to take each question, methodically, step by step.

Technical perspective on Data flow and data sources: During this work phase, we are concerned with cataloguing all the types of data and identifying all sources from whence data originates. Important parts of this work are identifying any Data Latency: how often and when does data get changed. We also need to collate any information on security and password access to systems that are providing data. It’s important to understand who within the business, has the responsibility to have data corrected when data records or values are malformed, corrupted, incorrectly formatted, or even missing. There are often internal processes or algorithms that are used to audit data quality and tools that are used to correct data records that are found to be invalid. If the DA Data Analyst discovers that auditing algorithms do not exist, then it’s incumbent on the DA to create mechanisms that will facilitate auditing and testing of any migrate-able data during pre and post data migration.

ETL SSIS ProcessingData life cycle: The life cycle of a piece of data begins with the data’s origins and where it moves to, it includes how data is transitioned and changed and describes what happens to data as it goes through business and technical processes. Visibility of these processes facilitates data analytics and simplifies tracing any errors back to their sources.

DA Data Analyst work – Thoughts.

The initial meeting with the client is to agree on subjects such as, documentation formats and standards, presentation methods and any modelling techniques that will be used during analysis work. It’s customary to offer to sign any necessary NDAs and get permissions and security access to any company systems where work is required.

The DA will collaborate with stakeholders and identify business systems and workflow processes in use, that are pertinent to the tasks at hand.

Any project has unique business and technical challenges, none the less; it’s useful to have some sort of starter checklist which can suggest ingredients for prospective milestones. These milestones can be used in the preparation for a plan of analysis.

Example Checklist – Technical Feasibility.

Analysis of networks, infrastructure, systems, servers and databases
Example piece of a visio system diagram

o Liaise with DBA Database Administrators to determine what network access rights or privileges or data source access and or database passwords are needed in order to conduct analysis work.
o You need to define or get a definition of the system users. Identify who they are, job functions and responsibilities; ascertain their needs for the new system. This information aids the classification of SQL server user Accounts on domains, as well as logins, database access, database roles.
o Capture information on data sources of interest, where and what they are, plus any target systems for where data is to be delivered, as well as any data systems from whence data is sourced.
o Perform technical analysis of data stores, databases, network security, report processes, database backups, … Create and develop data dictionaries to facilitate an understanding of requirements for performing migration development work.
o ETL processing: We need to identify technology and processes and any data processing algorithms that are in play. We need to acquire an understanding of all ETL processes being used by any databases scheduled for migration.

One technical complication that can add a challenge to ETL data analysis, is the question,

Has the business implemented any data encryption security?

If this is the case, then data cleansing work will be difficult to validate. If testing tools are also not decrypting STORED data and making it human readable, and thus computationally testable, then our ability to prove that things are working properly, will be compromised.

Similarly, if data in motion is encrypted, then the flow of data between systems and between businesses will be utilizing encryption and decryption processing; this also renders data transmission packets, humanly unreadable.

Encryption and decryption make testing slightly more complicated. A data analyst will need to factor in a solution to allow effective testing to take place.

Considerations, questions and thoughts for this sort of work: –

  • What is being copied manually (by hand) such as data for a software lookup table for example?
  • What has been dropped from use and is no longer required?
  • Use of the national codes and other data format
  • Post codes, Ethnic codes, GMT date times, how many currency formats
  • People’s names – the days of Latin based two or three names are long gone.

Many technical considerations with business implications; a question of experience and ability to think through the technicalities of the system that you are working on.

Back to the Example Checklist – Technical Feasibility.

o Schedule data-audit-interviews, clarify any rules for data cleansing, data repair, migration processing, and any data normalisation and correction processes. Check and understand all default data values and any associated default-business-meanings. It is quite common for a business to have a number of default values for a given data item, you need to understand the specific meaning for each data value. This information will be valuable for test folks, data migration work and any post migration validation checks.
o Discuss results of all assessments, make recommendations to improve data quality auditing, and any record correction requirements.

Track recommendations and agreements, ensure that recommendations are either implemented or alternative workarounds have been validated.

o Inform database administrators of technical assessments, make sure that data backup and recovery processes are tested and in full working order. It’s my experience that occasionally DBA staff may over-estimate their ability to recover a database backup; in one or two cases I have come across DBAs that have not checked their recovery processes in over six months.

ENSURE DONT ASSUME – force the DBA to prove that their recovery processes work.

---          ---

Modelling techniques and analysis mapping

Some modelling techniques do not make sense to use. In the context of performing database migration, they duplicate the same work and do not provide additional information. For the purpose of performing database migration, the techniques and documentation practices that make the most sense are creating database dictionaries that contain data maps. In addition, Data Flow diagrams help to describe scheduling and the data transition paths that take data through various business systems. A Visio diagram showing network architecture, servers, intranet or extranet web sites and database connection details, also help to track some of the moving parts and complexities that go into making a successful project delivery. Visio diagrams add a lot of value when designing and developing ETL processes, reporting infrastructure, warehouse storage and other data transitioning processes.

Occasionally, for larger projects, it makes sense to also manage a risk register (see later) in order to track issues that are getting in the way of successfully completing some of the project challenges. If you are working with a Business Analyst then it will be beneficial to collaborate on this task; you may discover that your BA already has a risk register in play.

The Analysis of Data for systems of interest

i) DFD (Data-Flow Diagram)
ii) Visio network diagram
iii) State sequence and timing diagram

Modelling data processes can show the system under consideration as a single high-level process, modelling shows the relationship that the system has with other external entities (systems, organizational groups, external data repositories, …). A DFD (Data-Flow Diagram) is a graphical visualization of the movement of data through an information system.

Context Diagram example
Context Diagram example
Data flow diagrams help to facilitate conversations with business folk; DFDs are easy to understand, they serve a wide audience. Such diagrams allow a data analyst to document the bits and pieces of the system that are understood by different parties within the business. They can then be used to follow the data to identify the various system interactions and dependencies that exist. This tends to be a more natural approach as it uses the power of the technique to document the entire scope of the system and its surrounding environment.

Creating a visio network diagram is a powerful tool to technically document system interfaces and architecture with information such as IP addresses, passwords etc. Such diagrams can also show some timing and sequencing details for data flows between systems.

State sequence diagram: If there is a complex interplay between data flows into and out of the system scope, then a state diagram can be used to better describe, and to technically document what is taking place. Timing and sequencing for data flows between systems as well as details of how synchronization of processes takes place can be covered. For example, we can model how data processing takes place, in sequence or in parallel; a state diagram allows us to document the rules that trigger the flow of data to or from systems.

Overall, a data analyst has many tools, they can work with DFDs, Visio architectural diagrams, in some cases, state sequence timing diagrams can also be leveraged to capture and communicate the interactions and flow of data between business processes. These techniques can be used to model how information flows around a system, how data is processed and stored. Diagrams and modelling can show how a system is partitioned into independent units of manageable size that are verifiable, concise and easily understood in a hierarchical way.

Business expectations and stakeholder management:

I would like to make a point that for the system to be migrated, any computer software source code showing development history or database development history, is of particular interest. Similarly, knowing which business functions or departments are dependent on items that are to be migrated is just as important. The development history may explain existing or Collaboration and planningknown operational problems, bugs, or issues, as well as any related business problem workarounds. These are important things to know about, it should not be assumed that stakeholders or business analysts are in possession of all the technical facts, or understand how well the existing system is known to be working. I like to assume that there are bugs in existence; it’s my job to identify and document defects, and to highlight problems to stakeholders, so that business expectations can be better managed. After all, you don’t want to be blamed for creating a problem that was already in existence before the migration was started!

Documenting problem workarounds and known faults are an important part of the strategy for maintaining focus. Moreover, future expectations and work time frames are commensurate with understanding who has ownership of problems and solutions.

Additional points about performing migrations

Performing a trial migration, gives us a new migrated system, and the old current system to compare side by side. Silently running these two systems together, in parallel, before business switch over, is common practice. This practice gives us a risk free method for verifying that the migration has been successful. Having the old and the new system online at the same time allows business users to acquire familiarity, training and confidence with using new features before switch over.

When we are dealing with a simple system, it makes sense to complete the planned migration all in one go. It might make sense to do an overnight migration, followed by a complete switch over for the business. We can then spend the next day(s) monitoring the system for signs of problems (or not). Getting the final sign off after one or two business days’ worth of error-free activity and business as usual status, will naturally become the next logical step.

System complexitySometimes however, the system to be migrated is a little more complex and has several moving parts, each of which needs to be handled separately. We would call this a piece meal migration, which means that the system is broken down into several parts. Each migrate-able part or task is then prioritized according to technical complexity and business risk. Each part of the system is then migrated, and tested separately, and then signed off as complete. Database backups are taken when a given part or task has been migration completed.

The implementation details of how migrations are performed are a matter of design and planning, this comes out of the data analysis phase.

A piece meal migration involves a gradual phasing out of the old system and phasing in of the new replacement system features. The process of how this will occur, are all matters for discussion and planning according to the perceived business risks and technical wisdom.

—      —–      —

System downtime: One of the common reasons that data migrations occur during out of office hours, is to avoid application downtime during peak office periods. Depending on the type of data and applications being migrated, only a narrow downtime window may be available.
For example, if the HR system is offline for 24 hours over a weekend, there would be less business impact than if the eCommerce system was offline for the same period during the working day. Some systems may be so critical to business operations, that a few hours or even minutes of downtime, even during out of office hours may be unacceptable.

Business deliverable artefacts – data analysis:

Appropriate work: Not all projects require every type of migration deliverable. The size of system, architectural complexity and data flows associated with items being migrated, are all factors that will help you choose which deliverable artefacts are applicable for the tasks at hand.

The number of business features being supported by migrate-able items is one view of complexity. An alternative perspective is to look at technical dependencies for migrate-able items. For example, what systems and sub systems supply data, or make use of migrated data? How many data sources provide data that will be migrated?

Checklistdeliverable artefacts

» Scoping report –plus business sign off: A scoping report details what will be in scope for migration work challenges. It will also document any features and functions that will be left out of the migration, to ultimately become obsolete. Let me make one more point about this artefact, it’s not unusual for very clever negotiators to attempt to move the goal posts of what work has been agreed. I have no problem with putting more work into scope, as long as there is a reciprocal understanding that time frames and time-sheet budgets also move in synchronization with those moving goal posts. However, sometimes, there are simple misunderstandings of what work has or has not been agreed, it’s good to write it down.

During analysis work, we occasionally discover that not all of the business data, or business features, or analysis reports, … are in use anymore!

» optional Risk Register (project risks shared with the team). Sometimes the system is complex and there are many business risks. It makes sense to itemize, and discuss each red-amber-green problem and work out strategies to mitigate risks. We need a vehicle to help track and prioritize numbers of issues that affect more than one team or business cost centre. A risk register facilitates discussion and allows items to be ticked off the list as problems are solved or worked around.

new Example piece of a Risk Register
Redacted Part of a risk register – project example

» Network Diagram (showing all data sources and data targets). This diagrams shows context, the context gives a high level architectural understanding of the relationships between computers, servers, data source locations, databases and systems that receive data. It’s usual for depicted information to show IP addresses, machine names and systems connected to any databases that are in scope.

Network Diagram part
Redacted project fragment of a network diagram

» optional High level Data flow model [Level zero context diagram] (showing all data sources). Data-flow Diagrams can show performance measurements such as high-level volumetric’s. Data flow diagrams provide graphical representations of the “flow” of data through an information system, they model data processes and outline any functional processing that take place. A set of such diagrams facilitates the documentation of data source information in a way that it becomes easy for business folks to interpret.

» Data dictionary – maps data from source to destination

A data dictionary is a valuable artefact, it is an essential piece of work that helps to facilitate discussions with business users, and it helps to technically describe aspects of the migration requirements.

A data dictionary consists of a data catalogue that defines data groupings, data types and business meanings for data values to be stored. The data dictionary should also identify computer system level relationships that will be in play during the migration.

Data Map: A data map is used for development work, it is a source-to-target specification that details where datasets originate from, how data values are to be transformed, and it also describes the data containers for storing business data. The map explains any default data values for missing or NULL source values. Data import schedules for running ETL data ingest algorithms are also covered.

Data Dictionary example showing a business glossary section
Part of a Data Dictionary

When sitting down with a group of business people to talk about their system, their internal work processes, discussing different ways that their system is used, understanding their expectations with what the system achieves for various business teams, we learn what works well, and we learn the workarounds that they use to mitigate problems. You will also learn that teams will explain their work activities in ways that make sense in the context of their workflow and in their business. You might say that this is obvious, but the terms and phrases that they use, become part of a business lexicon, a vocabulary that is very important, in your work.

The business lexicon is sometimes cultural; lexicons can include terms and phrases that are common for a given industrial sector. For example, insurance, legal, marketing and financial sectors all use a mixture of jargon and technical terms that make sense within their technical expertise, and within context of their relationships to their clients or customers.

This is important because their ‘business-speak-jargon’, becomes part of any computer program screens, (screen titles, labels, screen prompts, computer program error messages, …), they also become part of any business reports, (graph titles, axis labels, tabulated column labels in summary tables, report titles, …). Furthermore, they indirectly reduce the amount of hands-on training that is needed for on-boarding future users of the system that you are building. Put another way, if you get this wrong, and use terms and business phrases in the wrong context, you will create confusion and more technical support calls than you can handle when the system is delivered to your customer. Data dictionaries are a valuable artefact and get used by technical staff who create business intelligence reports or computer software. They also get used by test people who conduct product release testing. Furthermore, data dictionaries help you to explain to stakeholders how the new system is going to work.

» Migration roll out plan – Disaster recovery plan

The business analyst will also have a lot to say about planning for a migration roll out and disaster recovery. It’s possible that draft plans may already be in existence. In any event, the data analyst and business analyst should collaborate and come to a mutual agreement on the planning for the migration rollout and disaster recovery activities.

Questions: If something should go wrong, can the migration easily be terminated and restarted, will the business suffer any downtime, can business processing continue, what are the time-sheet and financial implications and risks if the rollout is not successful?

Priorities for roll out recovery should be consistent with the priorities for recovery of business functions and processes that are most critical to the business operation.

Up to this point in time, it has not been possible to define a complete or comprehensive rollout plan, for the simple reason that no development work on creating migration processes and algorithms has been started. All the work thus far, has been analysis, planning, making agreements on how work will proceed, as well as educating stakeholders on their responsibilities and parts to play during the migration rollout. We also need to agree on the provisional expectations for business teams, during the period of post-migration business-as-usual work.

Once some of the migration development work has been completed and progressed to the point of delivering the first migration trial, it will be much clearer to business people, how a migration rollout will affect their individual teams.

Migration rollout plan – draft

In general, a draft rollout plan will address most stakeholder questions, and will outline expectations for parties involved or impacted by the migration.

You should utilize a migration planning checklist to ensure that all pre-migration planning steps have been executed. The plan will outline the format and formalities for the migration kick-off meeting; it will set out any schedules and ensure personnel are aware that a migration rollout is to take place.

The migration kick-off meeting PLAN will simply agree on the formalities of what is going to happen, who has responsibilities for what, the time schedule for when it’s going to happen, etc.

In the draft plan – For the day of the rollout.

  • The draft plan contents should document preliminary migration rollout procedures, explain how all participants will be available. It should identify affected teams, cost centres or departments; it should explain how parties are to be notified in advance, and in good time, so that there can be time available for any parties to be able to call a halt to proceedings, if there is a business reason to delay or halt the process.
  • ChecklistDraft contents: Test checks that will be used to decide if it is safe to proceed with a rollout. i.e. Some checks might be, to make sure that folks are not using the system, all users logged off; any automated processes are not running, etc.
  • Draft contents: Guidance on how to perform data backups (remember businesses generate large amounts of data, and data files are also changing throughout the workday).
  • Draft contents: Instructions on how to perform the migration rollout
  • Draft contents: Explanation of how to test the rollout and ensure that the new system is running
  • Draft contents: Explanation of how notifications will be sent out to stakeholders and managers of affected teams.
  • Note: I have deliberately left out an important piece of the plan which is to agree on what the business should be doing post-migration. I will cover the importance of this in a future post.
  • Draft contents: Instructions for what to do with the system when the migration has been completed, tested and validation checked. Personnel, who have been performing the migration work, will need to go home – have a rest. During the following working days, migration personnel will be involved in watching and monitoring the new system. They will be listening and watching people using the new system, training personnel who need things to be explained, being available to investigate any concerns or issues that people have.

Disaster recovery plan – draft

Recovery strategies, sometimes we call this the rollback, if we are performing a rollback, it’s because the business or the test folks are not happy with how the migrated system is performing. The objective of the roll-back is to return the business back to using the old system and attaining a business-as-usual status once again.

Recovery Plan Instruction: Explanation of how to perform a database restoration operation to return the database and data stores back to the state that they were in before the migration rollout was initiated.

Another Recovery Plan Instruction: Explanation of how the business is to re-enter any data entries that occurred post-migration. This is an important step and will be covered in more detail in a future post. It goes without saying that for business stakeholders, this is one of the most important parts of the migration process, because it is this step, which mitigates against the business risk of anything going wrong. This is the step that gives confidence to proceed with the migration and justifies the business sign off to progress to the next step in the migration process.

Being able to re-acquire the business-as-usual status and recover any lost data entries is important, it means that the business will not lose any sales or contracts or invoices or payments, or receipts of new manufacture parts or inventory stock…

Disaster recovery will be covered in more detail in a future post.

Development next steps

When developing migration algorithms, we need to rehearse and trial the process to see if it works. We isolate the migration trial from the rest of the business because it does not make sense to disrupt the business-as-usual-status of employees while we discover how effective a trial migration has been.

In the next phase of the migration process, practice trial data migrations will be conducted, this is followed by a methodical development test strategy. Developers will test-learn-discover what is not working and then develop improvements and corrections to the migration algorithms. They will repeat the test migrations until problems have disappeared or are acceptable to the business. I make these points because there are always some individuals who want to know why the migration work and the analysis work cannot be Future developments and collaborationconducted in parallel. We need to explain that the process of figuring out what is to be migrated, how it will be migrated and acceptance tested is not the same as creating, developing and testing the migration algorithms themselves. There is also the task of negotiation with teams who will be affected by migration activities that are taking place. Furthermore, there may be some requirement for training of business personnel who will be directly involved in performing migration acceptance testing,

Consider why we would want to, or need to, disrupt any business departments, when we are still developing the data transformation, cleansing, normalization of data, migration and validation test check algorithms! There will still be much work to do after the data analysis phase has been completed.

Business data is continually changing – so is analytics BI Business Intelligence technology

If it’s not obvious, we also need to understand and explain that, during office hours, data stores, data warehouses and business databases are living computational processes that change each and every time an operator dispatches an invoice, takes an order, logs a customer enquiry, updates stores and inventory, generates an automated letter, sends out a salesperson to meet a customer, signs a support contract, … Your database or warehouse may be in constant use by dozens or several thousands of employees; so there is a financial risk to your business if your system becomes unstable.

Deliverable artefacts: Data Analysis work.

What you are paying for!

  • optional NDA None Disclosure Agreement.
  • Scoping report –with business sign off.
  • optional Risk Register (project risks shared with the team).
  • optional DFD Data-Flow Diagram.
  • Visio network diagram.
  • optional State sequence and timing diagram (scheduled data transfers).
  • Data dictionary – including data mapping details.
  • draft Migration roll out plan.
  • draft Disaster recovery plan.

Commercial enterprises need to stay current with the latest business intelligence analytics and database technologies in a competitive environment. The next article on this tread will cover some of the different types of database migration and explain some of the practical reasons why migrations are important for businesses to move forward.

Epilogue

Thanks for reading, any thoughts or ideas welcome.

KeithSware@wordpress.comIf you want to sign up to receive future blog posts (look for the follow button at the top right side of your screen)

I’m also at LinkedIn: Keith Sware

Leave a comment