Steps of the data journey
The data journey represents the key stages of the data process. The journey is not necessarily linear; it is intended to represent the different steps and activities that could be undertaken to produce meaningful information from data.
Step 1: Define, find, gather
The first step is to obtain data, either by using an existing database or by determining which variables are needed and then designing and implementing a collection method. Security measures should be established and implemented to protect the integrity of the data once it has been collected.
The following competencies apply to this step: data discovery, data gathering, and data management and organization.
Step 2: Explore, clean, describe
Data should be explored to understand its format and variables, and checked for errors and missing values. It may be necessary to clean the data before using it for analysis. Cleaning can include correcting formatting, removing or correcting erroneous data, or something as simple as removing extra spaces. It is important to document what you found and what you did to clean the data.
The following competencies apply to this step: data cleaning and data exploration.
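As an illustration of Step 2, here is a minimal sketch of cleaning a small dataset while documenting what was found and changed. The function name `clean_records`, the sample rows and the log wording are all assumptions made for this example; real cleaning rules depend on the dataset.

```python
def clean_records(records):
    """Return (cleaned_records, log) for a list of dict rows.

    Hypothetical helper: trims extra spaces, marks empty values as
    missing, and logs every finding so the cleaning is documented.
    """
    cleaned = []
    log = []
    for i, row in enumerate(records):
        new_row = {}
        for key, value in row.items():
            if isinstance(value, str):
                # Collapse runs of whitespace and trim both ends.
                tidy = " ".join(value.split())
                if tidy != value:
                    log.append(f"row {i}: normalized whitespace in '{key}'")
                value = tidy
            if value in ("", None):
                # Record the gap and store it as an explicit missing value.
                log.append(f"row {i}: missing value in '{key}'")
                value = None
            new_row[key] = value
        cleaned.append(new_row)
    return cleaned, log


# Made-up sample data for illustration only.
raw = [
    {"name": "  Alice   Smith ", "age": "34"},
    {"name": "Bob", "age": ""},
]
cleaned, log = clean_records(raw)
for entry in log:
    print(entry)
```

Keeping the log alongside the cleaned data is one simple way to meet the documentation requirement described above.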
Step 3: Analyze, model
The purpose of analysis and modelling is to use statistical techniques to turn data into information that provides meaningful insights. Analysis and modelling are used to describe a phenomenon, draw conclusions about a population, or make predictions about future events.
The following competencies apply to this step: data analysis, data modelling, and/or evaluating decisions based on data.
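To make Step 3 concrete, the sketch below describes a small series with summary statistics and then fits a straight line to make a simple prediction. The figures, the helper names `describe` and `fit_line`, and the choice of ordinary least squares are assumptions for illustration; the right technique depends on the data and the question being asked.

```python
from statistics import mean, median, stdev


def describe(values):
    """Summarize a list of numbers (descriptive analysis)."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up values for illustration only.
periods = [1, 2, 3, 4, 5]
observations = [10.0, 12.0, 14.0, 16.0, 18.0]

summary = describe(observations)                   # describe a phenomenon
slope, intercept = fit_line(periods, observations)
forecast = slope * 6 + intercept                   # predict the next period
print(summary, forecast)
```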
Step 4: Tell the story
The statistical information that comes from analysis and modelling is easier to digest when it is presented as a story. It could be a research paper, an infographic, an article for the media, or some combination of these and other data presentation methods.
The following competencies apply to this step: data interpretation, data visualization and/or storytelling.
Foundation: stewardship, metadata, standards and quality
To follow the steps of the data journey successfully, it is essential to build your work on a solid foundation of stewardship, metadata, standards and quality.
- Stewardship encompasses all activities to govern, safeguard and protect data.
- Metadata should describe all the processing and manipulation that the data has undergone.
- Standard methods, practices and classifications should be applied throughout.
- Quality should be proactively managed throughout the process and relevant quality indicators should accompany all deliverables.
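As one possible way to keep metadata that describes the processing a dataset has undergone, the sketch below records each step in a small structure that can be saved with the data. The class names, field names and sample entries are hypothetical and do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class ProcessingStep:
    description: str
    performed_on: str  # ISO date string, e.g. "2024-01-15"


@dataclass
class DatasetMetadata:
    title: str
    source: str
    steps: list = field(default_factory=list)

    def record(self, description, performed_on=None):
        """Append one processing step; defaults to today's date."""
        when = performed_on or date.today().isoformat()
        self.steps.append(ProcessingStep(description, when))


# Made-up dataset and steps for illustration only.
meta = DatasetMetadata(title="Example survey", source="hypothetical collection")
meta.record("Removed duplicate rows", "2024-01-15")
meta.record("Trimmed extra spaces in text fields", "2024-01-16")

# Serialize so the record can accompany the deliverable.
print(json.dumps(asdict(meta), indent=2))
```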