SOP FOR DATA LIFECYCLE MANAGEMENT

Data management encompasses the design, collation, cleaning, and management of all information, observations and measurements. Efficient and effective data management is necessary to minimize risk of error. Data management is an integral part of any application. It involves not only data collection and entry, but also the ongoing management of data, data preparation, analysis and publication, data archiving and destruction (Figure 1). Ideally, a data manager should be consulted in this process, although this may not always be feasible.

A minimal requirement for all studies is a Data Management Plan (DMP) to ensure the quality of data and outputs, integrity and repeatability, appropriate access to data, and appropriate reuse of data for subsequent analysis, and to record the person(s) responsible for various aspects of data management.

The most important 'rule' in data management is documentation: there should be an audit trail through which changes in information, and alterations to data, can be traced back to source, should anything need to be investigated. This documentation is a safety net for the data.

The key elements of data management are:
• Data must be recorded in a durable and appropriately referenced form that complies with relevant regulations and policies.
• The data must be retained for sufficient time to allow reference. Recommendations for the duration of data retention should be as per Jhuma guidelines.
• Data reported should be available for discussion with other users, without breaching confidentiality.

SYDNEY | PRAGUE | LONDON | SINGAPORE | FRANKFURT | SAN DIEGO COMMERCIAL IN CONFIDENCE, PAGE 1
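To make the audit trail requirement concrete, the following is a minimal sketch in Python, not a prescribed implementation. The names `AuditEntry` and `AuditedStore`, and the fields recorded per change, are assumptions chosen for illustration. A real system would persist the trail in durable, access-controlled storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    """One immutable line of the audit trail (hypothetical structure)."""
    record_id: str
    field_name: str
    old_value: Any
    new_value: Any
    changed_by: str
    reason: str
    changed_at: str  # ISO 8601 timestamp in UTC


class AuditedStore:
    """In-memory store that logs every change so it can be traced back."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}
        self._trail: List[AuditEntry] = []

    def set_value(self, record_id: str, field_name: str, new_value: Any,
                  changed_by: str, reason: str) -> None:
        record = self._records.setdefault(record_id, {})
        old_value = record.get(field_name)  # None if the field is new
        record[field_name] = new_value
        # Append only: existing entries are never edited or removed.
        self._trail.append(AuditEntry(
            record_id, field_name, old_value, new_value, changed_by, reason,
            datetime.now(timezone.utc).isoformat(),
        ))

    def history(self, record_id: str, field_name: str) -> List[AuditEntry]:
        """Return all changes to one field, oldest first."""
        return [e for e in self._trail
                if e.record_id == record_id and e.field_name == field_name]


store = AuditedStore()
store.set_value("r1", "weight_kg", 70, "alice", "initial entry")
store.set_value("r1", "weight_kg", 72, "bob", "corrected against source form")
for entry in store.history("r1", "weight_kg"):
    print(entry.changed_by, entry.old_value, "->", entry.new_value, entry.reason)
```

The trail is append-only and records who made each change and why, which is what allows an alteration to be traced back to its source.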
PURPOSE

This SOP describes the processes for data collection and management, including procedures for quality control (QC), data query resolution, data retention and archiving. This document will guide the development of a Data Management Plan, and is written so that anyone involved in the study can understand the procedures with relative ease.

SCOPE

For the purposes of this document, data management includes:
a. Definition of good quality data
b. Data entry processes
c. Development of a data dictionary
d. Access and permissions to documents and data
e. Collection and storage of data
f. Quality control and assurance of data
g. Data archiving and destruction

1. Identify and describe key data

The first step is to consider the data carefully in terms of what is being collected, why, and how it will be used. We have robust Data Classification Standards to identify key data:

PII Data

All organisations that handle customer data need to be careful about how this data is used and ensure that its use is compliant with customer expectations and any privacy notices issued. As per the Information Commissioner's Office, the definition of PII data is:

“‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”

In order to ensure compliance with GDPR, we classify our data into one of the four options below. This enables us as an organisation to be aware of the types of personal information that we hold and to be explicit about what we can and cannot do with it.
Classification Name: Not Personal
Description: Data which is deemed 'Not Personal' is clearly not about a person (whether the person is identifiable or not). It may be about a physical asset/product or concept.
Usage Notes: Data which is not personal has no additional usage requirements from this classification. Refer to additional categories (Confidentiality and CDE) for usage notes.

Classification Name: Personal (Not Identifiable)
Description: Data which is about a person (e.g. First Name) but which could not be used to identify an individual in its current context and would be unlikely to readily identify an individual person.
Usage Notes: Data which is personal but not identifiable should be freely used in line with other categorisations so long as it is not joined with other data sets which would make the data identifiable.
How to Classify: Any piece or set of data that is about people but would have the majority of its identifying characteristics removed. It would likely be a generalised list with no row identifiers that would enable it to be joined to another dataset.

Classification Name: Personal (Identifiable)
Description: Data which is about a person and could be used to identify the individual either directly (using the data available within one asset) or indirectly (by combining this data with other readily available information).
Usage Notes: Personally identifiable information (PII) is a risk to the organisation if it is misused or lost. Misuse will typically be deemed as using the data in a manner that would not reasonably be expected by our customers or is not explicitly covered within our privacy statement.
How to Classify: Any piece of data which would allow you to readily identify an individual person, based upon direct defining characteristics (such as full names) or by the ability to use multiple separate characteristics that would allow you to narrow down to an individual person.

Classification Name: Personal (Special Category)
Description: Special category data refers to a particular subset of personal information that is deemed to be highly sensitive in nature or a protected characteristic. It is defined by the Information Commissioner's Office as a part of GDPR legislation.
Usage Notes: Personal data should be obfuscated in the majority of scenarios in which it is stored or accessed, particularly when analytics are being performed. Extreme care should be taken in the sharing and usage of special category data. Access to such data should be on an exceptions basis only, and by default staff should not have access to this information unless there is a clearly documented reason otherwise. Any usage of this information should be in accordance with the purposes under which it was gathered, which will have been communicated to the customer at the point of capture. Special category data should be obfuscated in the majority of scenarios in which it is stored or accessed.
How to Classify: Special category data is clearly articulated by the ICO on their website. Some examples include:
• Personal data revealing racial or ethnic origin;
• Personal data revealing political opinions;
• Personal data revealing religious or philosophical beliefs;
• Personal data revealing trade union membership;
• Genetic data;
• Biometric data (where used for identification purposes);
• Data concerning health;
• Data concerning a person's sex life.

Confidentiality

The confidentiality category is used to define and articulate how far a particular data asset or piece of information should be shared.
Classification Name: Public
Description: Information which is already in the public domain or which could be released into the public domain.
Usage Notes: This information can be freely used.

Classification Name: Internal
Description: Information related to internal operations or communications which is of general relevance to employees and appropriate for distribution throughout client org and to Managed Service Providers/Third Parties.
Usage Notes: Share freely with client employees and associated Third Parties.

Classification Name: Personal (Identifiable)
Description: Information which is proprietary to client or related to a particular business process, so that access by all employees is not necessary or appropriate. Access to this information is only required by those with a need to know to fulfil their duties or roles; access and authorisation control will be in place.
Usage Notes: This information should only be shared with staff on a "need to know" basis.

Classification Name: Personal (Special Category)
Description: The most sensitive business type of information used within client, where the information is typically restricted to a small, discrete group of senior individuals and employees where appropriate. Access to this information is only required by those with a need to know to fulfil their duties or roles; access and authorisation control will be in place.
Usage Notes: This information will be limited to a select group.

CDE

Critical Data Elements (CDE) are deemed to be vital to the successful operation of the business and would cause significant risk if lost or altered in any way without sufficient impact analyses and stakeholder engagement.

Classification Name: Non-CDE
Description: An item of data deemed non-critical may be supplementary information that is useful for the running of our business but will not cause critical damage or risk to the organisation if it is amended.
Usage Notes: Please refer to the other categories defined in this document for usage notes for a given non-CDE.

Classification Name: CDE
Usage Notes: An item deemed to be a CDE should undergo a detailed impact assessment whenever it is to be altered or amended in any way. There may be multiple up or downstream processes or stakeholders dependent upon this element being in a pre-agreed format. The quality of CDEs is of utmost importance and should be monitored. You should engage with relevant data stewards and owners when working with CDEs to ensure that usage is compliant and does not cause risk to the organisation.
Description: Those data elements that drive capital or are used by executive management to make significant business decisions or enable the running of our business. It must be possible to determine whether a business term or a data element is "critical" or not, where critical means that it has a direct impact on one or more of the following:
• Key Customer Journeys
• Brand consideration
• Any possible financial loss or significant gain
• Regulations such as GDPR
• Shareholder Reporting
• Critical services (should link to customer journey / customer experience)

We propose a Data Dictionary and Guidance to succinctly describe key data elements

Our how-to guide (with examples) explains how and why data elements should be formatted and defined in a standardized way. This standard approach ensures that each data element can be understood and used by anyone in the organization, from executives to frontline management.

The Data Dictionary template and guides contain guidance on the expected standards for definitions and specifications to be approved. As content will be added by many employees across the enterprise, it is important to aim for common standards in how data and concepts are presented. This will provide a coherent experience for users and will aid uptake and usage of the data.

Attachments to be added: Template, Guidance

2. Outline the mechanisms to capture the data

The steps and procedures for data collection and data entry should be outlined in detail and include the responsible person for each step of the process.

Extensive data validation is key to ensuring that all data entering the system meets the requirements. Validation rules can be defined around the below mentioned areas:

3. Outline the infrastructure and mechanisms to store the data

The data management plan should also describe the mechanisms by which the data sources for storing data should be set up.
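Whatever storage is chosen, only records that pass the entry checks from step 2 should reach it. As a minimal sketch of how data dictionary entries could drive those checks, the Python below defines a few hypothetical fields (`customer_id`, `age`, `email`) and validates a record against them. The `FieldSpec` structure and the rules shown are assumptions for illustration, not the content of the Data Dictionary template.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry (hypothetical structure)."""
    name: str
    dtype: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # optional logic check
    description: str = ""


# Illustrative entries only; real definitions come from the data dictionary.
DATA_DICTIONARY = [
    FieldSpec("customer_id", str, check=lambda v: len(v) > 0,
              description="Unique customer reference"),
    FieldSpec("age", int, check=lambda v: 0 <= v <= 120,
              description="Age in whole years"),
    FieldSpec("email", str, required=False, check=lambda v: "@" in v,
              description="Contact email, if provided"),
]


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    for spec in DATA_DICTIONARY:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                errors.append(f"{spec.name}: missing required value")
            continue
        if not isinstance(value, spec.dtype):
            errors.append(
                f"{spec.name}: expected {spec.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if spec.check is not None and not spec.check(value):
            errors.append(f"{spec.name}: failed logic check")
    return errors


print(validate({"customer_id": "C001", "age": 34}))
print(validate({"age": 300}))
```

Keeping the rules next to the field definitions means the data dictionary stays the single place where a field's meaning and its checks are maintained.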
There are two basic requirements that need to be included in any modern data storage solution:

Ability to scale: Data storage needs to scale up and down. Since we are dealing with massive amounts of data, it is difficult to determine when an influx will occur, so modern storage infrastructure should scale as you ingest more data. There are two reasons for this:
• It is safe to expect data volumes to keep increasing over time. The amount of data customers are generating, and that businesses are collecting, is growing every day.
• It doesn't make sense to pay for storage until you need it. You should be able to scale your storage, and your expenses, based on your actual needs.

Ability to handle concurrency: The workloads you are planning to run will require significant processing power. In addition, you will probably need to run multiple workloads concurrently. If the data storage solution you are considering cannot handle this, you should reevaluate your decision.

Aligning Data Storage with Business Goals

Your data storage solution must align with your business goals. In a world of digital transformation, your data is the core component in driving business value. Consider the following questions to understand how a data storage project supports your objectives. These are the same ones we review with our clients, and they work well to help define specific needs.

What applications will be running?
Data storage is all about facilitating your ability to compute. We must understand the processing requirements to recommend the right storage solution.

What value does this application provide to the business?
We like to use two metrics to determine the value of the application to the business: Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs). RPOs identify how much data a business can afford to lose if it has a system failure. RTOs define how quickly you need to return to operation if the application goes down.
How will the cost of the data storage solution be weighed against the value it provides to the business?
The cost of implementing a system must give a competitive edge and drive so much value that it makes sense to invest significant resources in it.

Where will you store and compute data?
You need to decide whether you're going to locate your platform on-premises, in the public cloud, or utilize a hybrid solution.
What is your comfort with managing storage assets?
To choose a data storage solution, it's vital to understand how you will manage your storage assets. Will it be through a traditional GUI, a standard Command Line Interface (CLI), or through a more modern approach like an API?

What are your governance issues?
Finally, you need to make sure your data storage solution is structured in a way that will enable any required governance processes to take place. One of the most frequent examples of this is European businesses needing to comply with the requirements of GDPR.

4. Describe Data Security and Privacy

Data security and privacy are of utmost importance. The following steps are recommended to secure your data:
1. Build and Maintain a Secure Network (including use of confidential passwords and documenting an audit trail to capture changes to information).
2. Protect Data through Multifactor Authentication.
3. Maintain a Vulnerability Management Program.
4. Implement Strong Access Control Measures (e.g. restricting access and using unique IDs for each person who accesses data, criteria for using electronic signatures).
5. Regularly Monitor and Test Environments (e.g. tracking and monitoring access and testing security systems).
6. Maintain an Information Security Policy.
7. Lay out a Data Privacy Framework.

We typically recommend a range of questions based on the below framework to gauge your overall preparedness for Data Security and Privacy.
GDPR – General Data Protection Regulation

The EU General Data Protection Regulation (GDPR) is the biggest change in data privacy laws in over a decade. GDPR came into effect in May 2018 and is designed to harmonize data privacy laws across the EU, protect and empower the data privacy of all EU citizens, and encourage organizations to rethink their approach to data protection. The regulation impacts all aspects of an organization's operating model.

WHAT HAS CHANGED WITH GDPR

Increased Individual's Rights
▪ Increased right to be forgotten
▪ New right to portability
▪ New accountability for 3rd parties who process data, and increased accountability for data controllers
▪ Unambiguous consent required for data usage

Regulatory Harmonization
▪ Harmonization of data protection rules across the EU
▪ Simplification of legal landscape
▪ Overseen by the European Data Protection Board and the local regulatory bodies, e.g. ICO (UK), LDI (Germany)

Wider Scope
▪ Wider definitions with tighter principles
▪ Protect all EU citizens, regardless of where the organisation is based
▪ Data Protection Officer is required to govern the use of personal data, and monitor high risk processes
▪ New rules for 'special categories of data', which include genetic and biometric data

Stronger Enforcement
▪ Civil suits from government agencies, business entities and individuals
▪ Imposes direct obligations and liabilities on Data Processors
▪ Data Protection Authorities have the right to assess and audit companies

KEY THEMES FOR GDPR

a. RECORDS AND CONDITIONS OF PROCESSING

Firms are required to locate where personal data is held across the organization, maintain a data inventory and data processing record (particularly retention, archiving, disposal, and audit trail of consent) and establish the lawful basis of processing the data. All these feed into an "Article 30 report", which is a summary report of all data processing within an organization.

There are six lawful bases of processing, of which gaining consent from an individual to process their data is one. Consent requirements have been enhanced under GDPR, which requires you to amend consent capture and management to enable transparent use of personal data, e.g. consent opt-in, explicit consent for special categories of personal data, and storing copies of privacy notices and the associated audit trail.

b. DATA SUBJECT RIGHTS

Firms are required to provide the following fundamental rights to both employees and customers:
I. Data Access
II. Data Rectification
III. Right to be forgotten
IV. Right to restrict processing
V. Right to Object
VI. Data Portability

Recommendations to drive compliance
a. Review the existing processes and implement enhancements to provision the data subject rights.
b. Deliver frontline staff training and communication to operationalize the new and/or enhanced processes.

c. PRIVACY, SECURITY & BREACH MANAGEMENT

Embedding privacy and security requires both a cultural change and a proactive approach, which can reduce and mitigate risks.

Firms are now required to:
▪ Notify the supervisory authority (the national data protection regulator) within 72 hours of discovering a data breach
▪ Perform a Data Protection Impact Assessment (DPIA) on business areas using personal data
▪ Embed privacy by design and default into business processes and systems
▪ Ensure appropriate organizational and technical security measures are in place for the protection of personal information

d. DATA PROTECTION OFFICE & GOVERNANCE

A data protection officer (DPO) is integral in overseeing all aspects of data privacy and protection.

Firms are required to appoint a DPO to act as a first point of contact for supervisory authorities, to monitor compliance, advise on data protection impact assessments, and inform board members and employees about their obligations to comply with the GDPR. The DPO will require a dedicated team to execute its roles and responsibilities and will be a second line of defense function.

e. THIRD PARTY DATA MANAGEMENT

Under GDPR, data processors and controllers are subject to direct statutory obligations and penalties, rather than only being subject to obligations imposed on them by contractual agreement with the controller.

Firms are required to ensure appropriate safeguards are in place for all data transfers, and that the data subject can be told, via a privacy notice, with whom their data has been shared.
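Of the obligations above, the 72-hour breach notification window under theme c is the simplest to express in code. The following is a small Python sketch for tracking that deadline; the function names are hypothetical, and the moment of discovery is assumed to be recorded as a timezone-aware timestamp.

```python
from datetime import datetime, timedelta, timezone

# Notification window stated in the SOP: 72 hours from discovery of the breach.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(discovered_at: datetime) -> datetime:
    """Latest time by which the supervisory authority should be notified."""
    return discovered_at + NOTIFICATION_WINDOW


def hours_remaining(discovered_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once the deadline has passed."""
    return (notification_deadline(discovered_at) - now).total_seconds() / 3600


def is_overdue(discovered_at: datetime, now: datetime) -> bool:
    return now > notification_deadline(discovered_at)


discovered = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
checked = datetime(2024, 3, 2, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(discovered).isoformat())
print(hours_remaining(discovered, checked))
print(is_overdue(discovered, checked))
```

Using timezone-aware timestamps avoids ambiguity when the breach is discovered in one office and reported from another.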
5. Standardize data entry, checking and validation

Data entry
The data management plan should describe the routine for data entry, how missing variables are to be coded, how logic checks (as described in the data dictionary) will be implemented as data are entered, and the process for querying any inconsistencies.

Updating data
The data management plan should include procedures for alterations and updating of data. This should include details as to who is responsible for making these changes and how details should be logged.

Cleaning and validation
Data cleaning is an important part of quality assurance and control. Steps need to be taken to clean the data after it has been entered. It is important to be vigilant over your data at all times.

6. Outline a strategy for backing up data

The data management plan should outline the processes for backing up data. For example, data may be stored on a server which is backed up daily. Details here may include the review and upgrades of data format.

Questions you may have for your system administrators include:
• How frequently is the data being backed up?
• How will disaster recovery be dealt with?

7. Auditing data

Regular audits of the data should occur throughout data collection. Most often, these will be self-audits and should be periodic. Audits can also be external: some sponsors will conduct audits, and there is always the chance of a formal/regulatory audit, for example from your ethics committee. The purpose of an audit is to ensure that research is being conducted as specified in the protocol approved by the ethics committee and to identify any errors that may be occurring and why they are occurring.

Details in your data management plan may include how often an audit will be conducted, the types of checks which will be performed and the acceptable rates of error. Like every other aspect of data management, it is important to document audits (for example by having an audit report) and log activities.

8. Using the data – preparation for analyses and dissemination

Considerations for preparation and dissemination include:
o How do data cleaning decisions influence the distribution of variables?
o Review of missing observations: are there many missing values, and does there appear to be any pattern in how the data are missing?
o Review of outlying observations.
o Comparison and correction of differences in coding schemes.

9. Outline the plan for archiving, long term storage, and destruction of data

The recommended length of time to retain data varies depending on the type of study and location; refer to funding body specific requirements when designing the data management plan.

Archiving and long term storage
Data should be stored in such a way that they are quickly and easily identified and retrieved when required, and it must be possible to demonstrate that the data can be retrieved. Not all data needs to be archived. Data that is archived should be of value.

Destruction of data
Data destruction must be recorded, noting the dates and authority on which this action was taken. Confidential data and records in paper format should be shredded. Confidential research data and records in electronic format should be destroyed by reformatting or overwriting.
'Delete' instructions are not sufficient to ensure that all system pointers to the data incorporated in the system software have also been destroyed.

Some data may be considered for permanent retention. This usually occurs for data that:
• Is controversial or of high public interest
• Would be costly or impossible to reproduce
• Relates to the use of or supports the development of an innovative intervention
• Has long-term heritage, historical or cultural value
• Is of significant value to other areas or users
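For data that is not retained permanently, the destruction requirements above (overwrite rather than simply delete, and record the date and authority) can be sketched in Python as follows. This is an illustration under stated assumptions, not a certified erasure procedure: `overwrite_and_delete` and `DestructionRecord` are hypothetical names, and overwriting a file in place does not guarantee destruction on SSDs, copy-on-write or journaling filesystems, or in existing backups.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # overwrite in 1 MiB chunks to bound memory use


@dataclass(frozen=True)
class DestructionRecord:
    """Log entry noting what was destroyed, when, and on whose authority."""
    path: str
    bytes_overwritten: int
    destroyed_at: str  # ISO 8601 timestamp in UTC
    authorised_by: str


def overwrite_and_delete(path: Path, authorised_by: str) -> DestructionRecord:
    """Overwrite a file's contents with random bytes, then remove it."""
    size = path.stat().st_size
    with open(path, "r+b") as fh:
        remaining = size
        while remaining > 0:
            chunk = min(remaining, CHUNK_SIZE)
            fh.write(os.urandom(chunk))
            remaining -= chunk
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the overwrite to disk
    path.unlink()
    return DestructionRecord(
        path=str(path),
        bytes_overwritten=size,
        destroyed_at=datetime.now(timezone.utc).isoformat(),
        authorised_by=authorised_by,
    )
```

The returned record is what would be appended to the destruction log, so that the date and the authorising person remain on file after the data itself is gone.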
DATA LIFECYCLE MANAGEMENT CHECKLIST

Type of data and records
▪ What is being collected and its purpose.
▪ The type of data and the format in which it is being collected.
▪ Each variable collected is defined in the data dictionary.
▪ Consistent variable nomenclature as per template and guidance.

Capturing the data
▪ Development of data capture materials.

Storing data
▪ Mode of data capture.
▪ Timeline for data capture.
▪ How data will be handled.
▪ Data collection roles and responsibilities.
▪ Data entry roles and responsibilities.
▪ Versioning (including limit to the number of versions stored at any time).
▪ Responsibility for data.
▪ Data security.
▪ Access to data/secure networks (e.g. restricting access and using unique IDs for each person who accesses data, firewalls, encrypted transmission).
▪ Protection of data.
▪ Vulnerability management.
▪ Information security policies, where relevant.

Standardizing data entry, checking and validation
▪ Data entry process.
▪ Managing/coding missing variables.
▪ Implementation of logic checks and queries.
▪ Allocation of responsibilities for different tasks.
▪ Data cleaning.
▪ Mechanism for monitoring and testing data.

Auditing data
▪ Auditing process, internal and external.
▪ Frequency of audits.

Using the data
▪ Data preparation process.
▪ Data transformation process.
▪ Data analyses (usually reference to the statistical analysis plan).
▪ Data dissemination.

Data backup
▪ Frequency of data backup.
▪ Duration that the backup is kept.
▪ Format of backup.

Archiving, long term storage, and destruction of data
▪ Legislative requirements for retention of data.
▪ Organisational requirements for retention of data.
▪ Location of archived data.
▪ Format of archived data.
▪ Process of data destruction.