Guidelines for Validation Rule Developers

From PHUSE Wiki

This Project is Complete

The purpose of this project is to outline a set of basic guidelines for validation rule developers. These guidelines will provide instructions, best practices, and examples for how to define, organize, and document validation rules to ensure consistency and compatibility with FDA data quality initiatives.


In order to review standardized clinical and nonclinical study data submissions, enhance the quality of the submitted data and improve the review process, there is a need for close collaboration between FDA and the biopharmaceutical industry, standards development organizations (SDO), academia and other federal government agencies conducting clinical research. To support the transition to a more advanced review process, FDA has initiated the Data Validation Project. The objective of the project is to evaluate existing validation rules, develop guidelines for new validation rules, and implement tools to enable systematic, automatic and regular evaluation of data quality and standards conformance at the time of product submission. This project will inform reviewers of characteristics of submitted data, allow for cross-submission assessment of potential problem areas, and help generate a better understanding of several key factors related to data standards adoption, including the rate of uptake of standardized data by sponsors and the degree of adherence to the specified standards. This project is aimed at enhancing the FDA’s regulatory review environment, by enabling the reviewers to utilize standard-based advanced review methods and tools.

In order to achieve the goals of this project, the following working groups were formed:

1. Develop guidelines for validation rule developers
2. Assess and improve existing validation rules
3. Develop and support change management process

This white paper focuses on the deliverables of a sub-team within working group 1 that is developing guidelines for validation rule developers.

Current State

Currently, various teams are developing validation rules (OpenCDISC, the CAB validation project and the ADaM validation sub-group). It is important to evaluate which guidelines each of these teams follows and which metrics they use to measure rule effectiveness.

Existing approaches, common practices and challenges with existing tools

Various tools are in use to check that clinical trial submission data comply with CDISC standards, including the Study Data Tabulation Model (SDTM), Analysis Data Model (ADaM), Standard for the Exchange of Nonclinical Data (SEND), Define.xml (submission metadata) and others. Table 1 lists some of the tools currently in use:

Tool | Description
1. OpenCDISC Validator | An open-source tool that is easy to download, install and use. The Validator is metadata driven and configurable, with checks defined using an XML-based validation framework capable of supporting a wide range of rules and data formats.
2. WebSDM | The Web-based Submission Data Manager (WebSDM) tool allows users to load study data in the SDTM format, check and correct errors and inconsistencies, and browse data in a variety of tabular and graphical formats. WebSDM was originally developed under a Cooperative Research and Development Agreement (CRADA) for the FDA to validate and review submission data in CDISC Study Data Tabulation Model (SDTM) format, and has been installed and in use at FDA since 2005.
3. SAS CDI Checks | SAS Clinical Data Integration (CDI) organizes, standardizes and manages clinical research data and metadata. It supports data standards and performs adherence checks.
4. Entimo SDTM Checker | Entimo’s SDTM and ADaM Checkers are validated SAS macro solutions which verify compliance of clinical data with CDISC SDTM and ADaM. These tools inspect structure and controlled terminology, run officially published and additional checks, and provide detailed reports including result statistics.
5. Octagon CheckPoint | Octagon's CheckPoint Data Validation Services facilitate the early detection of potential data conversion and compliance issues. Basic validation includes 105 FDA checks and enhanced validation includes an additional 200 checks that verify SDTM compliance.
6. In-house developed tools | These are custom checks developed by sponsors in house.

Table 1: Existing Tools for data validation

Some of the current challenges with the existing rules deal with the versioning of rules as well as lack of alignment of rules with versions of the CDISC standards. There is a need for defining better semantics in SDTM, SDTM IG, and SEND IG.

Proposed Guidelines for Rule Developers

The following guidelines for developing rules have been proposed by this working group:

A validation rule definition is recommended to contain the following attributes:

1. Rule ID (required)

Rule ID must be unique; there are no specific formatting requirements at this time. Rule numbers must likewise be unique.

2. Description (required)

Description defines the success scenario using common business terms, rather than by using any particular programming language. Descriptions are expected to be concise, complete and consistent.

Some conventions that help achieve consistency across Descriptions:

1. Use “should” for Warnings and “must” for Errors
2. Use both Variable Labels and Variable Names
SD0007 (Error): “Standard Units (--STRESU) must be 
consistent for all records with same Short Name of Measurement, 
Test or Examination (--TESTCD), Category (--CAT), 
Specimen Type (--SPEC) and 
Method of Test or Examination (--METHOD)”
3. Use “Prescription, Condition” or “Variable Label [Variable Name] SHOULD/MUST be something, Condition (WHEN or IF something)” structure

SD0016 (Warning): “Character Result/Finding in Std Format 
(--STRESC) value should not be NULL, 
when Derived Flag (--DRVFL) value is 'Y'”
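
The wording conventions above lend themselves to an automated check on rule Descriptions. The following is a minimal sketch, not part of the working group's deliverables: the function name `check_description_wording` and the mapping `EXPECTED_MODAL` are hypothetical, and the pattern used to detect a Variable Name in parentheses is an assumption based on the two examples shown.

```python
import re

# Hypothetical mapping: the modal verb each severity is expected to use,
# following the "should for Warnings, must for Errors" convention.
EXPECTED_MODAL = {"Error": "must", "Warning": "should"}

# Assumption: a Variable Name appears in parentheses, optionally with the
# "--" domain placeholder, e.g. "(--STRESU)" or "(AGE)".
VARIABLE_NAME = re.compile(r"\((?:--)?[A-Z][A-Z0-9]*\)")


def check_description_wording(severity: str, description: str) -> list[str]:
    """Return a list of wording problems found in a rule Description."""
    problems = []
    words = set(re.findall(r"[a-z]+", description.lower()))
    expected = EXPECTED_MODAL.get(severity)
    if expected is not None:
        if expected not in words:
            problems.append(f"{severity} description does not use '{expected}'")
        # Flag the modal verb that belongs to the other severity.
        for other, modal in EXPECTED_MODAL.items():
            if other != severity and modal in words:
                problems.append(f"{severity} description uses '{modal}'")
    if not VARIABLE_NAME.search(description):
        problems.append("no Variable Name in parentheses")
    return problems


print(check_description_wording(
    "Error", "Standard Units (--STRESU) must be consistent"))
```

A check of this kind only covers wording; whether a Description is complete and correct still needs human review.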

3.  Message (required)

Message is a short, simple statement specifying what went wrong. Messages need to be concise, complete and consistent, and should capture the nature of the error (e.g., invalid, missing, or inconsistent).

Syntax:
- Element name SHOULD/SHALL
- Variable Label [Variable Name] SHOULD/SHALL be something, Condition (WHEN or IF something)

Example: The value of Age (AGE) SHOULD be greater than 0.

SD0007 (Error): “Inconsistent value for Standard Units”
SD0016 (Warning): “Missing value for --STRESC, when --DRVFL='Y'”
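
To show how a Description and Message pair might translate into executable checks, here is a minimal sketch of SD0016 and SD0007 over records held as plain dictionaries. It is an illustration under assumptions, not the implementation used by any of the tools in Table 1: the function names and the record layout are hypothetical, the `--` placeholder is replaced by a two-letter domain code such as `LB`, and empty unit values are ignored when testing consistency.

```python
from collections import defaultdict


def check_sd0016(records, domain):
    """SD0016 (Warning): --STRESC should not be NULL when --DRVFL is 'Y'."""
    findings = []
    for index, rec in enumerate(records):
        derived = rec.get(f"{domain}DRVFL") == "Y"
        if derived and not rec.get(f"{domain}STRESC"):
            findings.append({
                "rule": "SD0016",
                "severity": "Warning",
                "record": index,
                "message": "Missing value for --STRESC, when --DRVFL='Y'",
            })
    return findings


def check_sd0007(records, domain):
    """SD0007 (Error): --STRESU must be consistent within a grouping."""
    keys = [f"{domain}{name}" for name in ("TESTCD", "CAT", "SPEC", "METHOD")]
    units_by_group = defaultdict(set)
    for rec in records:
        unit = rec.get(f"{domain}STRESU")
        if unit:  # assumption: missing units are not treated as a conflict
            group = tuple(rec.get(key) for key in keys)
            units_by_group[group].add(unit)
    return [
        {"rule": "SD0007", "severity": "Error", "group": group,
         "message": "Inconsistent value for Standard Units"}
        for group, units in units_by_group.items()
        if len(units) > 1
    ]


# Illustrative data only.
sample = [
    {"LBTESTCD": "GLUC", "LBSTRESU": "mmol/L", "LBSTRESC": "5.1"},
    {"LBTESTCD": "GLUC", "LBSTRESU": "mg/dL", "LBSTRESC": "", "LBDRVFL": "Y"},
    {"LBTESTCD": "ALT", "LBSTRESU": "U/L", "LBSTRESC": "30"},
]
print(check_sd0016(sample, "LB"))
print(check_sd0007(sample, "LB"))
```

Each finding carries the Rule ID, Severity and Message from the rule definition, so the report text stays consistent with the documented rule.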

4. Severity (required)

Severity takes one of the three categories below:

  1. Error - high impact on general data integrity, standard compliance or data quality
  2. Warning - medium level impact on data quality, which usually does not prevent reviewer from performing the intended task, but reduces the quality of expected results
  3. Informational/Notice - no direct impact on data quality; usually informational

5. Source

Source is a link to and/or a citation of the resources that were used to create the validation rule.

6. Computational Algorithms/Implementation Details

In some cases the validation rule's Description may not be enough to ensure consistent implementation. For example, handling partially missing dates in ISO 8601 format can be complex and can depend on the particular programming environment or business case. To avoid misinterpretation of the validation rule, all computational algorithms and implementation details should be documented in this attribute.
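
As an illustration of why such details need documenting, the sketch below shows one possible way to compare two partially missing ISO 8601 dates: compare only down to the precision both values share, and report the result as undetermined when that is not enough. This is an assumed approach for illustration, not an algorithm prescribed by this working group. The function names are hypothetical, time components are ignored, and values with a missing middle component (such as `2012---15`) are conservatively treated as year-only.

```python
import re
from typing import Optional, Tuple

# Matches YYYY, YYYY-MM or YYYY-MM-DD at the start of the value.
_PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(value: str) -> Optional[Tuple[int, ...]]:
    """Return the known date components as a tuple, or None if unparseable."""
    match = _PARTIAL_DATE.match(value or "")
    if not match:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def start_after_end(start: str, end: str) -> Optional[bool]:
    """True/False when the order is certain, None when it cannot be decided."""
    s, e = parse_partial_date(start), parse_partial_date(end)
    if s is None or e is None:
        return None
    shared = min(len(s), len(e))
    if s[:shared] != e[:shared]:
        return s[:shared] > e[:shared]
    # Equal at the shared precision: decidable only if both are equally precise.
    return False if len(s) == len(e) else None


print(start_after_end("2012-04", "2012-03-15"))
print(start_after_end("2012-03", "2012-03-15"))
```

A rule that uses this comparison would raise a finding only when the result is `True`. Whether an undetermined result is reported, imputed, or ignored is exactly the kind of decision this attribute should record.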

7. Errors/Warnings related to Controlled Terminologies

Rule developers are expected to refer to the SDTM, SDTMIG, and SENDIG documentation for details on each variable and the defined Controlled Terminology. It is expected that implementers will follow the guidelines defined by the CDISC Controlled Terminology team on the use of Extensible vs. Non-Extensible code lists. Not complying with extensible CT is considered a Warning, but not complying with non-extensible CT is considered an Error.
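
The severity convention for Controlled Terminology can be expressed as a small decision function. The sketch below is illustrative only: the function name is hypothetical and the codelist contents are placeholders, not official CDISC Controlled Terminology.

```python
from typing import Optional, Set


def check_controlled_term(value: str, codelist: Set[str],
                          extensible: bool) -> Optional[str]:
    """Return None if the value is in the codelist, else the finding severity."""
    if value in codelist:
        return None
    # Values outside an extensible codelist are a Warning;
    # values outside a non-extensible codelist are an Error.
    return "Warning" if extensible else "Error"


# Placeholder codelist, for illustration only.
terms = {"TERM A", "TERM B"}
print(check_controlled_term("TERM A", terms, extensible=False))
print(check_controlled_term("TERM C", terms, extensible=True))
print(check_controlled_term("TERM C", terms, extensible=False))
```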


Include additional programming notes, if necessary. For example:

- Additional conditions may be added, e.g., “when both Start and End Date/Time are not missing”
- Imputation algorithms are expected to be specified for partially missing date/time values
- Identifiers for the rules should include the version of the CDISC standard and a reference to the model or the implementation guide

Any other relevant information that was not captured by the previous attributes should be collected in this attribute.
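
Taken together, the attributes above can be captured in a simple record type. The following is a minimal sketch of one possible representation, assuming Python dataclasses; the class and field names are hypothetical and are not a schema defined by this working group. It also includes a helper that reports duplicated Rule IDs, since uniqueness is the only stated constraint on that attribute.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    ERROR = "Error"
    WARNING = "Warning"
    NOTICE = "Notice"


@dataclass(frozen=True)
class ValidationRule:
    # Attributes marked "(required)" in the guidelines have no default.
    rule_id: str
    description: str
    message: str
    severity: Severity
    # The remaining attributes are optional in this sketch.
    source: Optional[str] = None
    implementation_details: Optional[str] = None
    notes: Optional[str] = None


def find_duplicate_ids(rules: List[ValidationRule]) -> List[str]:
    """Return each Rule ID that occurs more than once, in first-seen order."""
    seen, duplicates = set(), []
    for rule in rules:
        if rule.rule_id in seen and rule.rule_id not in duplicates:
            duplicates.append(rule.rule_id)
        seen.add(rule.rule_id)
    return duplicates


rules = [
    ValidationRule("SD0007", "Standard Units (--STRESU) must be consistent ...",
                   "Inconsistent value for Standard Units", Severity.ERROR),
    ValidationRule("SD0016", "Character Result/Finding in Std Format ...",
                   "Missing value for --STRESC, when --DRVFL='Y'",
                   Severity.WARNING),
]
print(find_duplicate_ids(rules))
```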


Last revision by Mitrarocca on 03/19/2013