Challenges of Integrating and Converting Data across Studies

From PHUSE Wiki


Data integration has always been a challenge for both industry and the agency. Inconsistent approaches to structuring data within a single organization, never mind across organizations, make integration difficult if not impossible. With the advent of the CDISC standards, industry and the FDA at last have an opportunity to use those standards to facilitate the integration of data. As soon as one starts to think about ‘integration’, a number of questions arise. Integrate what? For what purpose? One stakeholder’s view of integration also differs from the next. Working group three exists to answer these questions. The vision for the group is to address the challenges of analyzing and integrating data from retrospective studies from both an industry and a regulatory perspective, and to provide best practice to those needing to integrate data.


The scope of the working group includes the conversion of legacy data, the effects of different study designs, and how the flexibility in standards leads to issues. The group will then work towards identifying best practices for topics that might also include standard protocol design elements, consistent data collection rules, and the application of other standards up front in the data collection process for all future studies, as well as the impact those would have on time, resources, and scope. The scope may be amended as the work progresses.

2012 CSS Meeting

The initial face-to-face meeting for the working group took place at the 2012 FDA/PhUSE Computational Science Symposium. The meeting started with a set of five-minute presentations from the group’s co-leads on what they saw as the issues. The working group then split into four break-out groups to address the question of “what is integration” and what is meant by the term. On the second day the group started considering potential work streams and outputs. Some of the topics discussed include:

  • The mechanics of integration
  • Are we talking about just SDTM, ADaM or both?
  • Traceability
  • Ability of analysis data sets to support key analyses in the CSRs
  • Ability to confirm the veracity of the analysis data sets using the SDTM
  • Ability to confirm content of SDTM data sets by referring back to the CRFs
  • The types of integration, use cases


As a result of the two-day face-to-face meeting, the following projects have been formed:

Integration Definition

Purpose: The purpose of this sub-group is to provide a succinct and understandable definition of “data integration” so that people can understand what the working group as a whole is working on. The discussion around the definition also allows the different viewpoints to be aired.
Deliverables:
  a) A short definition of what “integration” is
  b) A longer white paper that elaborates on the definition and adds detail to it
Deliverable Dates:
  a) End of Q2 2012
  b) End of Q2 2012
Lead(s): Kit Howard

Draft definition: Integration of Clinical Trials Data is a process whereby multiple input data sources (accommodating for the evolution of data standards, terminology, data definitions, and versioning) are combined in a consistent and traceable manner to address pre-specified and/or ad-hoc analyses to support the clinical research lifecycle for stakeholders.
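To make the draft definition concrete, the following is a minimal Python sketch of combining two input sources "in a consistent and traceable manner". It is an illustration only, not a method proposed by the working group. The study identifiers, the field names, the `SEX_TERMS` mapping, and the `integrate` and `trace_back` helpers are all hypothetical and do not come from any CDISC standard.

```python
from dataclasses import dataclass

# Hypothetical terminology map: study-specific codes -> one agreed code.
SEX_TERMS = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}


@dataclass(frozen=True)
class IntegratedRecord:
    subject_id: str    # unique across studies: "<study>-<subject>"
    sex: str           # harmonised value
    source_study: str  # provenance: study the row came from
    source_row: int    # provenance: position of the row in that study
    source_value: str  # provenance: value exactly as collected


def integrate(studies):
    """Combine per-study rows into one list, keeping provenance for each row."""
    integrated = []
    for study_id, rows in studies.items():
        for row_index, row in enumerate(rows):
            raw = row["sex"]
            integrated.append(
                IntegratedRecord(
                    subject_id=f"{study_id}-{row['subject']}",
                    sex=SEX_TERMS[raw.strip().upper()],
                    source_study=study_id,
                    source_row=row_index,
                    source_value=raw,
                )
            )
    return integrated


def trace_back(record, studies):
    """Return the original source row that produced an integrated record."""
    return studies[record.source_study][record.source_row]


# Two made-up studies that code the same variable differently.
studies = {
    "STUDY01": [{"subject": "001", "sex": "Male"}, {"subject": "002", "sex": "F"}],
    "STUDY02": [{"subject": "001", "sex": "female"}],
}
pooled = integrate(studies)
```

Each integrated record keeps both the harmonised value and a pointer to the row it came from, so `trace_back(pooled[2], studies)` returns the original `STUDY02` row. Prefixing the subject identifier with the study identifier is one assumed way to avoid collisions when two studies reuse the same subject number.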

Appropriateness for Data Integration

Purpose: The purpose of this sub-group is to provide guidance on when it is and when it is not appropriate to integrate.
Deliverables: Decision Tree and set of use cases
Deliverable Dates: Q2 2012
Lead(s): TBD

Traceability


Purpose: The purpose of this sub-group is to examine the issue of traceability in integrated data and what is required to trace back from the integrated data to the source data.
Deliverables: A short white paper on what is required, why it is required, and how such traceability can and should be provided.
Deliverable Dates: Q3 2012
Lead(s): Paul Bukowiec, Donna Danduone

Mechanics of Integration


Purpose: The purpose of this sub-group is to provide guidance on the mechanics of integration.
Deliverables:
  a) Decision Tree
  b) Data Flow
  c) Output Model
Deliverable Dates:
  a) Q3 2012
  b) Q3 2012
  c) Q4 2012
Lead(s):
  a) Decision Tree: Karen Graham, Rachna Mittal
  b) Data Flow: Yimei Wang, Tom Abernathy
  c) Output Model: Sandy Lei, Natalie Reynolds

Working Group Leadership

Name | Role | Organization
Chuck Cooper | FDA Co-lead | FDA - CDER
Jingyee Kou | FDA Co-lead | FDA - CBER
Jim Johnson | Industry Co-lead | UCB Biosciences, Inc.
Sandra Minjoe | Industry Co-lead | Octagon Research Solutions
Dave Iberson-Hurst | Steering Committee Liaison | Assero Limited


FDA comments are an informal communication and represent the individual's best judgment. These comments do not bind or obligate FDA. The contents of this wiki are from the individual contributors and do not necessarily reflect the view and/or policies of the Food and Drug Administration, the employers of the individuals involved or any of their staff.

Last revision by DanBoisvert,04/25/2013