WG5 Project 09

From PhUSE Wiki

Project Team

Project Lead(s):

  • Peter Schaefer (pschaefer (at) vca-plus.com)

Project Members, 2016:

  • Nancy Brucken, inVentiv Health
  • Cynthia Stroupe, UCB
  • Jessica Dai, Vertex
  • JJ Hantsch, Parexel
  • Tania Walton, d-Wise

Additional team members are very welcome! Some thoughts about the required expertise:

  • Familiarity with study data and CDISC datasets
  • Programming knowledge, preferably R, but any SAS programmer willing to deal with R programmers is welcome
  • Enthusiasm for helping others test systems, and a basic understanding of system testing and qualification

If you want to participate, please simply contact Peter at pschaefer (at) vca-plus.com or the PhUSE office.

Project Description

Name: Test Data Factory

One of the projects under WG5 Standard Scripts.

Several CS Projects develop and specify medical research methods, features, or processes, and some even create software components or subsystems for common tasks in drug development. These efforts require a variety of SDTM or ADaM test datasets. Project teams typically fall back on data from the CDISC pilot project and/or anonymized study data provided by project team members. The Test Data Factory project aims to provide test data formatted in SDTM and ADaM that supports more systematic and comprehensive testing of these concepts and scripts.

What we are currently working on:
In its initial meetings, the team decided to work on the CDISC Pilot Data and to update the data to meet newer standards. This is considered "low-hanging fruit" and will quickly produce useful results. However, we won't lose sight of the more generic approach of using simulation to generate test data.

General Approach:
The proposed approach is to create test datasets systematically through the following activities:

  1. Define an initial scope of test data, i.e., which types of datasets are required (prioritized list) and what are the desired characteristics of these datasets (for example, incomplete, missing, or wrong data)
  2. Collect ‘real’ study data and identify existing test datasets
  3. Two options:
    1. If datasets already exist (like the CDISC Pilot project data): Update the datasets to comply with newer standards
    2. If datasets do not exist: Create scripts (preferably R scripts) that will create new test datasets with specified features through a simulation-based approach
  4. Define and implement a process to store and publish test datasets and the appropriate metadata through the existing channels (for example, the PhUSE Wiki and the GitHub repository defined in the Standard Analyses and Code Sharing working group)
  5. Identify an infrastructure that enables users to easily use scripts to create test datasets.
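
As an illustration of the simulation-based option in step 3.2, here is a minimal sketch of a generator that produces a DM-like dataset with one specified feature, missing AGE values. It is written in Python purely for illustration (the project prefers R), and everything in it is an assumption rather than project code: the function name `simulate_dm`, the parameters `missing_rate` and `seed`, the arm names, and the reduced set of DM variables.

```python
import random

# Hypothetical treatment arms, for illustration only.
ARMS = ["Placebo", "Low Dose", "High Dose"]


def simulate_dm(n_subjects, missing_rate=0.0, seed=0, studyid="TEST01"):
    """Return a list of DM-like records, blanking AGE at roughly `missing_rate`.

    Only a simplified subset of SDTM DM variables is generated.
    """
    rng = random.Random(seed)  # a fixed seed keeps the test data reproducible
    records = []
    for i in range(1, n_subjects + 1):
        record = {
            "STUDYID": studyid,
            "DOMAIN": "DM",
            "USUBJID": f"{studyid}-{i:04d}",
            "AGE": rng.randint(18, 85),
            "AGEU": "YEARS",
            "SEX": rng.choice(["M", "F"]),
            "ARM": rng.choice(ARMS),
        }
        # Inject the requested defect: a missing AGE value.
        if rng.random() < missing_rate:
            record["AGE"] = None
        records.append(record)
    return records


if __name__ == "__main__":
    for row in simulate_dm(5, missing_rate=0.4, seed=42):
        print(row)
```

Passing the same `seed` reproduces the same dataset, which matters when a test dataset is published and later regenerated. Other desired characteristics from step 1 (incomplete or wrong data) could be added as further parameters in the same way.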

Work in Progress and Meeting Notes

This section shows meeting notes, what the project team is working on, and the status.

Updating the CDISC Pilot xpt files:

  • Planned: mh.xpt, relrec.xpt, sc.xpt
  • Work In Progress: ae.xpt, cm.xpt, lb.xpt, qs.xpt, suppae.xpt, supplb.xpt, sv.xpt
  • Done: dm.xpt, ds.xpt, ex.xpt, se.xpt, suppdm.xpt, suppds.xpt, ta.xpt, te.xpt, ti.xpt, ts.xpt, tv.xpt, vs.xpt
  • Reviewers Guide: explain the remaining issues reported by Pinnacle 21 validation
  • define.xml: not started or planned yet

Meeting Notes


  • Still working on the assigned domains: Nancy (AE, SUPPAE), Peter (QS), JJ (CM)
  • Cindy completed and uploaded DS, SUPPDS, EX
  • Discussed Jessica's question about SV: decided to use decimal values for unplanned visits (VISITNUM) and to add the SVUPDES variable. This might require consistent updates to visit variables in other domains
  • Cindy picked LB, SUPPLB but won't start working on it immediately
  • Peter will fix dataset names to match the filename (issue with R-created xpt files) and run P21 on all completed domains (NOTE: This has been done and updated in GitHub)
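
The SV decision noted above (decimal VISITNUM values for unplanned visits, plus SVUPDES) could be sketched as follows. This is only an illustration under assumptions, not the team's implementation: the notes do not state the numbering rule, so the 0.1 increment after the preceding planned visit, the function name `assign_visitnum`, and the input field `reason` are all hypothetical.

```python
def assign_visitnum(visits):
    """Assign decimal VISITNUM values to unplanned visits.

    `visits` is a chronologically ordered list of dicts. Planned visits carry
    a numeric VISITNUM; unplanned visits have VISITNUM set to None and a
    free-text "reason" (hypothetical input field) that is copied to SVUPDES.
    """
    result = []
    last_planned = 0  # unplanned visits before the first planned visit get 0.1, 0.2, ...
    offset = 0        # number of unplanned visits since the last planned visit
    for visit in visits:
        row = dict(visit)  # copy so the input is left unchanged
        reason = row.pop("reason", "")
        if row.get("VISITNUM") is not None:
            # Planned visit: keep its number and reset the counter.
            last_planned = row["VISITNUM"]
            offset = 0
            row["SVUPDES"] = ""
        else:
            # Unplanned visit: assumed convention of 0.1 steps after the
            # preceding planned visit.
            offset += 1
            row["VISITNUM"] = round(last_planned + offset / 10, 1)
            row["SVUPDES"] = reason
        result.append(row)
    return result
```

With this assumed convention, a tenth unplanned visit after the same planned visit would collide with the next whole number, so a real implementation would need a rule for that case. The same derived VISITNUM values would also have to be applied to the visit variables in the other domains, as the note above points out.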