SDTM Validation Minutes 2013


Working Group Leadership

Role: Names
Project Leads: Mike Molter, Majdoub Haloui
Industry: Max Kanevsky, Hany Aboutaleb
FDA: Doug Warfield
Steering Committee Liaison: Anne Russotto
Project Manager: Lisa Lyons

Dec 2, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter, Minutes: Majdoub Haloui


  1. Discuss where we are with the review of our Best Practices document.
  2. Discuss the future of this group, and what tasks we should take on next.


Discussion


The Best Practices document that we’ve worked on for so long is now on the wiki and open for public comment until December 31, 2013. Here’s the link: http://www.phusewiki.org/wiki/index.php?title=Data_Sizing_Best_Practices_Recommendation. The PhUSE December newsletter just went out and contains a blurb about this document, as well as a link to the site and an email address for comments. We’ll be accepting feedback through the end of the year.

We also discussed future projects. Anne suggested that if we decide to continue with this group, we should drop the broad scope it has had for a long time and instead select specific issues arising from specific OpenCDISC checks. This will require assessing which of the issues raised under earlier versions of OpenCDISC are still relevant.
The agenda for our next meeting will be to discuss ideas for our next project based on what we find out between now and then. So if you have just a few minutes, please give this some thought and come ready for discussion.

Action Items

Best Practice Document

  1. (ALL) Please take a couple minutes and review this page. Also, in an effort to get feedback started, if anything in the document is worthy of further discussion, please start a discussion. You can do this by clicking “Discuss this page” on the left side of the page.
  2. (ALL) Spread the news among your colleagues. Send the link around and encourage feedback.
  3. (HANY) Post a blurb (below) on LinkedIn.

The FDA/PhUSE CSS Data Quality Working Group has initiated a Best Practices document to address challenges faced by FDA staff when receiving CDISC data submissions containing large data sets that are difficult to open and navigate. This document can be found at http://www.phusewiki.org/wiki/index.php?title=Data_Sizing_Best_Practices_Recommendation . The project team will be collecting feedback on this document through the end of 2013. Feedback can be provided by clicking on the “Discuss this page” link on the left side of the above web site, or by email at CSS-DataQuality@phusewiki.org.


Future Project Ideas

  1. (LISA) Respond to this email with a link to the wiki location of the most current version of the spreadsheet containing compliance check issues, for use in choosing the group's next project: http://www.phusewiki.org/wiki/images/6/6b/SDTMValidationRules.xlsx
  2. (ALL) Review the current spreadsheet (Lisa to send the link) and come to the next meeting (December 16) ready to discuss ideas for the next project.
  3. (MIKE M) Establish contact with Doug Warfield to get an idea of the important issues from the FDA standpoint.

Nov 18, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter, Minutes: Majdoub Haloui


  1. Go over the most recent changes to the document
  2. Go over FAQs.
  3. Any other changes needed for the document?
  4. Where does it go from here? Starts with the PhUSE steering committee, then the wiki. Let’s have a discussion on how to advertise it and how we talk with others about it.
  5. Discuss our next task and 2014 (e.g. lab tests without units?). Perhaps a recap of other topics that had been discussed before we started the creation of this document.


Discussion


Team reviewed reformatted document and liked the additional background detail and flow. Mike will send document to PhUSE Steering committee for review.

Nov 04, 2013


Meeting cancelled due to CDISC Interchange

Oct 21, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter

Agenda

1 Overview of reformatted document
2 Review process flow section
3 Review steps to take for variable re-sizing
4 Revisit whether parent and split datasets should be submitted
5 Revisit FAQ about splitting of SUPPxx and determine if verbiage on FA should be added

Discussion


Team reviewed reformatted document and liked the additional background detail and flow. The following was discussed in each section:

Process flow – Need to make it clear that the approach should be to always resize. Right now the document gives the impression that resizing is only done if necessary.
Manage length of character values – Reviewed and need to modify the bullet that references setting the length of the variable across all domains. Also provide the reference to SDTM IG 3.1.3 section 4.1.2.9, which specifies that --TESTCD should be set to 8, and note that the recommendation of this team is not to automatically set it to 8. In the programming transformation section Carlo suggested some wording to add to reflect hesitation. We also added an action item to bring this to CDISC for consideration.
How to handle SAS xpt files – Lots of discussion on whether we should put in a recommendation for submitting both the combined and the split datasets, but the group felt we really need to get input from FDA at the next meeting since there is conflicting information between various documents and recent eData posts.
What to report in define – Decided to remove the bullet about the split and un-split domains.
FAQs – Discussed the question "If a domain which needs to be split has an associated supplemental qualifier domain, does the supplemental qualifier domain need to be split too even if it is within size limits?" and whether FA should be added to it. Mike will update the FAQ to add text around FA.
General - There was discussion of whether PhUSE has the right to supersede the published standards, e.g. the IG states --TESTCD should be set to a length of 8, versus the working group recommendation. The group will state its recommendations and point out where they differ from the standards. The document will be posted for public review and we will get input from PhUSE, industry, FDA, etc. to see their stance on our position.


Actions

Carlo will provide information on FA to determine if it should be added in splitting of SUPP-- - Done SDTM v3.1.3 section 4.1.2.9
Carlo will provide verbiage for programming transformations to reflect hesitation.
Mike will provide updates to the sections detailed above and provide the updated document to the group prior to the next meeting.

Oct 7, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Majdoub Haloui

Agenda

1 Final review of SAS xpt size document prepared by Carol Vaughn
2 Review define.xml document prepared by Lisa Lyons


General discussion


Team discussed changing order for the document for better flow 1) Process flow for managing the recommended solutions, 2) How to manage the length of character values to avoid wasted space, 3) How to handle SAS xpt files when they exceed recommended length, 4) What to report in the define documentation.

Team finalized Carol's section on Handling SAS xpt files, including the FAQs, with the exception of two action items. The first is whether to include FA in the splitting of SUPP--; the second is to determine whether anything in CDISC documentation supports including both split and non-split domains in define.

Team reviewed What to report in define documentation and moved the statement about define documentation from Carol's section to this section. Did not get to review the FAQ. This will be reviewed at next meeting, followed by managing character variable lengths.

Actions


Carlo will provide information on FA to determine if it should be added in splitting of SUPP--.
Anne will look for any source documentation that supports including both split and non-split domains in define. During the meeting she identified define v2.0 section 4, page 16.
Mike will reorganize the document based on discussion and send out draft timelines for group to finalize document and post to Wiki for comments.

Sept 23, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter

Agenda

1 Follow-up on status of Sept 9th action items
2 Review SAS xpt size document prepared by Carol Vaughn
3 Review define.xml document prepared by Lisa Lyons


General discussion


Team reviewed Carol's section on Handling SAS xpt files when they exceed the maximum size allowed and provided feedback. Clarification was added to check with the reviewer if the size is between 1 and 1.25 GB. There was no time left to review the section on What to report in define documentation. This will be reviewed at next meeting.

Actions


No actions identified.

Sept 9, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Majdoub Haloui

Agenda

1 Continue drafting best practice recommendations on variable padding and data set size. We can utilize the document that Carlo put together citing different publications and comparing what they stated about these topics (link http://www.phusewiki.org/wiki/images/a/a2/Sizing_Discussion_Document.docx).


General discussion


1 Team continued to discuss the scope and decided to divide sections among volunteers to draft recommended solution and FAQs for each of the items defined in the scope. The following individuals volunteered to address the items identified in the scope and have a draft for team review by Friday, September 13th.
  • How to handle SAS xpt files when they exceed the maximum size allowed (Carol Vaughn)
  • How to manage the length of character values to avoid wasted space within datasets (Majdoub Haloui)
  • Process flow for managing the recommended solutions (Mike Molter)
  • What to report in define.xml (Lisa Lyons)
  • Where in the eCTD module 5 to put split datasets (Carlo Radovsky)


Actions


  • Team members assigned above (Carol, Majdoub, Mike, Lisa and Carlo) draft section of document and send out for group review by September 13th.
  • All review draft sections and be ready to provide input at next meeting on September 23rd.


Aug 26, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter

Agenda

1 Review “best practice” template and gather feedback.

2 Begin drafting best practice recommendations on variable padding and data set size. We can utilize the document that Carlo put together citing different publications and comparing what they stated about these topics (link http://www.phusewiki.org/wiki/images/a/a2/Sizing_Discussion_Document.docx).


General discussion


1 Team reviewed the best practice template and provided feedback on current structure. Since not all team members were present we are open to additional suggestions. Lisa mentioned in order to obtain public feedback on the best practice document, we will need to convert it to a wiki page rather than keep it as a Word document.

2 Team started drafting the document and discussed the following assumptions:
  • Discussed where in the process this recommendation should occur. The intent of the document is to discuss best practice for submission purposes and not necessarily on the front end.
  • General consensus: length in define should reflect the actual data length and not the values before resizing.
  • Split dataset lengths should be the same across the split datasets (resize before split).
  • Discussion needs to continue regarding whether there should be exceptions for code list lengths (driven by code list max value or data driven) - to be continued.
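
The "resize before split" assumption above can be illustrated with a minimal Python sketch. It is not the group's recommended implementation: the records, the LBCAT-based split rule, and the helper names max_char_lengths and split_by are all hypothetical and exist only to show the order of operations.

```python
# Illustrative sketch only: records are plain dicts, and splitting by a
# category variable is an assumption, not a rule from the minutes.

def max_char_lengths(records):
    """Return the longest observed length for each character variable."""
    lengths = {}
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, str):
                # Start from 1 so an all-blank variable still gets length 1.
                lengths[name] = max(lengths.get(name, 1), len(value))
    return lengths


def split_by(records, key):
    """Group records by the value of one variable."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key], []).append(rec)
    return groups


# Hypothetical LB records; the values are made up for illustration.
lb = [
    {"USUBJID": "S1-001", "LBCAT": "CHEMISTRY", "LBTESTCD": "ALT", "LBORRES": "32"},
    {"USUBJID": "S1-002", "LBCAT": "HEMATOLOGY", "LBTESTCD": "HGB", "LBORRES": "13.5"},
    {"USUBJID": "S1-003", "LBCAT": "CHEMISTRY", "LBTESTCD": "GLUC", "LBORRES": "5"},
]

# Step 1: resize on the whole domain, before any splitting.
lengths = max_char_lengths(lb)

# Step 2: split; every part carries the same length metadata.
parts = {
    cat: {"lengths": lengths, "records": recs}
    for cat, recs in split_by(lb, "LBCAT").items()
}
```

Because the lengths are computed once on the unsplit domain, each split dataset ends up with identical variable lengths, which is the point of resizing first.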


Actions


  • Team members to review draft recommendations and add any additional detail as well as come with additional FAQs based on your expertise.


Aug 12, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter

Agenda

1 Anne Russotto and Majdoub Haloui to discuss the CDISC Intrachange and particularly, how CDISC will support the efforts of this group.
2 Continue discussing recommendations on variable padding and data set size. Last week we went over a document that Carlo put together citing different publications and comparing what they stated about these topics. http://www.phusewiki.org/wiki/images/a/a2/Sizing_Discussion_Document.docx


General discussion


Updates from CDISC Intrachange – 7/31/13 by Anne Russotto

1 What should CDISC teams do regarding validation rules when publishing new standards?
a. Use the word “Conformance”, not validation, to describe the rules
b. CDISC teams should publish conformance requirements on their models
c. Tag the conformance rules in the IG
d. Link the tags to a separate area where all conformance rules can be seen together
e. Can use an addendum or separate document
f. Decide on a common metadata for the conformance rules. See ADaM and define.xml as examples
2. How does PhUSE work with CDISC and FDA
a. PhUSE Data Quality WG is the place where rule discussions can take place
b. CDISC will start to participate on a regular basis in PhUSE meeting and take discussion items back to their leadership team and the appropriate sub teams for discussion
c. PhUSE WG will continue to discuss industry issues around SDTM
i. If an issue applies to CDISC conformance the WG will ensure that CDISC provides a response
ii. If an issue applies to ADaM or define.xml the WG will reply to those teams
iii. If the issue applies to other issues, the WG will develop best practices until a business rule is issued from CDISC or FDA
d. FDA will continue to participate in PhUSE meeting to help develop best practices
3. Anne to provide updates as they become available

Additional Group Discussion:

1. Publish the agenda at least a week before the meeting
2. The FDA/PhUSE Working Group will come up with a recommendation and publish it to the wiki for public comments
3. Variable Sizing discussion:
a. No padding allowed
b. Resize before splitting
c. Can a split dataset variable have two different lengths? According to the 3.1.2 IG, variables should have the same length
d. Can OpenCDISC run checks on the longest variable length?
e. Variable length trimming (xpt transport files versus xml)

July 29, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Majdoub Haloui

Agenda

  1. Discuss status on previous action items for ID#1 Wasted Space, ID2 Dataset sizes, ID4 Rule # SD0026 Missing value for --ORRESU, when --ORRES is provided and ID6 Rule # SD0003 ISO Dates
  2. Review Rules
  • ID7 Rule # SD0064 Invalid Subject,
  • ID8 Rule # SD0038 Study Day 0
  • ID9 Rule # SD0080 AE start date is after the latest Disposition date

General discussion

The entire meeting was centered around the document that Carlo put together and most of this time was spent discussing variable padding, although some was spent on data set size. Carlo’s document split the variable padding discussion into three categories:

  • variable lengths restricted by CDISC constraints, which are restricted by V5 transport file constraints;
  • variables subject to controlled terminology;
  • and everything else.

Much discussion was around different methods and considerations for assigning variable lengths, and their pros and cons. Questions:

  • Should all character fields be truncated to the maximum number of characters found in data?
  • Should the length of variables subject to controlled terminology be set to the length of its longest allowable value?
  • Can variables whose maximum length is explicitly stated by CDISC (e.g. --TESTCD) always be set to that max length, or should their length also be driven by data?
  • Should split domains use consistent lengths across data sets or should lengths be truncated according to longest value in the data set?
  • What lengths should go in define.xml?

Opinion was divided on many of these questions. FDA doesn’t use the length stated in the define.xml so there was no strong need to make sure this is in sync with the final data set lengths. FDA has tools for handling split domains that have different lengths for the same variable in different data sets. Complexity introduced to industry processes was also discussed.
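
To make the competing length rules in the questions above concrete, here is a small Python sketch that contrasts a data-driven length with a length taken from the longest allowable controlled-terminology value. The function names and sample values are assumptions for illustration; the minutes do not settle which rule should apply.

```python
def data_driven_length(values):
    """Length of the longest value actually present in the data (minimum 1)."""
    return max((len(v) for v in values), default=1) or 1


def codelist_length(allowed_terms):
    """Length of the longest allowable value in a controlled-terminology list."""
    return max(len(term) for term in allowed_terms)


# Hypothetical data and code list, used only to show how the rules diverge.
sex_in_data = ["M", "F", "F"]
sex_codelist = ["M", "F", "U", "UNDIFFERENTIATED"]

by_data = data_driven_length(sex_in_data)     # driven by observed values
by_codelist = codelist_length(sex_codelist)   # driven by longest allowed term
```

With these sample values the data-driven rule gives a length of 1 while the code-list rule gives 16, which shows how much space the choice of rule can add per record.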

The group decided that it wanted to hear more industry opinions. For that reason, we would ask others to attend the next meeting prepared to discuss the issue further. At the end of the day, the group needs to make a recommendation and take measures to make sure that all the documentation mentioned in Carlo’s document is updated appropriately and consistently.

We did not get to discuss ID4 Rule # SD0026 Missing value for --ORRESU, when --ORRES is provided, ID6 Rule # SD0003 ISO Dates, or the additional rules on the agenda.

ACTION: Read Carlo's document and come prepared to discuss the issues and make recommendations http://www.phusewiki.org/wiki/images/a/a2/Sizing_Discussion_Document.docx

July 15, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter

Agenda

  1. Discuss previous action items for ID#1 Wasted Space, ID2 Dataset sizes, ID4 Rule # SD0026 Missing value for --ORRESU, when --ORRES is provided
  2. Review ID6 Rule # SD0003 ISO Dates
  3. Time permitting review ID7 Rule # SD0064 Invalid Subject, ID8 Rule # SD0038 Study Day 0

General discussion

ID1 Wasted Space/ID2 Dataset Sizes: There was some confusion on what was needed to document regarding ID#1 Wasted Space and ID#2 Dataset Sizes. It was decided to document the gaps in the CDER Common Issues document, note what the issues are and provide recommendations made by the group. Carlo Radovsky will draft and provide for the group to review.

ID4 Rule # SD0026 Missing value for --ORRESU, when --ORRES is provided: Group reviewed the list provided by Anthony Chow and Carol Vaughn and agreed that the list was not all-inclusive but was a good starting point for a recommendation of tests that do not need units. We need more input from others, not just in Lab but in other domains and other therapeutic areas. Hany suggested sending an email to the CSS-WG-Data-Quality listbox requesting a list of results that do not have units.

ID6: Rule # SD0003 ISO Dates: There seems to be confusion about why sponsors would populate date fields with durations, and some felt the IG is unclear on the issue. Anthony Chow volunteered to provide a write-up on the issue for the group to review.
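
As background for the date-versus-duration confusion, the following Python sketch shows one way to tell an ISO 8601 date/time value from an ISO 8601 duration. It is a deliberately simplified assumption, not the OpenCDISC rule logic: the patterns do not range-check months or days and ignore time zones, fractional seconds, intervals, and missing-component notation.

```python
import re

# Simplified ISO 8601 date/time: YYYY, YYYY-MM, YYYY-MM-DD, with an optional
# time part Thh, Thh:mm or Thh:mm:ss. Component ranges are NOT validated.
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2})?)?)?)?)?$")

# Simplified ISO 8601 duration such as P3D, P1Y2M or PT2H30M.
# The lookaheads reject a bare "P" or a "T" with nothing after it.
DURATION_RE = re.compile(
    r"^P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?$"
)


def classify(value):
    """Label a string as 'date', 'duration', or 'invalid' (hypothetical helper)."""
    if DATE_RE.match(value):
        return "date"
    if DURATION_RE.match(value):
        return "duration"
    return "invalid"
```

A value such as P3D is well-formed ISO 8601 but is a duration, so a check on a date variable could reasonably treat it differently from a malformed string like 15JUL2013.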

Not enough time to discuss item 3 on agenda. Will discuss at next meeting.

Action Items

  • Carlo Radovsky - Draft recommendation for group review for ID1 Wasted Space and ID2 Dataset Sizes.
  • Anthony Chow - Draft write up to provide clarity on ID6 Rule # SD0003 ISO Dates for next group discussion.
  • Hany Aboutaleb - Send a call for information on ID4 Rule # SD0026 Missing value for --ORRESU, when --ORRES is provided, asking for lab tests with known missing units, to the CSS-WG-Data-Quality listbox.

July 1, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Lisa Lyons (Mike Molter, Majdoub Haloui on vacation)

Agenda

  1. Discuss recent meeting with the PhUSE Steering committee/FDA regarding the focus and model of the SDTM Validation Rules Project
  2. Revisit Actions based on refocus of project for ID#1 Wasted Space, ID#2 Dataset sizes, and ID#4 SD0026 Missing value for --ORRESU, when --ORRES is provided
  3. Time permitting review ID6 Rule # SD0003 ISO Dates, ID7 Rule# SD0064 Invalid Subject, ID8 Rule #SD0038 Study Day 0

General discussion

Concerns were raised that some of the action items were written in a way that made it appear the work group was directing regulatory decision making, which was not the intent. Most of the action items to date were for FDA or OpenCDISC, whereas the intention of this workgroup was for industry to bring problem validation checks that they had identified, with proposed solutions, as well as to identify new or needed checks.

It was discussed that many of the action items were due to the need for clarification on many of the issues that were carried over from the 2012 projects (CBER TOP 20 and CAB/VP). The CBER TOP 20 was based on legacy data conversions and older versions of the OpenCDISC checks, so the decision was to remove all the checks from those two projects and just focus on the new items discussed at the March conference as a starting point. The project will ensure the focus remains on an industry-led model.

Based on the concerns raised, three items previously discussed had to be revisited: ID#1 Wasted Space, ID#2 Dataset sizes, and ID#4 SD0026 Missing value for --ORRESU, when --ORRES is provided.

  • ID1: Wasted Space - Need a volunteer to draft a document to communicate to FDA the lack of clarity around padding data for their evaluation - No one volunteered, Lisa will reach out to members who could not join to see if anyone is interested.


  • ID2: Dataset Sizes - Need a volunteer to draft a document to communicate the gaps between CDER Common Data Standards Issues and Study Data Specifications and provide a summary of the issues with the current language for their evaluation. - No one volunteered, Lisa will reach out to members who could not join to see if anyone is interested.


  • ID4: Rule # SD0026 Missing value for --ORRESU, when --ORRES is provided. At the previous meeting the group discussed that excluding common tests without units from the check would reduce the number of false positives, but we need examples. Anthony Chow and Carol Vaughn volunteered to provide a list for the group to review at the next meeting. A proposal can be sent to the OpenCDISC forum for evaluation.
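
The idea in ID4 of excluding known unitless tests from the check can be sketched as follows. This is a hypothetical illustration, not the OpenCDISC implementation of SD0026: the function name, the record layout, and the two test codes in the exclusion set are assumptions, since the actual list was still to be drafted by the volunteers.

```python
# Hypothetical exclusion list; the real list was an open action item.
UNITLESS_TESTS = frozenset({"PH", "SPGRAV"})


def missing_units(records, unitless=frozenset()):
    """Return LB records that have a result but no unit, skipping excluded tests."""
    flagged = []
    for rec in records:
        has_result = bool((rec.get("LBORRES") or "").strip())
        has_unit = bool((rec.get("LBORRESU") or "").strip())
        if has_result and not has_unit and rec.get("LBTESTCD") not in unitless:
            flagged.append(rec)
    return flagged


# Made-up records for illustration only.
lb = [
    {"LBTESTCD": "ALT", "LBORRES": "32", "LBORRESU": "U/L"},
    {"LBTESTCD": "PH", "LBORRES": "7.4", "LBORRESU": ""},
    {"LBTESTCD": "GLUC", "LBORRES": "5.1", "LBORRESU": ""},
    {"LBTESTCD": "HGB", "LBORRES": "", "LBORRESU": ""},
]

without_exclusions = missing_units(lb)
with_exclusions = missing_units(lb, UNITLESS_TESTS)
```

On this sample, the check without exclusions flags both the PH and GLUC records, while the version with the exclusion set flags only GLUC, which is the kind of false-positive reduction the group had in mind.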



Not enough time to discuss item 3 on agenda. Will discuss at next meeting.

Action Items

  • Anthony/Carol - provide list of lab tests without units for next group discussion.
  • Lisa to reach out to members who did not join to try to get a volunteer to draft a document for ID1, ID2 above.

June 3, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter, Majdoub Haloui

Agenda

  1. Revisit OpenCDISC previous Action Items
  2. Continue reviewing rules

General discussion

Reviewed following ID #s from the SDTM Validation Rule Spreadsheet. See SDTM Validation Rule spreadsheet for details and actions.

  • ID6/ID7: Rule # SD0026/SD0029 Missing value for --ORRESU/--STRESU, when --ORRES/--STRESC is provided
  • ID13: Rule # SD0070 No Exposure record found for subject
  • ID15/ID16: Rule# SD0087/SD0088 RFSTDTC/RFENDTC is not provided for a randomized subject


Action Items

May 6, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter, Majdoub Haloui

Agenda

  1. Revisit OpenCDISC previous Action Items
  2. Continue reviewing rules

General discussion

Reviewed following ID #s from the SDTM Validation Rule Spreadsheet. Note: Revisited items ID1, ID2 since we had OpenCDISC representation.

  • ID1: Wasted space/padded variables
  • ID2: Dataset size
  • ID3: Duplicate USUBJID/DECOD/--STDTC
  • ID5: Rule # SD0006 No baseline result in [Domain] for subject
  • The following rules were discussed together due to similar issues:
  • ID7: Rule # SD0029 Missing value for --STRESU, when --STRESC is provided
  • ID8: Rule # SD0031 Missing values for --STDTC and --STRF, when --ENDTC or --ENRF is provided
  • ID10: Rule # SD0035 Missing value for --DOSU, when --DOSE, --DOSTXT or --DOSTOT is provided
  • ID11: Rule # SD0046 All values of Qualifier Variable Label (QLABEL) should be the same for a given value of Qualifier Variable Name (QNAM)
  • ID12: Rule # SD0063 SDTM/dataset variable label mismatch


Action Items

The following IDs discussed at the meeting (ID1, ID2, ID3, ID5, ID7, ID8, ID10, ID12) need to be revisited with OpenCDISC/FDA at the next meeting

ID11: Mike/Majdoub to follow up with Anne Russotto to see who she sent ID11 to at SDS and whether any progress has been made on clarifying the IG.

April 22, 2013


When: 02:00pm-3:00pm (EST), Bi-weekly Mondays
Place: Teleconference
Facilitator: Mike Molter, Majdoub Haloui

Agenda

  1. Introductions
  2. Review action items discussed at March conference and follow up on status

General discussion

Introductions:

  • Introduced new co-leads and provided overview of how future meetings will be run
  • Meeting invitee quick introduction

Review of Conference Action items

  • Reviewed following ID #s from the SDTM Validation Rule Spreadsheet
  • ID1: Wasted space/padded variables
  • ID2: Dataset size
  • ID4: Rule ID # SD0002 (NULL Values in variables marked as required)
  • ID6: Rule ID# SD0026 (Missing value for --ORRESU, when --ORRES is provided)

Action Items

  • ID1:
  • Working Group to communicate to FDA lack of definition around padding data and summary of issue for their evaluation.
  • OpenCDISC needs to update check based on decision regarding no padding of data
  • ID2:
  • Working Group to communicate gaps between CDER Common Data Standards Issues and Study Data Specifications and provide a summary of the issues with the current language for their evaluation.
  • Majdoub needs to update the SDTM Validation Rule spreadsheet so that it does not give the impression sponsors should be applying the resize macro available from FDA. Change needs to be "sponsors should resize the data first".
  • ID4:
  • Working Group requests examples from FDA regarding this issue so the industry can discuss solutions.
  • ID6:
  • Working Group requests that FDA provide a list of variables that have the issue so the industry can discuss possible solutions.

Follow up


Last revision by LisaLyons,12/2/2013