Demystifying Define-XML Codelist Handling for Nonclinical Studies
- 1 Project Outcomes
- 2 Project Overview
- 3 Project Leads
- 4 Project Members
- 5 Project Updates
- 6 Objectives and Timelines
- 7 Project Activities
- 8 Published Resources
- 9 Definitions
- 10 Questions about Define.xml Codelists
- 11 Use Cases for Codelists in a Nonclinical Define.xml
- 12 Meeting Minutes
- 13 Archived Content
The team has identified 5 priority questions about codelists for which it will research and provide 'best practice' recommendations; those questions are listed below.
- What terms should be included in a codelist? What are the criteria for determining what terms are included in a codelist?
- What variables, other than those that have associated CDISC codelists in the SENDIG, must/should have user-defined codelists in define.xml? What are the criteria for determining this?
- How is the define.xml file currently being used in nonclinical (by FDA, Sponsors, CROs, system vendors)? What are possible uses (and thus valid reasons for providing codelists in define.xml)? What guidance can be given to industry on their use?
- When should a Display Value (Decode) be included with each codelist term? What should be in the Decode?
- When and how can you use a codelist for multiple variables?
Once best practices for each question have been agreed on, each question and its recommendation will be reported in the [Nonclinical Implementation FAQ, Define.XML Section]. The questions with recommendations are shown above in bold text.
The goal of this project is to provide recommendations for specific problems and questions encountered when populating the codelist section of define.xml for a nonclinical study.
The project will begin by defining a specific list of issues, concerns, questions, and challenges related to nonclinical codelists in define.xml.
It will then explore published general information (clinical or nonclinical) about implementing codelists in define.xml and relate it to specific challenges expressed by nonclinical data stakeholders. It will also explore how the FDA, as a consumer of the nonclinical define.xml file, uses codelists.
The expected result of this project is a set of public recommendations on best practices for populating the codelist section of the define.xml for SEND study submissions.
Expected deliverables are poster(s) for the 2018 PhUSE CSS and published recommendations on the SEND Implementation Wiki; a white paper may be warranted as well.
This project will not include education in the define-xml standard, generally how to create define.xml files, or any survey or analysis of define-xml tools. It will focus solely on codelist content. Participants should have a working knowledge of the define-xml standard.
|Name||Organization||Email|
|Anne Cecilie Engsig||Nordisk|| |
|Audrey Harris||Charles River||Audrey.Harris@crl.com|
|Christy Kubin||MPI Research|| |
|Gitte Koch Ullman||Novo Nordisk||GKU@novonordisk.com|
|Susie Lendal Antvorskov||Data Standards Decisions|| |
The following questions and recommendations have been posted to the [Nonclinical Implementation FAQ, Define.XML Section]:
In a define.xml CodeList, when should a Display Value (Decode) be included with a term?
Define.xml 2.0 codelists can include either EnumeratedItem or CodeListItem elements. Using CodeListItem allows both a term and its display value (decode) to be included. All terms within a single CodeList must use the same element type: either all EnumeratedItem or all CodeListItem.
- Use EnumeratedItem elements in a CodeList when the terms themselves are sufficient for data interpretation.
- Use CodeListItem elements with a Decode in a CodeList when a decode facilitates data interpretation, that is, when the coded value is an abbreviation, acronym, or short code that represents a word or phrase.
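A minimal Python sketch using the standard library's `xml.etree.ElementTree` shows the rule that a CodeList uses either EnumeratedItem or CodeListItem and never mixes them. It is a simplification built on stated assumptions. It leaves out the ODM default namespace, NCI code `Alias` elements, and other attributes that a complete define.xml would carry. The function name, OIDs, codelist names, and scores in the example are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional, Sequence

# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_codelist(oid: str, name: str, terms: Sequence[str],
                   decodes: Optional[Mapping[str, str]] = None) -> ET.Element:
    """Return a simplified Define-XML 2.0 CodeList element (hypothetical helper).

    decodes=None  -> every term is written as an EnumeratedItem.
    decodes given -> every term is written as a CodeListItem with a Decode,
                     so the two item types are never mixed in one CodeList.
    """
    codelist = ET.Element("CodeList", {"OID": oid, "Name": name, "DataType": "text"})
    if decodes is None:
        for term in terms:
            ET.SubElement(codelist, "EnumeratedItem", {"CodedValue": term})
        return codelist

    # A CodeList that uses CodeListItem needs a decode for every term.
    missing = [term for term in terms if term not in decodes]
    if missing:
        raise ValueError(f"terms without a decode: {missing}")
    for term in terms:
        item = ET.SubElement(codelist, "CodeListItem", {"CodedValue": term})
        decode = ET.SubElement(item, "Decode")
        text = ET.SubElement(decode, "TranslatedText", {XML_LANG: "en"})
        text.text = decodes[term]
    return codelist


if __name__ == "__main__":
    # Self-explanatory terms: EnumeratedItem is sufficient.
    words = build_codelist("CL.SEV.WORDS", "Severity (hypothetical)",
                           ["MILD", "MODERATE", "SEVERE"])
    # Short codes that stand for phrases: CodeListItem with a Decode.
    scores = build_codelist("CL.SEV.SCORE", "Severity Score (hypothetical)",
                            ["1", "2", "3"],
                            {"1": "Slight", "2": "Moderate", "3": "Marked"})
    print(ET.tostring(words, encoding="unicode"))
    print(ET.tostring(scores, encoding="unicode"))
```

Raising an error on a missing decode is a deliberate design choice. The alternative, silently falling back to EnumeratedItem for some terms, would produce exactly the mixed CodeList that the recommendation rules out.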
In a define.xml Codelist, what should be in the Display Value (Decode) entry for a term?
The Decode for a CodeListItem element should be determined as follows. When the coded value has a definition (decode) in a paired variable in the data (whether or not the variables use CDISC Controlled Terminology), use the value of the paired variable as the decode. For example, the decode for a value of LBTESTCD is the corresponding value of LBTEST. While this is not a recommendation on which variables should have codelists, the paired code/decode variables in SENDIG 3.0 are ARMCD/ARM, CLTESTCD/CLTEST, ETCD/ELEMENT, PCTESTCD/PCTEST, QNAM/QLABEL, SETCD/SET, --TOXGR/--TOX, and --TPTNUM/--TPT.
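Decodes can be derived from a paired code/decode variable, as sketched below. This is a hypothetical helper and is not part of any define.xml tool. It assumes the dataset rows have already been loaded into Python dictionaries; reading the SAS transport files is out of scope. If one coded value is paired with more than one decode value in the data, it raises an error instead of producing a decode.

```python
from typing import Dict, Iterable, Mapping, Set


def decodes_from_paired_variable(rows: Iterable[Mapping[str, str]],
                                 code_var: str, decode_var: str) -> Dict[str, str]:
    """Map each coded value (e.g. LBTESTCD) to its paired value (e.g. LBTEST).

    Raises ValueError if one code is paired with more than one decode in the
    data, because a single Decode could not represent it.
    """
    pairs: Dict[str, Set[str]] = {}
    for row in rows:
        pairs.setdefault(row[code_var], set()).add(row[decode_var])

    conflicts = {code: sorted(values) for code, values in pairs.items() if len(values) > 1}
    if conflicts:
        raise ValueError(f"{code_var} values with several {decode_var} values: {conflicts}")
    # Each set now holds exactly one value; sort codes for stable output.
    return {code: next(iter(values)) for code, values in sorted(pairs.items())}


if __name__ == "__main__":
    # Hypothetical rows; USUBJID values are illustrative only.
    lb_rows = [
        {"USUBJID": "S1-001", "LBTESTCD": "ALT", "LBTEST": "Alanine Aminotransferase"},
        {"USUBJID": "S1-001", "LBTESTCD": "AST", "LBTEST": "Aspartate Aminotransferase"},
        {"USUBJID": "S1-002", "LBTESTCD": "ALT", "LBTEST": "Alanine Aminotransferase"},
    ]
    print(decodes_from_paired_variable(lb_rows, "LBTESTCD", "LBTEST"))
    # {'ALT': 'Alanine Aminotransferase', 'AST': 'Aspartate Aminotransferase'}
```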
When the coded value does not have a decode in a paired variable in the data:
- if the coded value is CDISC Controlled Terminology, use the value in the "NCI Preferred Term" column of the published Controlled Terminology version used in the associated datasets
- if the coded value is NOT CDISC Controlled Terminology and the coded value is in an associated study report table, there is likely a key, footnote, or other explanatory text in the study report table that includes the coded value. The decode should match the text explaining the value in the study report table. If this information is not in the study report, discuss it with the report author.
- if the coded value is NOT CDISC Controlled Terminology and the coded value is NOT in an associated study report table (for example, if it is metadata collected but not reported, or if it is from SOP information associated with the study), the decode should contain a short unambiguous word or phrase that explains the coded value.
See SENDIG 3.1 Section 4.3.4 regarding use of coded result values in SEND datasets.
Note that some sponsors have received a comment from the FDA that the decode should not contain the full definition of a term.
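The order in which the decode recommendations above are applied can be summarized as a small decision function. This is a hypothetical Python sketch. The function name, its parameters, and the returned source labels are illustrative only. It assumes that NCI Preferred Terms and study report keys are looked up elsewhere.

```python
from typing import Optional, Tuple


def choose_decode(paired_value: Optional[str] = None,
                  is_cdisc_ct: bool = False,
                  nci_preferred_term: Optional[str] = None,
                  in_study_report: bool = False,
                  report_key_text: Optional[str] = None,
                  short_phrase: Optional[str] = None) -> Tuple[str, str]:
    """Return (decode, source), checking the decode sources in order.

    1. value of the paired variable in the data, if there is one
    2. NCI Preferred Term, if the coded value is CDISC Controlled Terminology
    3. key/footnote text from the study report table, if the value is reported
    4. a short, unambiguous word or phrase, if the value is not reported
    """
    if paired_value:
        return paired_value, "paired variable"
    if is_cdisc_ct:
        if not nci_preferred_term:
            raise ValueError("look up the NCI Preferred Term in the CT version used")
        return nci_preferred_term, "NCI Preferred Term"
    if in_study_report:
        if not report_key_text:
            raise ValueError("no explanation in the report table; ask the report author")
        return report_key_text, "study report key"
    if not short_phrase:
        raise ValueError("supply a short, unambiguous phrase for this coded value")
    return short_phrase, "sponsor-defined phrase"


if __name__ == "__main__":
    # Hypothetical: a sponsor-defined score explained in a report table key.
    print(choose_decode(in_study_report=True, report_key_text="Slight"))
    # ('Slight', 'study report key')
```

Returning the source together with the decode is an optional design choice. It makes the origin of each decode easy to review, which may help keep decodes short rather than full definitions.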
Objectives and Timelines
|Milestone||Date|
|Call for Participation||13 Sep 2017|
|Kick-Off Meeting||29 Sep 2017|
|Establish list of issues to be addressed||10 Nov 2017|
|Identify Research Material||Timeline 4|
|Establish subteams to identify potential recommendations for each issue||Timeline 5|
|Full Team Review/Acceptance of all recommendations complete||Timeline 6|
|Coordinate with SEND Implementation Wiki subteam for publication of recommendations||Timeline 7|
|Complete Poster for 2018 PhUSE CSS||Timeline 8|
CDISC Define-XML Specification Version 2.0
CDISC Define.xml Implementation Wiki - Working with Controlled Terminology
FDA Study Data Business Rules
Note specifically FDAB035: "The definition of datasets, variables, and codelists in define.xml should reflect the actual study data."
Understanding the define.xml File and Converting It to a Relational Database. Lex Jansen, Octagon Research Solutions, Wayne, PA. SAS Global Forum 2010.
Codelists Here, Versions There, Controlled Terminology Everywhere. Shelley Dunn, Regulus Therapeutics, San Diego, California. PharmaSUG 2016, Paper DS16.
From the November 2017 Study Data Technical Conformance Guide: "For variables for which no standard terms exists, or if the available terminology is insufficient, the sponsor should propose its own terms. The sponsor should provide this information in the define.xml file and in the SDRG."
From CDISC Define-XML Specification Version 2.0, Section 4.3:
The term “Controlled Terminology” in the context of a study refers to the set of all allowable values across all variables that have finite sets of allowable values in the study. A “Codelist” is a unique subset of the controlled terminology to which one or more variables are subject. Beginning with SDTM Version 1.2, the SDTM-IG requires controlled terminology for many SDTM variables. For some variables, sponsor-specific controlled terminology is recommended. All controlled terminology used in a study must be provided within the Define-XML document. Each codelist referenced by a study item shall be represented in the Define-XML document using a CodeList element.
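The requirement that every codelist referenced by a study item be represented by a CodeList element can be checked automatically. The sketch below assumes Define-XML 2.0, where ItemDef elements point to codelists through `CodeListRef/@CodeListOID` and both elements sit in the ODM 1.3 namespace. The function name, the trimmed sample document, and its OIDs are hypothetical. The sample also omits attributes that a real define.xml requires.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Trimmed, hypothetical define.xml fragment: CL.CLSEV is referenced but missing.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.HYPOTHETICAL"><MetaDataVersion OID="MDV.1" Name="Hypothetical">
    <ItemDef OID="IT.LB.LBSTRESU" Name="LBSTRESU" DataType="text">
      <CodeListRef CodeListOID="CL.UNIT"/>
    </ItemDef>
    <ItemDef OID="IT.CL.CLSEV" Name="CLSEV" DataType="text">
      <CodeListRef CodeListOID="CL.CLSEV"/>
    </ItemDef>
    <CodeList OID="CL.UNIT" Name="Unit" DataType="text">
      <EnumeratedItem CodedValue="U/L"/>
    </CodeList>
  </MetaDataVersion></Study>
</ODM>"""


def unresolved_codelist_refs(define_xml: str) -> Dict[str, List[str]]:
    """Map each ItemDef OID to the CodeListOIDs it references that have no CodeList."""
    root = ET.fromstring(define_xml)
    defined = {cl.get("OID") for cl in root.iterfind(".//odm:CodeList", ODM_NS)}
    problems: Dict[str, List[str]] = {}
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        for ref in item.iterfind("odm:CodeListRef", ODM_NS):
            if ref.get("CodeListOID") not in defined:
                problems.setdefault(item.get("OID"), []).append(ref.get("CodeListOID"))
    return problems


if __name__ == "__main__":
    print(unresolved_codelist_refs(SAMPLE))
    # {'IT.CL.CLSEV': ['CL.CLSEV']}
```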
Questions about Define.xml Codelists
A. General Questions to Address
1. How is the define.xml file currently being used in nonclinical (by FDA, Sponsors, CROs, system vendors)? What are possible uses (and thus valid reasons for providing codelists in define.xml)?
2. What variables, other than those that have associated CDISC codelists in the SENDIG, must/should have user-defined codelists in define.xml? What are the criteria for determining this?
3. What are the criteria for determining what terms are included in a codelist? Are the criteria different when a published codelist is referenced?
- When should all possible terms be included? Consider scoring scales such as those used in FOBs and dermal and ocular observations; the pH scale; semi-quantitative urinalysis results; severities that use CDISC CT, like MISEV; severities that do not use CDISC CT, such as CLSEV; and subsets of the NY codelist.
- When should only the terms used on the study be included?
- If some codelists contain all possible terms (rather than just the terms used on the study) should the rationale be documented somewhere (define.xml comments and/or nSDRG)?
- Is there a case where only user-defined extensions of CDISC terminology are included, and published CDISC terms are excluded?
- Should the codelist included in define.xml align with the menu selections available during data collection (which would be the "allowed" terms)?
- Is a free-text term added during collection represented any differently than a term selected from a menu during collection?
- Can a codelist include user-defined terms that are not used on the associated study but may be used on a different study? When creating the define.xml, it may be more convenient to always include a specific list of terms whether or not they were used on the study.
- When should user-defined codelists (non-CDISC codelists) be included in the define file?
- Are the criteria "shoulds" or "musts"?
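The two options raised in question 3 can be compared with a short sketch: listing every possible term, or only the terms used on the study. The sketch does not decide which option is correct. The function name, scoring scale, and observed values are hypothetical. Any observed value that is not on the scale is kept in the codelist, so the codelist still reflects the actual data, and it is also reported separately for review.

```python
from typing import List, Sequence, Tuple


def codelist_terms(scale: Sequence[str], observed: Sequence[str],
                   include_all: bool) -> Tuple[List[str], List[str]]:
    """Return (codelist terms, observed values that are not on the scale).

    include_all=True  -> every term on the scale, whether used or not
    include_all=False -> only the scale terms that occur in the data
    Off-scale values are appended so that the codelist still covers every
    value in the data, and are also returned separately for review.
    """
    used = set(observed)
    if include_all:
        terms = list(scale)
    else:
        terms = [term for term in scale if term in used]
    off_scale = sorted(used - set(scale))
    return terms + off_scale, off_scale


if __name__ == "__main__":
    scale = ["0", "1", "2", "3", "4"]   # hypothetical 0-4 scoring scale
    observed = ["0", "1", "1", "2"]     # hypothetical study results
    print(codelist_terms(scale, observed, include_all=True))
    # (['0', '1', '2', '3', '4'], [])
    print(codelist_terms(scale, observed, include_all=False))
    # (['0', '1', '2'], [])
```

Keeping the scale order in both modes means a reader sees the terms in their natural order rather than in the order they happened to appear in the data.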
4. How should published CDISC CT codelists associated with multiple variables be referenced in define.xml (UNIT, NY for example)? More specifically, when can a codelist be shared across multiple variables and when should a unique codelist be referenced?
- For example, if body weights, body weight gains and food consumption are all in grams, can BWORRESU, BWSTRESU, BGORRESU, BGSTRESU and FWORRESU all reference a codelist UNIT_G that contains only "g"?
- If the protocol lists of required tissues for gross and microscopic examination are different, should two different codelists be referenced, each with only the required tissues for that type of observation? If so, how are required tissues related to a specific sex and death status represented? How are tissues added to the required tissue list (due to findings present) represented? Since the CDISC SPEC codelist contains tissues for any species and sex, presumably the entire SPEC codelist can never be used on a study?
- When a variable entry contains more than one value (separated by a semicolon), should the codelist include two entries, or one entry with both terms?
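In define.xml, sharing a codelist means that several ItemDef elements carry a CodeListRef with the same CodeListOID. The following Python sketch shows that mechanism. It also shows one hypothetical heuristic for finding candidates: grouping variables whose observed value sets are identical. Question 4 asks whether such sharing is appropriate at all, so treat the grouping as an illustration only. The function names, the `IT.<variable>` OID convention, and the `CL.UNIT_G` OID are assumptions. The ItemDefs also omit attributes that a real define.xml requires.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, FrozenSet, List, Mapping, Set


def group_by_value_set(values_by_variable: Mapping[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group variables whose observed value sets are identical."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for variable, values in values_by_variable.items():
        groups[frozenset(values)].append(variable)
    return {values: sorted(variables) for values, variables in groups.items()}


def itemdefs_sharing_codelist(variables: List[str], codelist_oid: str) -> List[ET.Element]:
    """Build simplified ItemDef elements that all point at one CodeList via CodeListRef."""
    itemdefs = []
    for variable in variables:
        # "IT.<variable>" is a hypothetical OID convention, not a standard one.
        item = ET.Element("ItemDef", {"OID": f"IT.{variable}", "Name": variable,
                                      "DataType": "text"})
        ET.SubElement(item, "CodeListRef", {"CodeListOID": codelist_oid})
        itemdefs.append(item)
    return itemdefs


if __name__ == "__main__":
    observed = {
        "BWORRESU": {"g"}, "BWSTRESU": {"g"},
        "BGORRESU": {"g"}, "BGSTRESU": {"g"},
        "FWORRESU": {"g"},
    }
    for values, variables in group_by_value_set(observed).items():
        print(sorted(values), variables)
    for item in itemdefs_sharing_codelist(["BWORRESU", "BWSTRESU"], "CL.UNIT_G"):
        print(ET.tostring(item, encoding="unicode"))
```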
5. Is there any specific naming convention that should be used for codelists? Are there any restrictions on naming (special characters)?
- For codelists that contain a subset of published controlled terminology?
- For user-defined codelists?
6. When should a Display Value (Decode) be included with each codelist term? Codelist entries that have a decode are included in the define.xml with CodeListItem rather than EnumeratedItem.
- When should CodeListItem be used and when should EnumeratedItem be used?
- What should be in the Decode field?
7. If a term is entered via free text during data collection (i.e., it was not an "allowable" term predefined for selection) is it represented in the codelist any different than terms that were in an allowable list?
8. Should user-defined codes use the CDISC code if one exists in a future CT package, or should they use a nonstandard code? It is understood that such a term would still be an extension of the CT package used to generate the dataset.
9. If abbreviations are included as findings in a report, should the abbreviations be in ORRES and then included in a codelist with a definition (decode), similar to how the report is presented (abbreviations in the body of the report with a separate explanation page or key)? Or should the abbreviations be replaced with their meanings in ORRES? What about abbreviations in modifiers that do not have controlled terminology, or in comments?
10. Is there a list of known issues, or help with troubleshooting Pinnacle21 checker results?
11. What is the appropriate way to indicate that no external codelists were used?
12. What guidance can we give industry on using the information in the define.xml file that is associated with a SEND dataset?
13. Can an original term that has been translated to controlled terminology be included in the define file to aid traceability to the study report? Is that a decode or something else?
Use Cases for Codelists in a Nonclinical Define.xml
1. To see what codes, terms and acronyms mean.
2. To see the full scale associated with qualitative or semi-quantitative results
3. To be able to look at a unique list of user-defined terms for a variable to ensure that they have been properly harmonized (not different just based on case, not more than one with the same meaning, etc.)
4. To identify extensions to published terminology
5. To provide definitions for user-defined terms (in user-defined codelists and extensions to published codelists) in human and computer-readable form rather than in an nSDRG document.
6. To check conformance, comparing the terms in a define.xml codelist with terms in a published codelist to identify errors such as failure to use a required codelist, systematic capitalization errors, or extending a nonextensible list (Elaine)
7. When codelist information is in computer-readable form, it can be integrated with views of the study data. For example, when looking at one result on a scale, it could be possible to pop-up a list of values on the scale and their meanings. Another example is that when an abbreviation is shown, it would be possible to pop-up the definition.
8. A codelist for VISITDY/--NOMDY could be used to show the schedule of data collection for a particular test or collection of tests. It would then be easier to tell if the days in the dataset matched the study plan.
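Use cases 3, 4, and 6 can be illustrated with a sketch that compares the terms in a define.xml codelist with a published codelist. This is a hypothetical helper. The caller passes in the published terms and the extensibility flag; in practice these would come from the CDISC Controlled Terminology package version used. The function name and the severity terms in the example are illustrative and are not a real CDISC codelist.

```python
from collections import Counter
from typing import Dict, List, Sequence


def compare_with_published(define_terms: Sequence[str],
                           published_terms: Sequence[str],
                           extensible: bool) -> Dict[str, List[str]]:
    """Sort each define.xml term into one bucket relative to a published codelist."""
    published = set(published_terms)
    published_upper = {term.upper() for term in published_terms}
    report: Dict[str, List[str]] = {
        "published": [],       # exact match with a published term
        "case_mismatch": [],   # matches a published term only when case is ignored
        "extension": [],       # new term in an extensible codelist
        "not_allowed": [],     # new term in a non-extensible codelist
        "case_duplicate": [],  # define.xml terms that differ only by case
    }
    for term in define_terms:
        if term in published:
            report["published"].append(term)
        elif term.upper() in published_upper:
            report["case_mismatch"].append(term)
        elif extensible:
            report["extension"].append(term)
        else:
            report["not_allowed"].append(term)

    # Harmonization check (use case 3): same term spelled with different case.
    counts = Counter(term.upper() for term in define_terms)
    report["case_duplicate"] = sorted(t for t in define_terms if counts[t.upper()] > 1)
    return report


if __name__ == "__main__":
    published = ["MILD", "MODERATE", "SEVERE"]           # hypothetical published list
    in_define = ["MILD", "Moderate", "MARKED", "Marked"]  # hypothetical define.xml terms
    print(compare_with_published(in_define, published, extensible=False))
```

Only exact and case-insensitive matches are detected. Identifying terms with the same meaning but different wording (the rest of use case 3) would still need human review.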
29 September 2017 Minutes
13 October 2017 Minutes
27 October 2017 Minutes
10 November 2017 Minutes
8 December 2017 Minutes
05 January 2018 Minutes
02 February 2018 Minutes
02 March 2018 Minutes
16 March 2018 Minutes
13 April 2018 Minutes
27 April 2018 Minutes
25 May 2018 Minutes
06 July 2018 Minutes