From PHUSE Wiki

Revision as of 10:54, 24 September 2012

Validation in epidemiological studies

Raimund Storb

ABSTRACT

Providing objective evidence that a program fulfils its requirements while running on huge datasets of mixed quality can become a challenge. For example, the quality of the data may lead to inopportune effort for double programming; unforeseen exceptions in the data may cause this. Drawing a sample beforehand may help to reduce run time while programming and may also help to limit exceptions. To address all issues of relevance we have to define a proper sample size. An alternative approach is to define artificial test data. The correct result is then defined in advance, and double programming (of the analysis datasets) would be obsolete. However, some of the data issues may then not be addressed, regardless of their relevance. Keep in mind that a proper Data Definition Table is the first step to ensure that your analysis does what was intended in your analysis plan.

INTRODUCTION

At present there is a lot of experience in handling the need to validate statistical programming in clinical trials and submissions, abstracts based on clinical trial data, signal detection based on clinical trial data, etc. The well-controlled quality of the data and the explanation of the data obtained from annotated case report forms and study protocols, together with a defined analysis described in protocols and analysis plans, lead to a clear understanding of what has to be programmed and, hopefully, to a Data Definition Table defining all the derivations to be done. There are several possible ways to verify the correctness and validity of programming, and there is also the obligation to do so in clinical trial reporting.

Analysing claims databases is different because:

  • The data is as it is delivered; no queries are possible. The data has to be used as delivered.
  • The data is collected and used in real life, to serve the needs of real life, not the needs of analyses.
  • You may have to work with "links by meaning" instead of links established by the design of the database. For example, you may link an observed claim from a pharmacy with an observed claim from a practitioner.
  • The amount of data you have to deal with at the beginning of your program/analysis depends on the organisation/purpose the data is collected for, not on any kind of estimate of statistical power.
    • It may happen that in the end too few usable observations are left.
    • Depending on the database and table you are working with, the number of observations may run into the millions. This will have an impact on run time and the need for computational power.
    • If a certain inconsistency/error in the way the data is collected is possible, it will be there.
  • The data collection is not done by a few trained sites and a central laboratory, but by thousands of doctors, hospitals, laboratories and pharmacies, all entering the data to the best of their knowledge.

What are the consequences for verification and validation of programming?

  • You may have to distinguish between (primary) data selection, derivation and tabulation of data.
  • Set your focus on the logic you have to implement.
  • Consider which data exceptions are worth your attention.
    • Any single data issue in observations that will most likely not contribute to your results is not worth your attention.
  • Consider the run time when you decide how to verify and validate your programming.
  • Consider reusing proven program code. You may decide to validate frequently used program code and macros.
  • Consider developing and verifying only on a subset of the data. Given a proper sample, the most important data issues should be present.
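The last point above can be sketched in a short SAS step. This is a minimal illustration, not the paper's own code: the table name `claims`, the patient identifier `pat_id`, the seed and the sampling fraction are all hypothetical placeholders.

```sas
/* Sketch: draw a random subset of patients once, then develop and
   verify all further programming against the smaller sample.
   Table name, variable name, seed and fraction are hypothetical. */

proc sort data=claims(keep=pat_id) out=patients nodupkey;
  by pat_id;
run;

data sample_ids;
  set patients;
  if ranuni(12345) < 0.10;   /* keep roughly 10% of the patients */
run;

proc sql;
  create table claims_sample as
  select c.*
  from claims as c
  inner join sample_ids as s
  on c.pat_id = s.pat_id;
quit;
```

Sampling whole patients rather than single claims keeps each selected patient's history complete, so patient-level data issues remain visible in the sample.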

The following chapters will briefly discuss verification and validation, and ways to verify and validate programming in epidemiology.

FIRST DATA SELECTION

When programming on claims data you have, in addition to the usual considerations, to keep the run time on the large tables in mind. Find a simple first step to reduce your data: use a piece of programming which traces once over the entire data and subsets it as much as possible. You will gain the following advantages:

  • A simple, quickly reviewable reduction of the data.
  • An easily traceable, time-critical first step. Even if you want to double-program on the whole set, you have a good chance to match this first program. You may compare this data with the double programming before any further processing.
  • A better chance to obtain reusable program code.
  • A better chance to end up with some standard access types.
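The one-pass reduction described above can be sketched as a single data step that reads the large table once and keeps only the rows and columns the analysis can possibly use. The table name, variable names, date window and code prefix below are hypothetical placeholders.

```sas
/* Sketch of a one-pass first data selection: subset rows and
   columns as early as possible in a single trace over the data.
   All names, dates and codes are hypothetical. */
data claims_reduced;
  set claims(keep=pat_id claim_date atc_code cost);
  where claim_date between '01JAN2010'd and '31DEC2011'd
    and atc_code like 'C07%';   /* restrict to the drug class of interest */
run;
```

Because this step is a plain filter with no derivations, it is fast to review, easy to double-program, and a natural point at which to compare the two programming streams before further processing.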


SUBHEAD (HEADER 2)

This is subtopic for the above. This is the paper body. This is the paper body. This is the paper body. This is the paper body. This is the paper body. This is the paper body. This is the paper body. This is the paper body. This is the paper body.

If you need to include source code:

  data one;
  set two;
  if min(var1, var2) > 0 then do;
    a=25;
  end;
  run;

Continuation of body – after source code.


CONCLUSION (HEADER 1)

The conclusion summarizes your paper and ties together any loose ends. You can use the conclusion to make any final points such as recommendations, predictions, or judgments.

REFERENCES (HEADER 1)

References go at the end of your paper. This section is not required.

ACKNOWLEDGMENTS (HEADER 1)

Acknowledgments go after your references. This section is not required.

RECOMMENDED READING (HEADER 1)

Recommended reading lists go after your acknowledgments. This section is not required.

REFERENCES

www.phuse.eu [1]