LOCF

From PHUSE Wiki

Introduction

LOCF, or Last Observation Carried Forward, is a data imputation technique that, not surprisingly, carries the last observed value forward to visits where a value is missing. LOCF is often used for longitudinal data collected over several months or years. Despite every effort to collect complete data for each patient at every time point, such data often still contain missing values. Missing values occur, for example, when a patient is lost to follow-up or does not show up for a single visit.

LOCF is also the imputation technique mentioned in ICH E9 "Statistical Principles for Clinical Trials", which states: "[...] Imputation techniques, ranging from the carrying forward of the last observation to the use of complex mathematical models, may also be used in an attempt to compensate for missing data [...]".

There are several ways to program LOCF. Some of them are explained below.

Example

The following example shows how a RETAIN statement can be used to carry the last observed value forward. This method works well when the dataset already contains a record for every visit and only some of the results are missing.

DATA lab;
  INPUT usubjid $ lbtestcd $ avisitn aval;
  DATALINES;
101-1001 WBC 0 10
101-1001 WBC 1 20
101-1001 WBC 2 30
101-1001 WBC 3 . 
101-1001 WBC 4 . 
101-1001 RBC 0 100
101-1001 RBC 1 200
101-1001 RBC 2 300
101-1001 RBC 3 . 
101-1001 RBC 4 400
;
RUN;

PROC SORT DATA=lab;
  BY usubjid lbtestcd avisitn;
RUN;

************************************************;
**** When using RETAIN, be absolutely sure  ****;
****  not to carry values across patients   ****;
****  or test codes.                        ****;
****                                        ****;
**** In this example keep aval and avallocf ****;
**** separate to show what is going on.     ****;
**** Generally, aval would be overwritten   ****;
**** with the LOCFed values.                ****;
************************************************;
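DATA locf;
  LENGTH dtype $15;
  RETAIN retaval;
  SET lab;
  BY usubjid lbtestcd avisitn;

  /* Reset the retained value at the start of each patient/test code */
  IF FIRST.lbtestcd THEN retaval=.;

  /* Remember the most recent non-missing result */
  IF NOT MISSING(aval) THEN retaval=aval;

  /* Impute only when there is an earlier value to carry forward; */
  /* otherwise the record stays missing and is not flagged as LOCF */
  IF MISSING(aval) AND NOT MISSING(retaval) THEN DO;
    avallocf=retaval;
    dtype='LOCF';
  END;
  ELSE avallocf=aval;

  LABEL dtype = 'Derivation Type';
RUN;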
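Another way to program LOCF, sketched here as an assumption and not part of the original example, is the commonly used UPDATE-statement idiom: the same dataset is used as an empty master (`OBS=0`) and as the transaction dataset, with BY `usubjid lbtestcd`. By default, UPDATE does not let a missing transaction value overwrite an existing one, so non-missing values accumulate within each BY group, and the explicit OUTPUT writes one record per visit instead of one per BY group. The sketch assumes the LAB dataset from the example above, already sorted by `usubjid lbtestcd avisitn`; the names `lab2`, `locf_upd` and the helper flag `obsflag` are hypothetical.

```sas
/* Hypothetical helper: obsflag = 1 if aval was observed, 0 otherwise.  */
/* It is never missing, so UPDATE overwrites it on every record.        */
DATA lab2;
  SET lab;
  obsflag = (NOT MISSING(aval));
RUN;

DATA locf_upd;
  LENGTH dtype $15;
  /* Empty master + full transaction: values accumulate per BY group.   */
  /* A missing aval in a transaction does not replace the prior value.  */
  UPDATE lab2(OBS=0) lab2;
  BY usubjid lbtestcd;

  /* Flag records whose aval was filled in from an earlier visit */
  dtype = ' ';
  IF obsflag = 0 AND NOT MISSING(aval) THEN dtype = 'LOCF';

  OUTPUT;  /* one record per visit, not only one per BY group */
  DROP obsflag;
RUN;
```

Unlike the RETAIN step, this overwrites `aval` in place and carries forward every variable that is missing on a record, not only `aval`, so it suits datasets where that behavior is wanted.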

If the dataset is not complete, i.e. it does not contain a record for every planned visit of each patient, it has to be made complete first, because the RETAIN step can only fill in values on records that exist. One could, for example, create a dummy dataset containing all visits for all patients and then merge it with the incomplete dataset.
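A minimal sketch of the dummy-dataset approach follows. It assumes the planned visits are 0 to 4 for every patient and test code; in practice the visit list would come from the study's visit schedule. The dataset names `lab_incomplete`, `keys`, `dummy` and `lab_complete` are hypothetical.

```sas
/* Hypothetical incomplete data: WBC has no record at all for visit 3 */
DATA lab_incomplete;
  INPUT usubjid $ lbtestcd $ avisitn aval;
  DATALINES;
101-1001 WBC 0 10
101-1001 WBC 1 20
101-1001 WBC 2 30
101-1001 WBC 4 .
;
RUN;

/* One record per patient and test code */
PROC SORT DATA=lab_incomplete(KEEP=usubjid lbtestcd) OUT=keys NODUPKEY;
  BY usubjid lbtestcd;
RUN;

/* Dummy dataset: every planned visit (assumed 0-4) for every key */
DATA dummy;
  SET keys;
  DO avisitn = 0 TO 4;
    OUTPUT;
  END;
RUN;

PROC SORT DATA=lab_incomplete;
  BY usubjid lbtestcd avisitn;
RUN;

/* Visits absent from LAB_INCOMPLETE now appear with aval missing */
DATA lab_complete;
  MERGE dummy lab_incomplete;
  BY usubjid lbtestcd avisitn;
RUN;
```

LAB_COMPLETE has the complete visit structure, so it can be passed to the RETAIN step shown above in place of LAB.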