InformalGuidanceOnGoogleCode

From PHUSE Wiki

Some informal guidance regarding the Google Code environment for Standard Script Development

  • Like Ockham or Einstein or somebody said: keep your code as simple as possible, but no simpler. Try not to dictate too much of a structure at first - let it evolve with time and developer input.
  • Keep discussions about scripts within the project site - try not to use e-mail to log questions/answers. Instead, try to use the "Issues" page, which is linked right from the front page of "phuse-scripts."
  • Remember that those discussions, as well as any code you might contribute, could be seen by virtually anyone - including colleagues, current bosses, future bosses, etc. Be careful out there!
  • Because many users will be using a browser to read code and content, try not to make the files too long. Instead, break them up and use "include"-type programming methods.
  • There don't seem to be ANY Google Code projects which use SAS as a programming language. A good example of a project which uses R is:
               http://code.google.com/p/chainladder/

Be sure to check out how this project has organized itself: its code trunk, downloads, Wiki, etc.
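The advice above about breaking long files up with "include"-type methods can be sketched in R with source(). This is only an illustration: the file name km_helpers.R, the function count_censored, and the coding of 1 as "censored" are all made up for the example, and the helper file is written to a temporary folder only so the sketch runs on its own. SAS offers the %include statement for the same purpose.

```r
# Hypothetical helper file; in a real project it would be a separate,
# short file stored in the project next to the main script.
helper_file <- file.path(tempdir(), "km_helpers.R")

# Written here only so that this sketch is self-contained.
writeLines(c(
  "# Assumption for illustration: censor == 1 marks a censored record.",
  "count_censored <- function(censor) {",
  "  sum(censor == 1)",
  "}"
), helper_file)

# The main script stays short: it pulls in the helper code with source().
source(helper_file)

n_censored <- count_censored(c(0, 1, 0, 0, 1))
print(n_censored)
```

Each file then stays short enough to read comfortably in a browser, and the main script shows only the steps of interest.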

  • Here's a suggested technique for using the browser interface to work with code. This is offered because the author can't install SVN clients on his work PC, and his home PC doesn't have SAS.

a. When deciding to work on some script, first "select all" to get the content, then copy/paste it into the scripting environment (usually SAS or R). Edit as needed, then repeat the process to get the updated code back into Google Code (lather, rinse, repeat). Note that the "diff" command still works, so you'll still be able to see differences among versions.

b. This approach doesn't handle the "checkout" function. I don't think that will derail use of Google Code, since it seems unlikely that many users will be scripting simultaneously.


  • Try to keep your data in a file separate from the main code file. This way, once the source/test data is set and won't change, the work can be concentrated on the code of interest.
  • Along with the point above, use simple approaches to reading in data - approaches that can be modified to users' computing environments. Some simple suggestions are pasted below. Fancy macros in the script to do input/output may not be worthwhile if hard to understand and adapt to user-level environments.

SAS:

* Read the comma-separated data file, skipping the header row ;
data kmdata ;
  infile "/bdm/myfolder/mcarniel/kmdata.csv" delimiter=',' missover dsd lrecl=32767 firstobs=2 ;
  length arm $8 censor day 8 ;
  input arm $ censor day ;
run ;


R:

kmdata <- read.csv("m:/mcarniel/kmdata.csv")
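As a minimal sketch of keeping the data separate from the code, the R version can hold the file location in one variable, so that a user only has to change a single line to suit their environment. The rows below are made up purely for illustration (the column names arm, censor, and day follow the SAS example), and the file is written to a temporary folder only so the sketch runs on its own.

```r
# Assumption: the data folder is the one line a user adapts to their setup.
data_dir  <- tempdir()
data_file <- file.path(data_dir, "kmdata.csv")

# Made-up example rows, written out only so that this sketch is
# self-contained; a real project keeps kmdata.csv as a separate file.
writeLines(c(
  "arm,censor,day",
  "A,0,12",
  "A,1,30",
  "B,0,7"
), data_file)

# Plain read.csv call: the first row is taken as the header by default.
kmdata <- read.csv(data_file, stringsAsFactors = FALSE)

str(kmdata)
```

Once the test data file is settled, only the code below the read.csv call needs further attention.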

  • At some point, we'll need to reach a consensus on how to write the "output/print/report" parts of the scripts. Until then, here are some suggestions to start with:

SAS:

ods pdf file="/bdm/myfolder/mcarniel/kmoutput.pdf" ;

proc gplot data=kmdata ;
  title1 "KM Data Bad Plot" ;
  plot censor*day=arm ;
run ;
quit ;

ods pdf close ;

R:

pdf("c:/users/mcarniel/desktop/kmoutput.pdf")  # open the PDF device first
hist(kmdata$day)                               # draw the plot into the PDF
dev.off()                                      # close the device to write the file

  • We could come up with our own style guidelines, but why reinvent the wheel? Let's steal/borrow these in the public domain:
               http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
               http://www.sascommunity.org/wiki/Style_guide_for_writing_and_polishing_programs
  • When depositing a non-text file into a project site (for example, fancy reports and graphs, user manuals), please consider these rules:

1. Create a wiki page when possible

2. Upload a PDF file when possible

3. Use a Word file when necessary (.docx is discouraged as Google Code handles those files as .zip files!)