Note: If a PI does not have multiple individual projects under the same topic, the ‘Common’ folder can be skipped. However, the top-level directory should still be the ‘Study Name’ folder.

\Study Name

This folder name should consist of the PI’s last name followed by the study name, separated by underscores (no spaces). Keep the study name general rather than project specific, because a single topic under a PI can contain multiple projects.

Example: “Hu_PIRADs”
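A minimal R sketch of the naming rule. It assumes the name is typed with spaces that only need to become underscores. `make_folder_name()` is a hypothetical helper, not part of any package. The same rule applies to the project folder names described further down.

```r
# Hypothetical helper: join the PI's last name and a study (or project) name
# with underscores, replacing any run of spaces in the name with one underscore.
make_folder_name <- function(pi_last_name, name) {
  clean_name <- gsub("[[:space:]]+", "_", trimws(name))
  paste(trimws(pi_last_name), clean_name, sep = "_")
}

make_folder_name("Hu", "PIRADs")                       # "Hu_PIRADs"
make_folder_name("Hu", "Predicting CSPC in PIRADs 3")  # "Hu_Predicting_CSPC_in_PIRADs_3"
```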

\Common

This folder should contain all files shared by the individual projects.

\Data

\Source

Original, unchanged datasets shared by the individual projects. Use this folder only when all of the individual projects work from the same parent dataset(s). If the projects are related by theme but do not work from the same dataset(s), store the datasets under the respective project folders. If only some of the projects end up sharing datasets, keep those datasets under the individual project folders as well, for consistency.

\Derived

This folder should include the datasets constructed from the common source data after data manipulation/cleaning. Give derived datasets meaningful file names that include their creation date. If you use version control, the date is not needed, but make sure that every version you may need later is actually committed.
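One way to date derived files consistently is to build the file name in code. The sketch below assumes ISO 8601 dates (`YYYY-MM-DD`), which sort chronologically. `dated_file_name()` and the example file name are illustrative, not a prescribed convention.

```r
# Hypothetical helper: append the creation date (YYYY-MM-DD) to a
# descriptive base name, so that dated files sort chronologically.
dated_file_name <- function(base, ext, date = Sys.Date()) {
  paste0(base, "_", format(as.Date(date), "%Y-%m-%d"), ".", ext)
}

dated_file_name("analytic_cohort", "csv", date = as.Date("2018-10-15"))
# "analytic_cohort_2018-10-15.csv"

# Typical use when saving a cleaned data frame `cleaned` (illustrative):
# write.csv(cleaned, dated_file_name("analytic_cohort", "csv"), row.names = FALSE)
```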

\Documentation

All documentation related to the common data should go here.

Ex.: Codebook, Study instruments

Ideally, a codebook should reference each dataset and list each variable's name, label, and explanation. For a continuous variable, give its range. For a categorical variable, identify all levels and their codes (e.g., 1=‘Male’, 2=‘Female’).

Reference on creating a good codebook:

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/CodebookCookbook/CodebookCookbook.pdf
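Part of a codebook can be drafted from the data themselves. The hypothetical `codebook_skeleton()` sketch below lists, for each variable:

- its class and number of missing values;
- its range (numeric variables) or levels (factors).

Labels and explanations are left to be written by hand. The numbers it attaches to factor levels are R's internal codes, which may differ from the codes used in the source data.

```r
# Sketch: draft one codebook row per variable of a data frame.
# Labels and explanations are left empty for the analyst to complete.
codebook_skeleton <- function(df) {
  rows <- lapply(names(df), function(v) {
    x <- df[[v]]
    if (is.numeric(x) && any(!is.na(x))) {
      rng <- range(x, na.rm = TRUE)
      values <- paste0("range ", rng[1], " to ", rng[2])
    } else if (is.factor(x)) {
      # R's internal level codes (1, 2, ...), not necessarily the source codes
      values <- paste0(seq_along(levels(x)), "='", levels(x), "'", collapse = ", ")
    } else {
      values <- NA_character_
    }
    data.frame(variable = v, class = class(x)[1], n_missing = sum(is.na(x)),
               values = values, label = NA_character_,
               explanation = NA_character_, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

demo <- data.frame(age = c(34, 51, 67),
                   sex = factor(c("Male", "Female", "Male"),
                                levels = c("Male", "Female")))
codebook_skeleton(demo)
```

The result could be saved with `write.csv()` into the Documentation folder as a starting point for the full codebook.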

\Syntax

Any SAS, SPSS, Stata, R, or other scripts that clean the common source data and produce the derived data files. Scripts should be well commented and include the necessary information: who produced them, the date produced, the software used, and the software version. Use version control, such as Git, for syntax files.
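For R scripts, most of this information can be generated rather than typed. The sketch below assumes a hypothetical `script_header()` helper whose output is written to the top of a new script. The author name still has to be supplied, and SAS, SPSS, or Stata scripts would need their own comment syntax.

```r
# Hypothetical helper: build a comment header recording the purpose,
# author, date, and R version for the top of a new script.
script_header <- function(author, purpose, date = Sys.Date()) {
  c(paste0("# Purpose:  ", purpose),
    paste0("# Author:   ", author),
    paste0("# Date:     ", format(as.Date(date), "%Y-%m-%d")),
    paste0("# Software: ", R.version.string))
}

header <- script_header("A. Analyst", "Clean common source data",
                        date = as.Date("2018-10-15"))
writeLines(header)
# To start a new script file with this header (file name illustrative):
# writeLines(header, "clean_source_data.R")
```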

\Log

This folder should contain a detailed work log for the data manipulation/cleaning of the common source data. Entries should be dated and describe each task performed and why. Reference details from correspondence such as emails, phone calls, and meetings here. Ideally, use a universal log template.

** Version control is well suited to keeping a running record of changes to tracked files. However, a manual log in a predefined template is still very useful as a snapshot of the entire research process.
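If the log is kept as a plain-text file, dated entries can be appended from R. The `date | task | reason` layout below is an assumed stand-in for whatever universal template your group adopts. The example entry is invented for illustration.

```r
# Hypothetical helper: append one dated "date | task | reason" entry
# to a plain-text work log, creating the file if it does not exist.
log_entry <- function(log_file, task, reason, date = Sys.Date()) {
  entry <- paste(format(as.Date(date), "%Y-%m-%d"), task, reason, sep = " | ")
  cat(entry, "\n", file = log_file, append = TRUE, sep = "")
  invisible(entry)
}

log_file <- file.path(tempdir(), "work_log.txt")
log_entry(log_file, "Removed duplicate patient records",
          "Duplicates found during cleaning; see meeting notes",
          date = as.Date("2018-10-15"))
readLines(log_file)
```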

\Project Specific Name

This folder name should consist of the PI’s last name followed by the project-specific name, separated by underscores (no spaces). Unlike the study name, this name should be specific to the project. Create a separate ‘Project Specific Name’ folder for each individual project.

Example: “Hu_Predicting_CSPC_in_PIRADs_3”
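Creating the folders from a script keeps every project laid out the same way. In this sketch, `create_project_tree()` is hypothetical, and the nesting in `subdirs` is an assumption because the outline does not show it explicitly. Placing Syntax two levels below the project folder matches the `../../Reports` output path in the R Markdown snippet under Reports. If your layout differs, adjust both together.

```r
# Sketch: create the folder tree for one project under a study folder.
# The entries of `subdirs` are an assumed layout; edit them as needed.
create_project_tree <- function(root, study, project,
                                subdirs = c("Data/Source", "Data/Derived",
                                            "Data/Documentation", "Data/Syntax",
                                            "Data/Log", "Reports", "Output",
                                            "End_Products", "Scratch")) {
  project_dir <- file.path(root, study, project)
  for (d in subdirs) {
    # recursive = TRUE also creates missing parents (study, project, Data)
    dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(project_dir)
}

root <- file.path(tempdir(), "studies")
project_dir <- create_project_tree(root, "Hu_PIRADs", "Hu_Predicting_CSPC_in_PIRADs_3")
list.dirs(project_dir, full.names = FALSE)
```

A Common folder could be created the same way by calling the function with `project = "Common"` and a shorter `subdirs` vector.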

\Data

\Source

This should include the original, unchanged source datasets specific to this project.

\Derived

This folder should include the datasets constructed from the project-specific source data after data manipulation/cleaning. Give derived datasets meaningful file names that include their creation date (or rely on version control instead, making sure that every version you may need later is committed).

\Documentation

Any documentation related to the specific project.

Ex.: Codebook, Study instruments, journal articles

Reference on creating a good codebook:

http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/CodebookCookbook/CodebookCookbook.pdf

(A codebook can be very helpful to share with investigators before data collection!)

\Syntax

Any SAS, SPSS, Stata, R, or other scripts that produce output or reports. Scripts should be well commented and include the necessary information: who produced them, the date produced, the software used, and the software version. Use version control, such as Git, for syntax files.

\Log

This folder should contain a detailed work log for the data manipulation/cleaning of the project-specific source data, along with all details relevant to the project analysis. Entries should be dated and describe each task performed and why. Reference details from correspondence such as emails, phone calls, and meetings here. Ideally, use a universal log template.

\Reports

This should include all formal reports sent to investigators.

Reports should always include the same information as syntax files: who produced them, the date produced, the software used, and the software version. Automate this information where possible.
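In an R-based report, this information can be filled in automatically. The sketch below assumes a hypothetical `report_provenance()` function whose result is placed in the report, for example through inline R code in an R Markdown document. The author is still supplied by hand, and the package list should be whatever the report actually uses.

```r
# Hypothetical helper: one provenance sentence with the author, date,
# R version, and the versions of the named packages.
report_provenance <- function(author, packages = character(0), date = Sys.Date()) {
  out <- sprintf("Produced by %s on %s using %s.",
                 author, format(as.Date(date), "%Y-%m-%d"), R.version.string)
  if (length(packages) > 0) {
    versions <- vapply(packages,
                       function(p) as.character(utils::packageVersion(p)),
                       character(1))
    out <- paste0(out, " Packages: ", paste(packages, versions, collapse = ", "), ".")
  }
  out
}

report_provenance("A. Analyst", packages = "stats", date = as.Date("2018-10-15"))
```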

For R Markdown Users:

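This code can be used in the YAML header to send the rendered output to a different folder (here ../../Reports, relative to the .Rmd file) and to add the date to the report name. Keep the semicolons. YAML folds the lines of this value into a single line, so the R statements need explicit separators.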

knit: (function(inputFile, encoding) {
     out_dir <- "../../Reports";
     date_str <- format(Sys.Date(), "%Y-%m-%d");
     out_file <- paste0("descriptives_", date_str, ".html");
     rmarkdown::render(inputFile,
                       encoding = encoding,
                       output_file = file.path(dirname(inputFile), out_dir, out_file)) })

\Output

This should include any output produced except for the formal reports to investigators. For example, sometimes investigators need a figure in high resolution, or sometimes they want a table formatted in a specific way for the manuscript. These items can go here.
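A high-resolution figure can be written straight to this folder with base R's `png()` device. The 6.5 by 4.5 inch size and 300 dpi below are placeholder values, not a journal requirement. `save_hires_png()` is a hypothetical wrapper, and the plot is a random example.

```r
# Hypothetical wrapper: draw a plot into a PNG file at a given size (inches)
# and resolution (dpi), closing the device even if plotting fails.
save_hires_png <- function(path, plot_fun, width = 6.5, height = 4.5, dpi = 300) {
  png(path, width = width, height = height, units = "in", res = dpi)
  on.exit(dev.off())
  plot_fun()
  invisible(path)
}

fig_path <- save_hires_png(file.path(tempdir(), "figure1.png"),
                           function() hist(rnorm(200), main = "Example histogram"))
file.exists(fig_path)
```

For vector output, `pdf()` accepts the same width and height in inches and needs no resolution setting.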

\End_Products

This folder should include the end products of the analysis: poster presentations, PowerPoint slides, abstracts, manuscripts, responses to reviewers, etc. Where possible, link end products to the relevant syntax files and reports.

\Scratch

This folder can be used for miscellaneous items. For instance, you may want to compare the results of a function in SAS with the results of the corresponding function in R. Such a check is not integral to the analysis, but it should be kept somewhere.

Session Information:

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.5.1     htmltools_0.3.6 yaml_2.2.0      Rcpp_0.12.19   
##  [9] stringi_1.1.7   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.18   evaluate_0.12
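The output above comes from `sessionInfo()`. To keep the same record with a project, for example in its Log folder, it can be saved to a dated text file. `save_session_info()` is a hypothetical helper.

```r
# Hypothetical helper: write the current sessionInfo() printout to a
# dated text file in the given folder and return the file path.
save_session_info <- function(dir, date = Sys.Date()) {
  path <- file.path(dir, paste0("sessionInfo_", format(as.Date(date), "%Y-%m-%d"), ".txt"))
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

info_path <- save_session_info(tempdir(), date = as.Date("2018-10-15"))
readLines(info_path)[1]
```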