check_recode {finalfit} | R Documentation |
This was written a few days after the retraction of a paper in JAMA due to an error in recoding the treatment variable (https://jamanetwork.com/journals/jama/fullarticle/2752474). This takes a data frame or tibble, fuzzy matches variable names, and produces crosstables of all matched variables. A visual inspection should reveal any miscoding.
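The idea described above can be illustrated with a minimal base-R sketch. It is not finalfit's implementation: the toy data frame `df`, its column names, and the use of `agrep()` for the fuzzy matching step are all assumptions made for illustration.

```r
# Toy data frame (hypothetical): sex2 is a deliberately swapped recode of sex.
df <- data.frame(
  sex  = c("Male", "Female", "Male", "Female"),
  sex2 = c("F", "M", "F", "M"),
  age  = c(34, 61, 47, 52),
  stringsAsFactors = TRUE
)

vars <- names(df)

# For each variable name, collect the other names that approximately match it.
# base::agrep() is an assumed stand-in for the fuzzy matching step.
matches <- lapply(vars, function(v) setdiff(agrep(v, vars, value = TRUE), v))
names(matches) <- vars

# Cross-tabulate the first matched pair so the recoding can be inspected by eye.
tab <- table(original = df$sex, recoded = df[[matches$sex[1]]])
print(tab)
```

In this toy table every "Male" row falls under "F" and every "Female" row under "M", which is the kind of swap a visual check of the crosstable is meant to expose.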
check_recode(
  .data,
  dependent = NULL,
  explanatory = NULL,
  include_numerics = TRUE,
  ...
)
.data |
Data frame or tibble. |
dependent |
Optional character vector: name(s) of dependent variable(s). |
explanatory |
Optional character vector: name(s) of explanatory variable(s). |
include_numerics |
Logical. Whether to include numeric variables in the check. |
... |
Pass other arguments to |
List of length two. The first element is an index of variable combinations. The second is a nested list of crosstables as tibbles.
library(finalfit)
library(dplyr)
data(colon_s)

# Build a small dataset with two recoded variables
colon_s_small = colon_s %>%
  select(-id, -rx, -rx.factor) %>%
  mutate(
    age.factor2 = forcats::fct_collapse(age.factor,
      "<60 years" = c("<40 years", "40-59 years")),
    sex.factor2 = forcats::fct_recode(sex.factor,
      # Intentional miscode
      "F" = "Male",
      "M" = "Female")
  )

# Check
colon_s_small %>%
  check_recode(include_numerics = FALSE)

out = colon_s_small %>%
  select(-extent, -extent.factor, -time, -time.years) %>%
  check_recode()
out

# Select a tibble and expand
out$counts[[9]]
# Note this variable (node4) appears miscoded in original dataset survival::colon.

# Choose to only include variables that you actually use.
# This uses standard Finalfit grammar.
dependent = "mort_5yr"
explanatory = c("age.factor2", "sex.factor2")
colon_s_small %>%
  check_recode(dependent, explanatory)