merge_rows_omv {jmvReadWrite} | R Documentation |
Merges two or more .omv-files for the statistical spreadsheet 'jamovi' (https://www.jamovi.org) by adding the content of the second and any further file(s) as rows to the first file
merge_rows_omv( fleInp = c(), fleOut = "", typMrg = c("all", "common"), colInd = FALSE, rstRwN = TRUE, rmvDpl = FALSE, varSrt = c(), usePkg = c("foreign", "haven"), selSet = "", ... )
fleInp |
Vector with file names (including the path, if required) of the data files to be read (c("FILE1.omv", "FILE2.omv"); default: c()); can be any supported file type, see Details below |
fleOut |
Name of the data file to be written (including the path, if required; "FILE_OUT.omv"; default: ""); if empty, the data frame with the added rows is returned as a variable (but not written) |
typMrg |
Type of merging operation: "all" (default) or "common"; see also Details |
colInd |
Add a column with an indicator (the basename of the file minus the extension) marking from which input data set the respective rows are coming (default: FALSE) |
rstRwN |
Reset row names (i.e., do not keep the row names of the original input data sets but number the rows consecutively, from 1 to the total number of rows across all input data sets; default: TRUE) |
rmvDpl |
Remove duplicated rows (i.e., rows with the same content as a previous row in all columns; default: FALSE) |
varSrt |
Variable(s) that are used to sort the data frame (see Details; if empty, the order after merging is kept; default: c()) |
usePkg |
Name of the package: "foreign" or "haven" that shall be used to read SPSS, Stata and SAS files; "foreign" is the default (it comes with base R), but "haven" is newer and more comprehensive |
selSet |
Name of the data set that is to be selected from the workspace (only applies when reading .RData-files) |
... |
Additional arguments passed on to methods; see Details below |
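The row-related options above (rmvDpl, varSrt, rstRwN) can be pictured with plain base R. The following is only a sketch of analogous operations on hypothetical toy data; it is not the package's code, and the order in which merge_rows_omv applies these steps internally is an assumption.

```r
# Hypothetical stand-ins for two input files that have already been read
lstDta <- list(FILE1 = data.frame(ID = c(2, 1), A1 = c(5, 4)),
               FILE2 = data.frame(ID = c(1, 3), A1 = c(4, 6)))

# Plain concatenation; the row names now carry the list names (FILE1.1, ...)
dtaMrg <- do.call(rbind, lstDta)

# Analogue of rmvDpl = TRUE: drop rows that repeat an earlier row in all columns
dtaMrg <- dtaMrg[!duplicated(dtaMrg), ]

# Analogue of varSrt = "ID": sort the rows by the given variable
dtaMrg <- dtaMrg[order(dtaMrg$ID), ]

# Analogue of rstRwN = TRUE: number the rows consecutively again
rownames(dtaMrg) <- NULL

print(dtaMrg)
```

Resetting the row names last keeps the numbering consecutive even after duplicates were dropped and the rows were reordered.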
The different types of merging operations: "all" keeps all existing variables / columns that are contained in any of the input data sets and fills them up with NA where the variable / column doesn't
exist in an input data set. "common" only keeps the variables / columns that are common to all input data sets (i.e., that are contained in all data sets).
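The difference between the two merge types can be illustrated with base R alone. This is a minimal sketch on hypothetical toy data that mimics the described behaviour with rbind; it is not taken from the package's implementation.

```r
# Two toy data sets that share only some of their columns (hypothetical data)
dtaA <- data.frame(ID = 1:2, A1 = c(3, 4), A2 = c(5, 6))
dtaB <- data.frame(ID = 3:4, A2 = c(1, 2), A3 = c(7, 8))
lstDta <- list(dtaA, dtaB)

# "common": keep only the columns that occur in every data set
varCmn <- Reduce(intersect, lapply(lstDta, names))
mrgCmn <- do.call(rbind, lapply(lstDta, function(d) d[, varCmn, drop = FALSE]))

# "all": keep every column and fill those missing from a data set with NA
varAll <- Reduce(union, lapply(lstDta, names))
mrgAll <- do.call(rbind, lapply(lstDta, function(d) {
  for (v in setdiff(varAll, names(d))) d[[v]] <- NA
  d[, varAll, drop = FALSE]
}))

print(mrgCmn)  # columns ID and A2 only
print(mrgAll)  # columns ID, A1, A2, A3 with NA where a column was absent
```

In both cases the number of rows is the sum of the rows of the inputs; only the set of columns differs.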
The ellipsis-parameter can be used to submit arguments / parameters to the functions that are used for merging or reading the data. The merging operation uses rbind. When reading the data, the functions are: read_omv (for jamovi-files), read.table (for CSV / TSV files; using similar defaults as read.csv for CSV and read.delim for TSV which both are based upon read.table but with adjusted defaults for the respective file types), readRDS (for rds-files), read_sav (needs R-package "haven") or read.spss (needs R-package "foreign") for SPSS-files, read_dta ("haven") / read.dta ("foreign") for Stata-files, read_sas ("haven") for SAS-data-files, and read_xpt ("haven") / read.xport ("foreign") for SAS-transport-files.
If you would like to use "haven", you may need to install it manually (i.e., install.packages("haven", dep = TRUE)).
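To make the remark about read.table and read.csv concrete, the sketch below reads a small temporary CSV file (hypothetical data) both ways using base R only. The arguments set explicitly in the read.table call are those whose defaults read.csv changes; which arguments merge_rows_omv actually forwards through the ellipsis is not shown here and would need to be checked in the package.

```r
# Write a small CSV file to a temporary location (hypothetical data)
nmeCSV <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = 1:3, txt = c("a", "b # c", "d")), nmeCSV,
          row.names = FALSE)

# read.csv is read.table with adjusted defaults for comma-separated files
dtaCSV <- read.csv(nmeCSV)
dtaTbl <- read.table(nmeCSV, header = TRUE, sep = ",", quote = "\"",
                     dec = ".", fill = TRUE, comment.char = "")

# Both calls should yield the same data frame
print(identical(dtaCSV, dtaTbl))
unlink(nmeCSV)
```

Arguments such as sep or dec are therefore typical candidates for adjusting when a CSV file uses a different separator or decimal mark.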
a data frame (only returned if fleOut is empty) in which the rows of all input data sets (i.e., the files given in the fleInp-argument) are concatenated
## Not run:
library(jmvReadWrite)
dtaInp <- bfi_sample2
nmeInp <- paste0(tempfile(), "_", 1:3, ".rds")
nmeOut <- paste0(tempfile(), ".omv")
for (i in seq_along(nmeInp)) saveRDS(dtaInp[-i - 1], nmeInp[i])
# save dtaInp three times (i.e., the length of nmeInp), removing one data column in
# each data set (for demonstration purposes, A1 in the first, A2 in the second, ...)
merge_rows_omv(fleInp = nmeInp, fleOut = nmeOut, colInd = TRUE)
cat(file.info(nmeOut)$size)
# -> 10767 (size may differ on different OSes)
dtaOut <- read_omv(nmeOut, sveAtt = FALSE)
# read the data set where the three original data sets were added as rows and show
# the variable names
cat(names(dtaInp))
cat(names(dtaOut))
# compared to the input data set, we have the same variable names; fleInd (switched
# on by colInd = TRUE and showing which data set the rows come from) is new and A1
# is moved to the end of the list (the "original" order of variables may not always
# be preserved and columns missing from at least one of the input data sets may be
# added at the end)
cat(dim(dtaInp), dim(dtaOut))
# the first dimension of the data sets (rows) is now three times that of the input
# data set (250 -> 750), the second dimension (columns / variables) is increased by 1
# (for "fleInd")

merge_rows_omv(fleInp = nmeInp, fleOut = nmeOut, typMrg = "common")
# the argument typMrg = "common" removes the columns that are not present in all of
# the input data sets (i.e., A1, A2, A3)
dtaOut <- read_omv(nmeOut, sveAtt = FALSE)
# read the data set where the three original data sets were added as rows and show
# the variable names
cat(names(dtaInp))
cat(names(dtaOut))
# compared to the input data set, the variables that were missing in at least one
# data set (i.e., "A1", "A2" and "A3") are removed
cat(dim(dtaInp), dim(dtaOut))
# the first dimension of the data sets (rows) is now three times that of the
# input data set (250 -> 750), the second dimension (columns / variables) is
# reduced by 3 (i.e., "A1", "A2", "A3")

unlink(nmeInp)
unlink(nmeOut)

## End(Not run)