Merging Data Frames in R by Row Names

3 minute read

I was trying to read in different features for a cell line, e.g., mutations and expression from separate files and reduce them to a single design matrix.

Major Lessons

Merge Data Frames

I needed to concatenate two data frames which had row names. The regular method is cbind which just put the data frames beside each other ignoring the row names. So, if you have different row orders, you get the wrong answer. The way around it is the merge function which accepts a by= argument.

Merge is a binary operator which accepts only data frames. So if you have used operations like transpose which gave you matrices, you need to read them as data frame into the merge using merge(as.dataframe(t(A)), as.dataframe(t(B))). There are more useful arguments like sort and suffixes:

> A
       one    two  
second "amir" "mom"
first  "eli"  "dad"
> B
       one two three four five
first    1   3     5    7    9
second   2   4     6    8   10
> C <- merge(as.data.frame(B), as.data.frame(B), by="row.names", sort=TRUE, suffixes = c("-string", "-integer"))
> C
  Row.names one-string two-string one-integer two-integer three four five
1     first        eli        dad           1           3     5    7    9
2    second       amir        mom           2           4     6    8   10

As you see, the major problem with merge is that it leaves the merge/joint key as a column in the data.frame. I assume that this is what we want if the merge key is some feature (column) in the data frames, but when you use row names as the merge key, that generates an extra column which you need to remove or put back as the row names:

> row.names(C) <- C[["Row.names"]] 
> C[["Row.names"]] <- NULL 
> C
  one-string two-string one-integer two-integer three four five
1        eli        dad           1           3     5    7    9
2       amir        mom           2           4     6    8   10

Leave a Comment