Using hash to determine whether 2 dataframes are identical (PART 02)

2023-04-03 00:59

Following up on the question I asked yesterday, I have some further questions.

Since I realized that the difference between the 2 dataframes is caused by the ordering of the rows, I added the following:

ddd.old <- ddd.old[order(ddd.old[,"adm_route"]),]
ddd.old <- ddd.old[order(ddd.old[,"ddd"]),]
ddd.old <- ddd.old[order(ddd.old[,"atc_code"]),]
ddd.old <- data.frame(ddd.old,stringsAsFactors=FALSE)

ddd.new <- ddd.new[order(ddd.new[,"adm_route"]),]
ddd.new <- ddd.new[order(ddd.new[,"ddd"]),]
ddd.new <- ddd.new[order(ddd.new[,"atc_code"]),]
ddd.new <- data.frame(ddd.new,stringsAsFactors=FALSE)

Which gives me something like this:

> digest(ddd.old)
[1] "e76d3d519f3a8c066597654ae312d68d"
> digest(ddd.new)
[1] "813a68bde6840e9798db771272584e7c"
> all.equal(ddd.old, ddd.new,check.attributes=TRUE)
[1] "Attributes: < Component 2: Mean relative difference: 0.006306306 >"

Two questions:

Why does digest still fail?
What does the output of all.equal mean?

all.equal tells you that the attributes differ. My guess is that it's the row names.

Compare attributes(ddd.old)[[2]] with attributes(ddd.new)[[2]]. Sorting by indexing does not change row names: each row keeps the name it had before, so the two dataframes end up with their row names in a different order.
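
To see why the row names end up out of step, here is a small sketch with made-up data (the data frames `a` and `b` and their values are invented for illustration, they are not the question's files). Both hold the same rows, only in a different order:

```r
# Toy data, invented for illustration: same rows, different row order.
a <- data.frame(atc_code = c("A01", "B02", "C03"),
                ddd      = c(1.5, 2, 0.5),
                stringsAsFactors = FALSE)
b <- data.frame(atc_code = c("C03", "A01", "B02"),
                ddd      = c(0.5, 1.5, 2),
                stringsAsFactors = FALSE)

# Sort both by indexing with order(), as in the question.
a_sorted <- a[order(a$atc_code), ]
b_sorted <- b[order(b$atc_code), ]

# Each row carries its old row name along with it.
rownames(a_sorted)   # "1" "2" "3"
rownames(b_sorted)   # "2" "3" "1"

# The columns agree, but the objects as a whole do not.
all.equal(a_sorted, b_sorted, check.attributes = FALSE)   # TRUE
identical(a_sorted, b_sorted)                             # FALSE
```

Anything that looks at the whole object, such as a hash, will pick up that leftover difference.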

You can reset them with:

rownames(ddd.old) <- NULL
rownames(ddd.new) <- NULL

Or do it one step earlier by passing row.names=NULL to data.frame:

ddd.old <- data.frame(ddd.old, stringsAsFactors=FALSE, row.names=NULL)

After that, the hashes should be equal too.
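
A rough way to check this on toy data, assuming the digest package is installed (the data frames `old` and `new` below are invented for illustration and stand in for ddd.old and ddd.new):

```r
library(digest)  # assumption: the digest package is installed

# Toy stand-ins for ddd.old / ddd.new: same rows, different row order.
old <- data.frame(atc_code = c("A01", "B02", "C03"),
                  ddd      = c(1.5, 2, 0.5),
                  stringsAsFactors = FALSE)
new <- data.frame(atc_code = c("C03", "A01", "B02"),
                  ddd      = c(0.5, 1.5, 2),
                  stringsAsFactors = FALSE)

old <- old[order(old$atc_code), ]
new <- new[order(new$atc_code), ]

# Hashes compared while the old row names are still attached.
hash_match_before <- digest(old) == digest(new)

# Reset the row names to the default 1..n sequence.
rownames(old) <- NULL
rownames(new) <- NULL

# Hashes compared again after the reset; the two objects are now identical.
hash_match_after <- digest(old) == digest(new)
identical(old, new)   # TRUE

print(c(before = hash_match_before, after = hash_match_after))
```

Printing the two flags side by side shows whether the row names were the only thing keeping the hashes apart.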

Alternatively, use arrange from the plyr package, which drops the row names:

library(plyr)  # provides arrange()

ddd.new <- read.table("ddd.table.new.txt",header=TRUE,stringsAsFactors=FALSE)
ddd.old <- read.table("ddd.table.old.txt",header=TRUE,stringsAsFactors=FALSE)

ddd.new <- arrange(ddd.new, atc_code, ddd, adm_route)
ddd.old <- arrange(ddd.old, atc_code, ddd, adm_route)  # sort ddd.old itself, not ddd.new again
all.equal(ddd.new, ddd.old)
# TRUE when both tables hold the same rows
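
If you would rather not depend on plyr, the same idea can be sketched in base R: sort on all key columns in a single order() call and reset the row names right away. The helper name `sort_rows` and the toy data are made up for this example:

```r
# Hypothetical helper (not from any package): sort a data frame by several
# key columns in one order() call, then reset the row names.
sort_rows <- function(df, keys) {
  # unname() so column names are not mistaken for named arguments of order()
  ord <- do.call(order, unname(as.list(df[keys])))
  out <- df[ord, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data, invented for illustration: same rows, different row order.
x <- data.frame(atc_code  = c("B02", "A01", "A01"),
                ddd       = c(1, 2, 1),
                adm_route = c("O", "P", "O"),
                stringsAsFactors = FALSE)
y <- x[c(2, 3, 1), ]

keys <- c("atc_code", "ddd", "adm_route")
identical(sort_rows(x, keys), sort_rows(y, keys))   # TRUE
```

The first key has the highest priority, so this gives the same ordering as sorting by adm_route, then ddd, then atc_code in three separate steps.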
