R: Finding patterns across multiple columns- possibly duplicated()?

Asked 2023-01-24 21:48

I am trying to isolate entries in a dataframe which share common values: see below to reconstruct a portion of my df:

Stand<-c("MY","MY","MY","MY","MY")
Plot<-c(12,12,12,12,12)
StumpNumber<-c(1,2,3,3,7)
TreeNumber<-c(1,2,3,4,8)
sample<-data.frame(Stand,Plot,StumpNumber,TreeNumber)
sample

I want output that tells me which entries share common values. In other words, I want to quickly isolate cases where there is more than one TreeNumber (more than one row) for a given Stand, Plot, StumpNumber combination. In the example, that would be StumpNumber 3, which has both TreeNumber 3 and TreeNumber 4.

My understanding of duplicated() is that it can find instances where duplicated values occur within a single column. What can I do to find cases where a common combination of columns occurs?

Thanks.

The Description of ?duplicated indicates that it works on rows of data.frames and the fourth paragraph of the Details section says:

 The data frame method works by pasting together a character
 representation of the rows separated by ‘\r’, so may be imperfect
 if the data frame has characters with embedded carriage returns or
 columns which do not reliably map to characters.

How did you come to understand that it only works on single columns?
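As a rough illustration of the idea in that passage (a sketch only, not the actual internals of duplicated(), which may differ between R versions), pasting each row into a single string and checking those strings for duplicates gives the same answer on the example data. The name rowKeys is just one chosen here.

```r
# Rebuild the example data from the question
sample <- data.frame(
  Stand       = c("MY", "MY", "MY", "MY", "MY"),
  Plot        = c(12, 12, 12, 12, 12),
  StumpNumber = c(1, 2, 3, 3, 7),
  TreeNumber  = c(1, 2, 3, 4, 8)
)

# Collapse Stand, Plot and StumpNumber of each row into one string,
# separated by a carriage return as the help page describes
rowKeys <- do.call(paste, c(sample[, 1:3], sep = "\r"))

# On this data, duplicates among the pasted strings match the
# result of calling duplicated() on the three columns directly
identical(duplicated(rowKeys), duplicated(sample[, 1:3]))
```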

Assuming TreeNumber is unique within Stand, Plot, and StumpNumber you just need to exclude it from the call to duplicated.

> duplicated(sample[,1:3])
[1] FALSE FALSE FALSE  TRUE FALSE
> duplicated(sample[,1:3], fromLast=TRUE)
[1] FALSE FALSE  TRUE FALSE FALSE

Update - If you would like all the duplicated rows, you could do something like:

> allDups <- duplicated(sample[,1:3],fromLast=TRUE) | duplicated(sample[,1:3])
> sample[allDups,]
  Stand Plot StumpNumber TreeNumber
3    MY   12           3          3
4    MY   12           3          4
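
An alternative sketch, assuming you would rather count how many rows share each Stand/Plot/StumpNumber combination than combine two duplicated() calls: base R's ave() can attach a group size to every row. The names key and groupSize are illustrative only.

```r
# Rebuild the example data from the question
sample <- data.frame(
  Stand       = c("MY", "MY", "MY", "MY", "MY"),
  Plot        = c(12, 12, 12, 12, 12),
  StumpNumber = c(1, 2, 3, 3, 7),
  TreeNumber  = c(1, 2, 3, 4, 8)
)

# One grouping key per Stand/Plot/StumpNumber combination that occurs
key <- interaction(sample$Stand, sample$Plot, sample$StumpNumber, drop = TRUE)

# For every row, the number of rows that share its key
groupSize <- ave(seq_along(key), key, FUN = length)

# Keep the rows whose combination occurs more than once
sample[groupSize > 1, ]
```

Keeping the counts around can also be handy if you later want to distinguish stumps with two trees from stumps with three or more.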

For convenience, I'm going to assume you have a nesting scheme going on: Trees are nested in Stumps, Stumps in Plots, and Plots in Stands. I also assume the problem you're trying to solve is that some trees are attached to the same stump, which means the problematic entries are those where the Stand/Plot/Stump identifiers are repeated for different TreeNumbers.

What I did was:

Order the data
Wrap a slightly customized function around duplicated()
Use ddply() (in the plyr package) to split and analyze your data
Print out the problematic entries

Ordering the Data

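I ordered first by Stand, then Plot, and finally StumpNumber. Note the trailing comma inside the brackets, which tells R to reorder rows rather than select columns:

    sampleOrdered <- sample[order(sample$Stand, sample$Plot, sample$StumpNumber), ]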

Wrapping my own `duplicated()` function

Assuming the issue is that some trees are attached to the same stump, we can write the following function:

    findTreesAttachedToTheSameStump <- function(data) {
        x <- duplicated(data[ , "StumpNumber"])
        data[x, ]
    }

This function will select out and return (implicitly) whatever entries pass the duplicated() test.
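
Because duplicated() flags only the second and later occurrences, the function above leaves out the first row of each repeated stump. If you want every row that shares a stump, a variant along these lines might work; findAllTreesSharingAStump and plotData are hypothetical names, not part of the original answer.

```r
findAllTreesSharingAStump <- function(data) {
  stump <- data[, "StumpNumber"]
  # Flag a row if its StumpNumber also appears earlier or later in the data
  shared <- duplicated(stump) | duplicated(stump, fromLast = TRUE)
  data[shared, ]
}

# Example using the stump and tree numbers from the question's single plot
plotData <- data.frame(
  StumpNumber = c(1, 2, 3, 3, 7),
  TreeNumber  = c(1, 2, 3, 4, 8)
)
result <- findAllTreesSharingAStump(plotData)
result
```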

Using ddply

I did a bit of split-apply-combine here. I instruct ddply to break the dataset by Stand and Plot (since your data is nested, and StumpNumber might only be unique within a plot). Then, I apply the function we created above:

    library(plyr)  # provides ddply() and .()
    sampleDuplicated <- ddply(sampleOrdered, .(Stand, Plot), findTreesAttachedToTheSameStump)

Print out the problematic stumps

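Now all we need to do is print sampleDuplicated, which contains the repeated rows for each Stand/Plot/Stump combination. Since duplicated() marks the second and later occurrences, the first row of each repeated group is not included.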
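
If plyr is not available, the same split-apply-combine idea can be sketched in base R with split(), lapply() and rbind(). This is a stand-in for the ddply() call under that assumption, not the original answer's code; the data and the function are repeated so the block runs on its own.

```r
# Rebuild the example data from the question
sample <- data.frame(
  Stand       = c("MY", "MY", "MY", "MY", "MY"),
  Plot        = c(12, 12, 12, 12, 12),
  StumpNumber = c(1, 2, 3, 3, 7),
  TreeNumber  = c(1, 2, 3, 4, 8)
)

# Same function as in the answer: keep rows whose StumpNumber repeats
findTreesAttachedToTheSameStump <- function(data) {
  data[duplicated(data[, "StumpNumber"]), ]
}

# Split by Stand and Plot, dropping combinations that do not occur
pieces <- split(sample, list(sample$Stand, sample$Plot), drop = TRUE)

# Apply the function to each piece and stack the results into one data frame
sampleDuplicated <- do.call(rbind, lapply(pieces, findTreesAttachedToTheSameStump))
sampleDuplicated
```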
