Compare two user defined curves and score their similarity

2023-03-29 15:25 Q&A author:

I have a set of 2 curves (each with a few hundred to a couple of thousand data points) that I want to compare to get some similarity "score". Actually, I have more than 100 of those sets to compare... I am familiar with R (or at least Bioconductor) and would like to use it.

I tried the ccf() function but I'm not too happy about it.

For example, if I compare c1 to the following curves:

c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)

c1b <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5) # perfect match! ideally score of 1

c1c <- c(1, 0.2, 0.1, 0.1, 0.5, 0.9, 0.5) # total opposite, ideally score of -1? (what would 0 be though?)

c2 <- c(0, 0.9, 0.9, 0.9, 0, 0.3, 0.3, 0.9) #pretty good, score of ???

Note that the vectors don't have the same length, so they need to be normalized somehow... Any ideas? If you look at those 2 lines, they are fairly similar, and I think that as a first step, measuring the area under each of the 2 curves and subtracting would do. I looked at the post "Shaded area under 2 curves in R", but that is not quite what I need.

A second issue (optional) is that for lines that have the same profile but different amplitude, I would like to score those as very similar even though the area between them would be large:

c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)

c4 <- c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3) # very good, score of ??

I hope that a biologist trying to formulate a problem for programmers is OK...

I'd be happy to provide some real life examples if needed.

Thanks in advance!

They don't form curves in the usual sense of paired x, y values unless they are of equal length. The first three are of equal length, and after packing them into a matrix, the rcorr function in the Hmisc package returns:

> rcorr(as.matrix(dfrm))[[1]]
    c1 c1b c1c
c1   1   1  -1
c1b  1   1  -1
c1c -1  -1   1   # as desired if you scaled them to 0-1
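
The answer does not show how dfrm was built. A minimal sketch, assuming it is simply a data frame of the three equal-length vectors from the question, and using base R's cor() in place of rcorr() so that no extra package is needed (both compute Pearson correlations):

```r
# Vectors copied from the question
c1  <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c1b <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c1c <- c(1, 0.2, 0.1, 0.1, 0.5, 0.9, 0.5)

# Assumed construction of dfrm: one column per curve
dfrm <- data.frame(c1, c1b, c1c)

# Pairwise Pearson correlation matrix (base R stand-in for rcorr(...)[[1]])
r <- cor(as.matrix(dfrm))
print(round(r, 3))
```

Because c1c is exactly 1 - c1, its correlation with c1 is -1 without any further scaling.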

The correlation of the c1 and c4 vectors:

> cor( c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5),
  c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3) )
[1] 0.9874975

I do not have a very good answer, but I have faced a similar question in the past, probably on more than one occasion. My approach is to ask myself what makes my curves similar when I evaluate them subjectively (the scientific term here is "eye-balling" :). Is it the area under the curve? Do I count linear translation, rotation, or scaling (zoom) of my curves as contributing to dissimilarity? If not, I remove all the factors that I do not care about with a suitable normalization (e.g. scaling the curves to cover the same ranges in x and y).
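
One way to sketch this normalization in R, assuming each vector is sampled evenly along x and that linear interpolation is acceptable. The helper names resample and rescale01 and the choice of 100 grid points are made up for illustration:

```r
# Hypothetical helper: put a curve on n evenly spaced x positions in [0, 1]
resample <- function(y, n = 100) {
  x <- seq(0, 1, length.out = length(y))
  approx(x, y, xout = seq(0, 1, length.out = n), rule = 2)$y
}

# Hypothetical helper: rescale y values to the range 0-1
rescale01 <- function(y) {
  rng <- range(y)
  if (rng[1] == rng[2]) return(rep(0, length(y)))  # flat curve: nothing to scale
  (y - rng[1]) / (rng[2] - rng[1])
}

c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c2 <- c(0, 0.9, 0.9, 0.9, 0, 0.3, 0.3, 0.9)  # different length from c1

a <- rescale01(resample(c1))
b <- rescale01(resample(c2))

shape_score <- cor(a, b)         # 1 = same shape, -1 = mirrored shape
area_gap    <- mean(abs(a - b))  # 0 = identical; approximates the area between the curves
print(c(shape_score = shape_score, area_gap = area_gap))
```

Whether to rescale in y at all depends on the question being asked: rescaling removes amplitude differences, which is what the second part of the question wants, but it also hides them if they matter.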

I am confident that there is a rigorous mathematical theory for this topic; I would search for the words "affinity" or "affine". That said, my primitive/naive methods usually sufficed for the work I was doing.

You may want to ask this question on some math forum.

If the proteins you compare are reasonably close orthologs, you should be able to obtain either an alignment for each pair whose similarity you want to score, or a multiple alignment for the entire bunch. Depending on the application, I think the latter will be more rigorous. I would then extract the folding scores of only those amino acids that are aligned, so that all profiles have the same length, and calculate correlation measures or squared normalized dot products of the profiles as a similarity measure. The squared normalized dot product or the Spearman rank correlation will be less sensitive to amplitude differences, which is what you seem to want. That will make sure you are comparing elements that are reasonably paired (to the extent the alignment is reasonable), and will let you answer questions like: "Are corresponding residues in the compared proteins generally folded to a similar extent?".
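
The two measures named above can be sketched as follows. The equal-length vectors c1 and c4 from the question stand in for already-aligned profiles (the alignment step itself is not shown), and sq_norm_dot is a made-up helper name:

```r
# Hypothetical helper: squared normalized dot product (squared cosine similarity)
sq_norm_dot <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a * b)^2 / (sum(a^2) * sum(b^2))
}

# Stand-ins for two aligned profiles of equal length
c1 <- c(0, 0.8, 0.9, 0.9, 0.5, 0.1, 0.5)
c4 <- c(0, 0.6, 0.7, 0.7, 0.3, 0.1, 0.3)

snd <- sq_norm_dot(c1, c4)               # in [0, 1]; unchanged if a profile is multiplied by a constant
rho <- cor(c1, c4, method = "spearman")  # rank-based, so only the ordering of values matters
print(c(sq_norm_dot = snd, spearman = rho))
```

For this pair the Spearman correlation is 1, because both vectors rank their positions identically even though the amplitudes differ.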
