Off by one with sliding?

Posted: 2023-03-26 18:15

One of the advantages of not handling collections through indices is to avoid off-by-one errors. That's certainly not the only advantage, but it is one of them.

Now, I often use sliding in some algorithms in Scala, but I feel that it usually results in something very similar to the off-by-one errors, because a sliding of m elements in a collection of size n has size n - m + 1 elements. Or, more trivially, list sliding 2 is one element shorter than list.
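To make the size arithmetic concrete, here is a small sketch using only the standard library's sliding; the values are purely illustrative. It also shows a related edge case: a non-empty collection that is shorter than the window yields one truncated window rather than none.

```scala
// A list of n = 5 elements and a window of m = 2.
val list = List(1, 2, 3, 4, 5)
val m = 2
val windows = list.sliding(m).toList
// windows == List(List(1, 2), List(2, 3), List(3, 4), List(4, 5))
// windows.size == list.size - m + 1 == 4

// A non-empty list shorter than the window gives one truncated window.
val short = List(1).sliding(2).toList
// short == List(List(1))
```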

The feeling I get is that there's a missing abstraction in this pattern, something that would be part sliding, part something more -- like foldLeft is to reduceLeft. I can't think of what that might be, however. Can anyone help me find enlightenment here?

UPDATE

Since people are not clear on what I'm talking about, let's consider this case. I want to capitalize a string. Basically, every letter that is not preceded by a letter should be upper case, and all other letters should be lower case. Using sliding, I have to special case either the first or the last letter. For example:

def capitalize(s: String) = s(0).toUpper +: s.toSeq.sliding(2).map {
  case Seq(c1, c2) if c2.isLetter => if (c1.isLetter) c2.toLower else c2.toUpper
  case Seq(_, x) => x
}.mkString

I’m taking Owen’s answer as an inspiration for this.

When you want to apply a simple diff() to a list, this can be seen as equivalent to the following matrix multiplication.

a = (0 1 4 3).T

M = (-1  1  0  0)
    ( 0 -1  1  0)
    ( 0  0 -1  1)

diff(a) = M * a = (1 3 -1).T
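As a quick check of the matrix view, here is a minimal sketch that computes the difference both ways. It assumes the forward-difference convention a(i+1) - a(i), so each row of the matrix holds a -1 followed by a 1; the names viaMatrix and viaSliding are only illustrative.

```scala
val a = Vector(0, 1, 4, 3)

// Each row picks two neighbours and subtracts the earlier from the later.
val M = Vector(
  Vector(-1, 1, 0, 0),
  Vector(0, -1, 1, 0),
  Vector(0, 0, -1, 1)
)

// Matrix-vector product: one dot product per row.
val viaMatrix = M.map(row => row.zip(a).map { case (m, x) => m * x }.sum)

// The same result through sliding(2).
val viaSliding = a.sliding(2).collect { case Seq(x, y) => y - x }.toVector
// Both are Vector(1, 3, -1): three results for four inputs.
```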

We may now use the same scheme for general list operations, if we replace addition and multiplication (and if we generalise the numbers in our matrix M).

So, with plus being a list append operation (with flatten afterwards – or simply a collect operation), and the multiplicative equivalent being either Some(_) or None, a slide with a window size of two becomes:

M = (Some(_) Some(_) None None)
    (None Some(_) Some(_) None)
    (None None Some(_) Some(_))

slide(a) = M “*” a = ((0 1) (1 4) (4 3)).T

I'm not sure if this is the kind of abstraction you’re looking for, but it would be a generalisation over a class of operations that change the number of items.

diff or slide operations of order m for an input of length n will need to use matrices of size (n - m + 1) × n.
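The generalised "multiplication" can be sketched with a Boolean mask standing in for the Some(_)/None entries. This is only one possible reading of the idea; bandedMask and slideByMask are hypothetical names, not library methods.

```scala
// Row r of the mask is true exactly for columns r until r + m,
// giving (n - m + 1) rows of n columns.
def bandedMask(n: Int, m: Int): Vector[Vector[Boolean]] =
  Vector.tabulate(n - m + 1, n)((row, col) => col >= row && col < row + m)

// "Multiply" = keep or drop an element; "add" = collect what was kept.
def slideByMask[A](a: Vector[A], m: Int): Vector[Vector[A]] =
  bandedMask(a.size, m).map(row => row.zip(a).collect { case (true, x) => x })

val slid = slideByMask(Vector(0, 1, 4, 3), 2)
// slid == Vector(Vector(0, 1), Vector(1, 4), Vector(4, 3))
```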

Edit: A solution could be to transform List[A] to List[Option[A]] by wrapping every element in Some, and then to prepend or append None (slideLeft or slideRight). That way you could handle all the magic inside the map method.

list.slideLeft(2) {
  case Seq(Some(c1), Some(c2)) if c2.isLetter => if (c1.isLetter) c2.toLower else c2.toUpper
  case Seq(_, Some(x)) => x
}
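Since slideLeft does not exist in the standard library, here is one way it might be written as a plain function. This is a sketch under assumptions: the signature, the guard for empty input, and the name capitalizeWithSlideLeft are all made up for illustration. The first case accepts a missing predecessor so that the first letter is also capitalized.

```scala
// Hypothetical helper: pad the front with None so that every element
// of xs ends up as the last entry of exactly one window.
def slideLeft[A, B](xs: Seq[A], size: Int)(f: Seq[Option[A]] => B): List[B] = {
  require(size >= 1, "window size must be positive")
  if (xs.isEmpty) Nil // avoid a window made of padding only
  else {
    val padding = List.fill[Option[A]](size - 1)(None)
    (padding ++ xs.map(x => Some(x))).sliding(size).map(f).toList
  }
}

def capitalizeWithSlideLeft(s: String): String =
  slideLeft(s.toSeq, 2) {
    // previous is None for the first character, which counts as "not a letter"
    case Seq(previous, Some(c)) if c.isLetter =>
      if (previous.exists(_.isLetter)) c.toLower else c.toUpper
    case Seq(_, Some(c)) => c
  }.mkString
```

The result has exactly as many elements as the input, so nothing has to be prepended by hand.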

I run into this problem all the time in Python/R/MATLAB, where you diff() a vector and then can't line it up with the original one! It is very frustrating.

I think what's really missing is that the vector only holds the dependent variables, and assumes that you, the programmer, are keeping track of the independent variables, i.e. the dimension that the collection ranges over.

I think the way to solve this is to have the language to some degree keep track of independent variables; perhaps statically through types, or dynamically by storing them along with the vector. Then it can check the independent axes, make sure they line up, or, I don't know if this is possible, shuffle things around to make them line up.

That's the best I've thought of so far.

EDIT

Another way of thinking about this is, why does your collection have order? Why is it not just a Set? The order means something, but the collection doesn't keep track of that -- it's basically using sequential position (which is about as informative as numerical indices) to proxy for the real meaning.

EDIT

Another consequence would be that transformations like sliding actually represent two transformations, one for the dependent variables, and one for their axis.
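A rough sketch of the dynamic variant, storing the axis along with the values. Series, alignWith, and this diff are hypothetical names invented for the example, and integer positions stand in for whatever the independent variable really is.

```scala
// Each value is stored together with the position it belongs to.
final case class Series[A](points: Vector[(Int, A)]) {
  // Pair up values from both series that share the same position.
  def alignWith[B](other: Series[B]): Vector[(Int, A, B)] = {
    val lookup = other.points.toMap
    points.collect { case (i, a) if lookup.contains(i) => (i, a, lookup(i)) }
  }
}

// Transforms the values (difference) and the axis (drops the first position).
def diff(s: Series[Int]): Series[Int] =
  Series(s.points.zip(s.points.drop(1)).map {
    case ((_, prev), (i, next)) => (i, next - prev)
  })

val original = Series(Vector(0 -> 0, 1 -> 1, 2 -> 4, 3 -> 3))
val aligned = original.alignWith(diff(original))
// aligned == Vector((1, 1, 1), (2, 4, 3), (3, 3, -1))
```

Because the differences carry their own positions, lining them up with the original no longer depends on remembering which end lost an element.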

In your example, I think the code is more complex because you basically want to do a map, but you are working with sliding, which introduces edge conditions that don't fit nicely. I think a foldLeft with an accumulator that remembers the relevant state may be easier conceptually:

def capitalize2(s: String) = s.foldLeft(("", true)) { case ((res, notLetter), c) =>
  (res + (if (notLetter) c.toUpper else c.toLower), !c.isLetter)
}._1

I think this could be generalized so that notLetter could remember n elements where n is the size of the sliding window.
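One possible shape for that generalization, as a sketch: the accumulator keeps up to n preceding elements and hands them to the function along with the current one. mapWithHistory and capitalize3 are hypothetical names.

```scala
// Fold that carries the last n elements seen so far as "history".
// The history is shorter than n near the start and empty for the first element.
def mapWithHistory[A, B](xs: Seq[A], n: Int)(f: (Seq[A], A) => B): Seq[B] =
  xs.foldLeft((Vector.empty[A], Vector.empty[B])) {
    case ((history, out), x) =>
      ((history :+ x).takeRight(n), out :+ f(history, x))
  }._2

def capitalize3(s: String): String =
  mapWithHistory(s.toSeq, 1) { (previous, c) =>
    // No predecessor, or a non-letter predecessor: upper case.
    if (previous.lastOption.exists(_.isLetter)) c.toLower else c.toUpper
  }.mkString
```

The output always has the same length as the input; the edge condition shows up as a short history instead of a missing element.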

The transformation you're asking for inherently reduces the size of the data. Sorry--there's no other way to look at it. tail also gives you off-by-one errors.

Now, you might say--well, fine, but I want a convenience method to maintain the original size.

In that case, you might want these methods on List:

def initializedSliding(init: List[A]) = (init ::: this).sliding(1 + init.length)
def finalizedSliding(tail: List[A]) = (this ::: tail).sliding(1 + tail.length)

which will maintain your list length. (You can envision how to generalize to non-lists, I'm sure.)

This is the analog to fold left/right in that you supply the missing data in order to perform a pairwise (or more) operation on every element of the list.
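Written as standalone functions rather than methods on List, the two helpers might look like the sketch below. The extra xs parameter and the empty-list guard are additions for this example; without the guard, an empty list with non-empty padding would produce one window consisting of padding only.

```scala
def initializedSliding[A](xs: List[A], init: List[A]): List[List[A]] =
  if (xs.isEmpty) Nil else (init ::: xs).sliding(1 + init.length).toList

def finalizedSliding[A](xs: List[A], tail: List[A]): List[List[A]] =
  if (xs.isEmpty) Nil else (xs ::: tail).sliding(1 + tail.length).toList

val fromLeft = initializedSliding(List(1, 2, 3), List(0))
// fromLeft == List(List(0, 1), List(1, 2), List(2, 3))
val toRight = finalizedSliding(List(1, 2, 3), List(0))
// toRight == List(List(1, 2), List(2, 3), List(3, 0))
```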

The off-by-one problem you describe reminds me of the boundary condition issue in digital signal processing. The problem occurs because the data (list) is finite. It doesn't occur for infinite data (stream). In digital signal processing the issue is remedied by extending the finite signal to an infinite one. This can be done in various ways, like repeating the data, or repeating the data and reversing it on every repetition (as is done for the discrete cosine transform).

Borrowing from these approaches for sliding would lead to an abstraction which does not exhibit the off-by-one problem:

(1::2::3::Nil).sliding(2)

would yield

(1,2), (2,3), (3,1)

for circular boundary conditions and

(1,2), (2,3), (3,2)

for circular boundary conditions with reversal.
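Both boundary conditions can be sketched by extending the list before sliding. slidingCircular and slidingReflected are hypothetical names, and the sketch assumes the window is no longer than the list.

```scala
// Wrap around: continue with the first elements of the list.
def slidingCircular[A](xs: List[A], size: Int): List[List[A]] =
  if (xs.isEmpty) Nil
  else {
    require(size >= 1 && size <= xs.length, "size must be between 1 and xs.length")
    (xs ::: xs.take(size - 1)).sliding(size).toList
  }

// Reflect: continue by walking back from the last element.
def slidingReflected[A](xs: List[A], size: Int): List[List[A]] =
  if (xs.isEmpty) Nil
  else {
    require(size >= 1 && size <= xs.length, "size must be between 1 and xs.length")
    (xs ::: xs.reverse.drop(1).take(size - 1)).sliding(size).toList
  }

val circular = slidingCircular(List(1, 2, 3), 2)
// circular == List(List(1, 2), List(2, 3), List(3, 1))
val reflected = slidingReflected(List(1, 2, 3), 2)
// reflected == List(List(1, 2), List(2, 3), List(3, 2))
```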

Off-by-one errors suggest that you are trying to put the original list in one-to-one correspondence with the sliding list, but something strange is going on, since the sliding list has fewer elements.

The problem statement for your example can be roughly phrased as: "Uppercase every character if it (a) is the first character, or (b) follows a non-letter character". As Owen points out, the first character is a special case, and any abstraction should respect this. Here's a possibility:

def slidingPairMap[A, B](s: List[A], f1: A => B, f2: (A, A) => B): List[B] = s match {
  case Nil => Nil
  // collect skips the short window that sliding(2) yields for a one-element list
  case x :: _ => f1(x) +: s.sliding(2).toList.collect { case List(a, b) => f2(a, b) }
}

(not the best implementation, but you get the idea). This generalizes to sliding triples, with off-by-two errors, and so on. The type of slidingPairMap makes it clear that special casing is being done.

An equivalent signature could be

def slidingPairMap[A, B](s: List[A], f: Either[A, (A, A)] => B): List[B]

Then f could use pattern matching to figure out if it's working with the first element, or with a subsequent one.

Or, as Owen says in the comments, why not make a modified sliding method that gives information about whether the element is first or not,

def slidingPairs[A](s: List[A]): List[Either[A, (A, A)]]

I guess this last idea is isomorphic to what Debilski suggests in the comments: pad the beginning of the list with None, wrap all the existing elements with Some, and then call sliding.
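For illustration, slidingPairs could be filled in roughly as follows. This is a sketch, not a definitive implementation; capitalizeWithPairs is a hypothetical name added to show the pattern match on Left and Right.

```scala
// Left carries the first element on its own; Right carries (previous, current).
def slidingPairs[A](s: List[A]): List[Either[A, (A, A)]] = s match {
  case Nil => Nil
  case head :: tail =>
    val rest: List[Either[A, (A, A)]] = s.zip(tail).map(pair => Right(pair))
    Left(head) :: rest
}

def capitalizeWithPairs(s: String): String =
  slidingPairs(s.toList).map {
    case Left(c)              => c.toUpper
    case Right((previous, c)) => if (previous.isLetter) c.toLower else c.toUpper
  }.mkString
```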

I realize this is an old question but I just had a similar problem and I wanted to solve it without having to append or prepend anything, and where it would handle the last elements of the sequence in a seamless manner. The approach I came up with is a slidingFoldLeft. You have to handle the first element as a special case (like some others mentioned, for capitalize, it is a special case), but for the end of the sequence you can just handle it like other cases. Here is the implementation and some silly examples:

def slidingFoldLeft[A, B] (seq: Seq[A], window: Int)(acc: B)(
    f: (B, Seq[A]) => B): B = {
  if (window > 0) {
    val iter = seq.sliding(window)
    iter.foldLeft(acc){
      // Operate normally
      case (acc, next) if iter.hasNext => f(acc, next)
      // It's at the last <window> elements of the seq, handle current case and 
      // call recursively with smaller window
      case (acc, next) =>
        slidingFoldLeft(next.tail, window - 1)(f(acc, next))(f)
    }
  } else acc
}

def capitalizeAndQuestionIncredulously(s: String) =
  slidingFoldLeft(s.toSeq, 2)("" + s(0).toUpper) {
    // Normal iteration
    case (acc, Seq(c1, c2)) if c1.isLetter && c2.isLetter => acc + c2.toLower
    case (acc, Seq(_, c2))  if c2.isLetter                => acc + c2.toUpper
    case (acc, Seq(_, c2))                                => acc + c2
    // Last element of string
    case (acc, Seq(c)) => acc + "?!"
  }

def capitalizeAndInterruptAndQuestionIncredulously(s: String) =
  slidingFoldLeft(s.toSeq, 3)("" + s(0).toUpper) {
    // Normal iteration
    case (acc, Seq(c1, c2, _)) if c1.isLetter && c2.isLetter => acc + c2.toLower
    case (acc, Seq(_, c2, _))  if c2.isLetter                => acc + c2.toUpper
    case (acc, Seq(_, c2, _))                                => acc + c2
    // Last two elements of string
    case (acc, Seq(c1, c2)) => acc + " (commercial break) " + c2
    // Last element of string
    case (acc, Seq(c)) => acc + "?!"
  }

println(capitalizeAndQuestionIncredulously("hello my name is mAtthew"))
println(capitalizeAndInterruptAndQuestionIncredulously("hello my name is mAtthew"))

And the output:

Hello My Name Is Matthew?!
Hello My Name Is Matthe (commercial break) w?!

I would map the elements with Some(_) and then prepend None. Note that the obvious way of doing it (matching on two Some values in the default case, as done in the edit by Debilski) is wrong, as we must be able to modify even the first letter. This way, the abstraction respects the fact that sometimes there simply is no predecessor. Using getOrElse(false) ensures that a missing predecessor is treated as having failed the test.

((None +: "foo1bar".toSeq.map(Some(_))) sliding 2).map {
   case Seq(c1Opt, Some(c2)) if c2.isLetter => if (c1Opt.map(_.isLetter).getOrElse(false)) c2.toLower else c2.toUpper
   case Seq(_, Some(x)) => x
}.mkString
res13: String = "Foo1Bar"

Acknowledgments: the idea of mapping the elements with Some(_) did come to me through Debilski's post.

I'm not sure if this solves your concrete problem, but we could easily imagine a pair of methods, e.g. slidingFromLeft(z: A, size: Int) and slidingToRight(z: A, size: Int) (where A is the collection's element type), which, when called on e.g.

List(1, 2, 3, 4, 5)

with arguments e.g. (0, 3), should produce respectively

List(0, 0, 1), List(0, 1, 2), List(1, 2, 3), List(2, 3, 4), List(3, 4, 5)

and

List(1, 2, 3), List(2, 3, 4), List(3, 4, 5), List(4, 5, 0), List(5, 0, 0)

This is the sort of problem nicely-suited to an array-oriented functional language like J. Basically, we generate a boolean with a one corresponding to the first letter of each word. To do this, we start with a boolean marking the spaces in a string. For example (lines indented three spaces are inputs; results are flush with left margin; "NB." starts a comment):

   str=. 'now  is  the    time'    NB. Example w/extra spaces for interest
   ]whspc=. ' '=str                NB. Mark where spaces are=1
0 0 0 1 1 0 0 1 1 0 0 0 1 1 1 1 0 0 0 0

Verify that (*.-.) ("and not") returns one only for "1 0":

   ]tt=. #:i.4                     NB. Truth table
0 0
0 1
1 0
1 1
   (*.-.)/"1 tt                    NB. Apply to 1-D sub-arrays (rows)
0 0 1 0                            NB. As hoped.

Slide our tacit function across pairs in the boolean:

   2(*.-.)/\whspc                  NB. Apply to 2-ples
0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0

But this doesn't handle the edge condition of the initial letter, so force a one into the first position. This actually helps as the reduction of 2-ples left us one short. Here we compare lengths of the original boolean and the target boolean:

   #whspc
20
   #1,2(*.-.)/\whspc
20
   1,2(*.-.)/\whspc
1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0

We get uppercase by using the index into the lowercase vector to select from the uppercase vector (after defining these two vectors):

   'lc uc'=. 'abcdefghijklmnopqrstuvwxyz';'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
   (uc,' '){~lc i. str
NOW  IS  THE    TIME

Check that insertion by boolean gives correct result:

   (1,2(*.-.)/\whspc) } str,:(uc,' '){~lc i. str
Now  Is  The    Time

Now is the time to combine all this into one statement:

   (1,2(*.-.)/\' '=str) } str,:(uc,' '){~lc i. str
Now  Is  The    Time
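For readers who do not know J, the same steps can be sketched in Scala. This is a loose translation rather than an idiomatic equivalent; the names whspc, afterSpace, and wordStart are chosen only to mirror the J session.

```scala
val str = "now  is  the    time"

// Boolean vector marking the spaces (J: ' '=str).
val whspc = str.map(_ == ' ').toVector

// Slide "and not" over pairs: true only for a space followed by a non-space.
val afterSpace = whspc.sliding(2).collect { case Seq(a, b) => a && !b }.toVector

// Force the first position to true, which also restores the original length.
val wordStart = true +: afterSpace

// Pick the upper-case character wherever a word starts.
val capitalized =
  str.zip(wordStart).map { case (c, start) => if (start) c.toUpper else c }.mkString
// capitalized == "Now  Is  The    Time"
```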
