An interview question - Split text into sub-strings according to rules

2023-01-07 04:52 问答作者：

Split text into sub-strings according to below rules:

a) The length of each sub-string should less than or equal to M
b) The length of sub-string should less than or equal to N (N < M) if the sub-string contains any numeric char
c) The total number of sub-strings should be as small as possible

I have no clue how to solve this question, I guess it is related to "dynamic programming". Can anybody 开发者_运维问答help me implement it using C# or Java? Thanks a lot.

Idea

A greedy approach is the way to go:

If the current text is empty, you're done.
Take the first N characters. If any of them is a digit then this is a new substring. Chop it off and go to beginning.
Otherwise, extend the digitless segment to at most M characters. This is a new substring. Chop it off and go to beginning.

Proof

Here's a reductio-ad-absurdum proof that the above yields an optimal solution. Assume there is a better split than the greedy split. Let's skip to the point where the two splits start to differ and remove everything before this point.

Case 1) A digit among the first N characters.

Assume that there is an input for which chopping off the first N characters cannot yield an optimal solution.

Greedy split:   |--N--|...
A better split: |---|--...
                      ^
                      +---- this segment can be shortened from the left side

However, the second segment of the putative better solution can be always shortened from the left side, and the first one extended to N characters, without altering the number of segments. Therefore, a contradiction: this split is not better than the greedy split.

Case 2) No digit among the first K (N < K <= M) characters.

Assume that there is an input for which chopping off the first K characters cannot yield an optimal solution.

Greedy split:   |--K--|...
A better split: |---|--...
                      ^
                      +---- this segment can be shortened from the left side

Again, the the "better" split can be transformed, without altering the number of segments, to the greedy split, which contradicts the initial assumption that there is a better split than the greedy split.

Therefore, the greedy split is optimal. Q.E.D.

Implementation (Python)

import sys

m, n, text = int(sys.argv[1]), int(sys.argv[2]), sys.argv[3]
textLen, isDigit = len(text), [c in '0123456789' for c in text]

chunks, i, j = [], 0, 0
while j < textLen:
   i, j = j, min(textLen, j + n) 
   if not any(isDigit[i:j]):
      while j < textLen and j - i < m and not isDigit[j]:
         j += 1
   chunks += [text[i:j]]
print chunks

Implementation (Java)

public class SO {
   public List<String> go(int m, int n, String text) {
      if (text == null)
         return Collections.emptyList();
      List<String> chunks = new ArrayList<String>();

      int i = 0;
      int j = 0;
      while (j < text.length()) {
         i = j;         
         j = Math.min(text.length(), j + n);
         boolean ok = true;
         for (int k = i; k < j; k++) 
            if (Character.isDigit(text.charAt(k))) {
               ok = false;              
               break;                   
            }                   
         if (ok)        
            while (j < text.length() && j - i < m && !Character.isDigit(text.charAt(j)))
               j++;                     
         chunks.add(text.substring(i, j));
      }         
      return chunks;
   }    

   @Test
   public void testIt() {
      Assert.assertEquals(
         Arrays.asList("asdas", "d332", "4asd", "fsdxf", "23"),
         go(5, 4, "asdasd3324asdfsdxf23"));
   }    
}

Bolo has provided a greedy algorithm in his answer and asked for a counter-example. Well, there's no counter-example because that's perfectly correct approach. Here's the proof. Although it's a bit wordy, it often happens that proof is longer than algorithm itself :)

Let's imagine we have input of length L and constructed an answer A with our algorithm. Now, suppose there's a better answer B. I.e., B has less segments than A does.

Let's say, first segment in A has length la and in B - lb. la >= lb because we've choosen first segment in A to have maximum possible length. And if lb < la, we can increase length of first segment in B without increasing overall number of segments in B. It would give us some other optimal solution B', having same first segment as A.

Now, remove that first segment from A and B' and repeat operation for length L' < L. Do it until there's no segments left. It means, answer A is equal to some optimal solution.

The result of your computation will be a partitioning of the given text into short sub-strings containing numerics and long substrings not containing numerics. (This much you knew already).

You will essentially be partitioning off short subs around the numerics and then breaking everything else down into long subs as often as needed to fulfill the length criteria.

Your freedom, i.e. what you can manipulate to improve your result, is to select which characters to include with a numeric. If N = 3, then for every numeric you get the choice of XXN, XNX or NXX. If M is 5 and you have 6 characters before your first numeric, you'll want to include at least one of those characters in your short sub so you won't end up with two "long" strings to the left of your "short" one when you could have just one instead.

As a first approximation, I'd go with extending your "short" strings leftwise far enough to avoid redundant "long" strings. This is a typical "greedy" approach, and greedy approaches often yield optimal or almost-optimal results. To do even better than that would not be easy, and I'm not going to try to figure out how to go about that.

继续阅读：algorithm

An interview question - Split text into sub-strings according to rules

Idea

Proof

Implementation (Python)

Implementation (Java)

更多精彩内容

精彩评论

最新问答

黑神话悟空战斗姿势有几个?？

鸣潮福伯在哪?？

什么是生殖中心？

电视投屏记录在哪找？

咸鱼之王玩具升星需要多少扳手?？

问答排行榜

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

太快了能不能慢一点好爽~好大~不要拔出来了？

二年级家长回音怎么写大全简短的（二年级家长回音怎么写）？

Idea

Proof

Implementation (Python)

Implementation (Java)

更多精彩内容

精彩评论

最新问答

黑神话悟空战斗姿势有几个?？

鸣潮福伯在哪?？

什么是生殖中心？

电视投屏记录在哪找？

咸鱼之王玩具升星需要多少扳手?？

问答排行榜

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？

太快了能不能慢一点 好爽~好大~不要拔出来了？

二年级家长回音怎么写大全简短的（二年级家长回音怎么写）？

太快了能不能慢一点好爽~好大~不要拔出来了？