shortest digest of a string

2022-12-21 15:19 Q&A author:

[Description] Given a string of char type, find its shortest digest, defined as the shortest sub-string that contains every distinct character of the original string.

[Example]

A = "aaabedacd"

B = "bedac" is the answer.

[My solution]

1. Define an integer table with 256 elements to record how many times each character occurs in the current sub-string.
2. Scan the whole string and use the table to count the number of distinct characters it contains.
3. Use two pointers, start and end, initially pointing at the first character and at (start + 1) of the given string. The current sub-string contains 1 distinct character.
4. Expand sub-string [start, end) at the end until it contains all distinct characters. Update the shortest digest if possible.
5. Contract sub-string [start, end) at the start by one character at a time, restoring its digest property by step 4 whenever necessary.

The time cost is O(n), and the extra space cost is constant.
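The steps above can be sketched as a sliding window. This is only one reading of the description, not necessarily the asker's exact code: the function name `shortest_digest` is made up for illustration, and a dictionary stands in for the 256-element table so that any character can serve as a key.

```python
from collections import defaultdict


def shortest_digest(s: str) -> str:
    """Return the shortest sub-string of s containing every distinct character of s."""
    if not s:
        return ""
    required = len(set(s))        # step 2: number of distinct characters in s
    counts = defaultdict(int)     # step 1: occurrence table (dict instead of 256 slots)
    distinct_in_window = 0
    best_start, best_len = 0, len(s)
    start = 0
    for end, ch in enumerate(s):  # step 4: expand the window at the end
        counts[ch] += 1
        if counts[ch] == 1:
            distinct_in_window += 1
        # step 5: contract at the start while the window is still a digest
        while distinct_in_window == required:
            if end - start + 1 < best_len:
                best_start, best_len = start, end - start + 1
            left = s[start]
            counts[left] -= 1
            if counts[left] == 0:
                distinct_in_window -= 1
            start += 1
    return s[best_start:best_start + best_len]


print(shortest_digest("aaabedacd"))
```

Each character enters and leaves the window at most once, which is where the O(n) bound comes from. The table holds one entry per distinct character, so the extra space is constant only if the alphabet is bounded.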

Any better solution without extra space?

That works very poorly once you consider that characters are not in fact limited to 256 values: Unicode has over a million code points (U+0000 through U+10FFFF). If you try what you are planning on a UTF-8 string, it is going to blow up. In a big way.

A better approach would be to use a digest algorithm like MD5 or FNV, or to do what you are doing but with a ~~linked list~~ sparse array: add the code points of the characters as you encounter them, then concatenate the code points afterwards, converting to, say, UTF-8 as you go.

EDIT:

Counterexample: "På japansk heter regn '雨'."
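A small sketch of why the counterexample breaks a 256-slot table, and of how a sparse table keyed by code point might look. The helper names `count_code_points` and `code_points_outside_table` are hypothetical, and `collections.Counter` is used here as one possible stand-in for the sparse array.

```python
from collections import Counter


def count_code_points(text: str) -> Counter:
    """Sparse occurrence table: one entry per code point actually seen."""
    return Counter(ord(ch) for ch in text)


def code_points_outside_table(text: str, table_size: int = 256) -> list:
    """Code points that would index past the end of a fixed table of table_size slots."""
    return sorted(cp for cp in count_code_points(text) if cp >= table_size)


text = "På japansk heter regn '雨'."
print([hex(cp) for cp in code_points_outside_table(text)])
# The UTF-8 encoding is longer than the character count, so indexing by byte
# would also split multi-byte characters.
print(len(text), len(text.encode("utf-8")))
```

The table grows with the number of distinct code points in the input rather than with the size of the character set.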

I don't think your algorithm is correct. Consider the string: "baaabedacdc". The correct answer is still "bedac", but your algorithm will advance the start pointer forward until it finds the "e" (the only character with an occurrence count of 1), and then the end pointer backward until it finds the "e" (the only character with an occurrence count of 1), for a result of "e".

I may have misunderstood the algorithm, though.
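One way to check a candidate algorithm against strings like this is an exhaustive search, which is slow but hard to get wrong. The sketch below is only a reference checker, not the algorithm from the question, and the name `brute_force_digest` is made up for illustration.

```python
def brute_force_digest(s: str) -> str:
    """Try every window, shortest first, and return the first one covering all characters."""
    alphabet = set(s)
    # No window shorter than the number of distinct characters can be a digest.
    for length in range(len(alphabet), len(s) + 1):
        for start in range(len(s) - length + 1):
            window = s[start:start + length]
            if set(window) == alphabet:
                return window
    return ""


print(brute_force_digest("baaabedacdc"))
```

Comparing its output with a faster implementation on short random strings would show whether the two-pointer version really stops at a single "e".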

Why not use a simpler algorithm for this question? It may not be the most time-efficient, but it works ;)

step 1. (Assuming we are dealing with a character set of 26 letters.) Create a boolean array of size 26, scan through the string, and set the boolean corresponding to each character. For instance, set elem[0] = true when you run into a, elem[1] = true when you run into b, etc.

step 2. Create a string from the characters where elem[x] = true. For this case the string would be "abcde", with length 5.

step 3. Traverse the given string a second time, extracting sub-strings of length 5, sorting each one in ascending order, and matching it against the string from step 2.
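A minimal sketch of these three steps, assuming the input contains only lowercase a to z. The name `fixed_length_digest` is hypothetical. As written, the steps only look at windows whose length equals the number of distinct characters, so the sketch returns None when no such window exists.

```python
def fixed_length_digest(s: str):
    """Steps 1-3: look for a window that is a permutation of the distinct characters."""
    # step 1: mark which of the 26 letters occur (assumes lowercase a-z only)
    elem = [False] * 26
    for ch in s:
        elem[ord(ch) - ord("a")] = True
    # step 2: build the sorted string of distinct characters
    target = "".join(chr(ord("a") + i) for i in range(26) if elem[i])
    k = len(target)
    # step 3: sort every window of length k and compare it with the target
    for start in range(len(s) - k + 1):
        window = s[start:start + k]
        if "".join(sorted(window)) == target:
            return window
    return None


print(fixed_length_digest("aaabedacd"))
print(fixed_length_digest("aabbc"))
```

For "aabbc" the shortest digest is "abbc", which is longer than the three distinct characters, so this approach would need to try longer windows to cover such inputs.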
