
Bash script for frequency analysis of unique letters and repeating letter pairs: how should I build this script?

OK, first post.

So I have this assignment to decrypt cryptograms by hand, but I also wanted to automate the process a little, if not all of it then at least a few parts. I browsed around and found some sed and awk one-liners that do some of the things I wanted, but not everything I wanted or needed.

There are some websites that sort of do what I want, but I really want to do it in bash, just because I want to understand it better and such :)

The script would take a filename as a parameter and write its output to another file, such as solution$1, when done.

# Test "$1" directly: prefixing "$PWD/" breaks when an absolute path is given.
if [ -e "$1" ]; then
  echo "$1 exists"
else
  echo "$1 does not exist"
fi

That would start the script by checking whether the file given as a parameter exists.

Then I found this one-liner:

sed -e 's/./&\n/g' "$1" | while IFS= read -r c; do printf '%s' "$c"; done

Which works fine, but I need the number of occurrences per letter, and I really don't see how to do that.

Here is more or less what I'm trying to achieve for counting unique letter occurrences and such: http://25yearsofprogramming.com/fun/ciphers.htm

I then need to put all letters in lowercase.

After this I see the script doing these things:

    • A subscript that scans a dictionary file for words of a certain pattern and size; the bigger the words, the better. For example: let's say the solution is the word "apparel" and the encrypted word is "zxxzgvk". Is there a regex way to express the pattern that compares those two words and lists the word "apparel" from a dictionary file, because "appa" and "zxxz" are similar patterns and "zxxzgvk" is the same length as "apparel"?

  1. Can this part be done, and is it realistic to view the problem like this, or is it just far-fetched?
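One way this pattern idea could be sketched, without a hand-written regex, is to reduce every word to a "shape" in which each distinct letter is numbered by its first appearance. Under that scheme both "apparel" and "zxxzgvk" become 0.1.1.0.2.3.4, so matching is just a string comparison. The names pattern_fn, word_pattern and find_candidates are made up for this illustration, and the dictionary is assumed to be a plain file with one word per line.

```shell
# Shared awk helper (hypothetical): numbers each distinct letter of a word
# by order of first appearance, e.g. "apparel" -> 0.1.1.0.2.3.4
pattern_fn='
function pattern(word,    i, c, n, seen, out) {
    word = tolower(word); n = 0; out = ""
    for (i = 1; i <= length(word); i++) {
        c = substr(word, i, 1)
        if (!(c in seen)) seen[c] = n++
        out = out (i > 1 ? "." : "") seen[c]
    }
    return out
}
'

# word_pattern WORD: print the shape of a single word.
word_pattern() {
    printf '%s\n' "$1" | awk "$pattern_fn"'{ print pattern($0) }'
}

# find_candidates CIPHERWORD DICTFILE: print dictionary words that have the
# same length and the same shape as the cipher word.
find_candidates() {
    awk -v target="$1" "$pattern_fn"'
        BEGIN { want = pattern(target) }
        length($0) == length(target) && pattern($0) == want { print }
    ' "$2"
}
```

Called as find_candidates zxxzgvk /path/to/wordlist, this would list "apparel" along with every other word of that shape, so longer cipher words with repeated letters narrow the list the most.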

    • Another subscript that takes the letters found in the previous output word and swaps them into the cryptogram.

The swapped letters will be in uppercase to differentiate them over time.
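A minimal sketch of that swap step, assuming the cryptogram has already been lowercased and that both arguments contain only plain letters (tr gives special meaning to characters such as "-" and "\"). The function name swap_letters is hypothetical.

```shell
# swap_letters FROM TO: read text on stdin and replace each letter of FROM
# with the uppercased letter at the same position in TO.
swap_letters() {
    tr "$1" "$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')"
}

# Example: two sweeps over the same cipher word.
printf 'zxxzgvk\n' | swap_letters zx ap     # prints APPAgvk
printf 'APPAgvk\n' | swap_letters gvk rel   # prints APPAREL
```

Because solved letters are uppercase and the remaining cipher letters are lowercase, a later sweep cannot accidentally change a letter that has already been swapped.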

I'll then have to figure out how to rescan the newly found words to see whether they appear in a dictionary file, partly or fully, and then swap more letters or not.

  1. Has anyone seen this problem before and tried to solve it with word patterns as I described, or is this just too complex?

  2. Should I log any of the swaps?

Maybe just scan through all the encrypted words and swap as I go along, then do another sweep with the constraint that uppercase letters from the first sweep are not changed (actually, they would be used as more precise patterns!).

Has anyone done a similar script or program in another language? If so, which one? Maybe I can relate somehow :)

Maybe we can use your insight as to how you thought out your code.

I will happily include the cryptograms I have decoded and the one I have yet to decode :)

Again, the focus of my assignment is not to write this script but just to solve the cryptograms. But writing scripts, or at least working out how I would write this one, does help me understand a little more how to think in terms of code. Feel free to point me in the right direction!

The cryptogram itself is based on simple alphabetic substitution.

I have made a pastebin of the code-to-be here :) http://pastebin.com/UEQDsbPk

In pseudocode, the way I see it is:

  • call program with an input filename in param and optionally a second filename(dictionary)
  • verify the input file exists and isn't empty
  • read the file's content and echo it on screen
  • transform to lowercase
  • scan through the text and count the number of each letter to do a frequency analysis
  • ask the user what language the text is supposed to be in (English by default)
  • use the response to specify which letter frequencies to use as a baseline
  • swap the letters suggested by the frequency analysis, in uppercase
  • print the changed document on screen
  • ask the user to swap letters in the crypted text
  • if user had given a dictionary file as the second argument
  • then scan the cipher for words and find the bigger words
  • find words with a similar pattern (some repeating letters) in the dictionary file
  • list on screen the results if any
  • offer to swap the letters corresponding in the cipher
  • print modified cipher on screen
  • ask again to swap letters or find more similar words
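The first few steps of this outline (argument check, non-empty input file, optional dictionary, lowercasing) could be sketched roughly as follows. The function names check_input and show_lowercase are placeholders, not part of any existing script.

```shell
# check_input CIPHERFILE [DICTIONARY]: validate the script's arguments.
check_input() {
    if [ "$#" -lt 1 ]; then
        echo "usage: check_input CIPHERFILE [DICTIONARY]" >&2
        return 2
    fi
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$1" ]; then
        echo "$1 does not exist or is empty" >&2
        return 1
    fi
    # The dictionary is optional, but must be readable when given.
    if [ "$#" -ge 2 ] && [ ! -r "$2" ]; then
        echo "cannot read dictionary $2" >&2
        return 1
    fi
    return 0
}

# show_lowercase FILE: print the file's content converted to lowercase.
show_lowercase() {
    tr '[:upper:]' '[:lower:]' < "$1"
}
```

In a script this might be used as check_input "$@" || exit, followed by show_lowercase "$1".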

That is more or less the way I see the script structured.

  1. Do you see anything that I should add? Did I miss something?

I hope this revised version is more clear for everyone!


TL;DR, to be frank. To the only question I found, the answer is yes :) Please split it into smaller tasks and we'll be happy to assist you, if you don't find the answers to those smaller questions first.

If you can put it in pseudocode, it would be easier. There are all kinds of text-manipulating tools in Unix. Which ones to use depends on how big your texts are. I believe they are not very big, or you would have used a compiled language.

For example, the easy but costly gawk way to count frequencies:

awk -F "" '{ for (i = 1; i <= NF; i++) freq[$i]++ } END { for (c in freq) printf("%s %d\n", c, freq[c]) }' inputfile

As for transliterating, there is the tr utility. You can build the actual strings in each case and then pass them to it (that holds true for Caesar-like ciphers).
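As a rough illustration of building those strings, the sketch below ranks the cipher letters by frequency and maps them onto an assumed English frequency order, writing the guesses in uppercase. The ordering in ENGLISH_ORDER is only one commonly quoted ranking (orderings differ by corpus), and the names freq_guess, ranked and guess are invented for this example. The first guess is usually only partly right and still needs manual swaps.

```shell
# Assumed English ranking, most to least frequent; treat it as a starting point.
ENGLISH_ORDER=ETAOINSHRDLCUMWFGYPBVKJXQZ

# freq_guess FILE: print FILE lowercased, with each letter replaced by the
# uppercase English letter of the same frequency rank.
freq_guess() {
    # Cipher letters ranked by count, ties broken alphabetically.
    ranked=$(LC_ALL=C tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C tr -cd 'a-z' |
        fold -w 1 | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2,2 |
        awk '{ printf "%s", $2 }')
    # Nothing to do if the file contains no letters.
    [ -n "$ranked" ] || return 1
    # Take as many English letters as there are distinct cipher letters.
    guess=$(printf '%s' "$ENGLISH_ORDER" | cut -c "1-${#ranked}")
    LC_ALL=C tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C tr "$ranked" "$guess"
}
```

For a file containing "bbb aa c", the ranking is "bac", so the output is "EEE TT A".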


grep -o . inputfile | sort | uniq -c | sort -rn

Example:

$ echo 'aAAbbbBBBB123AB' | grep -o . | sort | uniq -c | sort -rn
  5 B
  3 b
  3 A
  1 a
  1 3
  1 2
  1 1
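Since the question asks for lowercase letters only, a variant along these lines might be closer to what is needed. It is a sketch that folds case, drops everything except a-z, and uses fold -w 1 in place of grep -o, which is not part of POSIX. The function name letter_counts is made up.

```shell
# letter_counts: read text on stdin, print "letter count" pairs,
# most frequent first (ties in alphabetical order).
letter_counts() {
    LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -cd 'a-z' | fold -w 1 |
        LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2,2 |
        awk '{ print $2, $1 }'
}

echo 'aAAbbbBBBB123AB' | letter_counts
# b 8
# a 4
```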