Code golf: find all anagrams

2022-12-25 17:51 Q&A author:

A word is an anagram of another word if its letters can be rearranged to form that other, different word.
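A common way to test this, assumed here rather than prescribed by the task, is to compare the two words' letters after lowercasing and sorting them, while rejecting two spellings of the same word. A minimal Python sketch (the names `anagram_key` and `is_anagram` are illustrative, not part of the challenge):

```python
def anagram_key(word: str) -> str:
    """Canonical form of a word: its letters, lowercased and sorted."""
    return "".join(sorted(word.lower()))


def is_anagram(a: str, b: str) -> bool:
    """True if a and b use the same letters but are different words (case ignored)."""
    return a.lower() != b.lower() and anagram_key(a) == anagram_key(b)


print(is_anagram("posh", "shop"))    # True
print(is_anagram("Mocha", "macho"))  # True
print(is_anagram("tend", "Tend"))    # False: same word, different case
```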

Task:

Write the shortest source code, by character count, that finds all sets of anagrams in a given word list.
Spaces and newlines count as characters.
Use the code ruler below.

---------10--------20--------30--------40--------50--------60--------70--------80--------90--------100-------110-------120

Input:

A list of words from stdin, one word per line.

e.g.

A
A's
AOL
AOL's
Aachen
Aachen's
Aaliyah
Aaliyah's
Aaron
Aaron's
Abbas
Abbasid
Abbasid's

Output:

All sets of anagrams, with each set on its own line.

Example run:

./anagram < words
marcos caroms macros
lump's plum's
dewar's wader's
postman tampons
dent tend
macho mocha
stoker's stroke's
hops posh shop
chasity scythia
...

I have a 149-char Perl solution, which I'll post once a few more people have posted theirs :)

Have fun!

EDIT: Clarifications

Assume anagrams are case-insensitive (i.e. uppercase and lowercase letters are equivalent).
Only sets with more than one item should be printed.
Each set of anagrams should be printed only once.
Each word in an anagram set should occur only once.

EDIT2: More Clarifications

If two words differ only in capitalization, they should be collapsed into one word; it's up to you to decide which capitalization to use for the collapsed word.
Each set of words only has to end in a newline, as long as the words within it are separated in some way; e.g. comma-separated or space-separated output is valid. I understand some languages have built-in quick array printing methods, so this should let you take advantage of them even if they don't output space-separated arrays.
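For reference, an ungolfed sketch that follows all of the clarified rules might look like the following. It is an illustration under stated assumptions, not an official solution: it keeps the lowercase spelling of collapsed words and prints space-separated sets, and the function name `find_anagram_sets` is hypothetical.

```python
from collections import defaultdict


def find_anagram_sets(words):
    """Group words into anagram sets following the clarified rules.

    - letters are compared case-insensitively;
    - words differing only in case collapse into one (lowercase is kept here);
    - only sets with more than one word are returned, each exactly once.
    """
    groups = defaultdict(set)
    for word in words:
        word = word.strip().lower()  # drop the newline and collapse capitalization
        if word:
            groups["".join(sorted(word))].add(word)
    # A set per key guarantees that each word appears only once in its group.
    return [sorted(g) for g in groups.values() if len(g) > 1]


sample = ["hops", "posh", "Shop", "shop", "dent", "tend", "AOL", "Aachen"]
for anagram_set in find_anagram_sets(sample):
    print(" ".join(anagram_set))
# hops posh shop
# dent tend
```

Because each word is stripped, the function can also be given `sys.stdin` directly (after `import sys`) to read the word list as the task describes.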

PowerShell, ~~104~~ ~~97~~ ~~91~~ ~~86~~ 83 chars

$k=@{};$input|%{$k["$([char[]]$_|%{$_+0}|sort)"]+=@($_)}
$k.Values|?{$_[1]}|%{"$_"}

Update for the new requirement (+8 chars):

To exclude words that differ only in capitalization, we can simply remove case-insensitive duplicates from the input list, i.e. `$input|sort -u`, where `-u` stands for `-unique`. `sort` is case-insensitive by default:

$k=@{};$input|sort -u|%{$k["$([char[]]$_|%{$_+0}|sort)"]+=@($_)} 
$k.Values|?{$_[1]}|%{"$_"}
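The same preprocessing step, dropping words that differ only in case before grouping, can be sketched in Python as below. Unlike `sort -u`, this version keeps the first spelling it encounters and preserves input order instead of sorting; the helper name is illustrative.

```python
def dedupe_case_insensitive(words):
    """Drop words that differ from an earlier word only in capitalization.

    Keeps the first spelling seen and preserves input order (an assumption;
    PowerShell's sort -u may keep a different spelling and also sorts).
    """
    seen = set()
    unique = []
    for word in words:
        folded = word.casefold()
        if folded not in seen:
            seen.add(folded)
            unique.append(word)
    return unique


print(dedupe_case_insensitive(["Shop", "posh", "shop", "SHOP", "hops"]))
# ['Shop', 'posh', 'hops']
```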

Explanation of the `[char[]]$_|%{$_+0}|sort` part

It's the key for the hashtable entry under which the anagrams of a word are stored. My initial solution was `$_.ToLower().ToCharArray()|sort`. Then I discovered I didn't need `ToLower()` for the key, because hashtable lookups are case-insensitive.

`[char[]]$_|sort` would be ideal, but the chars must be sorted case-insensitively for the key (otherwise `Cab` and `abc` would be stored under different keys). Unfortunately, `sort` is not case-insensitive for chars, only for strings.

What we need is `[string[]][char[]]$_|sort`, but I found a shorter way to convert each char to a string: concatenate something to it, in this case the integer 0, hence `[char[]]$_|%{$_+0}|sort`. This doesn't affect the sort order, and the actual key ends up looking like `d0 o0 r0 w0`. It's not pretty, but it does the job :)
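The pitfall described above, where a case-insensitive lookup cannot help once the characters were sorted case-sensitively, can be reproduced in Python, which compares characters by code point. A small sketch; both function names are hypothetical:

```python
def naive_key(word):
    # Sorts characters case-sensitively: 'C' (U+0043) sorts before 'a' (U+0061).
    return "".join(sorted(word))


def folded_key(word):
    # Lowercases first, so the sort order ignores case.
    return "".join(sorted(word.lower()))


print(naive_key("Cab"), naive_key("abc"))    # Cab abc -> different keys
print(folded_key("Cab"), folded_key("abc"))  # abc abc -> same key
```

Even comparing the naive keys case-insensitively fails here, since `"Cab".lower()` is `"cab"`, not `"abc"`: the order of the letters is already wrong.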

Perl, 59 characters

chop,$_{join'',sort split//,lc}.="$_ "for<>;/ ./&&say for%_

Note that this requires Perl 5.10 (for the say function).
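The Perl answer appends each word plus a space to a hash entry keyed by the word's sorted, lowercased letters, then prints the entries matching `/ ./` (a space followed by another character, i.e. at least two words). A rough Python rendering of that idea, for illustration only (`perl_style_sets` is a hypothetical name; it uses `rstrip("\n")` where Perl's `chop` removes the last character unconditionally):

```python
import re


def perl_style_sets(lines):
    """Python rendering of the Perl answer's approach (illustrative only)."""
    buckets = {}
    for line in lines:
        word = line.rstrip("\n")  # roughly Perl's chop on a newline-terminated line
        key = "".join(sorted(word.lower()))
        buckets[key] = buckets.get(key, "") + word + " "
    # A space followed by any character means at least two words were appended;
    # a single word leaves only a trailing space with nothing after it.
    return [v for v in buckets.values() if re.search(r" .", v)]


print(perl_style_sets(["posh\n", "shop\n", "dent\n"]))
# ['posh shop ']
```

The Perl code loops over the flattened hash, keys and values alike, but assuming no input word contains a space, the keys can never match `/ ./`.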

Haskell, 147 chars

prior sizes: ~~150~~ ~~159~~ chars

import Char
import List
x=sort.map toLower
g&a=g(x a).x
main=interact$unlines.map unwords.filter((>1).length).groupBy((==)&).sortBy(compare&).lines

This version, at 165 chars, satisfies the new, clarified rules:

import Char
import List
y=map toLower
x=sort.y
g&f=(.f).g.f
w[_]="";w a=show a++"\n"
main=interact$concatMap(w.nubBy((==)&y)).groupBy((==)&x).sortBy(compare&x).lines

This version handles:

Words in the input that differ only by case count as one word
The output needs to be one anagram set per line, but extra punctuation is acceptable
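The Haskell answers sort and group instead of using a hash table. The same pipeline can be sketched in Python with `itertools.groupby`, which, like Haskell's `groupBy`, only merges adjacent elements, so the list must first be sorted by the same key. This is an illustrative translation, not the author's code; `sort_and_group` is a hypothetical name.

```python
from itertools import groupby


def anagram_key(word):
    return "".join(sorted(word.lower()))


def sort_and_group(words):
    """Mirror the Haskell pipeline: sortBy key, groupBy key, nubBy case, drop singletons.

    groupby only merges *adjacent* equal keys, which is why sorting comes first.
    """
    result = []
    for _, group in groupby(sorted(words, key=anagram_key), key=anagram_key):
        unique, seen = [], set()
        for w in group:  # like nubBy((==)&y): keep the first spelling of each word
            if w.lower() not in seen:
                seen.add(w.lower())
                unique.append(w)
        if len(unique) > 1:  # like w[_]="": single-word groups produce nothing
            result.append(unique)
    return result


print(sort_and_group(["shop", "dent", "posh", "Shop", "tend"]))
# [['dent', 'tend'], ['shop', 'posh']]
```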

Ruby, 94 characters

h={};(h[$_.upcase.bytes.sort]||=[])<<$_ while gets&&chomp;h.each{|k,v|puts v.join' 'if v.at 1}
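The Ruby answer keys its hash on the sorted byte values of the uppercased word (`$_.upcase.bytes.sort`). Ruby arrays can be used as hash keys directly; Python lists cannot, so a rough equivalent converts the sorted bytes to a tuple. A sketch assuming ASCII-only input; the function names are hypothetical:

```python
from collections import defaultdict


def byte_key(word):
    # Ruby: word.upcase.bytes.sort gives an Array, usable as a Hash key.
    # Python lists are unhashable, so the sorted byte values become a tuple.
    # Assumes ASCII input: encode("ascii") raises on anything else.
    return tuple(sorted(word.upper().encode("ascii")))


def ruby_style_sets(words):
    h = defaultdict(list)  # plays the role of h[key] ||= []
    for w in words:
        h[byte_key(w)].append(w)
    return [v for v in h.values() if len(v) > 1]  # Ruby: if v.at 1


print(ruby_style_sets(["macho", "mocha", "dent"]))
# [['macho', 'mocha']]
```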

Python, 167 characters, includes I/O

import sys
d={}
for l in sys.stdin.readlines():
 l=l[:-1]
 k=''.join(sorted(l.lower()))
 d[k]=d.pop(k,[])+[l]
for k in d:
 if len(d[k])>1: print(' '.join(d[k]))

Without the input code (i.e. assuming the word list is already in a list w), it's only 134 characters:

d={}
for l in w:
 l=l[:-1]
 k=''.join(sorted(l.lower()))
 d[k]=d.pop(k,[])+[l]
for k in d:
 if len(d[k])>1: print(' '.join(d[k]))

AWK - 119

{split(toupper($1),a,"");asort(a);s="";for(i=1;a[i];)s=a[i++]s;x[s]=x[s]$1" "}
END{for(i in x)if(x[i]~/ .* /)print x[i]}

AWK does not have a join function like Python's; otherwise this could have been shorter.

~~It assumes uppercase and lowercase as different.~~

C++, 542 chars

#include <iostream>
#include <map>
#include <vector>
#include <boost/algorithm/string.hpp>
#define ci const_iterator
int main(){using namespace std;typedef string s;typedef vector<s> vs;vs l;
copy(istream_iterator<s>(cin),istream_iterator<s>(),back_inserter(l));map<s, vs> r;
for (vs::ci i=l.begin(),e=l.end();i!=e;++i){s a=boost::to_lower_copy(*i);
sort(a.begin(),a.end());r[a].push_back(*i);}for (map<s,vs>::ci i=r.begin(),e=r.end();
i!=e;++i)if(i->second.size()>1)*copy(i->second.begin(),i->second.end(),
ostream_iterator<s>(cout," "))="\n";}

Python, O(n^2)

import sys
words=sys.stdin.readlines()
def s(x):return sorted(x.lower())
print('\n'.join([''.join([a.replace('\n',' ') for a in words if(s(a)==s(w))]) for w in words]))
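The answer above compares every word with every other word, hence O(n^2). As written, it also prints words that have no anagrams and repeats each set once per member. A sketch of the same pairwise idea that follows the clarified rules (still quadratic; `same_letters` and `pairwise_sets` are hypothetical names):

```python
def same_letters(a, b):
    return sorted(a.lower()) == sorted(b.lower())


def pairwise_sets(words):
    """Quadratic grouping: each word is compared with the groups found so far."""
    groups = []
    # dict.fromkeys removes duplicates (after lowercasing) while keeping order.
    for w in dict.fromkeys(x.strip().lower() for x in words):
        if not w:
            continue
        for g in groups:  # linear scan over existing groups: O(n) per word
            if same_letters(g[0], w):
                g.append(w)
                break
        else:  # no matching group found
            groups.append([w])
    return [g for g in groups if len(g) > 1]


print(pairwise_sets(["posh", "Shop", "dent", "shop"]))
# [['posh', 'shop']]
```

Most of the other answers avoid the pairwise comparisons by hashing each word's sorted letters.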
