Finding out what characters a given font supports

2023-01-31 02:05 问答作者：

How do I extract the list of s开发者_开发知识库upported Unicode characters from a TrueType or embedded OpenType font on Linux?

Is there a tool or a library I can use to process a .ttf or a .eot file and build a list of code points (like U+0123, U+1234, etc.) provided by the font?

Here is a method using the fontTools Python library (which you can install with something like pip install fonttools):

#!/usr/bin/env python
from itertools import chain
import sys

from fontTools.ttLib import TTFont
from fontTools.unicode import Unicode

with TTFont(
    sys.argv[1], 0, allowVID=0, ignoreDecompileErrors=True, fontNumber=-1
) as ttf:
    chars = chain.from_iterable(
        [y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables
    )
    if len(sys.argv) == 2:  # print all code points
        for c in chars:
            print(c)
    elif len(sys.argv) >= 3:  # search code points / characters
        code_points = {c[0] for c in chars}
        for i in sys.argv[2:]:
            code_point = int(i)   # search code point
            #code_point = ord(i)  # search character
            print(Unicode[code_point])
            print(code_point in code_points)

The script takes as arguments the font path and optionally code points / characters to search for:

$ python checkfont.py /usr/share/fonts/**/DejaVuSans.ttf
(32, 'space', 'SPACE')
(33, 'exclam', 'EXCLAMATION MARK')
(34, 'quotedbl', 'QUOTATION MARK')
…

$ python checkfont.py /usr/share/fonts/**/DejaVuSans.ttf 65 12622  # a ㅎ
LATIN CAPITAL LETTER A
True
HANGUL LETTER HIEUH
False

The X program xfd can do this. To see all characters for the "DejaVu Sans Mono" font, run:

xfd -fa "DejaVu Sans Mono"

It's included in the x11-utils package on Debian/Ubuntu, xorg-x11-apps on Fedora/RHEL, and xorg-xfd on Arch Linux.

The fontconfig commands can output the glyph list as a compact list of ranges, eg:

$ fc-match --format='%{charset}\n' OpenSans
20-7e a0-17f 192 1a0-1a1 1af-1b0 1f0 1fa-1ff 218-21b 237 2bc 2c6-2c7 2c9
2d8-2dd 2f3 300-301 303 309 30f 323 384-38a 38c 38e-3a1 3a3-3ce 3d1-3d2 3d6
400-486 488-513 1e00-1e01 1e3e-1e3f 1e80-1e85 1ea0-1ef9 1f4d 2000-200b
2013-2015 2017-201e 2020-2022 2026 2030 2032-2033 2039-203a 203c 2044 2070
2074-2079 207f 20a3-20a4 20a7 20ab-20ac 2105 2113 2116 2120 2122 2126 212e
215b-215e 2202 2206 220f 2211-2212 221a 221e 222b 2248 2260 2264-2265 25ca
fb00-fb04 feff fffc-fffd

Use fc-query for a .ttf file and fc-match for an installed font name.

This likely doesn't involve installing any extra packages, and doesn't involve translating a bitmap.

Use fc-match --format='%{file}\n' to check whether the right font is being matched.

fc-query my-font.ttf will give you a map of supported glyphs and all the locales the font is appropriate for according to fontconfig

Since pretty much all modern linux apps are fontconfig-based this is much more useful than a raw unicode list

The actual output format is discussed here http://lists.freedesktop.org/archives/fontconfig/2013-September/004915.html

Here is a ~~POSIX~~[1] shell script that can print the code point and the character in a nice and easy way with the help of fc-match which is mentioned in Neil Mayhew's answer (it can even handle up to 8-hex-digit Unicode):

#!/bin/bash
for range in $(fc-match --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        n_hex=$(printf "%04x" "$n")
        # using \U for 5-hex-digits
        printf "%-5s\U$n_hex\t" "$n_hex"
        count=$((count + 1))
        if [ $((count % 10)) = 0 ]; then
            printf "\n"
        fi
    done
done
printf "\n"

You can pass the font name or anything that fc-match accepts:

$ ls-chars "DejaVu Sans"

Updated content:

I learned that subshell is very time consuming (the printf subshell in my script). So I managed to write a improved version that is 5-10 times faster!

#!/bin/bash
for range in $(fc-match --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        printf "%04x\n" "$n"
    done
done | while read -r n_hex; do
    count=$((count + 1))
    printf "%-5s\U$n_hex\t" "$n_hex"
    [ $((count % 10)) = 0 ] && printf "\n"
done
printf "\n"

Old version:

$ time ls-chars "DejaVu Sans" | wc
    592   11269   52740

real    0m2.876s
user    0m2.203s
sys     0m0.888s

New version (the line number indicates 5910+ characters, in 0.4 seconds!):

$ time ls-chars "DejaVu Sans" | wc
    592   11269   52740

real    0m0.399s
user    0m0.446s
sys     0m0.120s

End of update

Sample output (it aligns better in my st terminal

继续阅读：fonts opentype truetype

Finding out what characters a given font supports

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？