What is the complete range of Chinese characters in Unicode?
U+4E00..U+9FFF is part of the full set, but not all of it.
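You can find the full list through the CJK Unicode FAQ (which covers "Chinese, Japanese, and Korean" characters).
The "East Asian Script" document mentions:
Blocks Containing Han Ideographs
Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2.
Table 12-2. Blocks Containing Han Ideographs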
Block | Range | Comment
CJK Unified Ideographs | 4E00-9FFF | Common
CJK Unified Ideographs Extension A | 3400-4DBF | Rare
CJK Unified Ideographs Extension B | 20000-2A6DF | Rare, historic
CJK Unified Ideographs Extension C | 2A700-2B73F | Rare, historic
CJK Unified Ideographs Extension D | 2B740-2B81F | Uncommon, some in current use
CJK Unified Ideographs Extension E | 2B820-2CEAF | Rare, historic
CJK Compatibility Ideographs | F900-FAFF | Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement | 2F800-2FA1F | Unifiable variants
Note: block ranges can evolve over time. The latest information is in CJK Unified Ideographs.
See Wikipedia:
- CJK Unified Ideographs Extension A
- CJK Unified Ideographs Extension B
- CJK Unified Ideographs Extension C
- CJK Unified Ideographs Extension D
- CJK Unified Ideographs Extension E
- CJK Unified Ideographs Extension F (Unicode 10)
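As a rough illustration, Table 12-2 can be turned into a lookup that reports which Han block a character falls in. This is a minimal JavaScript sketch, not a complete classifier: the helper name `hanBlockOf` is hypothetical, and the ranges are copied from the table, so they will drift as the blocks evolve.

```javascript
// Ranges from Table 12-2 (inclusive). They reflect that table and may be
// out of date for newer Unicode versions.
const HAN_BLOCKS = [
  { name: "CJK Unified Ideographs", start: 0x4e00, end: 0x9fff },
  { name: "CJK Unified Ideographs Extension A", start: 0x3400, end: 0x4dbf },
  { name: "CJK Unified Ideographs Extension B", start: 0x20000, end: 0x2a6df },
  { name: "CJK Unified Ideographs Extension C", start: 0x2a700, end: 0x2b73f },
  { name: "CJK Unified Ideographs Extension D", start: 0x2b740, end: 0x2b81f },
  { name: "CJK Unified Ideographs Extension E", start: 0x2b820, end: 0x2ceaf },
  { name: "CJK Compatibility Ideographs", start: 0xf900, end: 0xfaff },
  { name: "CJK Compatibility Ideographs Supplement", start: 0x2f800, end: 0x2fa1f },
];

// Hypothetical helper: returns the block name for the first code point of
// `ch`, or null if it lies outside every range in the table.
function hanBlockOf(ch) {
  const cp = ch.codePointAt(0); // reads a full code point, even above U+FFFF
  const block = HAN_BLOCKS.find((b) => cp >= b.start && cp <= b.end);
  return block ? block.name : null;
}

console.log(hanBlockOf("中")); // CJK Unified Ideographs
console.log(hanBlockOf("\u{20000}")); // CJK Unified Ideographs Extension B
console.log(hanBlockOf("A")); // null
```

Using `codePointAt` rather than `charCodeAt` matters here: the extension blocks live outside the Basic Multilingual Plane, where a single character is two UTF-16 code units.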
There are currently 74,605 CJK characters in Unicode (the sum of the block counts below). CJK characters include not only characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.
1) 20,941 characters in the CJK Unified Ideographs block.
Code points U+4E00 to U+9FCC.
2) 6,582 characters in the CJKUI Ext A block.
Code points U+3400 to U+4DB5. Unicode 3.0 (1999).
3) 42,711 characters in the CJKUI Ext B block.
Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).
- U+20000-U+215FF
- U+21600-U+230FF
- U+23100-U+245FF
- U+24600-U+260FF
- U+26100-U+275FF
- U+27600-U+290FF
- U+29100-U+2A6DF
4) 4,149 characters in the CJKUI Ext C block.
Code points U+2A700 to U+2B734. Unicode 5.2 (2009).
5) 222 characters in the CJKUI Ext D block.
Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).
6) The CJKUI Ext E block.
If the above isn't spaghetti enough, take a look at the known issues. Have fun =)
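Each per-block count above is just `end - start + 1` for its inclusive range, and together they add up to the 74,605 total. A small JavaScript check of that arithmetic, using the ranges exactly as listed (Ext E is left out because no count is given for it):

```javascript
// Inclusive code point ranges, as listed in the answer above.
const ranges = [
  ["CJK Unified Ideographs", 0x4e00, 0x9fcc],
  ["Ext A", 0x3400, 0x4db5],
  ["Ext B", 0x20000, 0x2a6d6],
  ["Ext C", 0x2a700, 0x2b734],
  ["Ext D", 0x2b740, 0x2b81d],
];

// Number of code points in an inclusive range [start, end].
const size = ([, start, end]) => end - start + 1;

for (const r of ranges) {
  console.log(`${r[0]}: ${size(r)}`);
}

const total = ranges.reduce((sum, r) => sum + size(r), 0);
console.log(`Total: ${total}`); // Total: 74605
```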
The exact range for Chinese characters (excluding the extensions) is [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD].
CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideograph collections organized by radical-stroke.
Kanbun is a Unicode block containing annotation characters used in Japanese copies of classical Chinese texts, to indicate reading order.
CJK Unified Ideographs Extension-A is a Unicode block containing rare Han ideographs.
CJK Unified Ideographs is a Unicode block containing the most common CJK ideographs used in modern Chinese and Japanese.
CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings.
For details, see here; the extensions are covered in other answers.
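Used as a JavaScript regular expression, that character class can test whether a string contains any character from those ranges. A minimal sketch; the helper name `containsCjk` is made up for illustration. Because the class uses only `\uXXXX` escapes and no `u` flag, it covers the Basic Multilingual Plane only, so Extension B and later characters never match.

```javascript
// Character class from the answer above (BMP ranges only).
const CJK_BMP = /[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD]/;

// Hypothetical helper: true if any character of `text` falls in the class.
function containsCjk(text) {
  return CJK_BMP.test(text);
}

console.log(containsCjk("Hello 世界")); // true
console.log(containsCjk("Hello world")); // false
```

Without the `u` flag an astral character such as U+20000 is seen as two surrogate code units, and surrogates (U+D800-U+DFFF) are outside every range in the class.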
Unicode version 11.0.0
In Unicode, the Chinese, Japanese and Korean (CJK) scripts share a common background and are collectively known as CJK characters.
These ranges often contain unassigned or reserved code points (such as U+2E9A and U+2EF4-U+2EFF).
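One possible way to skip such holes in JavaScript is the Unicode property escape `\p{Cn}` (General_Category = Unassigned) in a regular expression with the `u` flag. Treat this as a sketch rather than a definitive check: the result depends on the Unicode version bundled with the JavaScript engine, and `isAssigned` is a hypothetical helper name.

```javascript
// \p{Cn} matches code points whose General_Category is Unassigned,
// according to the Unicode data built into the JS engine.
const UNASSIGNED = /^\p{Cn}$/u;

// Hypothetical helper: true if the code point is assigned in this engine's Unicode data.
function isAssigned(cp) {
  return !UNASSIGNED.test(String.fromCodePoint(cp));
}

// Collect the unassigned code points in CJK Radicals Supplement (U+2E80-U+2EFF).
const holes = [];
for (let cp = 0x2e80; cp <= 0x2eff; cp++) {
  if (!isAssigned(cp)) holes.push(cp.toString(16).toUpperCase());
}
console.log(holes.join(" "));
```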
Chinese characters
Bottom | Top | Reference (also see the Wikipedia page) | Block name
4E00 | 9FEF | http://www.unicode.org/charts/PDF/U4E00.pdf | CJK Unified Ideographs
3400 | 4DBF | http://www.unicode.org/charts/PDF/U3400.pdf | CJK Unified Ideographs Extension A
20000 | 2A6DF | http://www.unicode.org/charts/PDF/U20000.pdf | CJK Unified Ideographs Extension B
2A700 | 2B73F | http://www.unicode.org/charts/PDF/U2A700.pdf | CJK Unified Ideographs Extension C
2B740 | 2B81F | http://www.unicode.org/charts/PDF/U2B740.pdf | CJK Unified Ideographs Extension D
2B820 | 2CEAF | http://www.unicode.org/charts/PDF/U2B820.pdf | CJK Unified Ideographs Extension E
2CEB0 | 2EBEF | https://www.unicode.org/charts/PDF/U2CEB0.pdf | CJK Unified Ideographs Extension F
3007 | 3007 | https://zh.wiktionary.org/wiki/%E3%80%87 | (in block CJK Symbols and Punctuation)
- In the CJK Unified Ideographs block, I notice many answers use 9FCC as the upper bound, but U+9FCD (鿍) is indeed a Chinese character. All characters in this block are Chinese characters (also used in Japanese, Korean, etc.).
- Most characters in the CJK Unified Ideographs extensions are traditional Chinese characters, which are rarely used in China. Extension F is the exception: only about 17% of its characters are Chinese characters.
- 〇 (U+3007) is the Chinese character form of zero and is still in use today.
Therefore the ranges are:
[0x3007,0x3007],[0x3400,0x4DBF],[0x4E00,0x9FEF],[0x20000,0x2EBFF]
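Those four ranges translate directly into a code point check. The JavaScript sketch below is illustrative only (`isChineseChar` and `allChinese` are hypothetical names). It iterates with `for...of`, which walks a string by code point rather than by UTF-16 unit, so Extension B-F characters are tested correctly. The last range is kept exactly as given above, as one span over Extensions B-F, so it also admits the unassigned code points between those blocks.

```javascript
// Ranges from the answer above (Unicode 11.0), inclusive.
const CHINESE_RANGES = [
  [0x3007, 0x3007], // 〇, the Chinese character form of zero
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fef], // CJK Unified Ideographs
  [0x20000, 0x2ebff], // Extensions B-F as one span, bound as given above
];

// Hypothetical helper: true if the single character `ch` is in one of the ranges.
function isChineseChar(ch) {
  const cp = ch.codePointAt(0);
  return CHINESE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Hypothetical helper: true if `text` is non-empty and every code point passes.
function allChinese(text) {
  for (const ch of text) {
    if (!isChineseChar(ch)) return false;
  }
  return text.length > 0;
}

console.log(isChineseChar("\u9FCD")); // true
console.log(isChineseChar("\u{20000}")); // true
console.log(allChinese("中文")); // true
console.log(allChinese("中文!")); // false
```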
CJK characters never used in Chinese
These are Han characters that exist only for compatibility.
You will almost never see them in Chinese books, articles, or other writing.
Almost every character here has a corresponding glyph-identical Chinese character. For example, 金 (U+F90A) and 金 (U+91D1) are identical in glyph.
F900 | FAFF | https://www.unicode.org/charts/PDF/UF900.pdf | CJK Compatibility Ideographs
2F800 | 2FA1F | https://www.unicode.org/charts/PDF/U2F800.pdf | CJK Compatibility Ideographs Supplement
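Because compatibility ideographs such as U+F90A carry canonical decompositions to their unified counterparts, Unicode normalization (NFC or NFD) replaces them. A short JavaScript illustration using the built-in `String.prototype.normalize` with the 金 pair from above:

```javascript
const compat = "\uF90A"; // CJK COMPATIBILITY IDEOGRAPH-F90A
const unified = "\u91D1"; // 金 in CJK Unified Ideographs

// The two strings compare unequal even though they render the same...
console.log(compat === unified); // false

// ...but NFC maps the compatibility ideograph to its unified counterpart.
const normalized = compat.normalize("NFC");
console.log(normalized === unified); // true
console.log(normalized.codePointAt(0).toString(16)); // 91d1
```

Normalizing input this way before a range check lets the unified-ideograph ranges handle these characters instead of listing the compatibility blocks separately.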
CJK-related symbols
2E80 | 2EFF | http://www.unicode.org/charts/PDF/U2E80.pdf | CJK Radicals Supplement
2F00 | 2FDF | http://www.unicode.org/charts/PDF/U2F00.pdf | Kangxi Radicals
2FF0 | 2FFF | https://unicode.org/charts/PDF/U2FF0.pdf | Ideographic Description Characters
3000 | 303F | https://www.unicode.org/charts/PDF/U3000.pdf | CJK Symbols and Punctuation
3100 | 312F | https://unicode.org/charts/PDF/U3100.pdf | Bopomofo
31A0 | 31BF | https://unicode.org/charts/PDF/U31A0.pdf | Bopomofo Extended
31C0 | 31EF | http://www.unicode.org/charts/PDF/U31C0.pdf | CJK Strokes
3200 | 32FF | https://unicode.org/charts/PDF/U3200.pdf | Enclosed CJK Letters and Months
3300 | 33FF | https://unicode.org/charts/PDF/U3300.pdf | CJK Compatibility
FE30 | FE4F | https://www.unicode.org/charts/PDF/UFE30.pdf | CJK Compatibility Forms
FF00 | FFEF | https://www.unicode.org/charts/PDF/UFF00.pdf | Halfwidth and Fullwidth Forms
1F200 | 1F2FF | https://www.unicode.org/charts/PDF/U1F200.pdf | Enclosed Ideographic Supplement
- Some blocks, such as Hangul Compatibility Jamo, are left out because they have no relation to Chinese.
- Kangxi Radicals and similar radical forms are not Chinese characters: they are graphical components of Chinese characters, used specifically to represent radicals, e.g. ⼻ (U+2F3B) vs. 彳 (U+5F73), and ⻜ (U+2EDC) vs. 飞 (U+98DE).
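The link between a Kangxi radical and its ideograph is recorded as a compatibility decomposition, so NFKC (but not NFC) folds one into the other. Most characters in CJK Radicals Supplement, including ⻜ (U+2EDC), have no such mapping and are left unchanged. A minimal JavaScript check using the pairs above:

```javascript
const radical = "\u2F3B"; // ⼻ KANGXI RADICAL STEP
const ideograph = "\u5F73"; // 彳

// NFC applies only canonical mappings, so the radical is left alone.
console.log(radical.normalize("NFC") === radical); // true

// NFKC also applies compatibility mappings and yields the ideograph.
console.log(radical.normalize("NFKC") === ideograph); // true

// U+2EDC has no decomposition, so even NFKC keeps it as-is.
console.log("\u2EDC".normalize("NFKC") === "\u2EDC"); // true
```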
Other common punctuation that appears in Chinese
This is a wide range: some of this punctuation may never be used, while some, such as …… and “”, is used heavily in Chinese.
0000 | 007F | https://unicode.org/charts/PDF/U0000.pdf | C0 Controls and Basic Latin
2000 | 206F | https://unicode.org/charts/PDF/U2000.pdf | General Punctuation
……
There are also many Chinese-related symbols, such as Yijing Hexagram Symbols or Kanbun, but they are off-topic here. I list the non-Chinese characters in CJK to better explain what counts as a Chinese character. The ranges above already cover almost all characters that appear in Chinese writing, except math and other specialized notation.
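As a rough way to test that claim against real text, the JavaScript sketch below reports the characters of a string that fall outside every range in the tables above (Chinese characters, CJK-related symbols, and punctuation). `unexpectedChars` is a hypothetical name, and the elided "……" rows of the punctuation table are not filled in.

```javascript
// Inclusive ranges taken from the tables above.
const ALLOWED = [
  [0x0000, 0x007f], // C0 Controls and Basic Latin
  [0x2000, 0x206f], // General Punctuation (e.g. … and “ ”)
  [0x2e80, 0x2eff], // CJK Radicals Supplement
  [0x2f00, 0x2fdf], // Kangxi Radicals
  [0x2ff0, 0x2fff], // Ideographic Description Characters
  [0x3000, 0x303f], // CJK Symbols and Punctuation (includes 〇)
  [0x3100, 0x312f], // Bopomofo
  [0x31a0, 0x31bf], // Bopomofo Extended
  [0x31c0, 0x31ef], // CJK Strokes
  [0x3200, 0x32ff], // Enclosed CJK Letters and Months
  [0x3300, 0x33ff], // CJK Compatibility
  [0x3400, 0x4dbf], // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fef], // CJK Unified Ideographs
  [0xf900, 0xfaff], // CJK Compatibility Ideographs
  [0xfe30, 0xfe4f], // CJK Compatibility Forms
  [0xff00, 0xffef], // Halfwidth and Fullwidth Forms
  [0x1f200, 0x1f2ff], // Enclosed Ideographic Supplement
  [0x20000, 0x2ebef], // CJK Unified Ideographs Extensions B-F
  [0x2f800, 0x2fa1f], // CJK Compatibility Ideographs Supplement
];

// Hypothetical helper: returns the characters of `text` outside every range.
function unexpectedChars(text) {
  const out = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (!ALLOWED.some(([lo, hi]) => cp >= lo && cp <= hi)) out.push(ch);
  }
  return out;
}

console.log(unexpectedChars("你好，世界……").length); // 0
console.log(unexpectedChars("x² + y²").join(" ")); // ² ²
```

The superscript ² (U+00B2, Latin-1 Supplement) is flagged, which matches the answer's caveat that math notation is not covered.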
Supplementary
CJK Symbols and Punctuation
、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽 〾 〿
Halfwidth and Fullwidth Forms
！＂＃＄％＆＇（）＊＋，－．／０１２３４５６７８９：；＜＝＞？＠ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ［＼］＾＿｀ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ｛｜｝～｟｠｡｢｣､･ｦｧｨｩｪｫｬｭｮｯｰｱｲｳｴｵｶｷｸｹｺｻｼｽｾｿﾀﾁﾂﾃﾄﾅﾆﾇﾈﾉﾊﾋﾌﾍﾎﾏﾐﾑﾒﾓﾔﾕﾖﾗﾘﾙﾚﾛﾜﾝﾞﾟﾡﾢﾣﾤﾥﾦﾧﾨﾩﾪﾫﾬﾭﾮﾯﾰﾱﾲﾳﾴﾵﾶﾷﾸﾹﾺﾻﾼﾽﾾￂￃￄￅￆￇￊￋￌￍￎￏￒￓￔￕￖￗￚￛￜ￠￡￢￣￤￥￦￨￩￪￫￬￭￮
References
- https://zh.wikipedia.org/wiki/%E6%B1%89%E5%AD%97 (in Chinese; see the sidebar on the right)
- https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%9B%B8%E5%AE%B9%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97 (see the table at the bottom)
- http://www.unicode.org
The Unicode code blocks that the other answers gave certainly cover most of the Chinese Unicode characters, but check out some of these other code blocks, too.
CJK_UNIFIED_IDEOGRAPHS
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
CJK_RADICALS_SUPPLEMENT
CJK_STROKES
CJK_SYMBOLS_AND_PUNCTUATION
ENCLOSED_CJK_LETTERS_AND_MONTHS
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT
KANGXI_RADICALS
IDEOGRAPHIC_DESCRIPTION_CHARACTERS
See my fuller discussion here. And this site is convenient for browsing Unicode.
To summarize, these appear to be the ranges:
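// Inclusive [start, end] code point ranges collected from the other answers.
// Some entries overlap: Extension A and the main block each appear twice.
const blocks = [
  // CJK Unified Ideographs Extension A
  [0x3400, 0x4DB5],
  // CJK Unified Ideographs, split into four chunks
  [0x4E00, 0x62FF],
  [0x6300, 0x77FF],
  [0x7800, 0x8CFF],
  [0x8D00, 0x9FCC],
  // CJK Radicals Supplement and Kangxi Radicals
  [0x2E80, 0x2FD5],
  // Kanbun
  [0x3190, 0x319F],
  // Extension A and the main block again, each as a single range
  [0x3400, 0x4DBF],
  [0x4E00, 0x9FCC],
  // CJK Compatibility Ideographs
  [0xF900, 0xFAAD],
  // CJK Unified Ideographs Extension B, in seven chunks
  [0x20000, 0x215FF],
  [0x21600, 0x230FF],
  [0x23100, 0x245FF],
  [0x24600, 0x260FF],
  [0x26100, 0x275FF],
  [0x27600, 0x290FF],
  [0x29100, 0x2A6DF],
  // CJK Unified Ideographs Extension C
  [0x2A700, 0x2B734],
  // CJK Unified Ideographs Extension D
  [0x2B740, 0x2B81D],
];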
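The summary list mixes overlapping and duplicate ranges (Extension A and the main block each appear twice). A hedged JavaScript sketch that merges such a list into sorted, disjoint intervals and then tests membership with a binary search; `mergeRanges` and `inRanges` are hypothetical names, and the list is repeated inside the block so it runs on its own.

```javascript
// A copy of the summary list: inclusive [start, end] pairs, some overlapping.
const blocks = [
  [0x3400, 0x4db5], [0x4e00, 0x62ff], [0x6300, 0x77ff], [0x7800, 0x8cff],
  [0x8d00, 0x9fcc], [0x2e80, 0x2fd5], [0x3190, 0x319f], [0x3400, 0x4dbf],
  [0x4e00, 0x9fcc], [0xf900, 0xfaad], [0x20000, 0x215ff], [0x21600, 0x230ff],
  [0x23100, 0x245ff], [0x24600, 0x260ff], [0x26100, 0x275ff], [0x27600, 0x290ff],
  [0x29100, 0x2a6df], [0x2a700, 0x2b734], [0x2b740, 0x2b81d],
];

// Sort by start, then fold each range into the previous one when they
// overlap or touch (next start <= previous end + 1).
function mergeRanges(ranges) {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [start, end] of sorted) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1] + 1) {
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]); // fresh array, so the input is not mutated
    }
  }
  return merged;
}

// Binary search over sorted, disjoint ranges.
function inRanges(merged, cp) {
  let lo = 0;
  let hi = merged.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [start, end] = merged[mid];
    if (cp < start) hi = mid - 1;
    else if (cp > end) lo = mid + 1;
    else return true;
  }
  return false;
}

const merged = mergeRanges(blocks);
console.log(merged.length); // 8
console.log(inRanges(merged, "字".codePointAt(0))); // true
```

Merging first keeps the lookup at O(log n) and makes the gaps explicit: U+2A6E0-U+2A6FF, between Extensions B and C, stays outside every merged range.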