
There are no spaces between words in Chinese or Japanese.

Pressing space confirms the current selection in the Japanese IME, which is expected behavior. Where some Linux implementations get it wrong is that they also insert a literal space after the word, so the user has to select the desired word in the IME with the space bar and then delete the erroneously inserted space.

Edit: Correction based on feedback below. Previously stated that Hangul does not have spaces.



No, Korean is normally written with spaces between words nowadays (perhaps that wasn't always the case?).

https://www.omniglot.com/writing/korean.htm


> There are no spaces between words in Hangul

Wrong, there are spaces between words in Korean. It’s Japanese and Chinese that don’t have them. And in Vietnamese there are spaces between all syllables, even within words.


Thanks for the fact check.

Not sure your comment on Vietnamese is accurate though. I work in a company with ~35% native Vietnamese speakers and I’ve seen plenty of multi-syllable words.

Are you talking about traditional Vietnamese (when it still used Chinese characters) or modern Vietnamese (post-French-colonialism) which uses the Latin alphabet with accents?


It is accurate: there are spaces between syllables in modern written Vietnamese, except in foreign words. A syllable can have as many as 7 characters, and you need an IME to type the tone marks. The written language looks like this: https://vi.wikipedia.org/wiki/Vi%E1%BB%87t_Nam
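
To make the spacing concrete, here is a minimal Python sketch (mine, not something from the linked page). It just splits on whitespace and treats each piece as one syllable. The helper name `syllables` and the sample strings are made up for illustration, and it assumes the text has no foreign words or punctuation.

```python
import unicodedata

def syllables(text):
    """Split Vietnamese text into whitespace-separated syllables.

    Hypothetical helper for illustration. NFC normalization makes each
    toned vowel a single code point, so len() counts visible letters.
    """
    return unicodedata.normalize("NFC", text).split()

# "thành phố" (city) is one word, but it is written as two syllables.
print(syllables("thành phố Hồ Chí Minh"))

# "nghiêng" is an example of a seven-letter syllable.
print(len(syllables("nghiêng")[0]))
```

Without the NFC step, a tone mark typed as a separate combining character would be counted as an extra character.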


Wow, you’re right.

I never noticed (even while studying the basics myself) that syllables are space-separated.

I always saw the words (e.g. “thanh pho” = city - don’t have the keyboard on my phone) as independent units. Didn’t even recognize the spaces.

Amazing how something can be right in front of you without noticing it.

So much makes sense now. Thanks again.


> you need an IME to type the tone marks

The standard Vietnamese keyboard layout works without an IME layer. However, apparently most people who write Vietnamese _prefer_ to use an IME.


Vietnamese does not really belong in the CJK group because it's written with the Latin alphabet.


That was not my point. I mentioned Vietnamese because the spacing it uses is interesting.

Also, like dfcowell said, Vietnamese used to be written with Hán and chữ Nôm characters (respectively, Chinese characters and Chinese-like characters created by the Vietnamese), a lot of which are encoded in Unicode. Hence the existence of the CJKV acronym.


The acronym is actually CJKV, to cover historical Vietnamese.

From the Unicode spec:

> Although the term “CJK”—Chinese, Japanese, and Korean is used throughout this text to describe the languages that currently use Han ideographic characters, it should be noted that earlier Vietnamese writing systems were based on Han ideographs. Consequently, the term “CJKV” would be more accurate in a historical sense. Han ideographs are still used for historical, religious, and pedagogical purposes in Vietnam.
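
You can see this from Python too. A small sketch, assuming U+21A38 is one of the chữ Nôm characters (as far as I know it sits in CJK Unified Ideographs Extension B); the choice of both code points is mine:

```python
import unicodedata

# U+5B57 is an everyday Han character; U+21A38 is assumed here to be
# a chữ Nôm character encoded in CJK Unified Ideographs Extension B.
for ch in ["\u5b57", "\U00021A38"]:
    # Print the code point and its official Unicode character name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Both come back with "CJK UNIFIED IDEOGRAPH" names, i.e. Unicode files them in the same unified Han repertoire regardless of which language they were created for.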


Traditional Vietnamese (before French colonialism) used Chinese characters.



