Nôm :  Diễn Đàn Viện Việt Học
Discussion forum for chữ Nôm topics.
Non-chinese-rooted word frequency survey
Posted by: Nguyen Anh Tuan (---.nrockv01.md.comcast.net)
Date: November 20, 2004 07:34AM
Has there been any survey of non-Chinese-rooted word frequency? I believe this would help greatly in Nom studies... If there were one, it would let us isolate the most commonly used non-Chinese-rooted words and so figure out which Nom words are most important to learn. As many of us know Han already, I don't believe we need to include Chinese-rooted words. How many words to include to make it effective is one problem, though. There are also words about which there is some confusion, such as chữ and tự(), where some sources cite chữ as being Chinese-rooted (and in fact it is phonetically closer than tự)... well, my words may seem a jumble, but any input is appreciated.

Re: Non-chinese-rooted word frequency survey
Posted by: Toan (---.246.dial.accessv.com)
Date: November 20, 2004 10:04AM
Hi Nguyễn Anh Tuấn,

If you wrote in Vietnamese I could answer more briefly, but since you write in English to ask about Hán characters, Hán Việt, Nôm, Hán-rooted Nôm and pure Nôm, and then about a survey of the proportion of pure Nôm in Vietnamese, I don't know how best to respond! One could talk for a year and still not finish!

I assume when you said 'Nom' or 'Nom word', you meant 'Nom characters' but not the 'sound' of it.
>Has there been any survey done on non-chinese-root word frequency ?
I assume you meant "ratio" instead of "frequency", because "ratio" is relatively easier to quantify (even though there is not yet a concrete number for the ratio of non-Chinese-rooted words* in Vietnamese). "Frequency" of non-Chinese-rooted words, on the other hand, depends on who writes the sentence or literature: the writer can choose to weigh more Hán Việt words or more non-Chinese-rooted words in his/her writing WITHOUT sacrificing the meaning at all.
* Reason: there is still research under way to clarify which of the words we regard as Nôm nowadays are actually Chinese-rooted even though they sound exactly like Nôm (please see the topic "Nôm Nho đồng nghĩa" for a sampling).

If you meant "ratio", my quick answer is in the range of 10-40% non-Chinese (other people may have a different number). The 40% Nôm figure assumes we can easily exclude identifiable Hán Việt words using a Hán Việt dictionary. However, how much of this 10-40% sounds like Nôm but is actually Chinese-rooted is still unknown.

>I believe this would help greatly in Nom studies
It is not as simple as it appears, my friend. When you study Nôm, you cannot avoid Hán or you will not be able to express yourself (others may tell you otherwise, but believe me, taking out Hán means you take out more than 60% of your overall expression, whether it is daily conversation or writing an essay, a report, a novel, literature, etc.)!

>isolate the most commonly used non-chinese-rooted words in order to figure out which Nom words are most important to learn.
You may pick the Nôm to learn, but you will learn less than 40% of the language (this 40%, however, depends on how one weighs/emphasizes on Nôm).
It's relatively easier to explain by using analogy. If you pick out all Latin & Greek based from English, can you effectively make a simple sentence (just use my response as an example, pick out all Latin/Greek based words if you know how to do it. Tips: If you know French, then it helps).

>many of us know Han already
I assume you meant Hán Việt. Hán means Hán characters; Hán Việt means "Hán characters" but pronounced in Vietnamese and written in Quốc Ngữ nowadays.

>many of us know Han already, I don't believe we need to include chinese-rooted words.
If you know how to pick out Hán Việt, then try to pick out Hán Việt and see if you can communicate effectively without Hán Việt. Open up a Vietnamese newspaper, pick out all the Hán Việt and see what's left in the newspaper. You can verify by yourself !

>How many words to include to make it effective is one problem though
Answer: you can't communicate effectively. Note: communication means everything, including writing essays, reports, literature, novels, government documents, laws, medical journals, economic journals, artistic papers, history, research, etc.

>There are also words about which there is some confusion, such as chữ and tự(), where some sources cite chữ as being Chinese-rooted (and in fact it is phonetically closer than tự)...
It is likely the case ! "Chữ"(Nôm), "Tự"(Hán Việt), "Zi4"(Pinyin), "Chi"(Cantonese).
This is a big research topic though !

I hope I have given you some ideas. If you or others think otherwise, try to verify by picking out all the Hán Việt and see for yourself, but first... look at your name (no offense). Please don't be upset!

Regards
Toàn

Re: Non-chinese-rooted word frequency survey
Posted by: Toan (---.246.dial.accessv.com)
Date: November 20, 2004 10:49AM
Hi Nguyễn Anh Tuấn,

The articles I saw on the web cited 40% non Hán in one case and 10% non Hán in the other case which I think stretch too much (the 10% non Hán), but I gave you both numbers.

Since language is a living thing, my rule of thumb is as follows:
If the content is more like daily casual conversation, I will "swing" the 40% non Hán to a bigger number, i.e. over 40%, to whatever you think it is!
If the content used is more "profound", then I will "swing" the percentage the other way from the "peak daily casual conversation", and nobody knows what that % is because there is no official survey !

Or you may want to use 50% as the mid range, then swing back & forth !

Regards
Toàn

Re: Non-chinese-rooted word frequency survey
Posted by: nnt (---.w82-121.abo.wanadoo.fr)
Date: November 20, 2004 11:27AM
In a nutshell:
excluding Chinese-rooted words means you cannot speak/write Vietnamese at all (except perhaps baby talk)

Re: Non-chinese-rooted word frequency survey
Posted by: Nguyen Anh Tuan (---.nrockv01.md.comcast.net)
Date: November 20, 2004 02:53PM
Your comments are greatly appreciated, but I think you misunderstood. By word frequency I actually meant how frequently the words are used... like "what are 500 of the most commonly used Viet-rooted words?" The Chinese, Japanese, and Taiwanese have done a lot of these. I know that Vietnamese is unintelligible without Han loanwords, but I was hoping for a list of (possibly 500 or so) Vietnamese words ordered by frequency of usage, so that I could figure out which words are the most commonly used and get some idea of which words are important to study. I wasn't describing or prescribing any sort of curriculum; I was hoping for this list to aid me in my Nom studies. Of course I know you need Han (Viet) in Vietnamese, otherwise I wouldn't have studied it.

Re: Non-chinese-rooted word frequency survey
Posted by: Toan (---.246.dial.accessv.com)
Date: November 20, 2004 04:01PM
Hi Anh Tuấn,

Ok, you need to know that you will need Han characters PLUS Nom characters to write what we sometimes call 'write Nom' because it does not work if you take out Han characters, be it in Quốc Ngữ or 'write Nôm'

>I meant how frequently the words are used... like "what are 500 of the most commonly used Viet-rooted words?

Part 1 Han characters 'most commonly' used or studied list (roughly as follows):
Japan: under 600-900 in elementary school, about 1865 high school graduation.
Korea: Korea is moving to re-introduce Han characters (after stopping teaching since the 1970's for 20 years) in the school system (as an elective or starting at grade 4 I am not sure). I assume the same 600-900, then 1865 for high school graduation. Koreans call it "Trung Quốc Ngữ Tập Huấn" extending all the way to university. Korea already re-introduced Han characters in all the road signs (Hangul first, Han characters second then Latin pronunciation).
China & Taiwan: more than Japan's and Korea's numbers.
Note: For Japan & Korea, if people want to be more 'profound' in their writing, then they will learn the same numbers of Han characters as China & Taiwan.

Part 2 Nôm characters in addition to Han characters:
Vietnam: No official numbers of 'commonly used' Nôm characters yet, but if you get a hold of the Nôm book with 'Thiên trời, Địa đất, Cử cất, Tồn còn......' (I don't have it but our Nôm experts readers at Viện Việt Học should have it), add up all the characters and divide it by 2. This would roughly give you an idea how many Nôm characters Vietnamese learned before 1945 (in the Nom introduction level), and afterwards in a much smaller scale (number of people) because of the civil war etc, etc !

Personally, I don't even have a count of how many I know! I just keep on learning, because there are 50,000 Han characters & 15,000 Nom characters. It's a lifetime commitment, and the reward is that you will be amazed by how deep the philosophy behind it is, which separates East from West! Westerners admire us because they want to know our philosophy, and our philosophy is vastly carved in the Han Nom characters!

You may also refer to the Nôm I wrote in this forum and do a rough count (I had both Han & Nôm characters in my writing), but I would call it 'writing Nôm' because the grammar is Vietnamese. My Nôm writing in the forum is side-by-side Quốc Ngữ and Nôm (Quốc Ngữ Nôm đối chiếu), so you might want to learn from that too, because it is common daily conversation (you can see that from the Quốc Ngữ).

I wanted to make up a list of Nôm 500 for you but why don't you try to look at my Quốc Ngữ Nôm đối chiếu and learn from there first because it is already written in a daily conversation. After you learn all of it and still want more, we can talk !

There are so many books about learning Nom, 1 of them is:
"Giúp đọc Nôm và Hán Việt" by Trần văn Kiệm. It is a good book.

By the way, what is your Hán Nôm character level ?

Regards
Toàn

Re: Non-chinese-rooted word frequency survey
Posted by: nnt (---.w82-121.abo.wanadoo.fr)
Date: November 20, 2004 06:11PM
Nguyen Anh Tuan wrote:

> Actually by word frequency I meant how
> frequently the words are used... like "what are 500 of the most
> commonly used Viet-rooted words?" Chinese, Japanese, and
> Taiwanese have done a lot of these .


I think Vietnamese are just a little lazier :) , and I don't know if such work has been done. The question has been raised in another post by another person about Nôm flash cards.

The difficulty also lies in the notion of "Viet rooted word" itself, as linguistic studies have pointed out that many seemingly "pure" Viet words are actually Han words which have been assimilated into Vietnamese with their old Han pronunciation (Han period) even before the constitution of Han Viet pronunciation ( based on middle Chinese i.e. Chinese from Tang/Song time), because Chinese "standard pronunciation" has evolved between these periods.

I think that this list is something you must do yourself. Even if it's only a clue, you can start with basic words : "tai, mắt , mũi , mẹ , em , ruột , thịt ", etc ... which are the core of what in pre-Han Vietnamese language has "resisted" Chinese language invasion.

Re: Non-chinese-rooted word frequency survey
Posted by: Nguyen Anh Tuan (---.nrockv01.md.comcast.net)
Date: November 21, 2004 04:31AM
To both Toan and nnt thank you for your input.

I guess I'll take a look at some books and see which (currently considered) Viet-rooted words are used and try to come up with a survey there.

Toan: My level is that I can proficiently read Quoc Ngu and Han Viet(I also studied Chinese, Japanese, and Korean)... my Nom skills are very much lacking though.

On the question of Vietnamese words which were assimilated before the institution of official HanViet pronunciation, in modern Nom writing would we use the original Han character instead of the constructed Nom(multiple pronunciation per character based on usage like Japanese) or not?

Re: Non-chinese-rooted word frequency survey
Posted by: Quoc Trung (---.dip0.t-ipconnect.de)
Date: November 21, 2004 05:01AM
Hi Nguyen Anh Tuan ,

I, the one who raised a similar question in that post about flash cards, have read a lot in this forum and learnt that I have to learn a lot before I can ask questions here. For the word frequency, here is my input. As a start, just count the words. I once wrote a script that counts the words in a given text and outputs them in descending frequency order. If you want the list to contain 500 Nom words and assume that the sample text consists of 60% Han and 40% Nom/Other, then take the first 1250 words of that list (500 / 0.40 = 1250). Then, from the first to the last word in the list, you must check whether each is a Han or a Nom/Other word. If you have a sample of VIQR texts, I can have the words counted for you.
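
The counting step described above could be sketched roughly as follows in Python. This is only an illustration, not the script mentioned in the post: the function names `word_frequencies` and `candidates_needed` are made up for this example, and it assumes that words are separated by whitespace (punctuation is left alone, since VIQR uses punctuation marks to encode diacritics).

```python
from collections import Counter
import math


def word_frequencies(text):
    """Count whitespace-separated words and return (word, count)
    pairs, most frequent first. Hypothetical helper for illustration."""
    words = text.lower().split()
    return Counter(words).most_common()


def candidates_needed(target, nom_percent):
    """How many top-ranked words to inspect to expect `target` Nom words,
    assuming `nom_percent` percent of the ranked words are Nom/Other."""
    return math.ceil(target * 100 / nom_percent)


if __name__ == "__main__":
    sample = "tôi là ai tôi là tôi"
    for word, count in word_frequencies(sample):
        print(word, count)
    # 500 Nom words wanted, 40% Nom/Other assumed in the sample text
    print(candidates_needed(500, 40))
```

Deciding whether each ranked word is Han or Nom/Other still has to be done by hand or against a dictionary; the sketch only produces the ranked list to check.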

Quoc Trung

Re: Non-chinese-rooted word frequency survey
Posted by: nnt (---.w82-121.abo.wanadoo.fr)
Date: November 21, 2004 05:18AM
Nguyen Anh Tuan wrote:


> On the question of Vietnamese words which were assimilated
> before the institution of official HanViet pronunciation, in
> modern Nom writing would we use the original Han character
> instead of the constructed Nom(multiple pronunciation per
> character based on usage like Japanese) or not?

As Nôm writing was not standardized and also evolved a great deal over its history, there was no obligation to use a single form for each Vietnamese "word", even within a single text by the same writer. Not without reason was it said that "Nôm Na là cha mách qué" (roughly, "Nôm is the father of trickery"), even though you could say this also meant freedom and creativity in the invention of characters and pronunciations. A single Han character may have a dozen Nôm pronunciations alongside its proper Han/Viet pronunciation, so you cannot rely on the character representations of Vietnamese words to tell whether a word is "Viet rooted" or Chinese rooted. And for a single Nôm pronunciation, there could be a dozen characters (Han and made-up characters) representing it.

The Quoc Ngu representation of Vietnamese words has an overwhelming advantage over the Nôm representation: 1 sound = 1 writing. Any study of a Vietnamese "word" should use its Quoc Ngu writing as a reference, not its numerous Nôm representations.

Re: Non-chinese-rooted word frequency survey
Posted by: Toan (---.247.dial.accessv.com)
Date: November 21, 2004 05:58AM
Hi Anh Tuan,

Because there were more than 1 Nom characters created over 1000 years representing the same meaning, I pick out the one I normally use, hopefully this might aid you in the learning process.

tôi, là, ai 碎, 羅, 埃
đã, sẽ (right portion only),
làm, việc 爪, 役
đi, ra, về (去多), (羅出), ()
được, chưa 特, 諸
thì, sao 時, 牢
anh, em 英, (male; substitute on left for female)
con, cháu (子昆), (子召)
mà, cái, gì 麻, 丐, 夷
nếu, có 裊, 固
của, nó (固有),
còn, không 群, 空
xin, cho (口千),

You probably already know the website www.nomfoundation.com; go to the Nôm lookup there and learn the Nôm you like!

Regards
Toàn

Re: Non-chinese-rooted word frequency survey
Posted by: Toan (---.246.dial.accessv.com)
Date: November 21, 2004 06:17AM
Hi Tuan Anh,

Tuan Anh wrote:
>Toan: My level is that I can proficiently read Quoc Ngu and Han Viet(I also studied Chinese, Japanese, and Korean)... my Nom skills are very much lacking though.
Can you clarify? Did you mean you studied Chinese, Japanese & Korean writing, i.e. Hán characters, Japanese katakana (phiến giả danh) & hiragana (bình giả danh), and Korean Hangul? Or did you mean you studied C/J/K history etc.?
Can you write Quốc Ngữ and chữ Hán proficiently? Please clarify so that I know what input to share with you! Because you write in English, I have to assume you don't WRITE Quốc Ngữ and Han characters!

Please note: "read" and "write" are distinctly different and it takes effort to move from "read" to "write" Han/Nom characters !

Tuấn Anh, please spell this out so that I know how to answer usefully! That's all, I mean nothing else by it!

Regards
Toàn

Re: Non-chinese-rooted word frequency survey
Posted by: Nguyen Anh Tuan (---.nrockv01.md.comcast.net)
Date: November 21, 2004 08:08AM
I can read and write Quoc Ngu and Han Viet, but my composition skills in the two are lacking compared to my English. That's why I type in English, so as to avoid confusion. It seems, though, that I should be more clear when I ask questions here. I can read both scripts (quoc Ngu and Han Viet) with a thorough understanding though. I studied both C/J/K history and language(Han and their respective individual scripts).

I was born in the US, if that gives you any bearing as to why it's easier for me to type in English.

Re: Non-chinese-rooted word frequency survey
Posted by: Toan (---.247.dial.accessv.com)
Date: November 21, 2004 10:17AM
Hi Anh Tuan,

Now it is clear and here are my suggestions:

(1) Since you can write Quốc Ngữ (Quốc Ngữ = thuần Nôm + Hán Việt) and individual words in Quốc Ngữ are no problem for you, I believe it is efficient to go directly to nomfoundation.com to learn Nôm and get a feel for it.

(2) When you see a list of 3, 4, 5... Nôm characters for a single Quốc Ngữ word, come back and ask which Nôm character is more "common". "Common" means it was used more often or appeared more in ancient literature (if we know); our Nôm experts will then let you know which one to learn first. (Obviously you shouldn't learn 5 Nôm characters for one meaning while neglecting other new Nôm words. This saves you time, i.e. you spread your effort across more Nôm characters for different Quốc Ngữ words.)

(3) After you have picked up enough individual Nôm characters, you might want to learn several "duplicate" Nôm characters for a single Quốc Ngữ word.

The above assumes you already have the foundation of Hán characters (as you said).

I don't know how many Han characters you have already learned, but my suggestion is to set a target for yourself, perhaps 1 character/day, which makes 365 characters/year. Or you might want to double or triple that to reach about 1000 in a year. I would consider 1000/year very, very efficient, considering that elementary students in Japan/Korea learn fewer than 1000 Han characters in 6 years.

With that approximately 2000 characters for high school graduation, that's the bare minimum. I personally think it is not enough because that 2000 characters only give you knowledge to read regular text, ie not philosophical texts. Since all philosophical texts in these 4 countries are in "văn ngôn" written thousands of years ago, well, you will need to build up more "vocabularies", the difficult ones !

The above Han is on top of Nôm.

Since there are different approaches to learn Hán & Nôm, there might be some other suggestions that might suit your individual needs !

Yes I suggest you be clearer when asking questions so that we can give you the right answer instead of 'assuming', 'guessing' what you want to ask such as:
>I don't believe we need to include chinese-rooted words
Our interpretation was: You don't need Han Viet to speak/write Vietnamese. Then we gave you our views in paragraphs only to find out you meant 'commonly used Nom characters'

But as the saying goes, "where there is a will, there is a way", or "hữu chí cánh thành"!

Regards
Toàn


