Feature dimension reduction method for automatic classification of Chinese text |
Title: |
Feature dimension reduction method for automatic classification of Chinese text |
|
Application Number: |
200410000721 |
Application Date: |
2004/01/16 |
Announcement Date: |
2004/12/29 |
Pub. Date: |
2006/04/19 |
Publication Number: |
1558367 |
Announcement Number: |
1252635 |
Grant Date: |
2006/04/19 |
Granted Pub. Date: |
2006/04/19 |
Application Type: |
Invention |
State/Country: |
11[China|Beijing] |
IPC: |
G06K 9/80 |
Applicant(s): |
Tsinghua University |
Inventor(s): |
Sun Maosong, Xue Dejun |
Key Words: |
Feature dimension reduction method, automatic classification, Chinese text |
Abstract: |
The method first applies a feature selection method to reduce the dimensionality of the original feature set, yielding an intermediate feature set. The intermediate feature set is then analyzed to identify 'highly superposed binary strings' and 'highly deviated binary strings'. The highly superposed binary strings are merged into their corresponding ternary strings and the highly deviated binary strings are deleted, producing the learning feature set used for machine learning. The classifier obtained from this learning step is then used in the classification stage. The invention exploits the characteristics of the language and greatly reduces the dimensionality relative to the intermediate feature set, while ensuring that the selected features retain high classification and description capacity; it thus outperforms feature selection that relies on statistical measures alone. |
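The merge-and-delete step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how a binary string is judged 'highly superposed' or 'highly deviated', so the criteria below are assumptions made only for illustration: a binary string is treated as covered by a ternary string when the ratio of their counts reaches a hypothetical `threshold`; if both binary strings of a ternary string are covered they are merged into it, and if only one is covered it is deleted. The function name `reduce_features`, its parameters, and the threshold value are likewise hypothetical and are not taken from the patent.

```python
def reduce_features(bigram_counts, trigram_counts, threshold=0.8):
    """Illustrative reduction of an intermediate feature set.

    bigram_counts / trigram_counts map strings to corpus frequencies.
    ASSUMPTION: a bigram is 'covered' by a trigram when
    count(trigram) / count(bigram) >= threshold. The patent abstract
    does not state the actual criteria.
    """
    superposed = set()       # bigrams to be merged into a trigram
    deviated = set()         # bigrams to be deleted outright
    merged_trigrams = set()  # trigrams that replace merged bigrams

    for tri, tri_n in trigram_counts.items():
        left, right = tri[:2], tri[1:]
        left_n = bigram_counts.get(left, 0)
        right_n = bigram_counts.get(right, 0)
        if left_n == 0 or right_n == 0:
            continue  # both bigrams must be in the intermediate set
        left_covered = tri_n / left_n >= threshold
        right_covered = tri_n / right_n >= threshold
        if left_covered and right_covered:
            # ASSUMED 'highly superposed' case: merge both into the trigram.
            superposed.update((left, right))
            merged_trigrams.add(tri)
        elif left_covered:
            # ASSUMED 'highly deviated' case: only one bigram is covered.
            deviated.add(left)
        elif right_covered:
            deviated.add(right)

    kept_bigrams = set(bigram_counts) - superposed - deviated
    return kept_bigrams | merged_trigrams


# Toy counts, invented for illustration only.
bigrams = {"清华": 10, "华大": 9, "大学": 50, "学生": 30}
trigrams = {"清华大": 9, "华大学": 9, "大学生": 12}
learning_features = reduce_features(bigrams, trigrams)
print(sorted(learning_features))
```

With these toy counts, "清华" and "华大" are merged into "清华大", while "大学" and "学生" are kept because neither is dominated by a single ternary string. The resulting set would then be passed to whatever learning algorithm trains the classifier.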
Claim: |
|
Priority: |
|
PCT: |
|
Legal Status: |
|