Final Year Project
From Ben Cheng Personal Wiki
Concept Notes
- Term Extraction
- English
- By: Repeated Occurrence
- Stop Word List
- Chinese
- By: Repeated Occurrence + Mutual Information Formula for Bigrams
- Stop Word List
- Problems
- Repeated occurrence does not work for short paragraphs
- Google the terms?
- Wiki the terms?
- Clustering? (Shui, Lee 99')
- Very Large Frequency Table for Mutual Information
- Really a problem? (Bigrams only!)
- Splitting English text on simple symbols is not enough for extraction
- What about P&G?
- Tag Recommendation
- Spelling
- Google / Yahoo Spell Check?
- Should be very easy to solve with a statistical method that looks for outliers
- Weight
- Factors: Freq + Wikipedia Article Length + Search Engine + Existing Tags Stats.
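
The two extraction ideas in the notes above (repeated occurrence with a stop word list, and mutual information for bigrams) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the project's actual code: the function names `repeated_terms` and `bigram_pmi`, the tiny `STOP_WORDS` set, the `min_count` threshold, and the choice of pointwise mutual information as the "mutual information formula" are all hypothetical. The tokenizer keeps `&` inside a word so that a term such as P&G survives as one token, which is one possible answer to the "What about P&G?" note.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def repeated_terms(text, min_count=2):
    """Return non-stop-word tokens that occur at least min_count times."""
    # Assumption: a token is a run of letters/digits, optionally joined by '&'
    # (so "P&G" stays one token instead of being split into "p" and "g").
    words = re.findall(r"[a-z0-9]+(?:&[a-z0-9]+)*", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w: c for w, c in counts.items() if c >= min_count}


def bigram_pmi(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts.
    Only bigrams seen at least min_count times are scored, which also keeps
    the frequency table limited to unigram and bigram counts.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

For Chinese text, one simple assumption is to treat each character as a token, for example `bigram_pmi(list(text))`, and keep the highest-scoring pairs as candidate terms. As the Problems notes point out, repetition counts are unreliable on short paragraphs, so the thresholds here would need tuning or an external signal.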
Programming
- Python
- Oracle
- PostgreSQL
Interesting
- Yahoo Tag Cloud
Topics
Chinese Term Extraction
- http://technology.chtsai.org/cscanner/
- PAT-tree-based (SIGIR’95)
Tag Clustering
- Query session log mining (JASIST 2002)
- Anchor text mining
- Search result page mining
- Term clustering (ICDM’02)
Tag Recommendation
- Taxonomy generation (CIKM’04, TOIS’05)
Recommender System
Resources
- Topic Modeling
- Community Website
- Internet Application
