Final Year Project

From Ben Cheng Personal Wiki

Concept Notes

  • Term Extraction
      • English
          • By: Repeated Occurrence
          • Stop Word List
      • Chinese
          • By: Repeated Occurrence + Mutual Information Formula for Bigrams
          • Stop Word List
      • Problems
          • Repeated occurrence does not work for short paragraphs
              • Google the terms?
              • Wiki the terms?
              • Clustering? (Shui, Lee '99)
          • Very large frequency table for mutual information
              • Really a problem? (Bigrams only!)
          • English extraction by splitting on simple symbols is not enough
              • What about P&G?
  • Tag Recommendation
      • Spelling
          • Google / Yahoo Spell Check?
          • Should be easy to solve with a statistical method that looks for outliers
      • Weight
          • Factors: Frequency + Wikipedia Article Length + Search Engine + Existing Tags Stats.
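
The English term extraction idea in the notes above (repeated occurrence plus a stop word list) might be sketched as follows. This is only a minimal illustration, not the project's actual code: the function name `extract_terms`, the `min_count` threshold, the tiny `STOP_WORDS` set, and the choice to keep "&" inside tokens (so that "P&G" survives) are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def extract_terms(text, min_count=2):
    """Return (term, count) pairs for non-stop words seen at least min_count times."""
    # Assumption: keep "&" inside a token so names such as "P&G" are not split.
    tokens = re.findall(r"[a-z0-9]+(?:&[a-z0-9]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Repeated occurrence: only words that appear min_count times or more qualify.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]
```

On a short paragraph most words occur only once, so the function returns an empty list. That is the "short paragraph" problem noted above, and it is why an outside signal (a search engine, Wikipedia, or clustering) is being considered.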

Programming

  • Python
  • Oracle
  • PostgreSQL

Interesting

  • Yahoo Tag Cloud

Topics

Chinese Term Extraction
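
The concept notes mention repeated occurrence combined with a mutual information formula for bigrams. A rough sketch of that combination is given here, assuming pointwise mutual information, PMI(x, y) = log2(P(xy) / (P(x) P(y))), over adjacent Chinese characters. The notes do not say which variant of the formula is intended, so the PMI choice, the function name, and the `min_count` threshold are all assumptions.

```python
import math
from collections import Counter


def bigram_mutual_information(text, min_count=2):
    """Score adjacent character pairs by pointwise mutual information (PMI)."""
    # Split the text into runs of CJK unified ideographs; punctuation and
    # Latin characters end a run, so no bigram spans across them.
    runs, current = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            current.append(ch)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)

    unigrams = Counter(ch for run in runs for ch in run)
    bigrams = Counter(a + b for run in runs for a, b in zip(run, run[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for pair, count in bigrams.items():
        if count < min_count:
            continue  # repeated occurrence filter: drop pairs seen too rarely
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores
```

Only two frequency tables are kept, one keyed by single characters and one by character pairs, which is the basis for the "really a problem? (bigrams only!)" question about table size.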

Tag Clustering

  • Query session log mining (JASIST 2002)
  • Anchor text mining
  • Search result page mining
  • Term clustering (ICDM’02)

Tag Recommendation

  • Taxonomy generation (CIKM’04, TOIS’05)
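
The concept notes list four weighting factors for a candidate tag: frequency, Wikipedia article length, search engine, and existing tag statistics. One possible way to combine them is a simple linear score, sketched here. The notes do not say how the factors are combined, so the linear form, the log scaling, the coefficients in `WEIGHTS`, and every field name are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TagEvidence:
    """Raw signals for one candidate tag; all field names are hypothetical."""
    frequency: int            # occurrences of the term in the document
    wiki_article_length: int  # length of a matching Wikipedia article, 0 if none
    search_hits: int          # result count reported by a search engine
    existing_tag_uses: int    # how often the tag is already in use


# Illustrative coefficients only; they are not taken from the notes.
WEIGHTS = {"frequency": 0.4, "wiki": 0.2, "search": 0.2, "existing": 0.2}


def tag_weight(e: TagEvidence) -> float:
    # log1p damps the very large ranges of article lengths and hit counts.
    return (WEIGHTS["frequency"] * math.log1p(e.frequency)
            + WEIGHTS["wiki"] * math.log1p(e.wiki_article_length)
            + WEIGHTS["search"] * math.log1p(e.search_hits)
            + WEIGHTS["existing"] * math.log1p(e.existing_tag_uses))


def rank_tags(candidates: dict) -> list:
    """Return candidate tag names sorted from highest to lowest weight."""
    return sorted(candidates, key=lambda name: tag_weight(candidates[name]),
                  reverse=True)
```

With a score like this, a misspelled candidate would tend to rank low because it has little Wikipedia, search engine, or existing-tag support, which fits the idea of treating spelling errors as statistical outliers.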

Recommender System

Resources

  • Topic Modeling
  • Community Website
  • Internet Application