<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6297867</id><updated>2011-06-07T06:45:11.704+07:00</updated><title type='text'>whatwewant.www</title><subtitle type='html'>people don't want to search it. they just want to get it.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://whatwewant.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>burlight</name><uri>http://www.blogger.com/profile/16319624372838515804</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>22</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6297867.post-108202846758034484</id><published>2004-04-15T18:20:00.000+07:00</published><updated>2004-04-15T18:31:45.046+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;A9: Search Engine from Amazon.com&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Not only the web: it searches books for you!&lt;br /&gt;Amazon.com knows what it has: books.&lt;br /&gt;&lt;br /&gt;Not fancy enough? A9 also keeps a history of your search results and site visits.&lt;br /&gt;&lt;br /&gt;Try it: 
&lt;a href="http://a9.com/"&gt;a9.com&lt;/a&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-108202846758034484?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/108202846758034484'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/108202846758034484'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_04_01_archive.html#108202846758034484' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107998277422442711</id><published>2004-03-23T02:10:00.000+07:00</published><updated>2004-03-23T02:17:37.060+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Yahoo! Search&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;"New" web search service from Yahoo! with minimalism look.&lt;br /&gt;And when look inside its functionalities, sometimes I just thinking of Google :)&lt;br /&gt;(it also comes with "cache" functionality)&lt;br /&gt;&lt;br /&gt;Anyway, try it yourself, you may find that it may be more suitable to you than Google.
&lt;br /&gt;&lt;a href="http://search.yahoo.com/"&gt;http://search.yahoo.com/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107998277422442711?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107998277422442711'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107998277422442711'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_03_01_archive.html#107998277422442711' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107662951644061757</id><published>2004-02-13T06:42:00.000+07:00</published><updated>2004-02-13T06:47:48.013+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Who is this guy dude?&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Who is the first author krub? ;)&lt;br /&gt;&lt;br /&gt;Ratanachai Sombatsrisomboon, Yutaka Matsuo, and Mitsuru Ishizuka (2003). &lt;i&gt;&lt;a href="http://www.miv.t.u-tokyo.ac.jp/papers/ratchai-AM2003.pdf"&gt;Aquisition of Hypernyms and Hyponyms from the WWW&lt;/a&gt;&lt;/i&gt;, in Proceedings of 2nd Int'l Workshop on Active Mining (AM2003), pp.7-13, Maebashi, Japan (in conjunction with Int'l Sympo. on Methodologies for Intelligent Systems), October, 2003.
&lt;br /&gt;&lt;br /&gt;What does it about?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107662951644061757?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107662951644061757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107662951644061757'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107662951644061757' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107593176441522742</id><published>2004-02-05T04:51:00.000+07:00</published><updated>2004-02-07T20:36:58.593+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Rush Hour Intro to IR&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;by Mirella Lapata&lt;br /&gt;(slides for &lt;a href="http://www.dcs.shef.ac.uk/~mlap/teaching/com3110.html"&gt;COM3110 Text Processing&lt;/a&gt; class, Department of Computer Science, University of Sheffield)&lt;br /&gt;&lt;br /&gt;breifly explains Google search, IR, issues in IR, indexing, inverted file, boolean model, vector space model, TF/IDF, term weighting, evaluation, precision, recall, and F-measure.
&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.dcs.shef.ac.uk/~mlap/teaching/lecture11_handout.pdf"&gt;introduction&lt;/a&gt; | &lt;a href="http://www.dcs.shef.ac.uk/~mlap/teaching/lecture12_handout.pdf"&gt;term manipulation &amp; evaluation&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#666666"&gt;dude: &lt;a href="http://www.tcnj.edu/%7Emmmartin/CMSC485/Papers/Google/icde.pdf"&gt;Web Information Retrieval&lt;/a&gt;, cool tutorial by google's research director, Monika Henzinger&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107593176441522742?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107593176441522742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107593176441522742'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107593176441522742' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107582213156834013</id><published>2004-02-03T22:25:00.000+07:00</published><updated>2004-02-04T06:26:50.186+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Managing Gigabytes&lt;/strong&gt; (Book)&lt;br /&gt;&lt;br /&gt;Sometimes it's more than just 'search'. We may want it 'faster', and many times we want it 'smaller'.&lt;br /&gt;(And for the case of database/index size, smaller one is probably the faster one -- less things to looking for.)
&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;a href="http://www.cs.mu.oz.au/mg/"&gt;Managing Gigabytes: Compressing and Indexing Documents and Images&lt;/a&gt;&lt;/i&gt; by Ian H. Witten, Alistair Moffat, and Timothy C. Bell. (&lt;a href="http://www.cs.waikato.ac.nz/~singlis/mg.html"&gt;read reviews&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;From the authors of the book, &lt;a href="http://www.mds.rmit.edu.au/mg/"&gt;MG&lt;/a&gt;, an open-source indexing and retrieval system for text, images, and textual images.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107582213156834013?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107582213156834013'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107582213156834013'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107582213156834013' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107582182968889964</id><published>2004-02-03T22:16:00.000+07:00</published><updated>2004-02-04T06:27:07.500+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Google File System&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;How to search things from a collection is one problem.&lt;br /&gt;How to keep things (in a collection) for a searching is another problem.&lt;br /&gt;&lt;br /&gt;And the latter one could be a really big problem, if you have to keep "3,307,998,701 web pages" like Google does.
&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;a href="http://www.cs.rochester.edu/sosp2003/papers/p125-ghemawat.pdf"&gt;Google File System: Technical paper&lt;/a&gt;&lt;/i&gt;, by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. This is a technical paper that explains Google's custom scalable cluster filesystem for storing their gigantic database of the entire Web across thousands of low-cost PCs.&lt;br /&gt;&lt;br /&gt;From &lt;a href="http://google.blogspace.com/archives/001040"&gt;Google Weblog&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107582182968889964?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107582182968889964'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107582182968889964'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107582182968889964' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107580758368856592</id><published>2004-02-03T18:15:00.000+07:00</published><updated>2004-02-03T18:30:10.623+07:00</updated><title type='text'></title><content type='html'>&lt;b&gt;Hypertext&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The idea of Hypertext first recognized in &lt;a href="http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm"&gt;As we may think&lt;/a&gt;, an article by Vannevar Bush in The Atlantic Monthly, July 1945. &lt;br /&gt;Now WWW is the largest hypertext system created in 1991 by Tim Berners-Lee, (the first web page, web server, browser).
&lt;br /&gt;&lt;br /&gt;&lt;a href="http://en.wikipedia.org/wiki/Timeline_of_hypertext_technology"&gt;Timeline of Hypertext technology&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107580758368856592?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107580758368856592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107580758368856592'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107580758368856592' title=''/><author><name>burlight</name><uri>http://www.blogger.com/profile/16319624372838515804</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107563661104670509</id><published>2004-02-01T18:29:00.000+07:00</published><updated>2004-02-03T22:18:38.983+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Vector Space Model and TF-IDF&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Boolean model&lt;/strong&gt; -terms in a document is equally weighted as 1 (exist) or 0 (not exist), and documents that satisfy a input query are returned without ranking.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Vector Space Model&lt;/strong&gt; -view each document in a database as a vector in a vector space where number of dimensions is a number of terms in all documents in a database. The length of vector in each dimension is determined by weighting algorithm (TF-IDF is most used for this). Input query also viewed as a vector in that space, and documents near the query vector are returned and ranked (by distance; closer higher rank) as a result.
&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;TF-IDF&lt;/strong&gt; -weighting algorithm widely used in IR. Stands for &lt;u&gt;t&lt;/u&gt;erm &lt;u&gt;f&lt;/u&gt;requency - &lt;u&gt;i&lt;/u&gt;nverse &lt;u&gt;d&lt;/u&gt;ocuments &lt;u&gt;f&lt;/u&gt;requency. The idea is terms which appears frequently in one document, but less-frequently in other documents (in database or corpus) are considered as important terms in that document (high TF-IDF weight).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107563661104670509?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107563661104670509'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107563661104670509'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107563661104670509' title=''/><author><name>burlight</name><uri>http://www.blogger.com/profile/16319624372838515804</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107563468328596271</id><published>2004-02-01T18:15:00.000+07:00</published><updated>2004-02-03T17:23:19.590+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Information Need&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;About our slogan "people don't want to search it. they just want to get it".&lt;br /&gt;There are a number of studies concerning "Information need and Information seeking".&lt;br /&gt;&lt;a href="http://choo.fis.utoronto.ca/FIS/Courses/LIS1325/QuestionNego.pdf"&gt;This &lt;/a&gt; seems to be a good tutorial.
&lt;br /&gt;&lt;br /&gt;One of the most famous is Taylor's 4 levels of Information need. &lt;br /&gt;&lt;br /&gt;Taken from page 10 of the slides.&lt;br /&gt;&lt;br /&gt;Q1- Visceral need&lt;br /&gt;Actual, but unexpressed need for information &lt;br /&gt;Feeling of unease, doubt, uncertainty &lt;br /&gt;Vague sense of dissatisfaction &lt;br /&gt;Hard to express in words &lt;br /&gt;&lt;br /&gt;or in short, .... "(sometimes) people don't know what they want. (but still) they just want to get it!"&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;bact':&lt;/b&gt; This book [R. Belew. &lt;i&gt;&lt;a href="http://www.cs.ucsd.edu/~rik/foa/"&gt;Finding Out About: A Cognitive Perspective on Search Engines and the WWW.&lt;/a&gt;&lt;/i&gt; Cambridge University Press, Cambridge, 2000.] investigates and try to describes IR from the cognitive perspective (what human/user think, percept, behave, ..).&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107563468328596271?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107563468328596271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107563468328596271'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_02_01_archive.html#107563468328596271' title=''/><author><name>burlight</name><uri>http://www.blogger.com/profile/16319624372838515804</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107546439527295231</id><published>2004-01-30T19:05:00.000+07:00</published><updated>2004-01-30T19:08:48.623+07:00</updated><title 
type='text'></title><content type='html'>&lt;strong&gt;Google Weblog&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;bact': "Everything you want and don't want to know about Google" :)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://google.blogspace.com/"&gt;http://google.blogspace.com/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107546439527295231?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107546439527295231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107546439527295231'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107546439527295231' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107461915354078690</id><published>2004-01-21T00:14:00.000+07:00</published><updated>2004-01-21T00:21:13.280+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Multi-Document Summarization&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.summarization.com/"&gt;Text Summarization&lt;/a&gt;, home of &lt;a href="http://www.summarization.com/mead/"&gt;MEAD&lt;/a&gt; (a public domain portable multi-document summarization system).
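&lt;br /&gt;&lt;br /&gt;To give a feel for how such a system can pick sentences, here is a greatly simplified Python sketch of centroid-based extraction, loosely in the spirit of MEAD (MEAD itself combines the centroid score with other features such as sentence position; this is not its code). Each sentence is scored by the summed centroid weights of its words, and the top k are kept in their original order. The naive split on periods, the (n + 1)/df smoothing inside the IDF, and the function name are our own assumptions:&lt;pre&gt;
```python
import math
from collections import Counter


def centroid_summary(documents, k=2):
    """Pick the k sentences closest to the centroid of a document cluster.

    documents: list of strings; sentences are split naively on '.'
    Returns the chosen sentences in their original order.
    """
    sentences = [s.strip() for doc in documents for s in doc.split(".") if s.strip()]
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Sentence frequency of each term (a sentence counts once per term)
    df = Counter(term for bag in bags for term in bag)
    # Centroid: average tf-idf weight of each term over all sentences
    centroid = Counter()
    for bag in bags:
        for term, tf in bag.items():
            centroid[term] += tf * math.log((n + 1) / df[term]) / n
    # Score each sentence by the centroid weights of the words it contains
    scores = [sum(centroid[t] for t in bag) for bag in bags]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```
&lt;/pre&gt;Sentences that share many words with the rest of the cluster score highest, so an off-topic sentence tends to be left out of the summary.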
&lt;br /&gt;&lt;br /&gt;a summary of a collection of documents (which may comes from an automatic clustering) will help user decide if he/she wants to investigate that collection further or not -- a time saving feature :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107461915354078690?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107461915354078690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107461915354078690'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107461915354078690' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107452002021887418</id><published>2004-01-19T20:45:00.000+07:00</published><updated>2004-02-03T18:01:15.780+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Thumbshots for search engine -- "Stop Guessing. Take Control!"&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Thumbshots.org. Featuring small picture of each webpage, so users have more clue if it a site they looking for or not.&lt;br /&gt;&lt;br /&gt;[&lt;a href="http://open.thumbshots.org/Computers/Artificial_Intelligence/Natural_Language/Research_Groups/"&gt;click here for example&lt;/a&gt;]&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;dude:&lt;/b&gt; What I don't understand is, how do they get so many sites adding code for them? 
(from the &lt;a href="http://www.thumbshots.org/dynamicintegration.pxf"&gt;this page&lt;/a&gt; site owners need to add a line of code to get the thumbshot on that directory, right? They have got &lt;/span&gt;&lt;a href="http://open.thumbshots.org"&gt;all this&lt;/a&gt;?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107452002021887418?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107452002021887418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107452002021887418'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107452002021887418' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107439836611740255</id><published>2004-01-18T10:57:00.000+07:00</published><updated>2004-01-18T11:01:21.810+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Topic Clustering&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;There are more than Vivisimo out there, read about &lt;a href="http://www.faganfinder.com/search/clustering.shtml"&gt;Topic Clustering&lt;/a&gt; at &lt;b&gt;Fagan Finder&lt;/b&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107439836611740255?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107439836611740255'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6297867/posts/default/107439836611740255'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107439836611740255' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107438961310564922</id><published>2004-01-18T08:32:00.000+07:00</published><updated>2004-01-18T08:35:53.153+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;C|NET News: Search may be Microsoft's next target, court told&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Microsoft may be unlawfully wielding its desktop dominance to put the squeeze on search engines and on document formats like Adobe Acrobat, the state of Massachusetts claimed on Friday.&lt;br /&gt;&lt;br /&gt;[&lt;a href="http://news.com.com/2100-1016_3-5142763.html?tag=nefd_top"&gt;READ THE NEWS&lt;/a&gt;]&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107438961310564922?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107438961310564922'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107438961310564922'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107438961310564922' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107393504934411433</id><published>2004-01-13T01:38:00.000+07:00</published><updated>2004-01-13T02:23:42.650+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Web Graphs&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Most people in computer science and some fields of engineering (e.g. industrial engineering?) are very familiar with graphs: those nodes and arcs. And we can represent the web as a [huge] graph, where node = web page and arc = (hyper)link.&lt;br /&gt;&lt;br /&gt;This representation lets us apply what we already know about graphs to understand the characteristics of the web better.&lt;br /&gt;&lt;a href="http://www9.org/w9cdrom/160/160.html"&gt;graph structure in the web&lt;/a&gt; | &lt;a href="http://www.dcs.kcl.ac.uk/staff/ccooper/store/ESA2001.ps" title="C. Cooper, A. M. Frieze. (2001), A general model of web graphs"&gt;web graph&lt;/a&gt; | &lt;a href="http://www.dcs.kcl.ac.uk/staff/ccooper/papers.html"&gt;more on web graph&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Peer-to-Peer&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;Speaking of representing documents/sites as nodes in a graph, the &lt;a href="http://www.openp2p.com/" title="P2P"&gt;Peer-to-Peer&lt;/a&gt; people have been doing this since their early days.&lt;br /&gt;&lt;br /&gt;To make this more relevant to this blog: one of the most popular P2P applications is obviously an IR-like system, searching for an MP3 song or a DivX movie given its title or the singer's name.&lt;br /&gt;&lt;br /&gt;Searching on a P2P network is not like a traditional search engine searching its database (a snapshot of part of the web at a particular time, collected by web spiders).&lt;br /&gt;&lt;br /&gt;Rather, a P2P search visits a node, searches that node, jumps to another node, 
and so on, in "real time". Clearly, it is impossible to visit every node in the network; there are just too many of them. To decide which nodes to visit, the search needs a routing algorithm.&lt;br /&gt;&lt;br /&gt;So, loosely speaking, we can reduce the search problem in a P2P network to a routing problem.&lt;br /&gt;[ to find a document is to find a route to that document ]&lt;br /&gt;&lt;br /&gt;There are even more advanced routing algorithms that use &lt;a href="http://citeseer.nj.nec.com/garcia02semantic.html" title="Arturo Crespo, Hector Garcia-Molina (2002), Semantic Overlay Networks for P2P Systems"&gt;semantics&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;bact':&lt;/b&gt; I used to think about using NLP for P2P routing. But it stayed just "thinking"; I never did it... lazy me :(&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107393504934411433?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107393504934411433'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107393504934411433'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107393504934411433' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107386182284357934</id><published>2004-01-12T05:46:00.000+07:00</published><updated>2004-01-13T01:38:19.543+07:00</updated><title type='text'></title><content type='html'>Talking about &lt;a 
href="http://www.dcs.gla.ac.uk/~iain/keith/data/concepts/91.htm" title="Concept: Document Clustering"&gt;Document Clustering/Categorization&lt;/a&gt;/&lt;a href="http://www.dcs.gla.ac.uk/~iain/keith/data/pages/36.htm" title="Automatic Classification"&gt;Classification&lt;/a&gt;, another approach to help users get through mountains of pages could be &lt;a href="http://citeseer.nj.nec.com/375862.html" title="Dragomir R. Radev, Weiguo Fan (2000), Automatic summarization of search engine hit lists"&gt;Summarization&lt;/a&gt;.&lt;br /&gt;Instead of showing only the page title, the URL, and the first few (often meaningless) lines of the page,&lt;br /&gt;short summaries may help users decide which pages are &lt;a href="http://whatwewant.blogspot.com/" title="whatwewant"&gt;whattheywant&lt;/a&gt; and whattheydontwant.&lt;br /&gt;&lt;br /&gt;Besides clustering the retrieved documents so that users can easily pick out which ones they want and which they don't,&lt;br /&gt;having a short summary of each page should let users decide more easily and more quickly.&lt;br /&gt;Read the paper below if you are interested:&lt;br /&gt;&lt;br /&gt;For papers about summarization for search engines, try starting here:&lt;br /&gt;Dragomir R. Radev, Weiguo Fan (2000), &lt;a href="http://citeseer.nj.nec.com/375862.html" title="from CiteSeer"&gt;"Automatic summarization of search engine hit lists"&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;CiteSeer? Hey! The citation graph is another feature we could use, though I have no concrete idea about it yet.&lt;br /&gt;In fact, the citations in papers also say something about the "importance" and "relatedness" of documents.&lt;br /&gt;If papers cite each other, they are probably related; and if a paper is cited often, it is probably important (rather like PageRank?)&lt;br /&gt;&lt;br /&gt;
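&lt;br /&gt;&lt;br /&gt;The two citation intuitions above can be sketched in a few lines of Python on a toy citation graph (a hypothetical illustration, not a method from the cited paper; the dict-of-lists graph format and both function names are assumptions). Raw citation counts are the crudest importance score; PageRank refines the idea by weighting each citation by the importance of the citing document:&lt;pre&gt;
```python
from collections import Counter


def citation_counts(citations):
    """Count how often each paper is cited (its in-degree in the citation graph).

    citations: dict mapping a citing paper to the list of papers it cites.
    """
    counts = Counter()
    for cited_list in citations.values():
        counts.update(set(cited_list))  # each citing paper counts at most once
    return counts


def are_linked(citations, a, b):
    """True if either paper cites the other: a crude sign that they are related."""
    return b in citations.get(a, ()) or a in citations.get(b, ())
```
&lt;/pre&gt;For example, with A citing B and C, B citing C, and D citing C and B, paper C has 3 citations and B has 2, and A and B count as related because A cites B.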
&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;br /&gt;&lt;b&gt;keng&lt;/b&gt; ใช่ดิ web structure mining ไง &lt;a href="http://www.google.com/search?q=related%3Ahttp%3A%2F%2Fwww.manager.co.th"&gt;อย่างrelated: url ของgoogle&lt;/a&gt;รึว่าของsearch engineอื่นๆก็ใช้อันนี้แหละ (discovery of web community)&lt;br /&gt;&lt;br /&gt;เอาอีกแล้ว อะไรจะบังเอิญขนาดนี้ พูดเรื่อง authority/hub อยู่พอดีวันนี้ (@siit.net)&lt;br /&gt;"ถ้าถูกอ้างถึงบ่อย ก็แสดงว่ามันน่าจะสำคัญ" -&gt; authority. &lt;br /&gt;กลับกันเรียกว่า hub (อ้างถึงที่สำคัญบ่อยๆ)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www10.org/cdrom/posters/1043.pdf"&gt;ตัวอย่าง paperของหาrelated page&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="color:#666666;"&gt;&lt;br /&gt;&lt;b&gt;bact':&lt;/b&gt; Yes, this means "web structure" does not limited to "hyperlinks" only. But could be any kind of link or structure, and possibly an internal link within the same document (some summarization techniques use this "internal links" to find out "most relevance sentences", and select them to form a summary).
&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107386182284357934?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107386182284357934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107386182284357934'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107386182284357934' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107379694845857132</id><published>2004-01-11T11:52:00.000+07:00</published><updated>2004-01-12T06:04:04.603+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;PageRank explained&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://pr.efactory.de/"&gt;A Survey of Google's PageRank&lt;/a&gt;&lt;br /&gt;PageRank is one of the algorithms used by the Google search engine.&lt;br /&gt;If you want to know how PageRank works, this is the site to read.
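&lt;br /&gt;&lt;br /&gt;The usual one-line definition says a page's PageRank is the sum, over all pages linking to it, of each linking page's PageRank divided by that page's number of outgoing links. You can compute it by repeating that update until the scores settle. Below is a minimal power-iteration sketch in Python. It assumes the normalized form with a damping factor d = 0.85, the value suggested in Brin and Page's original paper. Pages without out-links spread their rank evenly over all pages. The function name pagerank() and the three-page graph are hypothetical, and this is not Google's implementation.&lt;pre&gt;
```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank sketch (normalized form, ranks sum to 1).

    links maps each page to the list of pages it links to. In every round each
    page splits its current rank evenly over its out-links; pages with no
    out-links ("dangling" pages) spread their rank over all pages instead.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages is shared equally by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleport term (1 - d) / N plus the shared dangling rank.
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = rank[source] / len(targets)
            for target in targets:
                new_rank[target] += damping * share
        rank = new_rank
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
print({p: round(r, 3) for p, r in sorted(ranks.items())})
# {'A': 0.388, 'B': 0.215, 'C': 0.397}
```
&lt;/pre&gt;C ranks highest because it receives links from both A and B, while B receives only half of A's rank. Setting damping=1.0 gives the plain one-line formula. The damping term is what guarantees a unique, convergent result on arbitrary link graphs, for example ones with rank sinks.&lt;br /&gt;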
&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;dudeอธิบายวิธีคำนวณPageRankด้วยภาษาไทยหนึ่งบรรทัด&lt;/strong&gt;&lt;br /&gt;PageRankของเพจใดๆคำนวณหาด้วยการนำค่าPageRankหารด้วยจำนวนลิงก์ออกของเพจที่ลิงก์มาหาทั้งหมดมารวมเข้าด้วยกัน&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107379694845857132?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107379694845857132'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107379694845857132'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107379694845857132' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107379473470876588</id><published>2004-01-11T11:18:00.000+07:00</published><updated>2004-01-12T13:18:50.580+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Vivisimo, a threat to Google's search throne?&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;วันก่อนได้รู้จักกับ &lt;a href="http://vivisimo.com"&gt;Vivisimo&lt;/a&gt; และประทับใจกับความสามารถของมัน ที่สามารถแบ่งผลลัพธ์ของการค้นหาออกเป็นกลุ่มๆ ได้อย่างรวดเร็ว โดยใช้ document clustering technique (ซึ่งไม่รู้ว่ามีsearch engineทางการค้าที่ไหนเคยนำเทคนิคนี้มาใช้ก่อนรึเปล่า)&lt;br /&gt;&lt;br /&gt;vivisimoเป็นmeta-searchยังต้องพึ่งsearch engineของที่อื่นอยู่ ซึ่งทำให้มีของเสียคือ&lt;br /&gt;1) ช้า ต้องรอผลของจากsearch engineที่อื่น&lt;br /&gt;2) precision &amp; recall ขึ้นอยู่กับผลลัพธ์ของsearch engineที่เอามาใช้&lt;br /&gt;&lt;br /&gt;ข้อดีของvivisimoคือสามารถแบ่งผลลัพธ์ออกเป็นกลุ่มๆ 
ได้อย่างรวดเร็ว ซึ่งแทนที่ผู้ใช้จะต้องเสียเวลาหาเอกสารที่ต้องการจากเอกสารจำนวนมากมาย ผู้ใช้สามารถเลือกค้นหาเฉพาะในกลุ่มเอกสารที่สนใจได้&lt;br /&gt;แต่ด้วยสถานการณ์ของการใช้search engineที่หลากหลาย vivisimoก็มีเอาไว้ใช้แค่ในบางโอกาส ที่อยากจะเรียนรู้เกี่ยวกับสิ่งที่ค้นหา แต่จะไม่ใช่เป็นsearch engineหลักของผู้ใช้&lt;br /&gt;&lt;br /&gt;ปัญหาคือประสิทธิภาพในการแบ่งกลุ่ม ถ้าไม่สามารถแบ่งกลุ่มได้ดีมีความหมายสำหรับผู้ใช้ ผู้ใช้ก็จะเลิกใช้เพราะว่าgoogleให้ผลลัพธ์ที่เร็วและน่าเชื่อถือกว่า (น่าเชื่อถือกว่าเพราะgoogleชื่อเสียงดีกว่า) vivisimoก็จะเป็นเพียงแค่ของเล่น ดูเพลินๆ &lt;br /&gt;&lt;br /&gt;แต่ถ้าgoogleนำเทคนิคนี้มาใช้เมื่อไหร่และสามารถแบ่งกลุ่มได้ดีเท่าvivisimo vivisimoก็คงจะนอนทันที (vivisimoเป็นผู้เชี่ยวชาญเรื่องdocument clustering ถ้าเรื่องที่ตัวเองเชี่ยวแล้วทำได้ยังแพ้googleก็สมควรนอน) &lt;br /&gt;&lt;br /&gt;อะไรจะเกิดขึ้น?&lt;br /&gt;1) Googleจะให้บริการdocument clusteringด้วย แต่เป็นเพียงแค่ฟังก์ชั่นเสริมสำหรับให้ผู้ใช้ใช้ในบางโอกาส (Googleอาจจะทำเองรึว่าซื้อvivisimoซะเลย  รึไม่ก็อาจจะอยู่แค่ใน&lt;a href="http://labs.google.com/"&gt;google labs&lt;/a&gt; โชว์ว่าตัวเองก็ทำได้)&lt;br /&gt;2) Googleไม่สนใจเพราะคิดว่าเป็นแค่ของเล่น มุ่งเน้นสิ่งที่จำเป็นดีกว่า&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;bact':&lt;/b&gt; &lt;a href="http://edition.cnn.com/2004/TECH/internet/01/05/seeing.search1.ap/index.html" title="Better search results than Google?"&gt;ข่าว CNN&lt;/a&gt; กะที่ Slashdot.org สองอัน &lt;a href="http://slashdot.org/article.pl?sid=04/01/05/1839233&amp;mode=thread&amp;tid=126&amp;tid=185&amp;tid=95" title="Better Search Results Than Google?"&gt;อันนี้&lt;/a&gt; กะ &lt;a href="http://slashdot.org/article.pl?sid=01/08/14/1726218&amp;mode=thread&amp;tid=95" title="Searching For Google's Successor"&gt;อันนี้&lt;/a&gt; เกี่ยวกับ Vivisimo&lt;br /&gt;ไม่คิดว่า Vivisimo จะต้องการทำ search engine นะ ที่เวบมันเหมือนเป็น demo มากกว่า ว่าตัว clustering engine ของเค้าทำอะไรได้บ้าง .. 
คือเหมือนของที่เค้าขายจริงๆ คือ clustering engine มากกว่าน่ะ (แล้วลูกค้ากลุ่มนึงที่เค้าจะขาย ก็คือ search engine ด้วย)&lt;br /&gt;อีกอย่าง คิดว่า document clusering ก็เป็นสิ่งจำเป็นเหมือนกันนะ สำหรับทุกวันนี้ซึ่งจำนวนข้อมูลมันเยอะเหลือเกิน ถ้ามีใครมาแบ่งให้ ก็น่าจะ(คน)หาง่ายขึ้น -- ประมาณหมายเลขชั้นในห้องสมุด&lt;br /&gt;&lt;b&gt;dude:&lt;/b&gt; obviously you are right. In vivisimo's homepage they state clearly document clustering &lt;b&gt;ENGINE&lt;/b&gt;, not "Hey! we are coming now in SE folks". โง่วะ อายจังเลย&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107379473470876588?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107379473470876588'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107379473470876588'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107379473470876588' title=''/><author><name>burlight</name><uri>http://www.blogger.com/profile/16319624372838515804</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107378366237499288</id><published>2004-01-11T08:02:00.000+07:00</published><updated>2004-01-13T01:18:09.183+07:00</updated><title type='text'></title><content type='html'>&lt;strong&gt;Information Retrieval (and related) research groups in Thailand&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.crl-asia.org/tcl/" title="Communications Research Laboratory Asia"&gt;CRL Thai Computational Linguistics Lab&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mind.cp.eng.chula.ac.th/" title="Chulalongkorn University"&gt;CU Machine 
Intelligence and Knowledge Discovery Lab&lt;/a&gt;&lt;br /&gt;&lt;a href="http://naist.cpe.ku.ac.th/" title="Kasetsart University"&gt;KU NAiST Lab&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.cs.sci.ku.ac.th/~ThaiIr/" title="Kasetsart University"&gt;KU Intelligent Information Retrieval and Database Lab&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.links.nectec.or.th/itech/i4.html" title="National Electronics and Computer Technology Center"&gt;NECTEC RD-I4: Text Processing Technology Group&lt;/a&gt;&lt;br /&gt;&lt;a href="http://kind.siit.tu.ac.th/" title="Sirindhorn International Institute of Technology, Thammasat University"&gt;SIIT Knowledge Information &amp; Data Management Lab&lt;/a&gt;&lt;br /&gt;&lt;a href="http://kind.siit.tu.ac.th/irdm/" title="IR and DM, for Very Large Scaled Information on the Internet"&gt;SIIT KIND IRDM&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107378366237499288?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107378366237499288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107378366237499288'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107378366237499288' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107369037351043155</id><published>2004-01-10T06:14:00.000+07:00</published><updated>2004-01-11T23:32:14.730+07:00</updated><title type='text'></title><content type='html'>Is anyone in Thailand working on Question-Answering?
&lt;br /&gt;&lt;br /&gt;มันก็ประมาณ Information Extraction อะไรประมาณนี้แหละ&lt;br /&gt;ที่ &lt;a href="http://trec.nist.gov/" title="Text REtrieval Conference"&gt;TREC&lt;/a&gt; ก็มี &lt;a href="http://trec.nist.gov/data/qa.html" title="TREC Question-Answering Track"&gt;QA Track&lt;/a&gt; ด้วย&lt;br /&gt;ปีที่ผ่านมา (2003) ทีมจากมหาลัยของสิงคโปร์ (&lt;a href="http://www.nus.edu.sg/" title="National University of Singapore"&gt;NUS&lt;/a&gt;) ได้ที่ 3 จากทีมทั้งหมดที่ร่วมประเมิน และเป็นที่ 1 ถ้านับเฉพาะสถาบันการศึกษา .. อันนี้พูดไปงั้นๆ แบบว่าเผื่อจะจุดประกายอะไรในเมืองไทยมั่ง :P&lt;br /&gt;&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;dude:&lt;/b&gt; เคยอ่านเจอว่ามหาลัยเกษตรก็ส่งไป TREC เหมือนกัน ก็ไม่รู้เหมือนกันว่ารวมๆ แล้ว ที่ไทยทำกันบ้างรึเปล่า&lt;br /&gt;เกี่ยวกับเรื่องการวิจัยนิดหน่อย ในสาขาไอทีนี้ ชื่อก็บอกแล้วว่าเป็นเทคโนโลยีด้านข้อมูล แต่ก่อนข้อมูลที่คอมพิวเตอร์ประมวลผลได้ ต้องเป็นอะไรที่ไม่สับซ้อน แต่เดียวนี้เทคโนโลยีก้าวหน้า คอมพิวเตอร์เริ่มสามารถเข้าใจภาษามนุษย์กันได้แล้ว แต่ก็เป็นภาษาๆ ไป ถ้าภาษาไทยคนไทยไม่ทำแล้วใครจะทำ คนต่างชาติทำให้ แล้วก็ให้ชาวนาปลูกกระหล่ำปีไปแลก ส่วนวิศวกรไทยก็ทำหน้าที่เป็นล่าม? เห็นนักวิจัยญี่ปุ่นที่นี่จดสิทธิบัตร วิธีการป้อนภาษาไทยลงมือถือแล้วมันน่าเจ็บใจ นักวิชาการไทยทำอะไรอยู่? วิจัยมาตราฐานใหม่ด้านนู้นด้านนี้สำหรับโลก? วิจัยเสร็จก็เหลือแต่เป็น CV หนึ่งบรรทัด  .. อันนี้พูดไปงั้นๆ แบบว่าเผื่อจะจุดประกายอะไรในเมืองไทยมั่ง :D&lt;/span&gt;&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;bact':&lt;/b&gt; เอาจริงดิ จริงๆ ถ้าพูดถึงกลุ่มด้าน IR แล้ว เกษตรคงใหญ่สุดในเมืองไทย เพิ่งได้รับทุนจาก NECTEC ไปด้วย หลายล้านอยู่ ให้ทำระบบช่วยตัดสินใจสำหรับคณะรัฐมนตรี ทำนองนั้น (&lt;a href="http://naist.cpe.ku.ac.th/"&gt;NAiST Lab&lt;/a&gt; นำโดยอ.อัศนีย์ เป็น Center of Exellence ในโครงการความร่วมมือกับหน่วยงานวิจัยภายนอกของ NECTEC ด้วย) .. 
แต่คงลง track อื่นมั้ง ไม่ใช่ QA track&lt;br /&gt;พูดถึงเรื่องนักวิจัยญี่ปุ่น จริงๆ ก็มีนักวิจัยไทยทำงานด้านภาษาไทยอยู่ในญี่ปุ่นเยอะเหมือนกัน อาจจะมีคนญี่ปุ่น lead บางตัว แต่ก็มีที่ทำเองหมดไม่ใช่น้อย แน่นอนว่าทำเสร็จแล้ว ผลงานก็ต้องเป็นของหน่วยงานที่สังกัด อันนี้มันก็อาจจะน่าเจ็บใจ แต่คิดอีกที ถ้าเอาคนกลุ่มนี้กลับไปอยู่เมืองไทย จะมีใครให้โอกาสเค้าทำอะไรรึเปล่า?&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107369037351043155?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107369037351043155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107369037351043155'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107369037351043155' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107354054655611065</id><published>2004-01-08T12:30:00.000+07:00</published><updated>2004-01-11T23:29:45.060+07:00</updated><title type='text'></title><content type='html'>For a quick start, we may 'captured' those posts in &lt;a href="http://siit.net/webboard/"&gt;siit.net webboard&lt;/a&gt; and put them here.&lt;br /&gt;&lt;br /&gt;Btw, I'm very very new to blogging.  Don't sure whether the nature of blogs can well used for a 'structured'/'organized' resources or not (e.g. links repository, faqs/QA).
&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;dude:&lt;/b&gt; I wonder too whether it can, but it has to!&lt;br /&gt;&lt;b&gt;bact':&lt;/b&gt; let's see.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And what's the primary language we will use in this blog?&lt;br /&gt;Should we use Thai or English? Do sites like this already exist in English?&lt;br /&gt;I mean, how much do we want Thai people to read it, that sort of thing.&lt;br /&gt;&lt;span style="color:#666666;"&gt;&lt;b&gt;dude:&lt;/b&gt; I was thinking the same. At first I wanted it in Thai, since something like this probably already exists in English. But from another angle, if it's in English, Thai people can read it, people from other countries can read it, and it can be cited anywhere without being translated again. I originally meant to write in Thai to give Thai websites some substantial content in this field, but if you ask me now, I'd say English.&lt;br /&gt;&lt;b&gt;bact':&lt;/b&gt; OK, the primary language will be English. Anyway, I will try to make my own posts bilingual where possible.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107354054655611065?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107354054655611065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107354054655611065'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107354054655611065' title=''/><author><name>bact'</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_qK_vdUsGM5s/S8sZzCfFp2I/AAAAAAAAFYA/4YMhX6cgl4w/S220/pedestrian-600.png'/></author></entry><entry><id>tag:blogger.com,1999:blog-6297867.post-107350491491701384</id><published>2004-01-08T02:44:00.000+07:00</published><updated>2004-01-08T12:59:18.706+07:00</updated><title type='text'></title><content 
type='html'>&lt;strong&gt;To do list.&lt;/strong&gt;&lt;br /&gt;- papers page (links to papers about IR and related topics; summaries would be nice)&lt;br /&gt;- search engine page (list of search engines with reviews and scores?)&lt;br /&gt;- probably QA, directory, portal and similar pages too.&lt;br /&gt;- News on the front page, blog style (just posts with some comments)&lt;br /&gt;- what else?&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6297867-107350491491701384?l=whatwewant.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107350491491701384'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6297867/posts/default/107350491491701384'/><link rel='alternate' type='text/html' href='http://whatwewant.blogspot.com/2004_01_01_archive.html#107350491491701384' title=''/><author><name>burlight</name><uri>http://www.blogger.com/profile/16319624372838515804</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author></entry></feed>
