Google In 2000
From PeacockWiki
Revision as of 06:00, 6 December 2005; Trevorp (Talk | contribs)
Presentation by Jim Reese in October 2000
- Google PageRank looks at link structure
- Spam proof
- Started with a couple of PCs running Linux in an office at Stanford. "Google Alpha"
- A professor handed them a cheque in Sept 1998 and said this is too good, it has to become a company
- 300 PCs, 500K searches in 1999
- $25M in June 1999 of VC funds, bought more computers
- 6K PCs, 50M searches, Oct 2000. 1000 searches/second
- "Sex" and "MP3" are the number 1 searches, except the day after the Academy Awards, when the #1 search was "Jennifer Lopez Dress"
- In 1999, the internet had 150M users, estimated to increase to 320M in a few years
- 500M pages in 1998, 3-8 billion in 2002
- The deep web has more, possibly 2 billion in 2000
- 1999, 100M searches a day (for all search engines)
- estimated 500M in 2002
- Search servers require massive:
- Download
- Index processing
- Storage
- Redundancy and speed
- this requires a whole lot of computers
- Goal: answer requests in <0.5 sec
- Currently ~0.35 sec mean
- 75 full time engineers
- PageRank uses link structure analysis
- Google has a billion-page index
- PageRank is an objective measure of importance
- A form of popularity contest
- each link is a vote
- Hardware load balancing (paired)
- Global load balancing
- tuned daily
- GWS load balancing
- Balances between index, doc, and ad servers
- Many index servers, each with a shard of the DB
- Massively redundant
- 1 query uses a dozen index + doc servers
- Uses the fastest response
- TCP+UDP
- UDP for low bandwidth, low data
- Try again on fail
- Enhances query speed
- reduces bandwidth
- Reliability and fault tolerance very important
- require 100% uptime
- Use 1000s of "grey box" PCs
- Are unreliable, but are cheap and fast
- Split data across servers
- replicate across clusters
- Replicate across datacenters
- KISS
- Hardware + software
- debugging 6K PCs across the world
- Pipe, router, router, load balancing, server, back end server, apps
- TCP for GWS - back end
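
The notes above describe PageRank only as link structure analysis in which "each link is a vote". As a rough illustration of that idea, here is a minimal sketch of the power-iteration form of PageRank as it is commonly published. It is not taken from the talk: the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the handling of pages without outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative defaults, not
    values given in the talk.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link is a vote: the page splits its
                # current rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads
                # its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c", "c" links to "a".
example_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph "c" collects two votes and ends up ranked highest, "a" inherits most of that rank through its single link from "c", and "b", which nobody links to, ends up lowest.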
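
The notes above say the index is split into shards, each shard is massively redundant, a single query touches about a dozen servers, and the fastest response is used. A minimal sketch of that fan-out pattern follows, assuming in-process functions stand in for servers. Everything named here (`make_replica`, `broken_replica`, `first_good_answer`, `search`, the document IDs, the delays) is hypothetical, and threads replace the TCP/UDP transport mentioned in the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_replica(postings, delay):
    """Build a stand-in index server: answers a term lookup after `delay` seconds."""
    def lookup(term):
        time.sleep(delay)
        return postings.get(term, [])
    return lookup


def broken_replica(term):
    """Stand-in for an unreliable cheap PC that fails the request."""
    raise ConnectionError("replica down")


def first_good_answer(futures):
    """Return the first successful result among replicas of one shard."""
    for future in as_completed(futures):
        try:
            return future.result()
        except Exception:
            continue  # this replica failed; wait for the next fastest one
    raise RuntimeError("all replicas of a shard failed")


def search(shards, term):
    """Send the term to every replica of every shard and merge one answer per shard."""
    workers = max(1, sum(len(replicas) for replicas in shards))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fan out first so all shards are queried in parallel.
        pending = [[pool.submit(replica, term) for replica in replicas]
                   for replicas in shards]
        hits = []
        for futures in pending:
            hits.extend(first_good_answer(futures))
    # Assumption: document IDs are disjoint across shards, so a plain merge suffices.
    return sorted(hits)


# Two hypothetical shards, each with two replicas.
shard_a = {"linux": [1, 4]}
shard_b = {"linux": [7]}
shards = [
    [broken_replica, make_replica(shard_a, 0.01)],
    [make_replica(shard_b, 0.05), make_replica(shard_b, 0.0)],
]

print(search(shards, "linux"))  # prints [1, 4, 7]
```

One simplification to be aware of: leaving the `with` block waits for the slower replicas to finish, whereas a real front end would presumably drop or ignore late answers.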