Google In 2000

From PeacockWiki

Revision as of 06:00, 6 December 2005; Trevorp (Talk | contribs)

Presentation by Jim Reese in October 2000


  • Google PageRank looks at link structure
    • Spam-proof
  • Started with a couple of PCs running Linux in an office at Stanford: "Google Alpha"
  • A professor handed them a cheque in Sept 1998 and said this is too good, it has to become a company
  • 300 PCs, 500K searches in 1999
  • $25M of VC funding in June 1999, used to buy more computers
  • 6K PCs, 50M searches, Oct 2000. 1000 searches/second
  • "Sex" and "MP3" are the number 1 searches, except the day after the Academy Awards, when the #1 search was "Jennifer Lopez Dress"
  • In 1999, the internet had 150M users, estimated to increase to 320M in a few years
  • 500M pages in 1998, an estimated 3-8 billion in 2002
  • The deep web has more, possibly 2 billion in 2000
  • In 1999, 100M searches a day (across all search engines)
    • Estimated 500M in 2002
  • Search servers require massive:
    • Download
    • Index processing
    • Storage
    • Redundancy and speed
  • This requires a whole lot of computers
  • Goal: answer requests in <0.5 sec
  • Currently ~0.35 sec mean
  • 75 full time engineers
  • PageRank uses link structure analysis
  • Google has a billion-page index
  • PageRank is an objective measure of importance
    • a form of popularity contest
    • each link is a vote
  • Hardware load balancing (paired)
  • Global load balancing
    • tuned daily
  • GWS load balancing
    • balances between index, doc, and ad servers
  • Many index servers, each with a shard of the DB
    • plus massively redundant
  • One query uses a dozen index + doc servers
    • Uses the fastest response
  • TCP+UDP
    • UDP for low bandwidth, low data
      • Try again on fail
      • Enhances query speed
      • reduces bandwidth
  • Reliability and fault tolerance very important
  • require 100% uptime
  • Use 1000s of "grey box" PCs
    • Are unreliable, but are cheap and fast
    • Split data across servers
    • replicate across clusters
    • replicate across datacenters
  • KISS
    • Hardware + software
    • debugging 6K PCs across the world
    • Pipe, router, router, load balancing, server, back end server, apps
  • TCP for GWS - back end
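
The notes describe PageRank as link structure analysis in which each link is a vote. A minimal sketch of that idea in Python follows, assuming the power-iteration formulation from the published PageRank description rather than anything shown in the talk. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page `toy_web` graph are all illustrative assumptions, not details from the presentation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not from the talk.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with equal rank everywhere.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link is a vote carrying an equal
                # share of the linking page's current rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: c is linked to by a, b and d.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page c collects links from a, b and d and ends up with the highest score, while d, which nothing links to, ends up with the lowest. A vote from a highly ranked page counts for more than a vote from a low-ranked one, which is why a (linked only from c) still outranks b.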