The Internet continues to grow at an astronomical pace and is becoming more and more a part of our daily routine. It is little surprise, then, that Google is reporting that the web now has more than one trillion (that’s trillion with a “T”) unique web pages.
In actuality, Google admits that there are many more than one trillion URLs, and that it has no clue how many unique web pages there might be. It is only reporting that the massive Google search engine has identified one trillion unique URLs, no small feat in itself!
Google uses an undisclosed number of ultra-high-capacity, high-speed drives to keep tabs on all the pages it categorizes. And not only does it keep track of each page’s contents; the search engine also takes a “screenshot” of the site (a mini picture of the page) for its archives.
To complicate matters even further, Google records a wide variety of data related to the page, including the number of other websites linking to and from the page, and the predominant keywords associated with each. Finally, a cross-reference “map” of sorts is created which shows the page’s standing and its connections to other sites online. These factors help Google determine the relative importance of the page (in comparison with others) and how it will be ranked in the search engine results when someone performs a Google search online.
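To make the idea of counting links and keywords a little more concrete, here is a minimal sketch in Python. It is not Google's actual method, which is not public in detail; the toy pages, the stopword list, and the function names `link_counts` and `top_keywords` are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical toy "web": page URL -> (page text, list of outbound links).
PAGES = {
    "a.example": ("search engines index web pages", ["b.example", "c.example"]),
    "b.example": ("web pages link to other web pages", ["c.example"]),
    "c.example": ("links help rank pages", ["a.example"]),
}

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"to", "the", "of", "and", "other"}


def link_counts(pages):
    """Return {url: (inbound_links, outbound_links)} for every page."""
    inbound = Counter()
    for _text, links in pages.values():
        # Count each linking page at most once per target.
        inbound.update(set(links))
    return {
        url: (inbound[url], len(set(links)))
        for url, (_text, links) in pages.items()
    }


def top_keywords(text, n=2):
    """Return the n most frequent non-stopword words in the text."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return [word for word, _count in Counter(words).most_common(n)]


counts = link_counts(PAGES)
keywords = {url: top_keywords(text) for url, (text, _links) in PAGES.items()}
```

In this toy example, `c.example` ends up with the most inbound links, which is the kind of signal the article describes as feeding into a page's relative importance.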
As you might imagine, all of this data gathering and organizing takes an incredible amount of computer processing power and storage space. It also requires a very sophisticated set of software tools that are capable of thoroughly scanning millions of web pages in a single day.
The software “agent” that Google uses to constantly seek out new data on the Web is called the Googlebot. The Googlebot is a type of software robot (sometimes called a spider or web crawler) that scours the Internet, recording updates and changes to web pages and constantly gathering the data used to update the overall page rank and content-quality assessment of web sites.
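For readers curious about how a spider works in principle, the following is a rough sketch of a breadth-first crawler. It is only an illustration of the general technique, not the Googlebot itself. To keep it runnable without network access, it crawls a hypothetical in-memory “web” (`FAKE_WEB`); the `crawl` function, the `LinkExtractor` class, and the example URLs are all assumptions.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://c.example/">C</a>',
    "http://c.example/": '<a href="http://a.example/">A</a> <a href="http://d.example/">D</a>',
}


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    seen = {seed}          # URLs already queued, to avoid revisiting
    queue = deque([seed])  # frontier of URLs waiting to be fetched
    fetched = {}           # url -> HTML of pages successfully retrieved
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:   # page unavailable: skip it
            continue
        fetched[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


pages = crawl("http://a.example/", FAKE_WEB.get)
```

A real crawler would also have to respect robots.txt, throttle its requests, and revisit pages to pick up changes; those parts are left out here to keep the core loop visible.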
The Googlebot “reports” back to Google’s central server with all of the information it has gathered for a particular sector of the Web (and yes, the Web is divided into different sectors, just like geographic locations on a map). Once the raw data has been gathered, Google’s central server processes and sorts the information, all the while saving an archive of each web page, a procedure which requires a massive amount of computer storage space.
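How Google actually carves the Web into sectors is not described here, but one common way to split crawling work is to hash each URL's hostname into a fixed number of buckets. The sketch below shows that general idea only; the function name `sector_for` and the choice of four sectors are assumptions for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def sector_for(url, num_sectors=4):
    """Map a URL to a sector number based on a hash of its hostname."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_sectors


urls = [
    "http://a.example/index.html",
    "http://a.example/about.html",
    "http://b.example/",
]
# Group URLs by the (hypothetical) sector each one falls into.
sectors = {}
for u in urls:
    sectors.setdefault(sector_for(u), []).append(u)
```

Because the hash depends only on the hostname, every page from the same site lands in the same sector, so one worker can handle a whole site.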
With over one trillion web pages now categorized by the search engine giant, Google has become a massive, globally networked system, and the complexity and scale of Google’s network are only going to grow. Many experts predict the Web will have over 2 trillion categorized pages by the year 2015.