Compete
According to Compete, my website ranks 408,961st among the top 1 million websites in North America, based on the number of unique visitors.
So how many websites are there in North America? No one seems to know. Tens of millions? Hundreds of millions? Billions?
If you enter “site:.com” (without the quotation marks) into Google’s search engine, you get about 5.86 billion hits. That gives some idea of how many indexed pages exist on sites with a .com extension, but there are many other extensions in use on the Internet.
Berkeley ran a project back in 2000 called How Much Information, and updated it in 2003. Some interesting data points from the 2000 report:
Web content falls into two groups. The first, the “surface” Web, is what everybody knows as the “Web”: static, publicly available web pages, and it is a relatively small portion of the whole. The second, the “deep” Web, consists of specialized Web-accessible databases and dynamic web sites that are not widely known to “average” surfers, even though the information available on the “deep” Web is 400 to 550 times larger than what sits on the “surface.”
The “surface” Web consists of approximately 2.5 billion documents, up from 1 billion pages at the beginning of the year, and it grows by about 7.3 million pages per day. Estimates of the average “surface” page size range from 10 to 20 kilobytes, so the total amount of information on the “surface” Web is somewhere between 25 and 50 terabytes [on an HTML-included basis]. To get a figure for textual content alone, apply a factor of 0.4, which gives an estimate of 10 to 20 terabytes of text. At 7.3 million new pages added every day, and taking an average page size, the growth rate works out to roughly 0.1 terabytes of new information per day.
If we take into account all Web-accessible information (web-connected databases, dynamic pages, intranet sites, and so on), collectively known as the “deep” Web, there are about 550 billion web-connected documents with an average page size of 14 kilobytes, and 95% of that information is publicly accessible. Storing it all in one place would require about 7,500 terabytes, which is 150 times the storage needed for the entire “surface” Web, even taking the highest estimate of 50 terabytes. About 56% of this information is actual content, which gives an estimate of 4,200 terabytes of high-quality data. Two of the largest “deep” web sites, the National Climatic Data Center and the NASA databases, contain 585 terabytes of information, or 7.8% of the “deep” Web, and the 60 largest sites together contain 750 terabytes, about 10% of it.
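The figures in the two paragraphs above are simple back-of-envelope arithmetic (number of pages times average page size). For anyone who wants to check them, here is a short Python sketch that reproduces the numbers; the constants are the report’s own estimates, and the rounding choices are mine:

```python
# Back-of-envelope check of the figures quoted above. The page counts and
# average page sizes are the report's estimates; the arithmetic is mine.

TB_PER_KB = 1 / 1_000_000_000   # 1 terabyte = 10^9 kilobytes (decimal units)

# "Surface" Web: roughly 2.5 billion pages at 10 to 20 KB each
surface_pages = 2.5e9
surface_low = surface_pages * 10 * TB_PER_KB            # about 25 TB
surface_high = surface_pages * 20 * TB_PER_KB           # about 50 TB
text_low, text_high = 0.4 * surface_low, 0.4 * surface_high  # 10 to 20 TB of text
daily_growth = 7.3e6 * 15 * TB_PER_KB                   # ~0.1 TB/day at a 15 KB midpoint

# "Deep" Web: roughly 550 billion documents at about 14 KB each
deep_total = 550e9 * 14 * TB_PER_KB     # ~7,700 TB (the report rounds to 7,500)
deep_content = 7500 * 0.56              # ~4,200 TB of actual content
ratio = 7500 / surface_high             # ~150x the largest surface estimate

print(f"Surface Web: {surface_low:.0f}-{surface_high:.0f} TB total, "
      f"{text_low:.0f}-{text_high:.0f} TB of text, ~{daily_growth:.1f} TB/day growth")
print(f"Deep Web: ~{deep_total:.0f} TB total, ~{deep_content:.0f} TB content, "
      f"~{ratio:.0f}x the surface Web")
```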
Although I do not know how many sites exist in North America, I suspect it must be a large number.