There are several major efforts underway to digitize the world's books that are not currently under copyright. Google and Microsoft are leading this effort (they do charge organizations for this service). In addition, there are not-for-profit efforts by groups such as the Open Content Alliance.
Vast repositories containing the entire text of books will eventually be available to the major search engines. It is already possible to do a limited book search on Google. Books can be searched just like the many other categories on Google, such as Web, Images, Video, News, and Maps.
This search can be reached from the “more” tab at the top of the main Google search page. Clicking on it reveals numerous goodies, books included. Does this mean that Shakespeare will soon be appearing in search results? Not likely, but it does offer some exciting possibilities for research, and tremendous time savings for students and scholars whose material can be indexed on-line.
Why should this be of interest to Internet users, as well as Internet writers? Well, this will open up writings, initially Western literature, to several billion readers. Only approximately one billion people in the world have access to libraries in their native countries, mostly in the developed nations. The knowledge of the world will soon be accessible to a much larger world audience.
Major libraries, including the New York Public Library and the libraries at the University of Michigan, Harvard, Stanford, and Oxford, have signed on with Google. Others, such as the Boston Library Consortium (with 19 members, including the University of Massachusetts and the University of Connecticut), wanted more open, unrestricted access to their archives, free from any possibility of commercial interest. The Smithsonian and the University of California are other institutions that decided to opt out of offers from Microsoft and Google.
To put the scope of this effort in perspective, there are an estimated 32 million books in the world that could be scanned. This number is somewhat in dispute, since many books are out of print, yet not technically out of copyright. Google is not making this distinction, and has decided to digitize anything that is either out of print or out of copyright. This has produced praise from some quarters and wrath from others, mainly publishers and some writers. Google is in the midst of numerous lawsuits, but its efforts continue unabated, and it will probably take years for the courts to work out the details. Ironically, even a settlement may work in Google’s favor. (See sources below.)
Technically, the number of books in the world is much smaller if you only count books that are out of copyright. (See the following table.)
Period                 Book Edition Count
2000 B.C. – 1 B.C.     779
1 A.D. – 1449          2,291
1450 – 1499            11,234
1500 – 1599            100,731
1600 – 1699            240,171
1700 – 1799            537,139
1800 – 1899            2,573,101
1900 – 1919            1,651,313
1920 – 1960            5,335,059
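To see how much smaller that number is, the rows of the table can simply be added up. The short Python sketch below does this; the variable and function names are my own, and it assumes the second row is meant to read 2,291. The total comes to roughly 10.5 million editions, or about a third of the 32 million estimate.

```python
# Edition counts per period, copied from the table above.
# Assumption: the second row is read as 2,291.
EDITION_COUNTS = {
    "2000 B.C. - 1 B.C.": 779,
    "1 A.D. - 1449": 2_291,
    "1450 - 1499": 11_234,
    "1500 - 1599": 100_731,
    "1600 - 1699": 240_171,
    "1700 - 1799": 537_139,
    "1800 - 1899": 2_573_101,
    "1900 - 1919": 1_651_313,
    "1920 - 1960": 5_335_059,
}

# The 32 million figure quoted earlier in the article.
WORLD_ESTIMATE = 32_000_000


def total_editions(counts):
    """Add up the edition counts for every period in the table."""
    return sum(counts.values())


def share_of_estimate(counts, estimate=WORLD_ESTIMATE):
    """Fraction of the world estimate that the table accounts for."""
    return total_editions(counts) / estimate


print(f"Total editions in table: {total_editions(EDITION_COUNTS):,}")
print(f"Share of the 32 million estimate: {share_of_estimate(EDITION_COUNTS):.0%}")
```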
At the University of Michigan, using special proprietary equipment and software, Google is able to digitize one million books per year. It will take Google about six years to copy the Library’s entire collection of 7 million volumes.
The amount of storage space required to store a single digitized book is approximately 1 MB. Google currently has over 10,000 servers on-line to index the entire Web, which consists of well over 2 billion web pages. The 32 million MB (roughly 32 TB) of data storage required to hold the world’s collection of books represents information of a much greater magnitude than the entire Web today.
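The storage figure is easy to check. The sketch below is only a back-of-the-envelope calculation: it takes the 1 MB per book and 32 million books quoted above at face value, uses decimal units (1 TB = 1,000,000 MB), and the function name is my own.

```python
# Figures quoted in the article; both are rough estimates.
MB_PER_BOOK = 1
WORLD_BOOKS = 32_000_000
MB_PER_TB = 1_000_000  # decimal units: 1 TB = 1,000,000 MB


def storage_tb(books, mb_per_book=MB_PER_BOOK):
    """Storage needed for `books` digitized volumes, in terabytes."""
    return books * mb_per_book / MB_PER_TB


print(f"World's books: {storage_tb(WORLD_BOOKS):.0f} TB")
print(f"University of Michigan (7 million volumes): {storage_tb(7_000_000):.0f} TB")
```

If a scanned book takes more than 1 MB (page images rather than plain text, for instance), passing a different `mb_per_book` scales the result accordingly.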
It has been revealed that it costs $30.00 for the Open Content Alliance to digitize a single book (although Google’s costs may be significantly lower). It may take more than 10 years and between 500 million and 1 billion dollars to digitize the world’s books. (This may be a gross overestimate, since I don’t have cost figures from Google, and Google is making the most aggressive effort in this area.) With these kinds of costs involved, it is understandable why it is important that Google is part of this effort.
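The cost range can be roughly reproduced from the numbers already given. The sketch below multiplies the 32 million book estimate by the Open Content Alliance's $30 per book, and also works backwards from the low end of the range to the per-book cost it would imply. The function names are my own, and nothing here reflects Google's actual costs, which are not known.

```python
# Figures quoted in the article.
WORLD_BOOKS = 32_000_000
OCA_COST_PER_BOOK = 30  # dollars, Open Content Alliance figure


def digitization_cost(books, cost_per_book):
    """Total cost in dollars of digitizing `books` volumes."""
    return books * cost_per_book


def implied_cost_per_book(total_budget, books):
    """Per-book cost in dollars that a given total budget would imply."""
    return total_budget / books


high = digitization_cost(WORLD_BOOKS, OCA_COST_PER_BOOK)
low_rate = implied_cost_per_book(500_000_000, WORLD_BOOKS)

print(f"At $30 per book: ${high:,}")
print(f"A $500 million total implies about ${low_rate:.2f} per book")
```

At $30 a book the total lands just under the 1 billion dollar upper figure, while the 500 million dollar lower figure would require roughly halving the per-book cost.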
Without taking sides one way or the other on the legal ramifications of what Google is doing, clearly digitizing books is an idea whose time has come. When Stanford University made only its card catalog available electronically, within a fairly short period of time students were visiting the library twice as often and checking out twice as many books. From these kinds of observations, it is clear that the availability of on-line books will not soon make libraries obsolete.