Why page numbers fail us
I keep running into a deep-seated information habit that has never worked well for its intended purpose: the page number, which has been an information curse. Printed documents use page numbers as reference points (not as the bragging rights often invoked with Harry Potter and Neal Stephenson books: "I am on page 674 and you are on page 233"). Anyone who had a different printed copy of a classic text in high school or college is familiar with the problem. Page 75 of Hemingway's The Old Man and the Sea was not the same in everybody's copy.
Even modern books fail when we try to reference pages. Just look at the mass market edition of Cryptonomicon, with 1168 pages, and the hardcover edition, with 928 pages of the same text. Using a page number as a reference does absolutely no good.
Now we try to reference information on the Web, which should be chunked not by page count but by logical information breaks. These breaks are often made at chapters or headings, and rightly so, as this most often helps the reader with context. Documents are often placed on the Internet in paginated form for two purposes: the ability to print and the ability to keep the page numbers. Breaking information up for a print presentation makes some sense if it is going to be printed and read in that manner, but more and more electronic information is being read on electronic devices and not printed. Adobe Reader does not flow easily from page to page, which is a complaint I often hear from readers trying to read page-delimited PDF files.
So if page numbers fail us in the printed world and are even more abysmal in the electronic medium, what do we use? One option is to use natural information breaks: chapters, headers, and paragraphs. These breaks occur in every medium, and their absence would cause problems for readers and for the information's structure.
If we remove page numbers, essentially going native, as books and documents did not originally have page numbers (Gutenberg's Bible did not rely on page numbers; in fact, page numbers are almost never used for Biblical reference in any Bible), then we can easily place small paragraph numbers in the left and right margins. In books, journals, and periodicals with tables of contents or page and article jumps, the page numbers can remain for the document's self-reference. External references would then have a solid means of reference that actually works.
Electronic media do not necessarily need page numbers for self-reference within the document, as the medium uses hyperlinking to perform the same task appropriately. To reference a document externally, one would use the chapter, header, and paragraph to point the reader to the exact location of the text or microcontent. In (X)HTML each paragraph tag could use an incremented "id" attribute. This could be scripted to display in the presentation as well as be used as a hyperlink directly to the content, using the "id" as an anchor.
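As a rough illustration of the incremented "id" idea, here is a minimal Python sketch. The function name number_paragraphs and the "p" prefix are my own assumptions, not part of any standard, and a real publishing tool would more likely use a proper HTML parser than a regular expression.

```python
import re

# Matches an opening <p> tag (but not <pre>, <param>, etc.) and captures
# whatever attributes it already carries.
P_OPEN = re.compile(r"<p(?=[\s>])([^>]*)>", re.IGNORECASE)


def number_paragraphs(html: str, prefix: str = "p") -> str:
    """Add an incremented id (p1, p2, ...) to each <p> tag that lacks one.

    Hypothetical helper: the name and the id prefix are assumptions.
    """
    counter = 0

    def add_id(match: re.Match) -> str:
        nonlocal counter
        counter += 1  # every paragraph is counted, even if it keeps its own id
        attrs = match.group(1)
        # Leave an existing id untouched so already-published links still work.
        if re.search(r"\sid\s*=", attrs, re.IGNORECASE):
            return match.group(0)
        return f'<p id="{prefix}{counter}"{attrs}>'

    return P_OPEN.sub(add_id, html)


sample = '<p>One</p><pre>code</pre><p class="note">Two</p>'
numbered = number_paragraphs(sample)
print(numbered)
```

With ids like these in place, an external reference to the second paragraph of a (hypothetical) page named essay.html could be written as essay.html#p2, and a stylesheet or script could show the same number in the margin.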
I guess the next question is what to do about "blockquote", "table", and similar tags, which are also block-level elements. One option is not to use an "id" attribute on these tags, as they are not paragraphs and may be placed in different locations in the various presentation media in which the document is published. The other option is to include the "id" attribute, but then the ease of creating the reference information for each document type is lost.
We need references in our documents that are not failures from the beginning.