Should HTML5 let you easily show your multi-part manual as a single document?

The Problem

A large document is generally best viewed on-line when it is composed of small HTML pages linked together. But it is often desirable to have a single file that compiles all the pages together for printing, running a 'find', scanning for a quick overview, archiving, running a spell-check, and so on; so manuals are often kept in both forms. But the process of merging all the pages of a document together into a single Adobe PDF or HTML file can be time-consuming and is often fraught with problems.

Update

There is now a "seamless" attribute proposed for the HTML5 IFRAME element that comes close (I think) to this functionality. I don't know of any browser implementing it as of this date (2008-08-08), however. See: http://www.w3.org/TR/html5/embedded0.html#seamless

Things that did not work

Using the browser and IFRAME

At first, I thought the HTML element IFRAME could be used directly and trivially to merge multiple files into a single display page. It seemed a simple HTML document with IFRAME directives for each URL in it was all that was needed ...

  <html> <head> <title></title> </head> <body>
  <iframe src="URL1"></iframe>
  <iframe src="URL2"></iframe>
  <iframe src="URL3"></iframe>
              :
              :
              :
  </body> </html>

But all the documents came in with scrollbars. If I turned off scrolling, the included documents were displayed truncated. Why does IFRAME not default to a full document view? An IMG element without width and height specified always displays full-size. I was very surprised that an IFRAME including a document does not. But a little JavaScript seemed (at first) to solve the problem.

This document shows several methods that come tantalizingly close to letting you use the browser alone to make merged documents, but each has a drawback. The methods are good enough that I use them, but each method requires a compromise. That's why I wonder if a new feature isn't required for HTML5.

This file shows how you can almost accomplish the goal quite simply using HTML IFRAME elements and some JavaScript routines. In general it works for making an on-line file, but several browsers (notably Internet Explorer) have problems printing the resulting file, especially if the included documents span more than one page. As written, if the browser width is changed, the scrollbars might reappear until you reload the page.
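The JavaScript half of this approach can be sketched roughly as follows. This is not the script used on this page, only a minimal illustration that assumes a current browser and included documents served from the same site (a browser will not expose the contents of a cross-origin IFRAME). The function names fitIframe and fitAllIframes are hypothetical. Re-running the fit on every resize is one way to address the scrollbars reappearing when the browser width changes.

```javascript
// Hypothetical helper: size one IFRAME to the full height of the document it holds.
// Assumes a same-origin document; contentDocument is null otherwise.
function fitIframe(frame) {
  var doc = frame.contentDocument;
  if (!doc) { return null; }            // cross-origin, or not loaded yet
  frame.style.height = "0px";           // collapse first so a shorter reflow is measured correctly
  var root = doc.documentElement;
  var body = doc.body;
  var height = Math.max(root ? root.scrollHeight : 0,
                        body ? body.scrollHeight : 0);
  frame.style.height = height + "px";   // show the whole document, no scrollbar needed
  return height;
}

// Hypothetical helper: apply fitIframe to every IFRAME in the given page.
function fitAllIframes(page) {
  var frames = page.getElementsByTagName("iframe");
  for (var i = 0; i < frames.length; i++) {
    fitIframe(frames[i]);
  }
}

// Browser-only wiring; skipped when no DOM is present (for example under Node.js).
if (typeof window !== "undefined") {
  window.addEventListener("load", function () { fitAllIframes(document); });
  window.addEventListener("resize", function () { fitAllIframes(document); });
}
```

Even with a sketch like this, the included documents remain separate documents as far as the browser is concerned, which is why printing and 'find' still misbehave.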

Non-browser-based methods

One solution I used for years, first with mosaic(1) and later with netscape(1), used a feature these browsers had that let you easily create scripts to convert all your pages into PostScript (or to make slide shows and other generally useful things). The PostScript could then be merged easily into a single file. A sample Unix script outlines the process.

#!/bin/ksh
#######################################################################
# Convert one *.html or text file to PostScript, leaving it in $SCRATCH
HTML2PS(){
INFILE="$1" # INFILE should be a full path name
rm -f "$SCRATCH"
netscape -noraise -remote "openFile($INFILE)"
netscape -noraise -remote "saveAs($SCRATCH,PostScript)"
}
#######################################################################
# for all files on the command line, convert them to PostScript and
# append them all onto $MERGED for conversion with ps2pdf or similar
# utilities (MERGED is an illustrative name for the combined output)
SCRATCH=/tmp/SCRATCH_$$.ps
MERGED=/tmp/MERGED_$$.ps
rm -f "$MERGED"
for FILE in "$@"
do
   HTML2PS "$FILE"
   # append documents
   cat "$SCRATCH" >> "$MERGED"
done
rm -f "$SCRATCH"
ls -ld "$MERGED"
exit
#######################################################################

With the demise of the netscape(1) browser, I cannot depend on such a method long-term, as no other browser supports such an easily scripted interface (although some come close). Other utilities like html2ps(1) and html2latex(1), Adobe's distiller(1), or other commercial packages can be used to accomplish something similar, but all these methods require considerable effort to keep the consolidated documents up to date, support only a subset of HTML, or have other issues (like cost). If you install a PDF driver on a machine with MS Windows and then install Cygwin, you can use

   cygstart --print FILENAME

to make each section into an individual PDF file. You can then merge the sections, either with software that merges PDF files (if you own some) or by converting the files to PostScript with pdf2ps(1), appending them together, and running the result through ps2pdf(1). This is also easily scripted.

I wish there was a standard way using the browser

Although I have used all of the above methods successfully, a new element similar to FRAMESET (here called PUBLISH) would make a simple way to merge documents with HTML possible:

<html>
<PUBLISH>
<OBJECT src="fables/Androcles.html"></OBJECT>
<OBJECT src="fables/Avaricious_and_Envious.html"></OBJECT>
<OBJECT src="fables/Belling_the_Cat.html"></OBJECT>
<OBJECT src="fables/Hercules_and_the_Waggoner.html"></OBJECT>
</PUBLISH>
</html>

Here the PUBLISH element would signify that the files are merged in at full size; that they resize with browser width changes; that text searches and other utilities treat them as if they were part of the current page's text; that page break requests from the included documents are obeyed; and so on. Where a compromise is needed, it would be made in favor of printing nicely.

Another solution might be a new scrolling value for IFRAME, which could easily allow other HTML to be mixed in with the merged file displays:

<html>
<iframe scrolling="fullview" id="newIframe1" src="fables/Androcles.html"></iframe>
<iframe scrolling="fullview" id="newIframe2" src="fables/Avaricious_and_Envious.html"></iframe>
<iframe scrolling="fullview" id="newIframe3" src="fables/Belling_the_Cat.html"></iframe>
<iframe scrolling="fullview" id="newIframe4" src="fables/Hercules_and_the_Waggoner.html"></iframe>
</html>

Here it is assumed that the new "fullview" value would prevent scrollbars, ensure the entire merged document is viewable and searchable, and maybe provide for resizing if width= and height= values were provided.

A sample showing expected results

The following display of selected Aesop's Fables demonstrates the desired results on-line using IFRAME directives and a small script that (hopefully) displays each of the documents entirely and eliminates the scrollbars. A few small documents, each using a different CSS style sheet, are displayed using IFRAME directives generated by a script.

This method works for a complex document where different sections are in different directories using different CSS style sheets and images in various directories and even sections in various formats (text, HTML, Adobe PDF, ...) and works just great on-line. There is an amazing difference in speed between the various browsers (Opera, Firefox, Safari, Internet Explorer, ...) in viewing a merged document this way when hundreds of files are being merged.

Every browser fails to some extent if you want to do a 'find' in the merged document or print the document (especially if the included files should span multiple pages when printed). And "inherited" requests for page breaks typically fail too.

Basically, if the following script is in the body of an HTML page, all that you need to do is change the list of files in the loadthem() procedure to display a multi-part document as a single file.

Since the scripts can be reused, I would suggest putting them into a *.js file; the document then needs only the loadthem function in it. I generally put even that into a separate file, so I can reuse it in other documents and swap in different "presentation" functions that add page breaks or labels depending on my need.
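As a rough illustration of that arrangement (not the actual script from the example file), a loadthem() function might hold the file list and hand each entry to a replaceable "presentation" function. Everything here apart from the name loadthem and the fable file names is an assumption: iframeMarkup, the part0, part1, ... ids, and the "merged" container id are hypothetical.

```javascript
// Hypothetical presentation function: build the IFRAME markup for one section.
// A different function could add a label or a page break before each section.
function iframeMarkup(url, index) {
  // Escape the two characters that could break a double-quoted attribute value.
  var safe = String(url).replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return '<iframe id="part' + index + '" src="' + safe + '" scrolling="no" ' +
         'style="width:100%;border:0"></iframe>';
}

// Edit this list to change which files make up the merged document.
// "present" is optional; it defaults to the plain IFRAME markup above.
function loadthem(present) {
  var files = [
    "fables/Androcles.html",
    "fables/Avaricious_and_Envious.html",
    "fables/Belling_the_Cat.html",
    "fables/Hercules_and_the_Waggoner.html"
  ];
  return files.map(present || iframeMarkup).join("\n");
}

// Browser-only wiring; "merged" is a hypothetical id for an empty container element.
if (typeof document !== "undefined") {
  var target = document.getElementById("merged");
  if (target) { target.innerHTML = loadthem(); }
}
```

Passing a different function to loadthem() changes how each section is presented without touching the file list.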

EXAMPLE FILE


URL: APPEND.html

Summary

Using IFRAME elements and some JavaScript lets you easily make a document that merges many HTML documents into a single page. This method works with the Opera, Internet Explorer, Firefox, and Safari browsers, and hopefully with others that support JavaScript and HTML IFRAME elements. I have found no problems with this method for on-line viewing, but the merged document does not work as desired for printing or when using browser tools such as 'find' and spell checkers. Another method that literally merges the HTML from each document does work with such tools, but is best limited to documents composed of pages that all share a CSS file (or don't use one) and that are kept in a single directory. A new standard HTML-only method for doing this seems required.

John S. Urban -- last updated Fri Aug 8 22:00:56 EDT 2008