Using Web Technologies to Print a Book
I recently finished the second draft of my first novel and needed a way to prepare a decent-looking PDF to print and send to people so they could write all over it with red pens. Web technologies and a few Linux tools made this a fairly painless process.
The workflow goes:
- Write the book in Markdown
- Generate HTML from that Markdown
- Style the HTML with CSS
- Make the cover(s) with HTML and CSS
- Generate PDFs from that HTML + CSS
- Combine those PDFs into one
And here's what you'll need:
- A book written in Markdown
- MultiMarkdown, Pandoc, or similar
- A CSS stylesheet or two
- wkhtmltopdf
- Ghostscript
- A scripting language (I used Ruby)
The result will be a single PDF with numbered pages and an unnumbered front cover. Adding an unnumbered back cover is left as an exercise.
Step 1: Write the book in Markdown
I can't help you much with this one. Writing a book is hard. But I believe in your abilities.
One issue I encountered with mine was needing three modes for the body text:
1. the normal mode
2. one for text a character had written, which would be indented and in a different font
3. a variant on (2) where line breaks needed to be preserved
So you might want to keep this in mind. I ended up using Markdown's blockquote syntax for mode 2 and its code block syntax for mode 3. This made it easy to target those blocks with CSS.
Another issue that came up was section breaks: how to format breaks in the text without using chapter or sub-section headers. In the Markdown, I used a single % character on a line by itself. After the HTML is generated, it can be piped through sed to add a custom CSS class, e.g., to replace <p>%</p> with <p class="section-break">%</p> (any class name will do).
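The same substitution can also be done in Ruby instead of sed. This is a minimal sketch, assuming the Markdown converter renders a lone % as <p>%</p>; the class name section-break and the method name mark_section_breaks are placeholders of mine, not part of the original build.

```ruby
# Hypothetical class name; pick whatever your stylesheet targets.
BREAK_CLASS = "section-break"

# Tag paragraphs that contain only a % so CSS can style them as section breaks.
# Paragraphs that merely mention a % (e.g. "50% off") are left alone.
def mark_section_breaks(html, css_class = BREAK_CLASS)
  html.gsub(%r{<p>\s*%\s*</p>}, "<p class=\"#{css_class}\">%</p>")
end

puts mark_section_breaks("<p>Before.</p>\n<p>%</p>\n<p>After.</p>")
```

Matching the whole paragraph, rather than any % sign, keeps ordinary percent signs in the prose from being tagged.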
Step 2: Generate HTML from that Markdown
I used MultiMarkdown but Pandoc would also make a great choice.
Depending on the way your book is split into files, you might want to start writing a build script. Here's an example:
parts = ["Talitha", "Imal", "Aunauf", "Empress", "Astronauts"]

parts.each do |part|
  # "section-break" is a placeholder class name; use whatever your CSS targets.
  system(
    "multimarkdown -s ../#{part}/story.md" \
    " | sed -E 's|_([^_]+)_|<em>\\1</em>|g'" \
    " | sed -E 's/<h1 .+//g'" \
    " | sed 's|<p>%|<p class=\"section-break\">%|g'" \
    " > output-#{part}.html"
  )
end
Those sed commands (1) add intra-word italics (like Salinger does), (2) remove redundant h1 headers (I added one to each file/chapter for reasons I can't remember right now), and (3) fix the section-break paragraphs.
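For readers who would rather keep the post-processing inside the script, the first two substitutions translate directly to Ruby's gsub. This is a sketch, not the original build: the method names are my own, and it assumes the generated HTML has no underscores inside attributes or URLs, which the first pattern would also rewrite.

```ruby
# Wrap _underscored_ runs in <em> tags, including runs inside a word.
def italicize_underscores(html)
  html.gsub(/_([^_]+)_/, '<em>\1</em>')
end

# Blank out everything from "<h1 " to the end of that line,
# mirroring sed -E 's/<h1 .+//g' (sed works line by line, and
# Ruby's "." does not match a newline by default).
def strip_h1_lines(html)
  html.gsub(/<h1 .+/, "")
end

sample = "<h1 id=\"talitha\">Talitha</h1>\n<p>un_believ_able</p>"
puts strip_h1_lines(italicize_underscores(sample))
```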
To get the page numbers right, you'll want the HTML for every chapter in the same file. The loop above produces a different file for each chapter, but you could also replace the output redirection with something like >> combined.html.
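If you keep the per-chapter files, they can also be joined afterwards. A small sketch, assuming the converter emitted HTML fragments rather than full documents, so plain concatenation gives one valid body; combine_html is a hypothetical helper name.

```ruby
# Concatenate per-chapter HTML files, in order, into one file.
def combine_html(inputs, output)
  File.open(output, "w") do |out|
    inputs.each { |path| out.write(File.read(path)) }
  end
end

parts = ["Talitha", "Imal", "Aunauf", "Empress", "Astronauts"]
chapter_files = parts.map { |part| "output-#{part}.html" }

# Only run when the chapter files have actually been generated.
combine_html(chapter_files, "combined.html") if chapter_files.all? { |f| File.exist?(f) }
```

Unlike appending with >>, this rewrites combined.html from scratch on every run, so stale content from an earlier build cannot pile up.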
Step 3: Style the HTML with CSS
Unless you add custom classes, the HTML generated from the Markdown should not include classes, so your CSS will mostly need to target tag names: h1, p, blockquote, etc.
If you want to add a page break between chapters, the chapter titles will need a consistent target (I used h2 tags) and this rule: page-break-before: always;.
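To make that concrete, here is a sketch of a starter stylesheet, written out from Ruby so it can live in the build script. Only the h2 rule and the choice of blockquote and code blocks for the two "character text" modes come from the text above; the fonts, sizes, and indents are illustrative assumptions.

```ruby
# Hypothetical stylesheet; fonts, sizes, and margins are illustrative only.
STYLESHEET = <<~CSS
  /* Mode 1: normal body text */
  p { font-family: Georgia, serif; font-size: 12pt; line-height: 1.5; }

  /* Mode 2: text a character wrote (Markdown blockquote) */
  blockquote { margin-left: 2em; font-family: "Courier New", monospace; }

  /* Mode 3: same, with line breaks preserved (Markdown code block) */
  pre { margin-left: 2em; font-family: "Courier New", monospace; white-space: pre-wrap; }

  /* Start each chapter on a new page */
  h2 { page-break-before: always; }
CSS

File.write("style.css", STYLESHEET)
```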
Step 4: Make the cover(s) with HTML and CSS
To make the front cover, follow the same process as with the book's body: make the HTML, style it with CSS. You could use Markdown for this but the HTML might be simple enough that writing it by hand is an agreeable option.
Step 5: Generate PDFs from that HTML + CSS
You'll want a version of wkhtmltopdf with patched Qt. If the version packaged for your distribution doesn't have the patched Qt, then you'll want to download and install it yourself. You can check for the patch with the -V option:
$ wkhtmltopdf -V
wkhtmltopdf 0.12.4
$ wkhtmltox/bin/wkhtmltopdf -V
wkhtmltopdf 0.12.4 (with patched qt)
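A build script could make the same check before doing any work. This sketch assumes the version string contains the literal text "with patched qt", as in the output above; both method names are hypothetical.

```ruby
# True when `wkhtmltopdf -V` output reports the patched Qt build.
def patched_qt?(version_output)
  version_output.include?("with patched qt")
end

# Run the given binary with -V and inspect its output.
# Returns false when the binary cannot be found.
def wkhtmltopdf_patched?(binary = "wkhtmltopdf")
  patched_qt?(IO.popen([binary, "-V"], err: File::NULL, &:read))
rescue Errno::ENOENT
  false
end

warn "wkhtmltopdf with patched Qt not found" unless wkhtmltopdf_patched?
```

Splitting the string check from the process call keeps the first method easy to test without wkhtmltopdf installed.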
You can specify page size, top, bottom, left and right margins, stylesheet, and, for page numbers, a footer file:
wkhtmltox/bin/wkhtmltopdf -s Letter -T 1in -B 1in -L 1in -R 1in --user-style-sheet style.css --footer-html footer.html combined.html body.pdf
A footer file should look something like:
function subst() {
  var vars={};
  var x=document.location.search.substring(1).split('&');
  for(var i in x) {var z=x[i].split('=',2);vars[z[0]] = unescape(z[1]);}
  var x=['frompage','topage','page','webpage','section','subsection','subsubsection'];
  for(var i in x) {
    var y = document.getElementsByClassName(x[i]);
    for(var j=0; j<y.length; ++j) y[j].textContent = vars[x[i]];
  }
}
The JavaScript called onload chomps through the variables passed to the file during processing and fills their values into the elements with matching class names. You can style those elements in the CSS.
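For a complete picture, here is a sketch of a whole footer file, generated from Ruby so the build script can write it. The script is the subst() function above; the surrounding markup (a single centered span with class page) is my assumption about a minimal layout, not the author's actual footer.

```ruby
# Hypothetical footer layout: one centered page number.
# The quoted heredoc tag ('HTML') turns off Ruby interpolation in the body.
FOOTER_HTML = <<~'HTML'
  <!DOCTYPE html>
  <html>
  <head>
  <meta charset="utf-8">
  <script>
  function subst() {
    var vars = {};
    var x = document.location.search.substring(1).split('&');
    for (var i in x) { var z = x[i].split('=', 2); vars[z[0]] = unescape(z[1]); }
    var x = ['frompage', 'topage', 'page', 'webpage', 'section', 'subsection', 'subsubsection'];
    for (var i in x) {
      var y = document.getElementsByClassName(x[i]);
      for (var j = 0; j < y.length; ++j) y[j].textContent = vars[x[i]];
    }
  }
  </script>
  </head>
  <body onload="subst()" style="margin: 0; text-align: center;">
    <span class="page"></span>
  </body>
  </html>
HTML

File.write("footer.html", FOOTER_HTML)
```

Adding a second span with class topage would give a "page of total" footer, since subst() fills every listed class it finds.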
Do something similar (but leave out the footer) to generate the cover:
wkhtmltox/bin/wkhtmltopdf -s Letter -T 1in -B 1in -L 1in -R 1in --user-style-sheet style.css title.html title.pdf
Then combine the PDFs with Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=final.pdf title.pdf body.pdf
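When calling Ghostscript from a script, the command can be passed as a list of arguments instead of one shell string. A sketch using the same flags as above; gs_merge_args and merge_pdfs are hypothetical helper names.

```ruby
# Build the Ghostscript argument list for merging PDFs in the given order.
def gs_merge_args(output, inputs)
  [
    "gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
    "-dPDFSETTINGS=/default", "-dNOPAUSE", "-dQUIET", "-dBATCH",
    "-dDetectDuplicateImages", "-dCompressFonts=true", "-r150",
    "-sOutputFile=#{output}", *inputs
  ]
end

# Passing separate arguments to system skips the shell, so filenames
# containing spaces need no quoting. Returns true on success.
def merge_pdfs(output, inputs)
  system(*gs_merge_args(output, inputs))
end

puts gs_merge_args("final.pdf", ["title.pdf", "body.pdf"]).join(" ")
```

The input order is the page order, so the cover goes first.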
So, to put this all together in a build script:
$ cat make
#!/usr/bin/ruby

wkhtmltopdf_cmd = "~/wkhtmltox/bin/wkhtmltopdf -s Letter -T 1in -B 1in -L 1in -R 1in --user-style-sheet style.css"

parts = ["Talitha", "Imal", "Aunauf", "Empress", "Astronauts"]

# Convert each chapter to HTML and post-process it with sed.
# "section-break" is a placeholder class name; use whatever your CSS targets.
parts.each do |part|
  system(
    "multimarkdown -s ../#{part}/story.md" \
    " | sed -E 's|_([^_]+)_|<em>\\1</em>|g'" \
    " | sed -E 's/<h1 .+//g'" \
    " | sed 's|<p>%|<p class=\"section-break\">%|g'" \
    " > output-#{part}.html"
  )
end

# Feed all chapters to wkhtmltopdf as one document so page numbers run on.
htmls = parts.map { |part| "output-#{part}.html" }.join(" ")
system("cat #{htmls} | #{wkhtmltopdf_cmd} --footer-html footer.html - body.pdf")

# The cover gets no footer, so it stays unnumbered.
system("#{wkhtmltopdf_cmd} title.html title.pdf")

# Merge cover and body into the final PDF.
system("gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=final.pdf title.pdf body.pdf")
You probably wouldn't want to use this process for a final draft but it should work for all the ones you're going to mark up anyway.
Source: Hacker News