Wednesday, 10 July 2013

Centos 6/RHEL install Wkhtmltopdf Web Page to PDF converter

Wkhtmltopdf is an open source command-line utility that enables you to download, convert and print any given HTML page to a PDF document. The same project also provides wkhtmltoimage, which renders pages to image formats such as JPG and PNG.

Wkhtmltopdf is written in C++ and uses the WebKit rendering engine to convert pages with minimal loss of quality.
It is a useful and reliable way to create and store snapshots of web pages as they appear at the time of conversion.

Installing on Centos 6 is straightforward: just download the tar.bz2 file from the web site (or fetch it with wget) and untar it.

$ sudo yum -y install wget

$ sudo su

# cd /opt

Download the file

# wget

# wget
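If you are not sure which build to fetch, a small sketch like the one below picks a tarball name from the machine architecture. The amd64 file name is an assumption modelled on the i386 name used in this post, and tarball_for_arch is only a hypothetical helper, so check the real names on the download page.

```shell
# Map an architecture string (as printed by `uname -m`) to a tarball name.
# The amd64 name is an assumption based on the i386 name used in this post.
tarball_for_arch() {
    case "$1" in
        x86_64) echo "wkhtmltopdf-0.10.0_rc2-static-amd64.tar.bz2" ;;
        i?86)   echo "wkhtmltopdf-0.10.0_rc2-static-i386.tar.bz2" ;;
        *)      echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Show the name that matches this machine, or a note if none is known.
tarball_for_arch "$(uname -m)" || echo "no static build known for this machine"
```

On a 32-bit install `uname -m` usually reports i686, which the `i?86` pattern covers.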

Untar the file

# tar xvf wkhtmltopdf-0.10.0_rc2-static-i386.tar.bz2

Rename it to something easier to remember and type.

# mv wkhtmltopdf-i386 htmlpdf

Symlink to /usr/bin

# ln -s /opt/htmlpdf /usr/bin/htmlpdf
# exit
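Before going further it may be worth confirming that the new command is actually found on your PATH. This is a minimal sketch; have_tool is a hypothetical helper name, not part of wkhtmltopdf.

```shell
# Return success if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool htmlpdf; then
    echo "htmlpdf found at $(command -v htmlpdf)"
else
    echo "htmlpdf is not on PATH yet, check the symlink in /usr/bin"
fi
```

If the command is found but fails to run, check that /opt/htmlpdf is the extracted binary and is executable.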

Now use it to download and convert a web page to PDF, giving the page address followed by the output file

$ htmlpdf ~/minimallinux.pdf
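To convert several pages in one go, the same command can be wrapped in a loop. The sketch below is only an illustration: pdf_name_for_url and convert_all are hypothetical helpers, the example.com address is a placeholder, and htmlpdf is assumed to be the symlink created above.

```shell
# Build an output file name from a URL: drop the scheme, replace anything
# outside letters, digits, dot, underscore and hyphen with "_", add ".pdf".
pdf_name_for_url() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z]*://||' -e 's|[^A-Za-z0-9._-]|_|g' -e 's|$|.pdf|'
}

# Read one URL per line from standard input and convert each to a PDF
# in the current directory. Blank lines are skipped.
convert_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        htmlpdf "$url" "$(pdf_name_for_url "$url")" < /dev/null
    done
}

# Show the name that would be used for a placeholder address.
pdf_name_for_url "http://example.com/docs/index.html"
```

With a file of addresses, one per line, this would be run as `convert_all < urls.txt`, where urls.txt is whatever list you have prepared.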

View it with Xpdf

$ xpdf minimallinux.pdf

Web page converted to PDF with Wkhtmltopdf

Check the contents with pdfinfo

$ pdfinfo minimallinux.pdf

Title:          Minimal Linux: Centos 6/RHEL install and use OpenNTPD local clock sync
Producer:       wkhtmltopdf
CreationDate:   Wed Jul 10 11:44:24 2013
Tagged:         no
Pages:          4
Encrypted:      no
Page size:      595 x 842 pts (A4)
File size:      283896 bytes
Optimized:      no
PDF version:    1.4
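If you only need one field from that report, for example in a script that checks a conversion produced something, it can be picked out with awk. This is a small sketch; pdf_pages is a hypothetical helper that reads pdfinfo output on standard input, and the sample text is cut from the output above.

```shell
# Print the page count from pdfinfo output supplied on standard input.
pdf_pages() {
    awk '$1 == "Pages:" { print $2 }'
}

# Try it on a few lines copied from the report above.
pdf_pages <<'EOF'
Producer:       wkhtmltopdf
Tagged:         no
Pages:          4
Encrypted:      no
EOF
```

Against a real file this would be `pdfinfo minimallinux.pdf | pdf_pages`.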

Generate a table of contents for the page by adding the toc argument

$ htmlpdf toc  ~/minimaltoc.pdf

View it with Xpdf

A Table of Contents generated with Wkhtmltopdf

Just to get an idea of how it looks.

You can also save to other formats such as PostScript (.ps); for more information see the built-in help.

$ htmlpdf --help

For similar tasks in the browser, see Firefox ConvertToPDF.