2. Converting a scanned document to a pdf document

Last year I did some consulting for a law firm that required me to submit timesheets with my invoices. In any given invoice period I would undertake work for multiple clients. Work for each client was broken down into standard categories: telephone call, email, meeting, etc.

I was working from home, and my first inclination was to record everything on a spreadsheet formatted to look like the log. However, this was a bit cumbersome, and I found it much simpler to keep a paper log on the side of my desk or in my diary, as was done in the office, and pen in entries as necessary.

The law firm filed everything as pdf files, so I had to submit my log forms via email as pdf documents.

To make life easy I wrote the following script to convert the scanned log forms from an image to a pdf. I used xsane set to lineart for scanning and saved the result as either .jpg or .png, which produced an image of about the same width and height as an A4 document. With xsane set to lineart and 300 dpi, the pdf files are around 93.5kB.

#!/bin/bash
############################################################
# /usr/local/bin/con2pdf 
# Usage: con2pdf [input file]
# Converts an image to a pdf.
# requires awk
# requires convert (from ImageMagick)
############################################################
 
# Assign a variable
input_file=$1
 
# Test to see if a variable was provided with the command
test -n "$input_file"
  if [ $? -eq 1 ]; then
    echo -e "\nUsage: con2pdf [input file]\n"
    exit
  fi
 
# Assign another variable using the output of a command
output_file=`echo "$input_file" | awk -F "." '{ print $1 }'`.pdf
 
convert "$input_file" "$output_file" || exit 1  # This line does the actual conversion and stops the script if it fails
 
rm "$input_file"  # This line removes the image file
 
# end of script

I will now explain how this script works:

Note that, with the exception of the first line, any text prefixed with a hash, #, is ignored up until the next new line. Text prefixed with a hash is usually referred to as a comment. Comments can be put on the same line as a command, but only after the command. There are no hard and fast rules about using comments. They are handy for explaining things to other folks as well as to oneself. I normally don't comment a small script as much as this one. Usually I just add some notes at the top and then add what I need as I go along to explain what's happening or why something needs to be done a certain way.

#!/bin/bash

The first line of my script begins with the two characters “#” and “!”. Since files are seen by programs as streams of data, a method is required to determine the format of a particular file within the filesystem. Different operating systems have traditionally taken different approaches to this problem. In the case of Unix, and in our case Linux, “#!” tells the kernel to treat the file as an executable script and not a machine code program. “/bin/bash” declares the path to the command interpreter that will be used, in this instance bash.
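The following is only a rough sketch, written in shell, of the check described above: read the first line of a file, see whether it starts with “#!”, and take the rest as the interpreter path. The kernel does this itself when a script is executed; the names demo_script, first_line and interpreter are made up for the illustration.

```shell
# Write a throwaway two-line script to a temporary file
demo_script=$(mktemp)
printf '%s\n' '#!/bin/bash' 'echo "hello"' > "$demo_script"

# Read the first line and check whether it starts with the two characters "#!"
first_line=$(head -n 1 "$demo_script")
case $first_line in
  '#!'*) interpreter=${first_line#??} ;;   # drop the leading two characters to leave the path
  *)     interpreter="" ;;                 # no shebang found
esac

echo "Interpreter: $interpreter"

# Remove the temporary file
rm "$demo_script"
```

For the script to be started this way it also has to be marked executable, for example with chmod +x.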

input_file=$1

This line assigns a value to the variable input_file, using the first string of text entered after the command con2pdf, i.e. a file name. More than one argument can be passed to a script when it is run, and they are numbered $1, $2, etc., but in this instance I only want to pass the name of the input file.
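Positional parameters can be tried out without writing a whole script, because a shell function receives its arguments in the same way. A minimal sketch, assuming a made-up function called show_args:

```shell
# show_args is a hypothetical helper; inside a function, $1 and $#
# behave the same way as they do for a script's command-line arguments
show_args() {
  echo "count=$# first=$1"
}

# Two separate arguments: $1 is the first one, $# counts both
show_args scan.png notes.txt

# Quoting keeps a file name that contains a space together as a single $1
show_args "my scan.png"
```

The second call shows why file names are best quoted: without the quotes the shell would split the name into two arguments at the space.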

test -n "$input_file"

This line uses test, a bash built-in command (builtin), to check whether the variable holds a non-empty string, i.e. whether a file name was passed to the script when the command con2pdf was run. Test exits with a status of 0 (true) if input_file is a non-empty string and 1 (false) if it is empty. The exit status is not printed to stdout, but it is held in the special parameter $? and can then be evaluated using an if statement.
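The behaviour of test and $? can be seen at the prompt. The sketch below copies $? into two illustrative variables (status_when_set and status_when_empty are names made up here), because every command that runs overwrites $? with its own exit status:

```shell
# A non-empty string: test -n succeeds
input_file="scan.png"
test -n "$input_file"
status_when_set=$?      # saved straight away, before another command replaces $?

# An empty string: test -n fails
input_file=""
test -n "$input_file"
status_when_empty=$?

echo "set: $status_when_set, empty: $status_when_empty"
```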

  if [ $? -eq 1 ]; then
    echo -e "\nUsage: con2pdf [input file]\n"
    exit
  fi

This if statement evaluates $? to see if it is equal to 1.

If $? equals 1 then it runs the bash builtin echo, which prints the text within the double quotes to stdout. Echo is used with the flag -e, which enables interpretation of backslash escapes. In this instance a newline, \n, is inserted before and after the text.

The next command is the bash builtin exit, which ends the script at this point.
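The -e flag works in bash, but echo does not treat it the same way in every shell. As an alternative sketch, not the method used in the script, printf interprets \n in its format string without any flag. The function name usage_text is invented for the example:

```shell
# usage_text is a hypothetical helper that prints the same message as the script:
# a blank line, the usage text, and another blank line
usage_text() {
  printf '\nUsage: con2pdf [input file]\n\n'
}

usage_text
```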

All if statements must be closed with fi.
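The test and the if statement can also be combined, since if accepts the test directly. The following is a sketch of that idiom and not the form used in the script: check_input is a made-up function standing in for the top of the script, and it uses return where the script itself would use exit.

```shell
# check_input is a hypothetical stand-in for the first lines of con2pdf
check_input() {
  if [ -z "$1" ]; then                       # -z is true when the string is empty
    echo "Usage: con2pdf [input file]" >&2   # send the message to stderr
    return 1                                 # a non-zero status signals an error
  fi
  echo "converting $1"
}

# With a file name the check passes
check_input scan.png

# With an empty argument the usage message is shown and the status is 1
check_input "" || echo "check_input failed with status $?"
```

Returning a non-zero status lets whatever called the script find out that nothing was converted.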

output_file=`echo "$input_file" | awk -F "." '{ print $1 }'`.pdf

Instead of passing both an input filename and an output (save) filename to the script, the next line creates an output filename and assigns it to the variable output_file. A variable can be assigned the output of a command when the command is enclosed in two backticks, `, the symbol below the tilde on the keyboard.
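Bash also accepts $( ) in place of the two backticks. Both forms capture the output of the command in the same way. A small sketch using the line from the script, with illustrative variable names:

```shell
input_file="scanned_image.png"

# The form used in the script: the command sits between two backticks
with_backticks=`echo "$input_file" | awk -F "." '{ print $1 }'`.pdf

# The same command inside $( ), which is easier to read and to nest
with_dollar_paren=$(echo "$input_file" | awk -F "." '{ print $1 }').pdf

echo "$with_backticks"
echo "$with_dollar_paren"
```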

In the command, echo is used to print the variable input_file, but instead of appearing on the terminal its output is passed through a pipe to awk. Awk (or gawk) is a pattern matching program. Here the flag -F is used to declare “.” (full stop) as the field separator. For example, the file name scanned_image.png consists of two fields separated by a full stop. Awk will print the first field, $1 (scanned_image), to stdout. Inside the awk program $1 means the first field of the line, not the first argument of the script. Note the .pdf on the same line, after the second backtick. This appends .pdf to the output of awk, so if $1 was scanned_image, the variable output_file would be scanned_image.pdf.

<code bash>
#!/bin/bash
############################################################
# con2pdf-gui
# Intended to convert scanned document to pdf document
# providing gui interface for file selection and saving.
# requires awk
# requires convert (from ImageMagick)
# requires dirname
# requires zenity to create gui file selection dialogs
# zenity requires GTK+
############################################################

input_file=$(zenity --file-selection --title "Select an image file to convert to pdf")

working_dir=$(dirname "$input_file")

cd "$working_dir"

# --filename sets the folder the save dialog opens in
# (zenity's --directory flag takes no path; it restricts selection to directories)
output_file=$(zenity --file-selection --save --title "Where do you want to save the pdf file?" \
  --filename="$working_dir/" --confirm-overwrite)

convert "$input_file" "$output_file"

# end of script
</code>
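Two steps from the scripts above can be tried without a scanner or a desktop session: the awk filename split from con2pdf and the dirname call from con2pdf-gui. The sketch below uses made-up file names. It also shows bash parameter expansion, ${name%.*}, as a possible alternative to awk when a file name contains more than one full stop; this is a swapped-in technique, not what the script does.

```shell
# A file name with two full stops: awk's first field stops at the first one
dotted_name="timesheet.feb.png"
awk_result=$(echo "$dotted_name" | awk -F "." '{ print $1 }').pdf

# ${dotted_name%.*} removes only the last full stop and what follows it
expansion_result="${dotted_name%.*}.pdf"

echo "$awk_result"
echo "$expansion_result"

# dirname prints the directory part of a path; the quotes keep the space in the name intact
scan_path="/home/user/scans/log page.png"
scan_dir=$(dirname "$scan_path")

echo "$scan_dir"
```

With the awk split, timesheet.feb.png and timesheet.mar.png would both be saved as timesheet.pdf, so the expansion form may be the safer choice for names like these.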

tutorials/bash_scripting/part1.1330061134.txt.gz · Last modified: 2017/10/12 21:58 (external edit)