Saturday, July 22, 2017

Automating PDFs through the Adobe API in Ruby

Over the last couple of weeks I have been spending some time automating PDF tests. It has definitely been a learning experience, as I had never done anything like it before. For background, I was doing this with Ruby in a Selenium framework. I started by going through various gems that give access to different parts of a PDF, and found that for my specific use the Adobe API was the better fit.

Here are a few quick things about it that I found useful.

First is the relationships between the different entities that get created, such as the App (AcroExch.App), the AVDoc (AcroExch.AVDoc), and the active doc. I found this relationship diagram very helpful in figuring out what was going on.
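As a rough sketch of how I read that relationship: the App is the Acrobat application, an AVDoc is a document as it is shown in a window, and the PDDoc is the underlying PDF that the AVDoc displays. The helper name `acrobat_handles` below is hypothetical and only there for illustration; the `GetAVPageView`, `GetPDDoc`, and `GetJSObject` calls are the same ones used in pdf.rb further down.

```ruby
# Hypothetical helper (not part of pdf.rb): given an AcroExch.AVDoc that has
# already opened a file, collect the related objects in one hash so the chain
# from window view to underlying document is visible at a glance.
def acrobat_handles(app, avdoc)
  pddoc = avdoc.GetPDDoc             # AVDoc (window view) -> PDDoc (the PDF itself)
  {
    app: app,                        # AcroExch.App: the Acrobat application
    avdoc: avdoc,                    # the document as displayed in a window
    page_view: avdoc.GetAVPageView,  # the part of the window showing the page
    pddoc: pddoc,
    js: pddoc.GetJSObject            # bridge into the document's JavaScript API
  }
end

# On Windows with Acrobat installed, usage might look like this
# (the file path is a placeholder):
#   require 'win32ole'
#   app   = WIN32OLE.new('AcroExch.App')
#   avdoc = WIN32OLE.new('AcroExch.AVDoc')
#   avdoc.Open('C:/path/to/file.pdf', 'temp title')
#   handles = acrobat_handles(app, avdoc)
```

Because the helper only calls methods on whatever it is handed, it can also be exercised with simple stand-in objects when Acrobat is not available.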


Second is to realize that Acrobat exposes a JavaScript object for the PDF in the background. This helps because you can get the JS object of the PDF and check various attributes, as well as modify things on the fly in the app window itself, such as running advanced searches. Here is the part of my code that gets it through GetJSObject: @@js = @@pdf_avdoc.GetPDDoc.GetJSObject
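To give an idea of what checking attributes can look like, here is a minimal sketch. The helper name `pdf_summary` is hypothetical. `numPages` and `documentFileName` are properties of the Acrobat JavaScript Doc object, and I am assuming they are read through the object returned by GetJSObject the same way the search properties are in my code below.

```ruby
# Hypothetical helper (not part of pdf.rb): read a couple of properties from
# the JS object returned by PDDoc#GetJSObject.
def pdf_summary(js)
  {
    pages: js.numPages,             # number of pages in the document
    file_name: js.documentFileName  # base file name of the document
  }
end

# With the class variables from pdf.rb this would be called as:
#   pdf_summary(@@js)
```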




Third and last is to get it right the first time. Part of the reason I decided to use this approach is that it was already built, or so I thought. I ended up having to rewrite things as well as figure out what the previous developer had meant. It is hard to go into code like this and work out all the different entities and API calls. This is true of any code you write; it just seemed especially irksome to me because the story was supposed to be a quick 1pt story, and it ended up taking longer because I had to rewrite most of the functionality in our pdf class.

Here is part of my pdf.rb

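require 'find'        # Find.find is used to scan the downloads folder
require 'win32ole'
require 'rautomation' # RAutomation::Window is used to activate the Acrobat window
require 'mastercontrol/assertions'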

class Rect
  attr_reader :upper_left_x, :upper_left_y, :lower_right_x, :lower_right_y

  def initialize(upper_left_x, upper_left_y, lower_right_x, lower_right_y)
    @upper_left_x = upper_left_x
    @upper_left_y = upper_left_y
    @lower_right_x = lower_right_x
    @lower_right_y = lower_right_y
  end

  # Upper-left corner of the rectangle
  def position
    { x: upper_left_x, y: upper_left_y }
  end

  def inspect
    "#{@upper_left_x}, #{@upper_left_y}, #{@lower_right_x}, #{@lower_right_y}"
  end
end

class Pdf
  include OpenFileDialog

  @@pdf_app = nil
  @@pdf_pddoc = nil
  @@pdf_avdoc = nil
  @@js = nil
  @@pdf_bookmark = nil
  @@pdf_page = nil
  @@pdf_annot = nil
  @@pdf_link = nil
  @@pdf_page_view = nil
  @@pdf_active_doc = nil

  def initialize
    @@pdf_app = WIN32OLE.new('AcroExch.App') if @@pdf_app.nil?
    @aliases = {
      'InfocardType' => 'DocumentType',
      'InfoCardType' => 'DocumentType',
      'Effective' => 'EffectiveDate',
      'InfocardNumber' => 'DocumentNumber',
      'InfoCardNumber' => 'DocumentNumber',
      'LastReview' => 'LastReviewDate',
      'NextReview' => 'NextReviewDate',
      'Header' => '___1___',
      'Footer' => '___2___'
    }
  end

  def open_pdf filepath
    @@pdf_avdoc = WIN32OLE.new('AcroExch.AVDoc')
    # The second argument to Open is used as the window title.
    unless @@pdf_avdoc.Open(filepath, "Adobe Acrobat Pro DC")
      error_msg = "Unable to open pdf file with path #{filepath}\n"
      error_msg += "Is the path correct?\n"
      error_msg += "Is the pdf corrupted?\n"
      raise error_msg
    end
    @@pdf_bookmark = WIN32OLE.new('AcroExch.PDBookmark')
    @@pdf_active_doc = @@pdf_app.GetActiveDoc
    @@pdf_page_view = @@pdf_avdoc.GetAVPageView
    @@pdf_pddoc = @@pdf_avdoc.GetPDDoc
    @@js = @@pdf_pddoc.GetJSObject
    @@pdf_app.Show
    # Bring the Acrobat window to the foreground, matching on that title.
    pdf_window = RAutomation::Window.new title: /Adobe Acrobat Pro DC/i
    pdf_window.activate
    self
  end

  def open_pdf_from_resources filename
    self.open_pdf env['resource_dir'] + '/' + env[filename]
  end

  def open_downloaded_pdf filename
    # downloads_path = File.expand_path "#{ENV['USERPROFILE']}/downloads"
    file_path = get_full_file_path filename
    self.open_pdf file_path
  end

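  # Returns the first path under the downloads folder that contains
  # partial_path, or nil if nothing matches.
  def get_full_file_path partial_path
    Find.find("#{ENV['USERPROFILE']}/downloads/") do |path|
      return path if path.include? partial_path
    end
    nil
  end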

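  # Deletes every file under the downloads folder whose path contains
  # partial_path. Directories are skipped because File.delete only removes files.
  def cleanup_downloads partial_path
    Find.find("#{ENV['USERPROFILE']}/downloads/") do |path|
      File.delete(path) if File.file?(path) && path.include?(partial_path)
    end
  end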

  def verify_pdf_doc_is_opened
    unless pdf_doc_is_opened?
      raise 'There is no pdf document opened'
    end
  end

  def pdf_doc_is_opened?
    @@pdf_avdoc.IsValid
  end

  # Advanced search. To match on more than one word, pass the words in a
  # single string separated by spaces.
  def advanced_search search_text, match_all_words: true, match_any_word: false, match_phrase: false, metadata: false
    if match_all_words
      @@js.search.wordMatching = "MatchAllWords"
    elsif match_any_word
      @@js.search.wordMatching = "MatchAnyWord"
    elsif match_phrase
      @@js.search.wordMatching = "MatchPhrase"
    end
    @@js.search.docXMP = metadata

    # there are more options available that are not coded in
    @@js.search.query(search_text)
  end
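
One thing to watch in advanced_search: match_all_words defaults to true and is checked first, so match_any_word and match_phrase only take effect when the caller also passes match_all_words: false. A possible alternative, sketched here and not what our class actually does, is a single keyword that maps to the wordMatching string. The names `configure_search` and `mode:` are hypothetical; `wordMatching` and `docXMP` are the same search properties used above.

```ruby
# Hypothetical alternative to the three boolean flags: one mode keyword.
# `search` is expected to behave like the Acrobat JS search object (@@js.search).
def configure_search(search, mode: :all_words, metadata: false)
  word_matching = {
    all_words: 'MatchAllWords',
    any_word: 'MatchAnyWord',
    phrase: 'MatchPhrase'
  }
  # fetch raises for an unknown mode instead of silently leaving the old setting
  search.wordMatching = word_matching.fetch(mode) { raise ArgumentError, "unknown mode: #{mode.inspect}" }
  search.docXMP = metadata
  search
end

# Inside the Pdf class this might be used as:
#   configure_search(@@js.search, mode: :phrase).query(search_text)
```

With a single keyword the three modes cannot conflict, and a misspelled mode fails loudly.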
