Akom's Tech Ruminations

Various tech outbursts - code and solutions to practical problems

OCR on a large PDF using tesseract and pdftk

Posted by Admin • Thursday, January 19. 2017 • Category: Linux
This turned out to be harder than I thought. I found a large (50MB) PDF with about 50 pages, and none of the tesseract GUIs seemed to be able to handle it without crashing. The obvious solution was to convert the PDF to TIFF so that command-line tesseract could handle it directly, but ImageMagick couldn't handle that conversion because it ran out of memory (even with the limit settings). So the only option was to reduce the load on all the moving parts by splitting the PDF into pages.

Even after splitting the PDF and running each page through the PDF->TIFF->Tesseract->PDF chain I was still having issues:
Error in pixReadFromTiffStream: spp not in set {1,3,4}
Huh? So it turns out that sometimes you may wind up with an alpha channel in your TIFF and tesseract can't handle this. There is a solution, fortunately. So finally, I put all of these steps together into a script:

Continue reading "OCR on a large PDF using tesseract and pdftk"
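
The complete script is in the full post. As a rough sketch of the chain described above (split, PDF to TIFF without an alpha channel, OCR, merge), something like the following should work. It assumes pdftk, ImageMagick's convert (6.7.5 or later for -alpha remove) and a tesseract build with PDF output (3.03 or later). The function names, the 300 dpi density and the file naming are illustrative assumptions, not taken from the original script.

```shell
#!/usr/bin/env bash
# Sketch only: OCR a large PDF one page at a time.
# Assumes pdftk, ImageMagick (convert) and tesseract >= 3.03 are on the PATH.

# Hypothetical helper: name of the OCR output (without extension) for a page file.
ocr_base() {
  local page="$1"
  printf '%s_ocr\n' "${page%.pdf}"
}

# Hypothetical driver: ocr_large_pdf input.pdf output.pdf
ocr_large_pdf() {
  local input="$1" output="$2" workdir page base
  workdir="$(mktemp -d)" || return 1

  # 1. Split into single-page PDFs so no tool has to hold the whole document.
  pdftk "$input" burst output "$workdir/page_%04d.pdf" || return 1

  for page in "$workdir"/page_*.pdf; do
    base="$(ocr_base "$page")"
    # 2. Render the page to TIFF, flattening and dropping the alpha channel
    #    that triggers "spp not in set {1,3,4}" in tesseract.
    convert -density 300 "$page" -background white -alpha remove -alpha off \
      "${page%.pdf}.tif" || return 1
    # 3. OCR the TIFF; the trailing "pdf" asks tesseract for a searchable PDF.
    tesseract "${page%.pdf}.tif" "$base" pdf || return 1
  done

  # 4. Merge the per-page results back into one document.
  pdftk "$workdir"/page_*_ocr.pdf cat output "$output"
}
```

Usage would be along the lines of: ocr_large_pdf big.pdf big_ocr.pdf. Working page by page keeps the memory footprint of each tool small, at the cost of more temporary files.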

Running Jenkins Swarm client as a service via Upstart

Posted by Admin • Wednesday, December 21. 2016 • Category: DevOps, Java, Linux
This turned out to be fairly simple, with only one gotcha: do not follow the how-to's out there that tell you to use expect fork. The process doesn't technically fork. When I had that setting enabled, upstart commands would hang under very specific but repeatable conditions (if the process was killed externally).

So, here is my upstart conf file:

Continue reading "Running Jenkins Swarm client as a service via Upstart"
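
The actual conf file is in the full post. Purely as an illustration of the point about expect fork, here is a small shell function that prints a minimal Upstart job without that stanza. The jar location, the jenkins user, the master URL and the -fsroot directory are placeholder assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: print an Upstart job definition for the Swarm client.
# All paths, the user name and the master URL are placeholder assumptions.

print_swarm_upstart_conf() {
  local jar="${1:-/opt/jenkins/swarm-client.jar}"
  local master="${2:-http://jenkins.example.com:8080}"
  cat <<EOF
description "Jenkins Swarm client"

start on runlevel [2345]
stop on runlevel [!2345]

respawn
setuid jenkins

# Deliberately no "expect fork": java stays in the foreground.
exec java -jar ${jar} -master ${master} -fsroot /var/lib/jenkins-swarm
EOF
}
```

It could be installed with something like: print_swarm_upstart_conf | sudo tee /etc/init/jenkins-swarm.conf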

Docker: Automatically remove containers that have been running too long

Posted by Admin • Thursday, October 20. 2016 • Category: DevOps, Linux
Why? Because my Jenkins setup sometimes starts containers and forgets about them. Either it thinks it failed to start one, or the container itself has trouble starting. Either way, I'm left with containers that are running, trying to connect to Jenkins in vain, forever. The proper way to fix this is probably to have the containers time out at some point, but that mechanism is broken.

Anyway, the fix I have is a true hack: find containers that have been up more than 2 days and kill them. None of our jobs should run for more than about a day, so this is a safe limit. Here is a bash script to do this:

Continue reading "Docker: Automatically remove containers that have been running too long"
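
The script itself is in the full post. One way the idea could be sketched, assuming the docker CLI and GNU date are available, is to compare each container's State.StartedAt against the 2-day limit. The function names below are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: force-remove containers that have been running longer than a limit.
# Assumes the docker CLI and GNU date; the 2-day limit comes from the text above.

MAX_AGE_SECONDS=$((2 * 24 * 60 * 60))

# Succeeds if a container started at $1 (epoch seconds) is older than the limit at time $2.
is_too_old() {
  local started="$1" now="$2"
  [ $((now - started)) -gt "$MAX_AGE_SECONDS" ]
}

# Hypothetical driver: walks all running containers.
reap_old_containers() {
  local id started_at started now
  now="$(date +%s)"
  for id in $(docker ps -q); do
    # StartedAt is an RFC 3339 timestamp, which GNU date can parse.
    started_at="$(docker inspect --format '{{.State.StartedAt}}' "$id")" || continue
    started="$(date -d "$started_at" +%s)" || continue
    if is_too_old "$started" "$now"; then
      echo "Removing container $id (started $started_at)"
      docker rm -f "$id"
    fi
  done
}
```

Calling reap_old_containers from cron would give the periodic cleanup; keeping the age check in its own function makes the threshold easy to verify without touching docker.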

Jenkins command line ssh: Host key verification failed despite ssh-agent

Posted by Admin • Thursday, August 4. 2016 • Category: DevOps, Linux
After hours of "Why does it work locally but not in Jenkins", this error boils down to StrictHostKeyChecking... In other words, since the job runs as a user on a random slave, and this user most likely doesn't have a known_hosts file with an entry for the target system, this fails rather cryptically. You think that the user's keys don't work, but that's not the problem.

The whole setup boils down to:
  1. Install ssh-agent plugin
  2. Configure credentials with a valid ssh key for your target
  3. Enable ssh-agent with that credential entry in your job config
  4. Add StrictHostKeyChecking=no to whatever ssh command you are using. Some examples:
    • GIT:
      export GIT_SSH_COMMAND="ssh -oStrictHostKeyChecking=no"
    • SSH:
      ssh -oStrictHostKeyChecking=no ....

Ubuntu 15.10: Handling Dell Latitude laptop dock events to reconfigure displays

Posted by Admin • Tuesday, April 12. 2016 • Category: Linux
Couldn't find any good solutions out there. I have a Dell Latitude E6410 running Ubuntu 15.10 with XFCE (no Gnome or KDE). I use a physical docking station that allows me to use two external monitors. Ubuntu doesn't seem to understand that if these external monitors vanish, it should switch to the monitor that remains (built-in LCD), and vice versa, so I have to automate it myself.

The trick to doing this reliably is finding a udev device you can monitor using udev rules and perform actions when it appears or disappears. This would be a device that is 100% absent when undocked and is 100% present when docked. There is no kernel built-in dock device on these laptops. In order to watch what devices are coming and going, I used the following command (as root):
udevadm monitor -p -u
Run that in a terminal, undock/dock and see what it says. You may want to save the output to a file, since there is a lot of it. In my case, there was nothing definitive. There is a "drm" device which sounds promising, but there is no good way to tell a "remove" apart from an "add" event. Ultimately, I settled on the numlock event. Why? Because it is specific to my external USB keyboard (Microsoft 4000) that is always plugged into the docking station. The internal laptop keyboard is unlikely to generate that event, and the full-size keyboard always produces one. Here is the relevant remove event:

Continue reading "Ubuntu 15.10: Handling Dell Latitude laptop dock events to reconfigure displays"
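
The udev rule and the event details are in the full post. As a sketch of what a script triggered by such a rule might do, the following maps the udev action to an xrandr call. The output names (LVDS1, HDMI1, HDMI2) and the function names are assumptions; the real names can be listed with xrandr -q.

```shell
#!/usr/bin/env bash
# Sketch only: switch displays when a udev rule reports docking or undocking.
# The output names (LVDS1, HDMI1, HDMI2) are assumptions; list yours with "xrandr -q".

INTERNAL="LVDS1"
EXTERNAL_LEFT="HDMI1"
EXTERNAL_RIGHT="HDMI2"

# Prints the xrandr arguments for a udev action ("add" = docked, "remove" = undocked).
xrandr_args_for() {
  case "$1" in
    add)
      echo "--output $INTERNAL --off --output $EXTERNAL_LEFT --auto --output $EXTERNAL_RIGHT --auto --right-of $EXTERNAL_LEFT"
      ;;
    remove)
      echo "--output $EXTERNAL_LEFT --off --output $EXTERNAL_RIGHT --off --output $INTERNAL --auto"
      ;;
    *)
      return 1
      ;;
  esac
}

# Hypothetical entry point for a script started by a udev rule: handle_dock_event "$ACTION"
handle_dock_event() {
  local args
  args="$(xrandr_args_for "$1")" || return 1
  # udev runs scripts as root outside the X session, so DISPLAY and
  # XAUTHORITY must already be exported by the caller.
  # Word splitting of $args is intended here.
  xrandr $args
}
```

Separating the argument selection from the xrandr call makes it possible to check the layout logic on a machine without the dock attached.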

Migrating Maven Jenkins jobs to FreeStyleJobs due to JDK 1.6 incompatibility

Posted by Admin • Tuesday, March 15. 2016 • Category: DevOps, Linux
Soon after Jenkins 1.609.1, support for JDK 1.6 was dropped altogether. This means that you can run jobs with whatever JDK you want as long as you are not using the Maven job type. Maven jobs require JDK 1.7 or higher (actually they run on whatever version the master is using, ignoring the JDK setting in the job configuration). This is a big deal for a shop that does extensive cross-platform testing. There are two solutions:
  1. Convert hundreds (in our case) of Maven jobs to regular (freestyle) jobs. There appears to be no simple script to do this, so I wrote one here
  2. Use maven toolchains to compile and test with a different JDK. Although converting jobs is a viable option that requires no additional setup, using toolchains may be preferred in some cases. For one thing, Jenkins offers the "Perform Release" button on maven jobs, something that is difficult to emulate in freestyle jobs.
Regarding option 1:

Initially I tried doing this using JobDSL, but that is way too time-consuming and hard to maintain. The simplest approach is to take the job XML, make minimal changes and push it back. Except that it's not that simple - you can't change the job type. You need to create new jobs, which is why I wrote a script to do all of the above. It will move the pre/post builders to the main builder list, and create a new maven job step in between. The rest is left unchanged and most of the XML is naturally valid for this version of jenkins/plugins since it's pulled from the live Jenkins.
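
The conversion script itself is linked above. Only the "pull the XML, create a new job" part is sketched here, using the standard Jenkins config.xml and createItem endpoints with curl. JENKINS_URL, JENKINS_AUTH and the helper names are placeholder assumptions, and the XML rewrite in between is left out.

```shell
#!/usr/bin/env bash
# Sketch only: download a job's config.xml and create a new job from an edited copy.
# JENKINS_URL, JENKINS_AUTH and the job names are placeholder assumptions.

JENKINS_URL="${JENKINS_URL:-http://jenkins.example.com:8080}"
JENKINS_AUTH="${JENKINS_AUTH:-user:apitoken}"

# URL of an existing job's configuration.
config_url() { printf '%s/job/%s/config.xml\n' "$JENKINS_URL" "$1"; }

# URL that creates a new top-level job with the given name.
create_url() { printf '%s/createItem?name=%s\n' "$JENKINS_URL" "$1"; }

# Hypothetical helper: fetch_job_config my-maven-job > my-maven-job.xml
fetch_job_config() {
  curl -sf -u "$JENKINS_AUTH" "$(config_url "$1")"
}

# Hypothetical helper: create_job my-freestyle-job edited.xml
create_job() {
  curl -sf -u "$JENKINS_AUTH" -X POST -H 'Content-Type: application/xml' \
    --data-binary "@$2" "$(create_url "$1")"
}
```

If CSRF protection is enabled on the master, the POST will also need a crumb header. Job names containing spaces or other special characters would have to be URL-encoded first.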

Jenkins: Bulk editing jobs to remove a trigger

Posted by Admin • Thursday, December 17. 2015 • Category: DevOps, Java, Linux
I have about 200 jobs that have the HudsonStartupTrigger (this is a plugin) turned on. This makes all of them run every time the master is restarted, causing the build queue to go crazy. I don't know why people turned this on in so many jobs (probably blindly copying jobs) over the years, but I'd like it gone. I don't want to restart the master or uninstall the plugin.

Here is a script console snippet to do this (can be adapted to remove other triggers easily). This does not do folders, if you're using that plugin.

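import hudson.triggers.*
for (item in Hudson.instance.items) {
  name = item.name
  if (item instanceof AbstractProject) {
    triggers = ((AbstractProject)item).getTriggers()
    triggers.each{descriptor, trigger ->
      if (descriptor instanceof org.jvnet.hudson.plugins.triggers.startup.HudsonStartupTrigger$HudsonStartupDescriptor) {
        out.println("Removing startup trigger from job " + name)
        // The snippet is cut off here in this excerpt. The two calls below are an
        // assumed completion using AbstractProject.removeTrigger() and save().
        item.removeTrigger(descriptor)
        item.save()
      }
    }
  }
}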

Suse11 cannot mount NFS share

Posted by Admin • Wednesday, December 9. 2015 • Category: DevOps, Linux
This is a recent issue on Suse 11.3. I have hundreds of machines mounting the same share fine - Centos, RedHat, even AIX.
mount -v /mountpoint                                                                                                               
mount.nfs: timeout set for Wed Dec  9 11:09:27 2015
mount.nfs: trying text-based options 'rsize=8192,wsize=8192,intr,hard,addr=X.X.X.X'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: prog 100003, trying vers=2, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100003, trying vers=2, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: prog 100003, trying vers=2, prot=6
mount.nfs: portmap query retrying: RPC: Program not registered
mount.nfs: prog 100003, trying vers=2, prot=17
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported

Surprisingly, the problem is exactly what it says it is. Adding ",vers=4" to the options takes care of the problem. Hope this saves you time.
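
For illustration, here is a small helper that appends vers=4 to an existing option string unless a version is already pinned. The function name is made up, and the server and export path in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: add vers=4 to an NFS option string unless a version is already pinned.

with_nfs4() {
  local opts="$1"
  case ",$opts," in
    *,vers=*|*,nfsvers=*) echo "$opts" ;;        # a version is already requested
    ",,")                 echo "vers=4" ;;       # no options at all
    *)                    echo "$opts,vers=4" ;; # append to the existing list
  esac
}

# Example (server and export path are placeholders):
#   mount -t nfs -o "$(with_nfs4 rsize=8192,wsize=8192,intr,hard)" server:/export /mountpoint
```

The same option string can go into the fourth field of the /etc/fstab entry for the mount point.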

Jenkins Swarm Slaves on Windows using Puppet

Posted by Admin • Monday, October 12. 2015 • Category: DevOps, Linux
Jenkins Swarm plugin is great, and instrumentation for Linux is fairly well-known, but what about Windows? Here is one approach for setting it up as a Windows service (what we want Puppet to do):
  1. Download Swarm jar
  2. Download winsw from Kohsuke
  3. Rename winsw to jenkins_swarm.exe
  4. Create jenkins_swarm.xml
  5. Run jenkins_swarm.exe install
  6. Start service
Now for the details

Continue reading "Jenkins Swarm Slaves on Windows using Puppet"

Jenkins Windows Slaves cannot install JDK

Posted by Admin • Monday, October 12. 2015 • Category: DevOps, Linux
This happens despite making the Jenkins slave user a local administrator: the slave runs tools\hudson.model.JDK\JDK_1.7\jdk.exe (or 1.6) and fails (see the extended post body for the log).

If you're seeing "Error 1722. There is a problem with this Windows Installer package", then most likely you have another (newer?) version of this JDK installed system-wide. Uninstall it from Control Panel and try again.

Continue reading "Jenkins Windows Slaves cannot install JDK"

Webex in Ubuntu 15.04

Posted by Admin • Monday, September 28. 2015 • Category: Linux
The following was required to get shared screens to display:
  1. Using openjdk 7 (although I have 7 and 8 installed, 8 is default, and I removed oracle java - but the browser plugin still runs 7)
  2. Follow Option 1 from http://askubuntu.com/a/623397/238077
  3. If you are missing libxmu6, install the 32 bit version: sudo apt-get install libxmu6:i386 (this is actually mentioned in the answer, but I missed it)
  4. Firefox should now support webex

Getting VMware vSphere 5.5 Server Console to work on Ubuntu Linux 15.04

Posted by Admin • Wednesday, September 16. 2015 • Category: DevOps, Linux
As everyone who uses this product eventually finds out, VMware vSphere (5.5) browser-based web client will generally work in Linux (at least in chrome), but opening a server console is nearly impossible. The web client requires flash 11.5, and of course there is no linux flash above 11.2...

After a lot of dancing I managed to get it working in Firefox 40.0.3 on Ubuntu 15.04. Here are the steps in a nutshell:
  1. Clear out your ~/.mozilla ... True, you may not need to, but I did. Start firefox once, then exit.
  2. Install pipelight
  3. Enable flash in pipelight (I wound up doing this globally and for current user, not sure which one did it)
  4. Install the VMware client integration plugin (if you did before, uninstall and reinstall). That's the download you get on the bottom of the vSphere login page
  5. Start firefox and see if it works. If not, fiddle with the plugins (under extensions). You may see two versions of flash. I was not able to disable 11.2 from the UI without also disabling version 18 from pipelight - so I wound up deleting /usr/lib/mozilla/plugins/flashplugin-alternative.so (but you can also apt-get remove flashplugin-installer)
  6. Explicitly go to extensions again and switch all VMware items to "Always Activate"
  7. Restart firefox over and over throughout the process :-)

Good luck

Update: After some system updates, the pipelight plugins stopped appearing in Firefox, and needless to say nothing worked. Deleting ~/.mozilla again seemed to resolve it (You then have to switch all the plugins to "Always Activate" again).

Maven toolchains and puppet

Posted by Admin • Monday, August 31. 2015 • Category: DevOps, Linux
For today's trick we'll be setting up maven toolchains.xml files on Jenkins slaves that are managed by puppet. One option for doing this is to use the Jenkins Config File Provider plugin - but all that really does is push a predetermined file to the slaves before running the job (now if it only worked together with the Jenkins tool provider data...) In our case, we have Jenkins slaves with different OS versions and thus different toolchain locations - for example, minor JDK releases differ across OS's and thus the paths are not the same.

Instead, the plan is to do two things:
  1. Create a puppet custom fact with installed tool info
  2. Use it from a puppet template
Here is how

Continue reading "Maven toolchains and puppet"
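
The custom fact from the full post is not reproduced here. As a related illustration, the sketch below uses a different mechanism, an executable external fact, which is a plain script that prints key=value lines for Facter. It assumes JDKs live under /usr/lib/jvm; the fact name jdk_homes and the function name are made up.

```shell
#!/usr/bin/env bash
# Sketch only: an executable external fact that reports installed JDK homes.
# Assumes JDKs live under /usr/lib/jvm; the fact name "jdk_homes" is made up.

# Prints one key=value line, the format Facter expects from external fact scripts.
jdk_homes_fact() {
  local root="${1:-/usr/lib/jvm}" dir homes=""
  for dir in "$root"/*/; do
    # Only count directories that actually contain a java binary.
    [ -x "${dir}bin/java" ] || continue
    homes="${homes:+$homes,}${dir%/}"
  done
  echo "jdk_homes=$homes"
}
```

To use it as a fact, the script would end with a call to jdk_homes_fact and be dropped, executable, into Facter's external facts directory (typically /etc/facter/facts.d). A template could then split the comma-separated value to emit one toolchain entry per JDK.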

Getting ETVnet.com to play video on Ubuntu

Posted by Admin • Friday, June 12. 2015 • Category: Linux
In Ubuntu 15.04 it already works in Firefox as long as you have vlc installed, and you select /usr/bin/vlc the first time Firefox asks you what to do. Not so much in Chrome... Chrome uses xdg-open to determine what to launch. By default, xdg-open has no idea what to do with a mms:// style URL, so we need to set up a protocol action in xdg, as per http://askubuntu.com/questions/190895/how-to-change-what-xdg-open-does-with-ssh-userip-liniks :
xdg-mime default vlc.desktop x-scheme-handler/mms

UPDATE: I find that vlc does not gracefully recover from eTVnet's "glitches" - sound stops playing but video continues. Surprisingly, Totem does not have this issue, so I switched:
xdg-mime default totem.desktop x-scheme-handler/mms

Simple puppet update-alternatives

Posted by Admin • Thursday, June 11. 2015 • Category: DevOps, Linux
This is a quick and dirty interface to update-alternatives on Centos/Redhat/Suse for puppet (Ubuntu is untested). Seems to work well and doesn't require any modules.
Usage example: my_alternatives::update { 'java': versiongrep => '1.8' }

class my_alternatives {

  # Manipulates alternatives using update-alternatives.
  # Supports RHEL, Centos and Suse.
  # Ubuntu not tested (yet).
  # If multiple matches are available, picks the first one.
  # There is rudimentary alternatives support in the java class,
  # but it's rather limited and doesn't support most platforms and java versions.
  define update (
    $item = $title,   # the item to manage, ie "java"
    $versiongrep,     # string to pass to grep to select an alternative, ie '1.8' (1.8.*openjdk would also work)
    $optional = true,  # if false, execution will fail if the version is not found
    $altcmd   = 'update-alternatives' # command to use
  ) {

    case $::osfamily {
      'RedHat','SuSE': {

        if ! $optional {
          # verify that we have exactly 1 matching alternative, unless it's optional
          exec { "check alternatives for ${item}":
            path    => ['/sbin','/bin','/usr/bin','/usr/sbin'],
            command => "echo Alternative for ${item} version containing ${versiongrep} was not found, or multiple found ; false",
            unless  => "test $(${altcmd} --display ${item} | grep '^/' | grep -- '$versiongrep' | wc -l) -eq 1",
            before  => Exec["update alternatives for ${item} to ${versiongrep}"],
          }
        }

        # Runs the update alternatives command
        #  - unless it reports that it's already set to that version
        #  - unless that version is not found via grep
        exec { "update alternatives for ${item} to ${versiongrep}":
          path    => ['/sbin','/bin','/usr/bin','/usr/sbin'],
          command => "${altcmd} --set ${item} $( ${altcmd} --display ${item} | grep '^/' | grep -- '$versiongrep' | head -n 1 | sed 's/ .*$//' ) ",
          unless  => "test -x \"$(${altcmd} --display ${item} | grep 'currently points' | grep -- '$versiongrep' | awk '{print \$NF}')\"",
          onlyif  => "${altcmd} --display ${item} | grep '^/' | grep -- '$versiongrep' ", # check that there is one (if optional and not found, this won't run)
        }
      }

      # Leave Ubuntu alone, this probably won't work there anyway
      # (the excerpt ends here; the empty default branch is an assumption)
      default: { }
    }
  }
}
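
To try the selection logic outside of puppet, the pipeline from the command above can be wrapped in a shell function that reads update-alternatives --display output on stdin. The function name is an assumption made for this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: the selection pipeline from the exec above, as a shell function.
# Reads "update-alternatives --display <item>" output on stdin.

pick_alternative() {
  local versiongrep="$1"
  # Alternative entries start with an absolute path; keep the first match
  # and strip everything after the path.
  grep '^/' | grep -- "$versiongrep" | head -n 1 | sed 's/ .*$//'
}

# Example:
#   update-alternatives --display java | pick_alternative 1.8
```

Note that the pattern is a regular expression, so the dot in 1.8 matches any character. A stricter pattern such as 1\.8 avoids accidental matches.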