Akom's Tech Ruminations

Various tech outbursts - code and solutions to practical problems

Backing Up Google Docs automatically from Linux

Posted by Admin • Monday, October 26. 2009 • Category: Linux

I'm not quite sure why a working example of doing this is so hard to find!

As far as I can tell, at the time of this writing there are two ready-made (and free) apps that do this: GDocBackup and php-google-backup, a tiny PHP script (on Google Code) that uses the Zend libraries. The former is a Windows binary that is said to run in Mono. The latter currently only partially works (it can't handle spreadsheets or PDFs). Not wanting to run Mono (a whole extra runtime - I might as well write this in Java), and not satisfied with only backing up .doc files and presentations... I hacked up my own.

Here is how I did it

First I took the script from php-google-backup as a starting point and did a bunch of reading on the APIs for both Google and Zend stuff.

Then I modified it to be more flexible (the URLs are apparently different for each type of document). It caches logins and sets an exit code. It also works with both Google Apps accounts and basic Gmail accounts. Then I got really fed up with Zend framework and rewrote the whole thing from scratch. I run it from cron, so it works (well, today, anyway).

Installation

Prerequisites

  • PHP (obviously).
  • Zend libraries: on Ubuntu this is a two-step install: apt-get install zend-framework, then edit /etc/php5/cli/conf.d/zend-framework.ini and uncomment the include.
  • pecl HTTP module (pecl_http), for version 0.4+ of the script. Easiest way to install it on Ubuntu:
    1. apt-get install php5-dev
    2. apt-get install libcurl3-dev
    3. pecl install pecl_http # defaults are fine
    4. enable it with a conf.d/ file of your choosing (PHP only reads files ending in .ini from that directory), like /etc/php5/cli/conf.d/peclhttp.ini :
       [PHP]
       extension=http.so
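
To confirm that the extension was actually picked up by the command-line PHP, something like the following sketch may help. It relies only on "php -m", which lists the loaded modules; check_php_ext is a made-up helper name, and the assumption is that pecl_http shows up in that list as "http".

```shell
#!/bin/sh
# Sketch: confirm that the PHP CLI has loaded a given extension.
# check_php_ext is a hypothetical helper name, not part of the backup script.
check_php_ext() {
    # "php -m" prints one loaded module per line; match the whole line,
    # ignoring case, and stay quiet.
    php -m 2>/dev/null | grep -qix "$1"
}

# Only run the check when a php binary is actually on the PATH.
if command -v php >/dev/null 2>&1; then
    if check_php_ext http; then
        echo "http extension loaded"
    else
        echo "http extension NOT loaded; check the conf.d file" >&2
    fi
fi
```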
                

Download

I moved the app to SourceForge.
I've restarted the versioning with 0.4 (since it's beta... it's sort of what comes after 1.3 ... right?)

Usage

Run it with -h for usage information.

You can edit the email/password at the top of the script, pass them in on the command line, or even be prompted for them. You should probably rename the file so that it ends in .php. As with any software, you should review the code to make sure it's not malicious and looks reasonable.

Run it using php (php filename [params]) and, if it works, you can run it from cron to provide automated backups.

This is how I run it (I hardcoded the password):
php php-google-backup.php -u MY@EMAIL.ADDRESS -b workspace/docs -t workspace/titles -c workspace/category -a workspace/archive
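
For the cron case, a small wrapper that logs the outcome can make failures easier to spot. This is only a sketch: run_and_log is a hypothetical helper, the log location is a placeholder, and it assumes the script exits non-zero when something goes wrong (it does set an exit code, but I'm not promising every failure is covered).

```shell
#!/bin/sh
# Sketch of a cron wrapper. run_and_log is a hypothetical helper name.
# Usage: run_and_log LOG_FILE COMMAND [ARGS...]
run_and_log() {
    log_file=$1
    shift
    # Run the command, appending everything it prints to the log.
    if "$@" >>"$log_file" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup ok" >>"$log_file"
    else
        # In the else branch, $? still holds the command's exit status.
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup FAILED (exit $status)" >>"$log_file"
        return "$status"
    fi
}
```

A wrapper script would end with something like: run_and_log /path/to/backup.log php php-google-backup.php -u MY@EMAIL.ADDRESS -b workspace/docs -t workspace/titles -c workspace/category -a workspace/archive. A crontab line such as "0 3 * * * /path/to/wrapper.sh" (daily at 03:00; both paths are placeholders) would then run it.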

Details

What it does:
  1. Gets the list of your documents
  2. Gets each one in your preferred format, and saves it to your -b path using the document's ID as the filename
  3. Symlinks the above file into your -t path using the document's Title as the filename
  4. Symlinks the above file into your -c path using each category and the document's Title as the filename
  5. Moves any documents that didn't come back from Google (deleted?) into your -a path, and moves their title symlink there too
  6. Tries to clean up dead symlinks
  7. Wags its tail
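
The resulting layout on disk (steps 2 to 5) can be sketched in plain shell with fake documents. This is not the script's actual code: store_doc and archive_doc are made-up helpers, the IDs, titles and categories are invented, and re-creating the title link inside the archive directory is my own simplification of "moves their title symlink there too".

```shell
#!/bin/sh
# Sketch of the backup layout using fake documents in a temp directory.
set -eu
root=$(mktemp -d)
docs=$root/docs          # stands in for the -b path
titles=$root/titles      # stands in for the -t path
cats=$root/category      # stands in for the -c path
archive=$root/archive    # stands in for the -a path
mkdir -p "$docs" "$titles" "$cats" "$archive"

# store_doc ID TITLE CATEGORY (hypothetical helper, steps 2-4)
store_doc() {
    id=$1 title=$2 category=$3
    printf 'contents of %s\n' "$title" >"$docs/$id"   # step 2: file named by ID
    ln -sf "$docs/$id" "$titles/$title"               # step 3: link named by title
    mkdir -p "$cats/$category"
    ln -sf "$docs/$id" "$cats/$category/$title"       # step 4: link per category
}

# archive_doc ID TITLE (hypothetical helper, step 5)
archive_doc() {
    id=$1 title=$2
    mv "$docs/$id" "$archive/$id"
    rm -f "$titles/$title"
    # Relative link, so it keeps working if the archive directory is moved.
    ln -s "$id" "$archive/$title"
}

store_doc doc_abc123 "Budget 2009" finance
store_doc doc_def456 "Meeting notes" work

# Pretend doc_def456 no longer came back from Google.
archive_doc doc_def456 "Meeting notes"

# The category link for the archived document now dangles (step 6's job).
ls -lR "$root"
```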


Note that you can change the formats that documents are saved in by changing the "format" array element values at the top of the file. Google supports several formats for each type of document (e.g. doc, odf, etc.) - see Google's documentation.

Wishlist

I'd like to have this app support folders. What I'd probably do (since a document may appear in multiple folders) is continue saving the documents to the main directory (gdocs-backup), and then create symlinks from "folder" directories to represent the tree structure that Google has. (Did this in 1.1)

Unfortunately as far as I can tell Zend doesn't support the folder element? I might just be confused about how it works. The API does support the ?showfolders=true query string parameter, but the data seems not to be represented in the resulting class instances that Zend returns. (Yeah it supports it apparently)

I'd also want to store the original filename. I can do that pretty easily (Done in 1.1)

Would be nice to handle document deletions... currently old docs will just sit around forever. Likewise, dead symlinks are only partially cleaned up at the moment. I guess it's possible to just remove all symlink dirs every time and allow the app to recreate them... this potentially solves deletions too as any file that doesn't have a symlink to it is old? Needs more thought. In the meantime, you can just do something like "find SOME_DIR -type l -delete" in your cronjob prior to each run. (Done in 1.3)
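
If wiping every symlink feels too heavy-handed, a narrower variant is to delete only the links whose target no longer exists. A minimal sketch follows; prune_dead_links is a made-up name, and it assumes a find that supports -delete (GNU and BSD find do, though it isn't in POSIX).

```shell
#!/bin/sh
# Sketch: delete only dangling symlinks under a directory.
# prune_dead_links is a hypothetical helper name.
prune_dead_links() {
    # "test -e" follows the link, so it fails exactly when the target is
    # gone; only those links reach -delete.
    find "$1" -type l ! -exec test -e {} \; -delete
}

# Small demonstration in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/real"
ln -s "$demo/real" "$demo/live-link"
ln -s "$demo/missing" "$demo/dead-link"
prune_dead_links "$demo"
ls "$demo"   # dead-link is gone; live-link and real remain
```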



Comments welcome


2 Comments

  1. Why aren't jpg files backed up?
  2. I'm not entirely sure what you're asking as you didn't provide much detail. Since the app is now on Sourceforge, you should use the forums there to ask questions.
