Akom's Tech Ruminations

Various tech outbursts - code and solutions to practical problems

Preserving a Static Copy of Atlassian Jira (and Confluence) as a Jekyll Site

Posted by Admin • Monday, October 25. 2021 • Category: Linux

Last month an old Jira installation I own was compromised via a recent vulnerability. This is bound to happen. Keeping a public-facing product like that secure requires very frequent patching, which is a lot of maintenance. Fortunately, this installation is only a historical record of a popular open-source project (current development uses GitHub Issues). In other words, I can get away with a static, read-only copy.

Going Static

Of course, I can use

wget --mirror

to save the whole site exactly as-is. This is a simple option, but the result would need a lot of massaging (for example, removing confusing login links and JavaScript that may break).

Instead, I'm going to convert the content to Markdown and then regenerate the site using Jekyll, changing the look and feel, headers, and footers as needed. This will also preserve all existing URLs (including attachments).
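To make the idea concrete, here is a minimal sketch of how one issue could become a Jekyll page whose permalink matches the original Jira URL. This is not the code from my project: the function name, the "issue" layout name, and the shape of the input dict are assumptions for illustration, and the description is assumed to already be converted to Markdown.

```python
import json


def issue_to_jekyll_page(issue: dict) -> str:
    """Render one issue as a Jekyll page: YAML front matter plus a Markdown body.

    `issue` is a hypothetical, simplified dict with "key", "summary" and
    "description" entries; it is not the raw Jira API response.
    """
    key = issue["key"]
    # json.dumps produces a double-quoted string that is also valid YAML,
    # so quotes and backslashes in the summary are escaped safely.
    title = json.dumps(f"{key}: {issue['summary']}", ensure_ascii=False)
    front_matter = [
        "---",
        "layout: issue",  # assumed layout name, define it in _layouts/
        f"title: {title}",
        # Jira serves issues under /browse/<KEY>, so reuse that path.
        f"permalink: /browse/{key}",
        "---",
    ]
    body = issue.get("description") or "(no description)"
    return "\n".join(front_matter) + "\n\n" + body + "\n"


if __name__ == "__main__":
    print(issue_to_jekyll_page(
        {"key": "PROJ-7", "summary": "Crash on startup", "description": "Steps to reproduce..."}
    ))
```

The important part is the permalink line: because Jekyll honors it for each page, old links to individual issues can keep pointing at the same path.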

Making this Happen

I wrote a project to help automate this process fully via the Jira API: https://github.com/akomakom/jira-to-jekyll/.

The basic goal is to do what I outlined above. There is a comprehensive README that explains the process.
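For a rough idea of what "via the Jira API" involves, here is a sketch of paging through the REST search endpoint. It is not taken from the project above. It assumes a Jira Server instance that exposes /rest/api/2/search and allows anonymous read access; the helper names and the example host are made up, and authentication and error handling are left out.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str) -> dict:
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_issues(
    base_url: str,
    jql: str,
    page_size: int = 50,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every issue matching `jql`, requesting one page at a time."""
    start_at = 0
    while True:
        query = urllib.parse.urlencode(
            {"jql": jql, "startAt": start_at, "maxResults": page_size}
        )
        page = fetch(f"{base_url}/rest/api/2/search?{query}")
        issues = page.get("issues", [])
        yield from issues
        start_at += len(issues)
        # Stop on an empty page or once the reported total has been reached.
        if not issues or start_at >= page.get("total", 0):
            break


if __name__ == "__main__":
    # Hypothetical host and project key; replace with your own.
    for issue in iter_issues("https://jira.example.invalid", "project = DEMO"):
        print(issue["key"])
```

Passing the fetch function as a parameter keeps the paging logic testable without a live server.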

Now what about Confluence?

Confluence is a different beast altogether. In my case, I needed not only to preserve the content but also to be able to add and edit pages in the future.

Confluence has a built-in option to export one "Space" at a time (Space Tools -> Export).

Unfortunately, all the export options have issues:

  1. XML: an incredibly complicated schema, and a single huge file per space (or even for the whole site if exporting from Settings).
  2. PDF: It's a PDF. It looks alright, but it won't help you preserve old links.
  3. HTML: This is a decent option, except that it loses a lot of "special content". Any widgets or plugin-formatted pages don't survive.

None of these options preserves attachments. There is also an API, but it is quite complex as well.

The way I actually exported from Confluence was as follows:

  1. Started with HTML export of a single Space
  2. Massaged the data
    • Renamed files (half of them were named using page IDs)
    • Generated Jekyll front matter manually (especially permalink to preserve original URLs)
    • Copied attachments from the Confluence server manually
    • Tweaked the attachment links so they are valid for markdown
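If I were to script the massaging steps above instead of doing them by hand, it might look roughly like this. Treat it as a sketch under assumptions: the helper names are invented, the permalink assumes the original pages lived under /display/<SPACE>/<Title> with spaces encoded as "+", and the link rewrite assumes the converted pages reference attachments through a relative "attachments/" path. Check all of these against your own export before relying on them.

```python
import json
import re
import urllib.parse


def slugify(title: str) -> str:
    """Turn a page title into a readable file name stem (replaces ID-based names)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def confluence_front_matter(title: str, space_key: str) -> str:
    """Build Jekyll front matter whose permalink mirrors the assumed original URL."""
    permalink = f"/display/{space_key}/{urllib.parse.quote_plus(title)}"
    return "\n".join([
        "---",
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"permalink: {permalink}",
        "---",
        "",
    ])


# Matches the start of a Markdown link target that points into attachments/.
ATTACHMENT_LINK = re.compile(r"\]\((?:\./)?attachments/")


def absolutize_attachment_links(markdown: str, prefix: str = "/attachments/") -> str:
    """Rewrite relative attachment links so they resolve from any permalink."""
    return ATTACHMENT_LINK.sub(lambda match: "](" + prefix, markdown)


if __name__ == "__main__":
    title = "Release Notes"
    body = "See ![diagram](attachments/123/456.png) for details."
    print(f"{slugify(title)}.md")
    print(confluence_front_matter(title, "DOC") + absolutize_attachment_links(body))
```

Relative attachment links stop working once a page is served from a custom permalink, which is why the sketch makes them site-absolute.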

It was a horribly manual process. There is a project you can use to make some of the massaging of the exported HTML faster, but it won't help if your page content never made it to the HTML in the first place.

In the end, I did it, and I have a site that I can re-style and manage quite nicely.
