Preserving a Static Copy of Atlassian Jira (and Confluence) as a Jekyll Site

Posted by Admin • Monday, October 25. 2021 • Category: Linux

Last month, an old Jira installation that I own was compromised through a recent vulnerability. Sooner or later this was bound to happen: keeping a public-facing product like that secure requires very frequent patching, which is a lot of maintenance. Fortunately, this installation is only a historical record of a popular open-source project (current development uses GitHub Issues). In other words, I can get away with a static, read-only copy.

Going Static

Of course, I could simply run wget --mirror to save the whole site exactly as-is. This is the simplest option, but the result would need a lot of massaging (for example, removing confusing login links, JavaScript that may break, etc.).

Instead, I'm going to convert the content to Markdown so that I can regenerate the site with Jekyll, changing the look and feel, headers, and footers as needed. This will also preserve all existing URLs (including attachments).
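To illustrate how URL preservation works with Jekyll, here is a minimal sketch that renders one Jira issue as a Markdown page whose permalink matches Jira's /browse/KEY issue URL. It is not the code from jira-to-jekyll; the function names, the layout name "issue", and the input fields are hypothetical.

```python
# Hypothetical sketch: render one Jira issue as a Jekyll Markdown page.
# The input fields and the "issue" layout name are assumptions, not the
# format used by jira-to-jekyll.


def yaml_quote(value: str) -> str:
    """Quote a string as a YAML double-quoted scalar."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def issue_to_jekyll(key: str, summary: str, body: str) -> str:
    """Return a Markdown document whose permalink mirrors Jira's /browse/KEY URL."""
    front_matter = "\n".join([
        "---",
        "layout: issue",  # assumed layout defined by the Jekyll theme
        f"title: {yaml_quote(f'{key}: {summary}')}",
        f"permalink: /browse/{key}",  # keeps the old issue URL working
        "---",
    ])
    return f"{front_matter}\n\n{body}\n"


if __name__ == "__main__":
    print(issue_to_jekyll("PROJ-42", 'Crash on "save"', "Steps to reproduce..."))
```

Because Jekyll honors the permalink in front matter, the generated page is served at the same path the old Jira instance used, regardless of the Markdown file's own name.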

Making this Happen

I wrote a tool that automates this process via the Jira API: https://github.com/akomakom/jira-to-jekyll/.

The basic goal is to do what I outlined above. There is a comprehensive README that explains the process.
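For readers curious about what "via the Jira API" involves, the sketch below pages through Jira's REST search endpoint (GET /rest/api/2/search, using its startAt and maxResults parameters). It is an illustration under stated assumptions, not how jira-to-jekyll is implemented: the helper names are hypothetical, and the fetch function is passed in so the paging logic does not depend on HTTP or authentication details.

```python
# Hypothetical sketch: page through Jira's REST search endpoint
# (GET /rest/api/2/search). fetch_json is injected (an assumption) so the
# paging logic is independent of HTTP and authentication details.
import json
import urllib.request
from typing import Callable, Iterator
from urllib.parse import urlencode

FetchJson = Callable[[str], dict]


def search_url(base_url: str, jql: str, start_at: int, max_results: int) -> str:
    """Build a search URL; startAt and maxResults are Jira's paging parameters."""
    query = urlencode({"jql": jql, "startAt": start_at, "maxResults": max_results})
    return f"{base_url.rstrip('/')}/rest/api/2/search?{query}"


def iter_issues(fetch_json: FetchJson, base_url: str, jql: str,
                page_size: int = 50) -> Iterator[dict]:
    """Yield every issue matching jql, requesting one page at a time."""
    start_at = 0
    while True:
        page = fetch_json(search_url(base_url, jql, start_at, page_size))
        issues = page.get("issues", [])
        yield from issues
        start_at += len(issues)
        # Stop on an empty page or once the reported total has been reached.
        if not issues or start_at >= page.get("total", 0):
            break


def urlopen_json(url: str) -> dict:
    """Minimal unauthenticated fetch; a real instance may need credentials."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

For example, iter_issues(urlopen_json, "https://jira.example.com", "project = PROJ ORDER BY key") would walk every issue of a hypothetical PROJ project on a hypothetical host.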

Now what about Confluence?

Confluence is a different beast altogether. In my case, I needed not only to preserve the content but also to be able to add and edit pages in the future.

Confluence has a built-in option to export one "Space" at a time (Space Tools -> Export).

Unfortunately, all of the export options have issues:

XML: an incredibly complicated schema, and a single huge file per Space (or even for the whole site, if exporting from Settings)
PDF: it's a PDF. It looks all right, but it won't help you preserve old links
HTML: a decent option, except that it loses a lot of "special content". Widgets and plugin-formatted pages don't survive

And none of these options preserve attachments. There is an API as well, but it is also quite complex.
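The post doesn't take the API route, but to give a sense of what fetching attachments through it would involve, here is a sketch that lists one page's attachments via Confluence's REST endpoint GET /rest/api/content/{id}/child/attachment. The function name is hypothetical, the fetch function is injected, and it assumes each result's _links.download is a path beginning with "/" relative to the site's base URL; check the REST documentation for your Confluence version before relying on it.

```python
# Hypothetical sketch: list one page's attachments through Confluence's REST API
# (GET /rest/api/content/{id}/child/attachment). fetch_json is an injected
# callable (an assumption), and each result's _links.download is assumed to be
# a path starting with "/" relative to the site's base URL.
from typing import Callable, Iterator, Tuple
from urllib.parse import urlencode


def iter_attachments(fetch_json: Callable[[str], dict], base_url: str,
                     page_id: str, limit: int = 25) -> Iterator[Tuple[str, str]]:
    """Yield (filename, absolute download URL) for every attachment on a page."""
    root = base_url.rstrip("/")
    start = 0
    while True:
        query = urlencode({"start": start, "limit": limit})
        page = fetch_json(f"{root}/rest/api/content/{page_id}/child/attachment?{query}")
        results = page.get("results", [])
        for attachment in results:
            # For attachments, "title" holds the file name.
            yield attachment["title"], root + attachment["_links"]["download"]
        start += len(results)
        if len(results) < limit:  # a short page means nothing is left
            break
```

Even this small piece needs paging and URL handling, and it covers only attachments; page bodies, macros, and hierarchy are separate calls.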

The way I actually exported from Confluence was as follows:

Started with the HTML export of a single Space
Massaged the data
- Renamed files (half of them were named using page IDs)
- Generated Jekyll front matter manually (especially the permalink, to preserve the original URLs)
- Copied attachments from the Confluence server manually
- Tweaked the attachment links so they are valid for Markdown
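Some of the massaging steps listed above can be scripted. The sketch below is hypothetical (it is not the process used here, and the helper names are made up): it reads a page title from an exported HTML file so the file can be renamed to a slug instead of a page ID, builds front matter with a permalink, and percent-encodes spaces in attachment links, since CommonMark does not allow spaces in a bare link destination. It assumes the export's title looks like "Space Name : Page Title", that original page URLs had the form /display/SPACEKEY/Page+Title, and that attachment links start with "attachments/".

```python
# Hypothetical sketch of the massaging steps for one page of a Confluence HTML
# export. Assumptions (not verified against every Confluence version):
#   * the page <title> looks like "Space Name : Page Title";
#   * the original URL form to preserve is /display/SPACEKEY/Page+Title;
#   * attachment links in the converted Markdown start with "attachments/".
import json
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the document's <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the page title, dropping the assumed "Space Name : " prefix."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip().split(" : ", 1)[-1].strip()


def slugify(title: str) -> str:
    """Readable file name stem to replace a page-ID-based export name."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def front_matter(space_key: str, title: str) -> str:
    """Jekyll front matter whose permalink mimics the old /display/ URL."""
    permalink = f"/display/{space_key}/{title.replace(' ', '+')}"
    # json.dumps yields a double-quoted string that is also valid YAML.
    return f"---\ntitle: {json.dumps(title)}\npermalink: {permalink}\n---\n"


ATTACHMENT_LINK = re.compile(r"\]\((attachments/[^)]*)\)")


def fix_attachment_links(markdown: str) -> str:
    """Percent-encode spaces, which CommonMark disallows in bare link targets."""
    return ATTACHMENT_LINK.sub(
        lambda m: "](" + m.group(1).replace(" ", "%20") + ")", markdown)
```

A page exported as a numeric file name could then be written to f"{slugify(title)}.md" with front_matter(...) prepended. Titles containing characters other than spaces may need extra escaping in the permalink.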

It was a horribly manual process. There is a project that can speed up some of the massaging of the exported HTML, but it won't help if your page content never made it into the HTML in the first place.

In the end it worked, and I now have a site that I can restyle and manage quite nicely.

Akom's Tech Ruminations