Preserving a Static Copy of Atlassian Jira (and Confluence) as a Jekyll Site
Posted by Admin • Monday, October 25, 2021 • Category: Linux
Last month an old Jira installation I own was compromised via a recent vulnerability. This is bound to happen: keeping a public-facing product like that secure requires very frequent patching, which is a lot of maintenance. Fortunately, this installation is only a historical record of a popular open-source project (current development uses GitHub Issues). In other words, I can get away with a static, read-only copy.
Going Static
Of course, I could use wget --mirror to save the whole site exactly as-is. This is the simple option, but the result would need a lot of massaging (for example, removing confusing login links and JavaScript that may break).
Instead, I'm going to convert the content to Markdown so that I can then regenerate the site using Jekyll, changing the look and feel, headers, and footers as needed. This will also preserve all existing URLs (including attachments).
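To make the idea concrete, here is a minimal sketch of what "convert to Markdown and keep the URL" can look like for a single issue. This is only an illustration, not code from any particular tool: the function name, the "issue" layout name, and the shape of the input dict are all assumptions, and it assumes the original issue URLs had the usual /browse/KEY-123 form.

```python
import json


def issue_to_jekyll(issue: dict) -> str:
    """Render one issue as a Jekyll page: YAML front matter plus a Markdown body.

    `issue` is a hypothetical dict with "key", "summary" and an optional
    "description"; a real export would carry many more fields.
    """
    key = issue["key"]
    front_matter = [
        "---",
        "layout: issue",  # hypothetical layout name, adjust to your theme
        # json.dumps produces a double-quoted string, which is also valid YAML
        f"title: {json.dumps(key + ': ' + issue['summary'])}",
        # Pin the page to the URL the issue had in Jira (assumed /browse/KEY)
        f"permalink: /browse/{key}",
        "---",
    ]
    body = issue.get("description") or "_No description._"
    return "\n".join(front_matter) + "\n\n" + body + "\n"


if __name__ == "__main__":
    print(issue_to_jekyll({
        "key": "PROJ-1",
        "summary": "Crash on startup",
        "description": "Steps to reproduce...",
    }))
```

The permalink line is the important part: it is what lets the regenerated site answer on the same paths that old links and search engines already know.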
Making this Happen
I wrote a project to help automate this process fully via the Jira API: https://github.com/akomakom/jira-to-jekyll/.
The basic goal is to do what I outlined above. There is a comprehensive README that explains the process.
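The README is the authority on how the project actually works. Purely as an illustration of what "via the Jira API" tends to involve, here is a rough sketch of paging through the search endpoint of a Jira Server instance. It is not taken from the project: the function names, the base URL, and the JQL query are made up for the example, and it assumes anonymous read access to /rest/api/2/search.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (anonymous access assumed)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_issues(base_url: str, jql: str, page_size: int = 50,
                get_json: Callable[[str], dict] = http_get_json) -> Iterator[dict]:
    """Yield every issue matching `jql`, following startAt/maxResults paging."""
    start = 0
    while True:
        query = urllib.parse.urlencode(
            {"jql": jql, "startAt": start, "maxResults": page_size})
        page = get_json(f"{base_url}/rest/api/2/search?{query}")
        issues = page.get("issues", [])
        yield from issues
        start += len(issues)
        # Stop on an empty page or once we have seen everything the server reports
        if not issues or start >= page.get("total", 0):
            break


if __name__ == "__main__":
    # Hypothetical server and project key
    for issue in iter_issues("https://jira.example.org", "project = PROJ"):
        print(issue["key"])
```

Passing the fetch function in as a parameter keeps the paging logic testable without a live server.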
Now what about Confluence?
Confluence is a different beast altogether. In my case, I needed not only to preserve the content but also to be able to add and edit pages in the future.
Confluence has a built-in option to export one "Space" at a time (Space Tools -> Export).
Unfortunately, all the export options have issues:
- XML: incredibly complicated schema, a single huge file per space (or even the whole site if exporting from Settings)
- PDF: It's a PDF. It looks alright, but it won't help you preserve old links
- HTML: This is a decent option, except that it loses a lot of "special content". Any widgets or plugin-formatted pages don't survive
And none of these options preserves attachments. There is also an API, but it is quite complex as well.
The way I actually exported from Confluence was as follows:
- Started with HTML export of a single Space
- Massaged the data
- Renamed files (half of them were named using page IDs)
- Generated Jekyll front matter manually (especially permalink to preserve original URLs)
- Copied attachments from the Confluence server manually
- Tweaked the attachment links so they are valid for Markdown
It was a horribly manual process. There is a project you can use to make some of the massaging of the exported HTML faster, but it won't help if your page content never made it to the HTML in the first place.
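If I had to do it again, I would script at least the mechanical parts. The sketch below is a hedged illustration of three of the steps listed above: recovering a page title from an exported HTML file (useful when the file is named after a page ID), generating front matter with a permalink, and rewriting attachment links. Everything specific in it is an assumption: the helper names are made up, the original page URLs are assumed to look like /display/SPACE/Page+Title, and exported attachment links are assumed to be relative paths starting with attachments/. Check those against your own export before trusting it.

```python
import json
import re
from html import unescape
from urllib.parse import quote_plus

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
ATTACHMENT_RE = re.compile(r'(href|src)="attachments/')


def page_title(html_text: str, fallback: str) -> str:
    """Return the <title> text, or `fallback` (e.g. the page ID) if there is none."""
    match = TITLE_RE.search(html_text)
    return unescape(match.group(1)).strip() if match else fallback


def front_matter(title: str, space_key: str) -> str:
    """Build Jekyll front matter that pins the page to its assumed old URL."""
    # Assumed original URL shape: /display/<SPACE>/<Title+With+Plus+Signs>
    permalink = f"/display/{space_key}/{quote_plus(title)}"
    return f"---\ntitle: {json.dumps(title)}\npermalink: {permalink}\n---\n"


def fix_attachment_links(html_text: str,
                         new_prefix: str = "/download/attachments/") -> str:
    """Point relative attachment links at an absolute path on the new site."""
    return ATTACHMENT_RE.sub(lambda m: f'{m.group(1)}="{new_prefix}', html_text)


if __name__ == "__main__":
    sample = ('<html><head><title>Release Notes</title></head>'
              '<body><a href="attachments/123/notes.pdf">PDF</a></body></html>')
    title = page_title(sample, fallback="123")
    print(front_matter(title, "DOC") + fix_attachment_links(sample))
```

None of this recovers content that never made it into the HTML export, so it only speeds up the massaging, not the checking.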
In the end, I did it, and I have a site that I can re-style and manage quite nicely.