Regular Expression to Parse Word-style Footnotes into WordPress's Simple Footnotes Format
Ben Balter
March 20, 2011
I needed a quick-and-easy way to parse Microsoft Word's footnote format into a more web-friendly format for a recent project. After a bit of regular expression hacking, I was able to build a WordPress plugin to automatically convert content pasted from Word into a format readable by Andrew Nacin's popular Simple Footnotes plugin.

The process is surprisingly simple given WordPress's extensive filter API. The first step is to grab the footnotes from Word's format with a regular expression. This creates an array with both the footnote number and the text of each footnote.

We then need a way to replace each in-text reference with its parsed footnote so that Simple Footnotes can understand it. I did this by creating two arrays, a find array and a replace array, pairing each Word-style footnote reference with its Simple Footnotes-formatted counterpart.

Finally, so that the entire replacement can be done in a single pass, push a final find/replace pair onto the end of the arrays to remove the original footnotes. Because PHP's replace functions accept arrays of patterns and replacements, all we have to do is run a single function call.

You can find the full plugin code in the GitHub repository. To use it, download the plugin file[^1] and activate it (be sure you already have Simple Footnotes installed). Copy the content from Word and paste it into the "Paste from Word" box (you may need to toggle the "Kitchen Sink").[^2]

Thoughts? Improvements? The plugin solved a rather stubborn workflow problem in a project I was working on, and hopefully it can do the same for you. Feel free to use or improve it.

[^1]: Licensed under GPLv2.
[^2]: You can even fork the plugin over on GitHub.
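
The three steps described above (grab the footnotes, build the find and replace arrays, run one replacement) can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `convert_word_footnotes` is hypothetical, and the Word markup is assumed to be simplified to `<a href="#_ftnN">[N]</a>` for in-text references and `<div><a href="#_ftnrefN">[N]</a> text</div>` for the footnotes themselves. Real pasted Word HTML usually carries extra attributes, so the patterns would need adjusting. The output uses Simple Footnotes' `[ref]...[/ref]` shortcode.

```php
<?php
/**
 * Sketch only: convert simplified Word-style footnotes into [ref]...[/ref]
 * shortcodes. The markup shapes matched here are assumptions (see above).
 */
function convert_word_footnotes( string $content ): string {
    // Step 1: grab every footnote's number (group 1) and text (group 2).
    $pattern = '~<div>\s*<a href="#_ftnref(\d+)">\[\d+\]</a>\s*(.*?)\s*</div>~s';
    preg_match_all( $pattern, $content, $footnotes, PREG_SET_ORDER );

    // Step 2: pair each in-text reference with its shortcode replacement.
    $find    = array();
    $replace = array();
    foreach ( $footnotes as $footnote ) {
        $number = $footnote[1];

        // Collapse line breaks and repeated whitespace inside the footnote text.
        $text = preg_replace( '~\s+~', ' ', trim( $footnote[2] ) );

        // Escape backslashes and dollar signs so the text is not read as a
        // backreference when used as a preg_replace() replacement string.
        $text = addcslashes( $text, '\\$' );

        $find[]    = '~<a href="#_ftn' . $number . '">\[' . $number . '\]</a>~';
        $replace[] = '[ref]' . $text . '[/ref]';
    }

    // Step 3: one last pair that removes the original footnote blocks.
    $find[]    = '~\s*<div>\s*<a href="#_ftnref\d+">\[\d+\]</a>.*?</div>~s';
    $replace[] = '';

    // preg_replace() accepts arrays, so a single call applies every pair.
    $result = preg_replace( $find, $replace, $content );

    // Fall back to the untouched content if the regex engine reports an error.
    return $result ?? $content;
}

// Hypothetical sample input in the simplified markup assumed above.
$word_html = '<p>First claim.<a href="#_ftn1">[1]</a> Second claim.<a href="#_ftn2">[2]</a></p>' . "\n"
    . '<div><a href="#_ftnref1">[1]</a> Source for the first claim.</div>' . "\n"
    . '<div><a href="#_ftnref2">[2]</a> Source for the second claim.</div>';

echo convert_word_footnotes( $word_html ), "\n";
// Prints:
// <p>First claim.[ref]Source for the first claim.[/ref] Second claim.[ref]Source for the second claim.[/ref]</p>
```

In a plugin, a function like this would be attached to a content filter with `add_filter()`; which hook the original plugin uses is not shown here.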