Tuesday, September 8, 2015

LaTeX to MS Word

The good news first: our paper on wrinkling yoghurt was accepted. I won't tell you where since it is under embargo. If you want to know what it means to have a paper accepted, and what sort of work in involved (disclaimer: comparable to the amount of work need for the research in itself), have a look at this:



But last Friday night, 11pm, we received a mail from the editorial office saying something like
Our typesetting department must have Word document files of your paper [...].
Please provide these files as Word documents as quickly as possible and within the next 8 hours to keep your paper on schedule for publication.
Well, except for the short notice and the looming weekend, there was a big problem. Our paper was not a Word file and the conversion is all but easy. Let me explain

Academic paper formatting


But there is another last step not described here: formatting. Of course, you can open any text editor (notepad, your webmail) and type words after words to write the text of your article. But then, you would miss:
  1. figures
  2. equations 
  3. cross-references
Figures are the graphs, the pictures and the drawings. You can insert them if you switch to any modern word processor, LibreOffice or Microsoft Word for example. Getting them beautiful and right is a work in itself.

Equations are a nightmare with Word. Just writing y≈αx7 makes you seek in 3-4 different menus. But you can do it. Markup languages, like the HTML of this page, makes it easier once you know the syntax. LaTeX is a markup language made for equations. The above equation just writes $y=\alpha x^7$. LaTeX syntax for equations has become a de facto standard, so other languages likes Markup, offer to write the math parts in LaTeX. Plugins in LibreOffice also do that.

Cross referencing means that in your text you can write "see Figure 3", then switch figures 3 and 4 and have "see Figure 4" written automatically in your text. In practice you do not write the figure number, but insert a reference that the program will convert into the figure number. You can do that with Word through one menu and a rather unhelpful dialogue box. In LaTeX it is just "Figure \ref{fig:velocitygraph}". Actually, in the final document, you can even get an hyperlink to the figure. Same with chapters, sections, equations.

What is great with LaTeX is that your bibliography can be generated the same way. You just insert "was discovered recently \cite{Leocmach2015}" and the paper Leocmach2015 gets inserted in your bibliography, formatted properly and consistently. In the final document you would get "was discovered recently [17]" with an hyperlink going to the 17th item in your bibliography. Of course, you have ways to do that with plugins in Word or LibreOffice.

LaTeX is also nice because you can specify what you mean and then let the program format it for you. For example, when I want to write "10 µm" what I mean is ten micro metre, not "one zero space greek letter mu m", so in LaTeX I write "\SI{10}{\micro\metre}" and it will generate a "10", followed by an unbreakable space (you don't want the number and the unit on different lines or pages), followed by a micro sign µ (different from the μ in some fonts) and a "m".

By the way, LaTeX is open source and free, no need for a licence. The "final document" is a PDF that anybody can read. Actually, until Friday night Editors and Referees of our paper had only seen and judged the PDF. Nobody was caring about formatting (even if it helps to have a clean looking paper to show rather than a messy Word file).

So LaTeX is made for academic paper writing, and heavily used in Math, Physics and other communities. It would be unthinkable that a journal specialised in Physics refuse LaTeX formatted paper. However for Biology, Word is the norm. It must be a pain for the typesetting departments who have to translate Word format into something more usable. Broad audience journals often accept both, but not the journal we submitted to.

Latex to Word conversion

I spent most of my Saturday thinking about a reliable and reusable way to convert my paper. This won't be the last time I am asked to provide a Word file. I received advices on Twitter, tried various solutions, all unsatisfactory, and at the end I settled to this method:
  1. Dumb down the LaTeX layout. I was using a two column layout with figures within the text, I switched to a single column layout with a figure per page at the end of the document.
  2. Add \usepackage{times} to your LaTeX preamble in order to use Word default font Times New Roman.
  3. Let LaTeX make the PDF. All the commands, custom packages, etc. are taken into account. Cross references and bibliography are also right.
  4. Convert PDF into Word. @fxcouder did it for me using Adobe. There are probably open source ways of doing it. Simple equations were preserved, but as soon as fractions were involved the format was messy.
  5. Clean the Word file. No messing up, you need a real Microsoft Word with a licence. Work in 97/2000 compatibility mode. You need to show formatting marks, track down and delete the section breaks to obtain a single flow of text. Fix also the line breaks and hyphenations for the paragraphs to be in one piece. All messy equations must be cleaned up, leaving only the numbering for the numbered ones.
  6. Re type the equations. I did this by hand in Word since I had few equations to retype. 
  7. Copy the whole text and paste in into the template provided by the editor.
Rather than retyping the equations, an other way would be to process the original tex document into an ODT (LibreOffice) document using pandoc.
pandoc -s article.tex -o article.odt
Pandoc is messing cross-references, citations and anything custom in your LaTeX code. Do not use it for the conversion of the main text. However it gets the equations right. Then from LibreOffice, you can export the equations and import them back into the Word document.


So, why worry? It takes you only half a day for a 15 pages paper instead of the 8 hours requested by the editorial office.