Writing Persian in Markdown and converting it to different formats with Pandoc

Writing Persian in Markdown and converting it to different formats using Pandoc is not always hassle-free. A few steps are needed to get everything up and running, and this post describes them.

Three steps are needed before Persian text written in Markdown can be converted to other formats with Pandoc. They are as follows:

Adding annotation to Markdown files containing Persian text
Persian font installation
Run Pandoc with proper parameters

If these steps are skipped, any attempt to convert Persian Markdown text to other formats will either fail with Pandoc errors or produce undesirable output, such as disconnected characters or even empty documents.

Note that these steps have only been tested on Ubuntu and may vary slightly on other distributions.

These instructions can also be used to add Arabic language support to Pandoc; only step two will differ.

Step One: Adding annotation to Markdown files containing Persian text

Every Markdown file that contains Persian or Arabic content (or, more generally, content in any right-to-left language) needs the following annotation at the top of the file, before the actual content.

---
lang: ar
dir: rtl
title: "Put the title"
author: "Author name"
date: "Publication date"
---

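This is necessary to tell Pandoc explicitly to set the text direction to right to left, which is what the dir: rtl line does. The lang value is a BCP 47 language tag: the example uses ar, the tag for Arabic, and the corresponding tag for Persian is fa.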
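When many files need the annotation, adding it by hand gets tedious. The following is a minimal sketch of a helper that prepends the lang and dir lines to a file unless the file already starts with a metadata block. The function name add_rtl_front_matter is hypothetical and not part of Pandoc, and the sketch leaves out title, author, and date, which still have to be filled in per file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pandoc): prepend a minimal RTL metadata
# block to a Markdown file, unless the file already starts with '---'.
add_rtl_front_matter() {
    file=$1

    # A first line of '---' means a metadata block is already there.
    if [ "$(head -n 1 "$file")" = "---" ]; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    {
        # Same lang/dir values as the annotation shown above.
        printf '%s\n' '---' 'lang: ar' 'dir: rtl' '---' ''
        cat "$file"
    } > "$tmp" || return 1

    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage: add_rtl_front_matter notes.md
```

Because of the check on the first line, running the helper twice on the same file leaves it unchanged the second time.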

Step Two: Persian font installation

To get the desired output from a Markdown file, proper Persian fonts must be installed so that Pandoc can use them to render the text in the destination format, for example PDF.

This step can be skipped if the system already has proper Persian fonts installed.

Unfortunately, Ubuntu does not ship with Persian fonts that work well with Pandoc, even though its fonts work comfortably and flawlessly in applications such as LibreOffice. Hence, installing a set of proper Persian fonts is a must.

On the bright side, a great script called persian-fonts-linux, from Fzerorubigd, automates this task.

To run the installation script, just clone the project at this URL.

Then execute either the farsifont.sh or the zfarsifont.sh script, depending on preference. The first script is a CLI, whereas the second one is a GUI. Both approaches are quite straightforward.
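To decide whether this step can be skipped, or to confirm that the installation worked, it helps to ask the system which fonts it can see. The sketch below assumes the fontconfig command-line tools (fc-list) are available, which is usually the case on an Ubuntu desktop. The function name has_font is hypothetical, and the exact family name a font registers under may differ from the name used here.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an installed font family matches the
# given name (case-insensitive, substring match).
# Returns 2 when fc-list is not available, 1 when no family matches.
has_font() {
    command -v fc-list >/dev/null 2>&1 || return 2
    # 'fc-list : family' prints only the family names of all installed fonts.
    fc-list : family | grep -qiF -- "$1"
}

# Usage:
#   if has_font "Nazanin"; then echo "font found"; else echo "font missing"; fi
#
# To list every family that declares Persian coverage instead:
#   fc-list :lang=fa family
```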

Step Three: Run Pandoc with proper parameters

Now that everything is set up, the only remaining step is to execute Pandoc with the proper parameters to generate the output correctly.

To do so, run the following command:

$ pandoc -s [Input.md] -t beamer -o [Output.FileType] --latex-engine=xelatex -V mainfont='BNazanin'

In the above command, [Input.md] and [Output.FileType] set the input file name and the output file name with its type, which are very general, and -t beamer selects Beamer slides as the output format. The parameters that matter for this use case are --latex-engine=xelatex and -V mainfont='BNazanin'. The former sets the LaTeX engine to xelatex, since the default engine does not support Persian and any attempt to compile with it will result in errors. Pandoc 2.0 renamed this option to --pdf-engine, so on newer versions use --pdf-engine=xelatex instead. The latter parameter tells Pandoc to use the BNazanin font to render the output; the previous step was done precisely to obtain such fonts. Leaving this parameter out will usually result in blank text or disconnected characters.

There is flexibility in font selection, though. For instance, BZar, BNasim, BMitra, and a variety of other fonts can be used. BNazanin was selected in the above example for demonstration only.
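Since the engine option is spelled --latex-engine before Pandoc 2.0 and --pdf-engine from 2.0 onward, a script that has to run on more than one machine can pick the spelling from the installed version. This is a minimal sketch; the function name engine_flag and the file names in the usage comment are hypothetical, and it assumes the first line of `pandoc --version` has the form "pandoc X.Y.Z".

```shell
#!/bin/sh
# Hypothetical helper: print the xelatex engine option that matches a
# given Pandoc version string such as "1.19.2" or "2.9.2.1".
engine_flag() {
    major=${1%%.*}   # keep only the part before the first dot
    if [ "$major" -ge 2 ]; then
        printf '%s\n' '--pdf-engine=xelatex'
    else
        printf '%s\n' '--latex-engine=xelatex'
    fi
}

# Usage (assumes pandoc is on PATH; file names are placeholders):
#   version=$(pandoc --version | head -n 1 | awk '{print $2}')
#   pandoc -s input.md -o output.pdf "$(engine_flag "$version")" -V mainfont='BNazanin'
```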