đź“ť PDFtoMD đź“ť

Github Link


PDFtoMD is a small utility I designed to transform PDF files into Markdown (.md) documents. Developed to manage and reformat PDFs downloaded from Apple Notes using Wondershare Dr.Fone, this tool simplifies the process of organizing notes by converting them into a more editable and manageable format.

The primary function of PDFtoMD is to remove the header and footer text added by the Dr.Fone tool during the PDF creation process and to convert these PDFs into Markdown files. This conversion was crucial as I moved over to managing my notes in a text-based format, offering greater flexibility and compatibility with various text editors and note-taking applications. What pushed me to do this was seeing other implementations of note taking using tools such as


Inspiration


My journey to creating PDFtoMD began with a quest for a more efficient and sustainable method of note-taking. Like many, I was drawn to the digital realm for organizing my thoughts and ideas. My exploration led me to various platforms and tools, each offering unique approaches to note management. It was through watching YouTube videos about note-taking systems using tools like Neovim, QOwnNotes, and Obsidian that I found my inspiration. These platforms demonstrated the power of digital note-taking in a way that resonated with my needs for accessibility and organization.

For years, my primary note-taking tool had been my iPhone. It was convenient and always at hand, but it lacked the flexibility and accessibility I needed, and the loose nature of the notes meant that it was harder to keep myself organised. I wanted my notes to be like a second brain of information, accessible across various platforms and easily manageable. This desire led me to seek a method to migrate my notes from the iPhone into a format that aligned with these aspirations.

My familiarity with Markdown, cultivated through the creation of my portfolio site using a Jekyll framework, played a pivotal role in this transition. Markdown’s simplicity and versatility made it an ideal choice for my notes. It was a format I had come to appreciate for its clarity and ease of use. After considering various platforms, I settled on Obsidian. Its multi-platform nature (supporting both iPhone and Windows) and the extensibility offered by community-made extensions made it the perfect fit for my needs.

The challenge then was to convert my existing notes into this new format. The notes, exported from my iPhone via Wondershare Dr.Fone, were in PDF format – not ideal for the kind of fluid, interconnected digital workspace I envisioned with Obsidian. Thus, PDFtoMD was created out of necessity. It was made to bridge this gap, to transform these static PDF files into dynamic Markdown documents, ready to be integrated into my new digital note-taking ecosystem.


Functionality


The tool is built using C# and targets the .NET framework. PDFtoMD operates by scanning a specified directory for PDF files and then processes each file individually. It extracts the text content from the PDFs, cleans up the formatting by removing unwanted headers and footers, and then converts the content into Markdown format. The resulting Markdown files are saved in a separate output directory, ready for use.

This is achieved through the use of regular expressions (regex), a powerful tool for text processing and manipulation.


Usage


Users can easily set up PDFtoMD by defining the input and output directories in the Program.cs file. Once configured, the tool will process all PDF files in the input directory and output the converted Markdown files in the designated output directory. It is run within an IDE because the use case I had was very niche, and programming a user interface wasn’t even a priority. I just needed to convert my PDFs and as such it is a very basic tool.


Implementation


using System.Text;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Path = System.IO.Path;

namespace PDFtoMD;

internal static class Program
{
    private const string InputDirectory = @"C:\Games Development\C#\PDFtoMD\PDFtoMD\Input";
    private const string OutputDirectory = @"C:\Games Development\C#\PDFtoMD\PDFtoMD\Output";

    private static void Main()
    {
        foreach (string filePath in Directory.GetFiles(InputDirectory, "*.pdf"))
        {
            try
            {
                ConvertPdfToMarkdown(filePath, OutputDirectory);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing {filePath}: {ex.Message}");
            }
        }
    }

    private static void ConvertPdfToMarkdown(string filePath, string outputDirectory)
    {
        using PdfReader reader = new PdfReader(filePath);
        StringBuilder cleanedText = new StringBuilder();

        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            string text = PdfTextExtractor.GetTextFromPage(reader, i);
            cleanedText.Append(CleanText(text));
        }

        using StreamWriter writer = new StreamWriter(Path.Combine(outputDirectory, Path.GetFileNameWithoutExtension(filePath) + ".md"));
        writer.Write(cleanedText.ToString());
    }
    
    private static string CleanText(string text)
    {
        text = Regex.Replace(text, @"\bNotes\b", "");
        text = Regex.Replace(text, @"Dr\.Fone iPhone \(210\) \d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} (AM|PM)", "");
        text = Regex.Replace(text, @"(.+?)\s+\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\s+\1", "");
        return text;
    }
}

Future Aspirations: From Tool to Plugin


Looking ahead, I envision taking PDFtoMD to the next level. My goal is to evolve it from a standalone tool into a comprehensive solution that can directly extract notes from an iPhone and convert them into Markdown format, specifically tailored for Obsidian. Inspired by my recent uptake of Obsidian, my idea is to create a plugin that seamlessly integrates apple notes on iPhone, into the platform, simplifying the process to migrate notes for end users. This development would not only benefit my personal workflow but also offer a valuable resource to others seeking to optimize their digital note-taking practices.