It’s been a while since I did a geeky work post (apologies to friends reading this).
Today I spent ages searching the web for how to get the title of a PDF out of the file using C#. I couldn’t find anyone who had done it, so I opened the PDF in Notepad, and the title was just sitting there in plain text as an entry like /Title (My Report), so it’s really easy to get at. Here’s the code for the benefit of future googlers on the subject…
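using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
...
// Matches a /Title entry whose value is a PDF literal string, e.g. /Title (My Report).
// The key can sit anywhere in the dictionary, not only straight after "<<", and the
// value may contain escaped characters such as \) or \\. Unescaped nested parentheses
// inside the title are not handled.
private static readonly Regex REGEX_PDFTITLE =
    new Regex(@"/Title\s*\(((?:\\.|[^\\)])*)\)", RegexOptions.Compiled);

private string GetPDFTitle(string pdfFilePath)
{
    // PDFs are binary; Latin-1 maps every byte to exactly one char, so nothing is mangled while reading.
    using (StreamReader sr = new StreamReader(pdfFilePath, Encoding.GetEncoding("ISO-8859-1")))
    {
        string currentLine;
        while ((currentLine = sr.ReadLine()) != null)
        {
            Match m = REGEX_PDFTITLE.Match(currentLine);
            if (m.Success)
            {
                // Raw string contents; PDF escape sequences are returned undecoded.
                return m.Groups[1].Value;
            }
        }
    }
    return String.Empty;
}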
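The captured group is the raw PDF string, so any escape sequences come through untouched. A title like Q1 (draft) may be stored as Q1 \(draft\), and accented characters can show up as octal codes like \351. Here’s a rough sketch of a decoder for that, following the escape rules for literal strings in the PDF spec. PdfStringDecoder and DecodeLiteral are names I made up. It maps octal codes straight to Latin-1 characters, which is only an approximation of PDFDocEncoding, and it doesn’t handle UTF-16 titles that start with the \376\377 byte-order mark.

```csharp
using System;
using System.Text;

public static class PdfStringDecoder
{
    // Decodes the escapes allowed in a PDF literal string: \n \r \t \b \f \( \) \\,
    // octal codes \ddd (1 to 3 digits), and backslash + end-of-line (line continuation).
    public static string DecodeLiteral(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            if (c != '\\' || i + 1 >= raw.Length)
            {
                // Ordinary character, or a lone trailing backslash kept as-is.
                sb.Append(c);
                continue;
            }
            char next = raw[++i];
            switch (next)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                case '(': sb.Append('('); break;
                case ')': sb.Append(')'); break;
                case '\\': sb.Append('\\'); break;
                case '\r':
                    // Line continuation: drop the \r and an optional following \n.
                    if (i + 1 < raw.Length && raw[i + 1] == '\n') i++;
                    break;
                case '\n':
                    // Line continuation with a bare \n.
                    break;
                default:
                    if (next >= '0' && next <= '7')
                    {
                        int value = next - '0';
                        int digits = 1;
                        while (digits < 3 && i + 1 < raw.Length && raw[i + 1] >= '0' && raw[i + 1] <= '7')
                        {
                            value = value * 8 + (raw[++i] - '0');
                            digits++;
                        }
                        // Assumption: treat the byte value as Latin-1 (close to, but not exactly, PDFDocEncoding).
                        sb.Append((char)(value & 0xFF));
                    }
                    else
                    {
                        // Unknown escape: the backslash is ignored.
                        sb.Append(next);
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

You’d then call it as PdfStringDecoder.DecodeLiteral(GetPDFTitle(path)).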

This won’t retrieve the title if the file has no entry that says “Title”. Often the title is just displayed on the page, with nothing in the file marking it as the title, so there needs to be some way of actually getting any title out of a PDF!
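One other place a title can live without a /Title entry is the XMP metadata packet, as a dc:title element. Here’s a sketch of a fallback that looks for it. It assumes the XMP stream is stored uncompressed (not every PDF does that), and PdfXmpTitle, FindTitle and GetXmpTitle are made-up names. If the title only exists as text drawn on the first page, neither this nor the /Title regex will find it.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class PdfXmpTitle
{
    // Latin-1 maps each byte to exactly one char, so binary PDF data survives the round trip.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Matches the first alternative of dc:title in an XMP packet, e.g.
    // <dc:title><rdf:Alt><rdf:li xml:lang="x-default">My Report</rdf:li></rdf:Alt></dc:title>
    private static readonly Regex XmpTitle = new Regex(
        @"<dc:title\b[^>]*>\s*<rdf:Alt\b[^>]*>\s*<rdf:li\b[^>]*>(.*?)</rdf:li>",
        RegexOptions.Singleline | RegexOptions.Compiled);

    // Searches text that was decoded as Latin-1; returns String.Empty if no dc:title is found.
    public static string FindTitle(string latin1Text)
    {
        Match m = XmpTitle.Match(latin1Text);
        if (!m.Success)
        {
            return String.Empty;
        }
        // XMP is UTF-8: turn the Latin-1 chars back into raw bytes and decode them as UTF-8.
        string utf8 = Encoding.UTF8.GetString(Latin1.GetBytes(m.Groups[1].Value));
        // Undo XML escaping such as &amp; and &lt;.
        return WebUtility.HtmlDecode(utf8).Trim();
    }

    // Hypothetical helper: assumes the XMP metadata stream is not compressed.
    public static string GetXmpTitle(string pdfFilePath)
    {
        return FindTitle(Latin1.GetString(File.ReadAllBytes(pdfFilePath)));
    }
}
```

Reading the whole file is lazy but fine for a sketch. A big PDF would be better scanned in chunks.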