Cyotek Forums

Topic started by: iPhiTech on February 22, 2018, 01:26:22 PM

Title: Can PdfImageBox open a searchable PDF?
Post by: iPhiTech on February 22, 2018, 01:26:22 PM
Hello, sir, I'm here again.

If possible, can PdfImageBox open a searchable PDF and highlight text (not just the selection highlight)?
I get an error when I try to load a searchable PDF.
Post by: Richard Moss on February 22, 2018, 05:38:30 PM
Hello,

Sounds like you're trying to go beyond the bounds of what the PdfImageBox sample was supposed to do, which is to convert a page in a PDF to an image and display it.

You haven't posted any details about the error, but if it's an error opening a PDF, it's doubtful I can help - the PDF to image conversion is almost entirely done by Ghostscript, a third-party library that the sample uses.

I really have no idea whether this is possible. You're probably better off finding a library that can read the actual elements in a PDF document (iTextSharp comes to mind); that way you can perform your searching etc., although I don't know how you'd render them.

Regards;
Richard Moss
Post by: iPhiTech on February 26, 2018, 06:57:53 AM
Screenshot of the error: https://i.imgur.com/uywDzhe.png

When I try to open a searchable PDF, I get this error. What if I just show a message box saying the PDF is searchable?
Post by: Richard Moss on February 26, 2018, 07:30:44 PM
Check the value of result and see if it matches a Ghostscript error code (https://ghostscript.com/doc/current/API.htm#return_codes). It could be that you're passing the wrong parameters, for example.

Edit: Actually, I can see from the locals window that the result code is -100 (gs_error_Fatal), which is a catch-all for a fatal Ghostscript error. In that case, I don't know what to suggest - I haven't actually used Ghostscript for years and am unlikely to any time soon.
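As a rough sketch of the check described above, the following C# snippet sorts a Ghostscript API return code into the broad categories from the return-code table in the Ghostscript API documentation. GhostscriptResult and GhostscriptResultKind are hypothetical names, not part of the PdfImageBox sample, and the numeric values (-100 for gs_error_Fatal, -101 for gs_error_Quit) are assumed from Ghostscript's error headers, so verify them against the version you ship. A few special non-error codes (for example gs_error_NeedInput) are deliberately not handled.

```csharp
// How a Ghostscript API return code should be treated (categories follow the
// "return codes" table in the Ghostscript API documentation).
public enum GhostscriptResultKind
{
  Success,
  Quit,
  Error,
  FatalError
}

// Hypothetical helper, not part of the PdfImageBox sample.
public static class GhostscriptResult
{
  // Assumed values from Ghostscript's ierrors.h; check them for your version.
  public const int GsErrorFatal = -100; // gs_error_Fatal
  public const int GsErrorQuit = -101;  // gs_error_Quit

  public static GhostscriptResultKind Classify(int result)
  {
    if (result == GsErrorQuit)
    {
      // "quit" was executed: not an error, but gsapi_exit must be called next
      return GhostscriptResultKind.Quit;
    }

    if (result <= GsErrorFatal)
    {
      // fatal: Ghostscript should be shut down with gsapi_exit
      return GhostscriptResultKind.FatalError;
    }

    // other negative values are ordinary errors; zero (or above) means success
    return result < 0 ? GhostscriptResultKind.Error : GhostscriptResultKind.Success;
  }
}
```

With the -100 seen in the locals window, Classify returns FatalError, which matches the reading of it as a catch-all fatal error.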
Post by: iPhiTech on February 26, 2018, 07:38:20 PM
Any suggestions on how to catch the error?
Post by: Richard Moss on February 28, 2018, 05:04:26 PM
You can use normal managed exception handling:

try
{
  // perform Ghostscript operation
}
catch (GhostScriptException ex)
{
  // do something with the exception
}
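For the try/catch above to help, the native return code has to be turned into an exception somewhere. The sketch below shows one way to do that; GhostscriptCallException, GhostscriptErrors, ThrowOnError and TryRun are hypothetical names (the sample's own GhostScriptException may look different), and treating -101 (gs_error_Quit) as a non-error is an assumption based on the Ghostscript API documentation. The catch block is where you could build a friendly message instead of letting the error crash the application.

```csharp
using System;

// Hypothetical exception type carrying the raw Ghostscript return code.
public class GhostscriptCallException : Exception
{
  public GhostscriptCallException(int resultCode)
    : base("Ghostscript call failed with code " + resultCode + ".")
  {
    ResultCode = resultCode;
  }

  public int ResultCode { get; }
}

public static class GhostscriptErrors
{
  // Throws when a Ghostscript API call returned a negative (error) code.
  // Assumption: -101 (gs_error_Quit) is not an error, per the Ghostscript API docs.
  public static void ThrowOnError(int result)
  {
    const int gsErrorQuit = -101;

    if (result < 0 && result != gsErrorQuit)
    {
      throw new GhostscriptCallException(result);
    }
  }

  // Runs a Ghostscript call and returns an error message (or null on success)
  // instead of letting the exception escape. In a WinForms app the message
  // could be passed to MessageBox.Show.
  public static string TryRun(Func<int> ghostscriptCall)
  {
    try
    {
      ThrowOnError(ghostscriptCall());
      return null;
    }
    catch (GhostscriptCallException ex)
    {
      return "Unable to open this PDF (Ghostscript error " + ex.ResultCode + ").";
    }
  }
}
```

For example, TryRun(() => -100) returns "Unable to open this PDF (Ghostscript error -100).", while a call that returns 0 yields null.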
Post by: iPhiTech on March 01, 2018, 11:47:23 AM
Sir, any idea how to check whether a PDF is already searchable or not?
Post by: Richard Moss on March 03, 2018, 03:17:40 PM
Hello,

As I mentioned, I don't really work with PDFs, so off the top of my head I don't know. A few seconds of searching, however, brought up this answer (https://stackoverflow.com/a/5003230/148962) on Stack Overflow, which describes how to extract text from PDF files. As I believe a "searchable" PDF is simply one that contains text elements, this should cover what you need.

This uses the iTextSharp (https://www.nuget.org/packages/iTextSharp/) package. I tested it in a console application and it spat out all the text in the sample PDF file I provided, so this may suit your needs.

using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Path = System.IO.Path; // avoids ambiguity with iTextSharp's own Path type

namespace PdfCheck
{
  class Program
  {
    static void Main(string[] args)
    {
      ListText(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "sample.pdf"));

      Console.ReadKey(true);
    }

    private static void ListText(string fileName)
    {
      StringBuilder text = new StringBuilder();

      using (PdfReader reader = new PdfReader(fileName))
      {
        // page numbers in iTextSharp are 1-based
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
          ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
          string currentText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

          // GetTextFromPage returns a regular .NET string, so it can be appended directly
          text.Append(currentText);
        }
      }

      Console.WriteLine(text.ToString());
    }
  }
}
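Building on the extraction code above, a quick "is it searchable?" check could stop at the first page that yields non-whitespace text. This is a sketch under the same assumption as the post (a searchable PDF is one with extractable text elements); PdfSearchability and IsSearchable are hypothetical names, and a scanned PDF without an OCR text layer will simply return false.

```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfSearchability
{
  // Returns true if at least one page contains extractable, non-whitespace text.
  public static bool IsSearchable(string fileName)
  {
    using (PdfReader reader = new PdfReader(fileName))
    {
      for (int page = 1; page <= reader.NumberOfPages; page++)
      {
        string pageText = PdfTextExtractor.GetTextFromPage(reader, page, new SimpleTextExtractionStrategy());

        if (!string.IsNullOrWhiteSpace(pageText))
        {
          return true; // no need to scan the remaining pages
        }
      }
    }

    return false;
  }
}
```

The caller can then decide what to do with the result, for example showing the message box suggested earlier in the thread.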


Hope this helps.

Regards;
Richard Moss