Full text indexing for PDFs

7 years 10 months ago #62565 by agata
Hello,
Is full-text indexing for PDFs (uploaded via the download field) anywhere on the roadmap? That way, search would cover not only the document title and metadata but also the text inside the PDF. Something like this: extensions.joomla.org/extension/os-pdf-indexer
Thanks,
Agata
Last edit: 7 years 10 months ago by agata.


7 years 10 months ago #62574 by ggppdk
Replied by ggppdk on topic Full text indexing for PDFs
Hello

We could do this with just a little more work,
- but we would need to install 1 more 3rd-party PHP library to be able to extract the text from the PDFs

e.g.
github.com/smalot/pdfparser


-- FLEXIcontent is free but involves a big effort on our part.
Like our support (a bug-free FC despite a long list of functions)? Like the features? Like the ongoing development and future commitment to FLEXIcontent?
-- Add your voice to the FLEXIcontent JED listing with a 5-star...


7 years 10 months ago #62586 by agata
Replied by agata on topic Full text indexing for PDFs
That would be amazing! Would it work with all 3 search plugins (Flexi advanced search, Joomla search and Finder) and with the ACL? What do you think the timeframe would be? If you need to talk over any details, please PM me.

As it turns out, a document manager we were going to use has a shortcoming that its developers can't fix, which disqualifies it from this particular project. So this is a lifesaver.
A million thanks,
Agata


7 years 10 months ago #62861 by agata
Replied by agata on topic Full text indexing for PDFs
Hello,
Could you give me an idea as to when this might happen?
Also, another handy feature would be the ability to timestamp selected documents, so that when a person downloads or views one, the date of that download/view is added to the bottom of the document. As far as I know there are ready-made libraries for this, so again it would be a matter of implementing it within FC.
Please don't hesitate to send me a message if the way to make this happen is to hire you for these tasks.
Best,
Agata


7 years 10 months ago #62886 by agata
Replied by agata on topic Full text indexing for PDFs
Perhaps this could be the library for timestamping PDFs? www.kryptokoder.com/signwithext.html
Best,
Agata


7 years 10 months ago #62887 by ggppdk
Replied by ggppdk on topic Full text indexing for PDFs
Hello

This is probably little work:
- every field has a method that prepares its text for indexing,
  and in it we would load the external lib and get the PDF text
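A minimal sketch of that per-field hook, written in Python for illustration only (FLEXIcontent itself is PHP, and its real field method and class names are not shown in this thread). `FileField`, `prepare_indexed_text` and the `loader` callable are hypothetical; `loader` stands in for loading an external library such as smalot/pdfparser and returns a text-extraction function, or `None` if the library is unavailable.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: a function that takes a PDF file path and returns its text.
PdfTextExtractor = Callable[[str], str]


class FileField:
    """Hypothetical download/file field; not FLEXIcontent's real class."""

    def __init__(self, title: str, metadata: Dict[str, str], pdf_paths: List[str],
                 loader: Callable[[], Optional[PdfTextExtractor]]) -> None:
        self.title = title
        self.metadata = metadata
        self.pdf_paths = pdf_paths
        self._loader = loader  # loads the external PDF library on demand
        self._extractor: Optional[PdfTextExtractor] = None

    def prepare_indexed_text(self) -> str:
        # The title and metadata values are always indexed.
        parts = [self.title, *self.metadata.values()]
        # Load the external library only when indexing actually runs.
        if self._extractor is None:
            self._extractor = self._loader()
        # If the library is unavailable, fall back to title + metadata only.
        if self._extractor is not None:
            parts.extend(self._extractor(path) for path in self.pdf_paths)
        # Join the non-empty parts into one searchable string.
        return " ".join(p for p in parts if p)
```

For example, with a stub extractor, `FileField("Annual report", {"author": "Agata"}, ["report.pdf"], lambda: (lambda p: "text of " + p)).prepare_indexed_text()` returns `'Annual report Agata text of report.pdf'`. Loading the library lazily means sites that never index PDFs never pay for it.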

Also, our indexer is designed to handle delays properly
- so if this is enabled, text indexing/re-indexing will be slower, but no timeouts will occur
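One common way to avoid timeouts, assumed here since the thread does not show how FLEXIcontent's indexer actually does it, is to index in time-boxed batches and have the client call again with the returned position. A hypothetical Python sketch (`index_batch` and its parameters are illustrative names):

```python
import time
from typing import Callable, Sequence


def index_batch(item_ids: Sequence[int], start: int,
                index_one: Callable[[int], None],
                time_budget_s: float = 5.0,
                clock: Callable[[], float] = time.monotonic) -> int:
    """Index items from position `start` until the time budget is spent.

    Returns the position to resume from on the next request; a value equal
    to len(item_ids) means every item has been indexed.
    """
    deadline = clock() + time_budget_s
    pos = start
    while pos < len(item_ids):
        index_one(item_ids[pos])  # may be slow, e.g. extracting a PDF's text
        pos += 1
        # Check the clock after each item so every call makes progress,
        # and stop once the budget (chosen shorter than the server timeout) is used up.
        if clock() >= deadline:
            break
    return pos
```

A caller, for example a series of AJAX requests, keeps calling `index_batch(ids, pos, ...)` with the returned `pos` until it equals `len(ids)`. Because the clock is checked after each item, every call makes progress even if a single PDF takes longer than the whole budget.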

I am concerned about the FILE SIZE of the FLEXIcontent package,
- so I will move this and 1 or 2 more 3rd-party libs into a separate package and add a warning / download link if someone tries to use them
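A hypothetical Python sketch of the "warning / download link" idea: try to import the optional library and, if it is missing, record a warning instead of failing. The function name and the download URL are placeholders, not part of FLEXIcontent.

```python
import importlib
from types import ModuleType
from typing import List, Optional


def load_optional_lib(module_name: str, download_url: str,
                      messages: List[str]) -> Optional[ModuleType]:
    """Import an optional add-on library, or record a warning if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        # Placeholder message; download_url is a hypothetical add-on location.
        messages.append(
            f"The '{module_name}' library is not installed. Download the "
            f"add-on package from {download_url} to enable this feature."
        )
        return None
```

The caller shows any collected `messages` to the administrator and simply skips the feature when `None` is returned, so the core package stays small and still works without the add-on.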


