Adding support for a notify callback when web page screenshots are ready (avoid polling)

puravida
Jedi Warrior
Joined: 09/01/2007

Note: This feature does not currently apply to URL-to-PDF requests, because that feature returns PDF data in real time. If there is a need or justification for delivering the PDF in a notification, please let us know and we will consider it.

Since ShrinkTheWeb's launch, we have always instructed users of the "Advanced Method" API to poll on a graduated delay (30s, 60s, 90s, etc.) to download requested web page screenshots once they are ready. While this method works and is relatively straightforward, it does not scale well.
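
For comparison, the polling pattern described above can be sketched as follows. This is a minimal illustration, not STW sample code: `is_ready` is a hypothetical stand-in for whatever check your integration makes against the API, and `sleep` is injectable so the delays can be inspected without actually waiting.

```python
import time
from typing import Callable, Optional


def poll_with_graduated_delay(
    is_ready: Callable[[], bool],
    step_seconds: float = 30.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until is_ready() is True, waiting 30s, 60s, 90s, ... before each check.

    Returns the 1-based attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        # The wait grows linearly: step, 2*step, 3*step, ...
        sleep(step_seconds * attempt)
        if is_ready():
            return attempt
    return None
```

Every pending capture costs repeated requests and idle waiting on the client side, which is the scaling problem a push notification is meant to remove.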

Providing a notify callback has been "on the list" for years, because it offers more robust image delivery, potentially much quicker turnaround, and fewer unnecessary requests to our servers. However, there were always a hundred other "urgent" tasks to handle, so it kept getting pushed back.

Now, though, I'm happy to announce that we will finally be releasing a notify callback that automatically contacts your server when a capture is ready and sends information about the capture, along with the image data itself. This all happens within a single call (POST), reducing resource usage on both sides.

This feature will be released soon, so if you are interested in trying it out, please leave a quick comment to subscribe to this thread. On release day, we will also update the Sample PHP code with a sample "notify callback" script that you can just drop in place.

Much of this year's effort has gone into preparing for our upcoming API overhaul, which will pave the way for some great new features that go beyond capturing and delivering screenshots of websites. The details are still under wraps, but there is a reason for all the madness!

Stay Tuned!

GEEK-SPEAK
For those who are curious, the notification callback from our server will be in JSON format. We can add XML if users request it, but we do not expect users to need XML, so we will not support it initially.
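
Since the callback body is JSON, a receiver mostly needs to decode it and check the shared TOKEN mentioned later in this thread. The sketch below assumes field names (`token`, `url`, `image`) purely for illustration; check the actual payload produced for the sample stw_notify.php before relying on any of them.

```python
import base64
import hmac
import json
from typing import Optional


def parse_notification(body: bytes, expected_token: str) -> dict:
    """Parse a JSON notify callback body.

    HYPOTHETICAL payload shape (not the documented STW format):
      {"token": "...", "url": "...", "image": "<Base64 data, or omitted>"}
    Raises ValueError if the token does not match.
    """
    payload = json.loads(body)
    token = str(payload.get("token", ""))
    # Constant-time comparison avoids leaking how much of the token matched
    if not hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8")):
        raise ValueError("invalid notification token")
    image_b64: Optional[str] = payload.get("image")
    return {
        "url": payload.get("url"),
        # Decoded bytes when the image was pushed inline; None means download it via "url"
        "image": base64.b64decode(image_b64) if image_b64 else None,
    }
```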

Regarding the single POST notification with the image payload, there will be an option NOT to receive the image payload and to download the image from the API URL instead. Overriding the default in favor of a secondary download request can save bandwidth costs for users with large images (1 MB or more). The reason is that including the image data in the JSON response requires Base64 encoding, which adds about 33% to the size. If you override the default and download from the API, it means making an extra call, but it reduces bandwidth usage.
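
The 33% figure follows from how Base64 works: every 3-byte group of binary data becomes 4 ASCII characters. A quick check using only the Python standard library (anything a particular encoder adds on top, such as line breaks, increases the size further):

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Length of standard Base64 output (with '=' padding, no line breaks)."""
    # Every 3 input bytes (rounded up) become 4 output characters
    return 4 * ((n_bytes + 2) // 3)


raw = 1_000_000  # a 1 MB image
encoded = base64_size(raw)
assert encoded == len(base64.b64encode(bytes(raw)))
print(f"{raw} bytes -> {encoded} bytes (+{(encoded - raw) / raw:.1%})")
# prints: 1000000 bytes -> 1333336 bytes (+33.3%)
```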

Lastly, when we release the updated API later this year, we will switch the default format for all requests from XML (current) to JSON. There will be entirely new Sample PHP Code for users to drop in so that they can take advantage of the new features. We will also document the minor changes required to upgrade custom integrations that cannot easily be replaced with our Sample code. With a few find/replace operations, the conversion should be straightforward, and for those who absolutely cannot make changes, the current API will remain operational indefinitely. It just won't let users take advantage of the planned new features.

If anyone has questions or ideas, please feel free to share them or ask away!

naftali
Joined: 09/16/2014

Not sure how to subscribe without leaving a comment.

puravida

Good point, naftali! I could have sworn that you could subscribe without commenting, but I just tested it and saw that "Comment is required" when checking "This post" under "Subscriptions." Sorry for the confusion.

By the way, I have gotten this feature working in testing. It is very slick and very fast. I will be releasing it in BETA shortly.

puravida

Update: I have quickly put this together for BETA testing, since a few users were asking and were really eager to get their hands on this feature. Because I documented it quickly just to get it out there, I may have missed something. Just let me know if you run into any problems and I'll take a look.

Steps to use (assumes you are using the sample PHP code):

  1. Visit the documentation area and download the newest v2.0.8 PHP Sample code
  2. Configure the KEYs, storage folder, TOKEN, CALLBACKURL, etc. in the stw_example_code.php file (or whatever you named it)
  3. Upload stw_notify.php somewhere on your server and make sure it is publicly accessible
  4. In stw_notify.php, make sure the stw_example_code INCLUDE is working (in case you renamed any files or need to provide an explicit path)
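
For readers not using PHP, the role of stw_notify.php (a publicly reachable script that accepts the POST) can be sketched with Python's standard library. This is an illustrative analogue, not a port of the sample script: it does no TOKEN check or file saving, and the port number is arbitrary. Python's documentation does not recommend `http.server` for production, so a real deployment would normally sit behind a proper web server with HTTPS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory store for this sketch; a real receiver would save the image to disk.
RECEIVED = []


class NotifyHandler(BaseHTTPRequestHandler):
    """Accepts a JSON POST, records it, and answers 200 OK (400 on bad JSON)."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            RECEIVED.append(json.loads(body))
        except ValueError:  # covers JSONDecodeError and invalid UTF-8
            self.send_response(400)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"OK")

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging


def run(port: int = 8080) -> None:
    """Serve until interrupted; must be reachable at your public callback URL."""
    HTTPServer(("", port), NotifyHandler).serve_forever()
```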

If you are requesting very large images (for example, 1024 full-length), you may benefit from downloading them directly from the API instead of having us push the file to you. Pushing the file bloats it by about 33% and uses extra bandwidth, due to the Base64 encoding required for JSON or XML data transfer. If you would rather have the script save bandwidth by making an extra round-trip call, set the NOTIFYNOPUSH value to 1 in stw_example_code.
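
On the receiving side, the two delivery modes differ only in where the image bytes come from. A minimal sketch, assuming hypothetical `image` and `url` fields in the parsed notification; `fetch` is injected so the HTTP client (and any authenticated follow-up with your KEYs) stays outside this function:

```python
import base64
from typing import Callable


def get_image_bytes(payload: dict, fetch: Callable[[str], bytes]) -> bytes:
    """Return the capture's image bytes from a parsed notification.

    HYPOTHETICAL field names: "image" holds Base64 data when the image is pushed,
    "url" is where to download it when NOTIFYNOPUSH = 1 (no image in the payload).
    """
    if payload.get("image"):
        # Push mode: larger transfer, but no extra request
        return base64.b64decode(payload["image"])
    # No-push mode: one extra round trip, raw (un-encoded) bytes over the wire
    return fetch(payload["url"])
```

In practice `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`.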

Best regards,

Brandon

P.S. While I'm in this code, now is the time to make any requests or suggestions.

puravida

GEEK-SPEAK
I was hoping this task would take about 10-15 hours, but it ended up taking 80. The reason is that I kept thinking of so many ways it could break (bad notify URLs, 404 errors, broken integrations, and of course issues with the feature itself). So I built in a lot of retry logic, timing, monitoring, debugging, and logging.

Here's a list of some of the aspects this feature takes into account:

  • Accepts a notify callback URL, unless that URL has a history of failing
  • Admin (us) has ability to easily de-list blacklisted notify URLs
  • System will retry failed notifications up to 3 times, with a 10 minute delay between each attempt
  • Any notification failures will be logged in the STW portal, with a specific reason
  • Any successful notifications will be logged, if you enable "Full Logging"
  • When you log in, you will see a note if any of your callback URLs are blacklisted
  • This feature POSTs the notification, which overcomes the limitations of GET requests with frameworks such as CodeIgniter that use URLs like /api/webhook/ and ignore query strings (not sure if this is still an issue, but I saw it reported elsewhere)
  • Using the POST method also allows for larger data transfer, by default, without having to configure PHP or web servers (as GET is more restricted)
  • This feature is secure: it supports a customized TOKEN for remote script access, SSL/HTTPS callback URLs, and a secure API follow-up (optional notifyNoPush) using your locally stored KEYs
  • Notifications are handled by multiple processes, running in parallel, on multiple web nodes
  • Monitoring processes will detect if a notify process is stuck (badly broken URL or malicious script that holds open connections) and will reset within 60 seconds (and blacklist the URL)
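
The retry and blacklist rules in the list above can be sketched as a simple loop. This is an illustration under assumptions, not STW's server code: `send` is a hypothetical function that performs the POST and reports success, and because the exact blacklisting criteria are not described here, blacklisting as soon as one notification exhausts its retries is a simplification.

```python
import time
from typing import Callable, Set

MAX_RETRIES = 3            # retries after the first failed attempt
RETRY_DELAY_SECONDS = 600  # 10 minutes between attempts


def deliver_notification(
    url: str,
    send: Callable[[str], bool],
    blacklist: Set[str],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send one notification with retries; returns True on success."""
    if url in blacklist:
        return False  # blacklisted callback URLs are skipped entirely
    for attempt in range(1 + MAX_RETRIES):
        if attempt > 0:
            sleep(RETRY_DELAY_SECONDS)
        if send(url):
            return True
    # ASSUMPTION: blacklist as soon as one notification exhausts its retries
    blacklist.add(url)
    return False
```

In a queue-based system spread over several processes, the 10-minute wait would more likely be a rescheduled job than a blocking sleep; the loop form just keeps the sketch short.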

Anyway, I think that's it for now. Back to the grind I go.

puravida

Update: In my rush to get this out the door, I missed a couple of nuances with the hash calculations. I had made sure everything else was working, but only just now got to verifying that the hashes were output correctly from the initial request through to the notify script. Unfortunately, it didn't work without some tweaking.

I have uploaded a revised PHP Sample code (v2.0.8 still but dated 04/19/2016).

See the changelog for details.

GEEK-SPEAK
Basically, I had to modify the filename hash calculation a bit, because some pieces of information are not known when a request is made and cannot be inferred when posting to the remote site. The code now accounts for that, and I cleaned up the hash calculation at the same time: it was using some parameters in its calculation that it really shouldn't have. I did not review that closely back when I had a developer rewrite those sections, because the hash on the client side was really arbitrary. With notify callbacks, however, the hashes must match; otherwise we end up with multiple files for the same request and never realize that a request has already been downloaded. In my testing, everything seems to be working well, but there are a lot of scenarios, and I've only tested 2 of them so far. Once I get my test site fully integrated and updated, I will run the full gamut of tests.
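
The key requirement described above is that the filename hash be computed only from information available both when the request is made and when the notification arrives. A minimal sketch, where the field list and the use of SHA-256 are assumptions for illustration (the sample code's actual fields and hash function may differ):

```python
import hashlib
from typing import Mapping

# HYPOTHETICAL: the request parameters that both the requesting script and the
# notify script can know. Anything only one side knows must stay out of the hash.
HASH_FIELDS = ("url", "size", "full_length")


def capture_filename(params: Mapping[str, str], ext: str = "jpg") -> str:
    """Deterministic filename for a capture, built from HASH_FIELDS only."""
    # Fixed field order, so dict ordering and extra keys cannot change the result
    canonical = "&".join(f"{k}={params.get(k, '')}" for k in HASH_FIELDS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() + "." + ext
```

If either side mixed in a value the other cannot know (a timestamp, for instance), the two names would differ, producing the duplicate-file problem described above.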

puravida

Update: I've integrated this feature into a couple of test sites and run a myriad of requests through them. I monitored several factors and saw that throughput and response times are pretty good. The time from request to image delivery is very fast (usually 500ms-2s from the moment we finish capturing). Of course, only a handful of BETA testers are using this feature right now, so delays can be expected at high volume, but this is a good start.

As for bandwidth usage, I noticed that Base64 encoding seems to add 50% to the image size, rather than the 33% most developer threads suggest. Smaller images may bloat by about 33%-50%, while larger images were usually closer to 50%. That is a consideration in favor of the round-trip method (NotifyNoPush set to 1). In my testing, the round-trip method added a mere 250ms to the overall time to deliver a new capture.

We will be running multiple notifier processes across multiple servers, so throughput should be quite good, even at volume. However, if one of our extremely high-volume users submits a huge burst (for example, 750,000 New Requests within a couple of hours), the notification queue may get backed up for hours for older, cached refresh requests. Since our algorithm gives priority to New Requests over Refreshes, New Requests will still be returned quickly.
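
The prioritization described above can be illustrated with a two-level priority queue. This is a sketch of the general idea only; STW's actual queueing across processes and servers is not described in this thread, and the job IDs are placeholders.

```python
import heapq
import itertools
from typing import List, Tuple

NEW_REQUEST, REFRESH = 0, 1  # lower value = delivered first


class NotifyQueue:
    """Priority queue: New Requests before Refreshes, FIFO within each kind."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def push(self, job_id: str, kind: int) -> None:
        heapq.heappush(self._heap, (kind, next(self._counter), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With strict priority like this, Refreshes only move once no New Requests are waiting, which illustrates how a large burst of New Requests could delay them for a long time.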

puravida

I just added a note to the initial post in this thread about URL-to-PDF not currently being supported by the callback notification. In short, URL-to-PDF is currently the one feature we offer that responds in real time. It works this way because demand for PDFs of web pages is low, and we do not expect volume to grow to the point where that becomes a problem. However, we may change this behavior if usage and volume of URL-to-PDF grow substantially.

Apologies to the user who kept trying, in frustration, and wondering why there was never a callback for his PDF requests! I wanted to document this here so that others can avoid the same frustration. Currently, when making an "Advanced Method" request for URL-to-PDF, simply save the response data as-is.

puravida

Update: I have added some additional logging for the "Unknown Reason" failure case for Notify Callbacks. In this case, if your website produces any output (usually an error message), you can see what our script sees by clicking "Error Details" beside the error in the logs. This should greatly simplify troubleshooting.

I also discovered that failed Notify Callback URLs were not being blacklisted properly in many cases. This has been corrected tonight and should properly blacklist from now on.

puravida

Update: Yesterday, I added a small enhancement to the logging of screenshot requests. Now, when you hover over the "question mark" (?) to the right of the status message, you will see the request details as usual, but an additional line will be added if a notify callback was requested.

If the notify callback has been queued, the note will be: "Notify Requested"

If the notify callback URL is blacklisted, the note will be: "Notify Ignored (Blacklist)"
In this case, no notification will be sent.
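
The two notes above follow a simple rule, sketched here for clarity; the function name and the set-based blacklist lookup are illustrative assumptions, not the portal's code.

```python
from typing import Optional, Set


def notify_note(callback_url: Optional[str], blacklist: Set[str]) -> Optional[str]:
    """Extra request-details line for a screenshot request (None if no callback)."""
    if not callback_url:
        return None  # no notify callback requested, so no extra line
    if callback_url in blacklist:
        return "Notify Ignored (Blacklist)"  # and no notification is sent
    return "Notify Requested"
```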

For screenshots of the mouse-hover tooltip showing request details and notify status, please see our article on Screenshot Request Full Logging.

ShrinkTheWeb® is another innovation by Neosys Consulting.

©2018 ShrinkTheWeb. All rights reserved. ShrinkTheWeb is a registered trademark of ShrinkTheWeb.