New Capture Generators Launched (v5-0-1)

puravida's picture
puravida
Jedi Warrior
Offline
Joined: 09/01/2007
Visit puravida's Website

In short, we have launched an updated version of our capture generators to overcome a new issue that occurred in a small but growing percentage of requests. On sites that used browser-based drawing tools, our rendering engine would not properly handle certain floating-point precision cases when the requested image was wider than 640px. This ultimately turned out to be a "bug" within the Gecko platform (the foundation of Firefox) on which our service has been based for the past several years.

This update overcomes that issue, along with a few other obscure ones. All in all, there were only about 1,000 errors per 1 million requests, but the number appeared to be climbing quickly, so it was best to address it before it became a bigger issue.
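For context, 1,000 errors per 1 million requests works out to a 0.1% failure rate. A minimal sketch of that arithmetic and the kind of threshold check involved (function names and the 0.5% cutoff are illustrative assumptions, not our actual monitoring code):

```python
def failure_rate(errors, total):
    """Return the failure rate as a fraction of total requests."""
    return errors / total if total else 0.0

# About 1,000 errors per 1 million requests is a 0.1% failure rate.
rate = failure_rate(1_000, 1_000_000)

# Flag the queue for attention once the rate climbs past a chosen
# threshold (0.5% here is an arbitrary illustration, not a real cutoff).
ALERT_THRESHOLD = 0.005
needs_attention = rate > ALERT_THRESHOLD
```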

GEEK-SPEAK
During routine monitoring of the system, we noticed some strange errors in the logs. The status code for some broken requests was simply "to" instead of HTTP:200, BLANK_DETECTED, or similar. Upon further investigation, the URLs in all of those cases were working, and the "to" was an artifact of how we pull error information from the log text. The error condition being generated was a new one for us and wasn't handled properly.
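As an illustration of how a stray word like "to" can masquerade as a status code, here is a minimal sketch. The log format, field positions, and function names are hypothetical, not our actual pipeline:

```python
KNOWN_STATUSES = {"BLANK_DETECTED", "CAPTURE_FAILED"}

def parse_status_naive(log_line):
    # Assumes the status is always the third whitespace-separated field.
    return log_line.split()[2]

def parse_status_safe(log_line):
    # Only accept tokens that look like real status codes; everything
    # else is treated as an unrecognized error condition.
    token = log_line.split()[2]
    if token in KNOWN_STATUSES or token.startswith("HTTP:"):
        return token
    return "UNKNOWN_ERROR"

# The naive parser works on well-formed lines...
ok = parse_status_naive("req-123 http://example.com HTTP:200")

# ...but a free-text error message puts the word "to" in that slot.
stray = parse_status_naive("req-124 Failed to resolve host")
```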

All of those failed requests were refreshed after the updated generators went online, and all of them captured correctly this time around. We do not foresee any further errors of this kind.


Update: There was a slight hiccup in the latest upgrade: a new option in the SDK had a default that was overriding our custom setting. In short, the failure rate spiked to 2% and generated "BLANK_DETECTED" on a number of working sites. This traced back to an old issue where certain sites would not properly validate against OCSP. The new setting broke our "fix" for that, but we have pushed an update to "fix" the new option as well, so things appear to be capturing properly now. I have placed about 100,000 broken requests back in the queue, and those should be refreshed within an hour or two.
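The settings regression is easy to reproduce in miniature: if a library merges its defaults over user configuration, a long-standing override silently disappears. A hedged sketch (the dict names and OCSP key are illustrative assumptions, not the SDK's real options):

```python
sdk_defaults = {"ocsp_strict": True, "timeout": 30}   # new default in the SDK
our_settings = {"ocsp_strict": False}                 # long-standing workaround

# The buggy upgrade effectively merged defaults *over* our settings,
# so the workaround was lost and OCSP-broken sites failed again:
broken = {**our_settings, **sdk_defaults}

# The fix re-applies our settings after the SDK defaults:
fixed = {**sdk_defaults, **our_settings}
```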


Update: I did not post again until now because it was unclear what was continuing to cause incomplete captures, BLANK_DETECTED, and CAPTURE_FAILED errors. It took six days of testing to discover that a "security" change in a piece of software we were forced to upgrade (to overcome the original issue) was causing these problems. The "fix" for it was complex and took some time to fully test. That "fix" has now been deployed and works around the security issues in that piece of software.

So, the capture success rate is likely even better than before the recent issues, but we will monitor to make certain. The downside is that the newer versions of the software we are running seem to be much slower than before: capture times have spiked from 5-12 seconds up to about 18-30 seconds for the same URLs. Once we are confident the issues are resolved, we will investigate the slowness and see if there is a viable workaround.

I have put 87,000 captures (BLANK_DETECTED and CAPTURE_FAILED) back into the queue to try again, as some of them may actually be working URLs and should capture now. However, there is no way for us to know whether a site has "partially captured," so if you see any, please give them a quick refresh. Aside from a manual refresh, the system's monthly refresh should fix any lingering incomplete captures over the next 30 days.
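The re-queue step itself is simple; a sketch of the idea (the data structures and function name are hypothetical, not our queue code):

```python
FAILED_STATES = {"BLANK_DETECTED", "CAPTURE_FAILED"}

def requeue_failures(requests, queue):
    """Push previously failed capture requests back onto the work
    queue and return how many were retried."""
    retried = 0
    for req in requests:
        if req["status"] in FAILED_STATES:
            queue.append(req["url"])
            retried += 1
    return retried

work_queue = []
history = [
    {"url": "http://a.example", "status": "CAPTURE_FAILED"},
    {"url": "http://b.example", "status": "HTTP:200"},
    {"url": "http://c.example", "status": "BLANK_DETECTED"},
]
count = requeue_failures(history, work_queue)
```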

Please do let us know if you see any strange behavior from this new version of capture generators. So far, with this fix in place, things are looking good (aside from being slower).


Update: I have just pushed another update to the capture generators to overcome the slowness caused by the forced update of the website render platform we currently use. It looks like capture times are back down to the 5-12 second average. There was a "bug" in the new capture platform version that caused our capture process to hang for 10 seconds on each capture. Disabling that option has cleared up the hang condition and does not seem to have adversely affected captures. I spent 8 hours today testing a few hundred troublesome requests to ensure that the performance issue was corrected without causing new issues. We will continue to monitor the queue, as one user just put 900,000 captures in the queue a few hours ago, but it currently looks like everything is running well now.
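To keep a platform-level hang like this from stalling the whole queue, a hard per-capture time limit helps as a safety net. A minimal sketch of that pattern (the wrapper and names are hypothetical assumptions, not the actual generator code):

```python
import concurrent.futures
import time

def capture_with_timeout(capture_fn, url, timeout=60):
    """Run one capture with a wall-clock limit; a hung capture is
    reported as CAPTURE_FAILED instead of blocking the queue."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(capture_fn, url)
        try:
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            return "CAPTURE_FAILED"

# A fast capture completes normally...
fast = capture_with_timeout(lambda u: "HTTP:200", "http://a.example", timeout=5)

# ...while one that hangs (simulated with a sleep) is cut off.
def hung_capture(url):
    time.sleep(0.5)
    return "HTTP:200"

slow = capture_with_timeout(hung_capture, "http://b.example", timeout=0.1)
```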

GEEK-SPEAK RANT
It is really tiresome that so many software vendors these days seem to disregard their existing users' stability in the name of "innovation" and "making progress." Just about every software developer is pushing out constant updates, even making them mandatory in some cases, without ensuring backward compatibility or even supporting implementations that are only 1-2 years old. IF IT ISN'T BROKE; DON'T FIX IT. The current version of the capture code I wrote has worked well for over 5 years (with relatively minor enhancements to handle advancements like CSS3 and HTML5), but now the underlying platforms are no longer supported and the new platforms are buggy (as this exercise in upgrading to overcome a "bug" in their older platform has proven).

In contrast, when I release an update, I make sure that existing users' code will still function. What a novel idea, eh? It used to be that way for other companies as well. I am about to make a HUGE change to ShrinkTheWeb's API later this year, in the name of innovation and expansion, but existing user integrations will still work. Imagine that. It's just that IF they want to take advantage of the new features, they will need to update their sample code library or plugin/module.

Software companies these days are just too big to not be lazy, I suppose. That includes the big boys, like WordPress, Adobe, Microsoft, Mozilla, Google (Chrome), etc. They are all guilty.

To compound the problem, our CMS vendor pulled the plug on the version we use as well, so we're forced to make a 1,000+ hour migration to a new CMS (requiring updates to 10,000+ lines of code and hundreds of queries). This is despite our having written all of our code and queries, way back when, to follow "best practices" and be portable; everything has changed in the past 5 years, so now we're having to rebuild again.

Ok. End of rant.

We'll get there. It will just take a bit longer and require more all-nighters, but it will happen. We just have to keep up the good fight! :)

ShrinkTheWeb® (About STW) is another innovation by Neosys Consulting


©2018 ShrinkTheWeb. All rights reserved. ShrinkTheWeb is a registered trademark of ShrinkTheWeb.