Enhancing Lock-to-Account for performance improvements

puravida

This announcement explains why the main portal (not the service) was in maintenance mode off and on so much over the past 19 hours. The service itself was never interrupted, but I had to stop people from logging in to manage accounts while I was adding major new support for the "Lock to Account" feature. Basically, I spent the last 19 hours straight coding a master "whitelist" and "blacklist" that are referenced for all users of the feature. There should be no impact on your service, but you can read more in this thread.

The primary delivery codes were modified slightly, but the changes should not cause any negative impact.

This new support means that we, ShrinkTheWeb, will be able to define domains/IPs that are considered "public-use" by all users and always allowed. Domains that fall into this category would be: facebook.com, blogspot.com, linkedin.com, etc. If you are showing our previews on sites we have listed in the whitelist, you do not need to put them in your "Allowed Referrers" for them to work (in fact, trying to add them would just return an error). As of this post, the first two (facebook.com and blogspot.com) are the only ones on the whitelist.

I also added a blacklist that will prevent users from adding certain domains/IPs that we deem troublesome or undesirable. As of this post, there are no entries on the blacklist.

GEEK-SPEAK
While I did run into an unbelievable number of obstacles while coding this support, I believe I managed to avoid any impact on users and left all "Allowed Referrers" lists as they were before I started. At a couple of points, numerous incorrect referrers were added to a few thousand accounts, but I coded a "fix it" script that cleaned all of those up.

For those of you who are programmer types, you may appreciate how difficult this "apparently simple" enhancement turned out to be.

On the database management side of things, I had to build a master list of all users' "allowed referrers" and then loop through every submitted domain/IP to be whitelisted, checking it against that list. If any of them were found in another account, I had to poll the CMS database to discover the exact user or users involved. Then, still within the outer loops, I had to loop through all of those users and remove the domain/IP from each one's "allowed referrers", which of course meant building an array from the original list, removing that domain/IP, and then rebuilding the stored list from the modified array. A rough sketch of that pass is below.
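To give a feel for it, here is a rough sketch of that clean-up pass. The helper functions (getGlobalWhitelist, getAllAllowedReferrers, getCmsUser, saveAllowedReferrers) and the column names are invented for illustration; this is not our actual code or schema.

    <?php
    // Hypothetical sketch of the whitelist clean-up pass
    $whitelist    = getGlobalWhitelist($db);        // domains/IPs being whitelisted
    $referrerRows = getAllAllowedReferrers($db);    // one row per user: userid + stored list

    foreach ($whitelist as $domain) {
        foreach ($referrerRows as $row) {
            // break the stored one-per-line list into an array we can edit
            $referrers = preg_split('/\r\n|\r|\n/', $row['referrers']);

            if (in_array($domain, $referrers)) {
                // poll the CMS database to identify the exact account involved
                $user = getCmsUser($db, $row['userid']);

                // remove the now-whitelisted entry and rebuild the stored list
                $referrers = array_values(array_diff($referrers, array($domain)));
                saveAllowedReferrers($db, $user['id'], implode("\n", $referrers));
            }
        }
    }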

So all those loops within loops and arrays within arrays can really drive one crazy. :)

Then, to make matters worse, I wanted this to be as intuitive as possible and that meant that users had to be able to enter each domain/IP in a one-per-line format. Sounds simple? Well, it ended up requiring a lot of testing to figure out the best way to handle all the possible bad data, various newline characters, spaces, and empty lines.

Once I finally got that down, I realized that my comparison routine wasn't working in all cases. Many times, it would not detect that the submitted referrers were the same as the ones already in the database. This turned out to be because of invisible characters, such as newlines, spaces, and vertical tabs. So I ended up having to convert each list to an array, trim the bad characters, and rebuild it into a clean, re-indexed array.
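For the curious, the cleanup boiled down to something like the sketch below. This is a simplified reconstruction, not the production code; the function and variable names are made up, and $submitted/$stored stand for the raw submitted text and the stored list.

    <?php
    // Normalize a one-per-line referrer list so two lists can be compared reliably
    function normalizeReferrerList($raw)
    {
        // handle \r\n, \r and \n line endings in one pass
        $lines = preg_split('/\r\n|\r|\n/', $raw);

        $clean = array();
        foreach ($lines as $line) {
            // strip spaces, tabs, vertical tabs and other invisible characters
            $line = trim($line, " \t\n\r\0\x0B");
            if ($line !== '') {
                $clean[] = $line;
            }
        }

        // drop duplicates, then sort and re-index so entry order no longer matters
        $clean = array_unique($clean);
        sort($clean);
        return $clean;
    }

    // two lists are "the same" once both are normalized
    $unchanged = (normalizeReferrerList($submitted) === normalizeReferrerList($stored));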

When all was said and done, I finally had a stable and efficient way to administer the global whitelist and blacklist.

There are so many tasks left on my plate, and I still have one more big task before I can meet tomorrow night's deadline, so I have to move on.

However, the next item on the list in regards to this (far down the list in priority) is to evaluate methods to improve the efficiency of testing all submitted domains/IPs against all other accounts' listed referrers. Currently, this takes about 20-30 seconds. If I break out a cron job to pre-fetch a master list every 5 minutes or so, that would speed up the test to about 3-5 seconds but would leave the list up to 5 minutes out of date. So I will have to ponder that more when I find the time. ;)

That's all for now. Ciao!

puravida

More GEEK-SPEAK
So as I'm digging into the final task required before the deadline tonight, I am realizing that optimizing the performance in the way mentioned in the previous post might give all the benefits AND simplify my final task. The final task is making sure that new sign-ups get locked down while registration still returns very quickly to the user. That meant using a cron job that builds the master list every few minutes and checks the submitted website against it.

So that sounds like the same thing coded in two different places, slightly differently (i.e. every programmer's bane). To avoid that scenario, we can pre-fetch once, speed up the entire process everywhere in the system, and that just leaves the occasional check for duplicates in the master list.*

*Since a malicious user could sign up after a list was fetched and then sign up again with the same domain/IP before the next list is fetched, there is some potential for abuse. However, we can now easily automate that check and ban any new account with duplicates. We will be notified by email and will investigate, at which point anyone trying to abuse the system will be BANNED FOR LIFE.
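That duplicate sweep could look roughly like this (a hypothetical sketch: assume $master is the pre-fetched array of allowed referrers keyed by user id, and the notification address is just a placeholder):

    <?php
    // Flag any referrer that more than one account claims so we can investigate
    $claims = array();                              // referrer => accounts claiming it
    foreach ($master as $userId => $referrers) {
        foreach ($referrers as $ref) {
            $claims[$ref][] = $userId;
        }
    }

    foreach ($claims as $ref => $userIds) {
        $userIds = array_unique($userIds);
        if (count($userIds) > 1) {
            // placeholder address; in practice this alerts us to review and ban
            mail('admin@example.com', 'Duplicate locked referrer',
                 $ref . ' is claimed by accounts: ' . implode(', ', $userIds));
        }
    }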

puravida

Update...

Even More GEEK-SPEAK
Going live with the new Lock-to-Account feature and announcing it brought a flood of emails from users. So I am just now getting around to updating this thread, even though I finished the work days ago.

As is sometimes the case in a programmer's life (although I'm not really a programmer), we go down a path that wastes a lot of time before we reach that "aha!" moment. This was one of those times...

Since I had realized it would make sense to share one pre-fetch between the two different places we needed the code to run, I went ahead and coded all of that: pre-fetch the master list, serialize it using json_encode, and store it in the database. There were some data obstacles that I ran into, but they were relatively easy to resolve, and I spent about 4 hours in total. At the end, I ran my test, but this time I placed a timer on the script. As expected, it ran the pre-fetch in about 30s-60s, which is what I was seeing on each submit on the user side (the very delay this pre-fetch was supposed to eliminate).
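In rough terms, the pre-fetch builder looked something like this (helper names are invented, and normalizeReferrerList is the cleanup helper sketched earlier in the thread):

    <?php
    // Gather every account's allowed referrers, serialize the result with
    // json_encode and store the blob in the database for the cron job to refresh
    $master = array();
    foreach (getAllAllowedReferrers($db) as $row) {
        // one sub-array of referrers per account, keyed by user id
        $master[$row['userid']] = normalizeReferrerList($row['referrers']);
    }
    storeMasterList($db, json_encode($master));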

Over the next 4-6 hours, I got all of that working right up to the point of having to recursively iterate through a multidimensional array to find the proper data while using userIDs as keys (but needing a 3rd dimension, since each user needed multiple-referrer support). As I was weighing PHP's array_flip (too slow), a recursive function with indexing (didn't work), a recursive function using array_keys (sort of worked), and so on, I noticed that my pre-fetch data wasn't updating properly.
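For reference, the kind of lookup I was chasing can also be written non-recursively as something like the following (a hypothetical reconstruction, not what shipped): given the userID-keyed master array, find every account that already claims a particular referrer.

    <?php
    // Return every user id whose referrer list contains the given referrer
    function findOwners(array $master, $referrer)
    {
        $owners = array();
        foreach ($master as $userId => $referrers) {
            if (in_array($referrer, $referrers)) {
                $owners[] = $userId;
            }
        }
        return $owners;
    }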

At around hour 14, I realized that I was checking every single entry to see if it was already in the array and then skipping it (which is not the behavior I wanted anymore). Then it occurred to me that in_array is a slow function, too. Once I removed that test from my pre-fetch routine, the timer went from 30s-60s down to 0.22s! Oopsie!

So, I wasted the better part of my day and night on a Friday and worked about 14 hours on the pre-fetch, when really the ideal scenario is what I had before, just without the in_array checks inside the master loop. I only needed one in_array test outside the loop. Now, if you've updated your "Allowed Referrers" in our system, you'll see that a test of 5 URLs against more than 30,000 other users' referrers takes only 1-2 seconds. Jeez. And of course, it took another 2 hours just to put things back the way I had them and fully test and update the code without the looped in_array... lol.
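Roughly, the final version boils down to the sketch below (simplified; the helper names and rejectSubmission are invented for illustration): build a flat list of every other account's referrers with no in_array test inside the loop, then check the handful of submitted URLs once at the end.

    <?php
    $allReferrers = array();
    foreach (getAllAllowedReferrers($db) as $row) {
        if ($row['userid'] == $currentUserId) {
            continue;                               // skip the submitting account's own list
        }
        foreach (normalizeReferrerList($row['referrers']) as $ref) {
            $allReferrers[] = $ref;                 // just append; no per-entry in_array check
        }
    }

    // one in_array test per submitted referrer, outside the build loop
    foreach (normalizeReferrerList($submitted) as $ref) {
        if (in_array($ref, $allReferrers)) {
            rejectSubmission($ref);                 // already locked to another account
        }
    }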

Anyway, at least it's working now and it's fast. So it is still user-friendly and scalable.

Cheers!
