Denial of Service Attack Mitigated

puravida

Hello,

Due to prior system instability, we resorted to using Twitter to keep users updated on system stability and outages. Our Twitter feed, for reference, is:

http://twitter.com/shrinktheweb

It has been a long ordeal of tracking down the root cause of the system instability (which took months due to incompetent "experts" who couldn't identify the problem) and then making a game plan to overcome the issue. As it turned out, while our servers never saw more than 5% utilization (very underutilized), they were running out of TCP/IP sockets and hitting other TCP limits. We saw that a few of our larger customers had exponential growth and assumed we had simply exceeded this unforeseen TCP limit. Therefore, we set out to overcome the limitation in the only way we could determine: with a load-balanced server farm cluster. This much more expensive solution should boost TCP limits by 500% or more and provide for quite a bit of growth. However, it has taken months (with numerous delays and disappearing consultants) to build out the new servers and server farm. It was a monumental task, even without the delays.

However, during that build-out, I noticed that the system usage was growing exponentially on a daily basis and even exceeded the combined usage of our largest customers. After some digging, I discovered that a free user has been launching a unique type of denial-of-service attack on the service and THAT has been causing all of the headaches over the past year! This user joined last year and started off with a small number of legitimate-looking requests, so it went unnoticed. Over time, his requests grew in volume until, about six months ago, they became so frequent that they began to cause the system instability that many have noticed. Even though the user was over his account limits and was getting nothing but error messages, each request used up a database socket. Because of how it was done, we thought it was organic growth, but now we know it was a subtle denial-of-service attack launched by a man in Canada with 14 powerful servers bombarding our service with 1.5 billion requests per month and growing. In contrast, the combined usage of our largest two customers is roughly 500 million requests per month.
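To put those monthly totals in perspective, here is a rough back-of-the-envelope conversion to requests per second. It is only a sketch: it assumes a 30-day month and traffic spread evenly across the month and across the 14 servers, neither of which is stated above, and the variable names are made up for illustration.

```python
# Rough conversion of the monthly figures quoted above into per-second rates.
# Assumption: a 30-day month with evenly spread traffic (real traffic is burstier).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def per_second(requests_per_month: float) -> float:
    """Average requests per second for a given monthly total."""
    return requests_per_month / SECONDS_PER_MONTH


attacker_monthly = 1.5e9   # attacker's requests per month (from the post)
top_two_monthly = 500e6    # two largest customers combined (from the post)
attacker_servers = 14      # attacker's servers (from the post)

attacker_rate = per_second(attacker_monthly)        # about 579 requests/second
customer_rate = per_second(top_two_monthly)         # about 193 requests/second
per_server_rate = attacker_rate / attacker_servers  # about 41 requests/second each
ratio = attacker_monthly / top_two_monthly          # 3 times the top two customers

print(f"attacker: {attacker_rate:.0f} req/s ({per_server_rate:.0f} per server)")
print(f"top two customers: {customer_rate:.0f} req/s")
print(f"attacker / top two customers: {ratio:.1f}x")
```

Each of those requests held a database socket even when it was rejected, which is how a server at 5% utilization could still run out of sockets.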

So this free user was monopolizing the service by using up most of the TCP/IP sockets. I have since mitigated this type of attack (which is still ongoing and growing) with new code that prevents it from having any noticeable effect on the service or users. However, we are still going to spend the extra money to migrate to the more powerful server farm solution in order to prepare for growth of the service. Now that service stability has been restored, my focus is on rebuilding our battered business, which has suffered greatly due to this hacker's malicious attacks.
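For readers curious what a mitigation of this kind can look like, here is a minimal sketch of one common approach: check the account's quota from an in-memory counter first, so that over-limit requests are rejected before any database connection is opened. This is not our actual code. `QuotaGate`, `handle_request`, and `open_db_connection` are hypothetical names, and the fixed-window counter is just one simple way to do it.

```python
import time


class QuotaGate:
    """Hypothetical in-memory, fixed-window request counter per account."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._windows = {}  # account_id -> (window_start, request_count)

    def allow(self, account_id):
        """Return True if the account is still under its limit for this window."""
        now = self.clock()
        start, count = self._windows.get(account_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired, start a fresh one
        if count >= self.limit:
            return False  # over the limit: caller should not touch the database
        self._windows[account_id] = (start, count + 1)
        return True


def handle_request(account_id, gate, open_db_connection):
    """Reject over-limit accounts before a database socket is ever used."""
    if not gate.allow(account_id):
        return "429 over account limit"
    conn = open_db_connection()  # only reached for requests within the limit
    try:
        return "200 ok"  # placeholder for the real work done with conn
    finally:
        conn.close()
```

The point of the design is ordering: the cheap in-memory check comes first, so a flood of over-limit requests costs a dictionary lookup each instead of a database socket each.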

On a side note, I was able to track down the hacker through his IPs (his local IP and his servers' IPs) and also through his domain names (all of which share a common registrant, whose details we believe to be genuine). As a result, we have filed a police report and will be launching a lawsuit against him to make him pay for what he has done to our service: severely hurting our business and our income, and causing much stress and damage to our loyal users as well.

To add to the grief, our primary server also experienced outages related to bad RAM, which was replaced, and it is now occasionally locking up with only an ambiguous reason being logged. No one has been able to identify the cause of the latest lock-ups, so we are just that much more motivated to move to the new server farm solution as quickly as possible. The automatic redundancy of the new solution will avoid outages such as these in the future.

Due to the nature and complexity of the server farm solution, with its built-in redundancy and scalability, it will be noticeably slower than our current solution, but we are going to spend as much as we can afford to speed it up. In the end, it may not be too much slower. Ultimately, we feel that system stability is worth more than speed.

In short, things are back to normal and we do not expect any further system instability. We will announce a short window of downtime to switch to the new server farm solution when it is ready and fully tested. That may be as soon as next week.

Until then, all I can do is apologize to users for this denial-of-service attack and for how long it took to discover, given the way it was launched. This has been a great source of stress for us and I know it has been a source of stress for many of our loyal users like you. We thank you for sticking through it with us and assure you that we are doing all that we can to move in the right direction.

Sincerely,

Brandon

ShrinkTheWeb® (About STW) is another innovation by Neosys Consulting

©2018 ShrinkTheWeb. All rights reserved. ShrinkTheWeb is a registered trademark of ShrinkTheWeb.