Scraping a DIV Element from a Web Page with PHP

I recently read an article about CodeEval, a free, gamified website that ranks developers and brings them together with employers. Essentially, a developer can sign up, complete coding challenges, and earn badges and a “Hacker Ranking” that compares his or her skills with those of others who have signed up on the site. Completing some challenges also unlocks the ability to apply for jobs with various tech startups through the site.

After completing some of the challenges, I decided to see whether I could get a widget for my blog that would show my “Hacker Rank”, as Klout and some other social ranking sites offer. Unlike Kred, CodeEval apparently does not have this functionality yet, so I decided to make my own.

The ranking information is shown in a div element on the user’s public profile, assuming that the user allows the profile to be shown.

Using the PHP code below, I was able to scrape the information from CodeEval’s site. Next, in a Text widget on WordPress, I created an empty table and used jQuery to populate it with the div I scraped from CodeEval, along with CSS that gives the badge a look and feel similar to the one on the CodeEval site. Ultimately, I could create a WordPress plugin for this, so that it could be done without having to create the codeeval.php file on the site, but I haven’t done that yet.

This code could be used to scrape from any site, as long as the element has a unique class name and PHP's allow_url_fopen setting is enabled, so that file_get_contents can fetch remote URLs.
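To illustrate how the approach generalizes, here is a minimal sketch of a reusable helper. The function name scrapeByClass and the sample markup are hypothetical and not part of the original widget. It also swaps the exact-match XPath test for the common "contains a class token" idiom, so an element carrying several classes still matches. It assumes PHP 7 or later, non-empty HTML input, and a class name without apostrophes.

```php
<?php
// Hypothetical helper (not from the original post): returns the inner HTML of
// every element whose class attribute contains $className as a whole token.
function scrapeByClass(string $html, string $className): array
{
    // Silence warnings from imperfect real-world HTML, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Pad the class list with spaces so "wrapper-rank" matches
    // class="wrapper-rank active" but not class="wrapper-ranking".
    // Assumes $className contains no apostrophes.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );

    $fragments = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query($query) as $node) {
        $inner = '';
        foreach ($node->childNodes as $child) {
            // Serialize each child node, tags included.
            $inner .= $dom->saveHTML($child);
        }
        $fragments[] = $inner;
    }
    return $fragments;
}

// Sample markup invented for illustration; a real page would be fetched
// with file_get_contents() as in codeeval.php.
$sample = '<html><body><div class="wrapper-rank active"><h4>42</h4></div></body></html>';
print_r(scrapeByClass($sample, 'wrapper-rank'));
```

Returning an array of fragments rather than echoing them keeps the scraping separate from the markup, so the same helper could feed a widget, a cache, or a template.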

codeeval.php:

<?php
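// Suppress libxml warnings caused by imperfect real-world HTML.
$previous_value = libxml_use_internal_errors(true);
// Read the profile ID from the query string; fall back to an empty string.
$codeeval = isset($_GET['codeeval']) && is_string($_GET['codeeval']) ? $_GET['codeeval'] : '';
// URL-encode the ID so user input cannot alter the request path or inject markup into the link below.
$score_url = 'https://www.codeeval.com/public/' . rawurlencode($codeeval) . '/';
$html = file_get_contents($score_url);
if ($html === false) {
    // The fetch failed (bad ID, network error, or allow_url_fopen disabled).
    exit('Unable to load the CodeEval profile.');
}
$classname = 'wrapper-rank';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every element whose class attribute is exactly $classname.
$results = $xpath->query("//*[@class='" . $classname . "']");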

/* This function returns the inner HTML of the scraped element, tags included.
** Source: http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
** So be sure to go and give that post an upvote too :)
**/
function innerHTML(DOMNode $node)
{
  $doc = new DOMDocument();
  foreach ($node->childNodes as $child) {
    $doc->appendChild($doc->importNode($child, true));
  }
  return $doc->saveHTML();
}
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
?>
<a href="<?php echo htmlspecialchars($score_url, ENT_QUOTES); ?>" style="text-decoration:none;text-align:center;font-family:Arial Black;color:black;" target="_blank">
<div class="codeeval">
<img src="https://www.codeeval.com/site_media/images/logo-code-eval.png" alt="CodeEval" />
<h3>hacker ranking</h3>
<div class="wrapper-rank">
<?php
foreach ($results as $node) {
    $full_content = innerHTML($node);
    echo $full_content;
}
?>
</div>
</div>
</a>

Here is the CSS I used:

.codeeval img {
display: block;
margin-left: auto;
margin-right: auto;
background-color: white;
}
.codeeval h3 {
text-align: center;
color: #CC240A;
letter-spacing: 0.2em;
text-transform: uppercase;
margin: 0;
padding: 0;
}
.wrapper-rank {
background: none repeat scroll 0 0 #CC240A;
padding: 5px;
width: 258px;
height: 69px;
font-family: Arial;
font-weight: normal;
font-size: 12px;
}
.wrapper-rank .main-rank {
background: none repeat scroll 0 0 #BB2610;
clear: both;
overflow: hidden;
padding: 15px;
text-align: center;
width: 228px;
height: 39px;
}
.wrapper-rank .main-rank h4 {
color: white;
float: left;
font-size: 58px;
font-weight: normal;
margin: 0;
padding: 0;
text-align: center;
}
.wrapper-rank .main-rank span {
color: #FFFF00;
float: left;
font-size: 20px;
margin: 15px 0 0 5px;
text-align: left;
}
.wrapper-rank .main-rank span em {
color: #222222;
display: block;
font-style: normal;
font-size: 16px;
}

After the codeeval.php file is created, create this table in the Text widget:

<table>
   <tr style="vertical-align:middle;text-align:center;">
      <td id="codeeval" style="width:100%;vertical-align:top;text-align:center;">
      </td>
   </tr>
</table>

Lastly, find the unique ID in the URL of your CodeEval public profile and substitute it for the placeholder below. This jQuery statement will populate the table above with the scraped div.

(function($) {
$("#codeeval").load("/codeeval/codeeval.php?codeeval=<<your CodeEval ID>>");
})(jQuery);

For more about CodeEval and similar sites, see Thoughts on Professional Learning – Inspired by CodeEval & HackerRank.
