Scraping a DIV Element from a Web Page with PHP


I recently read an article about CodeEval, a free gamified website for ranking developers and bringing employers and developers together. Essentially, a developer can sign up, complete coding challenges, and earn badges and a “Hacker Ranking” that compares his or her skills with those of others who have signed up on the site. Completing some challenges also unlocks the ability to apply for jobs with various tech startups through the site.

After completing some of the challenges, I decided to see whether, as with Klout and some other social ranking sites, I could get a widget to put on my blog that would show my “Hacker Ranking”. Unlike Kred, CodeEval apparently does not offer this yet, so I decided to make my own.

The ranking information is shown in a div element on the user’s public profile, assuming that the user allows the profile to be shown.

Using the PHP code below, I was able to scrape the information from CodeEval’s site. Next, in a Text widget on WordPress, I created an empty table and used jQuery to populate it with the div I scraped from CodeEval, along with CSS that gives the badge a look and feel similar to the one on the CodeEval site. Ultimately, I could create a WordPress plugin for this, so that it could be done without having to create the codeeval.php file on the site, but I haven’t done that yet.

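This code could be used to scrape an element from any site, as long as the element has a unique class name and PHP is allowed to open URLs with file_get_contents (the allow_url_fopen setting must be enabled). Note that the XPath query matches the class attribute exactly, so the element must carry only that one class.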

codeeval.php:

<?php
$previous_value = libxml_use_internal_errors(TRUE);
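// The ID comes from the query string, so default to empty and URL-encode it before use.
$codeeval = isset($_GET['codeeval']) ? $_GET['codeeval'] : '';
$score_url = 'https://www.codeeval.com/public/' . rawurlencode($codeeval) . '/';
$html = @file_get_contents($score_url); // returns false if the fetch fails
$classname = 'wrapper-rank';
$results = array(); // stays empty if the page could not be fetched
if ($html !== false && $html !== '') {
  $dom = new DOMDocument;
  $dom->loadHTML($html);
  $xpath = new DOMXPath($dom);
  $results = $xpath->query("//*[@class='" . $classname . "']");
}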

/* This function preserves the inner content (including tags) of the scraped element.
** Source: http://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags
** so be sure to go and give that post an upvote too :)
**/
function innerHTML(DOMNode $node)
{
  $doc = new DOMDocument();
  foreach ($node->childNodes as $child) {
    $doc->appendChild($doc->importNode($child, true));
  }
  return $doc->saveHTML();
}
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
?>
<a href="<?php echo htmlspecialchars($score_url, ENT_QUOTES); ?>" style="text-decoration:none;text-align:center;font-family:Arial Black;color:black;" target="_blank">
<div class="codeeval">
<img src="https://www.codeeval.com/site_media/images/logo-code-eval.png" alt="CodeEval" />
<h3>hacker ranking</h3>
<div class="wrapper-rank">
<?php
foreach ($results as $node) {
  $full_content = innerHTML($node);
  echo $full_content;
}
?>
</div>
</div>
</a>
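
The same approach can be wrapped in a small reusable function. The sketch below is a generalization, not the code running on this blog: the function name scrapeByClass and the sample markup are made up for illustration, and it parses an HTML string so that it can be tried without fetching anything. It also swaps the exact-match XPath for the common contains(concat(...)) idiom, so an element still matches when the class is only one of several in its class attribute. It assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper (not from the original post): returns the inner HTML of
// every element whose class attribute contains $classname as a whole word.
function scrapeByClass($html, $classname)
{
    $matches = array();
    if (!is_string($html) || $html === '') {
        return $matches; // nothing to parse, e.g. the fetch failed
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Pad the class list with spaces so "rank" does not match "wrapper-rank".
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $classname . " ')]";
    foreach ($xpath->query($query) as $node) {
        $inner = '';
        foreach ($node->childNodes as $child) {
            $inner .= $dom->saveHTML($child); // serialize each child, tags included
        }
        $matches[] = trim($inner);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $matches;
}

// Sample markup invented for this example.
$sample = '<html><body><div class="box wrapper-rank"><h4>42</h4><span>points</span></div></body></html>';
print_r(scrapeByClass($sample, 'wrapper-rank'));
```

To scrape a live page, pass the result of file_get_contents($url) as the first argument; a failed fetch returns false, which the function treats as "no matches".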

Here is the CSS I used:

.codeeval img {
display: block;
margin-left: auto;
margin-right: auto;
background-color: white;
}
.codeeval h3 {
text-align: center;
color: #CC240A;
letter-spacing: 0.2em;
text-transform: uppercase;
margin: 0;
padding: 0;
}
.wrapper-rank {
background: none repeat scroll 0 0 #CC240A;
padding: 5px;
width: 258px;
height: 69px;
font-family: Arial;
font-weight: normal;
font-size: 12px;
}
.wrapper-rank .main-rank {
background: none repeat scroll 0 0 #BB2610;
clear: both;
overflow: hidden;
padding: 15px;
text-align: center;
width: 228px;
height: 39px;
}
.wrapper-rank .main-rank h4 {
color: white;
float: left;
font-size: 58px;
font-weight: normal;
margin: 0;
padding: 0;
text-align: center;
}
.wrapper-rank .main-rank span {
color: #FFFF00;
float: left;
font-size: 20px;
margin: 15px 0 0 5px;
text-align: left;
}
.wrapper-rank .main-rank span em {
color: #222222;
display: block;
font-style: normal;
font-size: 16px;
}

After the codeeval.php file is created, create this table in the Text widget:

<table>
   <tr style="vertical-align:middle;text-align:center;">
      <td id="codeeval" style="width:100%;vertical-align:top;text-align:center;">
      </td>
   </tr>
</table>

Lastly, take the unique ID from the URL of your CodeEval public profile and substitute it for the placeholder below. This jQuery statement will populate the table above with the scraped div.

jQuery(function($) {
  // Runs once the DOM is ready, so the #codeeval cell exists before it is filled.
  $("#codeeval").load("/codeeval/codeeval.php?codeeval=<<your CodeEval ID>>");
});

For further reading about CodeEval and similar sites, read Thoughts on Professional Learning – Inspired by CodeEval & HackerRank.

Getting the Number of Twitter Followers Without Using the Twitter API


Since Twitter upgraded its REST API to version 1.1 earlier this year, a simple unauthenticated HTTP query against the API no longer works. You now have to include authentication information, which would require reworking many programs that perform this previously simple action.

I wanted to include the number of Twitter followers on this blog using the Social Impact Widget, which was made for version 1 of the Twitter API.

Using Yahoo’s YQL (Yahoo Query Language) API, this information can still be retrieved by web scraping, without resorting to Twitter’s API. Aakash Chakravarthy’s article explains this method well.

By using this method, the PHP code in the Social Impact Widget can be modified to pull the number of Twitter followers via Web scraping.
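
The YQL request is an ordinary URL with the statement passed in the q parameter. As a rough sketch of how that URL can be assembled without hand-encoding every space and quote, the hypothetical helper below (buildFollowerQueryUrl is my own name, not part of the widget) writes the statement in plain text and lets rawurlencode() do the escaping. It assumes the screen name contains no double quotes.

```php
<?php
// Hypothetical helper (not part of the Social Impact Widget): builds the YQL
// request URL for a given Twitter screen name.
function buildFollowerQueryUrl($screenName)
{
    // The YQL statement: fetch the profile page and keep the nodes the XPath selects.
    $yql = 'SELECT * from html where url="http://twitter.com/' . $screenName . '"'
         . ' AND xpath="//a[@class=\'js-nav\']/strong"';

    // rawurlencode() escapes spaces, quotes, and slashes in one pass.
    return 'http://query.yahooapis.com/v1/public/yql?q=' . rawurlencode($yql) . '&format=json';
}

// 'example_user' is a placeholder screen name.
echo buildFollowerQueryUrl('example_user'), "\n";
```

The result is encoded more aggressively than a hand-written URL (for example, slashes become %2F), but it decodes to the same statement.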

The original PHP code for the widget that retrieves the Twitter data:

$return_Twitter = $this->_helper_curl(sprintf('https://api.twitter.com/1/users/show.json?screen_name=%1$s',
		$var_sTwitterId
	), $this->var_sUserAgent);

Remove this code and add the code below, which also changes the try block that immediately follows it so that it reads the follower count from the scraped data:

$return_Twitter = $this->_helper_curl('http://query.yahooapis.com/v1/public/yql?q=SELECT%20*%20from%20html%20where%20url=%22http://twitter.com/' . $var_sTwitterId . '%22%20AND%20xpath=%22//a[@class=%27js-nav%27]/strong%22&format=json'
                    , $this->var_sUserAgent); // Opening the Query URL
	try {
		$obj_TwitterData = json_decode($return_Twitter);
		if($obj_TwitterData) {
			if(!empty($obj_TwitterData->query->results->strong[2])) {
				$this->array_Options[$this->var_sArrayOptionsKey]['twitter-count'] = intval(str_replace(',', '', $obj_TwitterData->query->results->strong[2]));
			} // END if(!empty($obj_TwitterData->query->results->strong[2]))
		} // END if($obj_TwitterData)
	} catch(Exception $e) {
		$this->array_Options[$this->var_sArrayOptionsKey]['twitter-count'] = (int) $var_iTwitterFollowerCount;
	}

And now your widget will appear correctly! The caveat is that if Twitter ever changes the layout of its site, then the XPath expression in the query and the index in the line

$obj_TwitterData->query->results->strong[2]


will have to be modified accordingly.
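
Because the index is the fragile part, it can help to keep the lookup in one place and make it fail quietly. The following is only a sketch under assumptions: extractFollowerCount is a hypothetical helper, not part of the widget, and the sample response is invented to mirror the query->results->strong structure used above.

```php
<?php
// Hypothetical helper (not part of the widget): pulls the follower count out
// of the JSON text, or returns null when the expected structure is missing.
function extractFollowerCount($json, $index = 2)
{
    $data = json_decode($json);
    if (!is_object($data) || !isset($data->query->results->strong)) {
        return null; // bad JSON, or the XPath matched nothing
    }
    $strong = $data->query->results->strong;
    if (!is_array($strong) || !isset($strong[$index]) || !is_string($strong[$index])) {
        return null; // fewer matches than expected, so the layout probably changed
    }
    // Strip thousands separators before converting, as the widget code does.
    return intval(str_replace(',', '', $strong[$index]));
}

// Response body invented for this example; a real one carries more fields.
$sample = '{"query":{"results":{"strong":["1,204","310","2,345"]}}}';
var_dump(extractFollowerCount($sample));        // int(2345)
var_dump(extractFollowerCount('{"query":{}}')); // NULL
```

If the layout changes, only the $index default (and the XPath in the query) needs adjusting, and a null return can be used to keep the previously stored count.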

Making the SyntaxHighlighter Evolved Plugin Work with Infinite Scroll on WordPress.org Sites


SyntaxHighlighter Evolved is a great plugin for formatting source code on a self-hosted WordPress.org blog. Such blogs can now also use Infinite Scroll, a Jetpack module that uses AJAX to load more posts as you scroll down the blog.

The problem is that once you scroll past the first block of posts that were loaded with the page, the code in posts loaded by Infinite Scroll is no longer highlighted.

The fix is to call the “SyntaxHighlighter.highlight()” method each time a block of posts is loaded.

I’m sure there are many ways this could be done. The way I chose was to take this block of code from the file “jetpack/modules/infinite-scroll/infinity.php”:

// If primary and fallback rendering methods fail, prevent further IS rendering attempts. Otherwise, wrap the output if requested.
if ( empty( $results['html'] ) ) {
    unset( $results['html'] );
    do_action( 'infinite_scroll_empty' );
    $results['type'] = 'empty';
} elseif ( $this->has_wrapper() ) {
    $wrapper_classes = is_string( self::get_settings()->wrapper ) ? self::get_settings()->wrapper : 'infinite-wrap';
    $wrapper_classes .= ' infinite-view-' . $page;
    $wrapper_classes = trim( $wrapper_classes );

    $results['html'] = '<div class="' . esc_attr( $wrapper_classes ) . '" id="infinite-view-' . $page . '" data-page-num="' . $page . '">' . $results['html'] . '</div>';
}

and add this one line at the end of the elseif branch:

$results['html'] .= '<script type="text/javascript">SyntaxHighlighter.highlight();</script>';

resulting in this block:

// If primary and fallback rendering methods fail, prevent further IS rendering attempts. Otherwise, wrap the output if requested.
if ( empty( $results['html'] ) ) {
    unset( $results['html'] );
    do_action( 'infinite_scroll_empty' );
    $results['type'] = 'empty';
} elseif ( $this->has_wrapper() ) {
    $wrapper_classes = is_string( self::get_settings()->wrapper ) ? self::get_settings()->wrapper : 'infinite-wrap';
    $wrapper_classes .= ' infinite-view-' . $page;
    $wrapper_classes = trim( $wrapper_classes );

    $results['html'] = '<div class="' . esc_attr( $wrapper_classes ) . '" id="infinite-view-' . $page . '" data-page-num="' . $page . '">' . $results['html'] . '</div>';
    $results['html'] .= '<script type="text/javascript">SyntaxHighlighter.highlight();</script>';
}

Save the file and reload your blog. All code blocks should now be highlighted as you scroll!