As part of the recently conducted BackdoorCTF ’15, we developed phoenix, a redis-based queue taskrunner for phantomjs tasks, which we are open-sourcing today. This post covers the problem we faced, the solution we built, and how it can help you.
As part of two challenges in the CTF (MEDUSA, JSHUNT), we wanted a browser solution that could automatically open each challenge's webpage in a safe environment after every submission. Both are web challenges built around techniques such as XSS, which means they can only be exploited in a real browser.
Unfortunately, running a full browser such as Chrome or Firefox is not feasible in such situations. Instead, we settled on phantomjs, the most popular headless browser. Using phantom is hard, though: it has its own JS API, but one that behaves quite differently from browser JavaScript (too many synchronous methods, for one).
As part of the CTF, we had to build a queue system as well, which would do the following:
- Set up a new webpage
- Start a new phantomjs instance that visits that particular page
- Store the result of the run (log id) somewhere and report it back
Since a lot of this is common to both the tasks, we decided to create a small tool that helps us run these jobs.
phoenix handles the following for you:
- set up a common configuration for a task
- start multiple jobs on that task, each with a slight variation
- jobs are run on phantomjs
- store logs of each run
- report back log id to the task queue
This is all done via a mix of redis, phantomjs and nodejs.
phoenix is configured via a `config.yml` file, which is expected to be present wherever phoenix is run. See the sample config file for the full list of configuration options phoenix supports; these include a custom user agent, custom headers, a request body (in JSON or POST form), and basic authentication.
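As a rough illustration, a `config.yml` might look like the sketch below. The key names here are assumptions for the sake of example; consult the sample config in the repository for the real option names.

```yaml
# Illustrative config.yml sketch — key names are assumptions,
# not phoenix's actual schema.
url: http://challenge.example.com/page
useragent: "Mozilla/5.0 (phoenix)"
headers:
  X-CTF: medusa
body:
  format: json          # or post
  data:
    user: admin
auth:
  username: alice
  password: secret
```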
phantom is started with a number of sensible defaults, including a maximum timeout of 10 seconds that is extended whenever the page makes a web request (an image, script or ajax call).
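The extend-on-activity timeout idea can be sketched as a small deadline helper. This is an illustration of the concept, not phoenix's actual implementation; the function and parameter names are made up.

```javascript
// Sketch of an extend-on-activity deadline: a base timeout that gets
// pushed back whenever the page shows signs of life (e.g. a web request).
// `now` is injectable so the logic can be tested with a fake clock.
function makeDeadline(baseMs, extendMs, now = Date.now) {
  let deadline = now() + baseMs;
  return {
    // Called on browser activity: never shortens the deadline,
    // only pushes it out to at least `extendMs` from now.
    extend() { deadline = Math.max(deadline, now() + extendMs); },
    expired() { return now() > deadline; },
  };
}
```

A watchdog loop would call `extend()` from the page's resource-request callback and kill the phantom instance once `expired()` returns true.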
All jobs that start get their own id and create their own directory with three files. The first holds the complete configuration object that is passed to phantom. The second is the console log of the web page (i.e. any console.log statements made in the page context). The third, `browser.log`, is a higher-level log of browser events such as redirects, web requests and timeout extensions.
We plan to make the logging format configurable, but for now it is file system based.
It is assumed that each job will (generally) have its own url, slightly different from the task's standard url. This is handled in two ways:
- You can pass a query parameter, which is sent along with the url specified in the config
- You can pass a valid http/s url that will be visited as it is
To push a job to a queue, you do the following:
- Generate a queue id by RPUSHing the job to the `channel:queue` list in redis
- PUBLISH the queue id on the channel in redis
- Poll for the result log in redis
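The push steps can be sketched like this. A synchronous client interface is assumed for clarity (the real `redis` npm client is callback- or promise-based), and the payload shape is an assumption:

```javascript
// Push a job onto a phoenix-style queue. `client` is any object exposing
// rpush/publish (a stub here; a real redis client in practice).
function enqueueJob(client, channel, job) {
  // RPUSH returns the new list length, which doubles as the queue id
  const queueId = client.rpush(channel + ':queue', JSON.stringify(job));
  // Announce the new id so listening phoenix workers pick the job up
  client.publish(channel, String(queueId));
  return queueId;
}
```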
For each job, the job id (generated randomly) is stored in the `channel:log:id` key in redis. This key can be polled to check whether the job has finished.
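Polling that key might look like the sketch below. Again the synchronous client interface is an assumption, and a real poller would sleep between attempts rather than spin:

```javascript
// Poll the per-queue-id log key until the job finishes. Returns the
// stored job/log id, or null if the job is still running after all
// attempts. Key layout follows the channel:log:id scheme described above.
function waitForJob(client, channel, queueId, attempts = 10) {
  for (let i = 0; i < attempts; i++) {
    const logId = client.get(channel + ':log:' + queueId);
    if (logId !== null) return logId;  // job finished, log id available
  }
  return null;                         // still pending
}
```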
As you can see, phoenix is a flexible mechanism for handling a variety of job queues. It is available on npm today as the `phantom-phoenix` package, which sets up a binary called `phoenix`. Further instructions on usage can be found in the repository.
We hope it will be useful to anyone looking to set up a queue system built around phantomjs instances. And since it uses redis as the queue mechanism, it scales easily across multiple machines as well.