Downloading tweets using Twitter’s Streaming API for Big Data Analysis


I recently wrote a simple PHP web app for creating Twitter data collection campaigns. It lets you download hundreds of thousands of tweets [tested with up to 1.2 million tweets on an Amazon EC2 instance] based on a specific set of keywords. The data is stored in a MySQL database, from which it can be exported to CSV or any other format of your choice.

The collected data is formatted before it is stored. The following fields are recorded:

Fields per Tweet



Fields per Twitter User


The source code/app can be downloaded from here.

Instructions for using it are detailed below in this post.

Step 1: Create a database
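As a rough sketch of this step from the command line, the snippet below builds a CREATE DATABASE statement that can be piped into the mysql client. The database name `tweet_campaigns` and the utf8mb4 character set are assumptions made for illustration; the post does not specify either, so use whatever matches your setup.

```shell
#!/bin/sh
# Hypothetical database name; the post does not prescribe one.
DB_NAME="${DB_NAME:-tweet_campaigns}"

# Print a CREATE DATABASE statement for the given name.
# utf8mb4 is an assumption, chosen so that emoji in tweets can be stored.
create_db_sql() {
    printf 'CREATE DATABASE IF NOT EXISTS `%s` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n' "$1"
}

# Show the statement that would be executed.
create_db_sql "$DB_NAME"

# To apply it, pipe the statement into the mysql client, for example:
#   create_db_sql "$DB_NAME" | mysql -u root -p
```

Whatever name you pick here is the one to enter in db/db_config.php in Step 3.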


Step 2: Create a new Twitter App [link]. Get your app’s credentials from Twitter and add them to db/140dev_config.php.

Step 3: Edit db/db_config.php to match your own MySQL database server settings



Step 4: Import the database structure [mysql_database_schema.sql], found inside the db folder, into your MySQL database.
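If you prefer the terminal to a GUI tool, the import can be sketched roughly as follows. The function name `import_schema`, the `DB_USER` variable, and the example database name are hypothetical; only the schema file name comes from the project.

```shell
#!/bin/sh
# Import a schema file into an existing MySQL database.
# Arguments: <path-to-schema.sql> <database-name>
# DB_USER is a hypothetical environment variable; it defaults to root.
import_schema() {
    schema_file="$1"
    db_name="$2"

    # Fail early with a clear message if the schema file is missing.
    if [ ! -f "$schema_file" ]; then
        echo "schema file not found: $schema_file" >&2
        return 1
    fi

    # The mysql client prompts for the password because of -p.
    mysql -u "${DB_USER:-root}" -p "$db_name" < "$schema_file"
}

# Example, run from the project root (database name is an assumption):
#   import_schema db/mysql_database_schema.sql tweet_campaigns
```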

Before proceeding, make sure that libssh2 is installed on your server. If it isn’t, install it now: the script requires the ssh2 library to open an SSH session to its host machine and run the data collector script in a new screen session.

Ubuntu: sudo apt-get install libssh2-php
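To confirm that PHP actually picked up the extension, one option is to look for `ssh2` in the module list printed by `php -m`. The helper name `has_ssh2` is made up for this sketch, and note that `php -m` reports the command-line PHP, which may be configured differently from the PHP your web server uses.

```shell
#!/bin/sh
# Read a module list (one name per line) on stdin and succeed
# only if a line is exactly "ssh2" (case-insensitive).
has_ssh2() {
    grep -qix 'ssh2'
}

# Only run the check when the php binary is available.
if command -v php >/dev/null 2>&1; then
    if php -m | has_ssh2; then
        echo "ssh2 extension is loaded"
    else
        echo "ssh2 extension is missing"
    fi
fi
```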

OPTIONAL: If you don’t want the script to SSH into your machine, you’ll have to start each campaign yourself by running the following commands in a terminal on the host machine:

php get_tweets.php <campaign-name> &
php parse_tweets.php <campaign-name> & 

Replace <campaign-name> with the name you chose while creating the campaign. 
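Processes started with a plain `&` may be stopped when you close the terminal. A minimal sketch that keeps both scripts running after logout uses `nohup` and sends their output to log files. The function name `start_campaign` and the log file names are assumptions, and the function is expected to be run from the directory that contains the two PHP scripts.

```shell
#!/bin/sh
# Start both collector scripts for a campaign in the background,
# detached from the terminal so they survive logout.
start_campaign() {
    campaign="$1"

    # Refuse to run without a campaign name.
    if [ -z "$campaign" ]; then
        echo "usage: start_campaign <campaign-name>" >&2
        return 1
    fi

    # Log file names are hypothetical; adjust them as needed.
    nohup php get_tweets.php "$campaign" > "get_tweets_${campaign}.log" 2>&1 &
    nohup php parse_tweets.php "$campaign" > "parse_tweets_${campaign}.log" 2>&1 &
}

# Example:
#   start_campaign my-campaign
```

Running the same two commands inside a screen session, as the web app does over SSH, is an equally reasonable choice.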

Important: Don’t forget to restart your web server once the libssh2 installation is complete.


Update the SSH credentials in NiceSSH.class.php to match those of your server.



Step 5 [the last step!]: Open the TFDC project directory on your web server in a browser and click on “create a new campaign”.



Written by on February 24, 2015
