Sentiment Analysis on Twitter with HP Vertica and IdolOnDemand

Sentiment Analysis allows us to determine the «attitude» of a speaker. For instance, we might read tweets about an event of interest to identify which express positive opinions, which express negative ones and, moreover, what specific sentiment they convey. Is it love, hate, gratefulness?

In this tutorial, we are going to build a Java desktop system with the following functionalities:

  1. Extract tweets from a specific search using the Twitter Search API.
  2. Analyze them using a Sentiment Analysis API.
  3. Store them in a database specialized in big data and analytics.
  4. Provide a GUI application to visualize the analyzed data.

The source code for this project is available as open source. You can download it from its Bitbucket repository.

The Sentiment Analysis API is developed by HP IdolOnDemand, a platform that offers developers different tools to extract meaning from unstructured data like tweets, videos and pictures.

We use HP Vertica Analytics Platform as our database. This platform is specially designed to store more data and run queries faster than traditional solutions.

Setting up our project

To keep our system organized, we are going to work with a project composed of four sub-projects, one for each of our functionalities.

We use Gradle as our build automation system. It lets us work with subprojects and manages the build process, handling dependencies, compilation, testing and executable packaging for us. You can install it following their instructions or, if you have a Mac, just run brew install gradle.

The following sections assume that the Gradle multi-project is already set up and that you are in the root directory. We explain this in detail at the end of the tutorial but right now we want to focus on our system functionalities.

Twitter Search

We are going to implement a Singleton service TwitterClient that receives a query string and a Consumer, which the caller can supply as a lambda expression, to process our tweets.

If you downloaded the source code, go to the TwitterCollect project before continuing.

Defining our interface

Following TDD principles, let’s build a JUnit test for this scenario. In our test, the lambda expression will add our tweets to a list.

    // imports are removed for brevity
    public class TwitterClientTest {
        @Test
        public void search() {
            TwitterClient client = TwitterClient.getInstance();
            String sampleQuery = "lagunex -filter:retweets";
            ArrayList<Tweet> result = new ArrayList<>();
  , tweet -> result.add(tweet));
            assertTrue(result.size() > 0);
        }
    }

Implementing the search method

Before we connect with Twitter, we need to create a new app in their Application Manager. This will give us a «Consumer Key (API Key)» and a «Consumer Secret (API Secret)», which are the credentials we will use to authenticate ourselves.

Using twitter4j to interact with Twitter

If we wanted to implement the connection from scratch, we would have to handle OAuth authentication, HTTP connections, JSON parsing and so on. Luckily, there is a Java library that handles all these low-level operations for us, twitter4j, so we add it as a dependency in our build.gradle script and we are ready to go:

    dependencies {
        compile group: 'org.twitter4j', name: 'twitter4j-core', version: '4.0.2'
    }

The next step is to tell twitter4j our app credentials so it can connect with Twitter. The library offers different ways to configure this. For this tutorial, we are going to create a file named in src/test/resources.
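Since we authenticate with application-only auth (see the getOAuth2Token() call later on), the file needs the credentials plus the application-only flag. A sketch with placeholder values, using twitter4j's documented property names:

    oauth.consumerKey=YOUR_CONSUMER_KEY
    oauth.consumerSecret=YOUR_CONSUMER_SECRET
    enableApplicationOnlyAuth=true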


That’s it! twitter4j is ready to work. Let’s write some code.

Writing our Client

twitter4j offers a Singleton to interact with the REST API. We create the class TwitterClient to encapsulate this Singleton and expose only the functionality we are interested in.

    // imports are removed for brevity
    public class TwitterClient {
        private static TwitterClient instance;
        private Twitter twitter4j;

        public static TwitterClient getInstance() {
            if (instance == null) {
                instance = new TwitterClient();
            }
            return instance;
        }

        private TwitterClient() {
            try {
                twitter4j = TwitterFactory.getSingleton(); // singleton configured with
                twitter4j.getOAuth2Token(); // mandatory for twitter4j to connect with Twitter
            } catch (TwitterException e) {
                // handle exception
            }
        }

        public void search(String query, Consumer<Tweet> consumer) {
            Query q = new Query(query);
            try {
                QueryResult qr =;
                for (Status s : qr.getTweets()) {
                    consumer.accept(new Tweet(s));
                }
            } catch (TwitterException e) {
                // handle exception
            }
        }
    }

Our search method takes advantage of lambda expressions from Java 8: the consumer parameter, which the caller can define as a lambda, handles each tweet.

For this application, we only use four attributes from a tweet: id, message, language and createdAt. We define a class Tweet that extracts these attributes from a twitter4j.Status object.

    // imports are removed for brevity
    public class Tweet {
        private final long id;
        private final String message;
        private final LocalDateTime createdAt;
        private final String language;

        public Tweet(Status s) {
            id = s.getId();
            message = s.getText();
            language = s.getLang();
            createdAt = LocalDateTime.ofInstant(s.getCreatedAt().toInstant(), ZoneOffset.UTC);
        }
        // getters and setters are not shown for brevity
    }

Now we run our test to see the result:

    $ gradle test
    Total time: 6.107 secs

To finish this part, let’s write a small main application to be able to search for tweets from the command line:

    public class Main {
        public static void main(String[] args) {
            Consumer<Tweet> printTweetToOutput = tweet -> {
                String singleLineMessage = StringUtils.collapseLines(tweet.getMessage());
                System.out.printf("%d|%s|%s|%s%n",
                        tweet.getId(), singleLineMessage, tweet.getLanguage(), tweet.getCreatedAt());
            };
            TwitterClient.getInstance().search(args[0], printTweetToOutput);
        }
    }

main calls TwitterClient with a Consumer defined as a lambda expression that prints each tweet on a single line, escaping the | character in the tweet’s message to avoid conflicts with the | used as field separator. This «one line per tweet» representation will make it easier to insert the data into Vertica later. The StringUtils class used to remove the line breaks and escape the message is provided in the repository in a sub-project named common. We will not get into the details of its implementation because it is out of the scope of this tutorial.
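Although the repository’s StringUtils is out of scope, a minimal sketch may help illustrate what collapsing and escaping involve. The method names collapseLines and uncollapseLines come from the code in this tutorial; the exact escaping scheme below is our assumption, not necessarily the repository’s:

```java
// Hypothetical sketch of the StringUtils helpers; the repository's real
// implementation may differ in its escaping scheme.
public class StringUtilsSketch {
    // Escape '|' and replace line breaks with a literal "\n" token,
    // so each tweet fits on one pipe-separated line.
    public static String collapseLines(String message) {
        return message.replace("|", "\\|").replace("\r\n", "\\n").replace("\n", "\\n");
    }

    // Inverse operation: restore line breaks, then unescape '|'.
    public static String uncollapseLines(String collapsed) {
        return collapsed.replace("\\n", "\n").replace("\\|", "|");
    }

    public static void main(String[] args) {
        String original = "I love pizza\nand | ice cream";
        // The collapsed form fits on one line and keeps '|' unambiguous.
        System.out.println(collapseLines(original));
    }
}
```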

We need to copy src/test/resources/ into the src/main/resources directory to configure twitter4j at runtime. Finally, run gradle installApp to build the executable. It will be saved in build/install/TwitterCollect with all its dependencies.

    $ cd build/install/TwitterCollect
    $ ./bin/TwitterCollect "lagunex -filter:retweets"
    563023115459260416|@sirlordt @ninfarave  ¡Oh! He ganado en #LagunexDomino para Android 103-0|es|2015-02-04T17:15:45
    560501402016186368|Yes! I've just won in #LagunexDomino for Android|en|2015-01-28T18:15:22

The first functionality of our system is complete. Let’s perform Sentiment Analysis on those tweets.

IdolOnDemand Sentiment Analysis

IdolOnDemand offers the API analyzesentiment to perform Sentiment Analysis over text. We need to implement a library that encapsulates this service and exposes a Java interface to perform the analysis.

As we can see in its documentation, the service receives the text to analyze and an optional parameter to specify the text’s language, in case it is not English. Its JSON response includes the results of the analysis. You can try the API to get familiar with its behavior.
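Trimmed to the fields we will actually parse in this tutorial, a response from analyzesentiment has roughly this shape (the values below are illustrative, not real API output):

    {
      "positive": [
        { "sentiment": "good", "topic": "day", "score": 0.65 }
      ],
      "negative": [],
      "aggregate": { "sentiment": "positive", "score": 0.65 }
    }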

If you downloaded the source code, go to the IdolSentimentAnalysis project before continuing.

Defining our interface

We want a Singleton with a method analyse that receives the text and language and returns the result in a POJO that matches the interesting attributes of the JSON response. Our test for this interface looks like this:

    // imports are removed for brevity
    public class SentimentAnalysisTest {
        @Test
        public void analyse() {
            SentimentAnalysis engine = SentimentAnalysis.getInstance();
            SentimentResult t = engine.analyse("This is a good day", "eng");
            assertNotNull(t.getAggregate());
        }
    }

Implementing the analyse method

To connect with the IdolOnDemand platform, we need an API key. Create an IdolOnDemand Account (it’s free), go to Manage your API Keys and generate a new one.

Consuming a REST service

Following the same strategy we used for TwitterClient, we are going to rely on third-party libraries to establish the HTTP connection with the IdolOnDemand platform and parse its JSON into a POJO. In this case we will use spring-web and its RestTemplate to consume the REST service and jackson-databind (used internally by RestTemplate) to parse the response. Therefore we need to add these dependencies to our build.gradle script:

    dependencies {
        compile 'org.springframework:spring-web:4.1.4.RELEASE'
        compile 'com.fasterxml.jackson.core:jackson-databind:2.5.0'
    }

Writing our SentimentAnalysis service

Our service needs to access the following information: the API endpoint, the API key, the text to analyse and its language. The API endpoint is constant, so we can define it directly in the class. The API key will be passed at runtime as a System Property, and the text and language will be our method parameters. These four requirements appear in the code below. Because IdolOnDemand receives its parameters as GET parameters, we need to encode the text we want to analyse.

    // imports are removed for brevity
    public class SentimentAnalysis {
        private static final String URL = "";
        private static SentimentAnalysis instance;
        private final String API_KEY;

        public static SentimentAnalysis getInstance() {
            if (instance == null) {
                instance = new SentimentAnalysis();
            }
            return instance;
        }

        private SentimentAnalysis() {
            API_KEY = System.getProperty("idolOnDemand.apiKey");
        }

        public SentimentResult analyse(String opinion, String language) {
            opinion = encode(opinion);
            RestTemplate rest = new RestTemplate();
            // calls the API and parses the JSON response into a Java object
            return rest.getForObject(
                    String.format("%s?apikey=%s&text=%s&language=%s", URL, API_KEY, opinion, language),
                    SentimentResult.class);
        }

        private String encode(String opinion) {
            String encoded = null;
            try {
                encoded = URLEncoder.encode(opinion, "UTF-8");
            } catch (UnsupportedEncodingException ex) {
                // handle exception
            }
            return encoded;
        }
    }

The interesting part of this code is that we do not have to know anything about HTTP connections, status codes or JSON parsing. It is all handled automatically by rest.getForObject.
This method receives the URL we want to call and a Java class specifying how we want to parse the result. This class must match the structure of the REST API response to work properly. Therefore, we define the classes SentimentResult, Sentiment and Aggregate to match the JSON schema:

    // imports are removed for brevity
    public class SentimentResult {
        private List<Sentiment> positive;
        private List<Sentiment> negative;
        private Aggregate aggregate;
        // getters and setters omitted for brevity
    }

    // imports are removed for brevity
    @JsonIgnoreProperties(ignoreUnknown = true)
    public class Sentiment {
        private String sentiment;
        private String topic;
        private double score;

        @Override
        public String toString() {
            return String.format("%s|%s|%s", sentiment, topic, score);
        }
        // getters and setters omitted for brevity
    }

The @JsonIgnoreProperties annotation allows us to skip the definition of the attributes we are not interested in: in this case, original_text, original_length, normalized_text and normalized_length. jackson-databind will ignore these JSON fields during parsing.

    public class Aggregate {
        private String sentiment;
        private double score;

        @Override
        public String toString() {
            return String.format("%s|%s", sentiment, score);
        }
        // getters and setters omitted for brevity
    }

Our service is ready to use. Now we run our tests:

    $ gradle test
    Total time: 17.232 secs

To conclude this part, let’s write a small main application, just like we did with our TwitterClient:

    public class Main {
        public static void main(String[] args) {
            SentimentResult result = SentimentAnalysis.getInstance().analyse(args[0], args[1]);
            System.out.println(result.getAggregate());
        }
    }

Finally, we install and run the executable, passing our idolOnDemand.apiKey at runtime using the JAVA_OPTS system variable.

    $ gradle installApp
    $ cd build/install/IdolSentimentAnalysis
    $ JAVA_OPTS="-DidolOnDemand.apiKey=YOUR_KEY" ./bin/IdolSentimentAnalysis "I like cats" "eng"

Create a Vertica database

We are halfway through with our system. It can search for tweets and analyse them to extract their attitude. Now we need a database to store them and a library to connect with the database and query its data.

HP Vertica Analytics Platform offers a database system adapted to the requirements of Big Data: more storage and faster queries. You can try HP Vertica Community Edition for free. Look at its documentation center, Getting Started Guide and Reference Manual for further information.

One of the advantages of Vertica is that we interact with it using SQL, just as with a regular database, which makes it very easy to create our tables and run our queries.

If you downloaded the source code, go to the VerticaConnection project before continuing.

What data do we need to store?

From our Twitter search we retrieved id, message, language and createdAt, and for each tweet the Sentiment Analysis tool returned an aggregate sentiment and score and a list of positive and negative sentiments found in each tweet.

The relation between a tweet and its aggregate sentiment information (sentiment and score) is 1:1, so a single table can store both. We need an additional table for the list of sentiments, which relates 1:n with a tweet (a single message can hold more than one sentiment, e.g. «I love pizza and ice cream but I don’t like chocolate»).

The following SQL script will create those tables for us:

    -- In Vertica, varchar size is given as byte length, not character length.
    -- We consider 4 bytes per character (worst case in UTF-8).
    -- Therefore 140*4 = 560
    create table tweet (
        tweetid             integer         not null primary key,
        message             varchar(560)    not null,
        lang                char(2)         not null,
        created_at          timestamp       not null,
        aggregate_sentiment varchar(10),
        aggregate_score     float
    );

    create table sentiment (
        sentimentid auto_increment primary key,
        tweet_id    integer not null,
        sentiment   varchar(560),
        topic       varchar(560),
        score       float
    );

    alter table sentiment
        add constraint fk_sentiment_tweet foreign key (tweet_id)
            references tweet (tweetid);

To populate our tables, we can use plain-text tbl files. These files contain one entry per row, with attributes separated by pipes «|», similar to «csv» files. The toString() methods we have defined so far and the way we output tweets in TwitterCollect help us with this.

However, a Tweet object does not include aggregate information; that is gathered only after the Sentiment Analysis returns and is stored in Aggregate objects, so we need a way to combine the two. Our final tbl files, one for each table, should look like this:

    562113053282426880|When you have #MarshawnLynch...why the hell wouldn't you just run it in? #SB49 #SuperBowl|en|2015-02-02T04:59:30|neutral|0.0
    562112884223000576|Best moment of #sb15 #sb49! @waegn @deremann @RSherman_25 via @9GAG|en|2015-02-02T04:58:49|positive|0.5787074952096031
    562112806552485888|Congrats to the Pats, @LG_Blount and @PatrickChung23 !!!!! #producks #SB49|en|2015-02-02T04:58:31|positive|0.8294462782412093

Notice that the first four fields correspond to a Tweet while the last two correspond to its Aggregate.
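Combining the two is just a matter of joining the tweet’s one-line form with the Aggregate’s «sentiment|score» form. A minimal sketch (the class and method names here are ours, not from the repository):

```java
// Hypothetical helper that joins a tweet's one-line form with its
// Aggregate's "sentiment|score" form to produce one tweet.tbl row.
public class TblRecordSketch {
    public static String tweetRecord(String tweetLine, String aggregateLine) {
        return tweetLine + "|" + aggregateLine;
    }

    public static void main(String[] args) {
        String tweetLine = "562113053282426880|Sample tweet|en|2015-02-02T04:59:30";
        String aggregateLine = "neutral|0.0";
        // Produces one six-field row ready for tweet.tbl
        System.out.println(tweetRecord(tweetLine, aggregateLine));
    }
}
```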


With these two files created, we are ready to insert data into Vertica.

Connecting Java with Vertica

Our database is created. The next step is to write a Java library that communicates with it to insert values and to query it. For that, we need to create a Vertica account and download the Vertica JDBC driver. Once downloaded, install it in your local Maven repository to facilitate its use.

    mvn install:install-file \
            -Dfile=vertica-jdbc-7.1.1-0.jar \
            -DgroupId=com.vertica \
            -DartifactId=vertica-jdbc \
            -Dversion=7.1.1-0 \
            -Dpackaging=jar

To establish the connection and perform the updates and queries, we are going to use spring-jdbc. Let’s add its dependency and the JDBC driver dependency to build.gradle.

    dependencies {
        compile group: 'com.vertica', name: 'vertica-jdbc', version: '7.1.1-0'
        compile 'org.springframework:spring-jdbc:4.1.4.RELEASE'
    }

Now we can define a Singleton class to update and query the database:

    // imports are not shown for brevity
    public class Vertica {
        private static final String HOSTNAME = "vertica.hostname";
        private static final String DATABASE = "vertica.database";
        private static final String USERNAME = "vertica.username";
        private static final String PASSWORD = "vertica.password";
        private static Vertica instance;

        public static Vertica getInstance() {
            if (instance == null) {
                instance = new Vertica();
            }
            return instance;
        }

        private final JdbcTemplate jdbcTemplate;

        private Vertica() {
            DataSource dataSource = createDataSource();
            jdbcTemplate = new JdbcTemplate(dataSource);
        }

        private DataSource createDataSource() {
            SimpleDriverDataSource dataSource = new SimpleDriverDataSource();
            dataSource.setDriver(new com.vertica.jdbc.Driver());
            dataSource.setUrl(String.format("jdbc:vertica://%s:5433/%s",
                    System.getProperty(HOSTNAME), System.getProperty(DATABASE)));
            dataSource.setUsername(System.getProperty(USERNAME));
            dataSource.setPassword(System.getProperty(PASSWORD));
            return dataSource;
        }

        public int insertTweetRecord(String tblRecord) {
            String query = "insert into tweet values (?,?,?,?,?,?)";

            // we should split by "|" only when it is not escaped
            Object[] args = tblRecord.split("(?<!\\\\)\\|");

            // restore the original message with unescaped characters and line breaks
            String unescapedMessage = StringUtils.unescape(args[1].toString());
            args[1] = StringUtils.uncollapseLines(unescapedMessage);
            return jdbcTemplate.update(query, args);
        }

        public List<Map<String,Object>> getAggregateTotal(LocalDateTime start, LocalDateTime end) {
            StringBuilder query = new StringBuilder(100);
            query.append("select aggregate_sentiment as label, count(*) as total ")
                 .append("from tweet ")
                 .append("where created_at >= ? and created_at < ? ")
                 .append("  and aggregate_sentiment is not null ")
                 .append("group by label ")
                 .append("order by total desc ");
            return jdbcTemplate.queryForList(query.toString(), Timestamp.valueOf(start), Timestamp.valueOf(end));
        }
    }
The connection is established in createDataSource(). We have a sample method to insert a tweet and another to execute a query. Both methods use the same strategy: first build the query, then pass the arguments. The actual insertion and query are executed by the jdbcTemplate object at the end of each method.

Notice that in insertTweetRecord we remove any escaped characters from the tweet’s message and restore its line breaks in order to store the original message in the database.
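The split on unescaped pipes is worth seeing in isolation. This standalone snippet demonstrates the negative-lookbehind regex used by insertTweetRecord:

```java
// Demonstrates splitting on '|' only when it is not preceded by a backslash,
// the same regex insertTweetRecord uses.
public class SplitDemo {
    public static void main(String[] args) {
        String record = "1234|nice test with \\| escaped characters|en|2015-02-02 03:00:00|positive|0.89";
        // "(?<!\\)\|" is a negative lookbehind: match '|' unless '\' precedes it
        String[] fields = record.split("(?<!\\\\)\\|");
        System.out.println(fields.length);   // number of fields in the record
        System.out.println(fields[1]);       // the message keeps its escaped pipe
    }
}
```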

After implementing a method to insert values in our database, we can easily write a main application to execute it:

    // imports are not shown for brevity
    public class Main {
        public static void main(String[] args) {
            Vertica vertica = Vertica.getInstance();
            Console console = System.console();
            String line = console.readLine();
            while (line != null) {
                vertica.insertTweetRecord(line);
                line = console.readLine();
            }
        }
    }

Testing our Vertica service

We just defined a service with two methods: one to insert a new tweet and another that returns the totals per aggregate sentiment (neutral, positive and negative). Now we write a test class to verify them. Vertica reads the connection parameters hostname, database, username and password from System Properties. For the tests, we store this information in a properties file; the loadSystemProperties() method reads that file and saves its entries as System Properties.


    // imports are removed for brevity
    public class VerticaTest {
        Vertica vertica;

        @Before
        public void setUp() throws Exception {
            loadSystemProperties();
            vertica = Vertica.getInstance();
        }

        private void loadSystemProperties() throws Exception {
            Properties system = System.getProperties();
            InputStream is = VerticaTest.class.getResourceAsStream("/");
            system.load(is);
        }

        @Test
        public void getAggregateTotal() {
            LocalDateTime begin = LocalDateTime.of(2015, Month.FEBRUARY, 2, 1, 0);
            LocalDateTime end = LocalDateTime.of(2015, Month.FEBRUARY, 2, 8, 0);
            List<Map<String, Object>> result = vertica.getAggregateTotal(begin, end);
            assertEquals(3, result.size());
            List<String> labels = Arrays.asList(new String[]{"negative", "neutral", "positive"});
   -> assertTrue(labels.contains(row.get("label"))));
        }

        @Test
        public void insertTweetRecord() {
            String tweetToInsert = "1234|nice test with \\| escaped characters|en|2015-02-02 03:00:00|positive|0.89";
            int result = vertica.insertTweetRecord(tweetToInsert);
            assertEquals(1, result);
        }
    }

Finally, we can run the test to check our implementation is correct.

    $ gradle test
    Total time: 19.387 secs

Our tests passed. We can now add more tests and public methods to gather different analytics from our tweets and to insert sentiments.

Data Visualization using JavaFX

In the last part of this tutorial, we create a standalone JavaFX application to visualize the data from our database.

If you downloaded the source code, go to the DataVisualization project before continuing.

First of all, we need to add a dependency to our VerticaConnection in build.gradle.

    dependencies {
        compile project(":VerticaConnection")
    }

The GUI application has the following functionality:

  • Set a time range to collect analytics.
  • See the number of tweets grouped by aggregate sentiment (neutral, negative or positive) in a pie chart.
  • See the number of tweets from the top 15 sentiments in a pie chart.
  • See how the average aggregate score evolved during the given time with a line chart.
  • See the evolution of the number of tweets from the top 15 sentiments during the given time with a line chart.
  • See all tweets that correspond to a given sentiment (aggregate or specific) or in a given time window. To do this, locate the data table on the left side of any chart, right click on a time window or sentiment cell and select «view tweets with…» to open a popup window with the related tweets.

To create the charts, we use JavaFX Charts, which are UI components readily available to display the data.

A GUI application usually involves a lot of code to set up the different components and bind them to their corresponding behavior. Giving a detailed explanation of how to create one goes beyond the scope of this tutorial. Therefore, we will only describe the components used and point to additional resources that can help you with the details.

  • The application shows the data using a TableView.
  • The data seen in the table is also displayed as charts using Pie Charts and Line Charts, each representing a different analytic from the data.
  • The user selects the time range to analyse and the application updates the charts accordingly. Each chart is generated with data coming from a specific method of our Vertica service. These methods are similar to the one we defined before, adapting the SQL query to each chart’s requirements.
  • An additional method that returns the tweet messages is implemented in Vertica and is used in the GUI to display all tweets from a given time window or sentiment. This functionality is implemented as a ContextMenu.
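As an example of adapting the query per chart, the line chart showing how the average aggregate score evolves could be backed by a query like the following sketch. TIME_SLICE is Vertica’s time-bucketing function; the 10-minute bucket width is our assumption, not something fixed by the application:

    -- Average aggregate score per 10-minute bucket (bucket width is an assumption)
    select time_slice(created_at, 10, 'MINUTE') as time_window,
           avg(aggregate_score) as avg_score
    from tweet
    where created_at >= ? and created_at < ?
      and aggregate_sentiment is not null
    group by time_window
    order by time_window;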

The following GIF image illustrates the main functionality of our application.

Data Visualization with JavaFX

Setting up our gradle multi-project

We decided to split each functionality into its own subproject; after all, they are independent and can be decoupled easily. To work with a multi-project build, we create a subdirectory for each subproject and add a settings.gradle file in our root directory.

    include 'TwitterCollect',
            'IdolSentimentAnalysis',
            'VerticaConnection',
            'DataVisualization'

This is the final layout for our system:

    | common.gradle
    | build.gradle
    | settings.gradle
    | TwitterCollect
    | | build.gradle
    | | src
    | IdolSentimentAnalysis
    | | build.gradle
    | | src
    | VerticaConnection
    | | build.gradle
    | | src
    | | db
    | | scripts
    | DataVisualization
    | | build.gradle
    | | src

The file common.gradle includes information that is shared among all sub-projects and it’s referenced in the root build.gradle.

    subprojects {
        apply from: rootProject.file('common.gradle')
    }

    apply plugin: 'java'
    apply plugin: 'application'

    repositories {
        mavenLocal() // needed for vertica jdbc
    }

    dependencies {
        testCompile group: 'junit', name: 'junit', version: '4.10'
    }

The application plugin adds the installApp task, which builds executables for our projects. These are easier to run than invoking java by hand (setting the classpath and system properties ourselves) or using gradle run, which makes it hard to pass system properties or arguments.

Each subproject has its own build.gradle script and a src directory that follows the Maven standard directory layout.

The end

This is a long tutorial, and if you got this far, congratulations. I hope you found it useful. If you have any questions about it, feel free to leave a comment.

Remember you can download the source code to follow this tutorial from its Bitbucket repository. The source code contains full CLI applications for TwitterCollect and IdolSentimentAnalysis.
