User Interest Modeling from Twitter

Most of us are directly or indirectly engaged on social media, either as active participants or as people who simply follow others to know what is happening in the world.

Have you ever wondered where these activities can lead? A lot of data is generated through activity on social media, but what does it represent? What is inside this data? Can we make it useful?

The answer is yes! Social media data tells a lot about an individual: their likes and dislikes, personal insights, interests, skills, and much more.

These insights can help build a more personalized user experience in the form of recommendation systems, targeted ads, employee engagement within an organization, and so on.

Let's start by inferring a person's domain of interest from one social media platform, Twitter. Twitter is an active public microblogging website with 313 million active users and around 1 billion tweets by 2016.

Data extraction

Using the extracted list of accounts in a user's network and the tweets from the timeline of every member of that network, we create a network graph and an activity graph for the user in the Neo4j graph database, which follows the property graph model.

Creating the user's network graph

Generally, we follow people, organizations, and groups on social media that match our interests. That forms our network. Let's say I am an active Twitter user interested in Machine Learning. I like to receive tweets from a person who is a machine learning expert, and that person is likely to be connected to other people and organizations working in the same domain.

Here is what a network looks like:


Fig 1


The network graph created for a particular user in the Neo4j graph database is shown in Figure 2.


Fig 2
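To make the idea of network levels concrete, here is a minimal Python sketch that walks a "who follows whom" mapping outward from one user and records how many hops away each account is. It is an illustration only and does not use Neo4j. The function name `network_depths`, the sample handles, and the two-level cutoff are assumptions made for this example.

```python
from collections import deque

def network_depths(follows, root, max_depth=2):
    """Breadth-first walk over a 'who follows whom' mapping.

    Returns {handle: depth}: depth 0 is the root user, 1 is an account
    the root follows directly, 2 is an account followed by one of those.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        if depths[user] == max_depth:
            continue  # do not expand beyond the requested depth
        for followed in follows.get(user, []):
            if followed not in depths:  # keep the shortest distance only
                depths[followed] = depths[user] + 1
                queue.append(followed)
    return depths

# Hypothetical sample data: handle -> handles that account follows.
follows = {
    "me": ["ml_expert", "cricket_fan"],
    "ml_expert": ["ml_lab", "me"],
    "ml_lab": ["far_away"],
}
depths = network_depths(follows, "me")
print(depths)
```

In a graph database the same information would typically live as user nodes joined by "follows" relationships; the dictionary here only stands in for that structure.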


Creating the user's activity graph

Now it's time to extract data from the activity, i.e. from the tweets posted or received by members of the network created above. Here, we will call this data the entities of the tweets.

Generally, we tweet about the people, organizations, events, and domains we like to talk about.

For example, "MS Dhoni scored 100 runs" is a tweet about MS Dhoni. I tweeted it because I like cricket and like to talk about it.

Another tweet could be "Microsoft launched new application". This tweet is about Microsoft, and I keep myself updated with Microsoft news because I am interested in it.

So, all the entities extracted from tweets say something about our interests. To capture this, we extracted people and organizations from the tweets using Stanford NER and created an activity graph from these entities, along with hashtags, as in Figure 3.

Fig 3


The activity graph created for a particular user in the Neo4j graph database is shown in Figure 4.

Fig 4
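As a rough illustration of what goes into the activity graph, the Python sketch below turns a handful of tweets into counted (user, kind, entity) edges. Hashtags are pulled out with a simple regular expression. The `people` and `organizations` lists are filled in by hand here and only stand in for the output of an NER step such as Stanford NER; the record layout, the function name `activity_edges`, and the sample tweets are assumptions for this example.

```python
import re
from collections import Counter

# Simple hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#\w+")

def activity_edges(tweets):
    """Count (user, kind, entity) edges from a list of tweet records.

    Each record is a dict with 'user', 'text', and optional 'people' and
    'organizations' lists (hand-written stand-ins for NER output).
    """
    edges = Counter()
    for tweet in tweets:
        user = tweet["user"]
        for tag in HASHTAG.findall(tweet["text"]):
            edges[(user, "hashtag", tag.lower())] += 1  # case-insensitive tags
        for person in tweet.get("people", []):
            edges[(user, "person", person)] += 1
        for org in tweet.get("organizations", []):
            edges[(user, "organization", org)] += 1
    return edges

# Hypothetical sample tweets.
tweets = [
    {"user": "me", "text": "MS Dhoni scored 100 runs", "people": ["MS Dhoni"]},
    {"user": "me", "text": "Microsoft launched new application #Docker",
     "organizations": ["Microsoft"]},
    {"user": "ml_expert", "text": "Trying out #docker today"},
]
edges = activity_edges(tweets)
for edge, count in edges.items():
    print(edge, count)
```

Each edge could then be stored as a relationship between a user node and an entity node, with the count kept as a property.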


Weighted inference of niche interest

I have now created both my network graph and my activity graph, using tweets collected from all nodes of the network graph, i.e. my own tweets along with the tweets collected from the extended network I am a part of.

It's time to infer my interests from the network and activity graphs. The inference is a weighted aggregation of the interests common to me and the other users in my network, which makes it stronger.

Here, weight is defined as a property of the relation between each node in my network and its associated niche interests. It decreases as the depth of the network increases.

Based on how closely a node is related to me, let's assign weights to this relation:

  • Weight of interests in direct relation to my activity: 1.0
  • Weight of interests through my 1st level network: 0.5
  • Weight of interests through users of my 2nd level network: 0.3

You may ask how this weighting scheme helps in inferring interests.

The answer: take any entity (a person, organization, or hashtag), say #Docker. If it is posted both by me and by members of my network, it gets a higher weight. For example, if #Docker appears in my own tweets and in those of my 1st and 2nd level networks, its importance as my interest rises to 1.0 + 0.5 + 0.3 = 1.8.
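The weighted aggregation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact implementation behind the results: the names `DEPTH_WEIGHT` and `interest_scores` are made up for the example, each user contributes their level's weight once per entity regardless of how often they mention it, and the sample handles and entities are hypothetical.

```python
from collections import defaultdict

# Weights from the list above: my own activity, 1st level, 2nd level.
DEPTH_WEIGHT = {0: 1.0, 1: 0.5, 2: 0.3}

def interest_scores(depths, mentions):
    """Sum the depth-based weight of every user who mentions an entity.

    depths:   {user: depth in the network, 0 = me}
    mentions: {user: iterable of entities found in that user's tweets}
    Assumption: one contribution per user per entity; users outside the
    weighted levels contribute nothing.
    """
    scores = defaultdict(float)
    for user, entities in mentions.items():
        weight = DEPTH_WEIGHT.get(depths.get(user), 0.0)
        for entity in set(entities):
            scores[entity] += weight
    return dict(scores)

# Hypothetical sample data.
depths = {"me": 0, "ml_expert": 1, "ml_lab": 2}
mentions = {
    "me": ["#docker", "Microsoft"],
    "ml_expert": ["#docker", "#docker", "#kubernetes"],
    "ml_lab": ["#docker"],
}
scores = interest_scores(depths, mentions)
for entity, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{entity}: {score:.1f}")
```

With this sample data, #docker collects a contribution from all three levels, while #kubernetes only collects the 1st level weight, so #docker ranks first.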

Following the above process, the interests of a Twitter user are inferred. They include the frequently mentioned and highly weighted hashtags, people, and organizations in tweets, as shown in the results below.

(Results for the Twitter handle @rahulvit09)


Interests inferred through this process can help build a recommendation system, or any application that personalizes content and therefore requires a user profile. In particular, when there is a cold start, i.e. no prior information is available about a user, interests inferred from social media give a starting point for recommending articles of interest from news sites as well as other information-sharing web portals.
