Techniques to Extract Topical Experts in Twitter: A Survey

  • Kuljeet Kaur, Jasminder Singh


An Online Social Network (OSN) such as Facebook, Twitter, or Google+ links users socially across the world. Through these social media platforms, users generally form a virtual network based on mutual trust without any personal connection. As more and more users join OSNs, topical expert identification has become a real need to ensure the relevance and credibility of the content provided by various users. In this paper, we review the existing techniques for extraction of topical experts in Twitter. We provide an overview of the various features, datasets, and methods used for topical expert detection and extraction.

Keywords: Topical Experts, OSN, Twitter


Various OSNs permit the exchange of real-time information across a wide audience in a fraction of a few seconds. In microblogging sites like Twitter, content may be viewed as a micropost (e.g., a 140-character tweet). Microblogs also help users get their microposts to the audience in microseconds. Like sensors, where real-time data arrives every second, every micropost has a short lifetime owing to the many posts arriving from many locations each second [19]. According to [11], Twitter is the new Facebook in 2014, with 288 million monthly active users [12] and 500 million tweets per day [13].

As the number of users on OSNs grows exponentially, a credible search system is a must for finding relevant users. In other words, how can a user rely on and trust the content he comes across on Twitter? These credible, relevant, reliable authors or experts on a specific topic are termed Topical Experts. Recognizing this, Twitter launched the WTF (Who to Follow) service in 2010 to extract experts related to a topic. However, it has been found that WTF [21] sometimes generates results consisting of users whose Twitter profile information (called the "bio", a 160-character personal description) contains the query terms but who are not actually related to the topic.

The traditional techniques identify topical experts using attributes like tweets, profile bio, lists, number of followers etc. Our work consolidates the existing approaches and highlights future directions. The paper is organized as follows: the next section presents the methodology used to carry out this review. In Section 3, we explain the motivation behind the review. In Section 4, we list the features used by the various methods, followed by Section 5, which presents a comparative analysis of the work done. Section 6 contains the discussion, and in Section 7 we discuss research directions. Finally, Section 8 concludes the paper.


For the topic concerned, the papers included in the review were selected from major databases like IEEE Xplore, ACM Digital Library, Google Scholar etc. The databases returned around 50 papers, out of which papers published after 2009 were shortlisted. The titles and abstracts of the shortlisted papers were then read. Finally, 13 papers were selected whose titles and abstracts were closely related to topical expert extraction in Twitter. Compiling the various approaches in the form of a review is of primary importance to new researchers for extending work in this domain.


As the number of authors on Twitter grows exponentially, identifying influential users is of utmost importance at present. For any individual, the question of whose content to read in order to get up-to-date and reliable information comes down to the identification of topical experts. So, mining such experts on any topic, in order to keep in close touch with them or follow them, is one of the main domains explored by various researchers. This paper lists all the methods, which can help researchers review the work done in this area.


From the prior studies, it can be deduced that different authors have used different combinations of features to find authoritative users on a topic. The following features were used for classifying and extracting topical experts in the preceding studies:
  • Tweet [20]: 140-character message, which can be textual or can contain links to multimedia content
  • Retweet [20]: Forwarded tweets form retweets
  • Mentions [20]: Replies to a message using @username
  • Hashtags [20]: #topic or #keyword shows all tweets related to a topic or keyword
  • #Followers [20]: Number of users who receive your tweets in their timelines
  • #Followings [20]: Also called friends, whose tweets you receive on your timeline
  • Bio [20]: 160-character personal description
  • Lists [20]: 140-character name and an optional description, used for managing followers

5. COMPARATIVE ANALYSIS OF VARIOUS APPROACHES

OSN is the fastest means of disseminating real-time information. Twitter functions both as a microblogging and a social networking site. Following any user on Twitter does not require any access permission from that user, so the group of virtual friends grows to form a social network. Table 1 presents a detailed analysis of the prior studies on topical expert extraction, along with their contributions. As information grows exponentially with the number of users, a credible search system is needed to find relevant users. By relevant users, we mean the experts on a topic as well as the seekers, who help spread the message to an anonymous, larger audience.

Jianshu Weng et al. [1] found influential twitterers on a specific topic. The authors explained that Twitter itself attributes more influence to users with a larger number of followers. However, focusing only on 'following' relationships is not reliable, since the tendency to follow a friend back, whether out of courtesy or common interests (homophily), must be taken into account. Thus, to find influential twitterers, the TwitterRank approach is applied, which takes both link structure and topic sensitivity into account. To investigate the link structure in the collected data, all friends and followers of each user were considered along with their tweets. LDA was used to mine the topics of a user from tweets for topic sensitivity, followed by ranking of users' influence. The results showed that active twitterers are not necessarily influential twitterers. They may simply share some followers or followings with one another. The experiments showed the highest similarity between this algorithm and TSPR [18], owing to topic sensitivity.
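The idea of combining link structure with topic sensitivity can be illustrated with a small random-walk sketch. This is a simplified illustration in the spirit of TwitterRank, not the authors' exact formulation: the function name `topic_rank`, the damping value `gamma`, and the use of a single per-user topic share (which in practice would come from LDA) are all assumptions made here.

```python
def topic_rank(follows, tweet_counts, topic_weight, gamma=0.85, iters=50):
    """Rank users for one topic with a topic-sensitive random walk.

    follows[u]      -- set of users that u follows (hypothetical input format)
    tweet_counts[u] -- number of tweets published by u
    topic_weight[u] -- share of u's tweets on the topic, between 0 and 1
    """
    users = sorted(tweet_counts)
    total = sum(topic_weight[u] for u in users)
    # Teleport vector: jump to users in proportion to their interest in the topic.
    if total > 0:
        teleport = {u: topic_weight[u] / total for u in users}
    else:
        teleport = {u: 1.0 / len(users) for u in users}

    # Transition weights: a follower passes influence to each friend in
    # proportion to the friend's tweet volume and their topical similarity.
    trans = {}
    for u in users:
        friends = follows.get(u, set())
        denom = sum(tweet_counts[f] for f in friends)
        row = {}
        for f in friends:
            sim = 1.0 - abs(topic_weight[u] - topic_weight[f])
            row[f] = (tweet_counts[f] / denom) * sim if denom else 0.0
        trans[u] = row

    rank = dict(teleport)
    for _ in range(iters):
        new = {u: (1.0 - gamma) * teleport[u] for u in users}
        for u in users:
            for f, p in trans[u].items():
                new[f] += gamma * rank[u] * p
        # Rows may sum to less than 1, so renormalise after every step.
        s = sum(new.values())
        rank = {u: v / s for u, v in new.items()}
    return rank


follows = {"a": {"c"}, "b": {"c"}, "c": {"a"}}
tweet_counts = {"a": 10, "b": 10, "c": 10}
topic_weight = {"a": 0.1, "b": 0.2, "c": 0.9}
ranks = topic_rank(follows, tweet_counts, topic_weight)
print(max(ranks, key=ranks.get))
```

In this toy graph all three users are equally active, yet the user who is both followed and strongly on-topic ends up ranked first, which mirrors the observation that activity alone does not imply influence.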

Aditya Pal and Scott Counts [2] identified topical authorities on the basis of tweets, mentions, and graph features using a Gaussian Mixture Model clustering method. The tweets related to oil spill, iPhone, and world cup were mined using simple substring matching. A self-similarity score indicated the level of expertise of a user on a particular topic. The clustering algorithm was applied to 17 features, followed by ranking of authors in the 3 selected categories. The survey conducted showed that users find tweets from the top authors returned by this approach useful and interesting. It also showed that users trust either quality content or renowned authors presented to them. The 2 most important features identified are topical signal (extent of involvement of an author with a topic) and mention impact (@username while replying or referring to other users).
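The substring-based selection of tweets and the notion of topical signal can be sketched as follows. The paper's exact feature definitions are not reproduced here: treating topical signal as the plain fraction of an author's tweets that mention the keyword is an assumption, and the helper names `on_topic` and `topical_signal` are hypothetical.

```python
def on_topic(tweet, keyword):
    """Case-insensitive substring match, used to select tweets on a topic."""
    return keyword.lower() in tweet.lower()


def topical_signal(tweets_by_author, keyword):
    """Fraction of each author's tweets that mention the keyword.

    This single ratio is a simplifying assumption standing in for the
    richer feature described in the surveyed study.
    """
    signal = {}
    for author, tweets in tweets_by_author.items():
        if tweets:
            hits = sum(1 for t in tweets if on_topic(t, keyword))
            signal[author] = hits / len(tweets)
        else:
            signal[author] = 0.0
    return signal


sample = {
    "alice": ["Oil spill update", "lunch", "oil spill cleanup", "OIL SPILL news"],
    "bob": ["new iphone", "hello"],
}
print(topical_signal(sample, "oil spill"))
```

A score close to 1 suggests an author who writes almost exclusively about the topic, while a score near 0 suggests only incidental involvement.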


Jianshu Weng et al. [1]

  • Attributes used: Tweets, following relation
  • Approach: LDA approach, graph based
  • Dataset: Top 1000 Singapore-based twitterers from Twitterholic.com
  • Contribution: Topic-sensitive influential twitterers tracked with improved accuracy

Aditya Pal and Scott Counts [2]

  • Attributes used: Tweets, 17 features used
  • Approach: Clustering based
  • Dataset: 5 days' tweets from firehose dataset
  • Contribution: Tweets collected for the 3 selected categories looked authoritative and informative to the users

Parantapa Bhattacharya et al. [3]

  • Attributes used: Tweets, lists
  • Approach: Semantic approach based on lists, profile, tweets
  • Dataset: 38.4M Twitter user profiles
  • Contribution: Identified topical groups on niche topics, and missing members if any

Hemant Purohit et al. [4]

  • Attributes used: Tweets, profile metadata
  • Approach: 3 methods proposed, modified tf-idf approach
  • Dataset: Twitter profiles, Wikipedia, personal websites, US Labor statistics
  • Contribution: Claimed 92.8% of summaries as informative in the best case and 70% in the worst case

Claudia Wagner et al. [5]

  • Attributes used: Lists, bio, tweets, retweets
  • Approach: LDA approach
  • Dataset: Wefollow directory, Twitter profiles
  • Contribution: Best results with bio and lists

Daniela Pohl et al. [6]

  • Attributes used: Tweets
  • Approach: Modified tf-idf, online clustering algorithm
  • Dataset: 1943 tweets on Hurricane Sandy, 2012
  • Contribution: Online clustering algorithm with fewer clusters revealed all sub-events

Kevin R. Canini et al. [7]

  • Attributes used: Tweets, link structure
  • Approach: LDA approach, tf-idf approach
  • Dataset: Wefollow directory, Twitter profiles
  • Contribution: Content and social status of experts affect trust of followers

Saptarshi Ghosh et al. [8]

  • Attributes used: Lists
  • Approach: Mining lists' metadata, ranking experts
  • Dataset: 54M Twitter profiles
  • Contribution: Better performance than the Twitter WTF service for more than 52% of the queries

Shaomei Wu et al. [9]

  • Attributes used: Lists
  • Approach: Snowball sampling, ranking experts
  • Dataset: Firehose dataset of 42M users
  • Contribution: Elite users are responsible for spreading the content to a larger audience

Naveen Sharma et al. [10]

  • Attributes used: Lists
  • Approach: Mining lists' metadata, ranking experts
  • Dataset: 54M Twitter profiles
  • Contribution: Cloud of attributes describing a user's interests is generated

Table 1. Comparative Analysis of Existing Approaches

Parantapa Bhattacharya et al. [3] used lists to find topical groups (experts + seekers) and analyzed their characteristics. The analysis highlighted many differences between topical and bond-based groups in terms of size, member type, interests etc. It was found that community detection algorithms cannot be applied owing to the weak connectivity between experts and seekers. In the collected data, the experts on a topic were found first, followed by the seekers, and the two were then merged to form a topical group. The approach successfully discovered niche topical groups. It is noteworthy that the number of experts is directly proportional to the number of seekers. The approach resulted in a single connected component covering 90% of the experts, which shows good inter-connectivity between experts.
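The coverage of the largest connected component, the statistic quoted above, can be computed with a breadth-first search. The sketch below is only an illustration: it assumes follow links are treated as undirected edges, and the function name and toy data are hypothetical.

```python
from collections import deque


def largest_component_share(nodes, edges):
    """Share of nodes that fall in the largest connected component.

    Edges are treated as undirected, which is an assumption made here.
    """
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)

    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its members.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            n = queue.popleft()
            size += 1
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        best = max(best, size)
    return best / len(nodes) if nodes else 0.0


experts = list("abcdefghij")
links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("e", "f"), ("f", "g"), ("g", "h"), ("h", "i")]
print(largest_component_share(experts, links))
```

In the toy data nine of the ten experts are chained together and one is isolated, so the largest component covers 90% of the experts.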

Hemant Purohit et al. [4] proposed approaches to generate automatic, interesting summaries of users in a limited number of characters. To generate summaries, 3 approaches were used, namely Occupation Pattern based, Link Triangulation based, and User Classification based. 92.8% of the summaries generated by Link Triangulation were judged informative and useful in an evaluation by users that considered readability, specificity, and interestingness metrics. For users who were less popular and active, meformer data (written by the user himself, self-descriptive) was used to generate the summary. Wikipedia pages were also considered as a source of informer data for the generation of summaries in the Link Triangulation method, which showed the highest favorability.

Claudia Wagner et al. [5] examined which user-related content, out of tweets, retweets, bio, and lists, makes a good topical expertise profile. Two experiments were conducted by choosing experts with known topics of expertise from the Wefollow directory. In the first experiment, expertise judgments were worst when the participants were shown only content (tweets + retweets) and best when contextual information (bio + lists) was shown. The second experiment, carried out to measure the similarity of the topics inferred from the 4 types of user-related data, found that lists performed best, revealing 77.67% of the specific topics of expertise. The similarity between the topics revealed by tweets and retweets is also noteworthy. Another finding is that the bio plays an important role in topic inference.

Daniela Pohl et al. [6] present the implications of using social media data for crisis management. The dynamic selection of features from the incoming data using an online clustering algorithm revealed sub-events (ramifications of events or crises). The terms extracted from the incoming data with the highest frequency were given maximum importance and used for clustering. The evaluation on real data from Hurricane Sandy, 2012, showed that online and offline clustering behave similarly, but quality-wise the online algorithm outperforms the offline clustering algorithm. Another noteworthy observation is the lower number of clusters in the online clustering algorithm, owing to its ignoring early sub-events.
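A single-pass clustering loop gives a feel for how sub-events can emerge from a stream of tweets. This is a minimal sketch under stated assumptions, not the algorithm evaluated in the study: it uses raw term counts instead of the modified tf-idf weighting, and the similarity `threshold` of 0.3 is an arbitrary illustrative value.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(stream, threshold=0.3):
    """Assign each incoming text to the closest cluster, or open a new one."""
    centroids = []  # one accumulated term-count vector per cluster
    labels = []
    for text in stream:
        vec = Counter(text.lower().split())
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            centroids[k].update(vec)  # fold the new text into the centroid
        else:
            centroids.append(vec)
            k = len(centroids) - 1
        labels.append(k)
    return labels


stream = [
    "power outage in queens",
    "queens power outage reported",
    "subway flooding downtown",
    "downtown subway flooding again",
]
print(online_cluster(stream))
```

Each text is processed once, in arrival order, so the clusters adapt as new terms appear in the stream; the first two and the last two messages end up in separate clusters.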

Kevin R. Canini et al. [7] focus on finding which factors users trust most when judging the credibility of authors. The experiment revealed that the more quantitative the content is, the more trust it earns. Thus, content and social structure affect credibility to a great extent. Based on these 2 factors, an algorithm is proposed to find topical experts and rank them automatically. A comparison between the proposed algorithm and existing expert-ranking algorithms shows results in favor of the proposed method.

Since lists rely on crowd wisdom, Saptarshi Ghosh et al. [8] proposed a topical expert search system that uses a single feature, Twitter Lists, for inferring topical experts. The technique involves collecting and mining all public lists of 54 million users who joined Twitter before August 2009. Mining the metadata generated many topics; each user was ranked according to an algorithm [17], and the user was then associated with the topics according to his rank. The membership of a user in many lists created by many users adds certain topics to the category in which the user is an expert. Unlike previous studies in this context, which use either the user's own information (bio + tweets) or the network graph to extract experts on a topic, this study relies only on the wisdom of crowds (Twitter Lists), which makes it unique. The evaluation shows that Cognos [8] gives better results than the official WTF service for more than 52% of the queries. Another noteworthy result is that WTF relies more on organizational accounts, whereas Cognos uses personal accounts to get the information, thereby not relying only on the usual news organizations but giving equal importance to every Twitter user.
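The intuition behind list-based inference can be sketched in a few lines: every list that includes a user contributes the terms of its name and description as votes for that user's topics. The sketch below is an illustration only; it does not reproduce the ranking algorithm [17] used by Cognos, and the stopword set, input format, and function names are assumptions.

```python
from collections import Counter, defaultdict

# Small illustrative stopword set (an assumption, not from the surveyed study).
STOPWORDS = {"the", "a", "of", "and", "my", "on", "for", "list", "in"}


def infer_topics(lists):
    """Collect topic terms per member from list names and descriptions.

    lists -- iterable of (name, description, members) tuples
    """
    topics = defaultdict(Counter)
    for name, description, members in lists:
        words = f"{name} {description}".lower().split()
        terms = {w for w in words if w not in STOPWORDS}
        for member in members:
            # Each list votes at most once per term for each member.
            topics[member].update(terms)
    return topics


def rank_experts(topics, query):
    """Order users by the number of lists that tie them to the query term."""
    q = query.lower()
    scored = [(user, counts[q]) for user, counts in topics.items() if counts[q] > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


lists = [
    ("Physics people", "great physics blogs", ["alice", "bob"]),
    ("science", "physics and chemistry", ["alice"]),
    ("Foodies", "", ["carol"]),
]
print(rank_experts(infer_topics(lists), "physics"))
```

Because the votes come from list creators rather than from the users themselves, a misleading bio has no effect on the score in this scheme.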

Shaomei Wu et al. [9] contributed significantly by classifying users topically into elite and ordinary, by showing that the lifespan of content depends on the type of content, and by showing how information flows indirectly to a larger audience. Lists are used for finding elite users through snowball sampling. The elite users mined for each of the 4 categories are found to be more active. Although elite users constitute only 0.05% of the total population, they attract 50% of the attention on Twitter. It is also found that text has a shorter lifespan than multimedia content. The two-step flow theory shows the forwarding of elite users' content to a wider audience, either via retweets (acknowledged content) or by reintroducing the content (unacknowledged).

The research by Naveen Sharma et al. [10] is related to an earlier study [14] that used a machine learning approach to find the semantic topic of a web page. D. Ramage et al. [16] used LDA to analyze the content of tweets semantically. D. Kim et al. [15] applied the chi-square distribution to tweets to relate them topically. This study built a cloud of attributes by mining the lists' metadata and associating the mined topics with the members of the list. Inferred attributes include information from the bio, perceptions of users, and topics of expertise. To check accuracy, ground truth and human feedback were considered. The evaluation showed 94% of the assessments to be accurate.


The studies discussed above rely largely on self-provided information (bio) and 'following' relationships. The generation of automated summaries of Twitter users from tweets, bio, mentions etc. is quite subjective in nature. Validating the above studies on a wider audience with diverse topics may change the results if a larger sample space is considered. However, conclusions based on analyzing social media data may have far-reaching consequences if applied in real life, for example in policy decisions in government, business, or any other organization. There is a need to test the outcome, that is, the meaningful information, on the basis of certain parameters by discounting the possibility of irrelevant or disinterested information. The degree of inaccuracy should therefore be analyzed in every analytic method that mines OSN data intelligently. Only then will it be possible to use the true potential of OSNs for generating intelligence and situational awareness.


A lot of work has already been done on using Twitter data for various purposes. As OSNs carry data from ordinary users, spammers, and experts, extracting useful data is necessary for building intelligent recommendation systems. An important means of collecting credible data in the form of tweets could be 'Lists' on a certain topic, which provide a way to follow a group of users who are believed to be topical experts within a single timeline. After topical experts are extracted by mining lists' metadata, analyzing their tweets semantically, together with annotations supporting their opinions, could help in predicting sensitive issues such as terrorism or riots and the counter-measures needed to deal with them. Other useful applications of topical experts could be business forecasting, market research, financial decision-making, product life-cycle stage, opinion polls, crime patterns, social trends, econometrics, and the response of stakeholders to specific issues such as a plebiscite or referendum. However, appropriate weightage may have to be accorded in the algorithm to discount misleading information campaigns by rival opinion-makers.


OSNs provide real-time data covering a global audience and a wide range of topics. To analyze the data intelligently, various approaches are used, which rely on different attributes. From the study, it is concluded that lists, a crowd-sourced feature, can, if created carefully, provide indications of the topics of expertise of their members. The reason for favoring lists over other attributes lies in their connection with crowd wisdom. Also, a list is the best way to distinguish elite users from ordinary OSN users, with crowd-sourcing as its main feature. Other attributes, such as a fraudulent bio, being provided by the user himself, may mislead the search system. In addition, previous studies have shown that combining other features with lists produces more accurate results.
