Rosette for Social Media Monitoring

Enable social media analysis in over 40 languages

The rise of social media is a worldwide phenomenon, and people interact online in many languages. Last year, only half of all tweets were in English, and more than 75% of Facebook users are outside the U.S. Many applications have been developed to ingest and analyze data from various social media sources. Rosette®, a software development kit (SDK), enables these applications to work effectively on text in over 40 of the world’s major languages. Rosette integrates quickly with social media applications to give developers a head start in analyzing multilingual data from Twitter, Facebook, LinkedIn, and other social media channels.

The Rosette linguistics platform enables social media monitoring tools to identify the language of incoming feeds, analyze sentences to support sentiment analysis, extract entities for metadata, and improve search results.

Identify the Language of Tweets, Blogs, and Reviews

Cleaning and aggregating social media content starts with language identification. However, location-based and user-specified language settings for posts can be unreliable. Our language identifier has been tuned for high throughput and accuracy and identifies 55 languages. The language identifier is designed to keep up with the Internet’s unprecedented flow of data—blog entries, product reviews, and the Twitter Firehose at over 140 million tweets a day.

Analyze Text to Support Semantic and Sentiment Analysis

Semantic and sentiment analysis require analyzing every word in a sentence. In languages such as English, Portuguese, Japanese, Spanish, and Dutch, Rosette’s linguistic analysis will:

  • Tag parts-of-speech
  • Lemmatize words (find their dictionary form)
  • Detect sentence boundaries
  • Extract noun phrases

Locate Entities to Add Metadata for Advanced Filtering

Our entity extractor populates metadata for each post, article, and social conversation with extracted entities—e.g., people, places, companies, and product names. Social media monitoring applications can then filter data based on entities in the metadata. Rosette® Entity Extractor automatically generates metadata for 18 types of entities in over a dozen languages. Developers can customize the entity extractor to detect other entities.
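
As a rough illustration of filtering on entity metadata, the sketch below stores extracted entities alongside each post and selects posts by entity. It does not use the Rosette API: the `Post` schema, the entity type labels, and the `filter_by_entity` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A social media post plus entity metadata (hypothetical schema)."""
    text: str
    # Maps an entity type (e.g. "PRODUCT") to the entity strings found in the text.
    entities: dict[str, set[str]] = field(default_factory=dict)


def filter_by_entity(posts, entity_type, value):
    """Return the posts whose metadata lists `value` under `entity_type`."""
    wanted = value.casefold()
    return [
        p for p in posts
        if wanted in {e.casefold() for e in p.entities.get(entity_type, set())}
    ]


# Entity metadata is written by hand here; in practice an extractor would fill it in.
posts = [
    Post("Acme just released the RocketPhone.",
         {"ORGANIZATION": {"Acme"}, "PRODUCT": {"RocketPhone"}}),
    Post("Lovely weather in Boston today.", {"LOCATION": {"Boston"}}),
]

matches = filter_by_entity(posts, "ORGANIZATION", "acme")
```

Because the filter reads only the metadata, it never re-scans the post text, which is the point of populating entities up front.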

Supporting Sentiment Analysis at the Entity Level

Modern vendors of sentiment analysis ascribe sentiment to entities rather than to documents. This method provides a clearer view of what people are saying about brands, products, and their features. Rosette will supply any semantic or sentiment analysis system with accurate and comprehensive entity extraction in the major languages of the Americas, Europe, Asia, and the Middle East.


Cluster Posts to Streamline Search Results

Social media content aggregators can offer a more rewarding experience to subscribers with Rosette’s document clustering. Give your users the ability to review groups of near-identical conversations or posts rather than read every one. The number of items in a group can also indicate trending topics and products, or expose incidents of social media spamming.

When indexing a high volume of tweets, clustering will detect nearly identical posts, such as retweets, to avoid unnecessary processing.
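
One common, generic way to group near-identical posts such as retweets is to compare sets of word shingles with Jaccard similarity. The sketch below shows that technique only; it is not a description of how Rosette clusters documents, and the shingle size, the threshold, and the function names are assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles for a post, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.casefold())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_near_duplicates(posts, threshold=0.6):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative_shingles, member_posts)
    for post in posts:
        sh = shingles(post)
        for rep, members in clusters:
            if jaccard(sh, rep) >= threshold:
                members.append(post)
                break
        else:
            # No existing cluster was close enough, so this post starts a new one.
            clusters.append((sh, [post]))
    return [members for _, members in clusters]


posts = [
    "Big sale on the new phone today, do not miss it",
    "RT Big sale on the new phone today, do not miss it",
    "The weather in Boston is lovely this morning",
]
groups = cluster_near_duplicates(posts)
```

Here the original post and its retweet land in one group and the unrelated post in another. Group sizes can then serve as a simple signal for trending topics or spam, and only one representative per group needs further processing.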

Improve Search of Social Media Content

The quality of a data feed is only as good as its search. For any language searched, adding linguistic processing at index and query time increases the number of relevant search results with little degradation to precision. Our morphological analyzers produce each word’s lemma (dictionary form of a word), which informs indexing. Other methods such as stemming only look at superficial commonalities, leading to potentially unrelated results.

  • Related words share a lemma: “speak,” “speaking,” “spoke,” “speaks”
  • Common lemma: “speak”
  • Unrelated words may share a stem: “severed,” “several”
  • Common stem: “sever”

The language-aware approach of lemmatization is used by top enterprise and web search engines today.
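
The difference between the two approaches can be sketched with a toy example. The suffix-stripping stemmer and the hand-written lemma table below are deliberately simplistic stand-ins, assumed for illustration only. A real morphological analyzer is far more sophisticated, and nothing here reflects Rosette's implementation.

```python
SUFFIXES = ("ing", "ed", "al", "s")  # toy suffix list, for illustration only

# Tiny hand-written lemma table standing in for a real morphological analyzer.
LEMMAS = {
    "speak": "speak", "speaking": "speak", "spoke": "speak", "speaks": "speak",
    "severed": "sever", "several": "several",
}


def naive_stem(word):
    """Strip the first matching suffix, with no knowledge of the language."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look up the dictionary form; fall back to the word itself."""
    return LEMMAS.get(word, word)


for w in ["speaking", "spoke", "speaks", "severed", "several"]:
    print(f"{w:10} stem={naive_stem(w):8} lemma={lemmatize(w)}")
```

In this sketch the stemmer reduces both “severed” and “several” to “sever” and leaves the irregular form “spoke” unchanged, while the lemma lookup keeps “several” distinct and maps “spoke” to “speak.”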

Track Names of Products and People

Social media posts are notoriously casual, and are full of misspelled names and nicknames. Overcoming name variants is especially critical for reputation tracking or brand analysis. Our name matcher will find all relevant posts for “Madonna” even when her name is spelled “マドンナ,” “Madonna Ciccone,” or “Madona.” It handles nicknames, missing name components, spelling errors and variants, mixed order names, names in different languages, and more.
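
To give a feel for the problem, the sketch below combines a small alias table (for cross-script and long-form variants) with approximate string matching from Python's standard `difflib` module. It is a simplified illustration, not Rosette's name-matching method: the `ALIASES` table, the `name_matches` helper, and the 0.85 threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: cross-script and long-form variants of a canonical name.
ALIASES = {
    "madonna": {"マドンナ", "madonna ciccone"},
}


def normalize(name):
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.casefold().split())


def name_matches(query, candidate, threshold=0.85):
    """True if candidate is a known alias of query or a close spelling of it."""
    q, c = normalize(query), normalize(candidate)
    if c == q or c in ALIASES.get(q, set()):
        return True
    # Fall back to character-level similarity to catch misspellings.
    return SequenceMatcher(None, q, c).ratio() >= threshold


for variant in ["Madona", "マドンナ", "Madonna Ciccone", "Beyonce"]:
    print(variant, name_matches("Madonna", variant))
```

Character similarity alone cannot connect names written in different scripts, which is why the sketch needs an explicit alias table for “マドンナ”. A production name matcher would need to handle transliteration, nicknames, and name order directly.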


Sample name search result for “Steve Jobs” finds variations of his name, even in Arabic!


Try a Product Evaluation

Request an evaluation of the complete Rosette software platform today.

Natural Language Processing for Over 15 Years

Basis Technology has been the industry choice for multi-language natural language processing, starting with major search engines—including Google, Yahoo!, Microsoft Bing, and Oracle Endeca. We’ve continued to refine and hone our linguistic software components to meet the new wave of language challenges inherent in social media analysis. Contact us for a free evaluation of how Rosette can make your social media analysis software internationally ready.

With English now making up only about 40 percent of global social media postings, it is becoming increasingly important for our customers to access business and market insights hidden in non-English conversations. Basis Technology allows us to quickly expand the scope of our analysis to the languages our customers demand. We chose Rosette because its accuracy and performance enable us to broaden the global coverage of our technology to discover insights from social media conversations in other languages.

Steve Winters

Vice President of Engineering, NetBase

