Content Categorizer

Automatically classify your content according to your categories or taxonomy


Quickly build custom categorizers to optimize your NLP pipeline

Nearly every business has its own set of categories, themes, or topics suited to its tasks, and no out-of-the-box taxonomy will satisfy them all. That’s why Rosette® offers a Classification Field Training Kit, so that users can train their own categories or engage the BasisTech solutions team to do so. The Classification Field Training Kit builds on the fundamental linguistic analysis of Rosette Base Linguistics, allowing users to extend categorization to documents in 30+ languages.

There are two training methods:

  1. Keyword-based training starts from keywords that are representative of each category. It uses Wikipedia pages that are representative of those keywords as the training set.
  2. Machine learning from a training set involves handpicking documents that best represent each category. Rosette learns the new taxonomy from these training documents so that future documents can be categorized accurately.
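
The second method, learning categories from handpicked documents, can be illustrated with a small, generic sketch. This is not Rosette's algorithm or API: it swaps in a plain bag-of-words naive Bayes classifier as a stand-in, and the class name `TinyCategorizer`, the `tokenize` helper, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyCategorizer:
    """Hypothetical stand-in: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, labeled_documents):
        """Learn word statistics from (label, text) pairs."""
        for label, text in labeled_documents:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def categorize(self, text):
        """Return the label with the highest log-probability for the text."""
        if not self.doc_counts:
            raise ValueError("train() must be called before categorize()")
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


# Illustrative training set: two handpicked documents per category.
TRAINING = [
    ("SPORTS", "The team won the match after a late goal in the final"),
    ("SPORTS", "The coach praised the players after the game"),
    ("TRAVEL", "Book a flight and a hotel for your beach vacation"),
    ("TRAVEL", "The airline added a new route to the island resort"),
]

if __name__ == "__main__":
    categorizer = TinyCategorizer()
    categorizer.train(TRAINING)
    print(categorizer.categorize("Which hotel is closest to the beach?"))
```

The point of the sketch is the workflow: a few representative documents per category are enough to define a custom taxonomy, and new documents are then assigned to the closest category. A production categorizer would add linguistic preprocessing and far more training data.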

The first step in triaging data for any application

Categorization is a powerful tool for taming large datasets. Depending on the project, it might make sense to categorize documents before performing tasks like named entity recognition or topic extraction. For example:

  • Categorize social media posts by topic to highlight trends and segment audiences
  • Identify positive or negative sentiment in a statement
  • Filter out spam in email streams
  • Tag webpages by content for targeted advertising and displaying related content
  • Detect responsive documents for eDiscovery
  • Identify the topic of customer service inquiries to properly prioritize or route questions

Product highlights

  • Field training kit to easily add categories or language support
  • Pretrained on IAB Tech Lab Content Taxonomy for English
  • Cloud or enterprise deployments
  • Fast and scalable
  • Industrial-strength support

Tech Specs

Availability and platform support

Deployment availability:


Arts & Entertainment    Family & Parenting     Health & Fitness
Hobbies & Interests     Law, Govt. & Politics  Religion & Spirituality
Technology & Computing  Automotive             Education
Food & Drink            Home & Garden          Personal Finance
Real Estate             Style & Fashion        Business
Careers                 Pets                   Science
Society                 Sports                 Travel

Supported languages

30+ languages* via Rosette Classification Field Training Kit

* Languages must be supported by Rosette Base Linguistics.

Sample output:
  {
    "categories": [
      {
        "label": "ARTS_AND_ENTERTAINMENT",
        "confidence": 0.06416648,
        "score": -0.01447566
      },
      {
        "label": "SPORTS",
        "confidence": 0.05782175,
        "score": -0.11859164
      },
      {
        "label": "TRAVEL",
        "confidence": 0.05627946,
        "score": -0.14562697
      },
      {
        "confidence": 0.05617463,
        "score": -0.14749148
      },
      {
        "label": "HEALTH_AND_FITNESS",
        "confidence": 0.05582167,
        "score": -0.15379449
      }
    ]
  }
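
A response like the sample output above can be consumed with a few lines of code. The following is a minimal sketch, assuming the response is a JSON object with a `categories` array whose entries carry the `label`, `confidence`, and `score` fields shown; the surrounding braces, the `top_categories` helper, and its parameters are assumptions made for illustration, not part of the Rosette bindings.

```python
import json

# Values copied from the sample output above. One entry has no label in the
# source, so it is reproduced without one and skipped by the helper below.
SAMPLE = """
{
  "categories": [
    {"label": "ARTS_AND_ENTERTAINMENT", "confidence": 0.06416648, "score": -0.01447566},
    {"label": "SPORTS", "confidence": 0.05782175, "score": -0.11859164},
    {"label": "TRAVEL", "confidence": 0.05627946, "score": -0.14562697},
    {"confidence": 0.05617463, "score": -0.14749148},
    {"label": "HEALTH_AND_FITNESS", "confidence": 0.05582167, "score": -0.15379449}
  ]
}
"""


def top_categories(response_text, min_confidence=0.0, limit=3):
    """Return (label, confidence) pairs, highest confidence first.

    Hypothetical helper: entries without a label or below min_confidence
    are dropped before ranking.
    """
    categories = json.loads(response_text).get("categories", [])
    labeled = [
        c for c in categories
        if "label" in c and c.get("confidence", 0.0) >= min_confidence
    ]
    ranked = sorted(labeled, key=lambda c: c["confidence"], reverse=True)
    return [(c["label"], c["confidence"]) for c in ranked[:limit]]


if __name__ == "__main__":
    for label, confidence in top_categories(SAMPLE):
        print(f"{label}: {confidence:.4f}")
```

Filtering on a confidence threshold before acting on a label is a common design choice when the top categories are as close together as they are in this sample.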


Rosette Cloud

Sign up today for a free 30-day trial

The SaaS version of Rosette is quick to implement, low maintenance, and ideal for users who wish to pay based on monthly call volume. Numerous bindings are available for its RESTful API.

Rosette Server Edition

This on-premises, private-cloud deployment puts all the functionality of Rosette Cloud behind your secure firewall and enables advanced user settings, access to custom profiles (user-specific configuration setups), and deployment of custom models.

Rosette Java Edition

For on-premises systems that need the low-latency, high-speed integration of an SDK, Rosette Java is the way to go. It has been deployed in the most demanding, high-transaction environments, including web search engines, financial compliance, and border security.

Rosette Plugins

Just plug in Rosette for instant high-accuracy multilingual search and fuzzy name search for Elasticsearch or Apache Solr.

Quality documentation and support

Our support team responds to customers in less than a business day, and is committed to a satisfactory resolution. Users have access to in-depth documentation describing all the features, with code examples and a searchable knowledge base.

Visit our GitHub for bindings and documentation.

Request Custom Demo

Complete this form and our customer team will reach out to schedule a demo based on your use case.



Phone: +1-617-386-2000
