Topic Extractor

Identify keywords and significant phrases in your text, as well as topics that are not explicitly named

Overview

Quickly get the gist of your content with topic extraction

Topic extraction discovers the keywords in documents or databases that capture the essence of the text. Unlike categorization or entity extraction, topic extraction is not constrained by a finite list of recognized entity types or categories. Instead, the topic endpoint identifies “keyphrases” and “concepts” in the given input based on frequency and linguistic patterns in the text, and ranks them by relative importance.

Topic extraction quickly lists the keyphrases and concepts to give you the gist of an article or document. On a macro level, the same principle can be applied to a corpus of documents to understand the major ideas. Knowing the keyphrases and concepts in each document enables users to automatically tag, sort, and organize their data, making it more useful to analysts and database managers.
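
As a rough illustration of the corpus-level idea, the Python sketch below counts concepts across several documents and builds a simple tag index. The per-document results are hypothetical sample data, shaped like the response shown in the example on this page ("keyphrases" and "concepts" lists whose items carry a "phrase" field). The document names and function names are made up for illustration, and nothing here calls Rosette itself.

```python
from collections import Counter, defaultdict

# Hypothetical per-document results, shaped like a /topics response.
# The document names and phrases are invented for illustration.
results = {
    "doc1.txt": {"keyphrases": [{"phrase": "lulling charities"}],
                 "concepts": [{"phrase": "John Keats"}, {"phrase": "Poetry"}]},
    "doc2.txt": {"keyphrases": [{"phrase": "Grecian urn"}],
                 "concepts": [{"phrase": "John Keats"}]},
    "doc3.txt": {"keyphrases": [{"phrase": "fourth quarter"}],
                 "concepts": [{"phrase": "American football"}]},
}

def concept_counts(results):
    """Count how many documents mention each concept."""
    counts = Counter()
    for topics in results.values():
        # A set makes each concept count once per document.
        counts.update({c["phrase"] for c in topics.get("concepts", [])})
    return counts

def tag_index(results):
    """Map each concept to the sorted list of documents tagged with it."""
    index = defaultdict(list)
    for doc_id, topics in results.items():
        for concept in topics.get("concepts", []):
            index[concept["phrase"]].append(doc_id)
    return {tag: sorted(docs) for tag, docs in index.items()}

counts = concept_counts(results)
index = tag_index(results)
print(counts.most_common(1))   # the most widely shared concept
print(index["John Keats"])     # documents tagged with that concept
```

The same index could be keyed on keyphrases instead of concepts. Concepts may be the more convenient tag because they can group documents that never use the same wording.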

Keyphrases versus concepts

Keyphrases are significant phrases or words quoted from the text that Rosette® deems to be representative of the content. They are identified based on frequency, taking into account common stop words like “and” or “that,” as well as language-specific statistical patterns of where keywords are likely to appear. Concepts are themes detected within the text that may not be explicitly named. For example, an article about the Super Bowl may have the concept of “sports” or “American football,” even if neither term appears.
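
To make the frequency-and-stop-word idea concrete, here is a toy Python sketch. It is not Rosette's algorithm: it splits text into runs of consecutive non-stop words and ranks those runs by raw frequency. It has none of the language-specific statistical patterns described above, and it cannot detect concepts, since it only returns phrases that literally occur in the text. The stop word list and function names are assumptions made for this example.

```python
import re
from collections import Counter

# A tiny, hand-picked stop word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "that", "of", "in", "is", "was", "to", "by"}

def candidate_phrases(text):
    """Split text into runs of consecutive non-stop words."""
    phrases = []
    # Punctuation ends a phrase, so process one fragment at a time.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"\w+", fragment):
            if word in STOP_WORDS:
                # A stop word closes the phrase collected so far.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases

def top_keyphrases(text, n=3):
    """Return the n most frequent candidate phrases."""
    counts = Counter(candidate_phrases(text))
    return [phrase for phrase, _ in counts.most_common(n)]

sample = ("The Super Bowl is the final game of the season. "
          "The Super Bowl was watched by a record audience, "
          "and the halftime show was praised.")
print(top_keyphrases(sample, n=1))  # prints ['super bowl']
```

Note that "sports" and "American football" never show up in this output. Finding them would require the kind of concept detection described above, which a frequency count alone does not provide.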

Topic extraction is currently only available in English, but our on-premise tools can be custom-trained for new languages.

Product highlights

  • English only
  • Extracts keyphrases
  • Identifies concepts
  • Cloud or on-premise deployments
  • Fast and scalable
  • Industrial-strength support
  • Constantly stress-tested and improved

Tech Specs

Supported Languages

English

/topics endpoint

{"content": "To Sleep John Keats, 1795 - 1821
O soft embalmer of the still midnight!
 Shutting with careful fingers and benign
Our gloom-pleased eyes, embower’d from the light,
 Enshaded in forgetfulness divine;
O soothest Sleep! if so it please thee, close,
 In midst of this thine hymn, my willing eyes,
Or wait the amen, ere thy poppy throws
 Around my bed its lulling charities;
 Then save me, or the passèd day will shine
Upon my pillow, breeding many woes;
Save me from curious conscience, that still lords
 Its strength for darkness, burrowing like a mole;
Turn the key deftly in the oilèd wards,
 And seal the hushèd casket of my soul. - John Keats

This poem is in the public domain.

John Keats
Born in 1795, John Keats was an English Romantic poet and author of three poems considered to be among the finest in the English language."}

{"keyphrases": 
 [{"phrase": "lulling charities"},
 {"phrase": "O soothest Sleep"},
 {"phrase": "John Keats"},
 {"phrase": "O soft embalmer"},
 {"phrase": "hushèd casket"},
 {"phrase": "English Romantic poet"},
 {"phrase": "forgetfulness divine"},
 {"phrase": "pleased eyes"},
 {"phrase": "passèd day"},
 {"phrase": "oilèd wards"}],

"concepts": 
 [{"phrase": "John Keats",
 "conceptId": "Q82083"}]}

Topic Extraction Example
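
A response like the one above can be unpacked with a few lines of Python. This is a minimal sketch that relies only on the field names visible in the example ("keyphrases", "concepts", "phrase", "conceptId"). The function name is hypothetical, and the keyphrase list is shortened to keep the block small.

```python
import json

# Response text copied from the example above (keyphrase list shortened).
response_text = """
{"keyphrases": [{"phrase": "lulling charities"},
                {"phrase": "O soothest Sleep"},
                {"phrase": "John Keats"}],
 "concepts": [{"phrase": "John Keats", "conceptId": "Q82083"}]}
"""

def summarize_topics(response_text):
    """Return (keyphrases, concepts) from a /topics-style JSON response."""
    data = json.loads(response_text)
    # Keep the order in which the keyphrases were returned.
    keyphrases = [item["phrase"] for item in data.get("keyphrases", [])]
    # Map each concept phrase to its identifier (None if it is missing).
    concepts = {item["phrase"]: item.get("conceptId")
                for item in data.get("concepts", [])}
    return keyphrases, concepts

keyphrases, concepts = summarize_topics(response_text)
print(keyphrases)
print(concepts)
```

Using `.get` with defaults keeps the sketch tolerant of a response that omits one of the two lists. Whether the service ever does so is not stated here.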

Deployment

Rosette Cloud

Sign up today for a free 30-day trial

The SaaS version of Rosette is quick to implement, low maintenance, and ideal for users who wish to pay based on monthly call volume. Numerous bindings for the RESTful API are supported.
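
For readers who prefer to see the shape of a REST call before picking a binding, the Python sketch below builds (but does not send) a POST request carrying a "content" field, as in the /topics example on this page. The base URL, the API key header name, and the key value are placeholders invented for this example, not values taken from this page. Check the API documentation for the real ones.

```python
import json
import urllib.request

# Assumptions, not taken from this page: the URL and the header name
# below are hypothetical placeholders.
API_URL = "https://api.example.com/rest/v1/topics"
API_KEY_HEADER = "X-Api-Key"

def build_topics_request(text, api_key):
    """Build a POST request whose JSON body carries the text to analyze."""
    # json.dumps escapes newlines and quotes, so multi-line text is safe.
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
    )

poem = ("O soft embalmer of the still midnight!\n"
        " Shutting with careful fingers and benign")
request = build_topics_request(poem, "YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left out on purpose. With real values it would look like:
# with urllib.request.urlopen(request) as resp:
#     topics = json.load(resp)
```

Building the body with `json.dumps` rather than by hand matters for text like the poem in the example, because raw line breaks inside a JSON string are not valid and must be escaped.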

Rosette Server Edition

This on-premise private cloud deployment puts all the functionality of Rosette Cloud behind your secure firewall, and enables advanced user settings, access to custom profiles (user-specific configuration setups), and deployment of custom models.

Rosette Java Edition

For on-premise systems that need the low-latency, high-speed integration of an SDK, Rosette Java is the way to go. It has been deployed in the most demanding, high-transaction environments, including web search engines, financial compliance, and border security.

Rosette Plugins

Just plug in Rosette for instant high-accuracy multilingual search and fuzzy name search for Elasticsearch or Apache Solr.

Quality documentation and support

Our support team responds to customers in less than a business day, and is committed to a satisfactory resolution. Users have access to in-depth documentation describing all the features, with code examples and a searchable knowledge base.

Visit our GitHub for bindings and documentation.

Questions?

Email: info@basistech.com

Phone: +1-617-386-2000
