Deep Learning Powers Cross-Lingual Semantic Similarity
September 16, 2016
Text Embeddings Now Available in the Rosette API
The Rosette API team is excited to announce the addition of a new function to Rosette’s suite of capabilities:
text embedding. This endpoint returns a single vector
of floating point numbers for your input, a.k.a. an embedding of your text in a semantic vector space.
Text embeddings can be used for a variety of text analysis tasks, including judging the semantic similarity of
two or more texts across languages. Knowing the embeddings of two documents, phrases, or words allows
you to evaluate how similar they are in meaning or content.
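The usual way to compare two embeddings is cosine similarity: the cosine of the angle between the two vectors, which is 1.0 for identical directions and near 0.0 for unrelated ones. Here is a minimal sketch in plain Python, using made-up toy vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same
    direction (very similar meaning), near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors, purely for illustration.
doc1 = [0.2, 0.9, 0.1]
doc2 = [0.25, 0.8, 0.15]
doc3 = [0.9, 0.1, 0.0]

print(cosine_similarity(doc1, doc2))  # high: similar documents
print(cosine_similarity(doc1, doc3))  # much lower: dissimilar documents
```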
What is semantic similarity?
While word and text embedding is still an emerging capability in the realm of natural language processing, one
of the most popular uses for text embeddings so far is similarity calculation. Our engine utilizes machine
learning to recognize similarities in context, content, and associations. For example, “king” correlates
to “man” while “queen” correlates to “woman.”
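This kind of regularity is often illustrated with vector arithmetic: subtracting the vector for “man” from “king” and adding “woman” should land near “queen.” Here is a toy sketch with hand-made two-dimensional vectors; the vectors are invented for illustration, whereas real embeddings are learned and high-dimensional:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-made toy vectors along two made-up dimensions: (royalty, gender).
vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, -0.9],
}

# king - man + woman should land nearest to queen.
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max((w for w in vectors if w != "king"), key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```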
For the end user, text embeddings could power a number of different applications. Businesses engaging in
eDiscovery might use text embedding for deduplication of documents. In consumer services, review websites
like Yelp or TripAdvisor could use text embeddings to aggregate related phrases such as “the bathrooms
were spotless” or “the restroom was very clean.”
Find Related Words and Documents in 5 Languages
The Rosette API text embedding endpoint supports five languages: English, German, Spanish, Japanese, and
Chinese. It also supports cross-lingual comparisons, which allow you to calculate the similarity of words
or documents written in different languages. As a test, consider evaluating the similarity of comparable
or related words in different languages, such as “amor” and “love,” or “die Braut” and “le mariage.”
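As an illustration only, here is a hedged sketch of such a cross-lingual comparison over REST. The endpoint path, header, response field, and ISO 639-3 language codes reflect the Rosette API documentation at the time and should be checked against the current docs before use:

```python
import json
import math
import urllib.request

# Assumed endpoint and field names; verify against the current Rosette docs.
ROSETTE_URL = "https://api.rosette.com/rest/v1/text-embedding"

def embed(text, language, api_key):
    """Request the embedding vector for `text` from the Rosette API."""
    req = urllib.request.Request(
        ROSETTE_URL,
        data=json.dumps({"content": text, "language": language}).encode("utf-8"),
        headers={"X-RosetteAPI-Key": api_key,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

if __name__ == "__main__":
    key = "[your key here]"
    amor = embed("amor", "spa", key)  # Spanish
    love = embed("love", "eng", key)  # English
    print(cosine(amor, love))
```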
Try it Out
Once you’ve signed up for your Rosette API account (no commitment, no credit card
required), you can try out a basic application for text embeddings using
some sample Python code we created. Remember, the Rosette API is free for up to 10,000 calls per month!
If you need more calls, check out our paid plans.
First, head to the Rosette API GitHub community and clone the text-embeddings-sample repo to your machine.
You should see two files (plus a README.md): cosine_similarity.py and test_embeddings.py.
Make sure you’ve installed the latest version of our Python client binding — 1.3.2 — via
$ pip install rosette-api --upgrade
Then edit cosine_similarity.py in your favorite text editor to replace “[your key here]” with your Rosette API key.
Save, and head back to the text-embeddings-sample directory in your command line to run test_embeddings.py. It should look something like this:
$ python test_embeddings.py
We can see some interesting measures of semantic similarity between “Paris”, “France”, “London”, and “England”. Notice that “Paris” is closer in meaning to “France” than “London” is to “France”. Similarly, “England” is more semantically similar to “London” than to “France”.
Once you’ve gotten the hang of it, replace the sample input words in test_embeddings.py with your own words or longer input text to calculate the similarity between them.
You can find more details about text embedding and language coverage in the documentation. If you discover any cool results or use cases, let us know! Email email@example.com and we’ll feature your results on our blog or in our GitHub community.