Akahai is our query suggester and autocomplete engine — designed for Japanese.
Akahai takes Japanese text as you type it and gives instant, high-quality suggestions to help the text input process, which in many applications is a search query. It matches input across hiragana, katakana, kanji and romaji, and any combination of these, for a seamless autocomplete experience.
Try right here, right now
We have made Akahai available for you below to give you an idea of what it does. Please note the following:
- Dictionary. The dictionary is mined from Wikipedia article titles and is used here to illustrate matching capabilities. Short and generic terms have not been removed.
- Queries. Write Japanese text using an IME to see matching across scripts. Only single-term matching is enabled in this demo. Matches in romaji, kana and verbatim kanji can be highlighted.
Tip: Try searching for "ほっk" to get a kanji match for terms such as 北海道 (Hokkaido).
- Advanced matching. Gives suggestions across hiragana, katakana, kanji and romaji. Multi-term matching (match scoping), hit highlighting and spell-checking (もしかして) are also supported.
- Morphological analysis. Automatically extracts readings for kanji through morphological analysis.
- Very, very fast. Provides high-quality, ranked suggestions at very high speed and low latency.
- Integrates easily. Provides a REST/JSON interface and integrates easily with any search engine, database or web application.
- Robust and ready to suggest. Serves millions of users in Japan daily. Supports multi-tenant deployment and has the administrative APIs necessary for high-volume production use.
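To give a feel for how a REST/JSON integration might look, here is a minimal sketch in Python. The endpoint path (`/suggest`) and parameter names (`q`, `limit`) are assumptions for illustration only; consult the actual Akahai API documentation for the real interface.

```python
from urllib.parse import urlencode, urljoin

def build_suggest_url(base_url, query, limit=10):
    """Build a suggest request URL for a hypothetical Akahai endpoint.

    The endpoint path and parameter names (q, limit) are assumptions
    for illustration, not the documented API.
    """
    params = urlencode({"q": query, "limit": limit})
    return urljoin(base_url, "/suggest") + "?" + params

# Japanese input is percent-encoded as UTF-8 automatically.
url = build_suggest_url("http://localhost:8080", "ほっk", limit=5)
print(url)
```

The response would typically be a JSON array of ranked suggestions that the web application renders in a dropdown as the user types.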
Akahai understands the scripts of the input and can match across all of these script variants.
Matching across scripts
Input such as "hokk", "ほっk", "ホッk" or "ﾎｯk" (half-width) all match "北海道" (Hokkaido) in kanji.
Many other input variants are possible, and Akahai is designed to provide matching that is natural for Japanese and that integrates well with IMEs.
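One way to picture cross-script matching, as a minimal sketch only (Akahai's actual implementation is not described here), is to index each term under several reading variants and prefix-match the typed input against all of them. The readings below are supplied by hand for illustration; a real system derives them automatically.

```python
# Toy index: each term is stored under several hand-written reading
# variants (romaji, hiragana, katakana).
INDEX = {
    "北海道": ["hokkaido", "ほっかいどう", "ホッカイドウ"],
}

def suggest(prefix):
    """Return terms where any reading variant starts with the prefix."""
    prefix = prefix.lower()
    return [term for term, readings in INDEX.items()
            if any(r.startswith(prefix) for r in readings)]

print(suggest("hokk"))    # matched via the romaji reading
print(suggest("ほっかい"))  # matched via the hiragana reading
```

Note that this toy version handles only single-script prefixes; matching mixed-script input such as "ほっk" requires additional transliteration logic of the kind Akahai provides.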
Akahai also performs various normalizations for character width, whitespace, case, etc.
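As a concrete illustration of the general technique (not Akahai's internal code), Unicode NFKC normalization, available in Python's standard library, folds half-width katakana and full-width Latin characters into their canonical forms, and case folding handles letter case:

```python
import unicodedata

def normalize(text):
    """Fold character widths (half-width kana, full-width Latin) and case."""
    return unicodedata.normalize("NFKC", text).lower()

print(normalize("ﾎｯＫ"))   # half-width katakana and full-width K folded: ホッk
print(normalize("ＡＢＣ"))  # full-width Latin letters folded: abc
```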
Multi-term matching – match scoping
Multi-term matching, or scoping, is very useful for restricting the alternatives for a keyword and for exploring relevant suggestions scoped to the first input keyword.
For example, with the input "銀座" (Ginza) followed by a space, we see suggestions related to Ginza.
Akahai can optionally highlight the text that matches the input. Highlighting works across case, character widths and kana variants.
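The scoping behavior described above can be sketched roughly as follows. This is a simplified illustration with a hard-coded suggestion list; Akahai's actual matching and ranking are more sophisticated.

```python
SUGGESTIONS = [
    "銀座 ランチ",
    "銀座 寿司",
    "新宿 ランチ",
]

def scoped_suggest(query):
    """Split the query on whitespace; the first term scopes the candidate
    set, and the last (possibly partial) term is prefix-matched within it."""
    terms = query.split()
    if len(terms) < 2:
        return [s for s in SUGGESTIONS if s.startswith(query)]
    scope, partial = terms[0], terms[-1]
    return [s for s in SUGGESTIONS
            if s.startswith(scope)
            and any(w.startswith(partial) for w in s.split()[1:])]

print(scoped_suggest("銀座"))    # all suggestions scoped to 銀座
print(scoped_suggest("銀座 ラ"))  # second term narrows within that scope
```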
Spellchecking — もしかして
In addition to providing query suggestions, Akahai can also perform Japanese spelling correction.
Spelling mistakes in Japanese look very different from those in English, where typically a character is missing or misplaced.
Because of the transformations introduced by the IME, a Japanese spelling mistake often looks completely different from its correct form, containing unrelated kanji, etc.
A few example corrections are given in the right-side table.
All lexical assets used for dictionaries can be analyzed morphologically. In this process, we extract readings and reading variants from the input, which are then used for matching. As a result, users can provide lists of relevant places, people, etc. in kanji, and the readings will be identified automatically.
Our morphological analyzer can be customized with its own dictionaries if desired. This can be useful in the rare cases where it extracts an incorrect reading, which happens primarily because dictionary entries are short.
High performance – low latency
Akahai is designed for very high performance and low-latency matching. This is very important for a responsive and overall high-quality user experience.
To support high-performance scenarios, Akahai uses compact and very fast memory-based data structures. Additionally, caching increases performance even further.
An optional result cache is also provided for maximum query performance, including across restarts.
The cache is written to disk when Akahai is shut down. When Akahai is restarted, this cache is loaded and frequent queries hit the cache immediately. As a result, there is generally no need to manually pre-warm caches before Akahai can serve queries at maximum performance.
Cache size, expiry and eviction strategy are all highly configurable.
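As an illustration of the general pattern (not Akahai's implementation), a result cache that survives restarts can be built from an in-memory LRU structure that is serialized to disk on shutdown and reloaded on startup:

```python
import os
import pickle
from collections import OrderedDict

class PersistentLRUCache:
    """A small LRU cache that can be saved to disk and reloaded,
    so frequent queries hit the cache immediately after a restart."""

    def __init__(self, path, max_size=1000):
        self.path = path
        self.max_size = max_size
        self.entries = OrderedDict()
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.entries = pickle.load(f)

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used

    def save(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.entries, f)

# Simulate a shutdown/restart cycle: save, then reload from disk.
cache = PersistentLRUCache("/tmp/akahai_demo_cache.pkl", max_size=2)
cache.put("ほっk", ["北海道"])
cache.save()

reloaded = PersistentLRUCache("/tmp/akahai_demo_cache.pkl", max_size=2)
print(reloaded.get("ほっk"))
```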
Akahai serves millions of users in Japan on a daily basis and is proven in high-performance scenarios.
A set of management and status REST/JSON APIs makes deployment and operations very easy.
Live dictionary updates
Dictionaries can be updated on a live system without any downtime or significant effect on system performance. Similarly, caches can be managed separately from reloads, so that new entries naturally enter the cache as older entries are evicted through timeouts.
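A common pattern for zero-downtime dictionary updates, shown here only as a general sketch rather than Akahai's internals, is to build the new dictionary fully in the background and then atomically swap the reference the query path reads, so queries never see a partially loaded dictionary:

```python
import threading

class SuggestService:
    """Serve suggestions from a dictionary that can be replaced live.
    The active dictionary reference is swapped atomically under a lock
    only after the new dictionary is fully built."""

    def __init__(self, entries):
        self._lock = threading.Lock()
        self._dictionary = sorted(entries)

    def suggest(self, prefix):
        with self._lock:
            dictionary = self._dictionary  # grab the current snapshot
        return [t for t in dictionary if t.startswith(prefix)]

    def reload(self, entries):
        new_dictionary = sorted(entries)  # build fully before swapping
        with self._lock:
            self._dictionary = new_dictionary

service = SuggestService(["北海道", "北九州"])
print(service.suggest("北"))
service.reload(["北海道", "北陸", "東京"])  # live update, no downtime
print(service.suggest("北"))
```

Queries in flight during a reload simply finish against the old snapshot, which is what makes the update invisible to users.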
Status monitoring provides system status and metrics on keyword counts, cache usage, zero matches, queries-per-second, etc.
Frequently Asked Questions
How much does Akahai cost?
Please contact us at sales@atilika.com for a quote.
Can I receive a trial or evaluation license?
Most certainly! Please contact us at sales@atilika.com and we will arrange a free trial license.
Which languages does Akahai support?
Akahai is designed to support Japanese, but Western languages can also be supported.
How do we make a dictionary for Akahai?
Akahai's dictionary can be built in several ways. We have extensive experience and tools for building suggestion dictionaries from query logs and lexical assets, and for automatically extracting terms from free-text content.
We will work with you and help make sure that your lexical assets are applied to your application and content in the best way possible.