{"version":3,"sources":["webpack://atilika-com/./src/pages/en/kuromoji.tsx"],"names":["KuromojiEn","lang","title","path","name","H1","grey","spaceRt","H2","memo"],"mappings":"gPAiBMA,EAAa,kBACf,gBAAC,IAAD,CAAMC,KAAK,KAAKC,MAAM,WAAWC,KAAK,gBAClC,gBAAC,IAAD,CAAMC,KAAK,eAAX,yDACA,gBAAC,IAAD,KACI,gBAAC,EAAAC,GAAD,iBACA,mFAGJ,gBAAC,IAAD,CAAOH,MAAM,aAEb,gBAAC,IAAD,CAASA,MAAM,oBAAoBI,MAAI,GACnC,gHAGA,gBAAC,IAAD,gBAEI,iCAFJ,+BAOJ,gBAAC,IAAD,CAASJ,MAAM,0BACX,oHAIA,gBAAC,IAAD,CAAgBI,MAAI,EAACC,SAAO,GACxB,iCACM,kCADN,IACoB,sCADpB,KAEM,kCAFN,IAEoB,sCAFpB,IAEsC,sCAFtC,KAGM,6CAHN,KAIM,4CAJN,IAI8B,uCAKtC,gBAAC,IAAD,CAASL,MAAM,gBAAgBI,MAAI,GAC/B,kFACA,gBAAC,IAAD,KACI,6BACI,6BACI,0BACI,0CACA,yCAGR,6BACI,0BACI,kCACA,kCAEJ,0BACI,mCACA,kCAEJ,0BACI,mCACA,qCAOpB,gBAAC,IAAD,CAASJ,MAAM,YACX,wDACA,gBAAC,IAAD,CAAgBI,MAAI,GAChB,iCACM,mCAFV,KAKI,kCACO,oCANX,IASI,iCACM,kCAVV,IAaI,gCACK,iCAdT,KAiBI,gCACK,iCAEL,yCAIR,gBAAC,IAAD,CAASJ,MAAM,2BAA2BI,MAAI,GAC1C,6LAKA,wMAOJ,gBAAC,IAAD,CAASJ,MAAM,sBACX,uKAMJ,gBAAC,IAAD,CAASA,MAAM,cAAcI,MAAI,GAC7B,yFAGJ,gBAAC,IAAD,CAASJ,MAAM,sBACX,4IAMJ,gBAAC,IAAD,CAAcA,MAAM,OAAOD,KAAK,OAEhC,gBAAC,IAAD,CAASC,MAAM,SACX,qHAIA,0BACI,0FACA,wHAIA,oGACA,mEAEJ,6HAKA,gBAAC,IAAD,MAEA,gBAAC,EAAAM,GAAD,uBACA,yLAIA,gBAAC,IAAD,CAAMP,KAAK,QAAX,8sBAiBA,iHAGA,gBAAC,IAAD,CAAMA,KAAK,OAAX,yPASJ,gBAAC,IAAD,CAAQA,KAAK,SAIrB,WAAeQ,UAAKT","file":"component---src-pages-en-kuromoji-tsx-95bf3f4f069211dcfa40.js","sourcesContent":["/*\n * Copyright © 2017 - 2019 Atilika Inc. 
All rights reserved.\n */\n\nimport React, {memo} from \"react\";\nimport Code from \"../../common/Code\";\nimport Footer from \"../../common/Footer\";\nimport Intro from \"../../common/Intro\";\nimport Page from \"../../common/Page\";\nimport Meta from \"../../common/Meta\";\nimport Section from \"../../common/Section\";\nimport KuromojiDemo from \"../../kuromoji/KuromojiDemoSection\";\nimport KuromojiHero from \"../../kuromoji/KuromojiHero\";\nimport KuromojiSample from \"../../kuromoji/KuromojiSample\";\nimport Divider from \"../../common/typography/Divider\";\nimport {H1, H2} from \"../../common/typography/Headings\";\n\nconst KuromojiEn = () => (\n \n Open source Java morphological analyzer for Japanese.\n \n

Kuromoji

\n

Open source Java morphological analyzer for Japanese.

\n
\n\n \n\n
\n

\n Kuromoji can separate a block of text into distinct words, also known as morphemes.\n

\n \n 吾輩は猫である。\n    吾輩   は   猫  \n で   ある   。\n \n
\n\n
\n

\n For each word, Kuromoji assigns a part of speech like noun, verb, adjective, and so\n on.\n

\n \n \n 相撲nounparticle\n 見るverbparticleparticle\n 好きadjectival noun\n ですauxiliary verbsymbol\n \n \n
\n\n
\n

Get the base form for inflected verbs and adjectives.

\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Surface Form | Base Form
食べたい | 食べる
楽しくない | 楽しい
帰りました | 帰る
\n
\n
\n\n
\n

Extract readings for kanji.

\n \n \n 親譲(おやゆず)りの無鉄砲(むてっぽう)で小供(こども)の時(とき)から損(そん)ばかりしている\n \n
\n\n
\n

\n Kuromoji comes with a Search Mode for search applications that splits words further,\n so that searches for the parts of compound nouns still produce hits.\n

\n\n

\n For example, we want a search for 空港 (airport) to match 関西国際空港 (Kansai\n International Airport), but most analyzers don’t allow this since 関西国際空港 tends\n to become one token.\n

\n
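The effect of this extra splitting can be illustrated with a toy sketch. It does not call Kuromoji and is not how Kuromoji decides where to split: the `PARTS` table, the class name, and the helper methods are hypothetical stand-ins for a real dictionary-driven analysis. It only shows why a query for 空港 misses when the compound is indexed as one token and hits once the parts are indexed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SearchModeSketch {
    // Hypothetical decomposition table; a real analyzer derives splits from its dictionary.
    static final Map<String, List<String>> PARTS = Map.of(
        "関西国際空港", List.of("関西", "国際", "空港"));

    // Search-style output: replace each known compound with its parts,
    // and pass every other token through unchanged.
    static List<String> searchTokens(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(PARTS.getOrDefault(token, List.of(token)));
        }
        return out;
    }

    // A query term hits only if it equals one of the indexed tokens.
    static boolean hits(List<String> indexedTokens, String query) {
        return indexedTokens.contains(query);
    }

    public static void main(String[] args) {
        // The compound kept as a single token, as most analyzers would emit it.
        List<String> singleToken = List.of("関西国際空港");

        System.out.println(hits(singleToken, "空港"));               // false
        System.out.println(hits(searchTokens(singleToken), "空港")); // true
    }
}
```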
\n\n
\n

\n Kuromoji supports a wide range of dictionary backends for different use cases,\n including ipadic, jumandic, and unidic among others.\n

\n
\n\n
\n

Kuromoji is licensed under the Apache License, Version 2.0.

\n
\n\n
\n

\n Kuromoji powers the Japanese language support in Apache Lucene and Apache Solr. It\n is also used in Elasticsearch.\n

\n
\n\n \n\n
\n

\n Kuromoji is an easy-to-use, self-contained Japanese morphological analyzer that\n provides:\n

\n
    \n
  • Word segmentation. Segmenting text into words (or morphemes)
  • Part-of-speech tagging. Assigning word categories (nouns, verbs, particles, adjectives, etc.)
  • Lemmatization. Getting dictionary forms for inflected verbs and adjectives
  • Readings. Extracting readings for kanji
\n

\n Several other features are supported. Please consult each dictionary’s Token class\n for details.\n

\n\n \n\n

Using Kuromoji

\n

\n The example below shows how to use the Kuromoji morphological analyzer in its\n simplest form: segmenting text into tokens and printing the features of each token.\n

\n {`\n package com.atilika.kuromoji.example;\n\n import com.atilika.kuromoji.ipadic.Token;\n import com.atilika.kuromoji.ipadic.Tokenizer;\n import java.util.List;\n\n public class KuromojiExample {\n     public static void main(String[] args) {\n         // Create a tokenizer with default settings\n         Tokenizer tokenizer = new Tokenizer();\n         List<Token> tokens = tokenizer.tokenize(\"お寿司が食べたい。\");\n         // Print each token's surface form and its features, separated by a tab\n         for (Token token : tokens) {\n             System.out.println(token.getSurface() + \"\\\\t\" + token.getAllFeatures());\n         }\n     }\n }\n `}\n

\n Make sure you add the dependency below to your pom.xml before building your project.\n

\n {`\n <dependency>\n     <groupId>com.atilika.kuromoji</groupId>\n     <artifactId>kuromoji-ipadic</artifactId>\n     <version>0.9.0</version>\n </dependency>\n `}\n
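The example above prints one line per token: the surface form, a tab, and the feature string. As a rough sketch of how such a line could be post-processed, the snippet below splits one line and picks out the base form and reading. It is standalone and does not call Kuromoji. The sample line and the column positions (`BASE_FORM`, `READING`) are assumptions based on an IPADIC-style comma-separated layout and should be checked against the output of the dictionary you use. Where the Token class of your dictionary offers dedicated accessors, those are likely the better choice.

```java
import java.util.Arrays;
import java.util.List;

public class FeatureLineSketch {
    // Assumed IPADIC-style column positions; verify against your dictionary's output.
    static final int BASE_FORM = 6;
    static final int READING = 7;

    // Everything before the first tab is the surface form.
    static String surface(String line) {
        return line.split("\t", 2)[0];
    }

    // Everything after the first tab is a comma-separated feature list.
    static List<String> features(String line) {
        return Arrays.asList(line.split("\t", 2)[1].split(","));
    }

    public static void main(String[] args) {
        // Illustrative line in the assumed format, not captured from a real run.
        String line = "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ";

        List<String> f = features(line);
        System.out.println(surface(line) + " -> base form " + f.get(BASE_FORM)
            + ", reading " + f.get(READING));
    }
}
```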
\n\n