A Close Look at 4 Japanese AI Model Releases Shaping the Scene

If you are wondering if there is a Japanese AI model out there that's worth competing with the latest AI models from the US or other countries, you are in for a surprise. Some of them are surely on their way to lead the race, and we'll discuss the 4 of them that everybody is talking about.

Part 1. Shisa v2 405B Japanese AI Model

Shisa is one of the most powerful Japanese-English bilingual AI models, released in a 405 billion parameter version.

Shisa v2 405B Japanese AI model

The name Shisa refers to a traditional Okinawan decoration, often in the form of a ceramic or stone statue. These statues typically depict a lion-like creature to defend from evil spirits and bring good luck. You often see them placed at the entrance of homes or buildings, particularly in Okinawa, Japan.

That is the model you are testing here.

You cannot realistically install it locally because of its size: it would need at least 800 GB of VRAM just to run with decent performance. Unless you have that kind of hardware, you will be testing it online.

Before that, it helps to talk a bit more about this model because there is some interesting information here.

About the New Japanese AI Model Shisa v2 405B

The same team released Shisa version one last year with 7 billion parameters. At that time, to be fair, this Japanese AI model was not good, even for simple Japanese tasks. According to the team, this version shows a lot of improvement, especially across Japanese benchmarks.

This model is multilingual with strong Japanese language capability. It is built on LLaMA 3.1, not LLaMA 3.2, 3.3, or 4. It uses the LLaMA 3.1 405 billion parameter instruction-tuned model and represents a post-training effort focused on improving Japanese performance while retaining English capability.

Architecture Details

The model targets top-tier Japanese language performance domestically while also improving Korean and traditional Chinese output for tourism-related applications. The license is the LLaMA 3.1 community license rather than Apache 2.0, because the model is built on LLaMA, but the weights are openly available.

The architecture maintains the core LLaMA 3.1 design with a very large parameter count. It uses the standard transformer architecture found in the LLaMA family and undergoes post-training to optimize multilingual output, particularly Japanese.

Training uses advanced optimization techniques such as DeepSpeed ZeRO-3 parameter handling, activation offloading, gradient accumulation, 8-bit optimizers, and sequence parallelism to manage the large parameter size. Fine-tuning this model requires a very large infrastructure.
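
To make that concrete, here is a minimal sketch of what a ZeRO-3 configuration with offloading and gradient accumulation can look like. The values are illustrative assumptions, not Shisa's published training settings, and an 8-bit optimizer or sequence parallelism would be layered on top of a setup like this.

```python
import json

# Minimal DeepSpeed ZeRO-3 sketch (illustrative values only, not Shisa's settings):
# parameters, gradients, and optimizer state are sharded across GPUs, with
# CPU offloading and gradient accumulation to fit a very large model.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 32,            # assumed for illustration
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                               # ZeRO-3 parameter sharding
        "offload_param": {"device": "cpu"},       # offload parameters to CPU
        "offload_optimizer": {"device": "cpu"},   # offload optimizer state to CPU
    },
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5, "weight_decay": 0.01},
    },
}

with open("ds_zero3_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)  # passed to the deepspeed launcher at training time
```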

Online Prompt Testing

Testing takes place through the website chat.shisa.ai, using several prompts. If your Japanese fluency is limited, you can rely on Google Translate to prepare the prompts and understand the responses.

Testing Shisa v2 405B Japanese AI model

If you ask the model to explain the aesthetic of wabi-sabi in the Japanese tea ceremony and analyze similarities and differences with modern minimalist design, the response you get is grounded in Japanese culture. It discusses wabi-sabi aesthetics, spiritual fulfillment, interconnected ideas, and key differences. The answer reads far better than the version released last year.

In another test, you can ask the model to translate an English business email into polite Japanese and then adapt it for a Korean business context. The Japanese translation includes proper greetings, indirect references, polite phrasing, and expressions of gratitude. The model also explains the cultural adjustments it applies. The Korean version follows the expected business tone and structure when checked through translation.

As far as reasoning goes, you can ask a question built on Tokyo's population density: how many people would occupy Shibuya's scramble crossing if Tokyo's average population density were applied, along with a request for a step-by-step mathematical analysis.

The response follows a step-by-step explanation. The reasoning depth does not match newer reasoning-focused models, but the output quality remains acceptable. The discussion about crowd dynamics adds useful context.
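
For a feel of the arithmetic such a prompt expects, here is a quick back-of-the-envelope version. The figures are rough assumptions (roughly 6,400 people per km² for Tokyo Metropolis and roughly 3,000 m² for the crossing), not values taken from the model's answer.

```python
# Back-of-the-envelope estimate with rough, assumed figures.
density_per_km2 = 6_400      # approximate average population density of Tokyo Metropolis
crossing_area_m2 = 3_000     # rough estimate of the scramble crossing's area

people = density_per_km2 / 1_000_000 * crossing_area_m2   # convert km² density to per m²
print(f"about {people:.0f} people")                        # roughly 19 people
```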

If your prompt is lighter and more playful, like a fictional Japanese dating scenario about a character with no romantic experience who is planning a first date and decorating his room with anime merchandise, the model plays along.

After translation, the response follows a humorous internal monologue style. It discusses confidence, room preparation, cooking, and exaggerated expectations. The tone aligns with familiar anime-style storytelling.

Part 2. Nekomata Japanese AI Model: Japanese LLM on Qwen

Nekomata is a state-of-the-art Japanese language model based on the Qwen model series. Language modeling remains a difficult topic, and GPT-4 stands as one of the most dominant language models today. It shows strong performance across many benchmarks. In evaluations such as MT-Bench and Alpaca Eval, GPT-4 is even used as a judge to assess the performance of other models.

However, GPT-4 does not solve every problem. In situations where security, customization, and limited budgets matter, non-open source language models can become problematic. Model weights are not accessible, data must be sent to external services, customization is restricted, and self-hosting to reduce cost is not possible.

Because of these constraints, open source language models become an important alternative.

Limitations of Existing Open Source Models

Many capable open source language models are English-centric or Chinese-centric. For applications in other languages, such as Japanese, multilingual models are often used, but they usually do not perform strongly enough.

In response, researchers explored using robust foundation models as bases for continual training with Japanese language data. This approach can produce stronger language models than training from scratch using smaller proprietary datasets.

Initial Japanese language models were trained from scratch with modest performance. Later models increased in size and dataset scale, but progress remained limited compared to larger foundation models.

This approach changed after the release of strong foundation models such as Llama 2. These models were trained on extremely large token counts, often exceeding one trillion tokens. It became clear that continual pre-training on top of such models could produce stronger results than training entirely from scratch.

Shift to Continual Pre-Training

Based on this shift, Japanese language models began using Llama-based foundations. This approach resulted in noticeable performance gains compared to earlier Japanese-only models.

One challenge remained. Llama 2 training data and tokenizer design focused primarily on English. This created inefficiencies during Japanese training and inference.

The Qwen model series addressed this issue. Qwen provided a stronger baseline model while also allowing easier transfer to new languages. Its design supported broader language coverage without relying on English-centered tokenization.

Qwen offers multiple model options across different sizes, modalities, and application purposes. According to third-party benchmarks, large Qwen models achieve performance close to GPT-4 while remaining open source.

Additional features include support for Flash Attention, verified quantized models, and long context lengths.

One key strength of Qwen lies in its tokenizer. The tokenizer uses a more inclusive vocabulary design.

Japanese tokenization experiments showed that Qwen's tokenizer requires significantly fewer tokens compared to Llama 2. Similar efficiency improvements were observed across several other languages. On average, token efficiency improved by around 30 percent, with higher gains in some cases.
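
A comparison along these lines is easy to reproduce with the Hugging Face tokenizers of the two base models. The repo names below are the public checkpoints (the Llama 2 one is gated, and Qwen's tokenizer needs trust_remote_code), and the sample sentence is just an illustration.

```python
from transformers import AutoTokenizer

# Count how many tokens each tokenizer needs for the same Japanese sentence.
text = "猫又は、尾が二つに分かれた猫の姿をした日本の妖怪です。"

qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen-14B", trust_remote_code=True)
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print("Qwen tokens:   ", len(qwen_tok.encode(text)))
print("Llama 2 tokens:", len(llama_tok.encode(text)))
# Qwen's broader vocabulary typically needs noticeably fewer tokens for Japanese text.
```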

Training Approach for Nekomata

The Qwen model family from Alibaba Cloud serves as a foundation for training Nekomata models, especially Nekomata-14B and Nekomata-7B. Qwen models are designed to handle multilingual input and to operate across diverse tasks. They have been publicly released with open weights for many sizes, including versions with 7 billion and 14 billion parameters.

Qwen's tokenizer uses a broad vocabulary designed to represent many languages, including Japanese. Compared with some earlier models, this tokenizer can encode text more efficiently, reducing the number of tokens required to represent the same content.

This improved token efficiency helps downstream tasks such as text generation, reasoning, and translation by keeping context within manageable lengths.

The Nekomata Japanese AI model series uses continual pre-training based on the Qwen foundation. Different model sizes were trained using large-scale token datasets. Sequence length, batch size, weight decay, and learning rate schedules were carefully selected.

Compared to pre-training from scratch, continual pre-training required far fewer training hours while achieving stronger performance.

Evaluation results show that larger Nekomata models outperform much larger Llama-based models on Japanese language benchmarks. Smaller Nekomata models also surpassed previous Japanese-focused models of similar size.

These results highlight the effectiveness of combining Qwen foundations with Japanese continual pre-training.

Part 3. Takane Japanese AI Model

The Japanese Takane AI model is a Fujitsu-developed large language model (LLM), and it is particularly optimized for the Japanese language and secure enterprise use. It offers high accuracy in complex business contexts by leveraging Cohere's technology and is designed for sensitive sectors like finance, government, and healthcare, with features for data privacy and customization. It excels in handling Japanese linguistic nuances, supports private deployment, and is part of Fujitsu's Kozuchi AI service.

Takane Japanese AI model

Key Features & Capabilities:

  • Japanese Language Proficiency: Achieves top scores on Japanese benchmarks (JGLUE) by addressing complexities like kanji, honorifics, and omitted subjects.
  • Secure Enterprise Focus: Designed for deployment in private environments (on-premise/hybrid cloud) to protect sensitive business data.
  • Enhanced Accuracy: Uses advanced Retrieval-Augmented Generation (RAG) to reduce hallucinations and improve factual grounding (a generic sketch of this pattern follows the list).
  • Customization: Can be fine-tuned with proprietary company data for specific workflows and tasks.
  • Foundation: Based on Cohere's Command R+ LLM and enhanced with Fujitsu's own training.
  • Target Industries: Ideal for finance, government, healthcare, and legal sectors where accuracy is critical.
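
Fujitsu has not published Takane's internals, so the sketch below only illustrates the generic RAG pattern the list refers to: retrieve the most relevant internal document, then ground the prompt in it before the model answers. The documents, the TF-IDF retriever, and the prompt wording are all stand-ins.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Generic RAG pattern sketch (not Takane's actual implementation).
documents = [
    "2024年度決算報告：売上高は前年比5%増。",
    "社内セキュリティ規程：顧客データは社外に送信してはならない。",
    "新入社員向け経費精算マニュアル。",
]
question = "顧客データの取り扱いルールは？"

# Character n-grams are used because Japanese text has no whitespace word boundaries.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
doc_vectors = vectorizer.fit_transform(documents)
best = cosine_similarity(vectorizer.transform([question]), doc_vectors).argmax()

# The retrieved passage is placed in the prompt so the model answers from it,
# which is what keeps the response factually grounded in this pattern.
prompt = f"次の社内文書に基づいて回答してください。\n文書: {documents[best]}\n質問: {question}"
print(prompt)
```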

Part 4. Sakana Japanese AI Models

Sakana AI is a group of researchers who are tired of designing AI that just sits there and spits out answers. Instead, they look at how things work in nature and try to borrow those ideas.

The name Sakana literally means fish in Japanese. That is not just branding. Fish move in groups, adjust to their surroundings, and react to each other. Sakana AI uses that same thinking when designing AI models. The idea is that intelligence does not come from one rigid system but from interaction, change, and adaptation over time.

Most traditional models are trained once and then frozen. They do what they do, and that's it.

Sakana AI takes inspiration from biological systems instead. In nature, intelligence shifts based on the situation. Groups react together. Patterns evolve. Their Nature-Inspired Intelligence work tries to bring that behavior into AI, so models can behave less like static tools and more like systems that adjust how they reason.

Instead of building one giant model and hoping it does everything well, Sakana AI works on Evolutionary Model Merging.

Sakana Japanese AI model merging

This technique combines multiple Large Language Models into one system. Each model brings its own strengths. Some might reason better, some might write better, and some might handle logic better. Rather than choosing one, they merge them.

They use technical methods like SVD-based mutation, which slightly alters and blends model parameters. The result is a merged model that carries traits from multiple models, similar to how evolution combines traits across generations.
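
As a heavily simplified illustration of the idea, the sketch below blends two weight matrices and perturbs the result in its SVD basis. Real evolutionary merging scores many such candidates with a fitness function and works across whole models; the layer, coefficients, and noise scale here are all arbitrary.

```python
import torch

def merge_and_mutate(weight_a, weight_b, alpha=0.5, noise_scale=0.01):
    """Blend two parent weight matrices, then mutate in the SVD basis (toy version)."""
    merged = alpha * weight_a + (1 - alpha) * weight_b        # simple linear merge
    u, s, vh = torch.linalg.svd(merged, full_matrices=False)
    s = s * (1 + noise_scale * torch.randn_like(s))           # jitter the singular values
    return u @ torch.diag(s) @ vh

# Toy stand-ins for the same projection matrix taken from two parent models.
layer_a = torch.randn(16, 16)
layer_b = torch.randn(16, 16)
child = merge_and_mutate(layer_a, layer_b)
print(child.shape)  # an evolutionary loop would score many such children and keep the best
```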

The AI Scientist

The AI Scientist is one of their most talked-about projects.

Sakana AI Scientist

Instead of giving AI a narrow task, this system is designed to explore research problems on its own. It can form ideas, test them, analyze outcomes, and move forward without being spoon-fed instructions every step.

Continuous Thought Machine (CTM)

The Continuous Thought Machine, or CTM, is about how reasoning happens.

Most models jump straight to an answer. CTM does not. It reasons in internal steps called ticks. Each tick represents a moment of internal processing, similar to how neurons fire together in biological brains.

Sakana AI Continuous Thought Machine

This structure lets the model work through a problem gradually instead of rushing to a conclusion. That makes it useful for problems where intermediate thinking actually matters.
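
To make "ticks" a bit more tangible, here is a toy loop where an internal state is updated several times before any answer is read out. This is only a conceptual sketch of tick-based processing, not Sakana AI's actual CTM architecture.

```python
import torch
import torch.nn as nn

class TinyTickReasoner(nn.Module):
    """Toy illustration of tick-based internal processing, not the real CTM."""

    def __init__(self, dim=32, ticks=8):
        super().__init__()
        self.ticks = ticks
        self.step = nn.GRUCell(dim, dim)     # one "tick" of internal processing
        self.readout = nn.Linear(dim, 1)

    def forward(self, x):
        state = torch.zeros(x.size(0), self.step.hidden_size)
        for _ in range(self.ticks):          # the state evolves over several ticks
            state = self.step(x, state)      # before any answer is produced
        return self.readout(state)

model = TinyTickReasoner()
print(model(torch.randn(4, 32)).shape)  # torch.Size([4, 1])
```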

TreeQuest (Multi-LLM AB-MCTS)

TreeQuest, also known as Multi-LLM AB-MCTS, is about teamwork between models.

Instead of relying on one Large Language Model, TreeQuest coordinates several of them. It uses Monte Carlo Tree Search to decide what to do next. Sometimes it refines an existing idea. Sometimes it explores a new one.

This back-and-forth decision process often leads to stronger results than letting a single model handle everything alone.
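
A drastically simplified version of that refine-or-explore loop is sketched below. The real Multi-LLM AB-MCTS builds an actual search tree with principled exploration; the "models" and scoring function here are placeholders for real LLM calls and evaluators.

```python
import random

# Placeholder "models": in TreeQuest these would be calls to different LLMs.
def model_a(prompt): return prompt + " [A's attempt]"
def model_b(prompt): return prompt + " [B's attempt]"

def score(answer):
    return random.random()               # stand-in for an evaluator or reward model

def search(task, budget=16):
    """Toy refine-or-explore loop inspired by Multi-LLM AB-MCTS, not the real algorithm."""
    candidates = [(score(task), task)]
    for _ in range(budget):
        best = max(candidates)[1]        # current best answer so far
        if random.random() < 0.5:        # refine: build on the current best answer
            draft = random.choice([model_a, model_b])(best)
        else:                            # explore: start a fresh attempt from the task
            draft = random.choice([model_a, model_b])(task)
        candidates.append((score(draft), draft))
    return max(candidates)[1]

print(search("Write a haiku about Mount Fuji."))
```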

Edge and Energy Awareness

The Sakana Japanese AI model team also spends time on Edge and Efficiency, meaning models that can run on smaller devices without heavy hardware.

This matters when systems cannot rely on massive servers all the time. Lower power use and lighter models allow AI to operate in places where resources are limited.

Sakana AI pays close attention to Japan-specific AI, especially language and social needs that are often ignored by global models. This includes real challenges like a shrinking workforce.

At the same time, they work with large Japanese companies such as MUFG, applying their research to areas like finance, manufacturing, and defense. Alongside that, they continue experimenting with Novel Architectures that move away from today's standard model designs.

Part 5. FAQs of Japanese AI Model

Q1. Does the Japanese government have an AI model?

A1. The Japanese government, in partnership with companies like Preferred Networks, is working to create a homegrown AI model to reduce reliance on U.S. and Chinese systems and address security and cultural bias concerns. These efforts involve using powerful infrastructure like the Fugaku supercomputer for training.

Q2. What is the best AI model for Japanese translation?

A2. Claude 3.5 comes out as the best AI model for Japanese translation in hands-on testing, even without any special prompting. With no extra guidance, it handled translations more accurately than the others: the meaning stayed clear, the sentence structure stayed natural, and only minor issues appeared in flow, where it scored slightly below a perfect result.

Q3. Which country is #1 in AI?

A3. The United States currently ranks number one in AI worldwide. It stays far ahead because of strong research activity, money invested in technology, government involvement, and public awareness. China and India follow behind. Some smaller but wealthy countries, like Singapore and the UAE, also perform surprisingly well compared to their size.

Conclusion on Japanese AI Model

Japanese AI models are clearly making headlines right now, and for good reason. Each emerging Japanese AI model discussed in this guide shows a different approach to language, reasoning, and real-world use. Together, they show how fast this space is moving and why Japan's work in AI keeps drawing global attention, both technically and culturally.
