Greetings!
We are using HTML-to-DOCX conversion to convert HTML files containing Japanese characters to DOCX. These HTML files use the “Noto Serif JP” font, which is a Google Font. To make sure it displays correctly on other systems, we embed this font into the DOCX file.
However, the converted DOCX file does not render this font properly. We are also getting the following warning when embedding the font:
“Warning: FontEmbedding - Font subset embedded for font ‘Noto Serif JP’ due to exceeding the maximum allowed size of 41,94,304 bytes.”
This indicates that the font has been subset (the limit above is 4,194,304 bytes, i.e. 4 MiB, written with Indian digit grouping). Even with subsetting, the Japanese characters in the file should still render correctly.
I have provided the input HTML and output DOCX files in the attached zip file. I have also provided a zip file of a sample .NET 8 console app to reproduce this issue if required. If anything else is required, kindly let us know. Looking forward to hearing from you!
Thanks,
Mohit Datta
ActualInputOutput.zip (656.7 KB)
SampleNETConsoleApp => https://drive.google.com/file/d/1Ej9p6ZmQ9YMus0e0HvJ6jDd_oiem83TT/view?usp=sharing
@mohit.datta
To address the issue of the “Noto Serif JP” font not rendering properly in your DOCX files, especially when embedding the font, you can follow these steps:
- Font Embedding: Ensure that you are embedding the font correctly in your DOCX file. The warning you received indicates that the font was subset because its size exceeds the allowed limit. This means that only the glyphs actually used in the document are embedded; the existing text should still display, but characters added later that are not in the subset will not be available from the embedded font.
- Load Noto Fallback Settings: To improve the rendering of Japanese characters, you can use the Noto fallback settings provided by Aspose.Words. This feature lets Aspose.Words use fallback fonts when it renders characters that are not available in the primary font. Here’s how you can implement this in your .NET application:
```csharp
using Aspose.Words;
using Aspose.Words.Fonts;

// Load your document
Document doc = new Document("path/to/your/document.docx");

// Create font settings and load the predefined Noto fallback settings
FontSettings fontSettings = new FontSettings();
fontSettings.FallbackSettings.LoadNotoFallbackSettings();

// Assign the font settings to the document
doc.FontSettings = fontSettings;

// Fallback settings are applied when Aspose.Words lays out (renders) the document,
// so their effect shows up in fixed-page output such as PDF; they do not change
// which fonts are stored in a DOCX file.
doc.Save("path/to/save/modified_document.pdf");
```
- Check Font Sources: Make sure that font sources are set up correctly in your application. You can specify additional font folders where Aspose.Words looks for fonts, which helps ensure that the required fonts are available during conversion and rendering.
- Font Compatibility: If the problem persists, check the font file itself for compatibility issues. Make sure the font file is not corrupted and is supported by Aspose.Words.
- Testing on Different Systems: Since the document is meant to be opened on other systems, check whether “Noto Serif JP” is installed there as well. If it is not, the application has to rely on the embedded font, and if that cannot be used, a substitute or fallback font is displayed instead.
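The font-source check described in the list can be sketched in C#, assuming Aspose.Words for .NET. `FontSourceCheck` is a hypothetical helper and the folder passed to it is a placeholder; the sketch registers the system fonts plus one extra folder and then looks for a font whose family name matches.

```csharp
using System;
using Aspose.Words.Fonts;

// Hypothetical helper: checks that a font family is reachable through the
// configured font sources (system fonts plus one extra folder).
public static class FontSourceCheck
{
    // Search the system fonts and, recursively, the given folder.
    public static FontSettings CreateSettings(string extraFontFolder)
    {
        var settings = new FontSettings();
        settings.SetFontsSources(new FontSourceBase[]
        {
            new SystemFontSource(),
            new FolderFontSource(extraFontFolder, true)
        });
        return settings;
    }

    // Returns true if any configured source reports a font with this family name.
    public static bool IsFontAvailable(FontSettings settings, string familyName)
    {
        foreach (FontSourceBase source in settings.GetFontsSources())
        {
            foreach (PhysicalFontInfo font in source.GetAvailableFonts())
            {
                if (string.Equals(font.FontFamilyName, familyName, StringComparison.OrdinalIgnoreCase))
                {
                    Console.WriteLine($"Found {font.FullFontName} at {font.FilePath}");
                    return true;
                }
            }
        }
        return false;
    }
}
```

For example, `FontSourceCheck.IsFontAvailable(FontSourceCheck.CreateSettings(@"C:\Temp\fonts"), "Noto Serif JP")` should return `true` if that folder holds the font; the resulting settings can then be assigned to `Document.FontSettings`.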
By following these steps, you should be able to resolve the rendering issues with the “Noto Serif JP” font in your DOCX files. If you continue to experience problems, consider reaching out to Aspose support with specific details about your implementation and any error messages you encounter.
@Professionalize.Discourse
This issue occurs because the font files are too large, so the fonts are being subset. However, the subsetting does not seem to be correct, because the Japanese characters don’t match their HTML counterparts.
Kindly let us know whether it would be possible to improve the automatic subsetting in this scenario.
@mohit.datta MS Word limits the size of a font that can be fully embedded. If the font exceeds this size, MS Word embeds only the characters that are used, even when a save option is set to ‘FullEmbedding’. Aspose.Words does the same.
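The limit quoted in the warning at the top of the thread is 4,194,304 bytes, which is exactly 4 MiB (4 * 1024 * 1024). As a rough pre-check, here is a minimal C# sketch; it assumes the limit is compared against the font file’s size on disk, which the warning text does not confirm, and `FontEmbeddingLimit` is a hypothetical name.

```csharp
using System.IO;

// Hypothetical helper. Assumption: the 4,194,304-byte limit from the warning is
// compared against the font file's size on disk.
public static class FontEmbeddingLimit
{
    // 4 * 1024 * 1024 = 4,194,304 bytes (4 MiB), the value quoted in the warning.
    public const long MaxFullEmbedBytes = 4L * 1024 * 1024;

    // "Exceeding" the limit means strictly larger than it.
    public static bool ExceedsLimit(long fontSizeInBytes) => fontSizeInBytes > MaxFullEmbedBytes;

    public static bool ExceedsLimit(FileInfo fontFile) => ExceedsLimit(fontFile.Length);
}
```

For example, `FontEmbeddingLimit.ExceedsLimit(new FileInfo(@"C:\Temp\fonts\NotoSerifJP-Regular.otf"))` (placeholder file name) would return `true` for any font file larger than 4 MiB, which under this assumption means the font is likely to be subset regardless of `SaveSubsetFonts`.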
As far as I can see, the font subset is embedded properly. I have used the following code for testing:
```csharp
using Aspose.Words;
using Aspose.Words.Fonts;

// 'C:\Temp\fonts' folder contains Noto Serif JP font
FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"C:\Temp\fonts", true) });
Document doc = new Document(@"C:\Temp\in.html");
// Embed TrueType fonts and request full (non-subset) embedding; fonts above the size limit are still subset.
doc.FontInfos.EmbedTrueTypeFonts = true;
doc.FontInfos.SaveSubsetFonts = false;
doc.WarningCallback = new FontSubstitutionWarningCallback();
doc.Save(@"C:\Temp\out.docx");
```
Then I use the following code to convert the resulting document to PDF:
```csharp
// Remove all font sources so only embedded fonts are used for rendering.
FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[] { });
Document doc = new Document(@"C:\Temp\out.docx");
doc.WarningCallback = new FontSubstitutionWarningCallback();
doc.Save(@"C:\Temp\out.pdf");
```
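The thread does not show the `FontSubstitutionWarningCallback` class used in both snippets. Below is a minimal sketch of what it might look like, assuming it only prints font-related warnings; the body is a reconstruction, not the class actually used, and its output format simply mirrors the warning quoted at the top of the thread.

```csharp
using System;
using Aspose.Words;

// Hypothetical reconstruction: the class name comes from the snippets above,
// but its body is not shown in the thread.
public class FontSubstitutionWarningCallback : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        // Report font-related warnings only; other warning types are ignored.
        if (info.WarningType == WarningType.FontSubstitution ||
            info.WarningType == WarningType.FontEmbedding)
        {
            Console.WriteLine($"Warning: {info.WarningType} - {info.Description}");
        }
    }
}
```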
As far as I can see, Japanese characters are rendered properly in both the DOCX and the PDF.
out.docx (180.4 KB)
out.pdf (154.9 KB)
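To compare results across machines, it can help to list which fonts a DOCX actually carries. A minimal sketch, assuming Aspose.Words for .NET; `EmbeddedFontReport` is a hypothetical helper, and only the regular style is checked (DOCX stores bold and italic variants of an embedded font separately).

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fonts;

// Hypothetical helper: lists the fonts declared in a document and whether
// regular-style font data is embedded for each of them.
public static class EmbeddedFontReport
{
    public static void Print(string docxPath)
    {
        var doc = new Document(docxPath);
        foreach (FontInfo font in doc.FontInfos)
        {
            // Returns null when no regular-style data is embedded for this font.
            byte[] data = font.GetEmbeddedFont(EmbeddedFontFormat.OpenType, EmbeddedFontStyle.Regular);
            string status = data == null ? "not embedded" : $"{data.Length:N0} bytes embedded";
            Console.WriteLine($"{font.Name}: {status}");
        }
    }
}
```

Running `EmbeddedFontReport.Print(@"C:\Temp\out.docx")` on a document saved with subsetting would be expected to show an embedded size well below that of the full font file.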
Hello @alexey.noskov
Thanks for the quick reply! Regarding the subsetting: I downloaded the DOCX you attached, and on my side the characters still differ from those in the HTML. This is clearly visible when comparing the name in this document (in the top-right column). I have attached a screenshot highlighting the difference in glyphs for reference.
Kindly confirm if you are facing the same at your end as well.
@mohit.datta No, I do not see such a problem on my side. Even in the attached PDF, the characters look the same as in the source HTML.
@alexey.noskov The PDF looks fine in my case as well. The font is not working on our side for DOCX files.
Do you have any idea why the fonts in the DOCX files downloaded on our end do not look the same as in the HTML, and how we can solve this for us and our users?
@mohit.datta It is hard to answer this question. This may be related to the MS Word version or to the rules MS Word applies when a document contains embedded fonts. On my side, both the MS Word and PDF documents look the same.
Could you please try converting the output DOCX document to PDF using MS Word in the environment where fonts are displayed improperly? This will help us to understand which font MS Word uses instead of the embedded ones.
@alexey.noskov Thanks for the suggestion. I tried converting the DOCX to PDF using MS Word, and it uses some unrelated fonts (CIDFont+F1, CIDFont+F2, etc.). I have attached the converted PDF and the screenshot below.
FontIssue.pdf (668.1 KB)
@mohit.datta You are using “Print to PDF”. Could you please try using “Save As PDF”?
@alexey.noskov Oh, okay. I misunderstood.
This is the file I am getting with Save As PDF option.
FontIssue-SaveAsPDF.pdf (717.3 KB)
@mohit.datta Could you please try opening the following document on your side?
ms.docx (3.3 MB)
Does it look correct on your side? This is the same document saved with MS Word with a small modification.
@alexey.noskov Yes, this is looking fine at my side as well.
What is the modification that you did, and would it be possible for us to do this programmatically?
@mohit.datta I modified the document in MS Word (added a whitespace character and saved, then removed it and saved the document again). So I do not think there is a way to resolve the problem programmatically. We will investigate the problem further.
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-28233
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
Thanks for your prompt support, @alexey.noskov! Kindly update us once this issue is fixed.