Convert every document from doc to chm

     

Introduction

Word2CHM is an open-source C# program that converts a Microsoft Word document (in 2000/2003 format) to a CHM document. It requires HTML Help Workshop and Microsoft Word 2003.

This is a screen snapshot.

Background

Many people write customer help documents in Microsoft Word, because Word is well suited to writing documents that include text, images, and tables.


But many customers do not want to read help documents in Microsoft Word format; they prefer the CHM format. So it is useful to convert a Microsoft Word document to a CHM document. This is why I built Word2CHM.

Word2CHM

In Word2CHM, converting a Microsoft Word document to a CHM document takes three steps. The first is to convert the Microsoft Word document to a single HTML file, the second is to split that single HTML file into multiple HTML files, and the third is to compile the multiple HTML files into a single CHM file.

First, Convert a Microsoft Word Document to a Single HTML File

The Microsoft Word application supports OLE Automation, so a C# program can host a Microsoft Word application, open a Microsoft Word binary document, and save it as an HTML file.

Here is some sample C# code that hosts a Microsoft Word application.


C#

private bool SaveWordToHtml(string docFileName, string htmlFileName)
{
    // check doc file name
    if (System.IO.File.Exists(docFileName) == false)
    {
        this.Alert("File \"" + docFileName + "\" not exist!");
        return false;
    }
    // check output directory
    string dir = System.IO.Path.GetDirectoryName(htmlFileName);
    if (System.IO.Directory.Exists(dir) == false)
    {
        this.Alert("Directory \"" + dir + "\" not exist!");
        return false;
    }
    object trueValue = true;
    object falseValue = false;
    object missValue = System.Reflection.Missing.Value;
    object fileNameValue = docFileName;
    // create word application instance
    Microsoft.Office.Interop.Word.Application app =
        new Microsoft.Office.Interop.Word.ApplicationClass();
    // make the Word application visible, so that if something goes wrong
    // and the program quits, the user can close Word manually
    app.Visible = true;
    // open document (the third argument opens it read-only)
    Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(
        ref fileNameValue, ref missValue, ref trueValue, ref missValue,
        ref missValue, ref missValue, ref missValue, ref missValue,
        ref missValue, ref missValue, ref missValue, ref missValue,
        ref missValue, ref missValue, ref missValue, ref missValue);
    // save as a filtered HTML file
    object htmlFileNameValue = htmlFileName;
    object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatFilteredHTML;
    doc.SaveAs(
        ref htmlFileNameValue, ref format, ref missValue, ref missValue,
        ref missValue, ref missValue, ref missValue, ref missValue,
        ref missValue, ref missValue, ref missValue, ref missValue,
        ref missValue, ref missValue, ref missValue, ref missValue);
    // close document and release resources
    doc.Close(ref falseValue, ref missValue, ref missValue);
    app.Quit(ref falseValue, ref missValue, ref missValue);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(doc);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
    return true;
}

In this C# source code, it is important to call the ReleaseComObject function. By calling ReleaseComObject, a program can release the resources used by the Word application.


Many programs that host the Microsoft Word application (or the Excel application) call the application's Quit function when they no longer need it. But sometimes the Word process stays alive afterwards, which can lead to a very serious resource leak. Calling ReleaseComObject reduces this risk.
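The release step can be wrapped in a small guard so that it is safe to call on references that were never assigned. The following is a minimal sketch, not part of Word2CHM: the class name ComCleanup and the method Release are hypothetical, and only Marshal.IsComObject and Marshal.ReleaseComObject are real .NET calls.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not from Word2CHM): releases a COM wrapper only
// when the reference is non-null and really is a COM object.
public static class ComCleanup
{
    // Returns true when a COM reference was released, false otherwise.
    public static bool Release(object obj)
    {
        if (obj == null || !Marshal.IsComObject(obj))
        {
            return false; // nothing to release
        }
        Marshal.ReleaseComObject(obj);
        return true;
    }
}
```

One possible use is to declare the Word application and document variables as null before a try block, do the conversion inside it, and call ComCleanup.Release on both in the finally block, so that the references are released even when opening or saving the document throws.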

Second, Split a Single HTML File into Multiple HTML Files

The HTML file generated by the Word application includes all the content of the Word document. For example, a Word document contains the following content:

I save this document as a filtered HTML file. The HTML file source code is as follows:


XML
File0.html
HTML

C#

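// strBody starts at the opening heading tag; find where that tag ends
int index2 = strBody.IndexOf(">");
// find the matching closing heading tag, e.g. </h1>
int index3 = strBody.IndexOf("</h" + NativeLevel + ">");
// read text in the heading as topic title
string strTitle = strBody.Substring(index2 + 1, index3 - index2 - 1);
// strip any tags nested inside the heading text
while (strTitle.IndexOf("<") >= 0)
{
    int index4 = strTitle.IndexOf("<");
    int index5 = strTitle.IndexOf(">", index4);
    strTitle = strTitle.Remove(index4, index5 - index4 + 1);
}
// skip past the closing tag (5 characters, e.g. </h1>)
strBody = strBody.Substring(index3 + 5);
// assumed: the next heading of the same level starts the next topic
index = strBody.IndexOf("<h" + NativeLevel);
if (index == -1)
    index = strBody.Length;
// read topic content
string strContent = strBody.Substring(0, index);

Using this C# code, Word2CHM splits the HTML file at the HTML heading tags H1, H2, H3, and so on up to Hn, and sets each HTML document's title to the content between the Hn tags.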
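The splitting idea can be tried in isolation on a small HTML string. The sketch below is an illustration under stated assumptions, not the Word2CHM implementation: the names Topic, HeadingSplitter, StripTags, and Split are hypothetical, headings are assumed to be well formed, and only one heading level is handled per call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one topic: a heading title plus the HTML after it.
public sealed class Topic
{
    public string Title;
    public string Content;
}

public static class HeadingSplitter
{
    // Removes every <...> span, leaving only the text between tags.
    public static string StripTags(string html)
    {
        int open;
        while ((open = html.IndexOf('<')) >= 0)
        {
            int close = html.IndexOf('>', open);
            if (close < 0) break; // unbalanced '<': keep the rest as text
            html = html.Remove(open, close - open + 1);
        }
        return html.Trim();
    }

    // Splits body into topics at each heading of the given level (1 for h1).
    public static List<Topic> Split(string body, int level)
    {
        var topics = new List<Topic>();
        string openTag = "<h" + level;
        string closeTag = "</h" + level + ">";
        int start = body.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        while (start >= 0)
        {
            int openEnd = body.IndexOf('>', start);
            if (openEnd < 0) break;
            int close = body.IndexOf(closeTag, openEnd, StringComparison.OrdinalIgnoreCase);
            if (close < 0) break;
            // The next heading of the same level ends this topic's content.
            int next = body.IndexOf(openTag, close, StringComparison.OrdinalIgnoreCase);
            int contentStart = close + closeTag.Length;
            int contentEnd = next >= 0 ? next : body.Length;
            topics.Add(new Topic
            {
                Title = StripTags(body.Substring(openEnd + 1, close - openEnd - 1)),
                Content = body.Substring(contentStart, contentEnd - contentStart)
            });
            start = next;
        }
        return topics;
    }
}
```

For example, calling HeadingSplitter.Split on the string "<h1>One</h1><p>a</p><h1><b>Two</b></h1><p>b</p>" with level 1 gives two topics, titled "One" and "Two", each carrying the paragraph that follows its heading.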

Third, Compile Multiple HTML Files into a Single CHM File

Word2CHM cannot compile multiple HTML files into a single CHM file by itself. It calls HTML Help Workshop to generate the CHM file. HTML Help Workshop is a Microsoft product that compiles multiple HTML files into a CHM file. It saves its settings in a help project file whose extension is hhp. Word2CHM uses the following C# source to generate the HHP file.


C#

// write the help project file using code page 936
using (System.IO.StreamWriter myWriter = new System.IO.StreamWriter(
    strHHP, false, System.Text.Encoding.GetEncoding(936)))
{
    // project options section
    myWriter.WriteLine("[OPTIONS]");
    myWriter.WriteLine("Compiled file=" + System.IO.Path.GetFileName(strCHM));
    myWriter.WriteLine("Contents file=" + System.IO.Path.GetFileName(strHHC));
    myWriter.WriteLine("Default topic=" + this.DefaultTopic);
    myWriter.WriteLine("Default Window=main");
    myWriter.WriteLine("Display compile progress=yes");
    myWriter.WriteLine("Full-text search=" + (this.FullTextSearch ? "Yes" : "No"));
    myWriter.WriteLine("Binary TOC=" + (this.BinaryToc ? "Yes" : "No"));
    myWriter.WriteLine("Auto Index=" + (this.AutoIndex ? "Yes" : "No"));
    myWriter.WriteLine("Binary Index=" + (this.BinaryIndex ? "Yes" : "No"));
    myWriter.WriteLine("Title=" + this.Title);
    // list of topic files to compile, without duplicates
    myWriter.WriteLine("[FILES]");
    foreach (CHMNode node in nodes)
    {
        if (HasContent(node.Local))
        {
            if (myFiles.Contains(node.Local) == false)
                myFiles.Add(node.Local);
        }
    }
    foreach (string fileName in myFiles)
        myWriter.WriteLine(fileName);
}

Word2CHM also generates an HHC file to describe the topic structure of the CHM file. Word2CHM builds the HHC content as XML, using the following C# code.


C#

System.Xml.XmlDocument doc = RootElement.OwnerDocument;
// each level of the topic tree is a UL element
System.Xml.XmlElement ulElement = doc.CreateElement("UL");
RootElement.AppendChild(ulElement);
foreach (CHMNode node in nodes)
{
    // each topic is an LI element holding a sitemap OBJECT
    System.Xml.XmlElement liElement = doc.CreateElement("LI");
    ulElement.AppendChild(liElement);
    System.Xml.XmlElement objElement = doc.CreateElement("OBJECT");
    liElement.AppendChild(objElement);
    objElement.SetAttribute("type", "text/sitemap");
    AddParamElement(objElement, "Name", node.Name);
    if (HasContent(node.Local))
        AddParamElement(objElement, "Local", node.Local.Replace("\\", "/"));
    if (HasContent(node.ImageNumber))
        AddParamElement(objElement, "ImageNumber", node.ImageNumber);
    // recurse into child topics
    if (node.Nodes.Count > 0)
        ToHHCXMLElement(node.Nodes, ulElement);
}

After generating the HHP file and the HHC file, Word2CHM calls HHC.exe to open the HHP file and generate the CHM file. HHC.exe is usually located in the directory "C:\Program Files\HTML Help Workshop". Here is the C# source that generates the CHM file.


C#

// quote the HHP path so that spaces in it survive command-line parsing
System.Diagnostics.ProcessStartInfo start = new System.Diagnostics.ProcessStartInfo(
    compilerExeFileName, "\"" + strHHP + "\"");
// run the compiler without a window and capture its console output
start.UseShellExecute = false;
start.CreateNoWindow = true;
start.RedirectStandardOutput = true;
start.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
System.Diagnostics.Process proc = System.Diagnostics.Process.Start(start);
proc.PriorityClass = System.Diagnostics.ProcessPriorityClass.BelowNormal;
this.strOutputText = proc.StandardOutput.ReadToEnd();

With these three steps completed, Word2CHM has converted the Word document to a CHM file.
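The compiler call in the third step can also be wrapped so that the caller learns whether a CHM file was actually produced. The following is a hedged sketch and not the Word2CHM code: the names HelpCompilerRunner, QuoteArgument, and TryCompile are hypothetical, and the choice to judge success by the output file's timestamp rather than the compiler's exit code is an assumption of this sketch.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical wrapper around the help compiler call (not from Word2CHM).
public static class HelpCompilerRunner
{
    // Wraps a path in double quotes so that spaces survive command-line parsing.
    public static string QuoteArgument(string path)
    {
        return "\"" + path + "\"";
    }

    // Runs the compiler on the HHP file and captures its console output.
    // Returns true only if the CHM file exists afterwards and was (re)written.
    public static bool TryCompile(string compilerExeFileName, string hhpFileName,
                                  string chmFileName, out string output)
    {
        output = string.Empty;
        if (!File.Exists(compilerExeFileName) || !File.Exists(hhpFileName))
        {
            return false; // nothing to run or nothing to compile
        }

        // Remember the old timestamp so that a stale CHM is not mistaken for success.
        DateTime? before = File.Exists(chmFileName)
            ? File.GetLastWriteTimeUtc(chmFileName)
            : (DateTime?)null;

        var start = new ProcessStartInfo(compilerExeFileName, QuoteArgument(hhpFileName));
        start.UseShellExecute = false;
        start.CreateNoWindow = true;
        start.RedirectStandardOutput = true;

        using (Process proc = Process.Start(start))
        {
            output = proc.StandardOutput.ReadToEnd();
            proc.WaitForExit(); // make sure the compiler has finished writing
        }

        return File.Exists(chmFileName)
            && File.GetLastWriteTimeUtc(chmFileName) != before;
    }
}
```

Waiting for the process to exit before checking the output file avoids reading a CHM file that the compiler is still writing.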

