Introduction to the Knowledge Builder

This topic provides an introduction to the Knowledge Builder feature, indexing, and expression criteria.

Overview

Knowledge Builder is the document search, retrieval, and viewing component of Cypress. Knowledge Builder enables any authorized user in your business to search, retrieve, and view documents stored in a Cypress DocuVault, regardless of the platform or application used to create the document. Knowledge Builder also offers a variety of features for retrieved documents (e.g., printing, copying, saving, and finding specific keywords), and it adds indexing capabilities for a variety of Cypress modules.

Knowledge Builder simplifies the task of finding and using the knowledge contained within documents. Users no longer need to keep physical files and folders stacked on their desks. Users no longer need to know the location of an electronic file in order to view its contents. Users no longer need the original application to view a document. Administrators no longer need to implement multiple storage solutions unique to particular platforms or applications. Knowledge Builder provides a single, common method for searching, retrieving, and viewing all corporate knowledge, even if documents are created on a variety of platforms, including mainframe, UNIX, AS/400, and Windows.

Think of the Knowledge Builder as a window on the Cypress DocuVault. The DocuVault is the repository (archive) for all documents processed by Cypress. The DocuVault stores all information necessary to recreate a document with 100% fidelity. Documents are stored as individual pages, but the collection of pages is logically grouped within the DocuVault as a document. When you search a DocuVault using Knowledge Builder, you are actually searching and retrieving individual pages that contain the information you are looking for. Your search results return a document, but the contents of that document will be only those pages that match your search criteria. This approach ensures that you get only the information you need, and eliminates time wasted searching through unwanted pages of information.

Knowledge Builder provides an interface for indexing, searching the contents of one or more DocuVaults, refining those searches (if desired), and displaying the retrieved pages of information. Knowledge Builder is also used to create indexes that can be applied to report files processed by Cypress’s Report Distribution Manager (RDM) and the resulting subreports.

Indexing is a way to associate information on a page with a search term or category (e.g., date, account number, or product ID). Indexes create a link between search terms and pages stored in a DocuVault. For example, you might want to retrieve all pages stored in a DocuVault related to a specific account. To retrieve these pages, instruct Knowledge Builder to search the Account Number index for a specific string. Any page in the DocuVault whose Account Number index is set to that string is returned for viewing.
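
The link between index values and pages can be pictured as a lookup table. The following Python sketch is only an illustration of that idea, not how Cypress stores its indexes; the in-memory structure, the function names, and the sample account numbers are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical in-memory model (not the Cypress storage format):
# (index name, value) -> set of (document id, page number).
index = defaultdict(set)

def add_index_entry(index_name, value, doc_id, page):
    """Link one index value to one stored page."""
    index[(index_name, value)].add((doc_id, page))

def search(index_name, value):
    """Return the matching pages grouped by document, pages in ascending order."""
    results = defaultdict(list)
    for doc_id, page in sorted(index.get((index_name, value), ())):
        results[doc_id].append(page)
    return dict(results)

# Sample data: invented account numbers on invented pages.
add_index_entry("Account Number", "10042", doc_id=1, page=3)
add_index_entry("Account Number", "10042", doc_id=1, page=7)
add_index_entry("Account Number", "20077", doc_id=1, page=4)
add_index_entry("Account Number", "10042", doc_id=2, page=1)

print(search("Account Number", "10042"))  # {1: [3, 7], 2: [1]}
```

Note that page 4 of document 1 is absent from the result: only the pages whose index value matches are returned, grouped under their document.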

As documents are processed by Cypress, they are indexed according to the requirements you’ve defined. You can instruct Cypress to index every word in a document (called full-text indexing) and/or index only specific strings according to a custom index you create (called data indexing). Additionally, Cypress automatically indexes select attributes for every document it processes regardless of the methods you choose (system indexes). Manual indexing is also supported, allowing Cypress users to associate their own index terms with documents and pages, regardless of whether the term appears in the document or page.

Implementing an Indexing Strategy

You should consider how to best index documents based on the volume of data you will be storing in a DocuVault and the needs of your users. These are some suggested strategies, depending on the size of your site and the types of indexes you expect to implement.

Large-scale Customers

Large-scale customer sites typically choose full-text indexing for general LAN-based office documents and data indexing for large report applications processed by RDM. While this approach requires users to know which search method to use for certain documents, it pays off in storage efficiency.

For example, indexing every word in a large report application that runs every day can result in indexing far more data than is required, and will consume a good deal of disk space. Data indexing, on the other hand, indexes only specific information on a page, not every word on a page. This selective indexing is made possible by a Cypress construct called a region. A region is an area (X,Y coordinate pairs) on a page whose contents are captured by Cypress. Regions are used to extract data from a page to be associated with a search term (2002/11/16, for example, can be stored in the Date index). However, regions must appear in the same location throughout a document. While regions are simple to use for highly structured report files and forms, they are not ideal for general office documents because the structure and organization of these documents vary greatly.
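
To make the idea of a region concrete, here is a minimal Python sketch that captures the text inside a fixed rectangle on a page. It assumes a page is a list of fixed-width text lines and that coordinates are character columns and line numbers; the Region class, the extract_region function, and the sample page are hypothetical and do not represent the Cypress region format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """Hypothetical region: a rectangle in character cells, bounds inclusive."""
    left: int
    top: int
    right: int
    bottom: int

def extract_region(page_lines, region):
    """Return the trimmed text inside the region, rows joined by single spaces."""
    rows = page_lines[region.top:region.bottom + 1]
    cells = [row[region.left:region.right + 1].strip() for row in rows]
    return " ".join(cell for cell in cells if cell)

# Sample page: the heading is padded to column 30 so the date always
# starts at column 36, as a structured report would guarantee.
page = [
    "Branch Loans Summary".ljust(30) + "Date: 2002/11/16",
    "Account   Balance",
    "10042     1,250.00",
]
date_region = Region(left=36, top=0, right=45, bottom=0)

print(extract_region(page, date_region))  # 2002/11/16
```

The sketch also shows why regions suit structured reports: the same coordinates return the date only as long as every page places it at the same position.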

Small- to Medium-sized Customers

Small- to medium-sized customer sites might wish to consider full-text indexing all documents. Full-text indexing generally offers the best ease of use, as users simply enter the words that must, may, or must not appear on a page. Even though you will be indexing more data than necessary for highly structured report applications, index data is highly compressed and is unlikely to impose additional storage requirements.
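
The must, may, and must not word lists can be sketched as a simple page filter. This Python example is an assumption-laden illustration: the page_matches function is hypothetical, matching is taken to be case-insensitive on whole words, and "may" words are assumed to be required (at least one of them) only when no "must" words are given. The source does not specify how Cypress treats any of these points.

```python
import re

def page_matches(page_text, must=(), may=(), must_not=()):
    """Return True if the page satisfies a must / may / must-not word query."""
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    # Every "must" word has to appear on the page.
    if any(w.lower() not in words for w in must):
        return False
    # No "must not" word is allowed to appear on the page.
    if any(w.lower() in words for w in must_not):
        return False
    # Assumption: with no "must" words, at least one "may" word has to appear.
    if may and not must:
        return any(w.lower() in words for w in may)
    return True

print(page_matches("Quarterly loan summary for branch 12",
                   must=["loan"], must_not=["draft"]))  # True
print(page_matches("Draft loan summary",
                   must=["loan"], must_not=["draft"]))  # False
```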

Implementing Full-text Indexing and Data Indexing

If you plan to index your documents using both techniques, make it clear to your users which types of documents were text-indexed and which types were data-indexed. Although text queries can be performed on any document, it is generally much more efficient to retrieve by data index if data indexes are available.

Implementing End-User (Manual) Indexing

Knowledge Builder enables documents and pages to be manually indexed, allowing users to create and associate their own index terms with documents and pages being viewed. These terms can then be saved in the full-text index or a data index. If you do not want to give users the ability to add terms to the indexes, you need to clear the Append option on the associated Security tab.

System Indexes

This table describes the built-in system indexes that you can use to create Knowledge Builder or Cypress.Web queries:

Index	Description	Data Type
Document Class	Defines common attributes and rules, such as indexes, retention, security, etc.	Numeric
Document Class Revision Number	The unique sequence number for a Document Class.	Numeric
Document Creation Time	The time at which a document was captured in a DocuVault.	Timestamp
Document Creator	The name of the logged-on user who submitted the document to Cypress.	Text
Document Database Name	The name of the DocuVault where the document was originally archived. The Database Name of the document may not be the same as the DocuVault where the document resides due to document replication. If document replication is enabled, you must use both the Document Database Name and Document ID to uniquely identify a document.	Text
Document Distribution Group	The name of the RDM report group that processed the document.	Text
Document Distribution Report	The name of the RDM report definition that processed the document.	Text
Document Distribution Sub-Report	The name of the RDM subreport definition that processed the document.	Text
Document ID	The unique identifier of a document within one DocuVault. You can use this index in query criteria to find a document. If document replication is enabled, you must use both the Document Database Name and Document ID to uniquely identify a document.	Numeric
Document Logical Page Count	The number of logical pages within the document. Logical pages are single, nonphysical pages that are fully composed. You can have one or more logical pages positioned on a presentation page.	Numeric
Document Partition Number	The number of the partition that contains this document. Defaults to 0 if the document is not in a partition.	Numeric
Document Presentation Page Count	Reserved for future use. Currently returns the same value as Document Production Page Count.	Numeric
Document Production Page Count	The number of production pages within the document. A production page is a page (side) delivered to a destination, and can be composed of one or more logical pages. For example, a duplex document with four logical pages on each side results in two production pages.	Numeric
Document Scheduled Deletion Time	The time at which the document is to be deleted from a DocuVault.	Timestamp
Document System Type	The system type of the document: 1 = Cypress DDOC document; 2 = original source file that you can launch in its native application; 3 = discrete Cypress DDOC document; 6 = transparent original source file that you can output to a printer using the transparent device DLL (you cannot launch this file in its native application).	Numeric
Document Title	The file name of the document submitted to Cypress.	Text
Email Body Full Text	The full text index for the e-mail body.	Text
Email Subject Full Text	The full text index for the e-mail subject.	Text
Email Sender Address	The e-mail address of the sender.	Text
Email Sender Domain	The domain address portion of the sender’s e-mail address.	Text
Email Receiver Address	The receiver’s e-mail address, which includes the addresses listed in the To, Cc and Bcc fields.	Text
Email Receiver Domain	The domain address portion of the receiver’s e-mail address.	Text
Workflow Document Class	Used to identify the workflow in document-created events.
Workflow Document Status	Used for polling archived documents.
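
The relationship between the logical and production page counts in the table can be checked with a little arithmetic. This Python sketch assumes every production page carries the same number of logical pages, except possibly the last one; the function name is hypothetical and the calculation is not taken from Cypress itself.

```python
def production_page_count(logical_pages, logical_pages_per_side):
    """Return the number of sides needed to hold the given logical pages."""
    if logical_pages < 0 or logical_pages_per_side < 1:
        raise ValueError("page counts must be non-negative, with at least one page per side")
    # Integer ceiling division: a partly filled side still counts as a page.
    return (logical_pages + logical_pages_per_side - 1) // logical_pages_per_side

# The example from the table: four logical pages on each side of one duplex sheet.
print(production_page_count(8, 4))  # 2
```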

Expression Criteria

Cypress uses expression criteria to control many aspects of the system. For example, you can use expression criteria to identify formats when you enable Auto Format, to identify reports and subreports, to split subreports, or to index, archive, and deliver the documents created by a Report Distribution Manager subreport.

You can use a simplified programming language to create expression criteria. Expression criteria can be simple, using operators similar to those used in C and C++. For example, if you want one of your subreports to create a new document each time a page contains the heading Branch Loans Summary, you could enter this expression in the subreport’s Break Criteria field:

region_text("Heading") == "Branch Loans Summary"

String Operators

This table describes the operators you can use with strings when creating expression criteria:

Operator	Description
==	Equal to
!=	Not equal to
<	Less than
<=	Less than or equal to
>	Greater than
>=	Greater than or equal to
match	Like
mismatch	Not like

The match and mismatch operators work only with strings. Like == and !=, they test for equality, but they treat all occurrences of the asterisk (*) and the question mark (?) in the second operand as wildcard characters. The asterisk will match zero or more occurrences of any character, and the question mark will match a single occurrence of any character.
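
The wildcard behavior described above can be mimicked in a few lines of Python. This is a sketch of the semantics as stated here, not the Cypress implementation: the function names simply mirror the operator names, and case-sensitive comparison is an assumption.

```python
import re

def match(text, pattern):
    """Return True if text equals pattern, treating * and ? in pattern as wildcards."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more of any character
        elif ch == "?":
            parts.append(".")    # exactly one of any character
        else:
            parts.append(re.escape(ch))
    # fullmatch keeps this an equality test: the whole string must fit the pattern.
    return re.fullmatch("".join(parts), text, flags=re.DOTALL) is not None

def mismatch(text, pattern):
    """Return the negation of match."""
    return not match(text, pattern)

heading = "Branch Loans Summary"
print(match(heading, "Branch*"))               # True
print(match(heading, "Branch ????? Summary"))  # True
print(match(heading, "Loans"))                 # False
print(mismatch(heading, "*Deposits*"))         # True
```

Because the operators test for equality, a pattern without wildcards has to equal the entire string; "Loans" alone does not match the heading, while "*Loans*" would.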

To include a quotation mark as part of a string literal, use two sets of quotation marks together. For example, if you want to include the quotation marks in the string “Alice in Wonderland”, enter this in the expression criteria:

""Alice in Wonderland""

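If expression criteria are generated by a script, the doubling rule above can be applied mechanically. The Python helpers below are hypothetical and only double or undouble the quotation marks inside the text; whether the result must also be wrapped in enclosing quotation marks is not stated here, so the sketch leaves that out.

```python
def double_quotes(text):
    """Double every quotation mark so it is treated as part of the string."""
    return text.replace('"', '""')

def undouble_quotes(text):
    """Reverse double_quotes: collapse each doubled quotation mark to one."""
    return text.replace('""', '"')

print(double_quotes('"Alice in Wonderland"'))  # ""Alice in Wonderland""
```
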
Numeric Operators

This table describes the operators you can use with numbers when creating expression criteria:

Operator	Description
==	Equal to
!=	Not equal to
<	Less than
<=	Less than or equal to
>	Greater than
>=	Greater than or equal to
+	Plus
-	Minus
*	Multiplied by
/	Divided by
%	Modulo

Boolean Operators

A Boolean expression is any expression that can evaluate to true or false. This table describes the operators you can use on Boolean expressions when creating expression criteria:

Operator	Description
&&	AND
||	OR
!	NOT