Oracle8i interMedia Text Reference
Release 8.1.5

A67843-01

Library

Product

Contents

Index

Prev Next

D
Loading Examples

This appendix provides examples of how to load text into a text column. It also describes the structure of ctxload import files:

SQL INSERT Example

A simple way to populate a text table is to create a table with two columns, id and text, using CREATE TABLE and then use the INSERT statement to load the data. This example makes the id column the primary key, which is required constraint for a Text table. The text column is VARCHAR2:

create table docs (id number primary key, text varchar2(80));

To populate the text column, use the INSERT statement as follows:

insert into docs values(1, 'this is the text of the first document');
insert into docs values(12, 'this is the text of the second document');

SQL*Loader Example

The following example shows how to use SQL*Loader to load mixed format documents from the operating system to a BLOB column. The example has two steps:

Creating the Table

This example loads to a table articles_formatted created as follows:

CREATE TABLE articles_formatted ( 
  ARTICLE_ID   NUMBER PRIMARY KEY , 
  AUTHOR       VARCHAR2(30), 
  FORMAT       VARCHAR2(30), 
  PUB_DATE     DATE, 
  TITLE        VARCHAR2(256), 
  TEXT         BLOB 
);

The article_id column is the primary key, which is required in a Text table. Documents are loaded in the text column, which is of type BLOB.

Issuing the SQL*Loader Command

The following command starts the loader, which reads the control file LOADER1.DAT:

sqlldr userid=demo/demo control=loader1.dat log=loader.log

Example Control File: LOADER1.DAT

This SQL*Loader control file defines the columns to be loaded and instructs the loader to load the data line by line from LOADER2.DAT into the articles_formatted table. Each line in LOADER2.DAT holds a comma separated list of fields to be loaded.

-- load file example
INFILE 'loader2.dat'
INTO TABLE articles_formatted
APPEND
FIELDS TERMINATED BY ','
(article_id SEQUENCE (MAX,1),
 author CHAR(30),
 format,
 pub_date SYSDATE,
 title,
 ext_fname FILLER CHAR(80),
 text LOBFILE(ext_fname) TERMINATED BY EOF)

This control file instructs the loader to load data from LOADER2.DAT to the articles_formatted table in the following way:

  1. The ordinal position of the line describing the document fields in LOADER2.DAT is written to the article_id column.

  2. The first field on the line is written to author column.

  3. The second field on the line is written to the format column.

  4. The current date given by SYSDATE is written to the pub_date column.

  5. The title of the document, which is the third field on the line, is written to the title column.

  6. The name of each document to be loaded is read into the ext_fname temporary variable, and the actual document is loaded in the text BLOB column:

Example Data File: LOADER2.DAT

This file contains the data to be loaded into each row of the table, articles_formatted.

Each line contains a comma separated list of the fields to be loaded in articles_formatted. The last field of every line names the file to be loaded in to the text column:

Ben Kanobi, plaintext,Kawasaki news article,../sample_docs/kawasaki.txt,
Joe Bloggs, plaintext,Java plug-in,../sample_docs/javaplugin.txt,
John Hancock, plaintext,Declaration of Independence,../sample_docs/indep.txt,
M. S. Developer, Word7,Newsletter example,../sample_docs/newsletter.doc,
M. S. Developer, Word7,Resume example,../sample_docs/resume.doc,
X. L. Developer, Excel7,Common example,../sample_docs/common.xls,
X. L. Developer, Excel7,Complex example,../sample_docs/solvsamp.xls,
Pow R. Point, Powerpoint7,Generic presentation,../sample_docs/generic.ppt,
Pow R. Point, Powerpoint7,Meeting presentation,../sample_docs/meeting.ppt,
Java Man, PDF,Java Beans paper,../sample_docs/j_bean.pdf,
Java Man, PDF,Java on the server paper,../sample_docs/j_svr.pdf,
Ora Webmaster, HTML,Oracle home page,../sample_docs/oramnu97.html,
Ora Webmaster, HTML,Oracle Company Overview,../sample_docs/oraoverview.html,
John Constable, GIF,Laurence J. Ellison : portrait,../sample_docs/larry.gif,
Alan Greenspan, GIF,Oracle revenues : Graph,../sample_docs/oragraph97.gif,
Giorgio Armani, GIF,Oracle Revenues : Trend,../sample_docs/oratrend.gif,

Structure of ctxload Thesaurus Import File

The import file must use the following format for entries in the thesaurus:

phrase
 BT broader_term
 NT narrower_term1
 NT narrower_term2
. . .
 NT narrower_termN

 BTG broader_term
 NTG narrower_term1
 NTG narrower_term2
. . .
 NTG narrower_termN

 BTP broader_term
 NTP narrower_term1
 NTP narrower_term2
. . .
 NTP narrower_termN

 BTI broader_term
 NTI narrower_term1
 NTI narrower_term2
. . .
 NTI narrower_termN

 SYN synonym1
 SYN synonym2
. . .
 SYN synonymN
 
 USE synonym1  or  SEE synonym1 or  PT  synonym1

 RT related_term1
 RT related_term2
. . .
 RN related_termN

 SN text

 language_key term
phrase

is a word or phrase that is defined as having synonyms, broader terms, narrower terms, and/or related terms.

In compliance with ISO-2788 standards, a TT marker can be placed before a phrase to indicate that the phrase is the top term in a hierarchy; however, the TT marker is not required. In fact, ctxload ignores TT markers during import.

A top term is identified as any phrase that does not have a broader term (BT, BTG, BTP, or BTI).


Note:

The thesaurus query operators (SYN, PT, BT, BTG, BTP, BTI, NT, NTG, NTP, NTI, and RT) are reserved words and, thus, cannot be used as phrases in thesaurus entries.  


BT, BTG, BTP, BTI

are the markers that indicate broader_termN is a broader (generic|partitive|instance) term for phrase.

NT, NTG, NTP, NTI

are the markers that indicate narrower_termN is a narrower (generic|partitive|instance) term for phrase.

If phrase does not have a broader (generic|partitive|instance) term, but has one or more narrower (generic|partitive|instance) terms, phrase is created as a top term in the respective hierarchy (in an interMedia Text thesaurus, the BT/NT, BTG/NTG, BTP/NTP, and BTI/NTI hierarchies are separate structures).

SYN

is a marker that indicates phrase and synonymN are synonyms within a synonym ring.


Note:

Synonym rings are not defined explicitly in interMedia Text thesauri. They are created by the transitive nature of synonyms.  


USE SEE PT

are markers that indicate phrase and synonym1 are synonyms within a synonym ring (similar to SYN).

The markers USE, SEE or PT also indicate synonym1 is the preferred term for the synonym ring. Any of these markers can be used to define the preferred term for a synonym ring.

RT

is the marker that indicates related_termN is a related term for phrase.

SN

is the marker that indicates the following text is a scope note (i.e. comment) for the preceding entry.

broader_termN

is a word or phrase that conceptually provides a more general description or category for phrase. For example, the word elephant could have a broader term of land mammal.

narrower_termN

is a word or phrase that conceptually provides a more specific description for phrase. For example, the word elephant could have a narrower terms of indian elephant and african elephant.

synonymN

is a word or phrase that has the same meaning for phrase. For example, the word dog could have a synonym of canine.

related_termN

is a word or phrase that has a meaning related to, but not necessarily synonymous with phrase. For example, the word dog could have a related term of wolf.


Note:

Related terms are not transitive. If a phrase has two or more related terms, the terms are related only to the parent phrase and not to each other.  


language_key term

term is the translation of phrase into the language specified by language_key.

Alternate Hierarchy Structure

In compliance with thesauri standards, the load file supports formatting hierarchies (BT/NT, BTG/NTG, BTP, NTP, BTI/NTI) by indenting the terms under the top term and using NT (or NTG, NTP, NTI) markers that include the level for the term:

phrase
   NT1 narrower_term1
      NT2 narrower_term1.1
      NT2 narrower_term1.2
          NT3 narrower_term1.2.1
          NT3 narrower_term1.2.2
   NT1 narrower_term2
   . . .
   NT1 narrower_termN

Using this method, the entire branch for a top term can be represented hierarchically in the load file.

Usage Notes for Terms in Import Files

The following conditions apply to the structure of the entries in the import file:

Usage Notes for Relationships in Import Files

The following conditions apply to the relationships defined for the entries in the import file:

Examples of Import Files

This section provides three examples of correctly formatted thesaurus import files.

Example 1 (Flat Structure)

cat
 SYN feline
 NT domestic cat
 NT wild cat
 BT mammal
mammal
 BT animal
domestic cat
 NT Persian cat
 NT Siamese cat
wild cat
 NT tiger
tiger
 NT Bengal tiger
dog
 BT mammal
 NT domestic dog
 NT wild dog
 SYN canine
domestic dog
 NT German Shepard
wild dog
 NT Dingo

Example 2 (Hierarchical)

animal
   NT1 mammal
        NT2 cat
             NT3 domestic cat
                  NT4 Persian cat
                  NT4 Siamese cat
             NT3 wild cat
                  NT4 tiger
                       NT5 Bengal tiger
        NT2 dog
             NT3 domestic dog
                  NT4 German Shepard
             NT3 wild dog
                  NT4 Dingo
cat
 SYN feline
dog
 SYN canine

Example 3

35MM CAMERAS
 BT MINIATURE CAMERAS
CAMERAS
 BT OPTICAL EQUIPMENT
 NT MOVING PICTURE CAMERAS
 NT STEREO CAMERAS
LAND CAMERAS
 USE VIEW CAMERAS
VIEW CAMERAS
 SN Cameras with through-the lens focusing and a range of 
 SN movements of the lens plane relative to the film plane
 UF LAND CAMERAS
 BT STILL CAMERAS

Structure of ctxload Text Load File

The ctxload text load file must use the following format for each document, as well as any structured data associated with the document:

<TEXTSTART: col_name1=doc_data, col_name2=doc_data,...col_nameN=doc_data>
text. . . 
<TEXTEND>

where:

<TEXTSTART: ... >

is a header marker that indicates the beginning of a document. It also may contain one or more of the following fields used to specify structured data for a document:

col_name

is the name of a column that will store structured data for the document.

doc_data

is the structured data that will be stored in the column specified in col_name.

text

is the text of the document to be loaded or the name (and location, if necessary) of an operating system file containing the text to be loaded.


Note:

The data in text (either a string of text or a file name pointer) is treated by ctxload as literal data, including any non-alphanumeric characters or blank spaces that may occur. As a result, you must ensure that text exactly represents the data you wish ctxload to process.

For example, if you use ctxload to load text from separate files, the file names in the load file must exactly represent the name(s) of the operating-system file(s) containing the text. If any blank spaces are included in a file name, ctxload cannot locate the file and the text is not loaded.  


<TEXTEND>

indicates the end of the document.

Load File Structure

The following conditions apply to the structure of the load file:

Load File Syntax

The following conditions apply to the syntax utilized in the text load file:

Example of Embedded Text in Load File

The following example illustrates a correctly formatted text load file containing structured employee information, such as employee number (1000, 1024) and name (Joe Smith, Mary Jones), and the text for each document:

<TEXTSTART: EMPNO=1000, ELNAME='Smith', EFNAME='Joe'>
Joe has an interesting resume, includes...cliff-diving.
<TEXTEND>
<TEXTSTART: EMPNO=1024, EFNAME='Mary', ELNAME='Jones'>
Mary has many excellent skills, including...technical,
marketing, and organizational.  Team player.
<TEXTEND>

Example of File Name Pointers in Load File

The following example illustrates a correctly formatted text load file containing structured employee information, such as employee number (1000, 1024) and name (Joe Smith, Mary Jones), and a file name pointer for each document.

<TEXTSTART: EMPNO=1024, EFNAME='Mary', ELNAME='Jones'>
mjones.doc
<TEXTEND>
<TEXTSTART: EMPNO=1000, EFNAME='Joe', EFNAME='Smith'>
jsmith.doc
<TEXTEND>


Note:

To use the load file in this example, you would have to specify the -separate argument when executing ctxload.  







Prev

Next
Oracle
Copyright © 1999 Oracle Corporation.

All Rights Reserved.

Library

Product

Contents

Index