Building Maps of Hyperspace

Célia Ghedini Ralhagif and Anthony G. Cohngif


Division of Artificial Intelligence
School of Computer Studies
University of Leeds, Leeds LS2 9JT, UK
{ghedini, agc}@comp.leeds.ac.uk
tel. +44 (113) 233 5482 fax: +44 (113) 233 5468

Abstract:

This paper addresses some particular issues related to the difficult task of automatically locating and structuring information available on a distributed multimedia hypertext system like the World-Wide Web (WWW). It presents a new approach to this problem which coordinates aspects of automatic computation of relations between nodes in hyperspace through dynamic linking using intelligent mapping of the domain material and by the application of spatial reasoning. The results reported here originate from a multi-disciplinary research that involves cognitive aspects of human memory recovery and association, automatic linking of knowledge from a wide variety of sources (expressed in multiple formats), and an adequate visual interface to display large maps of supporting material.

The original purpose of this research was to help the user during the very early stages of development of knowledge-based systems with an intelligent hypermedia environment. But we also thought it would be very useful to Internet users to have an automatic structurer of information to help with the visualization of the hyperspace and to help focus the search.

Keywords:

attributes, contextual probes, dynamic linking, hypermedia, hyperspace, hypertext, probe, similarity threshold, qualitative spatial reasoning, spatial relations.

Introduction

Despite its information-age pedigree, hypertext as an information retrieval model has a long history. Originally introduced by Vannevar Bush, extended and refined by Theodore Nelson, D. Englebart, and others, hypertext systems were initially defined as a valuable interactive approach for presenting text and graphic information by allowing users to jump from a given subject to related ideas ([5],[9]). Ted Nelson, considered to be the modern originator of the term, tends to include all media in the term `hypertext'. However, purists reserve this term for text-only-based systems.

The basis of hypertext technology has changed little since its launch. The focus continues to be on making information available to people in a way that contributes to their understanding. The advent of powerful personal computers and the ability to handle media such as images, sound and video has made the technology more accessible. With this expanded media capability has come the term hypermedia ([13]).

Before hypertext and hypermedia technology the searching process was limited by its own logic. People had to learn to work within parameters laid down by the programmers who originally developed the tools being used. The requested information could be accessed locally or broadly distributed throughout the Internet. Whatever the case, the steps taken during the search were all pre-defined, forcing people to choose between gradually narrowing choices on fixed screens of menus.

Hypertext and hypermedia are ways to transcend some of the searching process limitations. Using hypertext, you can move through structured information - not sequentially, but by making jumps between the various kinds of data you need. With hypermedia you can link information expressed in multiple formats. One of the major attractions of these technologies is the fact that they support a natural method of processing information. Cognitively speaking, they work rather similarly to the human mind, which does not function in a structured linear way, but rather by means of ``association''.

This aspect of association is crucial since nobody can anticipate every necessary hyper connection. So if the hyper links built are less than optimum, then browsing runs into the same frustrations inherent in the original simple searching processesgif. This research project has concentrated on the association aspect: we have built a model for the automatic computation of ``relatedness'' between hyper links based on conceptual connectivity or semantic proximity (dynamic linking).

In the traditional kinds of hyperspace tools referred to above, the ability to automatically build semantic maps of the hyperspace is missing. However, evidence gained from experiments show that readers of a hypertext information model cognitively represent its structure in their mindsgif. This spatial cognitive map created by readers has implications for the structure of hypertext material, as well as for the types of navigation tools that should be provided. This research project is concerned with the creation of these spatial cognitive maps in the sense that it simulates the mapping process through the use of qualitative spatial reasoning.

An Overview of the Approach

Building an automatic information structurer involves a data collection activity that implies finding ways to link the large amount of information that may be expressed in many different formats. Our approach to this problem is two fold. First we build a method of dynamic linking; secondly, if nodes are viewed as (spatial) regions then these dynamic links can be used as a way of topologically structuring the space (we model the topology of the hyperspace with qualitative spatial relations based on a primitive concept of `connection' between spatial regions).

Dynamic Linking

Dynamic linking is a method for the automatic computation of ``relatedness'' between nodes in hypertext where conceptual connectivity or semantic proximity is considered. This method is specially important in our approach as it deals with large hypertext/hypermedia systems where the activation of a node (a fragment of text, a frame, a screen) will automatically indicate to the user the most relevant nodes without the need for the link between the two nodes ever to have been explicitly coded.

In our approach, the use of dynamic linking to build the hyperspace map is based on a modified version of the computer model of human memory proposed by Hintzman [8]. Since the computer model used is a metaphor for the theory, it is appropriate to briefly overview the theoretical ideas that the model is intended to represent.

As Hintzman points out, his model originated from Richard Semon's Theory [17] which assumed that each learning experience leaves behind its own memory trace or ``engram''. Semon argued that the contents of consciousness are produced by a kind of resonant state (homophonygif) in which the common properties of the traces stand out and their distinctive properties are marked, so that what appears in consciousness is an abstraction, rather than the content of a particular memory trace.

Hintzman's contribution has been to cast this theory into the form of a computer model [7]. The simulation model of episodic memory, dubbed Minerva 2, was applied to the learning of concepts, as represented by the schema-abstraction task. In presenting the model here only the schema-abstraction aspects of his work will be covered.

In a Hintzman model a memory trace is a record of an experience or episode which is represented as a trace or vector, an ordered list of features or attributes. Every conscious experience gives rise to its own memory trace, no matter how similar it may be of an earlier one. Thus, phenomena that are repeated but nevertheless command attention will be represented in memory over and over again.

A probe is an active representation of an experience, in primary memorygif which is communicated in parallel to all traces in secondary memorygif. Each trace is assumed to be activated according to its similarity to the probe. Thus, traces sharing many properties with the probe are activated strongly, whereas traces that overlap little with the probe are activated hardly at all.

Hintzman's simulation model of human memory was used since it offers an approach to the problem of content-addressable retrieval from a large information base. One difference in the representation of attributes is that Hintzman uses n-tuples of +1's, -1's and 0's for each memory trace and probe to represent the features present, not present , or indeterminate. Hintzman was partly concerned with the phenomenon of loss of memory elements so he used a three-state attribute system whereas we represent simply the set of those present. Our representation makes it easier to delete or add attributes to frames, although it fails to distinguish between properties being not present and not known to be present. However, we have not needed to make this distinction in our applications.

We use Hintzman's model in the following waygif. An information base is composed of a set of framesgif (e.g. pages of information). Associated with each frame is a set of attributes. The constituent of a frame and its attributes constitutes a memory trace in Hintzman's model. A probe consists of a set of attributes which are then matched against all traces. The matching traces constitutes the set of dynamic links for the probe. These matching traces can be classified according to the degree of match.

For the process of trace retrieval, an index of similarity is computed between the probe and each trace. Each trace is associated with a frame of information, such as a paragraph, a page or a complete text depending on granularity specifications.

  
Figure 1: Trace Activation.

Fig. 1 shows the process of trace activation. Attributes are and memory traces/frames are shown from top to bottom. Allied to each frame is the URLgif address, the link name and a set of attributes (which are not all presented in the diagram for clarity purposes).

We now turn to the problem of measuring the degree of activation of a trace since the usefulness of a model (similarly to human memory) for patterned information depends on its ability for selective recall of the desired items. Thus, associative recall ought to consider some form of the concept of similarity [11]. We use the Tanimoto Similarity (S) measure in a similar manner to the work described in Chapter 12, Towards Intelligent Hypertext of [14]. If x and y are the set of attributes of the probe and memory trace respectively, then this similarity equation has the following form:

This function is independent of the length of the attribute sets and we have adopted it as one basic measure of frame activation. Note that and the degree of similarity grows directly with the value of S (e.g. represents higher similarity than S = 0).

In our similarity algorithm for each frame retrieval, an index of similarity is computed between the set of attributes from the probe and each frame. In this way, the set of activated frames constitutes the set of dynamic links associated with a particular probe. The user can set a similarity threshold below which frames are regarded as not activated. We might notice that the whole activation process is highly context-specific: how a concept emerges will vary according to which episodic or memory traces have been activated by the probe.

Qualitative Spatial Relations

We now turn to the problem of intelligent mapping of the domain material already created by dynamic linking which is done through the application of a theory of qualitative spatial reasoning. This intelligent navigation aid is intended to give a bird's-eye view of the content of a large multi-dimensional information space. This bird's-eye view is intended to help users during the navigation process.

When computing the similarity index between attributes (list of features) of a frame and the probe set, operations are carried out which can be used to induce a topological structure on the information space. A set of spatial relations are used based on the theory of space and time developed in a series of papers. The revised theory can be found in [16]gif. The theorygif has evaluated, extended and implemented a theory of space and time based on Clarke's ([2],[3]) calculus of individuals based on ``connection'' and is expressed in the many sorted logic LLAMA ([4]).

The basic part of the theory assumes a primitive dyadic relation: read as ` connects with ' which is defined on regions. Two axioms are used to specify that is reflexive and symmetric. In terms of points incident in regions, holds when the topological closure of the regions and share a common point. With the relation , eight jointly exhaustive and pairwise disjoint dyadic relations are defined. These relations describe differing degrees of connection between regions from being externally connected to sharing mutual parts and being identical.

In our application (similarly to Lehmann's work [12]) we currently use just five basic spatial relationsgif formed by disjoining certain pairs of the eight mentioned above. Given the set of attributes from an activated frame and the set of attributes from the probe we may say that the two sets either coincide , overlap , have proper containment , as is asymmetric supports an inverse , or are disjoint (empty intersection).

These five predicates can be formally defined as below, where and are auxiliary predicates used to define the others:

The qualitative closeness between the set of attributes from activated frames and the set of attributes from the probe corresponds to the left-to-right order in Fig. 2. Although, all relations may provide information to the user we may say that to have an empty intersection is worst, to overlap is better, to have proper containment is still better, and to coincide is perfection. Proper containment and its inverse are incomparablegif.

  
Figure 2: This figure depicts the possible transitions between the five binary spatial relations, assuming a notion of continuity [12].

Integrating Dynamic Linking and Spatial Relations

The use of Hintzman's simulation model of human memory in a hypermedia environment can be integrated nicely with the above spatial reasoning techniques. The idea of `relatedness' expressed by dynamic linking of frames of information can be viewed as implementing a conceptual `connection' between them, and thus providing an interpretation for the relationship.

A framework which computes similarities between frames of information, making use of their attributes in the exploitation of conceptual connectivity is useful to build an automatic map of hyperspace which is intended to help to direct the user.

Up to now we have considered a probe as an external new region with attributes itself (see Fig. 1). But we can change our view slightly if we consider contextual probes instead, as show in Fig. 3. Thus if a particular frame `Frame/Trace 2' is of interest at some time, we may want to know what other frames are related to it. In this case the attributes of `Frame/Trace 2' are the probe. A variant of this is that we might want to apply a `filter' - i.e. ignore some of the attributes or perhaps weight some more strongly than others.

  
Figure 3: Contextual Probe Computation.

The Tanimoto similarity measure with thresholding give us a way to compute which frames are related (and how strongly) to a probe (or to each other) and the spatial predicates then allow further structure to be placed on the information space by classifying links as one of , , or . However, we might want to impose further structure on the information space. Consider, in particular the case of a set of frames all of which are unattached () to each other. Can we impose any structure on these frames? We could define a proximity measure which tell us how `unattached' they are. One way of doing this would be to compute the shortest path between two frames as measured by , but to normalize the weight as a value between 0 and 1 we take its reciprocalgif.

Note that when then . A more sophisticated measure would take account the similarity index as well as the path length. Thus we could define:

Fig. 4 illustrates an example of the use of the equations above. There are four frames where each relation has its own similarity measure. Applying the first equation , and , so the higher the value of the closer the frames are. Applying the second equation: , and which shows that although , . This means that F1 is more proximal to frame F4 than to F3 using whereas they were equidistant under the N measure.

  
Figure 4: Use of proximity measures.

Overview of the System

The main idea of the system is to build the spatial cognitive maps of the hyperspaces influenced by the human cognitive mapping process. For this purpose the system has, first, to learn the internal lexicon of the material stored at the specific site. The internal lexicon is considered people's long-term memory for words and there is evidence to suggest that this is constructed as a network of associations, which provides semantic information [6]. This network of associations is the resulting hyperspace built through the use of the dynamic linking method already described. Secondly, the system has to reason over the hyperspace to build the spatial relations which will result in the spatial cognitive map.

The learning of the internal lexicon is achieved through the automatic extraction of attributes or key words from the homepage file and its links of a specific site. Simple linguistic processing techniques are presently being used for this purpose based on the frequency of content words. A stopping list of function words such as `the', `and', `of', and so on is used to avoid very common words. A program written in Snobol languagegif was developed for the automatic extraction of lexicons. However, the approach used is very naive since linguistic context does not contain semantic information alone, it also contains syntactic or grammatical structure.

An important assumption in the present system is that associative connections are bi-directional since the dynamic linking method applied expresses similarity between frames. This is a very important aspect since unidirectional associative links do not provide a proper basis for semantic models, but for coding syntactic information. In normal language use, syntactic information occurs in a regular order whereas the order of semantic information can vary.

In summary, the implemented system works as follows. From any URL address given as input the system will extract the attributes and build the frames. Through the use of a similarity threshold (which can be set by the user) and dynamic linking techniques the system builds the hyperspace. Over this hyperspace spatial relations will be applied to map the space. The resulting set of spatial relations is displayed graphically in the form of Venn diagrams.

The main implementation has been done using SICStus Prolog 2.1 #9gif under the X Windows environment on a Sun/Sparc Station. In addition to the SICStus Prolog Library, external storage of Prolog clauses (External Database) is used to handle large number of frames and the Graphics Manager is used to deal with windows, views for drawing and structured text displaying buttons, menus and text input. The libWWW-perlgif is used to recover a HTML file via Internet from any URL address. XMosaicgif is used to display the HTML files which are source to the system.

An Illustrative Example

The intention here is to use an example close to people involved with the conference in order to illustrate the system.

Consider the IMI95 URL address (http://www.di.uminho.pt/cnw3.html) as input to the systemgif. The created frames are the ones pointed by the following links: Homepage, Call for Papers, Important Dates, Contact, International WWW Conference Committee, Registration of Interest, Minho University, International WWW Conference Series, Departamento de Informática of Universidade do Minho, General Information, Registration and Accommodation.

Suppose the user is interested in knowing more about paper submission to the conference, so he specifies the probe [paper, information] with no similarity threshold. In other words, no restriction to activate frames dynamically is imposed by the user. The dynamic links and spatial relations resulted are show in Fig. 5. Note that there are no equal frames to the probe, only overlapped, proper contained and disjointed ones.

  
Figure 5: Topological Map with .

Furthermore, if a similarity threshold of, for example 0.05 (), is fixed with the same probe then the resulting diagram is show in Figure 6. Whether with no links were built.

  
Figure 6: Topological Map with .

Note that the creation of this unique dynamic link (i.e. Minho University) is understandable since the user imposed a restriction of connection between the probe and the frames within the hyperspace previously created.

Finally, suppose the user is not interested in a specific probe or even a contextual probe, but wants to visualize the map of the hyperspace created from the IMI95 URL address. Figure 7 illustrates on a Venn diagram format for this case.

  
Figure 7: Venn Diagram of the Hyperspace.

As an alternative to the presented diagram above the same hyperspace is presented through the graph of Figure 8.

  
Figure 8: Node Graph of the Hyperspace. Note that the connections shown are the dynamic links (with no threshold). Each link is labelled by the appropriate spatial relation holding between the linked frames.

Conclusions and Further Work

The figures in this paper were drawn manually from the symbolic information given by the system. The implementation work at the moment is focusing on visualization techniques to display large maps. Methods to construct Venn diagrams for `n' classes [1] and layout algorithms to draw directed graphs are some topics being treated. Also, the visualizations, such as those in figures 5, 6 and 7, do not take account of the similarity closeness on proximity measures. However, one could imagine displaying this information by the degree of overlap of the regions, distance between regions or thickness of the link. It may also be worthwhile considering whether more fine grained spatial relations, such are those proposed in [16] could be usefully employed.

Other very important future work includes refining the actual method of automatic extraction of attributes and references from texts. Better techniques and more intelligent ones should be applied to incorporate domain knowledge to the statistical regularity of `content words' (they vary their occurrence frequency within specific domains as compared to general language). Thus multi-word term lists to the representation of texts may use a framework able to incorporate different sorts of information. The possibility of using semantic information via classification tags present in on-line dictionaries, or via the semantic information available in WordNet [15] should be investigated and evaluated. Also the possibility of using phrases instead of single words as attributes should be taken into consideration. Ultimately corpus based parsing applying statistical methods may be evaluated as a possible solution.

Finally, a very hard problem which is still subject of many research projects is how to deal with nontextual information such as the automatic extraction of attributes from graphics, sound and video frames. This paper reports on work still in its early stages. A prototype implementation has been built in order to perform some experiments and evaluate the techniques. Clearly much further work is required before a really useful tool can be built, but we believe that these ideas may prove to be a useful approach to the challenging task of producing navigation aids for hypertext systems.

References

1
D. E. Anderson and R. B. Angell. ``Venn Diagrams for n Classes'', Journal of Symbolic Logic, vol. 31, pp. 152-153, 1966.

2
B. L. Clarke. ``A Calculus of individuals based on connection'', Notre Dame Journal of Formal Logic, 23(3), pp. 204-218, 1981.

3
B. L. Clarke. ``Individuals and points'', Notre Dame Journal of Formal Logic, 26(1), pp. 61-75, 1985.

4
A. G Cohn. A more expressive formulation of many sorted logic, Journal of Automated Reasoning, 3(2), pp. 113-200, 1987.

5
E. J. Conklin. ``Hypertext: an introduction and survey''. IEEE Computer, 20(9), pp. 17-41, 1987.

6
K. Hapeshi. ``Simulating the Development of the Language Lexicon''. AISB Quarterly, Winter 1994/95 No. 90, 1995.

7
D. L. Hintzman. ``MINERVA 2: A simulation model of human memory''. Behaviour Research Methods, Instruments and Computers, 16:96-101, 1984.

8
D. L. Hintzman. ``Schema abstraction'' in a multiple-trace memory model. Psychological Review, 93:411-28,1986.

9
R. E. Horn. ``Mapping Hypertext'', The Lexington Institute.

10
W. James. The Principles of Psychology. New York: Holt, 1890.

11
T. Kohonen. Content-addressable memories. Springer-Verlag. New York, 1987.

12
F. Lehmann and A. G. Cohn. ``The EGG/YOLK Reliability Hierarchy: Data Translation and Model Integration Using Ordered Sorts with Prototypes''. In Proceedings of the 3 International Conference on Knowledge Management (CIKM94). Gaithersburg, Maryland, ACM Press, 1994.

13
H. Maurer. ``Why Hypermedia Systems are Important''. In Proceedings of the 4 Internation Conference on Computer Assisted Learning. Springer-Verlag, 1992.

14
R. McAllese. Hypertext: Theory into Practice. Intellect Ltd - Blackwell Scientific Publications Ltd, 1989.

15
G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross and K. Miller. Introduction to WordNet: an On-line Lexical Database, (WordNet documentation), 1993.

16
D. A. Randell, A. G. Cohn and Z. Cui. ``An Interval Logic for Space based on `Connection''', John Wiley (ed.), Proceedings ECAI, Chichester, pp. 394-398, 1992.

17
R. Semon. Mnemic Psychology. (B. Duffy, Trans.) London: George Allen & Unwin, 1923. (Original work published 1909).

18
P. Simons. Parts: A Study in Ontology. Clarendon Press, Oxford, 1987.

About this document ...

Building Maps of Hyperspace

This document was generated using the LaTeX2HTML translator Version 95.1 (Fri Jan 20 1995) Copyright © 1993, 1994, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html /a/home/csirisa_b/pg/ghedini/pctex/myarticles/imi95/imi95.tex.

The translation was initiated by C G Ralha on Thu Jun 8 16:28:13 BST 1995

...Ralha
Supported by a scholarship from CNPq of the Brazilian Government and leave of absence from CEF.

...Cohn
Gratefully acknowledges the support of the EPSRC under grant GR/H/78955.

...processes
As a consequence of this various global indexing search engines such as WebCrawler, JumpStation, Aliweb, WWWW (Worm), Wandex and Open Text Web Index have been built.

...minds
Experiments described in Chapter 7, Lost in Hyperspace: Cognitive Mapping and Navigation in a Hypertext Environment of [14].

...(homophony
The term homophony suggests music sung or played in unison, and Semon illustrated the notion with a chorus of voices metaphor.

...memory
Adopted James's [10], terminology to primary memory as the active representation of the current experience.

...memory
Secondary memory is the vast pool of largely dormant memory traces.

...way
This idea is related to the work of a group at The Scottish HCI Centre. They have developed StrathTutor a hypertext system designed for tutoring applications which was described in Chapter 12 - Towards Intelligent Hypertext of [14].

...frames
The frame associated with each memory trace is intended to represent on-line documentation on a specific domain of knowledge expressed through Hyper Text Markup Language (HTML). The HTML format is used in the WWW initiative as one simple format for providing linked information throughout the globe.

...URL
The Uniform Resource Locator (URL) consists of a unique identifier for a file on the Internet. The WWW uses URLs to specify the location of files on other servers. A URL includes the name of the protocol to be used for retrieving the file, the address of the server and the location of the file.

...[#Randell92##1#]
Publication of the Qualitative Spatial Reasoning going at Leeds University can be found at http://agora.leeds.ac.uk/spacenet/publications.html)

...theory
The word `theory' means a set of formal axioms which specifies the property and relations of a collection of entities.

...relations
In fact, strictly speaking these five relations define a mereological system [18] rather than a topological system.

...incomparable
See Figure 4 of [12] for examples.

...reciprocal
Different measures could be applied here to normalize this function, but we have used the reciprocal.

...language
Snobol is a general programming language to manipulate text and search for patterns in a simple and natural manner. The version used was SPARC SPITBOL 3.7(2.47 I/O) Copyright © 1987, 1991 Robert B. K. Dewar and Catspaw, Inc.

...#9
SICStus Prolog is a portable implementation of Prolog, written at SICS ( Swedish Institute of Computer Science).

...libWWW-perl
LibWWW-perl is a library of Perl 4 packages which provides a consistent programming interface to the WWW. It is freely available on the Internet and it is Copyright © 1994 of Regents of the University of California.

...XMosaic
NCSA Mosaic for the X Window System is an Internet information browser and WWW client. It was developed at the National Center for Supercomputing Applications at the University of Illinois in Urbana-Champaign.

...system
The following example used data placed at the referred site on the 2862#62 April 1995. All figures in this section are coloured.



C G Ralha
Thu Jun 8 16:28:13 BST 1995