
University of Technology, Sydney Law Review


Wittfoth, Armin; Chung, Philip; Mowbray, Andrew; Greenleaf, Graham --- "Can One Size Fit All?: AustLII's Point-in-time legislation Project" [2004] UTSLawRw 8; (2004) 6 University of Technology Sydney Law Review 117


CAN ONE SIZE FIT ALL?: AUSTLII’S POINT-IN-TIME LEGISLATION PROJECT

Armin Wittfoth, Philip Chung, Andrew Mowbray, and Graham Greenleaf*

In a previous paper[1] we raised the prospect of applying a uniform approach to representing legislation, especially with a view to constructing a point-in-time legislation navigation system. In this paper, we further explore our representation model and report on some issues encountered in implementing such a system. In particular, we look at the inherent structure of legislation and discuss the “presentational” and “structural” approaches to legislation markup. The importance of structure, how it aids automatic processing, and where it needs to be supplemented with additional information are discussed from the point of view of implementing a historical representation of legislation.

TERMS

In this paper the term “structure” is used in a way that implies containment. Alternatively structure may be conceived of in hierarchical terms. We distinguish between a structural approach and a merely presentational approach.

“Schema” does not refer to XML schema (XSD) documents specifically, but rather to that which XSDs and DTDs describe: a grammar of elements, such that it can be determined where in the structural hierarchy any element can and cannot occur.

“Divisional element” is used to describe any legislative element, such as a chapter, part, division, schedule, etc., into which a legislative instrument might be divided. A divisional element is to be distinguished from a “provision,” an element containing the actual legislative text.

Inherent Structural Representation in Legislation

At least within Australian jurisdictions, legislation is inherently structured and moreover obeys a schema more or less uniform across the various jurisdictions.

At its simplest an act might be nothing more than a simple collection of provisions. However, drafting practices have evolved to a stage at which it would be rare to encounter such structure in all but the shortest of modern acts. Instead acts are usually divided up into parts, schedules, divisions, subdivisions and the like. Of course these divisions need not be applied to every act: chapters appear in some but not others, as do divisions and subdivisions. However, any reasonably experienced user of Australian legislation could make some general observations about the structure of that legislation.

Such a user might describe the general structure in these terms: “An act usually contains a long title, perhaps some chapters, parts, schedules, possibly a dictionary. A chapter contains only parts; a part might contain some divisions, or alternatively sections; a division can contain subdivisions or sections; and so on.” These statements amount to a description of the schema of an act. In making these observations our experienced user of legislation is doing just what an XML designer would do in drawing up a DTD. Thus the above statements might roughly be expressed in a DTD as:

<!ELEMENT act (longtitle, (chapter | part | section)+, schedule*, dictionary?)>

<!ELEMENT chapter (part+)>

<!ELEMENT part (division | section)+>

<!ELEMENT division (subdivision | section)+>

and so on.

This DTD fragment might be read as follows: “There is an element called act. An act contains a long title followed by one or more choices from chapter, part, or section. This in turn is followed by zero or more schedules, and a single dictionary or no dictionary at all. There is an element chapter which contains one or more parts. An element part...” and so on.

And that’s basically half the work of designing an XML representation (at least a naive one) for Australian legislation. As can be seen, such a schema remains true whether or not an act happens to be organised with any particular arrangement of divisional elements.
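As a rough illustration of how such a grammar can be applied mechanically, the following Python sketch encodes the containment rules of the DTD fragment above as a lookup table and checks a nested structure against it. It is a simplification under stated assumptions: ordering and cardinality, which a real DTD validator would enforce, are ignored, and the names ALLOWED_CHILDREN and check are hypothetical.

```python
# Containment rules taken from the DTD fragment above.
# Ordering and cardinality (+, *, ?) are deliberately ignored in this sketch.
ALLOWED_CHILDREN = {
    "act": {"longtitle", "chapter", "part", "section", "schedule", "dictionary"},
    "chapter": {"part"},
    "part": {"division", "section"},
    "division": {"subdivision", "section"},
}


def check(element, children_of):
    """Return the (parent, child) pairs that break the containment rules.

    `element` is a (name, children) tuple. Elements with no entry in
    `children_of` (for example section) are not restricted in this sketch.
    """
    name, children = element
    errors = []
    allowed = children_of.get(name)
    for child in children:
        if allowed is not None and child[0] not in allowed:
            errors.append((name, child[0]))
        errors.extend(check(child, children_of))
    return errors


valid_act = ("act", [("longtitle", []), ("part", [("section", [])])])
invalid_act = ("act", [("longtitle", []), ("chapter", [("section", [])])])

print(check(valid_act, ALLOWED_CHILDREN))    # []
print(check(invalid_act, ALLOWED_CHILDREN))  # [('chapter', 'section')]
```

The second example is rejected because, under the fragment, a chapter may contain only parts.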

At least one alternative schema is in use for Australian legislation, namely the order/rule (or more recently part/rule) arrangement especially employed in drafting rules of courts. However, assuming that this arrangement is itself schematic, this alternative poses no special difficulties, whether dealt with separately or within a single overall schema. In fact it is this kind of schema, where Rule numbers start anew in each Order, which benefits most from expression in a structural format.

It is this view of documents as potentially hierarchical and schematic which was adopted by the designers of SGML.[2] Consequently legislative documents conforming to the structure and schema described above should be seamlessly expressible in SGML or XML.

For example, following the above schema (with the help of some additional elements), the beginning of NSW Act 25 of 1977 might be described in XML as follows:

<act title="Animals Act" jurisdiction="nsw" ref="1977-25">
  <longtitle>
    <text>An Act relating to liability for damage caused by animals.</text>
  </longtitle>
  <part number="1" title="Preliminary">
    <section number="1" title="Name of Act">
      <text>This Act may be cited as the <ref link="nsw1977-25.xml">Animals Act 1977</ref>.</text>
    </section>
    ...
  </part>
  ...
</act>

Listing 1

Alternatively, without reliance upon attributes, and without conforming to the above DTD fragment, the same might be expressed as follows:

<act>
  <jurisdiction>nsw</jurisdiction>
  <title>Animals Act</title>
  <year>1977</year>
  <ref>1977-25</ref>
  <section>
    <title>Long Title</title>
    <text>An Act relating to liability for damage caused by animals.</text>
  </section>
  <part>
    <number>1</number>
    <title>Preliminary</title>
    <section>
      <number>1</number>
      <title>Name of Act</title>
      <text>This Act may be cited as the <ref link="nsw1977-25.xml">Animals Act 1977</ref>.</text>
    </section>
    ...
  </part>
  ...
</act>

Listing 2

Structural v. Presentational Approaches to Legislative Mark up

The distinction between structural and presentational might be taken to imply, misleadingly, that a structured approach is less suited to presentation than a nominally presentational approach, which is not so. SGML was originally designed as part of an integrated text processing system which proceeded on the basis that a distinction between publishing and information retrieval was “artificial,” and that both functions “could best be served by an integrated system.”[3] Part of the attraction of structured mark up languages is that they do lend themselves to the sophisticated presentation of documents.

Nonetheless, it is in practice useful to adopt this terminology to distinguish between the two major forms of marking up text being employed today.

To illustrate what is meant by a presentational approach, the beginning of NSW Act 25 of 1977 might be expressed using XML, but without (much) structure, as follows:

<act jurisdiction="nsw" ref="1977-25">
  <act-title>Animals Act 1977</act-title>
  <section-heading>Long Title</section-heading>
  <section-text>An Act relating to liability for damage caused by animals.</section-text>
  <part-heading>Part 1 &mdash; Preliminary</part-heading>
  <section-heading>1.&nbsp;&nbsp;Name of Act</section-heading>
  <section-text>This Act may be cited as the <ref link="nsw1977-25.xml">Animals Act 1977</ref>.</section-text>
  ...
</act>

Listing 3

Note that, in contradistinction to the structured versions above, the title and the text of any provision are separate entities, bearing no particular relationship to each other beyond contiguity. In listing 1, part and section titles are included among the attributes of their respective elements. In listing 2, although the title (along with the other erstwhile attributes) is expressed as an element, these elements are contained within the element they describe, along with the text that might belong to that element. Here, however, elements that bear such special relationships to each other simply float around independently, as structural equals, each a child of the act among so many other children of the act.

Moreover the location of any particular provision (that is to say, the part in which it is contained) is not to be discerned from its structural position, but only from examining elements proximal to that provision, and inferring, for instance, that the preceding part-heading element indicates the active part.

This is exactly the kind of mark up that would be expected to result from the use of a style template on a WYSIWYG word processor. Users of word processors are familiar with the procedure: write a line, select the appropriate “style” and that line has been marked as “Heading1”, or “Section-Title” (depending on the particular style template employed). This is how word processors and style templates are used, day in and day out. Certainly this is the kind of data supplied to AustLII by the majority of Parliamentary Counsel’s Offices (PCOs).

Now it might be argued that the data in listing 3 does conform to some kind of structure after all: that it is, at least, possible to use the information contained in listing 3 to build the structure of listings 1 and 2 by some automated process. However, this is so not because of the structure of that listing, but again because of the schematic nature of Australian legislation.

The more naively a style template represents that structure, the easier it will be to apply the inherent legislative schema to build a structured representation of that legislation. That is to say abstraction, for instance the representation of chapter, part, division, and subdivision titles by some generic divisional title style, will, in the absence of structure, make it all the more difficult to translate a presentational document to a structured one. If the mark up varied too far from the natural schema, it might become more efficient to contextually check the marked up text itself, rather than the “tags”. Fortunately this is unlikely, given that different divisional elements will, in practice, be rendered differently, and thus a presentational style will call for a distinction precisely between those elements.
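To make the idea of applying the implicit schema concrete, the following Python sketch nests a flat, style-tagged sequence of the kind shown in listing 3 into a tree. It is a minimal sketch, not AustLII's converter: the RANK table, the build_tree function and the dictionary layout are all hypothetical, and only the two heading styles of listing 3 are ranked.

```python
# Rank of each heading style: a lower rank contains a higher one.
# Style names follow listing 3; the ranking is an assumption drawn from
# the schema described earlier (a part contains sections).
RANK = {"part-heading": 1, "section-heading": 2}


def build_tree(flat):
    """Nest a flat list of (style, text) pairs under an 'act' root."""
    root = {"kind": "act", "title": None, "text": [], "children": []}
    stack = [(0, root)]  # (rank, node) for each currently open container
    for style, text in flat:
        if style in RANK:
            rank = RANK[style]
            # Close every open container of the same or deeper rank.
            while stack[-1][0] >= rank:
                stack.pop()
            node = {"kind": style.replace("-heading", ""), "title": text,
                    "text": [], "children": []}
            stack[-1][1]["children"].append(node)
            stack.append((rank, node))
        else:
            # Body text belongs to the most recently opened container.
            stack[-1][1]["text"].append(text)
    return root


flat = [
    ("section-heading", "Long Title"),
    ("section-text", "An Act relating to liability for damage caused by animals."),
    ("part-heading", "Part 1 Preliminary"),
    ("section-heading", "1. Name of Act"),
    ("section-text", "This Act may be cited as the Animals Act 1977."),
]
tree = build_tree(flat)
part = tree["children"][1]
print(part["kind"], [child["title"] for child in part["children"]])
```

The nesting is recovered only because the style names distinguish parts from sections. A single generic heading style would leave the RANK table with nothing to work from, which is the difficulty with abstraction described above.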

Examples from New South Wales and Queensland

New South Wales and Queensland are two exemplars of the two approaches discussed above.

QUEENSLAND—PRESENTATIONAL APPROACH

The Queensland PCO supplies AustLII with data in the native format of a proprietary word processing/page layout application. (Indeed most of AustLII’s data streams arrive in either the native or exchange formats of such applications.) While this format is convertible by that application to XML, the resulting XML is not well structured, but reflects the “style-tagging” (that is to say presentational) approach typically applied with most WYSIWYG word processors. The style sheet employed by the Queensland PCO does, however, reflect the traditional elements into which legislation is divided. Consequently the data reflects the implicit legislative schema described above, which in turn allows the legislative structure to be divined from the non-structural data.

Such an approach to building a structural representation of legislation will, in practice, be both less straightforward and less robust. The schema is, after all, implicit and is not enforced as it might be if explicitly expressed as a DTD (or an XSD). There is no strictly “invalid” way to apply any given tag (even though there may, from the point of view of the PCO’s internal guidelines, be a “wrong” way to apply it). A presentational bias might, for instance, lead a drafter to apply a style not exactly meant for a particular element if, in that instance, it results in a better looking output. There may be tags which do not relate to the legislative schema, and sit uneasily within it.

Queensland’s style template makes use of tags such as “Chapter-Heading”, “Part-Heading”, “Schedule-Heading”, “Division-Heading”, “Section-Heading”, “Subsection”, “Paragraph”, “Subparagraph” etc. These are all usefully schematic. What, however, is one to make of a tag such as “Normal-Text”, which is applied in a variety of contexts? Again one might expect that a “Subsection” could only follow a “Section-Heading” tag, or that a “Paragraph” tag would exclusively follow a “Subsection” tag, but this is not so. (Though in this case the usage of “Subsection” employed by the Queensland PCO is not so much aschematic as counter-intuitive.) A “Definition” tag might sometimes appear directly as a child of “Subsection” or “Paragraph”, and sometimes a “Normal-Text” element separates the “Definition” from the “Subsection”, and so on.

Notwithstanding such difficulties the information contained in Queensland data files has proven to be rich enough, and to reflect the implicit schema of legislation enough, to enable (at least across a set of sample acts) the translation of the data into the static version of AustLII’s uniform representation.

NEW SOUTH WALES—STRUCTURAL APPROACH

New South Wales, on the other hand, makes data available to AustLII as XML conforming to its exchange DTD.[4] The exchange DTD is an abstraction of the act, subordleg, and epi DTDs, along with the common leg_shared DTD, used internally by the NSW PCO. These internal DTDs represent more directly the familiar legislative elements.

One of the original objectives of this DTD was to express legislation originating outside New South Wales but attached, as it may be from time to time, to New South Wales legislation.[5] For instance, Commonwealth instruments, or even international agreements, might be scheduled to New South Wales acts. Given this objective the DTD is relatively permissive.

Exchange documents were also seen as the most appropriate form in which to make its legislative data available to users. The thinking here was that the exchange DTD could be kept more or less stable against possible changes to the PCO’s “internal” standards and that external users would only have to deal with a single format, not the three (Act, Subordleg, and EPI).[6]

Additionally the LDDS sought to lead by example, holding out the hope that the exchange DTD might be adopted as a common inter-jurisdictional schema for expressing the internal schemata of the various PCOs who might adopt structured document mark up languages in the future. This would not only facilitate the exchange of legislation between PCOs, but had “the potential to reduce the cost of data management for organisations such as AustLII and commercial publishers who receive legislation data from multiple jurisdictions.”[7]

The abstraction involved in the exchange DTD is to divide legislative elements into two broad, but non-exhaustive, classes, tier and level.[8] Level is intended to mark up “the major hierarchical components of a document”. Such “major hierarchical components” are those that “would [normally] be included in a contents listing for a print or electronic publication.”[9] On the other hand tier applies to hierarchical structures beneath that of level, “such as subclauses, [which] would not appear in a table of contents”.[10]

Presentational considerations are never too far away.

As it happens AustLII’s own “uniform representation” similarly relies on an abstraction involving two broad, but non-exhaustive, classes of legislative elements, segments and clauses. This division, however, is based on a different view of legislative structure from that envisaged by the exchange DTD. This is that legislation is basically divisible into a series of containers which serve purely structural purposes (e.g. chapters, parts, divisions, etc.) and clauses which, along with subclausal elements, contain, or potentially contain, the actual legislative text. Again this is not entirely divorced from presentation, for the clause level is, from AustLII’s perspective, the level at which discrete bits of legislation are served.[11]

Structured Format—What Does it Promise? What Does it Deliver?

As noted above, SGML was devised to satisfy both the requirements of publication (presentation) and text processing more generally.[12] As has been observed, “capturing the logical structure of documents whether it be using XML or SGML or a combination of both, makes possible automatic processing that previously was only a dream.”[13]

The processing of interest here is, of course, that required to establish point-in-time access to legislation. Essentially this is a problem of capturing changes to legislation over time, and then recording these changes, preferably themselves in a structured format, in order to permit the further processing required for any particular point-in-time application. The real question then is not which particular application or user interface (UI) is created, but rather how to generate, from whatever data is available, a document archiving changes to instruments over time. Such a document can potentially lend itself to a variety of point-in-time applications. For example, the data captured in this way may be used to create historical navigation systems, or alternatively time-aware legislation link generation for case law databases. This in turn will depend on refinements to the document format in which case law is expressed, especially as regards the supply of time metadata.

DETECTING LEGISLATIVE CHANGES BY STRUCTURE

A walk through of the structure of an instrument will make it obvious when a provision has been repealed or added and, through the use of textual comparison, where an amendment to an extant provision has been made. For the purposes of this walk through it is assumed that a provision is exclusively identified by its number.

Consider, for example, an imaginary act which in its current structure includes a Part 2, Division 1, containing three sections 17, 18, and 19. If in an updated instrument, a section 17A has been added, this will be obvious from the change in membership of this element of the structure.

Before:                 After:

Part 2                  Part 2
  Div 1                   Div 1
    s17                     s17
    s18                     s17A
    s19                     s18
                            s19

Similarly, the subsequent repeal of section 18 will be immediately apparent once the membership of this division is examined.

Before:                 After:

Part 2                  Part 2
  Div 1                   Div 1
    s17                     s17
    s17A                    s17A
    s18                     s19
    s19

In terms of repeals, it must be noted, common practice is not actually to remove the provision from the Act, but rather textually to amend it simply to read “Repealed.” This change in status might also be recorded as part of the metadata of the element’s mark up. In New South Wales it is now the practice to do both. This kind of repeal is easily dealt with, especially where such metadata is supplied. The point here is that either method of removing a provision (or a divisional element) can be dealt with by employing a structural approach.

Textual amendments can be handled in an analogous manner. That is to say, just as elements which are defined to contain other elements are inspected for changes in membership, elements which are defined to contain text are checked for textual equivalence. In other words, textual amendments can be conceived of in structural terms.
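The comparison described in this walk through can be sketched in a few lines of Python. This is an illustration under the walk through's own assumption that a provision is identified purely by its number. The function name compare_container, the dictionary representation and the sample texts are hypothetical, and a provision whose text reads “Repealed.” is treated the same as one removed outright.

```python
REPEALED_TEXT = "Repealed."


def compare_container(old, new):
    """Compare two versions of one container (for example a division).

    `old` and `new` map provision numbers to provision text. Provisions
    whose text is just "Repealed." are treated as no longer present.
    """
    live_old = {n for n, text in old.items() if text != REPEALED_TEXT}
    live_new = {n for n, text in new.items() if text != REPEALED_TEXT}
    return {
        "added": sorted(live_new - live_old),
        "repealed": sorted(live_old - live_new),
        "amended": sorted(n for n in live_old & live_new if old[n] != new[n]),
    }


# Three successive versions of an imaginary Part 2, Division 1.
v1 = {"17": "text of s17", "18": "text of s18", "19": "text of s19"}
v2 = {"17": "text of s17", "17A": "text of s17A",
      "18": "text of s18", "19": "text of s19"}
v3 = {"17": "text of s17", "17A": "text of s17A",
      "18": "Repealed.", "19": "amended text of s19"}

print(compare_container(v1, v2))
# {'added': ['17A'], 'repealed': [], 'amended': []}
print(compare_container(v2, v3))
# {'added': [], 'repealed': ['18'], 'amended': ['19']}
```

Membership changes and textual amendments fall out of the same pass over the container, which is the sense in which textual amendment can be treated structurally.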

A question might now be raised as to how much this use of structure actually facilitates the analysis of changes to an act’s provisions. After all, a flat comparison for membership and textual amendment could simply be made over the provisions of an instrument as a whole, without dividing an act into its divisional elements. (Such treatment would resemble AustLII’s traditional approach to serving legislation. Attention is focused on serving provisions. Divisional elements are regarded merely as text occurring in a table of contents.)

This is to be contrasted with the situation apropos order/rule (part/rule) type instruments, where the position within the structure, and therefore structure itself, are vital to “identifying” any particular rule. Here the consequence of AustLII’s approach is that the provision number must be “normalised”, by using decimalised names which reflect their position in the structure: Order 23 Rule 4 would be renamed Section 23.4, for instance.

In the first place, this structural approach is capable of capturing changes (such as additions and repeals) to the divisional elements themselves. Once the focus shifts to how instruments change over time, it becomes more convenient to treat these divisional elements as objects in their own right, rather than merely tracking their changes as if they were nothing but textual amendments to the instrument’s table of contents.

Secondly there are clear computational efficiency gains in testing for membership in several smaller divisions as opposed to testing over an instrument as a whole.[14] Given the large number of instruments which must be processed and especially the size of some of those instruments (consider the Water Act 2000 (Qld) which has around 1,100 sections, but is divided into nine chapters, each further divided into numerous parts and divisions), this greater efficiency is potentially significant, even on today’s hardware.

This raises the question of whether the division of an instrument (not organised in rule/order style) into smaller segments actually allows us to make any meaningful comparisons for list membership. Clearly randomly dividing a longer list into several smaller ones, while it might decrease processing time, is in no way guaranteed to result in a true picture of the changes of membership over the list as a whole.

Legislative divisions are not random, and comparisons for membership over time are, at the very least, meaningful. Indeed, if legislative amendment were restricted only to repeals, additions, and textual amendments, these comparisons might exhaustively describe changes over time. However, legislation changes in a number of ways which render a purely structural approach inadequate.

Most obviously, legislative change also involves changes to the structure of an instrument itself, not simply involving the addition or repeal of divisional elements. This immediately suggests the inadequacy of placing total reliance on structure for analysing historical changes of legislation-like instruments.

Consider for example the addition of Division 8A to Part 3 of the Crimes Act 1900, No. 40 (NSW) in July 2002. This amendment basically removes section 60 from Division 8, creates the new Division 8A for what remains basically the same section 60 (there is also some textual amendment of that provision), and surrounds that section with some new related provisions.

Before:                 After:

Part 3                  Part 3
  Div 8                   Div 8
    s58                     s58
    s59                     s59
    s60                   Div 8A
  Div 9                     s60AA
    s61                     s60
                            s60A
                            s60B
                          Div 9
                            s61

From the point of view of a purely structural analysis, however, it appears as if Division 8, section 60 has been repealed and a new Division 8A containing a number of provisions has been added. That is to say the historical connection between the two versions of section 60 (and in fact all versions prior to and after this change) has been lost.

Such structural change might be so extreme as to stymie any attempt at automatically tracking the changes. The structure of an act might be changed entirely, with the numbers of all or nearly all provisions and divisional elements changing simultaneously. This kind of change occurred in New South Wales with the changes to the Criminal Procedure Act 1986, No. 209 in July 2003. A new structure based on chapters (previously parts were the highest divisional element) was introduced, provisions, inter alia, of the old Justices Act 1902, No. 27 were incorporated, and there was wholesale renumbering. Short of the availability of unique identifiers (UIDs) to link the new provisions to their ancestors (many of which are located in a different act), it is difficult to see how any automated process could track these changes. One might question whether an instrument so radically restructured ought even to be considered the same instrument.

Even without changes to the structure, however, there are a number of situations in which the historical relationship between provisions can become confused when (and not only when) relying on a purely structural analysis.

Such confusion could arise in the situation where a particular provision (or divisional element) is repealed and another having the same number is added to the same element. This leads potentially to a type-I error, that is, the newer provision is incorrectly identified as a newer, albeit textually amended, version of the repealed provision. Fortunately it is uncommon for a provision to be repealed from an instrument and an identically numbered provision added in the same publication of an instrument. In other words, an intervening repealed state indicates that two identically numbered provisions are not the same provision at different points in time. Nevertheless this is a situation which implementations of point-in-time legislation systems won’t necessarily handle. (This is not such a problem for navigation applications as a user will quickly perceive the lack of historical linkage between completely unrelated provisions.)

Legislative provisions, or divisional elements, might also be renumbered. This raises the possibility of a type-II error, that is, two “equivalent”, but differently numbered, versions of a provision will not be identified as being the same provision at different points in time. The error arises even if the equivalent, but differently numbered, provisions remain within the same element. Short of the use of (historically) persistent UIDs by the original publishers of the data, their relationship could only be established if the amended legislation carried some pointer connecting the older provision number with the new version. This might be done on the face of the act (e.g. as text appended to the old provision number in a table of contents, directing the reader to the new provision number), or via the mark up. New South Wales legislation ordinarily (that is, outside situations such as that described in relation to the Criminal Procedure Act 1986, above) does both.
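One way to picture how such pointers could preserve identity across renumbering is the following Python sketch. It is purely illustrative: assign_uids, the "uid-n" labels and the input layout are hypothetical, and it assumes the renumbering pointers are supplied with each new version as a mapping from new number to old number.

```python
import itertools


def assign_uids(versions, renumbered):
    """Carry a persistent identifier for each provision across versions.

    `versions` is a list of lists of provision numbers, oldest first.
    `renumbered` holds, for each version after the first, a mapping
    {new_number: old_number} of renumbering pointers.
    Returns one {number: uid} mapping per version.
    """
    counter = itertools.count(1)
    previous = {}
    history = []
    for index, numbers in enumerate(versions):
        pointers = renumbered[index - 1] if index > 0 else {}
        claimed = set(pointers.values())  # old numbers that moved elsewhere
        current = {}
        for number in numbers:
            if number in pointers:
                uid = previous.get(pointers[number])
            elif number in claimed:
                uid = None  # the old holder of this number was renumbered
            else:
                uid = previous.get(number)
            if uid is None:
                uid = f"uid-{next(counter)}"
            current[number] = uid
        history.append(current)
        previous = current
    return history


versions = [["1", "2", "3"], ["1", "2", "2A", "3"], ["1", "2", "3", "4"]]
renumbered = [{}, {"3": "2A", "4": "3"}]  # in version 3, 2A became 3 and 3 became 4
history = assign_uids(versions, renumbered)
print(history[2])
# {'1': 'uid-1', '2': 'uid-2', '3': 'uid-4', '4': 'uid-3'}
```

Without the pointers, the provision numbered 3 in the last version would be taken for the old section 3 rather than for the former section 2A, which combines both kinds of error described above.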

It’s no surprise perhaps that document analysis based purely on structural principles (without the use of UIDs or other metadata) should be inadequate. As Goldfarb, one of the original developers of SGML, noted, “generic coding” (coding a document solely by the structural relationship of various “generic identifiers”, i.e. tags or elements) is “conceptually insufficient”. In reality “documents are complex objects, and they have other attributes that a markup language must be capable of describing.”[15]

This is the reason for the inclusion of “attributes” among the “conceptual and notational tools [SGML] provides for dealing with this hierarchical structure.”[16] Moreover the ability to identify elements uniquely was early recognised as necessary for computational applications of document encoding.[17] This is now reflected by the inclusion, within the definition of SGML and later XML, of ID as a specified data type. The problem of tracking the history of a provision across different versions might be approached as one of the provision’s identity over time.

The extent to which a historical legislation database ought to rely on identity, on structural analysis, or on some other scheme such as tracking changes, and in what combination, depends on the frequency of the various kinds of changes encountered. This is an empirical question which the data being gathered by AustLII in tracking changes to New South Wales legislation may go some way to answering.

If the kind of structural change noted above in the example of section 60 of the Crimes Act (NSW) were the norm, one might question the use of structure at all. If, on the other hand, additions, repeals and textual amendments predominate, it would seem unwise to abandon the ease and efficiency of structural processing. It must be remembered also that the large bulk of an instrument does not change at all from version to version.

Identifying Legislative Elements

The walk through above demonstrated that structural analysis cannot be separated from the question of identifying provisions. Some of the problems encountered demonstrate as much the inadequacy of the means of identifying provisions as they do the inability to rely solely upon “generic coding”.

Users of legislation are in the habit of identifying a provision, or other legislative element, by its number.

AustLII’s current software, for instance, relies on provision numbers as unique identifiers within any instrument, of a given provision. The importance AustLII attaches to provision numbers is reflected in how provisions are served, that is, in a static HTML file named for the particular provision, stored in a directory representing the particular instrument. This means a provision can be accessed via a URL such as:

http://www.austlii.edu.au/au/legis/nsw/consol_act/ca190082/s12.html

which is the URL for section 12 of Act 40 of 1900.

This is arguably one of AustLII’s great strengths, allowing a provision to be easily referenced by users. In moving to any historical representation of legislation the problem of how (if at all), this ease of use feature can be maintained against the need to address equivalent (but differently numbered) provisions over time will need to be solved.[18]

In practice the provision number isn’t even entirely satisfactory in static use as it is not always unique. For example, a number of Queensland acts were, for some time in 2002 and 2003, electronically published with two identically numbered provisions at the end of the instrument. Thus, between November 2002 and May 2003, the Electricity Act 1994 ended in two separate Parts 4, each containing a section 304, like this.[19]

Chapter 14 Transitional and Validation Provisions ...

Part 4 Transitional Provisions of Integrated Planning Act etc....

Section 304 Application of the Acts Interpretation Act, s 20

Part 4 Validation Provision

Section 304 Validation of particular acts by...

Similarly the Workers Compensation Act 1987, No 70 (NSW) had, at one point in time, two divisions 1A under Part 4, apparently a result of an ambiguity in drafting technique. An element inserted between two elements numbered “1” and “2” is properly called “1A” while an element perpended to an element numbered “1” is also called “1A”. This situation would usually be dealt with by calling the perpended element “1AA”.[20]

On one level, recipients of legislative data, such as AustLII, are not really in a position to declare instances such as these to be errors. It does not fall to an external user to correct any apparent error. Downstream publishers are not editors in regard to legislation. Given these irregularities appear on the face of the act it is for AustLII to publish them. For a system, such as AustLII’s, however, which depends upon the uniqueness of provision numbers, this proves to be a challenge.
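A system that depends on unique provision numbers can at least detect such collisions before publishing. The Python sketch below is an illustration only: duplicate_numbers and the (container, number) input format are hypothetical, and the section 303 entry is invented to stand in for the rest of the instrument.

```python
from collections import defaultdict


def duplicate_numbers(provisions):
    """Return {number: [containers]} for numbers used more than once.

    `provisions` is a list of (container, number) pairs in document order.
    """
    seen = defaultdict(list)
    for container, number in provisions:
        seen[number].append(container)
    return {number: where for number, where in seen.items() if len(where) > 1}


# The end of the Electricity Act 1994 (Qld) as described above, preceded by
# one invented entry (section 303) standing in for the rest of the act.
provisions = [
    ("Part 3", "303"),  # hypothetical entry, for illustration only
    ("Part 4 Transitional Provisions", "304"),
    ("Part 4 Validation Provision", "304"),
]
print(duplicate_numbers(provisions))
# {'304': ['Part 4 Transitional Provisions', 'Part 4 Validation Provision']}
```

Detection does not settle what to do next. Since the irregularity appears on the face of the act, the publisher still has to serve both provisions under some distinguishing name.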

Inasmuch as these are drafting errors, this is arguably a non-issue. They are, after all, errors, and do not prove, all things being equal, that provision numbers are not an effective identifier for provisions (the application of UIDs could be similarly flawed). This is an issue, which must nonetheless be kept in mind when designing complex legislative processing systems, which might hang by the slender thread of provision number.

With the introduction of time the shortcomings of provision and divisional numbers become obvious. Renumbering means that an equivalent section can be associated with a variety of numbers over its lifetime. As was noted above, some link, whether in the form of a persistent UID inserted by the original producer of legislation,[21] or, more likely, a reference from the new number back to the old, is in practice required to maintain the historical relationship between the various versions of a renumbered element. The information needed to derive this link, included in a specifically defined tag, is available for New South Wales legislation in the various XML formats, including that conforming to the exchange DTD. This cannot necessarily be guaranteed for the data originating from all Australian jurisdictions.
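The idea of a reference from the new number back to the old can be sketched as follows. This is an illustration only: the representation of a renumbering event as a Python dictionary, the use of None to mark a newly inserted provision, and the section numbers in the example are all assumptions, and do not reflect the tag actually used in the New South Wales data.

```python
def number_history(current_number, renumbering_events):
    """Trace a provision back through renumbering events.

    renumbering_events is ordered oldest first. Each event maps a new
    number to the old number it replaced, or to None if the provision
    was newly inserted by that event. Numbers absent from an event are
    assumed to be unchanged by it. Returns the numbers the provision
    has carried, newest first.
    """
    history = [current_number]
    number = current_number
    for event in reversed(renumbering_events):
        if number not in event:
            continue  # untouched by this event
        old = event[number]
        if old is None:
            break  # the provision did not exist before this event
        number = old
        history.append(number)
    return history


# Hypothetical events: first 11A is renumbered 12; later 12 is
# renumbered 13 and a new, unrelated section 12 is inserted.
EVENTS = [
    {"12": "11A"},
    {"13": "12", "12": None},
]

if __name__ == "__main__":
    print(number_history("13", EVENTS))
    print(number_history("12", EVENTS))
```

Recording insertions explicitly matters because a number can be reused: without the None marker, the new section 12 in the example would wrongly inherit the history of the provision that used to carry that number.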

NSW Exchange documents do include an ID for legislative elements; however, these IDs merely replicate the structure of the instrument. Thus the second subclause of s59 of the Crimes Act 1900, No. 40, has, at the time of writing, an ID value of “pt.3-div.8-sec.59-sub.2”. This is very useful for testing the integrity of the document, that each element is exactly where it should be, and for avoiding the duplication of numbers seen above. But for historical purposes it clearly adds nothing.
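Why such an ID adds nothing for historical purposes can be seen by reconstructing it from the structure alone. The sketch below assumes only the format visible in the example above (steps of the form kind.number joined by hyphens); the function names are hypothetical, and the relocation to a “div.9” is invented for illustration.

```python
def structural_id(path):
    """Build an ID from a list of (kind, number) steps, root first."""
    return "-".join(f"{kind}.{number}" for kind, number in path)


def parse_structural_id(identifier):
    """Invert structural_id. Assumes that kinds and numbers contain
    neither '-' nor '.', which holds for the example in the text."""
    return [tuple(step.split(".", 1)) for step in identifier.split("-")]


# The path to the second subclause of s59, as in the example above.
PATH = [("pt", "3"), ("div", "8"), ("sec", "59"), ("sub", "2")]

# Hypothetical restructuring: the same section moved to a Division 9.
MOVED = [("pt", "3"), ("div", "9"), ("sec", "59"), ("sub", "2")]

if __name__ == "__main__":
    print(structural_id(PATH))   # pt.3-div.8-sec.59-sub.2
    print(structural_id(MOVED))  # pt.3-div.9-sec.59-sub.2
```

Because the ID is a pure function of the element's position, it changes whenever the position changes, and so cannot tie together two versions of an element that has been renumbered or moved.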

As has been seen above, a similar situation can arise upon adopting a structural approach, where, although the number doesn’t change, a restructuring of the instrument can leave it in a different element. Again in the absence of any built-in persistent UID, it is necessary to attempt to locate “missing” elements by other means. Given that these cases only potentially arise where an apparent repeal has occurred, the search for such missing elements is immediately narrowed.
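One possible “other means” is to compare the text of each apparently repealed element with the text of the elements that apparently appeared in the same amendment. The sketch below uses the similarity ratio from Python's standard difflib module. This technique is an assumption introduced here for illustration, not the method used in the project, and the IDs, texts and threshold are hypothetical.

```python
from difflib import SequenceMatcher


def match_relocated(repealed, inserted, threshold=0.9):
    """Pair each apparently repealed element with the most similar
    apparently inserted element, if the similarity reaches threshold.

    Both arguments map an element ID to its text. Returns a dict from
    old ID to new ID for elements that appear to have moved.
    """
    matches = {}
    for old_id, old_text in repealed.items():
        best_id, best_score = None, 0.0
        for new_id, new_text in inserted.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_id, best_score = new_id, score
        if best_id is not None and best_score >= threshold:
            matches[old_id] = best_id
    return matches


# Hypothetical data: one element moved to another division unchanged,
# one element genuinely repealed, one genuinely new.
REPEALED = {
    "pt.3-div.8-sec.59": "Placeholder text of a provision that was moved.",
    "pt.3-div.8-sec.60": "Genuinely repealed.",
}
INSERTED = {
    "pt.3-div.9-sec.59": "Placeholder text of a provision that was moved.",
    "pt.3-div.9-sec.59A": "An entirely different new provision.",
}

if __name__ == "__main__":
    print(match_relocated(REPEALED, INSERTED))
```

Since only apparent repeals need to be examined, the number of pairwise comparisons stays small even for a large instrument.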

Finally it is interesting to note that just as element numbers no longer serve to identify elements within an instrument, once time is taken into consideration, instrument titles (the usual means by which instruments are identified) fail in turn.

Any instrument can, through the course of its life, be known by several names. Thus the Police Act 1990 (NSW), was originally the Police Service Act 1990; the old Auctioneers and Agents Act 1941 (NSW) was first renamed the Auctioneers, Stock and Station and Real Estate Agents Act 1941 and then eventually the Property, Stock and Business Agents Act 1941 (and has now been replaced by the Property, Stock and Business Agents Act 2002); the Indy Car Grand Prix Act 1990 (Qld) is now called the Gold Coast Motor Racing Events Act 1990, to give a few examples.

However in this case, while the instrument name is not capable of identifying a legislative instrument over time, there does exist an in-built unique persistent identifier for instruments in the form of the traditional designation, Act N of YYYY. Thus no matter what name they have been known under, the Police Act has always been NSW Act 47 of 1990, the Auctioneers and Agents Act remained NSW Act 28 of 1941 and the Gold Coast Motor Racing Events Act is still Qld Act 60 of 1990.

Interestingly the file naming conventions of the New South Wales PCO follow this traditional designation. The Crimes Act 1900 (NSW), Act 40 of 1900, ships under the name 1900-40.xml. On the other hand the Queensland PCO bases the file name on the current name of the legislation. Similarly AustLII uses a file naming convention based on a hash of the name and year of any piece of legislation. Clearly the NSW approach is more appropriate for identifying instruments over time. It is necessary to draw a further distinction here, however, because these numbers are unique only within separate sequences of acts and of regulations (i.e. there are acts which have the same year and number as regulations).
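A key built on the traditional designation might look like the following sketch. The class, field and function names are assumptions made for illustration. The designations and names are those given above, and the file stem follows the pattern of the 1900-40.xml example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentId:
    """Persistent key for an instrument: Act N of YYYY, qualified by
    jurisdiction and by kind, since acts and regulations are numbered
    in separate sequences."""
    jurisdiction: str
    kind: str  # "act" or "regulation"
    year: int
    number: int

    def file_stem(self):
        # Year and number only, so an act and a regulation with the
        # same year and number would share a stem.
        return f"{self.year}-{self.number}"


# Names by which each instrument has been known, oldest first.
NAMES = {
    InstrumentId("NSW", "act", 1990, 47): [
        "Police Service Act 1990",
        "Police Act 1990",
    ],
    InstrumentId("Qld", "act", 1990, 60): [
        "Indy Car Grand Prix Act 1990",
        "Gold Coast Motor Racing Events Act 1990",
    ],
}


def find_by_name(name, names=NAMES):
    """Return the first instrument that has ever carried this name."""
    for instrument, known_names in names.items():
        if name in known_names:
            return instrument
    return None


if __name__ == "__main__":
    print(find_by_name("Police Service Act 1990"))
    print(InstrumentId("NSW", "act", 1900, 40).file_stem())
```

Keeping the kind in the key reflects the distinction drawn above: year and number alone do not separate an act from a regulation that carries the same designation.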

Concluding Remarks

A prototype point-in-time legislation navigation system has been implemented using the New South Wales data. This development has led to an appreciation of the use of structured data in automated legislative processing. As the project aims to develop a system potentially capable of capturing the data of all Australian jurisdictions, the question has been raised as to whether “one size can fit all”.

In order to answer this question, the project is examining the data supplied by the Queensland PCO. Unlike the New South Wales data, which is already supplied in a structured format (XML), Queensland data arrives in a word processing format, created using a “style-tagging” or presentational approach. The Queensland data is, nonetheless, promising, and it has been possible to convert a sample set of documents to AustLII’s internal static format. It should be possible to derive the archived format from several such static instances. Moreover a number of common amendment events will be detectable from structural comparison alone. Whether there is sufficient information in the “presentational” data to capture all, or nearly all, amendment events across jurisdictions generally remains to be determined.


*Armin Wittfoth is Project Officer, AustLII; Philip Chung is Executive Director, AustLII and Lecturer in Law, University of Technology, Sydney; Andrew Mowbray is co-Director, AustLII and Professor of Law, University of Technology, Sydney; Graham Greenleaf is co-Director, AustLII and Professor of Law, University of New South Wales.

ACKNOWLEDGEMENT: This research is supported by an Australian Research Council (ARC) SPIRT Grant (now Discovery: Linkage Grants) entitled “Automating Improvements to the Authority and Interactivity of Legislation-Like Instruments, to Benefit Government and Business Performance.”

[1] A Wittfoth, P Chung, A Mowbray, and G Greenleaf, “Towards a Uniform Representation of Multi-Jurisdictional Legislation-Like Instruments” (2001) Proceedings, Law via the Internet Conference 2001, AustLII, University of Technology, Sydney.

[2] GF Goldfarb, “Design Considerations for Integrated Text Processing Systems” (1973) Technical Report No 320-2094 <http://www.sgmlsource.com/history/G320-2094/G320-2094.htm>.

[3] Ibid.

[4] Parliamentary Counsel’s Office, New South Wales, DTD Design Specification (2001) LDDS SGML DTD and Data Conversion Project, Exposure Draft 1.0.

[5] Ibid, section 8.1.1

[6] Parliamentary Counsel’s Office, New South Wales, Functional Requirements for DTD Development (2001) LDDS SGML DTD and Data Conversion Project, Version 4.3, s8.34.2.

[7] Ibid, s8.34.3.c.

[8] NSWPCO, above n 4, s8.1.4.

[9] Ibid, s8.1.4.

[10] Ibid.

[11] The earliest sample data AustLII received from the LDDS was, however, not in exchange format, but in the internal Act format. AustLII’s own DTDs, then, were uninfluenced by the exchange DTD. Instead their development was influenced partly by working with the Act data; partly by the basic design criterion of keeping the internal DTDs as generic as possible, in order to facilitate the expression of data from a range of jurisdictions supplied in a range of formats; and partly by the fact that AustLII regards the clausal level as atomic (ie this is the smallest unit into which an instrument is broken up).

[12] Goldfarb, above n 2.

[13] T Arnold-Moore, ‘XML for Legislation Drafting, Management and Web Delivery – How structured document representation facilitates automatic processing’ (1998) XML’98 <http://www.mds.rmit.edu.au/~tja/papers/XML98.htm>.

[14] List comparison is potentially an $O(n^2)$ operation (given truncating list comparison algorithms, it is actually somewhat better). It should be obvious therefore that the sum of a number ($k$) of smaller list comparisons will require less processing time than a single large comparison over all the elements of the smaller lists; that is, $n^2 > k(n/k)^2 = n^2/k$ for all $k > 1$.

[15] GF Goldfarb, ‘Introduction of Generalized Markup’ (1986) <http://www.sgmlsource.com/history/AnnexA.htm>. Originally published as ‘A Generalized Approach to Document Markup’, SIGPLAN Notices (1981).

[16] Ibid.

[17] Ibid.

[18] The simplest approach might be to use the current version of the legislation, cited with the current section number, as a gateway into the historical database.

[19] This defect in the Electricity Act 1994, 64 was evident in Reprints 4C (1 Nov 2002); 4D (20 Dec 2002); 4E (29 May 2003) and rectified in Reprint 5 (1 July 2003). Another example is the District Court of Qld Act 1967, 42 which ended in two Sections 140 in Reprints 4G (16 August 2002) and 4H (1 May 2003). In Reprint 4I (9 May 2003) this has been rectified with the provisions now numbered ss140 and 141, though curiously in reverse order (ie the old 2nd s140 is now s140 and the old first s140 is now s141).

[20] See for example District Court Rules 1973, 1, 39, and also the example from Pt3 Div8A of the Crimes Act cited above.

[21] NSW Exchange documents do include an ID for legislative elements, however they merely replicate the structure of the instrument. Thus the second subclause of s59 of the Crimes Act has, at the time of writing, an ID value of “pt.3-div.8-sec.59-sub.2”. This is very useful for testing the integrity of the document, that each element is exactly where it should be, and for avoiding the duplication of numbers seen above. But for historical purposes it clearly adds no information not already available from the structure of the document itself.


URL: http://www.austlii.edu.au/au/journals/UTSLawRw/2004/8.html