Ironically, in today’s information world, getting hold of reliable data is not as easy as it should be. There may be several sources to choose from, among international organisations, governments and companies. Too often, though, they produce inconsistent information, even for straightforward numbers like GDP growth. The Internet is awash with suppliers, and though researchers and policymakers turn to (and indeed pay for) the tried and trusted sources, no database can be taken for granted.
Even at the OECD, the quality and reliability of our data depend on handling and processing information that flows in and out of the organisation every day from around the world. Statisticians are not mere number librarians, but have a central role to play in ensuring that the entire process, from conception to dissemination, is robust and reliable, both for themselves and their clients.
Maintaining quality data in the face of heavy demands is an enormous challenge. The work involves defining and gathering the statistics; ensuring relevance, veracity and comparability; providing the right metadata, such as definitions, sources and caveats; and then disseminating and updating, all in a fast-moving technological environment.
Information technology has naturally been a tremendous boon to this work. The OECD is taking advantage of the opportunities it presents, and a number of software tools for managing and exchanging statistical information are being developed. For example, the OECD and the UN Statistics Division share responsibility for collecting annual foreign trade data; when a new system goes into service in 2005, data will be replicated periodically from one site to the other, creating a consistent annual foreign trade database for both organisations.
The OECD’s latest venture is even more impressive: a state-of-the-art statistical information system is now being rolled out, with the aim of improving the efficiency of data collection and eliminating errors and inconsistencies in statistical outputs. Moreover, the new system will shorten publishing cycles, and should make it easier to locate and retrieve our statistical products online.
The architecture of the new statistical information system consists of three layers: a production layer for collection, validation, processing and management of statistical data and related metadata; a storage layer where validated statistics and related metadata are placed; and a dissemination layer for generating statistical publications and interactive electronic products. The three layers are supported by a workflow system that automates statistical and publication processes wherever possible, and tracks the steps involved.
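The flow through the three layers, with the workflow system tracking each step, can be sketched in miniature. All of the names below (produce, store, disseminate, Workflow) are illustrative assumptions, not the OECD's actual interfaces:

```python
# A minimal sketch of the three-layer design: production -> storage ->
# dissemination, with a workflow object recording the steps taken.
# Names and the "validation" rule are invented for illustration.

class Workflow:
    """Tracks the steps a dataset passes through."""
    def __init__(self):
        self.steps = []

    def record(self, step):
        self.steps.append(step)

def produce(raw, workflow):
    """Production layer: collect and validate incoming data."""
    validated = {k: v for k, v in raw.items() if v is not None}  # toy validation
    workflow.record("validated")
    return validated

def store(validated, warehouse, workflow):
    """Storage layer: place validated statistics in the warehouse."""
    warehouse.update(validated)
    workflow.record("stored")

def disseminate(warehouse, workflow):
    """Dissemination layer: render stored statistics for publication."""
    workflow.record("published")
    return ", ".join(f"{k}={v}" for k, v in sorted(warehouse.items()))

wf = Workflow()
warehouse = {}
store(produce({"gdp_growth": 2.1, "cpi": None}, wf), warehouse, wf)
print(disseminate(warehouse, wf))   # gdp_growth=2.1
print(wf.steps)                     # ['validated', 'stored', 'published']
```

The point of the sketch is the separation: each layer sees only the output of the one before it, and the workflow log makes the whole chain auditable.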
The design makes innovative use of new information technology standards, such as Web Services (which enable applications running on diverse computer platforms to communicate with each other) and XML (eXtensible Markup Language, which facilitates the interchange of data among applications). These standards permit a “loose coupling” of the components of the statistical information system. If, for example, it were decided to redevelop the central data warehouse (OECD.Stat) at the heart of the system because of a change in the underlying database software, this could be done without changing either the tools that prepare validated data for the warehouse or the tools that publish validated data from it. Furthermore, the use of Web Services and XML means the statistical information system is fully prepared for global data-sharing initiatives, such as the Statistical Data and Metadata eXchange (SDMX).
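What XML interchange buys in practice is that either side of an exchange can be rewritten independently, so long as the message format holds. The toy format below is in the spirit of SDMX but is not the real SDMX-ML schema; the element and attribute names are assumptions:

```python
# Illustrative only: a toy XML observation message, serialised and
# parsed with the standard library. Any platform that can read XML
# could consume this, regardless of the software that produced it.
import xml.etree.ElementTree as ET

def to_xml(series, year, value):
    """Serialise one observation as a small XML message."""
    obs = ET.Element("Obs", {"series": series, "time": str(year)})
    obs.text = str(value)
    return ET.tostring(obs, encoding="unicode")

def from_xml(message):
    """Parse the message back into native values."""
    obs = ET.fromstring(message)
    return obs.get("series"), int(obs.get("time")), float(obs.text)

msg = to_xml("GDP_GROWTH", 2004, 3.2)
print(msg)            # <Obs series="GDP_GROWTH" time="2004">3.2</Obs>
print(from_xml(msg))  # ('GDP_GROWTH', 2004, 3.2)
```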
OECD.Stat is the central repository or “intelligence warehouse” where validated statistical data and metadata are stored. The OECD.Stat warehouse has been designed to preserve the decentralised nature of OECD directorates’ statistical activities, while making their data and metadata part of a coherent corporate system. It will in due course become the sole coherent source of statistical data and related metadata for the organisation’s statistical publications, whether in print or electronic form.
StatWorks is a tool for managing the organisation’s production databases, with functions including data collection, transformation, processing (e.g. aggregation), validation, and ultimately export of validated data to the OECD.Stat warehouse for dissemination. By providing a standard hosting environment for managing production databases, StatWorks is progressively replacing a multiplicity of software platforms presently in service. The validation process is one of the most important features of StatWorks, while its common toolset helps reduce training and support requirements.
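The article does not describe StatWorks's validation rules in detail, but a typical rule of this kind checks that a reported aggregate matches the sum of its components. The function below is a hypothetical illustration, not the actual StatWorks logic:

```python
# A hypothetical validation rule: flag a series whose components do
# not add up to its reported total, within a rounding tolerance.

def validate_aggregate(components, reported_total, tolerance=0.05):
    """Return (ok, computed_total) for an aggregate-consistency check."""
    computed = sum(components)
    ok = abs(computed - reported_total) <= tolerance
    return ok, computed

# Components sum to the reported total, so the check passes.
ok, computed = validate_aggregate([40.2, 35.1, 24.6], 99.9)
print(ok)   # True
```

Centralising checks like this in one toolset, rather than re-implementing them per database, is what lets validation behave consistently across directorates.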
MetaStore is the OECD’s corporate metadata facility, designed to improve the efficiency of metadata preparation, storage, access, management and incorporation into the OECD’s statistical products. MetaStore is designed to overcome the problems of fragmented metadata scattered across numerous production databases. Its implementation provides a common environment for managing this metadata, helping ensure coherence of definitions, eliminating duplication and resolving inconsistencies. Like OECD.Stat, MetaStore can accommodate any kind of metadata at any level of detail of the corresponding statistics, such as recent updates, methodology, seasonal adjustments, or breaks in series.
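Attaching metadata "at any level of detail" can be pictured as a lookup that merges notes from the dataset level down to the individual observation. The keying scheme below is a simplifying assumption; MetaStore's real model is more elaborate:

```python
# A sketch of level-aware metadata lookup. Keys run from the dataset
# down to a single observation; more specific entries add to (or
# override) more general ones. Example entries are invented.

metadata = {
    ("TRADE",):                   {"methodology": "SITC Rev.3"},
    ("TRADE", "EXPORTS"):         {"seasonal_adjustment": "none"},
    ("TRADE", "EXPORTS", "2003"): {"note": "break in series"},
}

def lookup(key):
    """Collect metadata from the dataset level down to the observation."""
    merged = {}
    for depth in range(1, len(key) + 1):
        merged.update(metadata.get(key[:depth], {}))
    return merged

print(lookup(("TRADE", "EXPORTS", "2003")))
# {'methodology': 'SITC Rev.3', 'seasonal_adjustment': 'none',
#  'note': 'break in series'}
```

Storing each note once, at the most general level it applies to, is what eliminates the duplication the paragraph describes.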
Of course, the OECD is also a major publisher of statistical information, which is where PubStat comes in. Technology can gather the statistical data and metadata for a publication and dispatch them to different uses, whether print or online. And by limiting manual intervention along the way, the margin of error is narrowed. PubStat reduces time-to-publish, not least because statisticians spend less time dealing with dissemination and formatting issues. It provides the database manager with an interface for managing content and generating an output file, combining statistical data and metadata with contemporary, eye-catching publication layout instructions. It is a veritable publishing robot.
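The core move, combining data and metadata with layout instructions into an output file, can be sketched as a simple template render. The template syntax and field names below are invented for illustration and are not PubStat's actual format:

```python
# A toy rendering step in the spirit of PubStat: merge validated data
# and its metadata into a layout template to produce output text.
from string import Template

layout = Template("$title ($unit): $value  [source: $source]")

def render(data, meta):
    """Combine one data point with its metadata under a layout."""
    return layout.substitute(title=meta["title"], unit=meta["unit"],
                             value=data["value"], source=meta["source"])

line = render({"value": 3.2},
              {"title": "GDP growth", "unit": "%", "source": "OECD"})
print(line)   # GDP growth (%): 3.2  [source: OECD]
```

Because the layout is data-driven, swapping the template swaps the output medium (print, web) without anyone retyping the figures, which is where the reduction in manual error comes from.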
This all sounds ideal, if you can find the data. But as everyone knows, locating statistics of interest can be a real challenge for users not already familiar with the structures of the parent production databases. That is why a state-of-the-art presentation tool (“browser”) and new interface are being developed for access to the OECD.Stat warehouse. The new browser will not only be easier to use, but also incorporate advanced search features and allow multiple combinations of data, such as combining environmental indicators with statistics on economic growth.
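The kind of combination the new browser would allow, joining series from different domains by a shared dimension such as country, can be sketched as below. The figures are made up for illustration:

```python
# A sketch of cross-database combination: join an environmental
# indicator with growth figures on their common countries.
# All numbers are invented.

co2_per_capita = {"FRA": 6.2, "DEU": 10.1, "JPN": 9.6}
gdp_growth     = {"FRA": 2.0, "DEU": 1.1, "JPN": 2.7}

combined = {
    country: {"co2_per_capita": co2_per_capita[country],
              "gdp_growth": gdp_growth[country]}
    for country in co2_per_capita.keys() & gdp_growth.keys()
}
for country in sorted(combined):
    print(country, combined[country])
```

A user-facing browser hides the join behind a point-and-click interface, but the underlying operation is this: matching observations across datasets on shared dimensions.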
Putting it all together has been a challenge, and the new statistical information system has been developed following the best professional practices and according to well established IT standards. Each component is thoroughly tested before being put in service. The OECD intends to share its software developments, such as StatWorks and MetaStore and their design principles, with national and international organisations worldwide, particularly where management of decentralised statistical activities is involved. Indeed, sharing arrangements are already under way with FAO, ILO, UNCTAD and UNESCO. Such co-operation will not only minimise duplication of effort, but also, in the spirit of “open source” software, generate feedback so the system can continue to be improved for the benefit of all.
Lee Samuelson, OECD Information Technology and Network Services, and Lars Thygesen, OECD Statistics Directorate
©OECD Observer No 246/247, December 2004-January 2005