Panel Discussion at COLING 2002

Semantic Web: A New Challenge for Language Technology

The vision of the Semantic Web as pictured by Berners-Lee, Hendler and Lassila in their famous 2001 Scientific American article has become a driving force for many large and small initiatives that aim to turn the wealth of nearly unstructured digital information into a semantically structured global knowledge base. New initiatives are underway, welcomed and supported by industry, government agencies and academic associations. The Semantic Web seems set to become the major hype of the first decade of our new millennium.

What is the relevance of the Semantic Web program for our discipline? Can it provide realistic tasks and useful resources that traditional AI could not deliver? Will it render obsolete some of our new language technologies that were dedicated to the exploitation of unstructured data? Will the Semantic Web need automatic language processing in order to succeed?

The panel will concentrate on, but not be restricted to, the following issues:

  1. The employment of language technology for the construction of useful ontologies:
    One of the shortcomings of hand-crafted AI ontologies was their artificial nature. Useful ontologies rarely meet the high aesthetic standards of philosophers or domain-specialized theoreticians. Can data-oriented language technology facilitate the detection of useful ontologies that reflect the needs and daily tasks of their users?
  2. The exploitation of Semantic Web ontologies for LT applications such as information extraction:
    Domain modelling is a serious bottleneck for many language technology applications. Can the Semantic Web movement help us by providing well-designed ontologies for a multitude of knowledge domains?
  3. The challenge of (partially) automating the detection and annotation of concepts:
    One of the major shortcomings of the original Semantic Web vision is its reliance on extensive hand annotation of large volumes of digital resources. As we know from daily experience, content developers (authors) do not even exploit the modest means for encoding meta-information that HTML provides. They do not have the time and patience to find and insert the most useful hyperlinks. How can one expect the web to become semantified by human annotation?
  4. The utilization of the Semantic Web as a resource for machine learning in NLP:
    Supervised learning from hand-annotated texts plays a major role in language technology research and development. Will the Semantic Web movement create large volumes of annotated texts? Can these texts be used for machine learning techniques that improve topic detection, information extraction, question answering and other language technologies? Can systems for automatic annotation be trained in a bootstrapping fashion?
  5. The relationship between the Semantic Web and multilinguality:
    The planned dense semantic markup will facilitate cross-lingual navigation and information retrieval. Will the Semantic Web really contribute to overcoming language barriers by making information more accessible across languages? Will contents in all languages be annotated and crosslinked at the same time and in comparable proportions? What is the role of language technology in this process? Will the Semantic Web help to reduce the knowledge gap among language communities, or will this gap be widened?
  6. The Semantic Web and language variation:
    Most knowledge technologists have given up on the idea of one comprehensive ontology for all users and all purposes. Preference is given today to the vision of a wealth of ontologies with many partial overlaps and mappings. Establishing the association between users, situations and the appropriate ontologies may be an issue for knowledge management, but associating the appropriate ontologies with texts could also become a topic for language technologists. Will ontologies be marked for certain variants of language such as historical variants, sociolects or professional and genre-specific jargons? Can those variants be automatically detected?

Contributors in alphabetical order:
Paul Buitelaar, DFKI, Saarbrücken
Ed Hovy, ISI, Marina del Rey
Chu-Ren Huang, Academia Sinica, Taipei
Nancy Ide, Vassar College, Poughkeepsie

Coordinator:
Hans Uszkoreit, DFKI and Saarland University