Abstract: With the phenomenal growth of the Web, there is an ever-increasing volume of information being published on numerous Web sites. This vast amount of accessible information has raised many new opportunities and challenges for knowledge discovery and data engineering researchers. For programs that seek to analyze Web content, the heterogeneity in authorship and the consequent lack of structure are formidable hurdles. Discovering and extracting novel and useful knowledge from Web sources call for innovative approaches that draw from a wide range of fields spanning data mining, machine learning, statistics, databases, information retrieval, artificial intelligence, and natural language processing.
In Web search, although general-purpose search engines are very useful, finding specific or targeted information can still be a frustrating experience. Highly effective, domain-specific, and personalized search techniques are not yet mainstream. In e-commerce, a whole range of online techniques is also needed to support such applications. For example, in online shopping, there are no human shop assistants to help customers. Instead, automated techniques are needed to learn from the behaviors of users in order to provide effective recommendations and assistance. Mining, extracting, and integrating Web information are challenging problems as well because there is still no mature technique to integrate information from structured (stored databases), ad hoc structured (shopping sites), and unstructured (product reviews) sources. Clearly, format standards for semistructured data will not solve all of these problems.
This special issue of IEEE Transactions on Knowledge and Data Engineering brings together some of the latest research results in the field. It presents seven papers which deal with a wide range of problems. All of the accepted papers propose novel and/or principled techniques to solve these problems.
Of the seven papers, three focus on domain-specific and personalized Web search, one proposes a principled technique for collaborative filtering, one studies Web page cleaning for identifying informative structures and content blocks in Web pages, one studies classification of Web pages based on positive and unlabeled training examples, and one studies the clustering of XML data for efficient storage and querying of such data.
The first paper by Michelangelo Diligenti, Marco Gori, and Marco Maggini studies Web page scoring for Web search and resource discovery. Current methods for this purpose are mainly based on the analysis of hyperlinks. The structure of the hyperlinks is the result of collaborative activities of the community of Web authors. Web authors usually like to link to resources they consider authoritative, and authority emerges from the dynamics of popularity of the resources on the Web. This paper proposes a general probabilistic framework for Web page scoring, based on random walks over links, that incorporates and extends many existing models. Their results show that the proposed framework is effective and is particularly suited for focused or vertical search.
The second paper by Satoshi Oyama, Takashi Kokubo, and Toru Ishida describes an interesting technique for domain-specific Web search. The basic idea is to find a set of domain-specific keywords (which the authors call keyword spices) that can be used as the context of the search queries in the domain. A nice algorithm based on text classification is given for identifying a reasonably complete set of such keyword spices. To perform text classification, it collects training pages from the Web through a search using an initial set of keywords of the domain. The main advantage of the proposed method is that it does not need to collect and index domain-specific pages as most domain-specific search engines do. The work is also related to research in query expansion and modification, but deals with a slightly different problem and offers different approaches.
The third paper by Fang Liu, Clement Yu, and Weiyi Meng also studies Web search, more specifically, personalized Web search. Since general-purpose search engines do not consider a user’s interests, their search results may not be interesting to a specific user. Personalized search aims at carrying out search for each user in a way that incorporates his or her interests. In this paper, the authors propose to employ a user profile and a general profile to constrain the search. The user profile, which is learned from the user’s search history, contains the categories the user is interested in and weighted terms in those categories. The general profile is built using the categories from the Open Directory Project. The key advance of the technique is that it maps each user query to a set of categories. At search time, the system first uses the profiles to infer the categories of the search terms in question. Then, the search terms are augmented with each category as the context to perform search. The search results are then merged to produce a single result ranking. A comprehensive experimental evaluation is described in the paper.
The fourth paper by Hung-Yu Kao, Shian-Hua Liu, Jan-Ming Ho, and Ming-Syan Chen focuses on the cleaning of
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 16, NO. 1, JANUARY 2004
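To make the idea of link-based scoring by random walks more concrete, the sketch below computes scores for a toy link graph using a plain random-surfer model with power iteration. This is the familiar PageRank-style special case, not the general framework proposed in the first paper; the function name, the damping value of 0.85, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
def random_walk_scores(links, damping=0.85, iterations=50):
    """Score pages by a simple random-surfer walk (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    With probability `damping` the surfer follows a random outgoing link;
    otherwise the surfer jumps to a page chosen uniformly at random.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts with its share of the random jumps.
        new_scores = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's score evenly among its outgoing links.
                share = damping * scores[p] / len(targets)
                for t in targets:
                    new_scores[t] += share
            else:
                # A page without outgoing links spreads its score uniformly.
                share = damping * scores[p] / n
                for q in pages:
                    new_scores[q] += share
        scores = new_scores
    return scores


# Hypothetical four-page Web: "c" is linked to by three other pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = random_walk_scores(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph the most heavily linked page, "c", ends up with the highest score, while "d", which no page links to, keeps only its share of the random jumps. A focused or vertical variant would bias the jump or the link-following probabilities toward on-topic pages, which is the kind of extension the paper's framework is said to accommodate.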

Guest Editors' Introduction: Special Section on Mining and Searching the Web
