Abstract: In the practice of data mining (DM) and data warehousing (DWH), real-life data arrive in many different formats, and unless they are put into an acceptable shape, even the most intelligent DM/DWH tool is useless. SumatraTT (Transformation Tool) is an original universal data pre-processing tool that can access and transform data stored in various types of data sources (e.g. plain text, SQL, etc.). We briefly review the concept of the system and summarize its recent developments. The paper outlines the connectivity with inductive logic programming (ILP) systems and then presents more recently added features: new data interfaces, scripting features, and templates. The use of SumatraTT is briefly demonstrated on an example application. After a brief look at near-future plans, we finally discuss some questions that typically arise when SumatraTT is used for the first time.

1 OUR MOTIVATION AND GOALS

DM algorithms [6] are being designed by researchers and software houses all over the world. There are many of them, and their number is continuously growing. Their available implementations differ both in principles and in small details such as the format of the input data. Moreover, the data subjected to DM have most often not been collected primarily for DM purposes; instead, they serve e.g. as a company archive. Consequently, the format of such data usually does not meet the requirements of a specific DM algorithm.

The challenge of a DM task lies in finding the algorithm that will reveal interesting observations in the considered data. To reach this goal, many experiments have to be done: one cannot decide in advance which set of DM tools or which derived attributes will prove most useful for the given problem. Thus the original data have to be processed or transformed in different ways to make them usable by the chosen DM algorithms.
The format of the data has to be changed, and the data have to be cleaned, filtered, aggregated, etc. This is the purpose of data transformation systems, which have recently appeared as independent software tools supporting the DM process itself [14]. They are an important step towards simplifying data preparation and supporting experiments with real-life data. The common weakness of state-of-the-art data transformation systems is their insufficient generality. Our goal is to overcome this problem by designing and developing a system

• that allows virtually any customization with respect to different data standards and transformation requirements,
• but at the same time provides ultimate ease of use in cases where only standard procedures are required.

The former goal can be achieved by providing a data-processing-oriented scripting language, and the latter by providing templates of common procedures and standard interfaces to many kinds of data sources. These ideas form the design principles of the data pre-processing tool SumatraTT, briefly described in the next section.

Most industries benefit from appropriate standardization. The positive reaction of the research community to the activity of the PMML group proves that this is also the case in the research field concerned with decision support systems. The Predictive Model Markup Language is being developed to simplify the exchange and sharing of results "between compliant vendors' applications ... so that proprietary issues and incompatibilities are no longer a barrier" (http://www.dmg.org/pmmlspecs_v2/). A similar view can be taken towards data transformations. We hope that our study of data transformations using SumatraTT will contribute to the development of a standard for data transformation, namely the Data Transformation Markup Language (DTML), supporting e.g. reuse of the same data by different algorithms through seamless import, rapid development of derived attributes, etc.
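The design principle of standard interfaces to many kinds of data sources, together with the cleaning, filtering and aggregation steps listed in this section, can be illustrated by a short Python sketch. It is not SumatraTT's API (SumatraTT is scripted in the Java-like Sumatra language); the names `DataSource`, `CsvSource`, `SqlSource` and `total_by_region` are hypothetical, and the standard-library `sqlite3` module merely stands in for an arbitrary SQL database. The point is only that once every source exposes the same row interface, the transformation no longer depends on where the data are stored.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Protocol


class DataSource(Protocol):
    """Hypothetical uniform access layer: every source yields rows as dicts of strings."""

    def rows(self) -> Iterator[Dict[str, str]]: ...


class CsvSource:
    """Plain-text (CSV) data source; the text is held in memory for brevity."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[Dict[str, str]]:
        yield from csv.DictReader(io.StringIO(self.text))


class SqlSource:
    """SQL data source backed by a sqlite3 connection and a SELECT query."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn = conn
        self.query = query

    def rows(self) -> Iterator[Dict[str, str]]:
        cursor = self.conn.execute(self.query)
        names = [column[0] for column in cursor.description]
        for record in cursor:
            # Render values as strings (NULL -> "") so rows look like CSV rows.
            yield {n: "" if v is None else str(v) for n, v in zip(names, record)}


def total_by_region(source: DataSource) -> Dict[str, float]:
    """Clean, filter and aggregate rows without knowing which source they come from."""
    totals: Dict[str, float] = {}
    for row in source.rows():
        region = (row.get("region") or "").strip().upper()  # clean: normalize the key
        amount = (row.get("amount") or "").strip()
        if not region or not amount:  # filter: drop incomplete records
            continue
        totals[region] = totals.get(region, 0.0) + float(amount)  # aggregate
    return totals


# Usage: the same transformation runs unchanged over two different sources.
csv_source = CsvSource("region,amount\nnorth,10\n south ,5\nnorth,\n,7\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), (" south ", 5), ("north", None), (None, 7)],
)
sql_source = SqlSource(conn, "SELECT region, amount FROM sales")

csv_totals = total_by_region(csv_source)
sql_totals = total_by_region(sql_source)
```

Supporting a further kind of data source then means writing one more class with a `rows()` method while the transformation stays untouched, which is the kind of separation between transformation logic and data connection that the design principles above aim for.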
2 THE CONCEPT OF SUMATRA

SumatraTT (Transformation Tool) is a metadata-driven, platform-independent, extensible, and universal data processing tool [3]. These features have been achieved by building the tool as an interpreter of a transformation-oriented scripting language called Sumatra [2]. Sumatra is a fully interpreted, Java-like language combining data access, metadata access, and common programming constructs. Furthermore, it supports RAD (Rapid Application Development) by means of a library of reusable transformation templates.

The principal scheme of SumatraTT is shown in Figure 1. As can be seen in the figure, the central part of SumatraTT is the Metadata repository module. Basically, the repository plays two roles. First, it is the central storage holding the descriptions of all data sources and data transformations to be used. Second, it contains data objects that connect the abstract data access level of the Sumatra interpreter with real-life data sources. This intermediate layer unifies access to very different data sources (e.g. SQL-based data sources, plain text files, etc.). Such unification makes transformation script development easier and independent of the data source. Moreover, it separates the transformation "logic" from data connection problems.

In the case of a very complicated data pre-processing task, developing a data transformation script can be rather time-consuming. SumatraTT can speed up this process by means of reusable transformation templates. The idea of reusable templates is based on a library of solved types of tasks. For example, suppose there is a data set containing time series and we need to calculate a statistical characterization of the data. If this is carried out for the first time, a new template has to be developed.
The next time, however, a statistical transformation script can be obtained by parametric modification of the existing template in a fraction of the time required before.

Every pre-processing task realized with SumatraTT consists of a design phase and a run-time phase, which correspond to a client-server architecture. The design phase, carried out on the client side, consists of defining all data sources and developing the transformation scripts. Since a typical user is an expert in data mining or data warehousing but not a programmer, the design phase can be carried out using a graphical user interface (GUI). The GUI allows both the data definitions and the scripts to be created interactively by simply clicking through wizards. The run-time phase, on the other hand, corresponds to script execution on the server side. From the user's perspective, the execution can be invoked immediately or scheduled for a later run.
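The idea of a reusable template that is modified only through its parameters can be sketched, by analogy only, in Python. Here a hypothetical template function `time_series_stats_template` is instantiated with parameters (which column, which statistics, what window size) and returns a ready-to-run transformation that characterizes a time series over sliding windows. The catalogue of statistics, the sliding-window characterization and all names are assumptions made for illustration; actual SumatraTT templates are Sumatra scripts edited through the GUI, not Python closures.

```python
import statistics
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

# Hypothetical catalogue of statistics that the template can be parameterized with.
STATISTICS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.mean,
    "stdev": statistics.stdev,  # sample standard deviation, needs >= 2 values
    "min": min,
    "max": max,
}


def time_series_stats_template(
    column: str, stats: Sequence[str], window: int
) -> Callable[[List[Row]], List[Row]]:
    """Instantiate the (hypothetical) template with concrete parameters."""
    unknown = [name for name in stats if name not in STATISTICS]
    if unknown:
        raise ValueError(f"unknown statistics: {unknown}")
    if window < (2 if "stdev" in stats else 1):
        raise ValueError("window too small for the requested statistics")

    def transform(rows: List[Row]) -> List[Row]:
        """Characterize each sliding window of `column` with the chosen statistics."""
        values = [row[column] for row in rows]
        result: List[Row] = []
        for end in range(window, len(values) + 1):
            chunk = values[end - window:end]
            out: Row = {"index": end - 1}  # position of the window's last row
            for name in stats:
                out[f"{column}_{name}"] = float(STATISTICS[name](chunk))
            result.append(out)
        return result

    return transform


# Usage: one template, two parameter sets, no new transformation code.
series = [
    {"temp": 1.0, "pressure": 10.0},
    {"temp": 2.0, "pressure": 12.0},
    {"temp": 6.0, "pressure": 11.0},
    {"temp": 3.0, "pressure": 15.0},
]
temp_summary = time_series_stats_template("temp", ["mean", "max"], window=3)(series)
pressure_summary = time_series_stats_template("pressure", ["min", "stdev"], window=2)(series)
```

Characterizing another series then requires only a new parameter set rather than a new script. In the terms used above, building the closure loosely corresponds to the design phase and calling it to the run-time phase; in SumatraTT these phases are additionally split between client and server.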

SumatraTT: Towards a Universal Data Preprocessor