1. Introduction
Data cleaning and transformation are critical steps in preparing data for analysis. Properly cleaned and transformed data ensures the accuracy and validity of your statistical results. This section covers techniques for identifying and handling missing data, recoding variables, and computing new variables in IBM SPSS Statistics.
2. Identifying and Handling Missing Data
2.1. Identifying Missing Data
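Descriptive Statistics:
Go to Analyze > Descriptive Statistics > Frequencies.
Select the variables to check and click OK. The Statistics table in the output lists the number of valid and missing cases for each variable.
Missing Value Analysis:
Go to Analyze > Missing Value Analysis.
Select the variables to analyze and click OK.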
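The valid/missing counts that these checks report can be sketched outside SPSS as well. The following is a minimal Python illustration, not SPSS syntax; the records, the variable names Age and Gender, and the helper missing_summary are hypothetical, and None stands in for a missing value.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"Age": 34, "Gender": 1},
    {"Age": None, "Gender": 2},
    {"Age": 51, "Gender": None},
    {"Age": 29, "Gender": 2},
]


def missing_summary(rows, variables):
    """Return {variable: (n_valid, n_missing)} for the given variables."""
    summary = {}
    for var in variables:
        n_missing = sum(1 for row in rows if row.get(var) is None)
        summary[var] = (len(rows) - n_missing, n_missing)
    return summary


for var, (n_valid, n_missing) in missing_summary(records, ["Age", "Gender"]).items():
    print(f"{var}: valid={n_valid}, missing={n_missing}")
```

A variable with a large missing count relative to its valid count is a candidate for the handling methods in section 2.2.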
The Missing Value Analysis report provides detailed information about the pattern and extent of missing data.
2.2. Handling Missing Data
Exclude Cases with Missing Data:
Go to Data > Select Cases.
Select If condition is satisfied and click If.
Enter a condition that keeps only complete cases (e.g., NOT MISSING(Age) AND NOT MISSING(Gender)).
Click Continue and OK.
Replace Missing Values:
Go to Transform > Replace Missing Values.
Select the variables and a replacement method, then click OK to replace the missing values.
Imputation:
Go to Analyze > Multiple Imputation > Impute Missing Data Values.
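To make the first two options concrete, here is a small Python sketch (not SPSS syntax) of what excluding incomplete cases and replacing missing values with the mean of the observed values amount to. The records and the helper names complete_cases and replace_with_mean are hypothetical, None stands in for a missing value, and mean replacement is only one of the replacement methods available. Multiple imputation is more involved and is not sketched here.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"Age": 34, "Gender": 1},
    {"Age": None, "Gender": 2},
    {"Age": 50, "Gender": None},
    {"Age": 30, "Gender": 2},
]


def complete_cases(rows, variables):
    """Keep only rows where every listed variable is present."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


def replace_with_mean(rows, variable):
    """Return copies of rows with missing `variable` set to the mean of observed values."""
    observed = [row[variable] for row in rows if row.get(variable) is not None]
    fill = mean(observed)
    return [
        {**row, variable: fill if row.get(variable) is None else row[variable]}
        for row in rows
    ]


kept = complete_cases(records, ["Age", "Gender"])
filled = replace_with_mean(records, "Age")
print(len(kept), [row["Age"] for row in filled])
```

Excluding cases shrinks the sample, while mean replacement keeps every case but reduces the variability of the filled variable; that trade-off is part of the rationale to document when choosing a method.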
3. Recoding Variables
3.1. Recode into Same Variables
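Go to Transform > Recode into Same Variables.
Select the variable to recode and click Old and New Values.
Define the old and new values (e.g., 1 -> Male, 2 -> Female).
Click Continue and OK. Recoding into the same variable overwrites the original values.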
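The old-to-new mapping can be pictured as a simple lookup. The Python sketch below is an illustration only, not SPSS syntax; the helper recode and the sample values (including the unmapped code 9) are hypothetical. Values with no rule are left unchanged.

```python
def recode(values, mapping):
    """Replace each value that has a rule in `mapping`; leave other values unchanged."""
    return [mapping.get(v, v) for v in values]


gender = [1, 2, 2, 1, 9]  # hypothetical codes; 9 has no recoding rule
gender = recode(gender, {1: "Male", 2: "Female"})  # overwrites the original list
print(gender)
```

Because the original codes are replaced, it is worth keeping a copy of the variable if the old values may be needed later.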
3.2. Recode into Different Variables
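Recode into New Variables:
Go to Transform > Recode into Different Variables.
Select the input variable, enter a name for the output variable, and click Change.
Click Old and New Values to define the recoding rules.
Click Continue and OK.
Example:
Recode Age into age groups (e.g., 18-25 -> 1, 26-35 -> 2, and so on).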
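The age-group example can be sketched in Python (an illustration, not SPSS syntax). The rule table AGE_GROUPS, the helper recode_range, and the output name AgeGroup are hypothetical. The sketch assumes that values matching no rule become missing in the new variable, and it leaves the original Age untouched.

```python
# Hypothetical rules: (low, high) inclusive range -> group code.
AGE_GROUPS = [((18, 25), 1), ((26, 35), 2)]


def recode_range(value, rules):
    """Return the code of the first range containing `value`, or None if no rule matches."""
    if value is None:
        return None
    for (low, high), code in rules:
        if low <= value <= high:
            return code
    return None


rows = [{"Age": 22}, {"Age": 31}, {"Age": 47}]
for row in rows:
    row["AgeGroup"] = recode_range(row["Age"], AGE_GROUPS)  # new variable, Age is kept

print(rows)
```

Writing to a new variable makes it easy to cross-check the groups against the original ages before using them in an analysis.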
4. Computing New Variables
4.1. Compute Variable
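Create New Variables:
Go to Transform > Compute Variable.
Enter the name of the new variable in the Target Variable field (e.g., BMI).
Enter the formula in the Numeric Expression field (e.g., Weight / (Height**2); SPSS uses ** for exponentiation).
Click OK to create the new variable.
Use Functions:
Use built-in functions in the expression (e.g., MEAN(var1, var2, var3)).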
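As a rough picture of what a function such as MEAN(var1, var2, var3) computes for one case, the Python sketch below (not SPSS syntax) averages whichever arguments are present. The helper name mean_of_valid and the sample row are hypothetical, and None stands in for a missing value; the sketch assumes the function averages only the non-missing arguments.

```python
def mean_of_valid(*values):
    """Mean of the non-missing arguments; None if every argument is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


row = {"var1": 4, "var2": None, "var3": 8}  # hypothetical case with one missing item
row["avg"] = mean_of_valid(row["var1"], row["var2"], row["var3"])
print(row["avg"])
```

Averaging only the available items keeps cases with partial data, which may or may not be appropriate for a given scale.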
4.2. Example Calculations
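Sum Scores:
Compute a total score from several items (e.g., Total_Score = Score1 + Score2 + Score3).
BMI Calculation:
Compute BMI from weight in kilograms and height in meters (e.g., BMI = Weight / (Height**2)).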
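Both calculations are plain arithmetic, sketched here in Python (not SPSS syntax). The helper names total_score and bmi and the sample numbers are hypothetical. The sketch assumes that an arithmetic expression yields a missing result (None here) whenever any operand is missing.

```python
def total_score(*scores):
    """Sum of the scores; None if any score is missing."""
    if any(s is None for s in scores):
        return None
    return sum(scores)


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg is None or height_m is None:
        return None
    return weight_kg / height_m ** 2


print(total_score(3, 4, 5))
print(round(bmi(70, 1.75), 2))
print(total_score(3, None, 5))
```

Checking a few cases by hand like this is a quick way to confirm that a computed variable uses the intended units, for example height in meters rather than centimeters.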
5. Data Transformation
5.1. Standardizing Variables
Go to Analyze > Descriptive Statistics > Descriptives.
Select the variables to standardize and check Save standardized values as variables.
Click OK.
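A Z-score is the value minus the variable's mean, divided by its standard deviation. The Python sketch below (not SPSS syntax) shows the calculation using the sample standard deviation (n - 1 denominator); the helper name z_scores and the sample weights are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


weights = [60.0, 70.0, 80.0]  # hypothetical weights in kg
print(z_scores(weights))
```

Standardized variables have mean 0 and standard deviation 1, which puts variables measured in different units on a common scale.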
Clicking OK generates Z-scores for the selected variables and saves them as new variables.
5.2. Categorizing Continuous Variables
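Go to Transform > Visual Binning.
Select the continuous variable to bin and click Continue.
Define the cutpoints and a name for the binned variable, then click OK.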
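Binning assigns each value to the interval it falls in. The Python sketch below (not SPSS syntax) illustrates the idea with a sorted list of cutpoints; the helper name bin_value and the cutpoints are hypothetical, and the sketch assumes each cutpoint is the inclusive upper end of its bin.

```python
from bisect import bisect_left


def bin_value(value, cutpoints):
    """Return the 1-based bin number; each cutpoint is the inclusive upper end of a bin."""
    if value is None:
        return None
    return bisect_left(cutpoints, value) + 1


cutpoints = [25, 35, 50]  # hypothetical: bins are <=25, 26-35, 36-50, >50
ages = [18, 25, 26, 50, 51]
print([bin_value(age, cutpoints) for age in ages])
```

Whether a boundary value belongs to the lower or the upper bin changes the group counts, so the endpoint rule should be stated alongside the cutpoints.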
Clicking OK creates the binned variable.
5.3. Aggregating Data
Go to Data > Aggregate.
Select the break variable(s) and the variables to summarize, then click OK.
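Aggregation groups cases by the break variable and summarizes another variable within each group. The Python sketch below (not SPSS syntax) computes a group mean; the helper name aggregate_mean, the variables Clinic and Cost, and the records are hypothetical, and missing values (None) are skipped.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(rows, break_var, value_var):
    """Return {group: mean of value_var} using only non-missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_var] is not None:
            groups[row[break_var]].append(row[value_var])
    return {key: mean(vals) for key, vals in groups.items()}


visits = [
    {"Clinic": "A", "Cost": 100.0},
    {"Clinic": "A", "Cost": 140.0},
    {"Clinic": "B", "Cost": 90.0},
    {"Clinic": "B", "Cost": None},
]
print(aggregate_mean(visits, "Clinic", "Cost"))
```

The result has one entry per group rather than one per case, which is the shape needed for group-level comparisons.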
Clicking OK generates the aggregated dataset.
Practical Exercises:
Datasets:
Website Links:
Quiz 1: Identifying and Handling Missing Data
Quiz 2: Recoding and Computing Variables
Transform > Compute Variable in SPSS?
Assignment 1: Use a healthcare dataset to identify and handle missing data. Document the steps and rationale for the methods used.
Assignment 2: Recode age groups into categorical variables and compute a new variable (e.g., BMI) using provided weight and height data.
Assignment 3: Standardize a set of continuous variables and create visualizations to compare the distributions before and after standardization.
These detailed notes, resources, and exercises provide a comprehensive guide to data cleaning and transformation in IBM SPSS Statistics, essential for ensuring data accuracy and reliability in healthcare and medical research.