Guides & Reference
Practical explanations, worked examples, and ready-to-use syntax for everyday research tasks.
Weighting
How to correct for sample imbalance and interpret the results
What is weighting and why does it matter?
The fundamental concept explained
No survey sample is a perfect mirror of the population. Even with the best sampling design, some groups end up over- or under-represented in your final data — because they were harder to reach, more or less likely to respond, or simply because random variation in a finite sample created imbalances.
Weighting corrects for this by mathematically adjusting each respondent's contribution to the data, so that the final weighted dataset better reflects the true population structure.
A simple way to think about it: if 20% of your sample are aged 18–24 but they represent 30% of the real population, then each 18–24 respondent should count for more than one person. Weighting assigns them a weight greater than 1 to make up the shortfall.
You survey 400 people. The population is 50% male / 50% female, but your sample came out 40% male / 60% female.
| Gender | Sample % | Population % | Weight |
|---|---|---|---|
| Male | 40% | 50% | 50/40 = 1.25 |
| Female | 60% | 50% | 50/60 = 0.83 |
Each male respondent now counts as 1.25 people, and each female respondent counts as 0.83 people. The weighted data reflects a 50/50 split.
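The arithmetic above can be sketched in a few lines of Python (an illustrative sketch using the figures from the table; variable names are our own, not from any package):

```python
# Weight = population % / sample %, per the gender example above.
sample_pct = {"male": 0.40, "female": 0.60}
population_pct = {"male": 0.50, "female": 0.50}

weights = {g: population_pct[g] / sample_pct[g] for g in sample_pct}
# male -> 1.25, female -> 0.833

# Applying the weights restores the 50/50 split in a 400-person sample:
n = 400
weighted_counts = {g: sample_pct[g] * n * weights[g] for g in sample_pct}
total = sum(weighted_counts.values())
weighted_share = {g: c / total for g, c in weighted_counts.items()}
# weighted_share -> {"male": 0.5, "female": 0.5}
```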
When should you weight?
You should weight whenever your sample profile differs meaningfully from the population on variables that are likely to relate to your key measures. Common weighting targets include age, gender, region, social grade, and education level.
Weighting is not always necessary — if your sample already closely matches the population on all key dimensions, weighting may change very little and introduce unnecessary complexity.
Types of survey weights
Design weights, post-stratification, rim weighting, and more
Design weights
Design weights correct for known differences in the probability of selection rather than differences in response rates. If one stratum was sampled at twice the rate of another, respondents in that stratum receive a weight of 0.5 to compensate.
Design weights are applied before any post-stratification and are determined by the sample design, not the achieved sample profile.
Post-stratification weights (cell weighting)
The most common form in commercial market research. You define target cells (e.g. male 18–34, female 35–54) and calculate a weight for each cell based on the ratio of population proportion to sample proportion.
Limitation: Cell weighting requires enough respondents in every cell to produce stable weights. With many demographic variables, cells can become very small, leading to extreme weights.
Rim weighting (raking)
Rim weighting — also called iterative proportional fitting or raking — allows you to weight to multiple variables simultaneously without needing population data for every combination of those variables.
Instead of weighting to age × gender × region cells, you weight to the age margins, gender margins, and region margins separately, iterating back and forth until the sample converges on all targets at once.
Step 1: Weight the data to match the age distribution. This will slightly disturb the gender balance.
Step 2: Re-weight to match the gender distribution. This slightly disturbs the age balance.
Step 3: Repeat until both distributions are within tolerance (typically <0.1% of target).
Most rim weighting converges in 5–20 iterations. The result is a single composite weight per respondent that simultaneously satisfies all margin targets.
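The three steps above can be sketched as a small Python function (a minimal illustration of iterative proportional fitting, not the algorithm any particular weighting package uses; the sample counts reuse the age and gender table from the worked example below):

```python
def rake(table, row_targets, col_targets, tol=1e-6, max_iter=100):
    """table[i][j] = sample count; returns a table matching both margins."""
    t = [list(row) for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its target (disturbs the column margins).
        for i, target in enumerate(row_targets):
            factor = target / sum(t[i])
            t[i] = [x * factor for x in t[i]]
        # Step 2: scale each column to its target (disturbs the row margins).
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in t)
            for row in t:
                row[j] *= factor
        # Step 3: stop once the row margins are back within tolerance.
        if all(abs(sum(t[i]) - r) < tol for i, r in enumerate(row_targets)):
            break
    return t

# Sample counts: 3 age bands x 2 genders (n = 500).
adjusted = rake(
    [[52, 78], [88, 102], [70, 110]],
    row_targets=[140, 180, 180],   # age targets: 28% / 36% / 36% of 500
    col_targets=[250, 250],        # gender targets: 50/50
)
# Per-respondent weight for a cell = adjusted count / original count.
```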
Calibration weights
A generalisation of rim weighting that uses regression-based methods to bring survey estimates into alignment with known population totals. More common in official statistics and large-scale government surveys than commercial research.
Applying weights — a worked example
Step-by-step from raw data to weighted results
You've completed a survey of 500 adults asking whether they support a proposed new policy. You want to weight by age and gender to match the national population.
Step 1 — Get your population targets
Source population data from official statistics (ONS in the UK, Census Bureau in the US). For this example:
| Group | Population % | Sample n | Sample % |
|---|---|---|---|
| Male 18–34 | 14% | 52 | 10.4% |
| Male 35–54 | 18% | 88 | 17.6% |
| Male 55+ | 18% | 70 | 14.0% |
| Female 18–34 | 14% | 78 | 15.6% |
| Female 35–54 | 18% | 102 | 20.4% |
| Female 55+ | 18% | 110 | 22.0% |
Step 2 — Calculate cell weights
Weight = Population % ÷ Sample %. For Male 18–34: 14% ÷ 10.4% = 1.346. For Female 55+: 18% ÷ 22% = 0.818.
Step 3 — Apply weights to your analysis
Each respondent in Male 18–34 now counts as 1.346 people. To calculate the weighted % who support the policy, sum the weights of the respondents who said yes and divide by the total sum of weights, rather than counting each respondent as one.
Unweighted: 58% support the policy (290 out of 500 said yes)
Weighted: 54% support the policy
The 4pp difference exists because younger males — who were under-represented and support the policy less — receive higher weights in the weighted data, pulling the total figure down.
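Steps 1 and 2 can be reproduced as a quick sanity check (a sketch using the figures in the table above; the dictionary layout is illustrative):

```python
# Cell weights from the worked example: weight = population % / sample %.
cells = {  # group: (population share, sample n)
    "Male 18-34":   (0.14, 52),
    "Male 35-54":   (0.18, 88),
    "Male 55+":     (0.18, 70),
    "Female 18-34": (0.14, 78),
    "Female 35-54": (0.18, 102),
    "Female 55+":   (0.18, 110),
}
n = sum(cnt for _, cnt in cells.values())   # 500

weights = {g: pop / (cnt / n) for g, (pop, cnt) in cells.items()}
# e.g. Male 18-34 -> 0.14 / 0.104 = 1.346; Female 55+ -> 0.18 / 0.22 = 0.818

# Sanity check: the weighted base sums back to the nominal n.
weighted_total = sum(weights[g] * cnt for g, (_, cnt) in cells.items())
```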
Capping weights
When and how to trim extreme weights
Extreme weights — where a single respondent is weighted up to represent 4, 5, or even 10 times their actual count — create a serious problem. A very heavily weighted respondent has a disproportionate effect on your results, and if their answers are in any way atypical, the weighted data can be substantially distorted.
What is capping?
Weight capping (also called weight trimming) sets a maximum value that any single weight can take. Respondents who would have received a weight above this cap are instead given the cap value, and the remaining weight is redistributed to others in the same marginal group.
Common capping thresholds
There is no universal rule, but common thresholds in commercial research are:
- 3.0× — frequently used in online panel research with broad population targets
- 4.0× — a more permissive cap sometimes used when sample sizes allow
- 0.33×–3.0× — a symmetric cap that limits both high and low extremes
Some researchers express this as a ratio of the highest to lowest weight: a 4:1 max ratio means no weight should be more than four times any other weight in the same dataset.
The trade-off
Capping reduces variance (making your results more stable) but introduces bias (your weighted data no longer perfectly matches population targets). This is always a trade-off — there is no cap that eliminates both problems simultaneously.
Your sample has 8 men aged 18–24 but they represent 12% of the population (60 of 500). Each receives a weight of 7.5 — meaning each one counts as 7.5 people.
Problem: If one of these 8 respondents gave an unusual answer, it accounts for 7.5 votes. A single outlier can swing your national figure by several percentage points.
With a cap of 3.0: Each is capped at 3.0. You now only represent 24 of the 60 needed from this group. The remaining 36 are redistributed across other cells, introducing a small bias but dramatically reducing the influence of any single respondent.
Recommendation: If a large number of respondents hit the cap, reconsider whether your population targets are appropriate or whether you need to boost that subgroup in your sample design.
How to decide on a cap
Look at the distribution of your pre-cap weights before deciding. If only 2–3 respondents exceed 3.0 and most weights sit between 0.5 and 2.5, a cap of 3.0 will have very little impact on your population fit. If 15% of respondents would exceed 3.0, the cap is doing heavy lifting and your sample design may need review.
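A minimal capping routine might look like the following (a sketch, not any package's implementation; it redistributes trimmed weight proportionally and assumes at least one weight stays below the cap so there is somewhere to redistribute to):

```python
def cap_weights(weights, cap=3.0, max_iter=50):
    """Trim weights above `cap`, redistributing the excess proportionally
    across uncapped cases so the total weighted count is preserved."""
    w = list(weights)
    for _ in range(max_iter):
        over = [i for i, x in enumerate(w) if x > cap]
        if not over:
            break
        excess = sum(w[i] - cap for i in over)
        for i in over:
            w[i] = cap
        # Redistribution can push other cases over the cap, hence the loop.
        under = [i for i, x in enumerate(w) if x < cap]
        scale = sum(w[i] for i in under)
        for i in under:
            w[i] += excess * w[i] / scale
    return w

# The 18-24 example above: eight respondents at 7.5, capped to 3.0,
# with the trimmed weight spread across 100 other (illustrative) cases.
capped = cap_weights([7.5] * 8 + [0.8] * 100, cap=3.0)
```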
Checking and diagnosing your weights
Weight efficiency, DEFF, and distribution diagnostics
Always check before reporting
Before using weighted data in analysis, it is good practice to run a set of diagnostics. Extreme or poorly distributed weights can silently distort your findings.
Check 1 — Weight distribution
Run a frequency table of your weight variable. Look at the minimum, maximum, mean, and standard deviation. A well-behaved weight variable in a 1,000-person survey typically has: mean ≈ 1.0, most weights between 0.4 and 2.5, and very few (if any) above 3.0.
Check 2 — Effective sample size
The effective sample size (ESS) tells you how much statistical power your weighted data actually has. It is calculated as (Σw)² / Σw². An ESS of 750 from a nominal sample of 1,000 means you have lost 25% of your precision through weighting.
Use our Effective Sample Size calculator to check this for any dataset.
Check 3 — Design effect (DEFF)
DEFF = n / ESS. A DEFF of 1.3 means the variance of your weighted estimates is 1.3 times what an unweighted simple random sample of the same size would give, so standard errors inflate by √1.3 ≈ 1.14. Always report margins of error based on effective sample size in weighted data.
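Checks 2 and 3 can be computed directly from the weight vector (a Python sketch; the 1.96 and p = 0.5 assumptions give the worst-case 95% margin of error):

```python
import math

def weight_diagnostics(w):
    """Effective sample size, design effect, efficiency, and worst-case MoE."""
    n = len(w)
    ess = sum(w) ** 2 / sum(x * x for x in w)   # (Σw)² / Σw²
    deff = n / ess                              # design effect
    efficiency = ess / n                        # weight efficiency
    moe = 1.96 * math.sqrt(0.25 / ess)          # 95% MoE at p = 0.5, effective n
    return ess, deff, efficiency, moe

uniform = weight_diagnostics([1.0] * 1000)              # no weighting: ESS = 1000
skewed = weight_diagnostics([2.0] * 500 + [0.5] * 500)  # ESS ≈ 735, DEFF ≈ 1.36
```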
Check 4 — Profile after weighting
Run a crosstab of each weighting variable after weighting is applied. The weighted profile should match your targets. If it doesn't converge, your weighting algorithm may not have reached a solution, or there may be conflicting targets.
✓ Mean weight ≈ 1.0 (or ≈ population/sample size if not normalised)
✓ Max weight below your chosen cap
✓ Weight efficiency above 70% (ideally above 80%)
✓ Weighted profile matches population targets on all weighting dimensions
✓ Margins of error calculated using effective n, not nominal n
Common weighting mistakes
What to watch out for in practice
Using unweighted n in significance tests
Perhaps the most common error. Significance tests depend on sample size — using the nominal n (1,000) rather than the effective n (e.g. 780) will make differences appear more statistically significant than they actually are. Always base significance calculations on effective n for weighted data.
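To see the effect, here is a sketch of a standard two-proportion z-test with effective n substituted for nominal n (the figures are illustrative, not from the worked example):

```python
import math

def z_two_prop(p1, n1, p2, n2):
    """Pooled two-proportion z-test; pass effective n for weighted data."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# A 7pp gap on nominal bases of 500 per group clears the 1.96 threshold...
z_nominal = z_two_prop(0.57, 500, 0.50, 500)      # ≈ 2.22, looks significant
# ...but at 70% weight efficiency (effective n = 350 per group) it does not.
z_effective = z_two_prop(0.57, 350, 0.50, 350)    # ≈ 1.86, not significant
```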
Weighting to out-of-date population data
If your population benchmarks are from a census five years ago and demographic patterns have shifted meaningfully, your weights may be correcting for a population that no longer exists. Check that your population data is current and appropriate for the population you are studying.
Weighting to the wrong population
If your survey is of homeowners, weighting to the general adult population profile may introduce bias rather than reduce it — homeowners have a different demographic profile. Always weight to the profile of the target population, not the general population.
Ignoring small cells
Cell weighting with small cells (fewer than ~15–20 respondents) produces unstable weights with high variance. Consider collapsing age or geographic categories, or switching to rim weighting which handles small cell sizes more gracefully.
Reporting weighted counts as unweighted
Always make clear in charts and tables whether counts shown are weighted or unweighted. Bases should typically show the unweighted n — so readers can assess reliability — while percentages are calculated from weighted data.
Sampling
Building samples that give you data you can trust
Sample frames
The foundation of any good sample design
A sample frame is the list or mechanism from which your sample is drawn. In an ideal world it is a complete and accurate list of every member of your target population. In practice, perfect frames rarely exist.
Common sample frames in market research
- Online panels — large pre-recruited pools of respondents who have agreed to take surveys. The most commonly used frame in commercial research. Not probability-based.
- Electoral registers — used for face-to-face probability surveys of adults in the UK. Covers ~90% of eligible adults.
- Postcode address files (PAF) — a near-complete list of UK postal addresses, used for household surveys.
- Customer databases — used when surveying existing customers of a company. Coverage depends on the completeness of the CRM.
- Phone number databases — increasingly incomplete due to mobile phone adoption and declining landline usage.
Frame error
Coverage error occurs when the frame does not cover some part of your population. If your frame is an electoral register and you want to survey 16–17 year olds (who are not on it), your sample will have zero coverage of that group regardless of how carefully you sample from the frame.
Always ask: who is missing from my frame, and does that matter for this study?
You want to understand internet usage among UK adults. Your frame is an online panel.
Problem: By definition, online panels exclude people without internet access — the very group most likely to have low internet usage. Your frame has systematic undercoverage of the population segment most relevant to your research question.
Solution: Consider supplementing with face-to-face interviewing for offline respondents, or acknowledge the limitation explicitly in your methodology.
Probability vs non-probability sampling
The most fundamental distinction in sampling design
Probability sampling
Every member of the population has a known, non-zero probability of being selected. This is the gold standard because it allows you to make statistically valid inferences about the population and calculate unbiased margins of error.
Pro Theoretically unbiased estimates
Pro Valid, calculable margins of error
Con Expensive and slow to execute
Con Often requires a complete sampling frame
Con Non-response still introduces bias in practice
Non-probability sampling
Respondents are selected without a known probability of inclusion. The vast majority of online market research uses non-probability samples — most online panels are opt-in, and quota-based sampling does not give every population member an equal or known chance of inclusion.
Pro Fast and cost-effective
Pro Flexible and scalable
Con Margins of error are technically not valid
Con Self-selection bias is hard to detect or correct
Note MoE is still widely reported as a useful guide to precision
Most commercial online surveys use non-probability panels and report margins of error as if they were probability samples. This is pragmatic but technically imprecise — the MoE captures only sampling variation, not the potential bias from systematic differences between panel members and the general population.
This doesn't mean the data is bad. It means that context matters, and that weighting and careful quota setting are particularly important for panel-based research.
Types of probability sampling
SRS, stratified, cluster, and systematic explained
Simple Random Sampling (SRS)
Every member of the population has an equal and independent chance of selection. You draw names from a hat (metaphorically). The simplest approach and the basis for standard MoE formulas.
Pro Simple and unbiased
Con May under-represent small subgroups by chance
Stratified sampling
The population is divided into meaningful subgroups (strata) — e.g. by region or age — and a random sample is drawn from each stratum. This guarantees adequate representation of each group and is more efficient than SRS when strata differ meaningfully on the outcome of interest.
Pro Ensures subgroup coverage
Pro Often more precise than SRS for same n
Cluster sampling
The population is divided into clusters (e.g. schools, neighbourhoods), a random sample of clusters is selected, and then all (or a random subset of) members within those clusters are surveyed. Often used when a complete list of individuals doesn't exist but a list of groups does.
Pro Practical when no complete frame exists
Con Less statistically efficient — respondents within clusters are similar
Systematic sampling
Select every k-th person from a list (e.g. every 10th name from a customer database). Simpler to execute than random sampling but can introduce periodicity bias if the list has a regular pattern.
You want to survey 400 people across 4 UK regions, and the North East accounts for only 4% of the population (16 people under SRS). With only 16 North East respondents, your subgroup analysis for that region will have a margin of error of ±24% — essentially useless.
With stratified sampling: You deliberately oversample the North East to achieve 80 respondents there, then apply a design weight of 0.2 to correct for the oversampling in national totals. You now have meaningful regional data and valid national figures.
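The design-weight arithmetic in this example, sketched in Python (figures taken from the example above):

```python
import math

pop_share = 0.04                 # North East share of the population
total_n, boosted_n = 400, 80
natural_n = pop_share * total_n  # 16 respondents expected under SRS

# Design weight = expected count under proportional sampling / boosted count.
design_weight = natural_n / boosted_n             # 16 / 80 = 0.2

# Regional MoE before and after boosting (95% confidence, worst case p = 0.5).
moe_before = 1.96 * math.sqrt(0.25 / natural_n)   # ≈ ±24.5%
moe_after = 1.96 * math.sqrt(0.25 / boosted_n)    # ≈ ±11.0%
```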
Non-probability approaches
Quota sampling, convenience, and panels
Quota sampling
Interviewers (or systems) are given quotas to fill — e.g. "recruit 50 men aged 18–34 and 50 women aged 18–34". Sampling continues until all quotas are filled. This controls the sample profile but does not give every population member an equal chance of selection.
Quota sampling is the dominant approach in commercial online research. It controls for known demographic variables but cannot control for unmeasured differences between those who choose to take surveys and those who don't.
Convenience sampling
Recruiting whoever is easiest to reach — website intercepts, social media links, employee surveys. Fast and cheap but highly susceptible to self-selection bias. Best suited to exploratory or qualitative-in-spirit quant work rather than population inference.
Online panels
Pre-recruited pools of respondents who receive survey invitations by email. Panel members self-selected into the panel and tend to be more survey-aware, digitally engaged, and demographically skewed than the general population. Weighting corrects for some of this but not all.
Quality of panel data varies significantly by provider. Key factors to consider: panel size, recruitment methodology, engagement rates, deduplication practices, and response time distribution.
Watch out for: very fast completion times (under 3 minutes for a 15-minute survey), straight-lining on grid questions, failing attention check questions, and open-text responses that are clearly nonsensical. These indicate low-quality respondents that should be screened out before analysis.
Sample size — rules of thumb
How big does my sample actually need to be?
It depends on what you need to do with the data
There is no universal answer, but here are the practical considerations:
- National totals only: 1,000 is the industry standard in the UK and US. This gives ±3.1% MoE at 95% confidence.
- Subgroup analysis: You need adequate n within every subgroup you plan to analyse. Rule of thumb: 100 per subgroup for reporting, 50 as an absolute minimum.
- Tracking surveys: If you're measuring change over time, you need enough power to detect meaningful shifts. A 3pp change requires roughly 1,000–1,500 per wave to be reliably detectable at 95% confidence.
- Experimental designs (A/B): Use a power calculation to determine sample size. This requires specifying the minimum detectable effect size and acceptable error rates.
The subgroup trap
The most common sample size mistake in commercial research is designing to a national total without thinking about subgroups. A sample of 500 gives ±4.4% MoE nationally. But if you want to compare ABC1 vs C2DE (roughly 55%/45% split), you're working with effective subgroup bases of ~275 and ~225 — with MoE of ±5.9% and ±6.5% respectively. A 5pp gap between these groups is not significant.
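The MoE figures above come from the standard formula for a proportion (a sketch; z = 1.96 for 95% confidence, worst case p = 0.5):

```python
import math

def moe(n, p=0.5, z=1.96):
    """Margin of error for a proportion at 95% confidence (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

moe_national = moe(500)   # ≈ ±4.4%
moe_abc1 = moe(275)       # ≈ ±5.9%
moe_c2de = moe(225)       # ≈ ±6.5%
```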
You need to report results for 6 UK regions and want at least 100 per region. London is 15% of the adult population — so a nationally representative 1,000-person sample would naturally give you 150 London respondents. But the North East is only 4% — you'd get just 40. You need to boost the North East specifically and apply design weights.
Use our Target Sample Size calculator to plan your subgroup requirements before fieldwork.
Common pitfalls
- Non-response bias: Low response rates don't just reduce sample size — they introduce bias if non-responders differ systematically from responders.
- Attrition in trackers: In longitudinal studies, dropout is rarely random. Track who drops out and compare their earlier responses to completers.
- Over-claiming from small bases: Reporting percentages on bases below 50 without a caveat is a common but serious error. Consider suppressing the data or flagging it with an asterisk as a warning.
SPSS Syntax
Ready-to-use syntax — click any block to copy
Labelling variables and values
VARIABLE LABELS, VALUE LABELS
Good labelling is the difference between a data file that is self-explanatory and one that requires the questionnaire to be open at all times. Label everything before analysis begins.
Variable labels
VARIABLE LABELS
  Q1 'Overall satisfaction with the service'
  Q2 'Likelihood to recommend (NPS)'
  Q3_1 'Reason for visit: Browse'
  Q3_2 'Reason for visit: Purchase'
  Q3_3 'Reason for visit: Return item'
  WVAR 'Weighting variable' .
Value labels
VALUE LABELS Q1
  1 'Very satisfied'
  2 'Fairly satisfied'
  3 'Neither satisfied nor dissatisfied'
  4 'Fairly dissatisfied'
  5 'Very dissatisfied'
  -1 'Don''t know' .
VALUE LABELS Q3_1 Q3_2 Q3_3
  0 'Not selected'
  1 'Selected' .
Defining missing values
/* Mark -1 (Don't know) and -2 (Not applicable) as user-missing */
MISSING VALUES Q1 Q2 (-1, -2).
MISSING VALUES Q3_1 TO Q3_10 (-1, -2).
Applying a weight
WEIGHT BY, turning weights on and off
The WEIGHT BY command tells SPSS to use a specified variable as a weight for all subsequent procedures. It remains active until you explicitly turn it off.
Apply a weight
/* Apply the weighting variable WVAR */
WEIGHT BY WVAR.
Turn weighting off
/* Turn weighting off — all subsequent procedures use unweighted data */
WEIGHT OFF.
Check weighting is applied correctly
/* Run frequencies on a known demographic variable to check weighted profile */
WEIGHT BY WVAR.
FREQUENCIES VARIABLES = AGE_GROUP GENDER REGION
  /ORDER = ANALYSIS.
Always include WEIGHT BY WVAR. at the top of every syntax file that uses weighted data. If you open the file later and re-run only part of it, the weight should be the first thing that executes.
Applying a filter
FILTER, SELECT IF, USE
Filters restrict analysis to a subset of respondents. There are two main approaches: FILTER (which excludes but retains cases) and SELECT IF (which permanently deletes excluded cases). Use FILTER in almost all circumstances — SELECT IF is irreversible without re-reading the data.
Apply a simple filter
/* Filter to respondents who visited a store in the last month (Q1 = 1) */
/* FILTER BY takes a variable, not an expression, so create a 0/1 filter variable first */
COMPUTE FILTER_VAR = (Q1 = 1).
VARIABLE LABELS FILTER_VAR 'Filter: Visited store last month'.
FILTER BY FILTER_VAR.
Filter using multiple conditions
/* Filter to ABC1 respondents aged 18-44 */
COMPUTE FILT_ABC1_1844 = (SOCIAL_GRADE <= 2 AND AGE >= 18 AND AGE <= 44).
FILTER BY FILT_ABC1_1844.
Turn off the filter
/* Return to all respondents */
FILTER OFF.
USE ALL.
Checking a routing from a multicoded question
Verify who was shown a routed question
A common task: Q5 was only shown to respondents who selected code 3 ("Online") at Q3, which is a multicoded question. You need to verify the routing was applied correctly and then produce Q5 results among the correct base.
Step 1 — Understand the data structure
Multicoded questions are typically stored as a series of binary variables: Q3_1, Q3_2, Q3_3... where 1 = selected and 0 = not selected. Q3_3 = 1 means the respondent selected code 3 at Q3.
Step 2 — Check the routing
/* Count Q5 responses by whether they were routed via Q3_3=1 */
/* Q5 should only have valid answers (non-missing) where Q3_3=1 */
CROSSTABS
  /TABLES = Q3_3 BY Q5
  /FORMAT = AVALUE TABLES
  /CELLS = COUNT ROW
  /COUNT ROUND CELL.
Step 3 — Produce Q5 results on the correct base
/* Filter to those who selected Online at Q3, then run Q5 */
/* FILTER BY takes a variable, so compute a 0/1 filter variable first */
WEIGHT BY WVAR.
COMPUTE FILT_Q3_ONLINE = (Q3_3 = 1).
FILTER BY FILT_Q3_ONLINE.
FREQUENCIES VARIABLES = Q5
  /ORDER = ANALYSIS.
FILTER OFF.
USE ALL.
You run the crosstab and find 15 respondents have a valid answer at Q5 but Q3_3 = 0. This means either: (a) the routing was applied incorrectly in the survey, or (b) the data has been processed incorrectly. Investigate before continuing — these 15 cases should either be cleaned or excluded from Q5 analysis.
Creating sum and index variables
COMPUTE, SUM function, handling missing values
Common tasks: counting how many codes a respondent selected at a multicoded question, summing scores across a battery, or creating an index variable.
Count the number of codes selected (multicoded question)
/* Q3 is multicoded with 8 options (Q3_1 to Q3_8, coded 0/1) */
/* Create a variable that counts how many options each respondent selected */
COMPUTE Q3_COUNT = Q3_1 + Q3_2 + Q3_3 + Q3_4 + Q3_5 + Q3_6 + Q3_7 + Q3_8.
VARIABLE LABELS Q3_COUNT 'Q3: Number of reasons selected'.
Sum a battery (handling missing values correctly)
/* SUM() function ignores missing values; arithmetic (+) propagates them */
/* Use SUM() when some respondents may have missing values on individual items */
COMPUTE SCORE_TOTAL = SUM(Q10_1, Q10_2, Q10_3, Q10_4, Q10_5).
VARIABLE LABELS SCORE_TOTAL 'Q10: Total score (sum of 5 items)'.
/* Mean score across the battery */
COMPUTE SCORE_MEAN = MEAN(Q10_1, Q10_2, Q10_3, Q10_4, Q10_5).
VARIABLE LABELS SCORE_MEAN 'Q10: Mean score across 5 items'.
Create an index (subgroup vs total)
/* Index = (subgroup % / total %) x 100 */
/* Example: create an awareness index — score above 100 = above average awareness */
/* First get total awareness %, then compute index per subgroup in your reporting tool */
/* In SPSS this is typically done at the output/table stage, not as a computed variable */
/* Create a binary awareness variable (Q4=1 means aware) */
COMPUTE AWARE = (Q4 = 1).
VARIABLE LABELS AWARE 'Awareness (binary)'.
VALUE LABELS AWARE 0 'Not aware' 1 'Aware'.
Frequencies and crosstabs
FREQUENCIES, CROSSTABS with column %
Basic frequencies
WEIGHT BY WVAR.
FREQUENCIES VARIABLES = Q1 Q2 Q3
  /FORMAT = NOTABLE
  /BARCHART FREQ
  /ORDER = ANALYSIS.
Crosstab with column percentages (the standard format)
/* Q1 (rows) by gender (columns), showing column % */
WEIGHT BY WVAR.
CROSSTABS
  /TABLES = Q1 BY GENDER
  /FORMAT = AVALUE TABLES
  /STATISTICS = CHISQ
  /CELLS = COUNT COLUMN
  /COUNT ROUND CELL.
Multiple dependent variables, multiple breaks
/* Run Q1, Q2, Q3 by gender, age group, and region simultaneously */
WEIGHT BY WVAR.
CROSSTABS
  /TABLES = Q1 Q2 Q3 BY GENDER AGE_GROUP REGION
  /FORMAT = AVALUE TABLES
  /CELLS = COUNT COLUMN
  /COUNT ROUND CELL.
Recoding variables
RECODE, collapsing categories, creating age groups
Always recode INTO a new variable (RECODE ... INTO) rather than overwriting the original. This preserves your raw data.
Recode a satisfaction scale to top/bottom box
/* Q1 is a 5-point satisfaction scale (1=Very satisfied, 5=Very dissatisfied) */
/* Collapse to a 3-point variable: top 2 box (1-2), neither, bottom 2 box (4-5) */
RECODE Q1 (1,2=1)(3=2)(4,5=3)(MISSING=SYSMIS) INTO Q1_RECODED.
VARIABLE LABELS Q1_RECODED 'Q1: Satisfaction (3-point)'.
VALUE LABELS Q1_RECODED
  1 'Top 2 box (satisfied)'
  2 'Neither'
  3 'Bottom 2 box (dissatisfied)'.
Create age groups from a continuous age variable
RECODE AGE
  (18 THRU 24 = 1)
  (25 THRU 34 = 2)
  (35 THRU 44 = 3)
  (45 THRU 54 = 4)
  (55 THRU 64 = 5)
  (65 THRU HIGHEST = 6)
  (MISSING = SYSMIS)
  INTO AGE_GROUP.
VARIABLE LABELS AGE_GROUP 'Age group (6-band)'.
VALUE LABELS AGE_GROUP 1 '18-24' 2 '25-34' 3 '35-44' 4 '45-54' 5 '55-64' 6 '65+'.
Net / combine codes
/* Combine awareness codes 1 (prompted) and 2 (spontaneous) into a single net */
COMPUTE AWARE_NET = (Q4 = 1 OR Q4 = 2).
VARIABLE LABELS AWARE_NET 'Awareness net (prompted + spontaneous)'.
VALUE LABELS AWARE_NET 0 'Not aware' 1 'Aware (net)'.
Syntax tips and best practices
Habits that will save you time and prevent errors
Always start your syntax file the same way
/* ================================================
   PROJECT: [Project name]
   WAVE:    [Wave number / date]
   ANALYST: [Your name]
   CREATED: [Date]
   NOTES:   [Any important context]
   ================================================ */
/* 1. Open the data file */
GET FILE = 'C:\Projects\MyProject\data\w3_data.sav'.
/* 2. Apply weight */
WEIGHT BY WVAR.
/* 3. Set output options */
SET TVARS LABELS.
SET TNUMBERS VALUES.
Key shortcuts and habits
- Ctrl+A, Ctrl+R — select all, run all. Quick way to re-run your full syntax.
- Ctrl+R on a selection — run just the highlighted block.
- Comment liberally — use /* comment */ to explain every section. Your future self will thank you.
- Save syntax, not just output — output is a snapshot; syntax is the source of truth.
- Never overwrite raw variables — always RECODE INTO a new variable name.
- Use EXECUTE deliberately — SPSS queues transformation commands (COMPUTE, RECODE) and runs them in order on the next data pass, so a COMPUTE that references an earlier COMPUTE works without EXECUTE. Insert EXECUTE. when you need pending transformations applied immediately, for example before inspecting results in the Data Editor or before commands that depend on the current case order.
You write: COMPUTE A = B + 1. then look at the Data Editor and column A appears empty, because the transformation is still pending. Fix: run EXECUTE. (or any procedure, such as FREQUENCIES) to force the data pass.