Table 1

Summary of variables in the analysis
Category Variables used in the analysis Notes and comments
Medical school outcome data (variables: OutcomeFirstYear4pt, OverallMark, TheoryMark, SkillsMark). Medical school data on student performance in their first academic year for the three cohorts. Not all schools provided data for all cohorts: 11, 11 and 9 schools provided data for the 2007 to 2009 cohorts, for 1,661, 1,710 and 1,440 students respectively. In the same cohorts, UKCAT was used for selection by 23, 25 and 26 medical schools. The overall number of students from the 12 schools varied from 87 to 945 (median = 335, mean = 401, SD = 243). Medical schools were asked to provide several items of information on each student, although not all schools provided all information. Data were collected by the UKCAT Consortium Office, and not by the researchers. Measures used were as follows:
OutcomeFirstYear4pt: Outcome of the first year on a four-point scale (passed all exams at first attempt; passed after re-sitting exams; repeating the first year; and leaving the course).
OverallMark, TheoryMark, and SkillsMark: Averaged percentage marks in medical school assessments. OverallMark, based on all assessments, was available for 4,510 students, one school providing only OutcomeFirstYear4pt, and occasional students elsewhere not having percentage marks; in each case a proxy OverallMark was calculated as a normal score, using SPSS’s Rank Cases/Normal Scores command. Separate marks were also available for ‘Theory’ and ‘Skills’ assessments, the definition of theory and skills being left to medical schools. TheoryMark and SkillsMark were available for 2,075 and 3,184 students respectively. Because percentage marks are not necessarily comparable across schools, OverallMark, TheoryMark and SkillsMark were standardized to a mean of zero and SD of one within medical schools and cohorts.
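The within-school, within-cohort standardization of the mark variables can be illustrated with a minimal Python sketch. It is not the authors' SPSS procedure: the function name, the record layout and the example marks are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the table does not say which SD was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_groups(rows, value_key, group_keys):
    """Return one z-score per row, computed separately within each group.

    rows: list of dicts; a value of None is treated as missing.
    Assumption: the sample SD (n - 1 denominator) is used.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if row[value_key] is not None:
            groups[tuple(row[k] for k in group_keys)].append(i)

    z = [None] * len(rows)
    for members in groups.values():
        values = [rows[i][value_key] for i in members]
        if len(values) < 2:
            continue  # SD is undefined for a single student; leave missing
        m, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # all marks identical; z-score is undefined
        for i in members:
            z[i] = (rows[i][value_key] - m) / sd
    return z


# Hypothetical data: two schools in one cohort with different marking scales.
students = [
    {"school": "A", "cohort": 2007, "OverallMark": 60.0},
    {"school": "A", "cohort": 2007, "OverallMark": 70.0},
    {"school": "A", "cohort": 2007, "OverallMark": 80.0},
    {"school": "B", "cohort": 2007, "OverallMark": 40.0},
    {"school": "B", "cohort": 2007, "OverallMark": 50.0},
    {"school": "B", "cohort": 2007, "OverallMark": None},
]
z_overall = standardize_within_groups(students, "OverallMark", ["school", "cohort"])
```

After standardization, a mark of 70 at school A and a mark of 45 at school B would both correspond to a z-score of zero, which is the sense in which marks become comparable across schools.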
Prior educational achievement (variables: Alevel_number_total, Alevel_Totalbest, Alevel_TotalPoints). We will describe the analysis of A-levels in some detail. Other examinations show minor variations from the analysis of A-levels, which we will then describe.
A (Advanced) levels. Scored as A = 10, B = 8, C = 6, D = 4, E = 2, Else = 0. A* grades at A level were not awarded during the study period. Measures were only calculated for students with three or more A-levels, others being set as missing. Fourteen separate measures were obtained, described further in the Technical Report [34]. General Studies was not counted in the overall totals, means and so on, but was analyzed separately, as its status is unclear. The measures were:
Alevel_number_total: Number of non-General Studies A-levels; of the 2,764 entrants, 41.8% had 4 or more.
Alevel_Totalbest: Sum of the three highest A-level grades; 73.0% of students had the maximum score of 30 (that is, AAA), with 21.3% scoring 28 (AAB), 5.0% scoring 26 (ABB/AAC), 0.6% scoring 24 (BBB or equivalent), and four candidates scoring 20, 16, 16 and 10.
Alevel_TotalPoints: Total points achieved by a student for all of their A-levels, which for those taking three A-levels was the same as the previous measure.
Alevels_Taken_1_or_more_Biology, Alevels_Taken_1_or_more_Chemistry, Alevels_Taken_1_or_more_Physics, and Alevels_Taken_1_or_more_Maths: A series of ‘dummy variables’, scored as 1 if the subject had been taken and 0 if it had not. 95.7%, 99.1%, 24.8% and 63.3% of A-level students had A-levels in Biology, Chemistry, Physics and Math respectively.
Alevels_highest_Biology, Alevels_highest_Chemistry, Alevels_highest_Physics, and Alevels_highest_Maths: Highest grade attained by a student in Biology, Chemistry, Physics and Math subjects; except for Math, students had mostly taken only one exam in each category.
Alevels_Taken_1_or_more_NonScience: A 1/0 dummy variable indicating that a student had A-level(s) other than in the core sciences of Biology, Chemistry, Physics or Math (or General Studies). A total of 49.9% of the students had at least one non-science A-level.
Alevels_Taken_1_or_more_GeneralStudies: A 1/0 dummy variable indicating whether a student had taken General Studies A-level; 26.0% had done so.
Alevels_highest_GeneralStudies: For students taking General Studies, the highest grade attained, 46.9% having an A grade.
In addition equivalent variables for other qualifications are named in similar ways but with Alevel… replaced by ASlevel…, GCSE…, SQAhigher…, SQAhigherPlus… and SQAadvHigherPlus… .
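As an illustration of the A-level scoring rules, the following Python sketch derives a subset of the measures for one student. It is a sketch under stated assumptions rather than the authors' code: the function name and input layout are hypothetical, subjects are matched by exact name, the three-A-level threshold is assumed to exclude General Studies, and the `Alevels_highest_…` values are stored as points rather than letter grades.

```python
ALEVEL_POINTS = {"A": 10, "B": 8, "C": 6, "D": 4, "E": 2}  # anything else scores 0
CORE_SCIENCES = ("Biology", "Chemistry", "Physics", "Maths")


def alevel_measures(results):
    """results: list of (subject, grade) pairs for one student.

    Returns a dict of measures, or None (missing) when the student has
    fewer than three non-General Studies A-levels (an assumed reading
    of the 'three or more A-levels' rule).
    """
    scored = [(subject, ALEVEL_POINTS.get(grade, 0))
              for subject, grade in results
              if subject != "General Studies"]
    if len(scored) < 3:
        return None
    points = sorted((p for _, p in scored), reverse=True)
    measures = {
        "Alevel_number_total": len(points),
        "Alevel_Totalbest": sum(points[:3]),   # best three grades
        "Alevel_TotalPoints": sum(points),     # all non-General Studies grades
    }
    for subject in CORE_SCIENCES:
        subject_points = [p for s, p in scored if s == subject]
        measures[f"Alevels_Taken_1_or_more_{subject}"] = int(bool(subject_points))
        measures[f"Alevels_highest_{subject}"] = max(subject_points, default=None)
    return measures


# Hypothetical student with four A-levels plus General Studies.
example = alevel_measures([
    ("Biology", "A"), ("Chemistry", "A"), ("Maths", "B"),
    ("History", "C"), ("General Studies", "A"),
])
```

For this hypothetical student the best-three total is 28 (AAB) and the total points are 34, which shows how the two totals diverge only for students with more than three A-levels.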
AS (Advanced Subsidiary) levels. Variables are similar to those for A-levels except that they are named ASlevel… rather than ALevel… Scored as for A levels (A = 10, B = 8, C = 6, D = 4, E = 2, Else = 0). Measures are similar except that students had to have taken at least four AS-levels, and totals were for the best four AS-levels achieved. For reasons which are not clear, fewer students had 4+ AS-levels (n = 1,877) than had 3+ A-levels (n = 2,764). AS-level grades showed more variability than A-levels, only 56.3% of students scoring a maximum 40 points for their best grades, compared with 73.0% of students gaining 30 points from their best three A-levels.
GCSE (General Certificate of Secondary Education). Variables are broadly similar to those for A-levels except that they are named GCSE… rather than ALevel… GCSE results were only available for the 2009 entry cohort. Single subjects were scored as A* = 6, A = 5, B = 4, C = 3, D = 2, E = 1, else = 0, and double Science and other subjects were scored as A*A* = 12, A*A = 11, and so on, and counted as two GCSEs taken. Very few students had eight or fewer GCSEs, and overall scores were therefore based on the nine best grades. GCSE scores were available for 930 students, and were more variable than A-levels or AS-levels, only 16.6% of students having the maximum of 54 points (equivalent to 9 A* GCSEs). Scores were calculated for the four individual core sciences, and scores were also calculated for Combined Science (taken by 32.8% of students). GCSE_Number_NonScience_Exams: Because all students had taken several non-science subjects, this variable was the number of non-science subjects taken.
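The GCSE scoring, including double awards, might be sketched in Python as follows. The function names are hypothetical, and two details are assumptions rather than statements from the table: a double award is split into two separate grades before the nine best are chosen, and students with fewer than nine grades are treated as missing.

```python
import re

GCSE_POINTS = {"A*": 6, "A": 5, "B": 4, "C": 3, "D": 2, "E": 1}  # anything else scores 0


def gcse_grade_points(grade):
    """Split a result such as 'A' or 'A*A' into one points value per GCSE."""
    return [GCSE_POINTS.get(g, 0) for g in re.findall(r"[A-Z]\*?", grade)]


def gcse_total_best(grades, n_best=9):
    """Sum of the n_best highest grades, or None if fewer grades are available."""
    points = [p for grade in grades for p in gcse_grade_points(grade)]
    if len(points) < n_best:
        return None  # assumption: too few GCSEs is treated as missing
    return sum(sorted(points, reverse=True)[:n_best])


# A double award 'A*A' counts as two GCSEs worth 6 + 5 = 11 points.
double_award = gcse_grade_points("A*A")
max_score = gcse_total_best(["A*"] * 9)
mixed_score = gcse_total_best(["A*A", "A", "A", "A", "A", "A", "A", "A", "B"])
```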
Scottish Highers. Measures are broadly similar to those for A-levels, except that names begin SQAhigher… Grades were scored as A = 10, B = 8, C = 6 and D = 4. Students were only included who had five or more grades at Highers, the five highest being summed. Other differences from A-levels are that there is no General Studies component, and almost all students will take a non-science Higher. Results for Scottish Highers were available for 769 students, 72.4% gaining a maximum score of 50 points based on best five grades.
‘Scottish Highers Plus’. This is a construction of our own, reflecting the fact that although Scottish Highers are scored by UCAS and by most Scottish universities as A, B, C and D, the UCAS grades are actually A1, A2, B3, B4, C5, C6 and D7. These results, with two bands at each grade, are treated as meaningful by many English universities (although not, it would seem, Scottish universities), and therefore we also scored Highers on a basis of A1 = 10, A2 = 9, B3 = 8, B4 = 7, C5 = 6, C6 = 5 and D7 = 4. We have named this as ‘Scottish Highers Plus’, and variable names begin SQAhigherPlus… . These results have a wider range of scores, only 19.9% of students gaining the maximum 50 points.
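The difference between the two scoring schemes for Highers can be shown with a short Python sketch. The function names are hypothetical, and applying the same five-grade minimum and best-five rule to ‘Scottish Highers Plus’ is an assumption inferred from its maximum of 50 points.

```python
HIGHER_POINTS = {"A": 10, "B": 8, "C": 6, "D": 4}
HIGHER_PLUS_POINTS = {"A1": 10, "A2": 9, "B3": 8, "B4": 7, "C5": 6, "C6": 5, "D7": 4}


def best_five_total(points):
    """Sum of the five highest scores, or None with fewer than five grades."""
    if len(points) < 5:
        return None
    return sum(sorted(points, reverse=True)[:5])


def highers_totals(bands):
    """bands: banded results such as ['A1', 'B3', ...] for one student.

    Returns (Highers total, Highers Plus total).
    """
    letter_points = [HIGHER_POINTS.get(band[:1], 0) for band in bands]
    band_points = [HIGHER_PLUS_POINTS.get(band, 0) for band in bands]
    return best_five_total(letter_points), best_five_total(band_points)


# Five A grades, all in the lower band: maximal on Highers, not on Highers Plus.
all_a2 = highers_totals(["A2"] * 5)
mixed = highers_totals(["A1", "A2", "A1", "B3", "A2", "C5"])
```

A student with five A2 grades scores the maximum 50 on the letter-grade scheme but 45 on the banded scheme, which is consistent with far fewer students reaching the maximum on ‘Scottish Highers Plus’.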
Scottish Advanced Highers. Variable names begin SQAadvHigherPlus… . Many Scottish universities seem not to require Advanced Highers, an argument against their use being that only selective schools have the resources or provide the possibility of studying Advanced Highers, and hence there are concerns about widening access. We note, however, that in this group of students, of 478 applying from the state sector, 93.1% had one or more Advanced Highers, compared with 81.8% of 237 nonstate sector entrants. Overall, 573 students in the present survey had at least two Advanced Highers (that is, 74.5% of the 769 students with Highers), and a further 108 had one Advanced Higher. We, therefore, also assessed the predictive value of Advanced Highers. Scoring was as for “Scottish Highers Plus” (that is, A1 = 10, A2 = 9, B3 = 8, B4 = 7, C5 = 6, C6 = 5 and D7 = 4), with scores calculated for individual core science subjects, along with highest overall score attained. 22% of the 694 students with at least one Advanced Higher had a maximum of 10 points on their best Advanced Higher, and 22.6% had 7 or fewer points.
Overall measures of educational attainment. As described in the text, an overall measure of educational attainment was calculated for each student, EducationalAttainmentGCE or EducationalAttainmentSQA for GCE and SQA assessments respectively. These variables were based on a set of eight or ten measures respectively, with missing values replaced by the EM algorithm, and then the first principal component extracted. A single variable, EducationalAttainment, was created, which was the z score of either EducationalAttainmentGCE or EducationalAttainmentSQA, whichever was not missing. Because the present analysis is interested in measures within medical schools, EducationalAttainmentGCE and EducationalAttainmentSQA were also standardized to have a mean of zero and SD of one within each medical school cohort, to produce the variables zEducationalAttainmentGCE and zEducationalAttainmentSQA. We also used a dummy variable, SQAorGCE, to indicate whether entrants had taken Scottish or other qualifications. Note that in the paper on Construct Validity [35] the unstandardized measures were used, in order that information on applicants as well as entrants could be on a common scale.
UKCAT measures (variable: zUKCATtotal). Data were provided by the UKCAT consortium, with some additional measures calculated by HIC in Dundee. The overall measure of performance was the total score, UKCATtotal, and there were also scores on the four subscales UKCATabstractReasoning, UKCATdecisionAnalysis, UKCATquantitativeReasoning, and UKCATverbalReasoning. Each of the measures was also standardised as a z-score within medical schools and cohorts, to give zUKCATtotal, with the four subscales being zUKCATabstractReasoning, zUKCATdecisionAnalysis, zUKCATquantitativeReasoning, and zUKCATverbalReasoning. There was also information on the date of taking UKCAT, the variable UKCATdayOfTakingPctileRank giving the relative date of taking the test within cohorts, low scores indicating early takers of the test. Not all candidates answered all questions, in most cases probably because they ran out of time, and as a result on average they had lower scores than if they had guessed at items. The measure UKCATskipped gives the overall number of skipped items, which had a median of 4, only 25.9% of candidates answering all items. Some candidates were allowed extra time because of special needs, which is indicated by the variable UKCATexamSeriesCode; on average these candidates had higher overall scores than other candidates.
In their analyses of BMAT [36], Emery et al. reported that candidates from schools with more extensive experience of the test performed somewhat differently, and therefore a contextual variable, UKCATcandPerSchool, was provided by HIC which counted the number of candidates taking UKCAT in a student’s school since the test’s inception.
Schooling measures (variable: SelectiveSchool). Some information on schooling, including school codes, was available from UCAS, and the school codes could also be linked into contextual data available from the Department for Education (DfE; formerly DFES) at Key Stage 5 for the academic year 2010 (file created May 2011), for schools in England. The merging of the two datasets was carried out by HIC. School type was available from two separate sources, UCAS and DFES. In UCAS’s data, of 4,811 students, 69 had missing information, 360 were in UCAS’s ‘Unknown’ category, 219 were ‘Apply Online UK’, and 86 were ‘Other’. Of 4,077 students for whom information was available, 1,941 (47.6%) were classified as coming from Selective Schools (‘Grammar School’ or ‘Independent School’), and 2,136 (52.4%) from non-Selective Schools (‘Comprehensive School’, ‘Further/Higher education’, ‘Sixth Form Centre’ and ‘Sixth Form College’). The DFES database also had a measure of Selective Schooling, with information on 2,830 individuals available, of whom 1,387 (49.0%) attended selective schools. The overlap of the UCAS and DFES classifications was good, but not perfect. Our final measure, entitled SelectiveSchool, had a value of 1 if either UCAS or DFES data suggested a school was selective, and otherwise was 0. Altogether, of the 4,811 individuals in the Primary Database, information was available from one or both sources in 4,114 cases, of whom 1,986 (48.3%) had evidence of having attended a selective school.
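The rule for combining the two sources into SelectiveSchool can be sketched in Python as below. The function names are hypothetical, and treating students with no usable information from either source as missing (rather than 0) is an assumption based on the count of 4,114 cases with information from one or both sources.

```python
UCAS_SELECTIVE = {"Grammar School", "Independent School"}
UCAS_NON_SELECTIVE = {
    "Comprehensive School", "Further/Higher education",
    "Sixth Form Centre", "Sixth Form College",
}


def ucas_selective(school_type):
    """Map a UCAS school type to 1 (selective), 0 (non-selective) or None."""
    if school_type in UCAS_SELECTIVE:
        return 1
    if school_type in UCAS_NON_SELECTIVE:
        return 0
    return None  # missing, 'Unknown', 'Apply Online UK' or 'Other'


def selective_school(ucas_type, dfes_selective):
    """Combine UCAS school type with a DFES flag (1, 0 or None).

    Returns 1 if either source indicates a selective school, 0 if at least
    one source is available and none indicates selective, else None.
    """
    flags = [f for f in (ucas_selective(ucas_type), dfes_selective) if f is not None]
    if not flags:
        return None  # assumption: no information from either source
    return int(any(flags))
```

For example, a student whose UCAS record says ‘Comprehensive School’ but whose DFES record flags the school as selective would be coded 1 under this rule.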
Contextual school measures. The DfE data had a total of 22 contextual measures on schools. After a range of preliminary, exploratory analyses we confined the analyses to three variables: DFESshrunkVA, which is a measure of value added between Key stages 4 and 5, and was available for the schools of 2,561 students; DFES.AvePointStudent, which is a measure of the average points gained by each student at a school across all of that school’s examination entries, and was available for the schools of 2,586 students; and DFES.AvePointScore, which is a similar measure to the previous one except that the average is at the level of examination entries (rather than students), and was available for the schools of 2,582 students.
Demographic measures (variables: UK, CAND.AgeGT21, UCAS.Ethnic2).
Nationality was based on the online information provided when students took UKCAT; of 4,811 students, 4,598 (95.6%) were UK nationals, 176 (3.7%) were EU/EEA nationals and 37 (0.8%) were from outside the EU/EEA; the binary variable was called UK.
Sex was based on information provided by UCAS; of 4,811 students, 2,081 (43.3%) were male and 2,730 (56.7%) were female. The variable was called UCAS.male, scoring 1 = male and 0 = female.
Age was based on stated age in years when taking the UKCAT test, and ranged from 17 to 45 (mode = 18, mean = 19.55, SD = 2.84). Age was missing in 45 cases, 28.9% of students were aged 21+, and 1.3% were aged 30+. The variable was called CAND.Age. Additional 0/1 variables were created to indicate whether candidates were 21 or older or 30 or older (CAND.AgeGT21, CAND.Age30plus).
Ethnicity was based on the standard 23 categories in the UCAS coding. Ethnicity was missing in 69 cases, for 214 was coded as Unknown, and for 192 was coded as ‘Not given’. On a simplified six-category basis there were 3,057 White, 577 Indian sub-continent, 223 Other Asian, 92 Black, 140 Mixed and 60 Other. For simplicity, and as in many other studies [37], we grouped students as White (n = 3,057, 73.7%) and Non-White (n = 1,092, 26.3%), in a variable called UCAS.Ethnic2.
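The two age indicator variables can be written out as a small Python sketch, with a hypothetical function name. Note that, per the definition given for the age variables, CAND.AgeGT21 includes candidates aged exactly 21 despite the ‘GT’ in its name; returning missing indicators when age is missing is an assumption.

```python
def age_indicators(age):
    """Return the 0/1 age indicators for one candidate (None if age is missing)."""
    if age is None:
        return {"CAND.AgeGT21": None, "CAND.Age30plus": None}
    return {
        "CAND.AgeGT21": int(age >= 21),    # 21 or older, as defined in the table
        "CAND.Age30plus": int(age >= 30),  # 30 or older
    }
```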
Socio-economic measures (variable: CAND.NSSEC). Socio-economic classification (SEC), variable CAND.NSSEC, was based on the online information provided by students taking UKCAT, who completed the abbreviated version of the self-coded questionnaire (NS-SEC) provided by UK National Statistics (footnote 2). SEC was calculated separately for each parent (if provided), and the higher SEC used. Of 4,091 individuals with usable information, 3,740 (91.4%) were in SEC group 1, 105 (2.6%) in group 2, 146 (3.6%) in group 3, 38 (0.9%) in group 4, and 62 (1.5%) in group 5, where group 1 has the highest status.
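Because group 1 is the highest status, taking the ‘higher SEC’ of the two parents amounts to taking the smaller group number. A minimal Python sketch, with a hypothetical function name and with missing parental information represented as None:

```python
def cand_nssec(parent1_group, parent2_group):
    """Return the higher-status SEC group (1 = highest) across both parents.

    Each argument is an integer 1-5, or None if not provided.
    Returns None when neither parent's SEC is available.
    """
    groups = [g for g in (parent1_group, parent2_group) if g is not None]
    return min(groups) if groups else None
```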
Socio-economic contextual measures (variables: IMD1IncomeDecile, IMD4EducationDecile, IMD5HousingAndServicesDecile and IMD7LivingEnvironmentDecile, each with two subscales). For applicants living in England, postcodes for place of residence were used to link to small-area census statistics collected as part of The English Indices of Deprivation [38] and which generate a series of Indices of Multiple Deprivation (IMD). For ease of analysis, HIC converted the measures to deciles, low scores indicating greater deprivation. IMDOverallQualityDecile provides an overall single indicator of deprivation. In addition there are 15 more detailed scales and subscales, whose names are moderately self-explanatory: IMD1IncomeDecile (with two subscales), IMD2EmploymentDecile, IMD3HealthDisabilitySkillsDecile, IMD4EducationDecile (with two subscales), IMD5HousingAndServicesDecile (with two subscales), IMD6CrimeDecile, and IMD7LivingEnvironmentDecile (with two subscales). Note that although these scales are described in terms of deprivation, they are scored as 1 = high deprivation and 10 = low deprivation, and therefore are renamed as ‘Quality’ so that higher scores indicate a higher quality on the measure.

McManus et al. BMC Medicine 2013 11:244   doi:10.1186/1741-7015-11-244