Probable End-of-Year Rates

The overlap in values between the two groups shows whether or not they are likely to have the same end-of-year rate. If only a small part of the two distributions overlaps, there is a low chance that the groups will have the same end-of-year rate. If the distributions overlap quite a bit, then there is a higher chance the groups will have the same end-of-year rate. The further along the school year is (school progress) or the larger the student groups, the narrower the distributions will be, reflecting the increased precision of the rate measurement.

Rate Precision Across Year

This chart shows how the estimate of students' end-of-year outcome rate improves and becomes more precise over time. At the start of the year, a wide range of outcome rates is possible. As the school year progresses, the possible range of outcome rates shrinks because unusual observations carry less weight. This means that, by a certain number of days into the school year, the group's end-of-year outcome rate can be precisely predicted – if no changes are made by the school. Smaller student groups have wider possible ranges further into the school year, reflecting the slower rate of learning about their outcome rate compared to larger student groups.

What is the Outcome Rate Simulator?

This is a tool to help you determine whether two student groups are likely to have different end-of-year outcomes and, if so, what magnitude the difference is likely to be. Early in the year, it can be hard to evaluate whether bumps in the attendance or suspension rate are meaningful and reflect a true trend for a student group or are just noisy variation. This is especially true for small student groups. The simulator can determine how early in the school year you can get an accurate estimate of the end-of-year rate.

How to use the simulator

First select the outcome – either attendance or suspension – from the dropdown. Then gather 5 pieces of information:

  1. How far along the school year is, as a percentage of the year (e.g., 80% complete)
  2. How many students are in the first student group you want to examine
  3. The attendance rate – or number of suspensions – for that student group so far this year
  4. How many students are in the second student group you want to examine
  5. The attendance rate – or number of suspensions – for the second student group so far this year

With this information in hand:

  1. Drag the slider to show about how much of the school year has elapsed (e.g., 10%, 30%)
  2. Focus on one group of students (e.g., eighth-graders eligible for free or reduced-price lunch). Enter the number of students in that group. Then drag the slider to show the attendance rate of students in that group to date, or enter the number of suspensions.
  3. Focus on a second group of students (e.g., eighth-graders not eligible for free or reduced-price lunch). Enter the number of students in this group and drag the slider to show this group’s attendance rate to date, or enter the number of suspensions.

How to interpret the results

Rate and margin of error

For suspensions, the simulator will first compute each group’s suspension rate. Then, for either outcome, the simulator will compute each group’s corresponding margin of error and display it in the box below the last slider. The margin of error is the expected variation around the rate. It depends on the size of the group, the portion of the school year that has elapsed, and the desired confidence level. In this case, the margin of error uses a 90% confidence level. This means there is a 90% likelihood that the end-of-year attendance or suspension rate will fall within the range of the rate ± the corresponding margin of error.

“Probable End-of-Year Rates” chart

This chart shows each group’s likely end-of-year attendance or suspension rate. For each group, the chart’s peak shows the most likely end-of-year rate. The spread shows the range of possible end-of-year rates. Rates further from the peak are less likely. Whether the distribution is tightly clustered around the peak or more spread out depends on the student group size and progress through the year; smaller student groups have a wider range of possible values because one suspension or a few students missing a week of school has a larger effect on a smaller student group.

“Rate Precision Across Year” chart

This chart shows how the estimate of students’ end-of-year attendance or suspension rate improves and becomes more precise over time. The red line in the chart shows where the school year progress slider is set, assuming a 180-day school year.

At the start of the school year, a wide range of attendance or suspension rates is possible. As the school year progresses, the possible range of rates shrinks because unusual observations carry less weight. How quickly the range narrows depends on the student group size. By a certain number of days into the school year, the group’s end-of-year attendance rate can be precisely predicted – if no changes are made by the school. Similarly, suspension rates can be predicted more precisely as time goes on, but for small groups of students, a single suspension can greatly affect the rate at any time.


Worked example - Attendance

20% of the school year has elapsed. Student group 1 has 20 students and a current attendance rate of 85%. Student group 2 has 200 students and a current attendance rate of 90%.

The top chart shows that – without intervention – it is very unlikely that the two groups will finish the school year with the same attendance rate. It also shows that, only one-fifth of the way into the school year, a model can predict with high precision the likely end-of-year attendance rate for the second group of 200 students. However, for the first group of 20 students, a much wider range of possible end-of-year attendance rates from about 82% to about 88% is plausible.

The bottom chart shows that, in the first ten days or so of school, the two groups’ attendance rates overlap in terms of their possible distributions. However, by around 20 days into the year, it is clear that – without intervention – student group 2 is likely to have a substantially higher end-of-year attendance rate than student group 1.
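The attendance example can be approximated in a few lines of R. This is a rough sketch, not the simulator's own code: the helper names `rate_range` and `ranges_overlap` are made up for illustration, the seed is arbitrary, and treating two groups as "overlapping" when their 5th-to-95th percentile ranges intersect is an assumption used here in place of reading the charts by eye.

```r
set.seed(2024)  # arbitrary seed so the draws are repeatable

# 90% range (5th to 95th percentile) of simulated rates for one group.
# Trials are student-days: group size times school days elapsed.
rate_range <- function(group_size, rate, days, n_draws = 10000) {
  trials <- group_size * days
  draws <- rbinom(n_draws, size = trials, prob = rate) / trials
  quantile(draws, probs = c(0.05, 0.95), names = FALSE)
}

# Two ranges overlap when each one starts before the other ends.
ranges_overlap <- function(a, b) a[1] <= b[2] && b[1] <= a[2]

# Inputs from the worked example at day 36 (20% of a 180-day year).
group1_day36 <- rate_range(group_size = 20, rate = 0.85, days = 36)
group2_day36 <- rate_range(group_size = 200, rate = 0.90, days = 36)
overlap_day36 <- ranges_overlap(group1_day36, group2_day36)

# The same groups at day 5, when far fewer student-days have accumulated.
group1_day5 <- rate_range(group_size = 20, rate = 0.85, days = 5)
group2_day5 <- rate_range(group_size = 200, rate = 0.90, days = 5)
overlap_day5 <- ranges_overlap(group1_day5, group2_day5)

print(rbind(group1_day36, group2_day36))
print(c(day5 = overlap_day5, day36 = overlap_day36))
```

Under these assumptions the two ranges intersect at day 5 and have separated by day 36, which is consistent with the pattern described for the two charts.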

Worked example - Suspension

40% of the school year has elapsed. Student group 1 has 200 students and 25 suspensions so far this year. Student group 2 has 75 students and 5 suspensions so far.

The top chart shows that student group 1 is likely to have an end-of-year suspension rate between 25% and 40% whereas a wider range of suspension rates is possible for group 2. In group 2, with fewer students and a lower current suspension rate than group 1, an end-of-year suspension rate from 5% to 25% is plausible.

The bottom chart shows that, while the two groups’ probable suspension rates still overlap at this point in the school year, the rates are diverging. As more days accumulate and the model has more information about each group’s suspension rate, the estimates show that group 2 is likely to have a lower end-of-year suspension rate than group 1.


More use cases

Calculate suspension rate quickly

If you only know the number of suspensions and number of students in a group, it can be difficult to calculate the annual suspension rate. The annual – or end-of-year – suspension rate takes into account a full year’s worth of data. Simply dividing the number of suspensions by the number of students partway through the year, without taking into account how much of the year has elapsed, will give too low an estimate, because it implies that there will be no further suspensions this year. Instead, use this tool to calculate the suspension rate taking into account how much of the year has elapsed. The tool assumes the suspension rate will be consistent throughout the year and calculates a margin of error accounting for the group size, rate, and school year progress.

Measure progress against benchmark year

You can use the simulator to compare last year’s numbers against this year’s numbers to date. For example, if you know the attendance rate or number of suspensions for eighth-graders last year, you can compare them to this year’s eighth-graders to see if they are on track for similar end-of-year rates. In this case, use group 1 for the benchmark value from the prior year, and use group 2 for the current value for that same student group this year.

Compare two very different sized groups

Attendance

Consider one group with 250 students and a current attendance rate of 92% vs. a second group with 20 students and an attendance rate of 93%. Only 40% of the way into the school year, we can be confident that the first group is likely to have an end-of-year attendance rate close to 92%. However, the attendance rate for the second group is less certain; it could be the same as that of group 1 (92%) or could be two percentage points higher.

Suspensions

Consider one group with 250 students and 25 suspensions halfway through the school year vs. a second group with 20 students and 3 suspensions to date. Group 1 has a suspension rate of 20% with a margin of error of about 6 percentage points. As both charts show, we can be fairly certain that Group 1’s final suspension rate will be between 15% and 25%. Despite having only 3 suspensions to date, Group 2 has a higher suspension rate because the group size is much smaller. For this group, the probable end-of-year rate is 30% with a large margin of error, reflecting that the group’s suspension rate could be anywhere from 15% to 50%.
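The figures in this comparison can be roughly reproduced in R. The sketch below is an illustration under stated assumptions rather than the simulator's actual code: `suspension_summary` is a hypothetical helper, the seed is arbitrary, and it assumes the number of binomial trials is the number of student-years rounded to a whole number.

```r
set.seed(11)  # arbitrary seed so the draws are repeatable

# Annualized suspension rate and simulated 90% margin of error for one group.
# Assumption: trials = student-years, rounded to a whole number for rbinom.
suspension_summary <- function(group_size, suspensions, progress,
                               n_draws = 10000) {
  student_years <- group_size * progress
  trials <- round(student_years)
  rate <- suspensions / student_years
  draws <- rbinom(n_draws, size = trials, prob = rate) / trials
  c(rate = rate, margin = qnorm(0.95) * sd(draws))
}

# Inputs from the comparison above, halfway through the school year.
group1 <- suspension_summary(group_size = 250, suspensions = 25, progress = 0.5)
group2 <- suspension_summary(group_size = 20, suspensions = 3, progress = 0.5)

print(round(rbind(group1, group2), 3))
```

With these inputs the larger group's margin of error comes out near 6 percentage points, while the smaller group's is several times larger, matching the contrast described in the text.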


Connection to ABM

The simulator is a simple version of the forecasting component of ABM. Here the forecast is from 10,000 simulations based on student group size, the current outcome rate, and how much of the year has elapsed. It assumes no changes in the rate over time. This means that the simulator does not take into account spikes in absences around holidays, seasonal drops in attendance, or increases in behavioral events at certain times of the year. It is a simple approximation that illustrates the variability in outcome rates throughout the year, but assumes the rate of events is constant at all times.

In contrast, the forecast in ABM takes prior school year data into account, learning from the past year’s patterns in terms of when attendance or suspensions spike or decline within the year. This gives the ABM forecast greater precision.

The Binomial distribution

The simulation uses the binomial distribution to define the range of plausible outcome rates consistent with the true outcome rate, dependent on the sample size. This treats the current outcome rate as a function of a true outcome rate plus sampling error, where the sampling error shrinks as the group size grows.

The binomial distribution takes two parameters: the number of trials and the probability of an event. As an example, consider a coin flip with a 50% probability of being heads. If we specify 1 trial, then a draw from the binomial distribution will return a single value, either 0 or 1. If we specify 10 trials, then a draw will return the number of heads observed in 10 flips. The most likely value is 5, but every value from 0 to 10 is possible, with values further from 5 increasingly unlikely.
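The coin-flip example can be tried directly in R. This small sketch uses base R's `rbinom` (random draws) and `dbinom` (exact probabilities); the seed value is arbitrary and is only there to make the draws repeatable.

```r
# Coin-flip illustration of the binomial distribution (base R only).
set.seed(1)  # arbitrary seed so the draws are repeatable

# One trial: the draw is either 0 (tails) or 1 (heads).
one_flip <- rbinom(1, size = 1, prob = 0.5)

# Ten trials: the draw is the number of heads out of 10, from 0 to 10.
ten_flips <- rbinom(1, size = 10, prob = 0.5)

# Exact probability of each possible number of heads out of 10.
heads <- 0:10
prob_heads <- dbinom(heads, size = 10, prob = 0.5)

# The count with the highest probability is 5.
most_likely <- heads[which.max(prob_heads)]

print(round(prob_heads, 3))
```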

Applications to attendance and suspension

The simulator uses this binomial distribution to construct a sampling distribution around the outcome rates input by the user. From the user inputs, the number of trials and the probability for each group define the appropriate sampling distribution from the binomial, which is displayed in the charts and used to calculate the margin of error.

Deriving the number of trials

For both outcomes, the number of trials is the number of student-days for each student group. Trials are calculated using a formula and not input directly by the user. To calculate the student-days, we first identify the number of days elapsed in the school year by multiplying the proportion of the school year selected by the progress slider by 180 days (a standard school year). This represents the number of student-days per student, which we then multiply by the group size to get the number of student-days possible for each student group.

For example, if the user has selected 20% progress and specified a group size of 100, then the number of student days would be 180 x 0.2 x 100 = 3,600. This number of student days is used as the number of trials in the binomial distribution for the attendance calculation. For the suspension calculation, the number of student-days is rescaled to student-years to reflect an annualized rate of suspension.
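The student-day arithmetic can be written as a small helper. This is a minimal sketch, assuming a 180-day year as in the text; the function names `student_days` and `student_years` are hypothetical and are not taken from the simulator.

```r
# Hypothetical helper: student-days (binomial trials) for one group.
# school_days defaults to 180, the standard school year used in the text.
student_days <- function(group_size, progress, school_days = 180) {
  stopifnot(group_size > 0, progress >= 0, progress <= 1)
  days_elapsed <- school_days * progress  # school days elapsed so far
  days_elapsed * group_size               # student-days for the whole group
}

# Student-years are the same quantity rescaled by the length of the year.
student_years <- function(group_size, progress, school_days = 180) {
  student_days(group_size, progress, school_days) / school_days
}

# The example from the text: 20% progress and a group of 100 students.
trials_attendance <- student_days(group_size = 100, progress = 0.2)
years_suspension <- student_years(group_size = 100, progress = 0.2)

print(c(student_days = trials_attendance, student_years = years_suspension))
```

For the example inputs this gives 3,600 student-days, or 20 student-years.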

Deriving the probability

For attendance, the probability of attending is the attendance rate input by the user using the slider.

For suspensions, the simulator calculates the annualized suspension rate from the user inputs: the number of students, the number of suspensions, and the progress through the school year. The calculation is the number of suspensions / number of student-years enrolled for each group. A student-year is defined as 180 student-days and is just a rescaling of the student-days described above.

For example, if a user inputs a group size of 100, 3 suspensions, and 50% school year progress, the suspension rate would be:

Student-years = (100 * 180 * 0.5) / 180 = 50
Suspensions = 3
Suspension rate = 3 / 50 = 6%

This suspension rate is displayed for each group in the side panel so the user can see the result of the calculation at all times.
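The same calculation can be expressed as a short R function. This is a sketch of the arithmetic described above, not the simulator's source; the name `annual_suspension_rate` is hypothetical. It also shows the unadjusted figure for comparison, to illustrate why dividing by head count alone understates the annual rate partway through the year.

```r
# Hypothetical helper: annualized suspension rate from the three user inputs.
annual_suspension_rate <- function(group_size, suspensions, progress,
                                   school_days = 180) {
  stopifnot(group_size > 0, suspensions >= 0, progress > 0, progress <= 1)
  student_years <- (group_size * school_days * progress) / school_days
  suspensions / student_years
}

# The example from the text: 100 students, 3 suspensions, 50% progress.
rate <- annual_suspension_rate(group_size = 100, suspensions = 3,
                               progress = 0.5)

# Dividing by head count alone ignores the half of the year still to come.
naive_rate <- 3 / 100

print(c(annualized = rate, unadjusted = naive_rate))
```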

Computing the sampling distribution

Now, with the number of trials and probability, we compute the sampling distribution. We use a random number generator to take 10,000 draws from the binomial distribution defined by these parameters for each group. These draws allow us to quantify how likely different rates are given the reported probability.

An example below illustrates how this works:

# Example R code
probability_a <- 0.93
probability_b <- 0.95
size_a <- 100
size_b <- 36
progress <- 0.4 # 40% progress
# round() guards against fractional trial counts, which rbinom rejects
days_a <- round(size_a * 180 * progress)
days_b <- round(size_b * 180 * progress)

# Sample from the binomial distribution for each group 10,000 times
dist_a <- rbinom(10000, size = days_a, prob = probability_a)
dist_b <- rbinom(10000, size = days_b, prob = probability_b)

# Rescale to be a rate
dist_a <- dist_a / days_a
dist_b <- dist_b / days_b

The 10,000 draws for each group are used to calculate the margin of error, illustrate any overlap in the distributions between the two groups, and illustrate how the outcome rate changes over time.

Margin of error

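The margin of error is defined as 1.645 times the standard deviation of the sampling distribution for each group, where 1.645 is the critical value of the standard normal distribution for a two-sided 90% confidence interval. This treats the sampling distribution as approximately normal, which is a useful approximation for most group sizes but is less accurate for very small groups or for rates close to 0% or 100%.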
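As an illustration, the margin of error can be computed from simulated draws in a few lines of R. The inputs below are hypothetical and the seed is arbitrary. The closed-form comparison, which uses the standard deviation of a binomial proportion, sqrt(p(1 - p) / n), is added here only as a sanity check and is not described as part of the simulator.

```r
set.seed(42)  # arbitrary seed so the draws are repeatable

# Hypothetical inputs: 100 students, 93% attendance, 40% of a 180-day year.
group_size <- 100
rate <- 0.93
progress <- 0.4
trials <- round(group_size * 180 * progress)  # student-days

# 10,000 simulated rates from the binomial sampling distribution.
draws <- rbinom(10000, size = trials, prob = rate) / trials

# Margin of error at the 90% level: about 1.645 standard deviations.
z_90 <- qnorm(0.95)
moe_simulated <- z_90 * sd(draws)

# Closed-form standard deviation of a binomial proportion, for comparison.
moe_formula <- z_90 * sqrt(rate * (1 - rate) / trials)

print(c(simulated = moe_simulated, formula = moe_formula))
```

The two values should agree closely; any small difference comes from simulation noise in the 10,000 draws.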

Chart: “Probable End-of-Year Rates”

The top chart preserves all of the simulated values, allowing the user to see unlikely regions that shape the distribution around the true rate. Both sampling distributions are charted and their overlap, if any, is visually represented.

Chart: “Rate Precision Across Year”

The width of the band for each group in this chart is defined by the 5th and 95th percentiles of that group's sampling distribution. This range contains the central 90% of the simulated values. The sampling distribution is recalculated for each of the 180 days in the chart, so from left to right, the number of trials increases while the probability stays the same.
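A band of this kind can be sketched in R by repeating the simulation once per school day. The group size and rate below are hypothetical, the seed is arbitrary, and the code illustrates the percentile approach described above rather than reproducing the simulator's own implementation.

```r
set.seed(7)  # arbitrary seed so the draws are repeatable

# Hypothetical group: 20 students with an 85% attendance rate.
group_size <- 20
rate <- 0.85
n_draws <- 10000

# For each school day, redraw the sampling distribution and keep the
# 5th and 95th percentiles; trials grow with the day, the rate stays fixed.
band <- t(sapply(1:180, function(day) {
  trials <- group_size * day
  draws <- rbinom(n_draws, size = trials, prob = rate) / trials
  quantile(draws, probs = c(0.05, 0.95), names = FALSE)
}))
colnames(band) <- c("lower", "upper")

# Width of the band on the first day, at 20% progress, and at year end.
band_width <- band[, "upper"] - band[, "lower"]
print(round(band_width[c(1, 36, 180)], 3))
```

The printed widths shrink from left to right, which is the narrowing the chart displays.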