Citadel Interview Questions
47 real interview questions for Quant Researcher roles at Citadel.
Explain the difference between Lasso and Ridge regression.
Let x be distributed as N(0, 1) and z as N(0, sigma^2), with x independent of z, and let y = x + z. For an observed value of y, what is the conditional distribution of x given y, i.e., p(x|y)?
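Since x and y = x + z are jointly Gaussian with Var(y) = 1 + σ² and Cov(x, y) = Var(x) = 1 (σ² is the variance of z, assumed positive here), the standard Gaussian conditioning formula gives one way to write the answer:

```latex
\mathbb{E}[x \mid y] = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(y)}\, y = \frac{y}{1 + \sigma^2},
\qquad
\operatorname{Var}(x \mid y) = \operatorname{Var}(x) - \frac{\operatorname{Cov}(x, y)^2}{\operatorname{Var}(y)} = \frac{\sigma^2}{1 + \sigma^2},
\qquad
x \mid y \sim \mathcal{N}\!\left(\frac{y}{1 + \sigma^2},\; \frac{\sigma^2}{1 + \sigma^2}\right).
```

As σ² shrinks toward 0 the observation pins down x (the mean tends to y), and as σ² grows the conditional mean shrinks toward the prior mean 0.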
What is an O(n log n) sorting algorithm? Give an example and briefly explain how it works.
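Merge sort is one such algorithm (heapsort is another). A minimal Python sketch: split the list in half, sort each half recursively, then merge the two sorted halves in linear time. There are about log2(n) levels of recursion, each doing O(n) merging work, which gives O(n log n).

```python
def merge_sort(items: list) -> list:
    """Return a new sorted list using top-down merge sort."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge the two sorted halves in linear time; "<=" keeps the sort stable.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```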
Suppose you want to predict y and you have predictor vectors, each with known accuracy, variance, and sample length. How do you combine these predictors to minimize the residual standard error (RSE)?
There are three random variables, X, Y, Z. The correlations between each pair of variables are the same, i.e., ρ = Corr(X,Y) = Corr(Y,Z) = Corr(Z,X). What is the tightest bound you can give for ρ? How about the general case for n random variables?
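A short derivation, standardizing each variable to unit variance (correlation is unchanged by scaling): the variance of the sum must be non-negative. Conversely, any ρ in [-1/(n-1), 1] is attainable, because the equicorrelation matrix has eigenvalues 1 - ρ and 1 + (n-1)ρ, both non-negative on that range.

```latex
\operatorname{Var}\!\Big(\sum_{i=1}^{n} X_i\Big)
  = \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum_{i \neq j} \operatorname{Cov}(X_i, X_j)
  = n + n(n-1)\rho \;\ge\; 0
  \quad\Longrightarrow\quad -\frac{1}{n-1} \le \rho \le 1 .
% n = 3 gives -1/2 <= rho <= 1. The lower bound is attained by X_i = Z_i - \bar{Z}
% with Z_1, ..., Z_n i.i.d., since Corr(X_i, X_j) = (-1/n) / ((n-1)/n) = -1/(n-1).
```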
What are some ways to construct a uniform distribution using only a fair coin?
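Two common constructions, sketched in Python under the assumption that a hypothetical helper `flip()` (here backed by `random.getrandbits(1)`) stands in for the fair coin: rejection sampling for a uniform integer on {0, ..., n-1}, and reading flips as binary digits for an approximately uniform real in [0, 1).

```python
import random

def flip() -> int:
    """Hypothetical stand-in for a fair coin: returns 0 or 1 with equal probability."""
    return random.getrandbits(1)

def uniform_int(n: int) -> int:
    """Uniform integer in [0, n) from fair coin flips via rejection sampling."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = max(1, (n - 1).bit_length())  # number of bits needed to cover 0..n-1
    while True:
        value = 0
        for _ in range(k):
            value = (value << 1) | flip()
        if value < n:  # reject out-of-range values and flip a fresh batch
            return value

def uniform_real(bits: int = 53) -> float:
    """Approximate Uniform[0, 1) by reading flips as binary digits 0.b1 b2 b3 ..."""
    value = 0
    for _ in range(bits):
        value = (value << 1) | flip()
    return value / (1 << bits)

print(uniform_int(6), uniform_real())
```

Rejection keeps every accepted value equally likely; the cost is a random number of flips, at most about twice the minimum on average.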
If 75 customers are randomly assigned to three equal-sized databases, and all partitions are equally likely, what is the probability that two randomly selected customers, Bob and Ben, are in the same database?
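By symmetry, fix Bob's database: Ben is equally likely to occupy any of the remaining 74 slots, 24 of which are in Bob's database, so the probability is 24/74 = 12/37 ≈ 0.324. A small Python check (function names are illustrative) compares that with a simulation in which a uniformly random ordering of the customers is cut into three consecutive blocks of 25, which samples uniformly over equal-size partitions.

```python
import random
from fractions import Fraction

def exact_same_db(total: int = 75, groups: int = 3) -> Fraction:
    """P(two given customers share a database) for a random equal-size partition."""
    size = total // groups
    # Fix Bob's database; Ben fills one of the other total - 1 slots uniformly,
    # and size - 1 of those slots are in Bob's database.
    return Fraction(size - 1, total - 1)

def simulate_same_db(trials: int = 100_000, total: int = 75, groups: int = 3, seed: int = 1) -> float:
    """Monte Carlo estimate using consecutive blocks of a random ordering."""
    rng = random.Random(seed)
    size = total // groups
    hits = 0
    for _ in range(trials):
        # Positions of Bob and Ben in a uniformly random ordering of all customers.
        bob, ben = rng.sample(range(total), 2)
        hits += (bob // size) == (ben // size)
    return hits / trials

print(exact_same_db(), float(exact_same_db()), simulate_same_db())
```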
Suppose you have several random variables that all have equal pairwise correlation. What is the possible range of this correlation value?
What is the probability that X > Y, where X is distributed as N(0, 2) and Y is distributed as N(0, 1)?
What is the expected number of samples drawn from a uniform distribution on [0,1] required so that their sum exceeds 1?
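The answer is e ≈ 2.718: if N is the number of draws, then P(N > n) = P(U_1 + ... + U_n ≤ 1) = 1/n!, so E[N] = sum over n ≥ 0 of 1/n! = e. A Monte Carlo sketch in Python to sanity-check that value:

```python
import math
import random

def draws_until_sum_exceeds_one(rng: random.Random) -> int:
    """Draw Uniform[0, 1) values until their running sum exceeds 1; return the count."""
    total, count = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        count += 1
    return count

def estimate_expected_draws(trials: int = 200_000, seed: int = 0) -> float:
    """Average the draw count over many independent trials."""
    rng = random.Random(seed)
    return sum(draws_until_sum_exceeds_one(rng) for _ in range(trials)) / trials

print(estimate_expected_draws(), math.e)
```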
What is the angle between the minute hand and the hour hand at 12:15?
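The minute hand moves 6° per minute and the hour hand 0.5° per minute (30° per hour), so at 12:15 the hands sit at 90° and 7.5°, giving 82.5°. A small Python helper (the name `clock_angle` is illustrative) that generalizes the arithmetic:

```python
def clock_angle(hour: int, minute: int) -> float:
    """Smaller angle in degrees between the hour and minute hands."""
    minute_deg = 6.0 * minute                     # 360 degrees / 60 minutes
    hour_deg = 30.0 * (hour % 12) + 0.5 * minute  # 360 / 12 per hour, plus drift
    diff = abs(hour_deg - minute_deg) % 360.0
    return min(diff, 360.0 - diff)

print(clock_angle(12, 15))  # 82.5
```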
What happens to the optimal parameters of a linear regression if you feed in the same data twice? How about the R^2 and z-scores?
For the task of classifying news articles into their subject type (e.g., finance, science, politics), what is a current state-of-the-art approach? Describe this approach in detail and discuss its latency. Additionally, suggest simpler, classical models that can achieve faster throughput while maintaining high accuracy for this task.
1. In the context of the German tank problem, explain how to construct a statistical estimator for the maximum number of tanks produced, given a sample of observed serial numbers. 2. Calculate the mean and standard deviation of this estimator.
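One standard answer uses the sample maximum m of k serial numbers drawn without replacement from 1..N: the minimum-variance unbiased estimator is N̂ = m + m/k - 1, with mean N and variance (N - k)(N + 1) / (k(k + 2)); the standard deviation is the square root of that. The Python sketch below checks both moments by exact enumeration for a small case (N = 20, k = 4, chosen arbitrarily); the function names are illustrative.

```python
import math
from collections.abc import Sequence
from fractions import Fraction
from itertools import combinations

def tank_estimate(serials: Sequence[int]) -> Fraction:
    """N_hat = m + m/k - 1, where m is the sample maximum and k the sample size."""
    k = len(serials)
    m = max(serials)
    return Fraction(m * (k + 1), k) - 1

def exact_mean_var(n_tanks: int, k: int) -> tuple[Fraction, Fraction]:
    """Enumerate every k-subset of serials 1..n_tanks (sampling without replacement)."""
    estimates = [tank_estimate(s) for s in combinations(range(1, n_tanks + 1), k)]
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean, var

def formula_var(n_tanks: int, k: int) -> Fraction:
    """Closed form: Var(N_hat) = (N - k)(N + 1) / (k (k + 2))."""
    return Fraction((n_tanks - k) * (n_tanks + 1), k * (k + 2))

mean, var = exact_mean_var(20, 4)
print(mean, var, formula_var(20, 4), math.sqrt(var))
```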
Suppose there are 10 lions and a piece of meat. If a lion eats the meat, it falls asleep. While it is asleep, any other lion can eat it, and the lion that does so also falls asleep; the process continues in the same way. At the start, will any lion eat the meat?
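Under the usual assumptions for this puzzle (every lion is rational, prefers eating to not eating, but above all wants to avoid being eaten), backward induction gives an answer: a lone lion always eats, and with n lions a lion eats only if the remaining n - 1 lions would not, so the meat is eaten exactly when n is odd. With 10 lions, no lion eats. A tiny Python sketch of that induction (the function name is illustrative):

```python
def meat_gets_eaten(n_lions: int) -> bool:
    """Backward induction under the rational-lion assumptions stated above."""
    eaten = True           # n = 1: a lone lion eats safely
    for _ in range(2, n_lions + 1):
        eaten = not eaten  # eating is safe iff the n - 1 other lions would NOT eat
    return eaten

print(meat_gets_eaten(10))  # False
```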
Given that the probability of getting heads on a coin flip is p, what is the expected number of flips required to get three heads in a row?
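Conditioning on the flip after first reaching k - 1 heads in a row gives E_k = E_{k-1} + 1 + (1 - p)E_k, i.e. E_k = (E_{k-1} + 1)/p with E_0 = 0, so E_3 = 1/p + 1/p² + 1/p³ (14 for a fair coin). A Python sketch that computes this exactly with `fractions.Fraction` and checks it by simulation:

```python
import random
from fractions import Fraction

def expected_flips_exact(p: Fraction, run: int = 3) -> Fraction:
    """E[flips until `run` consecutive heads], via E_k = (E_{k-1} + 1) / p."""
    e = Fraction(0)
    for _ in range(run):
        e = (e + 1) / p
    return e

def expected_flips_mc(p: float, run: int = 3, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: flip until the current streak of heads reaches `run`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        streak = flips = 0
        while streak < run:
            flips += 1
            streak = streak + 1 if rng.random() < p else 0
        total += flips
    return total / trials

print(expected_flips_exact(Fraction(1, 2)), expected_flips_mc(0.5))
```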
What data structure is used to implement a dictionary in Python?
Calculate var(x), the variance of the x-coordinate, for points distributed uniformly on the surface of a 3D sphere.
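For a unit sphere, symmetry gives E[x] = 0 and E[x²] = E[y²] = E[z²] = 1/3 (the three sum to 1), so var(x) = 1/3, or R²/3 for radius R. A Monte Carlo sketch in Python, assuming a unit sphere and sampling uniform surface points by normalizing a standard Gaussian vector:

```python
import math
import random

def sample_unit_sphere(rng: random.Random) -> tuple[float, float, float]:
    """Uniform point on the unit sphere: normalize an isotropic Gaussian vector."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0:  # guard against the (practically impossible) zero vector
            return x / norm, y / norm, z / norm

def var_x(trials: int = 200_000, seed: int = 0) -> float:
    """Sample variance of the x-coordinate over many uniform surface points."""
    rng = random.Random(seed)
    xs = [sample_unit_sphere(rng)[0] for _ in range(trials)]
    mean = sum(xs) / trials
    return sum((v - mean) ** 2 for v in xs) / trials

print(var_x(), 1 / 3)
```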
Write a program to find the square root of a number.
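One common approach is Newton's method on f(x) = x² - a, which iterates x ← (x + a/x)/2. The Python sketch below (the name `newton_sqrt` and the relative tolerance are illustrative choices) handles non-negative inputs and compares against `math.sqrt`:

```python
import math

def newton_sqrt(a: float, rel_tol: float = 1e-12) -> float:
    """Square root of a non-negative number via Newton's method on f(x) = x^2 - a."""
    if a < 0:
        raise ValueError("square root of a negative number is not real")
    if a == 0:
        return 0.0
    # Start at or above sqrt(a) so the iterates decrease monotonically toward the root.
    x = max(a, 1.0)
    while True:
        nxt = 0.5 * (x + a / x)
        if abs(nxt - x) <= rel_tol * nxt:
            return nxt
        x = nxt

for value in (2.0, 0.25, 1e6):
    print(value, newton_sqrt(value), math.sqrt(value))
```

Convergence is quadratic once the iterate is near the root, so only a handful of iterations are needed for moderate inputs; a bisection search is a slower but equally valid alternative.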
Using stock data from the last five years, how can we build a model to predict the next day's price?
You have r red balls and w white balls in a bag. If you keep drawing balls from the bag until only balls of a single color remain (i.e., you run out of one color), what is the probability you run out of white balls first? Express your answer in terms of r and w.
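White runs out first exactly when the last ball in a uniformly random ordering of all r + w balls is red, so the probability is r/(r + w). A Python sketch that checks this by shuffling the bag (function names are illustrative):

```python
import random
from fractions import Fraction

def white_runs_out_first_exact(r: int, w: int) -> Fraction:
    """White is exhausted first iff the last ball of a random ordering is red."""
    return Fraction(r, r + w)

def white_runs_out_first_mc(r: int, w: int, trials: int = 100_000, seed: int = 0) -> float:
    """Simulate drawing every ball in random order and inspect the final one."""
    rng = random.Random(seed)
    balls = ["R"] * r + ["W"] * w
    hits = 0
    for _ in range(trials):
        rng.shuffle(balls)
        hits += balls[-1] == "R"
    return hits / trials

print(white_runs_out_first_exact(3, 5), white_runs_out_first_mc(3, 5))
```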
What is correlation? What is covariance? Draw a graph where the correlation is equal to 1 and another where it is equal to -1.
Create a class that implements a singly linked list data structure.
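A minimal Python sketch of such a class; the method names (`push_front`, `append`, `remove`, `reverse`) are illustrative choices rather than a required interface.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional

@dataclass
class Node:
    """One element of the list: a value plus a reference to the next node."""
    value: Any
    next: Optional["Node"] = None

class SinglyLinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self._size = 0

    def push_front(self, value: Any) -> None:
        """O(1): the new node becomes the head."""
        self.head = Node(value, self.head)
        self._size += 1

    def append(self, value: Any) -> None:
        """O(n): walk to the tail (this sketch keeps no tail pointer)."""
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            cur = self.head
            while cur.next is not None:
                cur = cur.next
            cur.next = node
        self._size += 1

    def remove(self, value: Any) -> bool:
        """Unlink the first node holding `value`; return whether one was found."""
        prev, cur = None, self.head
        while cur is not None:
            if cur.value == value:
                if prev is None:
                    self.head = cur.next
                else:
                    prev.next = cur.next
                self._size -= 1
                return True
            prev, cur = cur, cur.next
        return False

    def reverse(self) -> None:
        """Reverse the links in place: O(n) time, O(1) extra space."""
        prev, cur = None, self.head
        while cur is not None:
            nxt = cur.next
            cur.next = prev
            prev, cur = cur, nxt
        self.head = prev

    def __iter__(self) -> Iterator[Any]:
        cur = self.head
        while cur is not None:
            yield cur.value
            cur = cur.next

    def __len__(self) -> int:
        return self._size

lst = SinglyLinkedList()
for v in (1, 2, 3):
    lst.append(v)
lst.push_front(0)
lst.reverse()
print(list(lst))  # [3, 2, 1, 0]
```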
Nine fair coins are tossed. What is the probability of getting an odd number of heads?
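The answer is 1/2: swapping heads and tails maps k heads to 9 - k heads, and because 9 is odd this swaps odd and even counts. A short Python check that sums the binomial coefficients exactly:

```python
from fractions import Fraction
from math import comb

def prob_odd_heads(n: int) -> Fraction:
    """P(odd number of heads) when n fair coins are tossed."""
    odd = sum(comb(n, k) for k in range(1, n + 1, 2))
    return Fraction(odd, 2 ** n)

print(prob_odd_heads(9))  # 1/2
```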
Why is an L2 penalty added to linear regression models? Explain the effect of the L2 penalty on the solution.
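Assuming centered X and y (so the intercept is not penalized), the L2-penalized least-squares problem has a closed form, and the singular value decomposition of X makes the shrinkage explicit:

```latex
\hat{\beta}_{\text{ridge}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y, \qquad \lambda > 0.
% With the SVD X = U D V^T (columns u_j, singular values d_j), the fitted values are
X\hat{\beta}_{\text{ridge}} = \sum_{j} u_j \, \frac{d_j^2}{d_j^2 + \lambda} \, u_j^\top y .
% Each direction is shrunk by d_j^2 / (d_j^2 + lambda): low-variance directions
% (small d_j) shrink the most, and X^T X + lambda I is invertible even when X^T X is not.
```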
What is the expected number of coin flips required to see 2 heads in a series of fair coin tosses?
The chance that a student passes a test is 10%. What is the probability that at least 50 out of 400 students pass the test? Choose the closest answer from: 5%, 10%, 15%, 20%, 25%.
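With X distributed as Binomial(400, 0.1), the mean is 40 and the standard deviation is √36 = 6, so 50 lies about 1.6 standard deviations above the mean; the normal approximation with continuity correction gives roughly 5.7%, pointing to 5%. The Python sketch below computes the exact binomial tail with integer arithmetic alongside that approximation:

```python
import math
from fractions import Fraction
from statistics import NormalDist

def binom_tail_exact(n: int, p_num: int, p_den: int, k_min: int) -> Fraction:
    """Exact P(X >= k_min) for X ~ Binomial(n, p_num / p_den), using integers only."""
    q_num = p_den - p_num
    num = sum(math.comb(n, k) * p_num**k * q_num**(n - k) for k in range(k_min, n + 1))
    return Fraction(num, p_den**n)

exact = float(binom_tail_exact(400, 1, 10, 50))
mu, sigma = 400 * 0.1, math.sqrt(400 * 0.1 * 0.9)
approx = 1 - NormalDist(mu, sigma).cdf(49.5)  # continuity-corrected normal tail
options = [0.05, 0.10, 0.15, 0.20, 0.25]
closest = min(options, key=lambda o: abs(o - exact))
print(f"exact={exact:.4f} normal approx={approx:.4f} closest={closest}")
```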
Why is regularization important, for example using ridge or lasso regression compared to ordinary least squares (OLS), even when the number of samples is greater than the number of parameters?
What are virtual functions in C++? How are they used?
What is the difference between regressing y on x and regressing x on y?
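With sample covariance and variances, the two least-squares slopes are:

```latex
\hat{b}_{y \mid x} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}, \qquad
\hat{b}_{x \mid y} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(y)}, \qquad
\hat{b}_{y \mid x}\,\hat{b}_{x \mid y} = \rho_{xy}^2 \le 1 .
```

Both fitted lines pass through the point of means. Drawn on the same axes, the x-on-y line has slope 1/b̂_{x|y} = b̂_{y|x}/ρ², which is steeper than the y-on-x line unless |ρ| = 1, because each regression minimizes errors in a different variable.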