Q1: When training a model, if the number of training examples increases to infinity, how does it affect the model variance?
A1: As the number of training examples approaches infinity, the model variance decreases. This is because a larger dataset provides a more comprehensive representation of the true data distribution, reducing the model’s sensitivity to fluctuations in individual training samples. As a result, the model generalizes better and becomes less dependent on specific training subsets, leading to more stable predictions.
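This can be illustrated with a small simulation (the setup below is my own, not from the question): fit the slope of a noisy linear relationship many times on independently drawn datasets, and compare the variance of the estimate for a small versus a large training set.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitted_slope(n):
    """Fit y = w*x + noise by least squares on n samples; return the slope estimate."""
    x = rng.uniform(-1, 1, n)
    y = 2.0 * x + rng.normal(0, 1, n)
    return np.sum(x * y) / np.sum(x * x)

def slope_variance(n, trials=500):
    """Variance of the slope estimate across independently drawn datasets of size n."""
    return np.var([fitted_slope(n) for _ in range(trials)])

var_small = slope_variance(10)
var_large = slope_variance(1000)
print(var_small, var_large)  # the estimate's variance shrinks as n grows
```

The variance of the fitted parameter falls roughly as 1/n, which is exactly the "less sensitivity to individual samples" described above.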
Q2: Discuss overfitting and under-fitting in regression problem.
A2: In a regression problem, the goal is to model the relationship between the input features and a continuous output variable. Overfitting and underfitting are two common issues that affect model performance.
Overfitting occurs when a model learns not only the underlying pattern in the training data but also noise and random fluctuations. It typically happens when the model is too complex (e.g., a high-degree polynomial regression). While it performs well on training data, it generalizes poorly to new data, leading to high variance.
Underfitting occurs when a model is too simple to capture the true relationship between input and output variables. It happens when the model lacks sufficient complexity (e.g., using linear regression for a nonlinear relationship). Underfitting results in poor performance on both training and test data, leading to high bias.
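A quick sketch of both failure modes (the data-generating function and degrees are illustrative, not from the question): fit polynomials of low, moderate, and high degree to noisy samples of a nonlinear function and compare training versus test error.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = rng.uniform(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 200)

def poly_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 3, 15):
    print(d, poly_errors(d))
# Degree 1 underfits (high error on both sets), degree 15 overfits
# (train error near zero, test error larger), degree 3 fits well.
```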
Q3: What are the techniques to prevent overfitting?
A3: To prevent overfitting, use methods such as regularization (L1/L2), data augmentation, increasing the amount of training data, dropout, early stopping, reducing model complexity, and batch normalization.
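As a minimal sketch of one of these techniques, L2 regularization, the closed-form ridge-regression solution w = (XᵀX + λI)⁻¹Xᵀy shows how the penalty shrinks the learned weights (data and λ below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 10))
true_w = np.zeros(10)
true_w[0] = 1.0                      # only one informative feature
y = X @ true_w + rng.normal(0, 0.5, 30)

def ridge(X, y, lam):
    """Closed-form ridge solution; lam=0 recovers ordinary least squares."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge(X, y, 0.0)
w_reg = ridge(X, y, 10.0)
print(np.linalg.norm(w_ols), np.linalg.norm(w_reg))  # penalty shrinks the weights
```

Smaller weights mean a smoother, less flexible fit, which is why the penalty reduces variance and combats overfitting.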
Q4: How can we solve the issue of local minima?
A4:
1. Random restarts: run the optimization algorithm multiple times with different initial starting points.
2. Gradient-based methods with momentum: momentum helps the optimizer overcome small local minima by considering past gradients in the update step.
3. Stochastic gradient descent (SGD): SGD introduces noise into the gradient estimates, which can help escape shallow local minima.
4. Higher-order optimization methods: methods that consider second-order derivatives can better navigate the optimization landscape.
5. Regularization: adding regularization terms (e.g., L1 or L2) to the objective function smooths the landscape and reduces the likelihood of getting stuck in local minima.
6. Transfer learning: knowledge from related problems or pre-trained models can initialize the optimization in a better region of the solution space.
7. Learning rate scheduling: dynamically adjust the learning rate during training, starting with a larger rate to escape local minima and gradually reducing it for precise convergence.
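Technique 1 (random restarts) can be sketched on a toy 1-D objective (the function and its constants are chosen for illustration): f(x) = x⁴/4 − x²/2 + 0.1x has a shallow minimum near x ≈ +0.95 and a deeper one near x ≈ −1.05, so a single gradient-descent run may stop in the shallow basin, while the best of several restarts finds the deeper one.

```python
import numpy as np

def f(x):
    """Toy objective with two local minima."""
    return x**4 / 4 - x**2 / 2 + 0.1 * x

def grad(x):
    return x**3 - x + 0.1

def gradient_descent(x0, lr=0.05, steps=500):
    """Plain gradient descent from a given starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

rng = np.random.default_rng(3)
starts = rng.uniform(-2, 2, 10)                 # several random initial points
candidates = [gradient_descent(x0) for x0 in starts]
best = min(candidates, key=f)                   # keep the lowest-loss solution
print(best)
```

Each restart converges to whichever minimum its basin contains; taking the minimum over restarts recovers the deeper solution near x ≈ −1.05.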
Q5: Assume we have a set of data from a robot that includes joint positions, velocities, and accelerations (i.e., inputs). A set of targets (e.g., joint motor torques, the outputs) has been recorded for each joint of the robot. For a new task, we need a new trajectory, and our goal is to predict the joint torques for the new inputs. We have decided to use a neural network to solve this problem. We have two choices:
(i) train a separate neural network for each of the joints (i.e., a multi-input-single-output, MISO, model), or
(ii) train a single neural network with one output neuron for each joint torque (a multi-input-multi-output, MIMO, model), but with a shared hidden layer.
Which method do you suggest? Explain your answer.
Hint: Consider dependencies between outputs.
A5: I suggest using the Multi-Input-Multi-Output(MIMO) model with shared hidden layers for this task. Here is why:
1. Dependencies between outputs: Joint torques are likely interdependent, as the torque required at one joint may depend on the motion of the other joints. A MIMO model with shared hidden layers can capture these dependencies effectively.
2. Efficiency: A single neural network with shared layers can learn common features across all joints, making it more efficient than training a separate model for each joint (MISO).
3. Scalability: MIMO models are easier to scale, as you can add more output neurons without needing a separate network for each joint, unlike MISO.
Overall, MIMO with shared hidden layers is more efficient, better at capturing interdependencies, and more scalable.
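The suggested architecture can be sketched as a single forward pass in which one shared hidden layer feeds every torque output (the layer sizes and the assumption of 3 joints, each with position, velocity, and acceleration, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n_joints = 3
n_inputs = 3 * n_joints          # position, velocity, acceleration per joint
n_hidden = 16

W1 = 0.1 * rng.normal(size=(n_inputs, n_hidden))   # shared hidden layer
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(n_hidden, n_joints))   # one output neuron per joint torque
b2 = np.zeros(n_joints)

def predict_torques(x):
    """MIMO forward pass: shared hidden features feed every torque output."""
    h = np.tanh(x @ W1 + b1)     # features shared across all joints
    return h @ W2 + b2           # all n_joints torque predictions at once

x = rng.normal(size=n_inputs)    # one robot state sample
print(predict_torques(x).shape)  # one network yields all joint torques together
```

Because every output is computed from the same hidden features, the network can exploit the interdependencies between joint torques that separate MISO networks would have to learn independently.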
Q6-1: (True/False) Training error provides an unbiased estimate of true error.
A6-1: False. Training error is typically biased and underestimates the true error because the model is evaluated on the same data it was trained on, leading to overfitting. A better estimate of true error comes from validation or test error.
Q6-2: (True/False) Test error provides an unbiased estimate of true error.
A6-2: True. Test examples are drawn independently at random from the same distribution and are not used during training, so the test error is an unbiased estimate of the true error.
Q6-3: (True/False) Training accuracy minus test accuracy provides an unbiased estimate of the degree of overfitting.
A6-3: True. If training accuracy is very close to test accuracy, then most likely overfitting has not occurred.
Q7: Model evaluation can result in four different outcomes:
A7:
1. Underfitting - validation and training error both high
2. Overfitting - validation error high, training error low
3. Good fit - validation error low, slightly higher than the training error
4. Unknown fit - validation error low, training error high
Note: Overfitting is a modeling error that introduces high variance because the model is fitted too closely to the training data set.