Hello! What an ambitious and interesting project, with the potential to help in the health field! I have some questions:
1) I am not an expert in machine learning. Can you please explain some terms for me in a nonspecialist way? Specifically, what are epochs, learning rates, and mean-Pearson R^2 scores?
2) In Slide 10, you display a histogram, and in your video you described it as representing the varying solubilities of the datasets you used. The distribution is clearly close to normal; do you think that a more uniform distribution would have made your model perform better or worse? For instance, your histogram shows that you used very few data sets (or maybe just data points?) with solubility greater than 2 or less than -2; if you had had more such points, do you think your model would have performed the same?
3) I'm curious what the "definition" of toxicity is. Bio-availability is broken down into those four characteristics (ADME), but toxicity stands alone as a single term. How was toxicity measured?