Learning Goal: I'm working on a data analytics question and need an explanation and answer to help me learn.

Questions:

Question 1 (40 points): Regression and MLE

We are interested in estimating the median home value in New England. For this, we employ a regression through the origin ($\beta_1 = 0$), as presented below:

$$Y_i = \beta X_i + \varepsilon_i$$

where $Y_i$ is the median home value in New England town $i$, and $X_i$ is a binary variable that equals 1 if the house is in town $i$ and 0 otherwise.

Let $Y_1, Y_2, \ldots, Y_n$ be independent, where

$$Y_i \sim N(\beta X_i, \sigma^2), \qquad \varepsilon_i \sim N(0, \sigma^2).$$

(a) (15 points) Find the MLE of $\beta$, $\hat{\beta}_{MLE}$.
(b) (15 points) Find the MLE of $\sigma^2$, $\hat{\sigma}^2_{MLE}$.
(c) (10 points) Show that the sum of squares of error, SSE, can be written as:

$$SSE = \sum_{i=1}^{n} y_i^2 - \hat{\beta} \sum_{i=1}^{n} x_i y_i$$

Question 2 (40 points): Confidence Interval

Let $Y_i$ still be the median home value in New England town $i$. Let the generated $Y$ below be the entire population data on the median value of New England homes, where $\mu = \$329{,}108$ and $\sigma = \$50{,}000$.

    set.seed(12)
    Y = rnorm(1000, mean = 329108, sd = 50000)

For steps (a) and (b), let's pretend we do not know $\mu$.

(a) (5 points) Take 100 samples of size 30 (without replacement) from the population of $Y$'s.
(b) (10 points) Calculate a 95% confidence interval for $\mu$ for all of the 100 samples.
(c) (10 points) How many of these confidence intervals include the true mean $\mu = \$329{,}108$?
(d) (15 points) Repeat steps (b) and (c) for 90% confidence intervals.

Question 3 (20 points): Regression Estimation

(a) (7 points) Using the synthetic data provided below on median home values ($Y$) and towns in New England ($X$), estimate the regression from Question 1, i.e., $Y_i = \beta X_i + \varepsilon_i$. Are the coefficients statistically significant?

Do not forget to use factor(X) as opposed to X in your regression!

    housing = read.table("https://unh.box.com/shared/static/twmyqbvx0toxhvdv0n23c55e5cc3ipe4.csv", header = TRUE, sep = ",", dec = ".")
    head(housing)

    ##          Y X
    ## 1 426419.3 7
    ## 2 416306.1 8
    ## 3 344116.1 9
    ## 4 453613.3 7
    ## 5 303323.9 5
    ## 6 314420.3 6

(b) (13 points) Check the residuals of the model. Are the assumptions satisfied? Why or why not?

Requirements: math
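
For Question 1, the derivation might be sketched as follows. This is only an outline under the model as stated in the question: the $x_i$ are treated as fixed, and $\sum_i x_i^2 > 0$ is assumed so that the estimator is defined. The symbol $\ell$ for the log-likelihood is notation introduced here and does not appear in the original question.

```latex
\begin{aligned}
% Log-likelihood of independent Y_i ~ N(beta x_i, sigma^2)
\ell(\beta, \sigma^2)
  &= -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta x_i\right)^2 \\[4pt]
% First-order condition in beta
\frac{\partial \ell}{\partial \beta}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n} x_i\left(y_i - \beta x_i\right) = 0
  \;\Longrightarrow\;
  \hat{\beta}_{MLE} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \\[4pt]
% First-order condition in sigma^2, evaluated at beta-hat
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2}
     + \frac{1}{2\sigma^4}\sum_{i=1}^{n}\left(y_i - \hat{\beta} x_i\right)^2 = 0
  \;\Longrightarrow\;
  \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\beta} x_i\right)^2 \\[4pt]
% Expand the square, then use beta-hat * sum(x_i^2) = sum(x_i y_i)
SSE
  &= \sum_{i=1}^{n}\left(y_i - \hat{\beta} x_i\right)^2
   = \sum_{i=1}^{n} y_i^2 - 2\hat{\beta}\sum_{i=1}^{n} x_i y_i
     + \hat{\beta}^2\sum_{i=1}^{n} x_i^2
   = \sum_{i=1}^{n} y_i^2 - \hat{\beta}\sum_{i=1}^{n} x_i y_i
\end{aligned}
```

The last step uses $\hat{\beta}\sum_i x_i^2 = \sum_i x_i y_i$, which follows directly from the expression for $\hat{\beta}_{MLE}$. Because $X_i$ is binary, $x_i^2 = x_i$, so $\hat{\beta}_{MLE}$ reduces to the average of the $y_i$ over the observations with $x_i = 1$.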