Predicting the Total Number of Confirmed Covid Cases Using the Prophet Model
!pip install pystan
!pip install fbprophet
import fbprophet
from fbprophet import Prophet
import pandas as pd
from fbprophet.diagnostics import cross_validation
from fbprophet.plot import plot_cross_validation_metric
from fbprophet.diagnostics import performance_metrics
dir(Prophet)
df=pd.read_csv('covid.csv')
df.head()
df.shape
df.dtypes
df['Date']=pd.to_datetime(df['Date'])
df.dtypes
df.isnull().sum()
We can see that the Province/State column has many missing values, so they need to be handled.
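One possible way to handle the missing values is sketched below on a small made-up frame. The rows and the "Unknown" placeholder label are assumptions for illustration, not values taken from covid.csv.

```python
import pandas as pd

# Hypothetical toy rows; the real covid.csv has more columns and rows.
toy = pd.DataFrame({
    "Province/State": [None, "Hubei", None],
    "Confirmed": [10, 50, 20],
})

# Count missing values per column, as df.isnull().sum() does above.
missing = toy.isnull().sum()
print(missing)

# Replace missing province names with an explicit placeholder label
# ("Unknown" is an arbitrary choice made for this sketch).
toy["Province/State"] = toy["Province/State"].fillna("Unknown")
print(toy)
```

Since the totals below are grouped by Date only, the Province/State column does not enter the aggregation, so filling it mainly matters if you later analyse individual regions.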
df['Date'].nunique()
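# Select the columns with a list (double brackets); tuple-style selection fails on recent pandas versions.
total=df.groupby(['Date'])[['Confirmed','Deaths','Recovered','Active']].sum().reset_index()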
total
df_prophet=total.rename(columns={'Date':'ds','Confirmed':'y'})
df_prophet.head()
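Prophet expects its input frame to have a date column named ds and a target column named y, with one row per date. The sketch below repeats the aggregate-then-rename steps on made-up numbers (the dates and counts are hypothetical) to show how several rows per date collapse into one.

```python
import pandas as pd

# Hypothetical toy data: two regions reporting on each of two dates.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]),
    "Confirmed": [5, 7, 8, 11],
    "Deaths": [0, 1, 1, 1],
})

# Sum across regions so that each date appears exactly once.
daily = toy.groupby("Date")[["Confirmed", "Deaths"]].sum().reset_index()

# Rename to the column names Prophet looks for.
prophet_input = daily.rename(columns={"Date": "ds", "Confirmed": "y"})
print(prophet_input)
```

Extra columns such as Deaths can stay in the frame; only ds and y are needed for a basic fit.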
m=Prophet()
model=m.fit(df_prophet)
model.seasonalities
# Build a dataframe of dates covering the history plus the next 30 days.
# Returns a pandas DataFrame with a single 'ds' column.
future_global=model.make_future_dataframe(periods=30,freq='D')
future_global.head()
future_global.shape
df_prophet['ds'].tail()
future_global.tail()
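To make the shape of the future dataframe concrete, here is a rough sketch of what such a frame contains, built with pandas only. It is an approximation of the idea rather than Prophet's own code, and the five-day history is a made-up example.

```python
import pandas as pd

# Hypothetical history of five consecutive days.
history = pd.date_range("2020-03-01", periods=5, freq="D")
last_date = history.max()

# Generate periods + 1 dates starting at the last observed date,
# then drop the first one so only genuinely new dates remain.
new_dates = pd.date_range(start=last_date, periods=30 + 1, freq="D")[1:]

# History followed by the 30 new dates, in a single 'ds' column.
future = pd.DataFrame({"ds": history.append(new_dates)})
print(future.shape)
print(future.tail())
```

The frame therefore has the original number of rows plus 30, which is what comparing df_prophet['ds'].tail() with future_global.tail() checks above.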
prediction=model.predict(future_global)
prediction
prediction[['ds','yhat','yhat_lower','yhat_upper']].tail()
model.plot(prediction)
This is what our prediction looks like. The overall direction of the case numbers is probably right, and you can see how the cases rise exponentially.
model.plot_components(prediction)
from fbprophet.plot import add_changepoints_to_plot
#Add markers for significant changepoints to prophet forecast plot.
#Example:
#fig = m.plot(forecast)
#add_changepoints_to_plot(fig.gca(), m, forecast)
fig=model.plot(prediction)
a=add_changepoints_to_plot(fig.gca(),model,prediction)
## horizon='30 days': how far ahead each cross-validation forecast is evaluated.
## Forecasts are computed from historical cutoff points, beginning at (end - horizon)
## and working backwards, so every cutoff still has a full horizon of actual data
## to compare the forecast against.
## period='15 days': spacing between cutoffs; the function's documentation gives 0.5 * horizon as the default.
## initial='90 days': length of the first training window; the documentation gives 3 * horizon as the default.
df_cv=cross_validation(model,horizon='30 days',period='15 days',initial='90 days')
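The way horizon, period and initial interact can be sketched with plain dates. This is a simplified illustration of the cutoff logic described in the comments above, not Prophet's actual implementation; the function name sketch_cutoffs and the start and end dates are hypothetical.

```python
from datetime import date, timedelta

def sketch_cutoffs(start, end, horizon, period, initial):
    """Return cutoff dates, oldest first (simplified illustration)."""
    cutoffs = []
    cutoff = end - horizon              # latest cutoff with a full horizon of data after it
    while cutoff >= start + initial:    # keep at least `initial` days of training data
        cutoffs.append(cutoff)
        cutoff -= period                # step back by one period
    return sorted(cutoffs)

# Hypothetical data range, with the same settings as the call above.
cutoffs = sketch_cutoffs(
    start=date(2020, 1, 1),
    end=date(2020, 6, 30),
    horizon=timedelta(days=30),
    period=timedelta(days=15),
    initial=timedelta(days=90),
)
for c in cutoffs:
    print("train up to", c, "-> evaluate through", c + timedelta(days=30))
```

With these assumed dates the sketch yields five cutoffs, 15 days apart; a shorter history or a longer initial window leaves fewer of them.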
df_cv.head()
df_cv.shape
df_performance=performance_metrics(df_cv)
df_performance.head()
Plot each error metric against the forecast horizon (in days) using the cross-validated data.
# Keep the figures in their own variables so df_performance is not overwritten.
fig_rmse=plot_cross_validation_metric(df_cv,metric='rmse')
fig_mse=plot_cross_validation_metric(df_cv,metric='mse')
fig_mape=plot_cross_validation_metric(df_cv,metric='mape')
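For reference, the three metrics plotted above can be computed by hand as in the sketch below. It uses the plain textbook definitions on made-up numbers; Prophet's performance_metrics additionally averages over a rolling window of the horizon, which this sketch leaves out. The helper name sketch_metrics is hypothetical.

```python
import math

def sketch_metrics(y_true, y_pred):
    """Plain MSE, RMSE and MAPE, with no rolling window."""
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    rmse = math.sqrt(mse)
    # MAPE divides by the actual value, so it is undefined when an actual is 0.
    mape = sum(abs(e / true) for e, true in zip(errors, y_true)) / len(errors)
    return {"mse": mse, "rmse": rmse, "mape": mape}

# Hypothetical actual and predicted case counts.
metrics = sketch_metrics(y_true=[100, 200, 400], y_pred=[110, 190, 380])
print(metrics)
```

RMSE is in the same units as the case counts, while MAPE is a fraction of the actual value, which makes it easier to compare across periods with very different case levels.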