Python Polynomial Regression: Fixing Overfitting with Regularization (Ridge, LASSO)

by 양기호니 2022. 12. 16.

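I put these notes together for my own study.

I generated data from a sine curve and ran polynomial regression at several degrees.

As the degree increases, the fitted curve bends more and more to pass through the individual noisy samples instead of following the underlying curve.

This problem is called overfitting.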

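import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def sin(X):
    # Ground-truth curve that the noisy samples are drawn from
    return np.sin(1.5 * np.pi * X)

# 30 noisy samples of the sine curve on [0, 1)
m = 30
np.random.seed(3)
X = np.sort(np.random.rand(m))
y = sin(X) + np.random.randn(m) * 0.1

degrees = (1, 4, 18)
plt.figure(figsize=(15, 5))

for i, degree in enumerate(degrees):
    # Expand X into polynomial features, then fit ordinary least squares
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    lr = LinearRegression()
    pipe = Pipeline([('poly', poly), ('linear_regression', lr)])
    pipe.fit(X.reshape(-1, 1), y)

    # Evaluate on a dense grid to draw the fitted curve next to the ideal one
    X_ = np.linspace(0, 1, 100)
    plt.subplot(1, len(degrees), i + 1)
    plt.plot(X, y, 'b.', label='Samples')
    plt.plot(X_, sin(X_), 'g-', label='Ideal')
    plt.plot(X_, pipe.predict(np.expand_dims(X_, axis=1)), 'r--', label='Model')
    plt.xlim((0, 1)); plt.ylim((-2, 2))
    plt.legend()
    plt.title(f"Degree:{degree}")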
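
The plots show overfitting visually. As a rough way to put a number on it, the sketch below scores the same three degrees with 10-fold cross-validation. The choice of `cross_val_score`, `cv=10`, and mean squared error as the metric are my own assumptions and not part of the original experiment; a model that overfits would typically score much worse on the held-out folds than a moderate-degree one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def sin(X):
    return np.sin(1.5 * np.pi * X)

# Same synthetic data as the plot above: 30 noisy samples of the sine curve
np.random.seed(3)
m = 30
X = np.sort(np.random.rand(m))
y = sin(X) + np.random.randn(m) * 0.1

cv_mse = {}
for degree in (1, 4, 18):
    pipe = Pipeline([
        ('poly', PolynomialFeatures(degree=degree, include_bias=False)),
        ('linear_regression', LinearRegression()),
    ])
    # scikit-learn reports negated MSE for this scorer, so flip the sign back
    scores = cross_val_score(pipe, X.reshape(-1, 1), y,
                             scoring='neg_mean_squared_error', cv=10)
    cv_mse[degree] = -scores.mean()
    print(f"Degree {degree}: cross-validated MSE = {cv_mse[degree]:.3g}")
```

Comparing the printed values across degrees gives a numeric counterpart to what the three panels show.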

 

 

The models below are also given degree-20 polynomial features, but they apply regularization so that the fit does not overfit.
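
For reference, the two penalties can be written out as follows. This is a sketch in notation I am introducing here (it does not appear in the original notes): w is the coefficient vector, n is the number of samples, α is the `alpha` parameter, and the intercept is left out for brevity. The forms follow the objective functions documented for scikit-learn's `Ridge` and `Lasso`.

```latex
\begin{aligned}
\text{Ridge:}\quad & \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{LASSO:}\quad & \min_{w}\ \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
```

In both cases a larger α puts more weight on keeping the coefficients small, and α = 0 removes the penalty entirely.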

 

Ridge model

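import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def sin(X):
    return np.sin(1.5 * np.pi * X)

np.random.seed(0)
m = 25
X = np.sort(np.random.rand(m))
y = sin(X) + np.random.randn(m) * 0.1

# alpha=0 means no penalty (plain least squares); larger alpha means stronger L2 regularization
alphas = (0, 0.01, 0.5, 10)
plt.figure(figsize=(15, 5))
coef_df = pd.DataFrame()
for i, alpha in enumerate(alphas):
    poly = PolynomialFeatures(degree=20, include_bias=False)
    ridge = Ridge(alpha=alpha)
    pipe = Pipeline([('poly', poly), ('ridge', ridge)])
    pipe.fit(X.reshape(-1, 1), y)

    X_ = np.linspace(0, 1, 100)
    plt.subplot(1, len(alphas), i + 1)
    plt.plot(X, y, 'b.', label='Samples')
    plt.plot(X_, sin(X_), 'g-', label='Ideal')
    plt.plot(X_, pipe.predict(np.expand_dims(X_, axis=1)), 'r--', label='Model')
    plt.xlim((0, 1)); plt.ylim((-2, 2))
    plt.legend()
    plt.title(f"Degree:20, Alpha:{alpha}")
    # Collect the 20 fitted coefficients, one column per alpha
    coef_df[f'alpha:{alpha}'] = pd.Series(data=ridge.coef_)
display(coef_df)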
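
The coefficient table is easier to read with a single summary number per column. The sketch below recomputes the Ridge fits and prints the L2 norm of the coefficient vector for each alpha; summarizing with `np.linalg.norm` is my own choice, and alpha=0 is left out here because the unpenalized degree-20 problem is numerically ill-conditioned.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def sin(X):
    return np.sin(1.5 * np.pi * X)

# Same data as the Ridge experiment above
np.random.seed(0)
m = 25
X = np.sort(np.random.rand(m))
y = sin(X) + np.random.randn(m) * 0.1

coef_norms = {}
for alpha in (0.01, 0.5, 10):
    pipe = Pipeline([
        ('poly', PolynomialFeatures(degree=20, include_bias=False)),
        ('ridge', Ridge(alpha=alpha)),
    ])
    pipe.fit(X.reshape(-1, 1), y)
    # L2 norm of the 20 fitted coefficients (the intercept is not included)
    coef_norms[alpha] = np.linalg.norm(pipe.named_steps['ridge'].coef_)
    print(f"alpha={alpha}: ||coef||_2 = {coef_norms[alpha]:.4f}")
```

The norm shrinks as alpha grows, which is the mechanism that keeps the degree-20 curve from swinging wildly between samples.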

 

 

LASSO model

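import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
def sin(X):
    return np.sin(1.5 * np.pi * X)

m = 25
X = np.sort(np.random.rand(m))
y = sin(X) + np.random.randn(m) * 0.1

alphas = (0, 0.0001, 0.005, 0.1)
plt.figure(figsize=(20, 5))
coef_df = pd.DataFrame()
for i, alpha in enumerate(alphas):
    poly = PolynomialFeatures(degree=20, include_bias=False)
    # Lasso with alpha=0 is not advised in scikit-learn, so use plain least squares instead
    if alpha == 0:
        reg = LinearRegression()
    else:
        reg = Lasso(alpha=alpha)
    pipe = Pipeline([('poly', poly), ('reg', reg)])
    pipe.fit(X.reshape(-1, 1), y)
    # Plot only over the range covered by the samples
    X_ = np.linspace(X.min(), X.max(), 100)
    plt.subplot(1, len(alphas), i + 1)
    plt.plot(X, y, 'b.', label='Samples')
    plt.plot(X_, sin(X_), 'g-', label='Ideal')
    plt.plot(X_, pipe.predict(np.expand_dims(X_, axis=1)), 'r--', label='Model')
    plt.ylim((-2, 2))
    plt.legend()
    plt.title(f"Degree:20, Alpha:{alpha}")
    # Collect the 20 fitted coefficients, one column per alpha
    series = pd.Series(data=reg.coef_)
    coef_df[f'alpha: {alpha}'] = series
display(coef_df)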
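
One thing worth checking in the LASSO coefficient table is how many entries are exactly zero, since the L1 penalty tends to switch features off entirely. The sketch below counts the non-zero coefficients per alpha. Raising `max_iter` and silencing `ConvergenceWarning` are my own assumptions to keep the output short; they are not part of the original experiment.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def sin(X):
    return np.sin(1.5 * np.pi * X)

# Same data as the LASSO experiment above
np.random.seed(0)
m = 25
X = np.sort(np.random.rand(m))
y = sin(X) + np.random.randn(m) * 0.1

# Assumption: hide convergence warnings so only the counts are printed
warnings.simplefilter("ignore", ConvergenceWarning)

nonzero_counts = {}
for alpha in (0.0001, 0.005, 0.1):
    pipe = Pipeline([
        ('poly', PolynomialFeatures(degree=20, include_bias=False)),
        # Assumption: max_iter raised above the default to help the solver converge
        ('reg', Lasso(alpha=alpha, max_iter=100_000)),
    ])
    pipe.fit(X.reshape(-1, 1), y)
    coef = pipe.named_steps['reg'].coef_
    nonzero_counts[alpha] = int(np.count_nonzero(coef))
    print(f"alpha={alpha}: {nonzero_counts[alpha]} of {coef.size} coefficients are non-zero")
```

This is the practical difference from Ridge, which shrinks coefficients toward zero but generally leaves them non-zero.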
