Count distributions with many zeros
Data Analysis, 2021. 2. 16. 20:52
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.
- Zero-inflated Poisson Regression – The focus of the UCLA page this list is taken from.
- Zero-inflated Negative Binomial Regression – Negative binomial regression does better with overdispersed data, i.e. data whose variance is much larger than the mean.
- Ordinary Count Models – Poisson or negative binomial models might be more appropriate if there are no excess zeros.
- OLS Regression – You could try to analyze these data using OLS regression. However, count data are highly non-normal and are not well estimated by OLS regression.
Source: Zero-Inflated Poisson Regression | R Data Analysis Examples, stats.idre.ucla.edu/r/dae/zip/
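To make the zero-inflated Poisson (ZIP) idea concrete, here is a minimal sketch (my own illustration, not code from the UCLA page). It assumes the usual two-part ZIP form: with probability π an observation is a "structural" zero, otherwise it is drawn from an ordinary Poisson(λ), so P(Y=0) = π + (1−π)e^(−λ) and P(Y=k) = (1−π)e^(−λ)λ^k/k! for k ≥ 1. The function names `poisson_pmf` and `zip_pmf` and the parameters `lam` and `pi` are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability P(Y = k) with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson probability P(Y = k).

    pi  : probability that an observation is a structural (always-zero) case
    lam : rate of the Poisson count process
    """
    if k == 0:
        # Zeros come from two sources: structural zeros and ordinary Poisson zeros
        return pi + (1 - pi) * poisson_pmf(0, lam)
    # Non-zero counts can only come from the Poisson part
    return (1 - pi) * poisson_pmf(k, lam)


if __name__ == "__main__":
    lam, pi = 2.0, 0.3
    print(f"Poisson P(0) = {poisson_pmf(0, lam):.3f}")
    print(f"ZIP     P(0) = {zip_pmf(0, lam, pi):.3f}")
```

With the same λ, the ZIP puts noticeably more mass at zero than the plain Poisson, which is exactly the "excess zeros" pattern that ordinary count models cannot reproduce.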
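If the Poisson distribution is used as-is on such data, its assumption that the mean equals the variance is not satisfied (the variance is larger than the mean), so a negative binomial model should be used instead.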
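Why the negative binomial helps can be seen from its mean-variance relationship. The sketch below assumes the mean/dispersion parametrization (mean μ, dispersion θ, variance μ + μ²/θ; this is the `theta` that R's `MASS::glm.nb` reports) and computes the first two moments numerically from the probability mass function. The helper names `nb_pmf` and `moments` are hypothetical, not a library API.

```python
import math
from typing import Callable


def nb_pmf(k: int, mu: float, theta: float) -> float:
    """Negative binomial P(Y = k) with mean mu and dispersion theta."""
    # Computed on the log scale with lgamma to avoid overflow for large k
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def moments(pmf: Callable[[int], float], kmax: int = 1000) -> tuple[float, float]:
    """Mean and variance of a count distribution, summing the pmf up to kmax."""
    probs = [pmf(k) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mu, theta = 3.0, 1.5
    mean, var = moments(lambda k: nb_pmf(k, mu, theta))
    # A Poisson with the same mean would have variance 3.0;
    # the NB variance should be close to mu + mu**2 / theta = 9.0
    print(f"NB mean = {mean:.4f}, NB variance = {var:.4f}")
```

The extra term μ²/θ is what lets the negative binomial absorb variance beyond the mean; as θ grows large it shrinks toward zero and the model approaches the Poisson.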
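Overdispersion, as the name suggests, means the data are spread out more widely than the model assumes (the observed variance exceeds the variance the model implies), so a logistic or Poisson regression model cannot predict the values properly.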
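A quick way to look for the two symptoms discussed above before picking a model is to compare the sample variance with the mean (a Poisson implies they are equal) and to compare the observed share of zeros with exp(−mean), the zero share a Poisson with the same mean would produce. This is a rough sketch rather than a formal test; the function name `count_diagnostics`, its dictionary keys, and the toy data are hypothetical.

```python
import math
import statistics


def count_diagnostics(y: list[int]) -> dict[str, float]:
    """Rough checks for overdispersion and excess zeros in count data y."""
    mean = statistics.fmean(y)
    var = statistics.variance(y)  # sample variance, n - 1 denominator
    return {
        "mean": mean,
        "variance": var,
        # Poisson implies variance == mean, so a ratio well above 1 hints at overdispersion
        "dispersion_ratio": var / mean,
        "observed_zero_share": sum(1 for v in y if v == 0) / len(y),
        # Share of zeros a Poisson with the same mean would give: P(Y = 0) = exp(-mean)
        "poisson_zero_share": math.exp(-mean),
    }


if __name__ == "__main__":
    y = [0, 0, 0, 0, 6, 6]  # hypothetical toy data
    for name, value in count_diagnostics(y).items():
        print(f"{name}: {value:.3f}")
```

A dispersion ratio far above 1 points toward the negative binomial, and an observed zero share far above the Poisson-implied one points toward a zero-inflated model; both can occur together, which is the case the zero-inflated negative binomial is meant for.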