{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a6b51ad5",
   "metadata": {},
   "source": [
    "# scipy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1879c72",
   "metadata": {},
   "source": [
    "## Sparse matrices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "dea1d311-9fb8-439f-a651-9fb142053dc3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0 0 3 0 0]\n",
      " [0 0 0 0 0]\n",
      " [4 0 0 5 0]\n",
      " [0 0 0 0 0]\n",
      " [0 0 6 0 0]]\n",
      "[0 2 2 4]\n",
      "[2 0 3 2]\n",
      "[3 4 5 6]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "from scipy import sparse\n",
    "\n",
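    "# 1️⃣ Row indices of the non-zero elements\n",
    "row = np.array([0, 2, 2, 4])  # 3 is in row 0, 4 and 5 are in row 2, 6 is in row 4\n",
    "\n",
    "# 2️⃣ Column indices of the non-zero elements\n",
    "col = np.array([2, 0, 3, 2])  # 3 is in column 2, 4 in column 0, 5 in column 3, 6 in column 2\n",
    "\n",
    "# 3️⃣ Values of the non-zero elements\n",
    "data = np.array([3, 4, 5, 6])  # the values stored at those (row, col) positions\n",
    "\n",
    "# 4️⃣ Build a 5x5 sparse matrix in COO (coordinate) format\n",
    "matrix = sparse.coo_matrix((data, (row, col)), shape=(5, 5))\n",
    "\n",
    "# Print the matrix\n",
    "print(matrix.toarray())  # convert back to a dense 2-D array for inspection\n",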
    "print(matrix.row)\n",
    "print(matrix.col)\n",
    "print(matrix.data)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "348b3575-306b-4879-9729-4ad4e83b321d",
   "metadata": {},
   "source": [
    "## Probability distributions\n",
    "Using `norm` as the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c8a0b5fa-0ac6-4b9b-8a6e-146857d15a36",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.stats import norm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e3442ee7-e388-4a22-b76e-9acc82e8389f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A normal continuous random variable.\n",
      "\n",
      "    The location (``loc``) keyword specifies the mean.\n",
      "    The scale (``scale``) keyword specifies the standard deviation.\n",
      "\n",
      "    As an instance of the `rv_continuous` class, `norm` object inherits from it\n",
      "    a collection of generic methods (see below for the full list),\n",
      "    and completes them with details specific for this particular distribution.\n",
      "    \n",
      "    Methods\n",
      "    -------\n",
      "    rvs(loc=0, scale=1, size=1, random_state=None)\n",
      "        Random variates.\n",
      "    pdf(x, loc=0, scale=1)\n",
      "        Probability density function.\n",
      "    logpdf(x, loc=0, scale=1)\n",
      "        Log of the probability density function.\n",
      "    cdf(x, loc=0, scale=1)\n",
      "        Cumulative distribution function.\n",
      "    logcdf(x, loc=0, scale=1)\n",
      "        Log of the cumulative distribution function.\n",
      "    sf(x, loc=0, scale=1)\n",
      "        Survival function  (also defined as ``1 - cdf``, but `sf` is sometimes more accurate).\n",
      "    logsf(x, loc=0, scale=1)\n",
      "        Log of the survival function.\n",
      "    ppf(q, loc=0, scale=1)\n",
      "        Percent point function (inverse of ``cdf`` --- percentiles).\n",
      "    isf(q, loc=0, scale=1)\n",
      "        Inverse survival function (inverse of ``sf``).\n",
      "    moment(order, loc=0, scale=1)\n",
      "        Non-central moment of the specified order.\n",
      "    stats(loc=0, scale=1, moments='mv')\n",
      "        Mean('m'), variance('v'), skew('s'), and/or kurtosis('k').\n",
      "    entropy(loc=0, scale=1)\n",
      "        (Differential) entropy of the RV.\n",
      "    fit(data)\n",
      "        Parameter estimates for generic data.\n",
      "        See `scipy.stats.rv_continuous.fit <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html#scipy.stats.rv_continuous.fit>`__ for detailed documentation of the\n",
      "        keyword arguments.\n",
      "    expect(func, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False, **kwds)\n",
      "        Expected value of a function (of one argument) with respect to the distribution.\n",
      "    median(loc=0, scale=1)\n",
      "        Median of the distribution.\n",
      "    mean(loc=0, scale=1)\n",
      "        Mean of the distribution.\n",
      "    var(loc=0, scale=1)\n",
      "        Variance of the distribution.\n",
      "    std(loc=0, scale=1)\n",
      "        Standard deviation of the distribution.\n",
      "    interval(confidence, loc=0, scale=1)\n",
      "        Confidence interval with equal areas around the median.\n",
      "\n",
      "    Notes\n",
      "    -----\n",
      "    The probability density function for `norm` is:\n",
      "\n",
      "    .. math::\n",
      "\n",
      "        f(x) = \\frac{\\exp(-x^2/2)}{\\sqrt{2\\pi}}\n",
      "\n",
      "    for a real number :math:`x`.\n",
      "\n",
      "    The probability density above is defined in the \"standardized\" form. To shift\n",
      "    and/or scale the distribution use the ``loc`` and ``scale`` parameters.\n",
      "    Specifically, ``norm.pdf(x, loc, scale)`` is identically\n",
      "    equivalent to ``norm.pdf(y) / scale`` with\n",
      "    ``y = (x - loc) / scale``. Note that shifting the location of a distribution\n",
      "    does not make it a \"noncentral\" distribution; noncentral generalizations of\n",
      "    some distributions are available in separate classes.\n",
      "\n",
      "    Examples\n",
      "    --------\n",
      "    >>> import numpy as np\n",
      "    >>> from scipy.stats import norm\n",
      "    >>> import matplotlib.pyplot as plt\n",
      "    >>> fig, ax = plt.subplots(1, 1)\n",
      "    \n",
      "    Calculate the first four moments:\n",
      "    \n",
      "    \n",
      "    >>> mean, var, skew, kurt = norm.stats(moments='mvsk')\n",
      "    \n",
      "    Display the probability density function (``pdf``):\n",
      "    \n",
      "    >>> x = np.linspace(norm.ppf(0.01),\n",
      "    ...                 norm.ppf(0.99), 100)\n",
      "    >>> ax.plot(x, norm.pdf(x),\n",
      "    ...        'r-', lw=5, alpha=0.6, label='norm pdf')\n",
      "    \n",
      "    Alternatively, the distribution object can be called (as a function)\n",
      "    to fix the shape, location and scale parameters. This returns a \"frozen\"\n",
      "    RV object holding the given parameters fixed.\n",
      "    \n",
      "    Freeze the distribution and display the frozen ``pdf``:\n",
      "    \n",
      "    >>> rv = norm()\n",
      "    >>> ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')\n",
      "    \n",
      "    Check accuracy of ``cdf`` and ``ppf``:\n",
      "    \n",
      "    >>> vals = norm.ppf([0.001, 0.5, 0.999])\n",
      "    >>> np.allclose([0.001, 0.5, 0.999], norm.cdf(vals))\n",
      "    True\n",
      "    \n",
      "    Generate random numbers:\n",
      "    \n",
      "    >>> r = norm.rvs(size=1000)\n",
      "    \n",
      "    And compare the histogram:\n",
      "    \n",
      "    >>> ax.hist(r, density=True, bins='auto', histtype='stepfilled', alpha=0.2)\n",
      "    >>> ax.set_xlim([x[0], x[-1]])\n",
      "    >>> ax.legend(loc='best', frameon=False)\n",
      "    >>> plt.show()\n",
      "    \n",
      "\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "# help\n",
    "print(norm.__doc__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "62d2a73d-59b7-48ba-97e8-48864260dfa2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'a', 'args', 'b', 'cdf', 'dist', 'entropy', 'expect', 'interval', 'isf', 'kwds', 'logcdf', 'logpdf', 'logsf', 'mean', 'median', 'moment', 'pdf', 'ppf', 'random_state', 'rvs', 'sf', 'stats', 'std', 'support', 'var']\n"
     ]
    }
   ],
   "source": [
    "# Attributes and methods of a frozen distribution\n",
    "rv = norm()\n",
    "print(dir(rv))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "998603bd-7c3a-4cbd-b94b-5b425e1f90cb",
   "metadata": {},
   "source": [
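    "### **Main methods**\n",
    "- rvs: random variates (random sampling)\n",
    "- pdf: probability density function. (The probability of any single point is 0, but the probability of an interval is the integral of the pdf over that interval; the pdf value is the probability density at a point.)\n",
    "- cdf: cumulative distribution function\n",
    "- sf: survival function (1 - CDF)\n",
    "- ppf: percent point function (inverse of the CDF)\n",
    "- isf: inverse survival function (inverse of the SF)\n",
    "- stats: returns the mean, variance, (Fisher's) skew, and/or (Fisher's) kurtosis\n",
    "- moment: non-central moments of the distribution\n"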
   ]
  },
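  {
   "cell_type": "markdown",
   "id": "norm-method-relations",
   "metadata": {},
   "source": [
    "The relationships in the list above can be checked numerically. This is a minimal sketch on the standard normal; the points in `x` are arbitrary values chosen for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import norm\n",
    "\n",
    "x = np.array([-1.0, 0.0, 1.5])  # arbitrary evaluation points (assumption)\n",
    "\n",
    "# sf is the survival function: 1 - cdf\n",
    "sf_matches = np.allclose(norm.sf(x), 1 - norm.cdf(x))\n",
    "\n",
    "# ppf inverts cdf, and isf inverts sf\n",
    "ppf_roundtrip = np.allclose(norm.ppf(norm.cdf(x)), x)\n",
    "isf_roundtrip = np.allclose(norm.isf(norm.sf(x)), x)\n",
    "\n",
    "# a cdf difference is the probability of an interval,\n",
    "# i.e. the integral of the pdf over that interval\n",
    "p_within_one_sd = norm.cdf(1) - norm.cdf(-1)\n",
    "\n",
    "print(sf_matches, ppf_roundtrip, isf_roundtrip)\n",
    "print(p_within_one_sd)\n",
    "```\n",
    "\n",
    "The last value is the familiar probability of roughly 0.68 of falling within one standard deviation of the mean."
   ]
  },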
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "b8c5b7fc-3c43-4fac-a799-a6eb53d3704b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(0.5)"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.cdf(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "e2399ae9-0a52-492a-8141-9055c3b0ae83",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.15865525, 0.5       , 0.84134475])"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.cdf([-1, 0, 1])  # cdf evaluated at several points"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "f1e21024-ff2c-4a4b-900a-23b16479e9bc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(0.0)"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "6fcce61c-53ac-4018-ad45-39bede345406",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(1.0)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "33f80c29-8937-4367-92c0-9916987586fc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(1.0)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.var()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "1474a743-c07c-4c52-bd44-4e48c5745fa0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(0.0)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.ppf(0.5)  # find the quantile for a given probability (0.5 gives the median)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "cf7be2f5-fc75-4cbc-b4ba-0cce22b24a69",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(0.3989422804014327)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.pdf(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dc3c41f-c419-4bfa-93b8-bc42b1629c68",
   "metadata": {},
   "source": [
    "### Generating random variates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "59d2dd2a-3e16-4aaf-a7af-824a95f1fc65",
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy.random import default_rng\n",
    "rng = default_rng()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "c4198657-d1b9-4e5a-ad53-934864c5ed01",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1.54168013,  0.63091442,  0.63829445, -0.40356971, -1.61088559])"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.rvs(size=5, random_state=rng)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a927f0c7-07c6-41e2-8585-2d9e94c85a71",
   "metadata": {},
   "source": [
    "### Shifting and scaling\n",
    "Use the `loc` and `scale` keywords; for `norm` these are the mean and the standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "9d3c3cde-e7de-4ffe-bdb3-75ac3c0d4040",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(np.float64(3.0), np.float64(16.0))"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm.stats(loc=3, scale=4)"
   ]
  },
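  {
   "cell_type": "markdown",
   "id": "loc-scale-identity",
   "metadata": {},
   "source": [
    "The docstring printed earlier states that `norm.pdf(x, loc, scale)` equals `norm.pdf(y) / scale` with `y = (x - loc) / scale`. A minimal sketch checking this for `loc=3`, `scale=4`; the grid `x` is an arbitrary choice made for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import norm\n",
    "\n",
    "loc, scale = 3, 4\n",
    "x = np.linspace(-5, 11, 9)  # arbitrary grid (assumption)\n",
    "\n",
    "# pdf of the shifted and scaled distribution\n",
    "shifted = norm.pdf(x, loc=loc, scale=scale)\n",
    "\n",
    "# the same values obtained from the standard normal pdf\n",
    "y = (x - loc) / scale\n",
    "standardized = norm.pdf(y) / scale\n",
    "\n",
    "print(np.allclose(shifted, standardized))\n",
    "\n",
    "# freezing the distribution fixes loc and scale once\n",
    "rv = norm(loc=loc, scale=scale)\n",
    "print(rv.mean(), rv.std(), rv.var())\n",
    "```\n",
    "\n",
    "Note that `scale` is the standard deviation, so the variance reported by `stats` is `scale ** 2`."
   ]
  },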
  {
   "cell_type": "markdown",
   "id": "900163a5-774d-4663-a3f0-a005a07785e4",
   "metadata": {},
   "source": [
    "### Shape parameters\n",
    "Some distributions, such as `gamma`, take additional shape parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "20b944e1-de64-48f0-9b94-18d928d4484b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.stats import gamma"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "fc3ada04-47bb-45b6-a96f-44dfeab56422",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<scipy.stats._distn_infrastructure.rv_continuous_frozen at 0x1f2f3719d00>"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gamma(a=1, scale=2.)"
   ]
  },
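  {
   "cell_type": "markdown",
   "id": "gamma-frozen-example",
   "metadata": {},
   "source": [
    "A frozen distribution like the one above exposes the same methods as `norm`. A minimal sketch, relying on the standard result that a gamma distribution with shape `a` and scale `s` has mean `a * s` and variance `a * s**2`; the variable names are chosen here for illustration.\n",
    "\n",
    "```python\n",
    "from scipy.stats import gamma\n",
    "\n",
    "a, scale = 1, 2.0\n",
    "rv = gamma(a=a, scale=scale)  # frozen: shape and scale are fixed\n",
    "\n",
    "# mean and variance of the frozen distribution\n",
    "mean, var = rv.stats(moments='mv')\n",
    "print(float(mean), float(var))\n",
    "\n",
    "# with a = 1 the gamma distribution reduces to an exponential distribution,\n",
    "# so the cdf evaluated at the mean is 1 - exp(-1)\n",
    "cdf_at_mean = rv.cdf(scale)\n",
    "print(cdf_at_mean)\n",
    "```\n",
    "\n",
    "The shape parameter `a` has no default, so it must always be supplied, while `loc` and `scale` default to 0 and 1."
   ]
  },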
  {
   "cell_type": "markdown",
   "id": "ed2b557d-850c-4660-84df-ddc4e9a0c870",
   "metadata": {},
   "source": [
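    "### z-score: the standard score (commonly used with the normal distribution)\n",
    "Measures the relative position of a value within a population.\n",
    "\n",
    "$$\n",
    "z = \\frac{x - \\mu}{\\sigma}\n",
    "$$\n",
    "\n",
    "Suppose the math scores of a class have mean 70 and standard deviation 10. If you score 85, then $z = \\frac{85 - 70}{10} = 1.5$, which means you are 1.5 standard deviations above the mean.\n",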
    "\n"
   ]
  },
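  {
   "cell_type": "markdown",
   "id": "zscore-worked-example",
   "metadata": {},
   "source": [
    "The worked example above can be reproduced in a few lines. A minimal sketch: the first part uses the class mean and standard deviation from the text; the `scores` array is made-up data for illustration, and `scipy.stats.zscore` standardizes it with the array's own mean and population (`ddof=0`) standard deviation.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import norm, zscore\n",
    "\n",
    "mu, sigma = 70, 10  # class mean and standard deviation from the text\n",
    "x = 85\n",
    "z = (x - mu) / sigma\n",
    "print(z)\n",
    "\n",
    "# under a normal model, the cdf at z is the fraction of scores below x\n",
    "fraction_below = norm.cdf(z)\n",
    "print(fraction_below)\n",
    "\n",
    "# hypothetical scores, standardized with the sample's own mean and std\n",
    "scores = np.array([55, 60, 70, 80, 85])\n",
    "z_scores = zscore(scores)\n",
    "print(z_scores)\n",
    "```\n",
    "\n",
    "After standardization the values have mean 0 and standard deviation 1, so scores from differently scaled tests become comparable."
   ]
  },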
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5378147-12d4-4c36-8d6a-3c30ecf812ae",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.23"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}