{ "cells": [ { "cell_type": "markdown", "id": "a6b51ad5", "metadata": {}, "source": [ "# scipy" ] }, { "cell_type": "markdown", "id": "e1879c72", "metadata": {}, "source": [ "## Sparse matrices" ] }, { "cell_type": "code", "execution_count": 1, "id": "dea1d311-9fb8-439f-a651-9fb142053dc3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 0 3 0 0]\n", " [0 0 0 0 0]\n", " [4 0 0 5 0]\n", " [0 0 0 0 0]\n", " [0 0 6 0 0]]\n", "[0 2 2 4]\n", "[2 0 3 2]\n", "[3 4 5 6]\n" ] } ], "source": [ "import numpy as np\n", "from scipy import sparse\n", "\n", "# 1️⃣ Row indices of the nonzero elements\n", "row = np.array([0, 2, 2, 4]) # 3 is in row 0, 4 in row 2, 5 in row 2, 6 in row 4\n", "\n", "# 2️⃣ Column indices of the nonzero elements\n", "col = np.array([2, 0, 3, 2]) # 3 is in column 2, 4 in column 0, 5 in column 3, 6 in column 2\n", "\n", "# 3️⃣ Values of the nonzero elements\n", "data = np.array([3, 4, 5, 6]) # values stored at those (row, col) positions\n", "\n", "# 4️⃣ Build the 5x5 sparse matrix in COO (coordinate) format\n", "matrix = sparse.coo_matrix((data, (row, col)), shape=(5, 5))\n", "\n", "# Print the matrix\n", "print(matrix.toarray()) # convert back to a dense 2-D array for inspection\n", "print(matrix.row)\n", "print(matrix.col)\n", "print(matrix.data)\n" ] }, { "cell_type": "markdown", "id": "348b3575-306b-4879-9729-4ad4e83b321d", "metadata": {}, "source": [ "## Probability distributions\n", "Using `norm` as an example." ] }, { "cell_type": "code", "execution_count": 8, "id": "c8a0b5fa-0ac6-4b9b-8a6e-146857d15a36", "metadata": {}, "outputs": [], "source": [ "from scipy.stats import norm" ] }, { "cell_type": "code", "execution_count": 9, "id": "e3442ee7-e388-4a22-b76e-9acc82e8389f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A normal continuous random variable.\n", "\n", " The location (``loc``) keyword specifies the mean.\n", " The scale (``scale``) keyword specifies the standard deviation.\n", "\n", " As an instance of the `rv_continuous` class, `norm` object inherits from it\n", " a collection of generic methods (see below for the full list),\n", " and completes them with details specific for this particular distribution.\n", " \n", " Methods\n", " -------\n", " rvs(loc=0, 
scale=1, size=1, random_state=None)\n", " Random variates.\n", " pdf(x, loc=0, scale=1)\n", " Probability density function.\n", " logpdf(x, loc=0, scale=1)\n", " Log of the probability density function.\n", " cdf(x, loc=0, scale=1)\n", " Cumulative distribution function.\n", " logcdf(x, loc=0, scale=1)\n", " Log of the cumulative distribution function.\n", " sf(x, loc=0, scale=1)\n", " Survival function (also defined as ``1 - cdf``, but `sf` is sometimes more accurate).\n", " logsf(x, loc=0, scale=1)\n", " Log of the survival function.\n", " ppf(q, loc=0, scale=1)\n", " Percent point function (inverse of ``cdf`` --- percentiles).\n", " isf(q, loc=0, scale=1)\n", " Inverse survival function (inverse of ``sf``).\n", " moment(order, loc=0, scale=1)\n", " Non-central moment of the specified order.\n", " stats(loc=0, scale=1, moments='mv')\n", " Mean('m'), variance('v'), skew('s'), and/or kurtosis('k').\n", " entropy(loc=0, scale=1)\n", " (Differential) entropy of the RV.\n", " fit(data)\n", " Parameter estimates for generic data.\n", " See `scipy.stats.rv_continuous.fit `__ for detailed documentation of the\n", " keyword arguments.\n", " expect(func, args=(), loc=0, scale=1, lb=None, ub=None, conditional=False, **kwds)\n", " Expected value of a function (of one argument) with respect to the distribution.\n", " median(loc=0, scale=1)\n", " Median of the distribution.\n", " mean(loc=0, scale=1)\n", " Mean of the distribution.\n", " var(loc=0, scale=1)\n", " Variance of the distribution.\n", " std(loc=0, scale=1)\n", " Standard deviation of the distribution.\n", " interval(confidence, loc=0, scale=1)\n", " Confidence interval with equal areas around the median.\n", "\n", " Notes\n", " -----\n", " The probability density function for `norm` is:\n", "\n", " .. math::\n", "\n", " f(x) = \\frac{\\exp(-x^2/2)}{\\sqrt{2\\pi}}\n", "\n", " for a real number :math:`x`.\n", "\n", " The probability density above is defined in the \"standardized\" form. 
To shift\n", " and/or scale the distribution use the ``loc`` and ``scale`` parameters.\n", " Specifically, ``norm.pdf(x, loc, scale)`` is identically\n", " equivalent to ``norm.pdf(y) / scale`` with\n", " ``y = (x - loc) / scale``. Note that shifting the location of a distribution\n", " does not make it a \"noncentral\" distribution; noncentral generalizations of\n", " some distributions are available in separate classes.\n", "\n", " Examples\n", " --------\n", " >>> import numpy as np\n", " >>> from scipy.stats import norm\n", " >>> import matplotlib.pyplot as plt\n", " >>> fig, ax = plt.subplots(1, 1)\n", " \n", " Calculate the first four moments:\n", " \n", " \n", " >>> mean, var, skew, kurt = norm.stats(moments='mvsk')\n", " \n", " Display the probability density function (``pdf``):\n", " \n", " >>> x = np.linspace(norm.ppf(0.01),\n", " ... norm.ppf(0.99), 100)\n", " >>> ax.plot(x, norm.pdf(x),\n", " ... 'r-', lw=5, alpha=0.6, label='norm pdf')\n", " \n", " Alternatively, the distribution object can be called (as a function)\n", " to fix the shape, location and scale parameters. 
This returns a \"frozen\"\n", " RV object holding the given parameters fixed.\n", " \n", " Freeze the distribution and display the frozen ``pdf``:\n", " \n", " >>> rv = norm()\n", " >>> ax.plot(x, rv.pdf(x), 'k-', lw=2, label='frozen pdf')\n", " \n", " Check accuracy of ``cdf`` and ``ppf``:\n", " \n", " >>> vals = norm.ppf([0.001, 0.5, 0.999])\n", " >>> np.allclose([0.001, 0.5, 0.999], norm.cdf(vals))\n", " True\n", " \n", " Generate random numbers:\n", " \n", " >>> r = norm.rvs(size=1000)\n", " \n", " And compare the histogram:\n", " \n", " >>> ax.hist(r, density=True, bins='auto', histtype='stepfilled', alpha=0.2)\n", " >>> ax.set_xlim([x[0], x[-1]])\n", " >>> ax.legend(loc='best', frameon=False)\n", " >>> plt.show()\n", " \n", "\n", " \n" ] } ], "source": [ "# help\n", "print(norm.__doc__)" ] }, { "cell_type": "code", "execution_count": 10, "id": "62d2a73d-59b7-48ba-97e8-48864260dfa2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'a', 'args', 'b', 'cdf', 'dist', 'entropy', 'expect', 'interval', 'isf', 'kwds', 'logcdf', 'logpdf', 'logsf', 'mean', 'median', 'moment', 'pdf', 'ppf', 'random_state', 'rvs', 'sf', 'stats', 'std', 'support', 'var']\n" ] } ], "source": [ "# Attributes and methods of a frozen distribution\n", "rv = norm()\n", "print(dir(rv))" ] }, { "cell_type": "markdown", "id": "998603bd-7c3a-4cbd-b94b-5b425e1f90cb", "metadata": {}, "source": [ "### **Main methods**\n", "- rvs: random variates (random sampling)\n", "- pdf: probability density function. (The probability at any single point is 0, but the probability over an interval is well defined: it is the integral of the pdf over that interval. 
The pdf value is the probability density at a point, not a probability.)\n", "- cdf: cumulative distribution function\n", "- sf: survival function (1 - CDF)\n", "- ppf: percent point function (inverse of the CDF)\n", "- isf: inverse survival function (inverse of the SF)\n", "- stats: returns the mean, variance, (Fisher's) skewness, and/or (Fisher's) kurtosis\n", "- moment: non-central moment of the distribution\n" ] }, { "cell_type": "code", "execution_count": 49, "id": "b8c5b7fc-3c43-4fac-a799-a6eb53d3704b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.5)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.cdf(0)" ] }, { "cell_type": "code", "execution_count": 36, "id": "e2399ae9-0a52-492a-8141-9055c3b0ae83", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.15865525, 0.5 , 0.84134475])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.cdf([-1, 0, 1]) # cdf evaluated at several points" ] }, { "cell_type": "code", "execution_count": 37, "id": "f1e21024-ff2c-4a4b-900a-23b16479e9bc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.0)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.mean()" ] }, { "cell_type": "code", "execution_count": 38, "id": "6fcce61c-53ac-4018-ad45-39bede345406", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(1.0)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.std()" ] }, { "cell_type": "code", "execution_count": 39, "id": "33f80c29-8937-4367-92c0-9916987586fc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(1.0)" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.var()" ] }, { "cell_type": "code", "execution_count": 40, "id": "1474a743-c07c-4c52-bd44-4e48c5745fa0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.0)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.ppf(0.5) # find the quantile for a given cumulative probability" ] }, { "cell_type": "code", "execution_count": 41, "id": 
"cf7be2f5-fc75-4cbc-b4ba-0cce22b24a69", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "np.float64(0.3989422804014327)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.pdf(0)" ] }, { "cell_type": "code", "execution_count": null, "id": "d640ede7-184e-4bcc-9244-524ea41f67f7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6dc3c41f-c419-4bfa-93b8-bc42b1629c68", "metadata": {}, "source": [ "### Generating random variates" ] }, { "cell_type": "code", "execution_count": 42, "id": "59d2dd2a-3e16-4aaf-a7af-824a95f1fc65", "metadata": {}, "outputs": [], "source": [ "from numpy.random import default_rng\n", "rng = default_rng()" ] }, { "cell_type": "code", "execution_count": 43, "id": "c4198657-d1b9-4e5a-ad53-934864c5ed01", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1.54168013, 0.63091442, 0.63829445, -0.40356971, -1.61088559])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.rvs(size=5, random_state=rng)" ] }, { "cell_type": "code", "execution_count": null, "id": "442bc179-546d-4189-9404-85aff2b6fd69", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a927f0c7-07c6-41e2-8585-2d9e94c85a71", "metadata": {}, "source": [ "### Shifting and scaling\n", "Use the `loc` and `scale` keywords; for `norm` they are the mean and the standard deviation. `stats` returns the mean and the variance by default." ] }, { "cell_type": "code", "execution_count": 44, "id": "9d3c3cde-e7de-4ffe-bdb3-75ac3c0d4040", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(np.float64(3.0), np.float64(16.0))" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "norm.stats(loc=3, scale=4)" ] }, { "cell_type": "code", "execution_count": null, "id": "364f9145-a0d3-43f1-b10e-54c4c4ca0a04", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "900163a5-774d-4663-a3f0-a005a07785e4", "metadata": {}, "source": [ "### Shape parameters\n", "Some distributions, such as gamma, take additional shape parameters." ] }, { "cell_type": "code", "execution_count": 45, "id": 
"20b944e1-de64-48f0-9b94-18d928d4484b", "metadata": {}, "outputs": [], "source": [ "from scipy.stats import gamma" ] }, { "cell_type": "code", "execution_count": 46, "id": "fc3ada04-47bb-45b6-a96f-44dfeab56422", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gamma(a=1, scale=2.)" ] }, { "cell_type": "code", "execution_count": null, "id": "5b20145f-215f-4a92-973e-d0175c1a0a98", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "a92ce97b-2597-4908-b032-582d97758cbb", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "ed2b557d-850c-4660-84df-ddc4e9a0c870", "metadata": {}, "source": [ "### z-score: the standard score (commonly used with normal distributions)\n", "Measures the relative position of a value within a population.\n", "\n", "$$\n", "z = \\frac{x - \\mu}{\\sigma}\n", "$$\n", "\n", "Suppose a class's math scores have a mean of 70 and a standard deviation of 10. If you scored 85, then $z = \\frac{85 - 70}{10} = 1.5$, meaning your score is 1.5 standard deviations above the mean.\n", "\n" ] }, { "cell_type": "markdown", "id": "4e58d0bb-46fe-48b9-bc5e-a90170894b92", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "id": "7273771e-9b7c-48c4-95f8-0bc467775d8e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "0ab79c66-eeb1-494b-8c35-861ab95e960d", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "a5378147-12d4-4c36-8d6a-3c30ecf812ae", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 5 }