To learn how to build more maintainable and usable Python libraries, I’ve been reading some of the most widely used Python packages. Along the way, I learned some things about Python that are off the beaten path. Here are a few things I didn’t know before.
Using super() in base classes
Python’s super() lets us call methods on base classes (aka super or parent classes) without having to refer to the base class explicitly. It’s most commonly used in the __init__ method. While this is merely a nice-to-have in single inheritance, cooperative multiple inheritance is almost impossible without super().
However, one interesting use of super() is calling it in the base class itself. I first noticed this in requests’ BaseAdapter.
    class BaseAdapter:
        """The Base Transport Adapter"""

        def __init__(self):
            super().__init__()

        def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
            ...  # body omitted in this excerpt
Given that the base class doesn’t inherit from anything, why call super() in it?
After a bit of digging, here’s what I learned: Using super() in the base class allows for cooperative multiple inheritance. Without it, the __init__ methods of any classes that come after the class lacking the super() call in the method resolution order (MRO) are skipped. Here’s an example with a base class (BaseEstimator) and a mixin (ServingMixin), both of which will be inherited by our DecisionTree class.
First, we have a BaseEstimator that doesn’t call super() in its __init__ method. It has a basic __repr__ method to print attributes.
    class BaseEstimator:
        def __init__(self, name, **kwargs):
            self.name = name

        def __repr__(self):
            return ', '.join(f'{k}: {v}' for k, v in vars(self).items())
Next, we inherit BaseEstimator via the DecisionTree subclass. Everything works fine—printing the DecisionTree instance shows the attributes of BaseEstimator and DecisionTree.
    class DecisionTree(BaseEstimator):
        def __init__(self, depth, **kwargs):
            super().__init__(**kwargs)
            self.depth = depth

    dt = DecisionTree(name='DT', depth=1)
    print(dt)
    > name: DT, depth: 1
Now, let’s also inherit ServingMixin and create an instance of DecisionTree.
    class ServingMixin:
        def __init__(self, mode, **kwargs):
            super().__init__(**kwargs)
            self.mode = mode

    class DecisionTree(BaseEstimator, ServingMixin):
        def __init__(self, depth, **kwargs):
            super().__init__(**kwargs)
            self.depth = depth

    dt = DecisionTree(name='Request Time DT', depth=1, mode='online')
    print(dt)
    > name: Request Time DT, depth: 1
    dt.mode
    > AttributeError: 'DecisionTree' object has no attribute 'mode'
You’ll notice that ServingMixin isn’t initialized properly: its attribute (mode) doesn’t show up when we print our decision tree instance, and accessing dt.mode raises an AttributeError.
This is because BaseEstimator’s __init__ doesn’t call super(). DecisionTree’s super().__init__() resolves to BaseEstimator, the next class in the method resolution order, and the chain stops there, so ServingMixin’s __init__ is never reached.
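To see why the chain stops, it can help to print the method resolution order directly. The sketch below repeats the broken classes from above (without __repr__, for brevity) and inspects DecisionTree.__mro__; the variable name mro_names is only for illustration.

```python
class BaseEstimator:
    def __init__(self, name, **kwargs):
        self.name = name  # no super().__init__() call, so the chain stops here


class ServingMixin:
    def __init__(self, mode, **kwargs):
        super().__init__(**kwargs)
        self.mode = mode


class DecisionTree(BaseEstimator, ServingMixin):
    def __init__(self, depth, **kwargs):
        super().__init__(**kwargs)
        self.depth = depth


# The order in which super() walks the classes
mro_names = [cls.__name__ for cls in DecisionTree.__mro__]
print(mro_names)
# ['DecisionTree', 'BaseEstimator', 'ServingMixin', 'object']

dt = DecisionTree(name='DT', depth=1, mode='online')
print(hasattr(dt, 'mode'))  # False: ServingMixin.__init__ never ran
```

ServingMixin sits after BaseEstimator in the MRO, so it only gets initialized if BaseEstimator passes the call along.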
We can fix this by calling super().__init__() in BaseEstimator, after which DecisionTree works as expected.
    class BaseEstimator:
        def __init__(self, name, **kwargs):
            self.name = name
            super().__init__(**kwargs)

        def __repr__(self):
            return ', '.join(f'{k}: {v}' for k, v in vars(self).items())

    class ServingMixin:
        def __init__(self, mode, **kwargs):
            super().__init__(**kwargs)
            self.mode = mode

    class DecisionTree(BaseEstimator, ServingMixin):
        def __init__(self, depth, **kwargs):
            super().__init__(**kwargs)
            self.depth = depth

    dt = DecisionTree(name='Request Time DT', depth=1, mode='online')
    print(dt)
    > name: Request Time DT, mode: online, depth: 1
    dt.mode
    > 'online'
And that’s why we might want to call super() in a base class.
When to use a Mixin
A mixin is a class that provides method implementations for reuse by multiple child classes. It is a limited form of multiple inheritance: a parent class that simply provides functionality for subclasses, does not contain state, and is not intended to be instantiated. Scikit-learn uses mixins liberally, with ClassifierMixin, TransformerMixin, OutlierMixin, etc.
When should we use mixins? They are appropriate when we want to (i) provide a lot of optional features for a class or (ii) use a particular feature in a lot of different classes. Here’s an example of the former. We start by creating a basic request object in werkzeug. (This is the mixin-based API of older werkzeug releases; BaseRequest and these mixins were deprecated in Werkzeug 2.0 and later removed.)
    from werkzeug import BaseRequest

    class Request(BaseRequest):
        pass
If we want to add accept header support, we would update it as follows.
    from werkzeug import BaseRequest, AcceptMixin

    class Request(AcceptMixin, BaseRequest):
        pass
Need support for user agent, authentication, etc? No problem, just add the mixins.
    from werkzeug import BaseRequest, AcceptMixin, UserAgentMixin, AuthenticationMixin

    class Request(AcceptMixin, UserAgentMixin, AuthenticationMixin, BaseRequest):
        pass
By having these features modularized as mixins—instead of adding them to the base class—we prevent our base class from getting bloated with features that only a few subclasses may use. In addition, these mixins can now be reused by other child classes (that may not inherit from BaseRequest).
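To illustrate the second use case (one feature shared across unrelated classes), here is a minimal sketch that doesn’t depend on werkzeug. JsonMixin, ReprMixin, and Config are hypothetical names invented for this example, and the Request class here is a stand-in rather than werkzeug’s.

```python
import json


class JsonMixin:
    """Adds JSON serialization to any class whose state lives in vars(self)."""

    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)


class ReprMixin:
    """Adds a readable repr built from the instance attributes."""

    def __repr__(self):
        attrs = ', '.join(f'{k}={v!r}' for k, v in vars(self).items())
        return f'{type(self).__name__}({attrs})'


class Request(JsonMixin, ReprMixin):
    def __init__(self, url, method='GET'):
        self.url = url
        self.method = method


class Config(JsonMixin):  # unrelated class reusing the same mixin
    def __init__(self, debug):
        self.debug = debug


req = Request('https://example.com')
print(req)            # Request(url='https://example.com', method='GET')
print(req.to_json())  # {"method": "GET", "url": "https://example.com"}
print(Config(debug=True).to_json())  # {"debug": true}
```

Neither mixin defines __init__ or holds state of its own, which is what keeps them safe to combine in any order.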
Using relative imports (almost all the time)
Relative imports resolve a module relative to the current package rather than by searching sys.path (which includes PYTHONPATH), so we always import from our own package. We use them by prefixing the module path with a dot (.). Here’s an example from sklearn’s base.py.
    from .utils.validation import check_X_y
    from .utils.validation import check_array
What happens if base.py doesn’t use relative imports and instead writes from utils.validation import check_array? If we have a package named utils in our script’s directory, Python will find our utils package during import instead of sklearn’s utils package, thus breaking sklearn. The leading . ensures sklearn’s base.py always imports from its own utils.
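The name clash can be reproduced in a throwaway directory. The sketch below is only an illustration: the package names mylib_demo and utils_demo are made up, and the files are written to a temporary folder that is then put on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# A top-level `utils_demo` package that belongs to "our script"
(root / 'utils_demo').mkdir()
(root / 'utils_demo' / '__init__.py').write_text("WHO = 'top-level utils_demo'\n")

# A library package that ships its own `utils_demo` subpackage
lib = root / 'mylib_demo'
(lib / 'utils_demo').mkdir(parents=True)
(lib / '__init__.py').write_text('')
(lib / 'utils_demo' / '__init__.py').write_text("WHO = 'mylib_demo.utils_demo'\n")
(lib / 'relative.py').write_text('from .utils_demo import WHO\n')
(lib / 'absolute.py').write_text('from utils_demo import WHO\n')

sys.path.insert(0, str(root))
importlib.invalidate_caches()

relative = importlib.import_module('mylib_demo.relative')
absolute = importlib.import_module('mylib_demo.absolute')

print(relative.WHO)  # mylib_demo.utils_demo
print(absolute.WHO)  # top-level utils_demo
```

The module using the relative import gets the library’s own subpackage, while the absolute import picks up whichever utils_demo is found first on sys.path.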
(That said, is there a reason not to use relative imports? Please comment below!)
When to add to __init__.py
__init__.py marks directories as Python package directories. The common practice is to leave them empty. Nonetheless, many libraries I read had non-empty and sometimes long __init__.py files. This led me to dig into why we might add to __init__.py .
First, we might add imports to __init__.py when we want to refactor code that has grown into multiple modules without introducing breaking changes to existing users. Say we have a single module ( models.py ) that contains implementation for DecisionTree and Bandit . Over time, that single module grows into a models package with modules for tree and bandit . To ensure a consistent API for existing users, we might add the following to the __init__.py in the models package.
    from .tree import DecisionTree, RandomForest
    from .bandit import Bandit, TSBandit
This ensures existing users can continue to import via from models import DecisionTree instead of from models.tree import DecisionTree . To them, there’s no change in API and existing code doesn’t break.
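Here is a small, self-contained way to check that both import paths end up at the same class. It builds a throwaway package named models_demo (a made-up name, so it can’t collide with a real models package) in a temporary directory; the class bodies are placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / 'models_demo'
pkg.mkdir()

# Implementation now lives in separate modules...
(pkg / 'tree.py').write_text('class DecisionTree:\n    pass\n')
(pkg / 'bandit.py').write_text('class Bandit:\n    pass\n')
# ...and __init__.py re-exports the names at the package level.
(pkg / '__init__.py').write_text(
    'from .tree import DecisionTree\n'
    'from .bandit import Bandit\n'
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from models_demo import DecisionTree  # the pre-refactor import path
from models_demo.tree import DecisionTree as TreeFromModule

print(DecisionTree is TreeFromModule)  # True
```

Because __init__.py only re-exports the names, there is a single class object and existing isinstance checks keep working.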
This brings us to another reason why we might add to __init__.py —to provide a simplified API so users don’t have to dig into implementation details. Consider the example package below.
    app
        __init__.py
        model_implementation.py
        data_implementation.py
Instead of having users figure out what to import from model_implementation and data_implementation, we can simplify things by adding the following to app’s __init__.py.
    from .model_implementation import SimpleModel
    from .data_implementation import SimpleDataLoader
This states that SimpleModel and SimpleDataLoader are the only parts of app that users should use, streamlining how they use the app package (i.e., from app import SimpleModel, SimpleDataLoader ). And if they know what they’re doing and want to import directly from model_implementation , that’s doable too.
Libraries that do this include Pandas, where datatypes, readers, and the reshape API are imported in __init__.py , and Hugging Face’s Accelerate.
Other than what’s mentioned above, we might also want to (i) initialize a logger in the main package’s __init__.py for use across multiple modules and (ii) perform compatibility checks.
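As a rough sketch of what those two extras might look like, the snippet below shows lines that could sit in a package’s __init__.py. The logger name app_demo and the MIN_PYTHON value are assumptions for illustration; in a real __init__.py we would normally pass __name__ to logging.getLogger instead of a literal.

```python
import logging
import sys

# (ii) Compatibility check: hypothetical minimum version for this sketch
MIN_PYTHON = (3, 8)
if sys.version_info < MIN_PYTHON:
    raise ImportError(
        f'this package requires Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer'
    )

# (i) Package-level logger; NullHandler leaves output configuration
# to the application that imports the package
logger = logging.getLogger('app_demo')
logger.addHandler(logging.NullHandler())

# A module inside the package would then get a child logger like this
child = logging.getLogger('app_demo.model_implementation')
print(child.parent is logger)  # True
```

Child loggers propagate their records to the package logger, so an application can configure logging for the whole package in one place.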
When to use instance, class, and static methods
A quick recap of the various methods we can implement for a class:
Instance methods need a class instance and can access the instance through self
Class methods don’t need an instance. Thus, they can’t access the instance ( self ) but have access to the class ( cls )
Static methods don’t have access to self or cls. They work like regular functions but belong to the class namespace.
When should we use class or static methods? Here are some basic guidelines I found.
We use class methods when we want to call a method without creating an instance of the class. This is usually when we don’t need instance information but do need class information (e.g., its other class or static methods). We might also use class methods as alternative constructors. The benefit of class methods is that we don’t have to hardcode the class name, so subclasses can use the methods too.
We use static methods when we don’t need class or instance arguments, but the method is related to the class and it is convenient for it to live in the class’s namespace, for example utility methods specific to the class. By decorating a method as a static method, we improve readability by telling others that the method doesn’t depend on the class or instance.
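The three kinds of methods can be seen side by side in a short sketch. Estimator, from_config, and validate_depth are hypothetical names chosen for this example.

```python
class Estimator:
    def __init__(self, name, depth):
        self.name = name
        self.depth = depth

    def describe(self):
        # Instance method: reads per-instance state through self
        return f'{self.name} (depth={self.depth})'

    @classmethod
    def from_config(cls, config):
        # Class method as an alternative constructor: cls is whichever
        # class the method was called on, so subclasses work too
        cls.validate_depth(config['depth'])
        return cls(config['name'], config['depth'])

    @staticmethod
    def validate_depth(depth):
        # Static method: a utility that needs neither self nor cls
        if depth < 1:
            raise ValueError('depth must be at least 1')


class DecisionTree(Estimator):
    pass


dt = DecisionTree.from_config({'name': 'DT', 'depth': 2})
print(type(dt).__name__)  # DecisionTree
print(dt.describe())      # DT (depth=2)
```

Since from_config builds the object via cls rather than a hardcoded Estimator, calling it on DecisionTree returns a DecisionTree.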
A hidden feature of conftest.py
The common use of conftest.py is to provide fixtures for the entire directory. By defining fixtures in conftest.py , they can be used by any test in the package without having to import them. Beyond that, it’s also used to load external plugins and define hooks such as setup and teardown methods.
However, while browsing sklearn, I came across an empty conftest.py which had this interesting comment.
    # Even if empty this file is useful so that when running from the root folder
    # ./sklearn is added to sys.path by pytest. See
    # https://docs.pytest.org/en/latest/explanation/pythonpath.html for more
    # details. For example, this allows to build extensions in place and run pytest
    # doc/modules/clustering.rst and use sklearn from the local folder rather than
    # the one from site-packages.
It turns out that sklearn was taking advantage of a lesser-known feature of conftest.py: By having it in the root path, it ensures that pytest recognizes the local modules without having to specify the PYTHONPATH. In the background, with its default import mode, pytest inserts the directory containing the root-level conftest.py into sys.path, so packages in that directory can be imported.
Papers that explain a library’s design principles
Other than reading code, we can also learn by reading papers explaining a library. Let’s focus on the design principles of each library.
Scikit-learn’s design principles include (i) consistency, where all objects share a consistent interface composed of a limited set of methods, and (ii) composition where objects are implemented via existing building blocks wherever feasible.
As a result, most machine learning models and data transformers have a fit() method. In addition, machine learning models have a predict() method and data transformers have a transform() method. This consistency and simplicity contribute to sklearn’s ease of use. The principle of composition also explains why sklearn is built on multiple inheritance of base classes and mixins.
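The following toy sketch mimics that interface in plain Python to show how consistency and composition fit together. It is not sklearn’s implementation: FitTransformMixin, MeanCenterer, and MeanPredictor are hypothetical classes, though sklearn’s TransformerMixin does provide a fit_transform in a similar spirit.

```python
class FitTransformMixin:
    """Hypothetical mixin: builds fit_transform out of fit and transform."""

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class MeanCenterer(FitTransformMixin):
    """Toy transformer with the fit/transform pair."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self  # returning self allows chaining

    def transform(self, X):
        return [x - self.mean_ for x in X]


class MeanPredictor:
    """Toy model with the fit/predict pair."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


centered = MeanCenterer().fit_transform([1.0, 2.0, 3.0])
predictions = MeanPredictor().fit([[0], [1]], [2.0, 4.0]).predict([[5]])
print(centered)     # [-1.0, 0.0, 1.0]
print(predictions)  # [3.0]
```

Because every object exposes the same few methods, the mixin can add fit_transform to any transformer without knowing anything else about it.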
Another example is fastai which uses a layered approach. It provides a high-level API that provides ready-to-use functionality to train models for various applications. The high-level API is built on a hierarchy of lower-level APIs which provide composable building blocks. This layered approach enables one to quickly build a prototype before customizing by tweaking the middle-layer APIs.
PyTorch also shared its design principles, such as (i) provide pragmatic performance and (ii) worse is better. The former states that, to be useful, a library needs to deliver compelling performance, but not at the expense of ease of use. Thus, PyTorch is willing to trade off 10% of speed, but not 100%, for a significantly simpler-to-use library. The latter states that it’s better to have a simple but slightly incomplete solution than a comprehensive but hard-to-maintain design.
• • •
Those are some of the uncommon usages of Python I’ve learned while reading several libraries such as requests, flask, fastapi, scikit-learn, pytorch, fastai, pydantic, and django. I’m sure I only scratched the surface—did I miss anything? Please comment below!
If you found this useful, please cite this write-up as:
Yan, Ziyou. (Jul 2022). Uncommon Uses of Python in Commonly Used Libraries. eugeneyan.com. https://eugeneyan.com/writing/uncommon-python/.
or
    @article{yan2022python,
      title   = {Uncommon Uses of Python in Commonly Used Libraries},
      author  = {Yan, Ziyou},
      journal = {eugeneyan.com},
      year    = {2022},
      month   = {Jul},
      url     = {https://eugeneyan.com/writing/uncommon-python/}
    }