How Django Works (4) URL Resolution

Why We Need URL Resolution

Theoretically, we could write a single function to process all incoming HTTP requests. That would work well for a very simple website. For any real website, it is much more appropriate to categorize HTTP requests by their URLs and process each category in its own handler.

For example, BASEURL/polls/ would be an application for polls whereas BASEURL/blogs/ would be an application for blogs.

This way, different URLs can be treated as different logical applications, so we can think of a website in terms of logical applications, a higher level of abstraction than HTTP request/response pairs.

Fortunately, many modern web frameworks come with built-in mechanisms that categorize HTTP requests by URL, which spares us the effort of reinventing the wheel.

 

Django's URL Resolution

The official Django documentation describes how Django processes a request as follows (https://docs.djangoproject.com/en/1.8/topics/http/urls/):

When a user requests a page from your Django-powered site, this is the algorithm the system follows to determine which Python code to execute:

1. Django determines the root URLconf module to use. Ordinarily, this is the value of the ROOT_URLCONF setting, but if the incoming HttpRequest object has an attribute called urlconf (set by middleware request processing), its value will be used in place of the ROOT_URLCONF setting.

2. Django loads that Python module and looks for the variable urlpatterns. This should be a Python list of django.conf.urls.url() instances.

3. Django runs through each URL pattern, in order, and stops at the first one that matches the requested URL.

4. Once one of the regexes matches, Django imports and calls the given view, which is a simple Python function (or a class based view). The view gets passed the following arguments:
  • An instance of HttpRequest.
  • If the matched regular expression returned no named groups, then the matches from the regular expression are provided as positional arguments.
  • The keyword arguments are made up of any named groups matched by the regular expression, overridden by any arguments specified in the optional kwargs argument to django.conf.urls.url().
5. If no regex matches, or if an exception is raised during any point in this process, Django invokes an appropriate error-handling view. See Error handling below.
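The steps above can be sketched in plain Python. This is only a simplified model, not Django's code: the URLPATTERNS tuples, the dispatch function, and the NoMatch exception are hypothetical stand-ins, and the request argument is a placeholder.

```python
import re


def index(request):
    return "index"


def detail(request, pk):
    return "detail %s" % pk


def archive(request, year, month):
    return "archive %s-%s" % (year, month)


# Hypothetical stand-in for urlpatterns: (regex, view, extra kwargs).
URLPATTERNS = [
    (r'^$', index, {}),
    (r'^(?P<pk>[0-9]+)/$', detail, {}),
    (r'^archive/([0-9]{4})/([0-9]{2})/$', archive, {}),
]


class NoMatch(Exception):
    """Hypothetical error; Django would invoke an error-handling view."""


def dispatch(request, path, urlpatterns=URLPATTERNS):
    # Step 3: try each pattern in order and stop at the first match.
    for regex, view, extra_kwargs in urlpatterns:
        match = re.search(regex, path)
        if match is None:
            continue
        # Step 4: named groups become keyword arguments; positional
        # arguments are used only when there are no named groups.
        kwargs = match.groupdict()
        args = () if kwargs else match.groups()
        # Extra kwargs override values captured from the URL.
        kwargs.update(extra_kwargs)
        return view(request, *args, **kwargs)
    # Step 5: nothing matched.
    raise NoMatch(path)
```

Calling dispatch(None, '5/') reaches detail with pk='5', while dispatch(None, 'archive/2015/06/') passes the two unnamed groups positionally.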

 

Main Processing Routine get_response

get_response (site-packages/django/core/handlers/base.py) is where WSGIRequest objects are consumed and HttpResponse objects are produced. As discussed in the first post of this series, get_response is invoked by WSGIHandler.__call__.



    def get_response(self, request):
        "Returns an HttpResponse object for the given HttpRequest"

        # Setup default url resolver for this thread, this code is outside
        # the try/except so we don't get a spurious "unbound local
        # variable" exception in the event an exception is raised before
        # resolver is set
        urlconf = settings.ROOT_URLCONF
        urlresolvers.set_urlconf(urlconf)
        resolver = urlresolvers.RegexURLResolver(r'^/', urlconf)
        try:
            response = None
            # Apply request middleware
            for middleware_method in self._request_middleware:
                response = middleware_method(request)
                if response:
                    break

            if response is None:
                if hasattr(request, 'urlconf'):
                    # Reset url resolver with a custom urlconf.
                    urlconf = request.urlconf
                    urlresolvers.set_urlconf(urlconf)
                    resolver = urlresolvers.RegexURLResolver(r'^/', urlconf)

                resolver_match = resolver.resolve(request.path_info)
                callback, callback_args, callback_kwargs = resolver_match
                request.resolver_match = resolver_match

                # Apply view middleware
                for middleware_method in self._view_middleware:
                    response = middleware_method(request, callback, callback_args, callback_kwargs)
                    if response:
                        break

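The lines relevant to URL resolution are the ones that pick the urlconf (settings.ROOT_URLCONF, or request.urlconf if a middleware has set it), build the root RegexURLResolver with the pattern r'^/', and call resolver.resolve(request.path_info) to obtain the callback and its arguments.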
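The urlconf selection in the excerpt above boils down to an attribute lookup with a fallback. Here is a minimal sketch, assuming a hypothetical FakeRequest class in place of HttpRequest and a made-up 'mysite.alt_urls' module name:

```python
class FakeRequest(object):
    """Hypothetical stand-in for HttpRequest."""

    def __init__(self, path_info):
        self.path_info = path_info


def choose_urlconf(request, root_urlconf):
    # A request.urlconf attribute (set by middleware) takes precedence
    # over the ROOT_URLCONF setting.
    return getattr(request, 'urlconf', root_urlconf)


plain = FakeRequest('/polls/')
overridden = FakeRequest('/polls/')
overridden.urlconf = 'mysite.alt_urls'  # made-up module name
```

With these objects, choose_urlconf(plain, 'mysite.urls') falls back to the root setting, while the overridden request yields its own urlconf.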

Internal Data Structure



The basic unit of URL resolution is the RegexURLResolver object. These objects are organised in a tree-like data structure.

Below is the URL resolution tree for the Django tutorial polls application.

<RegexURLResolver 'mysite.urls' (None:None) ^/>
    <RegexURLResolver <module 'polls.urls' from '/home/flyingwangcai/Django/djangotutorial/mysite/polls/urls.py'> (None:polls) ^polls/>
        <RegexURLPattern index ^$>
        <RegexURLPattern detail ^(?P<pk>[0-9]+)/$>
        <RegexURLPattern results ^(?P<pk>[0-9]+)/results/$>
        <RegexURLPattern vote ^(?P<question_id>[0-9]+)/vote/$>
    <RegexURLResolver <RegexURLPattern list> (admin:admin) ^admin/>
        <RegexURLPattern index ^$>
        <RegexURLPattern login ^login/$>
        <RegexURLPattern logout ^logout/$>
        <RegexURLPattern password_change ^password_change/$>
        <RegexURLPattern password_change_done ^password_change/done/$>
        <RegexURLPattern jsi18n ^jsi18n/$>
        <RegexURLPattern view_on_site ^r/(?P<content_type_id>\d+)/(?P<object_id>.+)/$>
        <RegexURLResolver <RegexURLPattern list> (None:None) ^auth/group/>
        <RegexURLResolver <RegexURLPattern list> (None:None) ^polls/question/>
        <RegexURLResolver <RegexURLPattern list> (None:None) ^auth/user/>
        <RegexURLPattern app_list ^(?P<app_label>auth|polls)/$>


The leaves of the tree must be RegexURLPattern objects. Non-leaf nodes are RegexURLResolver objects. The children of a RegexURLResolver object are accessed through its url_patterns property.
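To make the tree structure concrete, here is a toy model of it. The Pattern and Resolver classes are hypothetical stand-ins for RegexURLPattern and RegexURLResolver, and the sketch returns None on failure, whereas Django raises an exception. Each inner node matches a prefix, strips it, and hands the remainder of the path to its children.

```python
import re


class Pattern(object):
    """Leaf node: maps a regex to a view name."""

    def __init__(self, regex, name):
        self.regex = re.compile(regex)
        self.name = name

    def resolve(self, path):
        match = self.regex.search(path)
        if match:
            return self.name, match.groupdict()
        return None


class Resolver(object):
    """Inner node: matches a prefix, then delegates to its children."""

    def __init__(self, regex, url_patterns):
        self.regex = re.compile(regex)
        self.url_patterns = url_patterns

    def resolve(self, path):
        match = self.regex.search(path)
        if not match:
            return None
        # Strip the matched prefix before asking the children.
        rest = path[match.end():]
        for child in self.url_patterns:
            found = child.resolve(rest)
            if found is not None:
                name, kwargs = found
                # Groups captured by this node are merged with the child's.
                merged = dict(match.groupdict())
                merged.update(kwargs)
                return name, merged
        return None


polls = Resolver(r'^polls/', [
    Pattern(r'^$', 'index'),
    Pattern(r'^(?P<pk>[0-9]+)/$', 'detail'),
    Pattern(r'^(?P<pk>[0-9]+)/results/$', 'results'),
    Pattern(r'^(?P<question_id>[0-9]+)/vote/$', 'vote'),
])
root = Resolver(r'^/', [polls])
```

Resolving '/polls/5/results/' walks root, then polls, then the results leaf, with '/' and 'polls/' stripped along the way.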

The tree above corresponds to the two URLconf files below.

mysite/urls.py

urlpatterns = [
    url(r'^polls/', include('polls.urls', namespace='polls')),
    url(r'^admin/', include(admin.site.urls)),
]
 
polls/urls.py 

urlpatterns = [
    # /polls/
    url(r'^$', views.IndexView.as_view(), name='index'),
    # /polls/5/
    url(r'^(?P<pk>[0-9]+)/$', views.DetailView.as_view(), name='detail'),
    # /polls/5/results/
    url(r'^(?P<pk>[0-9]+)/results/$', views.ResultsView.as_view(), name='results'),
    # /polls/5/vote/
    url(r'^(?P<question_id>[0-9]+)/vote/$', views.vote, name='vote'),
] 


There are three kinds of RegexURLResolver objects, depending on the type of self.urlconf_name:

1. self.urlconf_name is a string (e.g., resolver = urlresolvers.RegexURLResolver(r'^/', urlconf)). This only happens with the root node of the tree.

2. self.urlconf_name is a module:

   url(r'^polls/', include('polls.urls', namespace='polls'))

3. self.urlconf_name is a list:

   url(r'^admin/', include(admin.site.urls))
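The three cases can be reduced to one lookup, roughly as follows. This is a sketch rather than Django's code: patterns_for and the demo_urls module are hypothetical, and the strings in the lists merely stand in for pattern objects.

```python
import sys
import types
from importlib import import_module


def patterns_for(urlconf_name):
    # Case 1: a dotted module name is imported first.
    if isinstance(urlconf_name, str):
        urlconf_module = import_module(urlconf_name)
    else:
        # Cases 2 and 3: already a module or a list.
        urlconf_module = urlconf_name
    # A module exposes its patterns as urlpatterns; a list is used as-is.
    return getattr(urlconf_module, 'urlpatterns', urlconf_module)


# Hypothetical in-memory module so the example needs no files on disk.
demo = types.ModuleType('demo_urls')
demo.urlpatterns = ['index', 'detail']
sys.modules['demo_urls'] = demo
```

Here patterns_for('demo_urls'), patterns_for(demo), and patterns_for(['index', 'detail']) all produce the same list.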

Both url and include are functions defined in site-packages/django/conf/urls/__init__.py and exported through __all__.

 

Flow Chart of the url and include Functions

 

 


In the flow chart, the outer circle represents the invocation of the url function, whereas the inner circle depicts the invocation of the include function.

The url function returns either a RegexURLResolver object or a RegexURLPattern object, depending on its second argument, view.

When the include function is used, its return value serves as the view argument, and the url function creates a RegexURLResolver object.

Otherwise, a RegexURLPattern is generated, indicating that this is one of the "final" matches: a match that maps to a function or class-based view (e.g., url(r'^(?P<pk>[0-9]+)/$', views.DetailView.as_view(), name='detail')).

The include function returns a tuple (urlconf_module, app_name, namespace). If the first argument is a string, include imports the module of that name; otherwise, the first argument must be a list of RegexURLPattern and/or RegexURLResolver objects.

Because urlconf_module is either a module that defines urlpatterns or the list itself, the patterns can be retrieved with getattr, passing urlconf_module as the default:

patterns = getattr(urlconf_module, 'urlpatterns', urlconf_module)
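The interplay of url and include can be mimicked with a short sketch. It follows the behaviour described above but is not Django's implementation: the Resolver and Pattern namedtuples, the simplified signatures, and the demo_polls_urls module are all hypothetical.

```python
import sys
import types
from collections import namedtuple
from importlib import import_module

# Hypothetical stand-ins for RegexURLResolver and RegexURLPattern.
Resolver = namedtuple('Resolver', 'regex urlconf_name app_name namespace')
Pattern = namedtuple('Pattern', 'regex view name')


def include(arg, namespace=None, app_name=None):
    # A string names a module to import; anything else is used directly.
    if isinstance(arg, str):
        urlconf_module = import_module(arg)
    else:
        urlconf_module = arg
    return (urlconf_module, app_name, namespace)


def url(regex, view, name=None):
    # A tuple means the view came from include(): build an inner node.
    if isinstance(view, (list, tuple)):
        urlconf_module, app_name, namespace = view
        return Resolver(regex, urlconf_module, app_name, namespace)
    # Otherwise the view is a callable: build a leaf.
    return Pattern(regex, view, name)


def vote(request, question_id):
    return 'vote %s' % question_id


# Hypothetical in-memory urls module.
demo = types.ModuleType('demo_polls_urls')
demo.urlpatterns = [url(r'^(?P<question_id>[0-9]+)/vote/$', vote, name='vote')]
sys.modules['demo_polls_urls'] = demo

leaf = demo.urlpatterns[0]
node = url(r'^polls/', include('demo_polls_urls', namespace='polls'))
```

In this model, leaf is a Pattern and node is a Resolver whose urlconf_name is the imported module, so the getattr line above would return its urlpatterns list.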

 
 
