• My thoughts on Python vs Java

    After working in both Python and Java for a while, I want to share my thoughts on the two languages.

    Popularity

    My current project has a REST API that lets users query price data. Our users are mostly big corporations, such as energy companies and financial institutions. To help users get started with the API, we also provide a few code samples, in both Python (our primary language) and Java. We added Java because we think it’s more popular in big companies.

    In the last 2 years, we received many questions about the Python code sample, but zero questions about the Java sample. Why? One reason could be that the Java sample is perfect and everybody understands it well. But after checking the users’ email signatures, I found that most of them are not developers; they’re analysts and traders who just want to copy some simple code and run it.

    In terms of popularity among non-developers, Python is clearly the more popular one. IMO:

    • if I’m going to start a new project that involves business users, I should probably start with Python.
    • if I need to release an SDK related to data, I should probably start with Python.

    Delivery Speed

    If I need to build a PoC project quickly, I’ll start with a Python script. Yes, just a script; why bother with a Java project? The language is simple and easy to explain to non-technical users, and most importantly, there are so many open-source libraries and modules available.

    Bigger projects? more developers?

    What happens if the PoC script goes well? The project gets bigger and more developers join. We need to split the big project into smaller modules (but not into separate small projects yet, to save the time of releasing internal packages). What happens if I want a common module? Currently, we’re using “Poetry”:

    To depend on a library located in a local directory or file, you can use the path property:

    my-package = { path = "../my-package/", develop = false }

    This works for simple projects, but for a project with a different module structure, some workarounds are needed, such as creating a symbolic link. On the Java side, however, “Maven” supports this perfectly.

    With more developers on board, more runtime exceptions may happen due to the nature of a dynamic language; Java is a much safer language to use. The compiler prevents many runtime errors that could happen in Python.
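    A tiny illustration of the kind of error a Java compiler would reject at build time but Python only surfaces at runtime (the names here are made up for the example):

```python
class Item:
    def __init__(self, price: int):
        self.price = price


def total_price(items):
    # Nothing checks what `items` contains before this runs; a wrong
    # element type only fails when this line actually executes.
    return sum(item.price for item in items)


print(total_price([Item(2), Item(3)]))  # 5
# total_price({"a": 1}) would raise AttributeError at runtime,
# where Java's compiler would have rejected the equivalent code.
```

    Static type hints plus a checker such as mypy can recover some of that compile-time safety without leaving Python.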

    In short, Java is more enterprise-ready than Python, but I believe Python is catching up.

    Ecosystem, supply chain

    “Spring” is the most popular Java framework, backed by a publicly listed company. It has almost everything; some Java developers can’t survive without Spring. On the Python side, however, libraries are less commercialized, which means you may need to raise a pull request to fix your own issue.

  • Pull requests should be treated as database transactions: all kinds of changes should be included

    A few weeks ago, I received a request to update the pricing logic for certain products. I made the code change; a silly example:

    from decimal import Decimal

    def get_price(product: Product) -> Decimal:
        if product.pricing_strategy == "10-percent-off":
            return product.price * Decimal("0.9")
        else:
            return product.price
    

    Of course, I also had unit tests to cover the change; everything was fine, so I pushed to production and told my business users that everything was sorted out. I knew I also needed to update the product configs in the database, but I thought I could do it manually right after the release.

    But I didn’t. Before I got back to the “manual” change in the db, a production issue was reported: prices were not discounted, and customers were not happy. I then spent hours fixing all the impacted orders.

    I could have easily avoided this issue by adding a db migration script to my pull request. The lesson I learned is to treat a pull request as a database transaction: a pull request should contain all related changes: code, data, infra, etc.
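    A runnable sketch of why the code change alone wasn’t enough (the `Product` class here is a hypothetical stand-in for the real model and its config row):

```python
from decimal import Decimal


# Hypothetical stand-in for the product and its database-backed config.
class Product:
    def __init__(self, price, pricing_strategy=None):
        self.price = price
        self.pricing_strategy = pricing_strategy


def get_price(product):
    if product.pricing_strategy == "10-percent-off":
        return product.price * Decimal("0.9")
    return product.price


# Code released, but the config row not yet migrated: no discount applied.
stale = Product(Decimal("100"))
assert get_price(stale) == Decimal("100")

# Only after the data change ships does the new logic take effect.
migrated = Product(Decimal("100"), pricing_strategy="10-percent-off")
assert get_price(migrated) == Decimal("90")
```

    The code path and the data it depends on have to ship together, which is exactly what bundling the migration into the pull request achieves.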

  • Python Decorator

    In my first few weeks with Python, I was shocked that I could pass a function around as a parameter, for example:

    def foo():
        pass
    synchronized_foo = synchronized(lock)(foo)
    synchronized_foo()
    

    And there’s a nicer version with a decorator:

    @synchronized(lock)
    def foo():
        pass
    

    Since I come from the Java world, I immediately linked this to AOP in Java, but decorators feel much lighter and easier to use, as described in PEP 318 – Decorators for Functions and Methods.
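    The snippets above leave `synchronized` and `lock` undefined; a runnable sketch of what such a decorator might look like (this helper is not from the standard library, it’s illustrative):

```python
import threading
from functools import wraps


def synchronized(lock):
    """Decorator factory: returns a decorator that runs f under `lock`."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            with lock:  # acquire before the call, release after
                return f(*args, **kwargs)
        return wrapper
    return decorator


lock = threading.Lock()


@synchronized(lock)
def foo():
    return "done"


print(foo())  # done
```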

    What?! Decorator on a decorator

    def convert_to_upper_case(f):
        """
        A simple decorator to convert the return string to upper case.
        """
        def uppercase(*args, **kwargs):
            r = f(*args, **kwargs)
            return r.upper()
        return uppercase


    def add_prefix(f):
        """
        A simple decorator to add a prefix to the return value.
        """
        def pre(*args, **kwargs):
            r = f(*args, **kwargs)
            return f"[prefix] {r}"
        return pre


    def add_prefix_and_convert_to_upper(f):
        """
        A combination of `convert_to_upper_case` and `add_prefix`.
        """
        @convert_to_upper_case
        @add_prefix
        def convert(*args, **kwargs):
            return f(*args, **kwargs)
        # also works:
        # convert = add_prefix(convert)
        # convert = convert_to_upper_case(convert)
        return convert


    # @add_prefix
    # @convert_to_upper_case
    @add_prefix_and_convert_to_upper
    def hello():
        return "Python"


    print(f"output: {hello()}")  # output: [PREFIX] PYTHON


    In the above example, @add_prefix means hello = add_prefix(hello), and @add_prefix_and_convert_to_upper means hello = convert_to_upper_case(add_prefix(hello)).

    In a debugger, hello is:

    <function convert_to_upper_case.<locals>.uppercase at 0x10e8da200>


    hello can still be hello if @wraps(f) is added in the decorator, e.g.:

    from functools import wraps

    def convert_to_upper_case(f):
        @wraps(f)
        def uppercase(*args, **kwargs):
            r = f(*args, **kwargs)
            return r.upper()
        return uppercase


    hello is <function hello at 0x10a3bd200> now!

    @wraps is a decorator to:

    Update a wrapper function to look like the wrapped function

    What about using a context manager as a decorator?

    contextlib.ContextDecorator

    A base class that enables a context manager to also be used as a decorator. Context managers inheriting from ContextDecorator have to implement __enter__ and __exit__ as normal. __exit__ retains its optional exception handling even when used as a decorator.

    How does it work?

    contextlib.ContextDecorator:

    def __call__(self, func):
        @wraps(func)
        def inner(*args, **kwds):
            with self._recreate_cm():
                return func(*args, **kwds)
        return inner
    

    so that a context manager can be used in both ways:

    @mycontext()
    def function():
        print('The bit in the middle')
    
    # or:
    with mycontext():
        print('The bit in the middle')
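    mycontext isn’t defined in the snippet above; a runnable sketch (recording events in a list instead of printing, so the behavior is easy to check):

```python
from contextlib import ContextDecorator

events = []


# Minimal ContextDecorator subclass; the class name mirrors the
# documentation example above.
class mycontext(ContextDecorator):
    def __enter__(self):
        events.append('enter')
        return self

    def __exit__(self, *exc):
        events.append('exit')
        return False


@mycontext()
def function():
    events.append('middle')


function()                      # used as a decorator
assert events == ['enter', 'middle', 'exit']

with mycontext():               # used as a context manager
    events.append('middle')
assert events == ['enter', 'middle', 'exit'] * 2
```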
    

    What about adding more arguments?

    For example (submit_task here is an application-specific helper, not shown):

    def async_task(name: str):
        def decorator(f):
            @wraps(f)
            def wrapper(*args, **kwargs):
                submit_task(target=f, args=args, kwargs=kwargs, name=name)
                print(f"{name} task submitted")
            return wrapper
        return decorator
    
    
    @async_task("my_task")
    def my_task():
        pass
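
    Since the async_task sketch above can’t run on its own (submit_task is application-specific), here is a self-contained example of the same decorator-with-arguments pattern, using a hypothetical retry decorator:

```python
from functools import wraps


def retry(times: int):
    """Decorator factory: `times` is captured by the closure."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(times):
                try:
                    return f(*args, **kwargs)
                except ValueError as e:
                    last_error = e
            raise last_error
        return wrapper
    return decorator


calls = []


@retry(3)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("boom")
    return "ok"


print(flaky())  # ok -- succeeds on the third attempt
```

    The outer function takes the decorator’s arguments, the middle one takes the function, and the inner one does the work; exactly the shape async_task has.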
    

    Summary

    • a decorator in Python is a function that takes another function as a parameter
    • add @wraps to keep the wrapped function’s metadata (name, docstring) intact
  • GraphQL Server-side Journey with Python

    After playing around with a few Python GraphQL libraries for a few weeks, I realized that a good GQL Python lib should:

    • be less invasive, work on top of the existing stack (FastAPI/starlette), and reuse as much code as possible (Pydantic)
    • generate the GQL schema from Python code, ideally from built-in types and Pydantic types
    • support Subscriptions out of the box

    Currently, I’m happy with Ariadne in a code-first approach. This post tracks the journey with issues we found and the workarounds/solutions.

    Graphene

    Both graphql.org and FastAPI point to https://graphene-python.org/, so we got started with it.

    As you may or may not know, GraphQL has a concept called “Schema”. Graphene takes “a code-first approach”, which is cool:

     Instead of writing GraphQL Schema Definition Language (SDL), we write Python code to describe the data provided by your server.

    Hello world works well, but it’s too verbose

    import graphene
    
    class Query(graphene.ObjectType):
      hello = graphene.String(name=graphene.String(default_value="World"))
    
      def resolve_hello(self, info, name):
        return 'Hello ' + name
      
    
    schema = graphene.Schema(query=Query)
    result = schema.execute('{ hello }')
    print(result.data['hello']) # "Hello World"
    

    It looks simple, yet it’s still complex. There are too many graphene types; why do I need to learn another typing system for something the framework could derive? What about this one?

    # hello = graphene.String(name=graphene.String(default_value="World"))
    hello: str = "World" 
    

    Reuse Pydantic types with graphene-pydantic

    Since we’re using Pydantic, which already has all the typing details, why not simply use Pydantic?! https://github.com/graphql-python/graphene-pydantic is exactly what we need! But even with graphene-pydantic, an adaptor layer is required between Pydantic and Graphene, e.g.:

    
    class PersonInput(PydanticInputObjectType):
        class Meta:
            model = PersonModel
            # exclude specified fields
            exclude_fields = ("id",)
    
    class CreatePerson(graphene.Mutation):
        class Arguments:
            person = PersonInput()
        # more code trimmed
    

    Still very verbose, but much better than the original one. 

    Subscriptions are not well supported yet

    The document is super confusing: https://docs.graphene-python.org/projects/django/en/latest/subscriptions/:

    To implement websocket-based support for GraphQL subscriptions, you’ll need to do the following:

    • Install and configure django-channels.
    • Install and configure a third-party module for adding subscription support over websockets. A few options include: graphql-python/graphql-ws, datavance/django-channels-graphql-ws, jaydenwindle/graphene-subscriptions.
    • Ensure that your application (or at least your GraphQL endpoint) is being served via an ASGI protocol server like daphne (built in to django-channels), uvicorn, or hypercorn.

    • Note: By default, the GraphiQL interface that comes with graphene-django assumes that you are handling subscriptions at the same path as any other operation (i.e., you configured both urls.py and routing.py to handle GraphQL operations at the same path, like /graphql).

    What? Why is Django being mentioned? I’m not interested, and I’m lost!

    Maybe it’s time to move on.

    Ariadne

    This is from the Graphene’s “Getting started”:

     Compare Graphene’s code-first approach to building a GraphQL API with schema-first approaches like Apollo Server (JavaScript) or Ariadne (Python). Instead of writing GraphQL Schema Definition Language (SDL), we write Python code to describe the data provided by your server.

    Yeah, schema-first is not cool, but Ariadne’s documentation looks much better than Graphene’s.

    Subscriptions, it just works

    After the experience with Graphene, the first feature I checked was subscriptions: https://ariadnegraphql.org/docs/subscriptions. It’s simple and it just works! The documentation is clean, and no Django is mentioned at all!

    import asyncio
    from ariadne import SubscriptionType, make_executable_schema
    from ariadne.asgi import GraphQL
    
    type_def = """
        type Query {
            _unused: Boolean
        }
    
        type Subscription {
            counter: Int!
        }
    """
    
    subscription = SubscriptionType()
    
    @subscription.source("counter")
    async def counter_generator(obj, info):
        for i in range(5):
            await asyncio.sleep(1)
            yield i
    
    
    @subscription.field("counter")
    def counter_resolver(count, info):
        return count + 1
    
    
    schema = make_executable_schema(type_def, subscription)
    app = GraphQL(schema, debug=True)
    

    Schema first? it doesn’t have to be

    What if I change counter_generator to return str? I need to update the schema, and if I forget, I’m lying to my users. I hate that.

    In the above example, type_def is kind of a duplication of the method counter_generator (if we add a return type), like:

    async def counter_generator(obj, info) -> int
    

    The schema looks reasonably easy to generate, so why can’t we generate it from Python code, especially with Pydantic? If we define a method with proper typing, we could generate the schema easily:

    
    from uuid import UUID, uuid4

    from ariadne import QueryType, make_executable_schema, snake_case_fallback_resolvers
    from ariadne.asgi import GraphQL
    from pydantic import BaseModel


    class HelloMessage(BaseModel):
        body: str
        from_user: UUID
    
    
    query = QueryType()
    
    
    @query.field('hello')
    def resolve_hello(_, info) -> HelloMessage:
        request = info.context['request']
        user_agent = request.headers.get('user-agent', 'guest')
        return HelloMessage(
            body='Hello, %s!' % user_agent,
            from_user=uuid4(),
        )
    
    
    # Generate type_defs from Pydantic types in the query definition.
    type_defs = generate_gql_schema_str([query])
    
    schema = make_executable_schema(
        type_defs, query, snake_case_fallback_resolvers,
    )
    app = GraphQL(schema, debug=True)
    

    The details can be found here: https://github.com/gary-liguoliang/ariadne-pydantic/blob/master/example/main.py

    With a small schema-generation utility, we managed to run Ariadne in a code-first approach:

    • the code is much simpler than the original versions of both Ariadne and Graphene
    • it reuses Pydantic typing
    • the GQL query definition method is very simple: take the input, forward it to the core application, return the output.
  • Speed Up Your Django Tests

    I read the book “Speed Up Your Django Tests” this week; a few interesting items:

    Background/disclaimer: I’m new to Django, and I use pytest to run many Django integration tests, so the points listed here are purely from my point of view.

    1. Override settings with @override_settings: in case you want to override a setting for a test method, Django provides the override_settings() decorator (see PEP 318).
    2. Show slow tests with pytest --durations 10.
    3. Test markers: categorize/tag tests so that you can run different subsets, like JUnit categories. For more details: https://docs.pytest.org/en/latest/example/markers.html
    4. Reduce pytest test collection by setting norecursedirs.
    5. Run in parallel with pytest-xdist.
    6. Django’s RequestFactory: this is similar to the test client, but instead of making requests, it “provides a way to generate a request instance that can be used as the first argument to any view” (Django docs).
    7. Django’s SimpleTestCase: a subclass of unittest.TestCase that “disallows database queries by default”; however, you can still turn them on.
    8. Avoid fixture files [11.1]: “For data you need in individual tests, you’re better off creating it in the test case or test method.” I have to say it’s very easy to set up test data with fixtures, but it soon becomes unmanageable. A few valid points:

      Fixture files are separate from the tests that use them. This makes it hard to determine which tests use which objects. The files tend to become “append-only,”…when a new test needs a new object, it tends to be added to an existing file…if there’s some data that most of your application depends on, using a fixture causes unnecessary reloading. It will be loaded and then rolled back for each test case, even when the next test case needs the exact same data.
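    Test markers (point 3 in the list above) are worth a quick sketch; the “slow” marker name below is illustrative:

```python
import pytest


# Tag expensive tests; register "slow" under [pytest] markers in
# pytest.ini to avoid warnings.
@pytest.mark.slow
def test_nightly_report():
    assert sum(range(1000)) == 499500


def test_fast_path():
    assert 1 + 1 == 2


# Select subsets from the command line:
#   pytest -m slow          # only the slow tests
#   pytest -m "not slow"    # skip them
```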

    Overall, I would say it’s a good Django testing book for newbies like me. The book also covers many other topics, such as profiling and mocking, and gives me many more topics and links to explore Django further.

    However, slow tests generally indicate design issues. All the techniques mentioned in the book can definitely help speed up the tests themselves, but if we take one step further, should we start thinking about the design?

    [Figure: “Abstraction”, from Architecture Patterns with Python]

    If we could fundamentally resolve some design issues, I believe we’d have far fewer integration tests.

